Re: [HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value
On 13-03-26 12:40 AM, Tom Lane wrote:
Bruce Momjian br...@momjian.us writes: On Mon, Mar 25, 2013 at 07:07:42PM -0400, Tom Lane wrote: Well, plan B would be to invent a replacement function that does have the ability to return an error message, but that seems like a lot of work for a problem that's so marginal that it wasn't noticed till now. (It's not so much creating the function that worries me, it's fixing clients to use it.) Plan C would be to redefine a bogus value of PGSERVICE as not an error, period.

Given all of these poor options, is defining a PQconndefaults() failure as perhaps out of memory or a service file problem really not better?

Uh ... no. In the first place, what evidence have you got that those are (and will continue to be) the only two possible causes? In the second place, this still requires changing every client of PQconndefaults(), even if it's only to the extent of fixing their error message texts. If we're going to do that, I'd rather ask them to change to a more future-proof solution.

So to summarise:

Plan A: The first patch I attached for pg_upgrade, plus documentation changes, and changing the other places that call PQconndefaults() to accept failures from either out of memory or an invalid PGSERVICE.

Plan B: Create a new function, PQconndefaults2(char *errorBuffer) or something similar, that returns error information to the caller.

Plan C: PQconndefaults() just ignores an invalid service, but connection attempts still fail because the other callers of conninfo_add_defaults continue to treat a service lookup failure as an error. This is the second patch I sent.

Plan D: Service lookup failures are always ignored by conninfo_add_defaults. If you attempt to connect with a bad PGSERVICE set, it behaves as if no PGSERVICE value was set. I don't think anyone has explicitly proposed this yet.

Plan D is the only option that I'm opposed to; it would affect a lot more applications than just the ones that call PQconndefaults(), and I feel it would confuse users. I'm not convinced plan B is worth the effort of having to maintain two versions of PQconndefaults() for a while to fix a corner case.
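For Plan A, the only burden on callers is to stop assuming a specific cause for a NULL return. A minimal sketch of what a defensive PQconndefaults() caller looks like (illustrative only; this is not taken from any of the utilities mentioned in this thread):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PQconninfoOption *opts = PQconndefaults();
        PQconninfoOption *opt;

        if (opts == NULL)
        {
            /*
             * NULL can currently mean out of memory *or* a failed service
             * lookup (e.g. an invalid PGSERVICE), so the message should not
             * claim a specific cause.
             */
            fprintf(stderr, "could not get default connection options\n");
            return 1;
        }

        for (opt = opts; opt->keyword != NULL; opt++)
            printf("%s = %s\n", opt->keyword, opt->val ? opt->val : "");

        PQconninfoFree(opts);
        return 0;
    }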
Re: [HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value
On 13-03-20 05:54 PM, Tom Lane wrote: Steve Singer ssin...@ca.afilias.info writes: From a end-user expectations point of view I am okay with somehow marking the structure returned by PQconndefaults in a way that the connect calls will later fail. Unless the program changes the value of PGSERVICE, surely all subsequent connection attempts will fail for the same reason, regardless of what PQconndefaults returns? regards, tom lane So your proposing we do something like the attached patch? Where we change conninfo_add_defaults to ignore an invalid PGSERVICE if being called by PQconndefaults() but keep the existing behaviour in other contexts where it is actually being used to establish a connection? In this case even if someone takes the result of PQconndefaults and uses that to build connection options for a new connection it should fail when it does the pgservice lookup when establishing the connection. That sounds reasonable to me. Steve diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c index ed67759..c1d8d69 100644 *** a/contrib/pg_upgrade/server.c --- b/contrib/pg_upgrade/server.c *** check_pghost_envvar(void) *** 300,306 /* Get valid libpq env vars from the PQconndefaults function */ start = PQconndefaults(); ! for (option = start; option-keyword != NULL; option++) { if (option-envvar (strcmp(option-envvar, PGHOST) == 0 || --- 300,309 /* Get valid libpq env vars from the PQconndefaults function */ start = PQconndefaults(); ! if (start == NULL) ! { ! pg_log(PG_FATAL,can not get default connection options out of memory\n); ! } for (option = start; option-keyword != NULL; option++) { if (option-envvar (strcmp(option-envvar, PGHOST) == 0 || diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index eea9c6b..0a9a29e 100644 *** a/src/interfaces/libpq/fe-connect.c --- b/src/interfaces/libpq/fe-connect.c *** static PQconninfoOption *conninfo_array_ *** 347,353 const char *const * values, PQExpBuffer errorMessage, bool use_defaults, int expand_dbname); static bool conninfo_add_defaults(PQconninfoOption *options, ! PQExpBuffer errorMessage); static PQconninfoOption *conninfo_uri_parse(const char *uri, PQExpBuffer errorMessage, bool use_defaults); static bool conninfo_uri_parse_options(PQconninfoOption *options, --- 347,354 const char *const * values, PQExpBuffer errorMessage, bool use_defaults, int expand_dbname); static bool conninfo_add_defaults(PQconninfoOption *options, ! PQExpBuffer errorMessage, ! bool ignore_missing_service); static PQconninfoOption *conninfo_uri_parse(const char *uri, PQExpBuffer errorMessage, bool use_defaults); static bool conninfo_uri_parse_options(PQconninfoOption *options, *** PQconndefaults(void) *** 875,881 connOptions = conninfo_init(errorBuf); if (connOptions != NULL) { ! if (!conninfo_add_defaults(connOptions, errorBuf)) { PQconninfoFree(connOptions); connOptions = NULL; --- 876,882 connOptions = conninfo_init(errorBuf); if (connOptions != NULL) { ! if (!conninfo_add_defaults(connOptions, errorBuf,true)) { PQconninfoFree(connOptions); connOptions = NULL; *** conninfo_parse(const char *conninfo, PQE *** 4243,4249 */ if (use_defaults) { ! if (!conninfo_add_defaults(options, errorMessage)) { PQconninfoFree(options); return NULL; --- 4244,4250 */ if (use_defaults) { ! if (!conninfo_add_defaults(options, errorMessage,false)) { PQconninfoFree(options); return NULL; *** conninfo_array_parse(const char *const * *** 4395,4401 */ if (use_defaults) { ! 
if (!conninfo_add_defaults(options, errorMessage)) { PQconninfoFree(options); return NULL; --- 4396,4402 */ if (use_defaults) { ! if (!conninfo_add_defaults(options, errorMessage,false)) { PQconninfoFree(options); return NULL; *** conninfo_array_parse(const char *const * *** 4416,4422 * error condition here --- we just leave the option's value as NULL. */ static bool ! conninfo_add_defaults(PQconninfoOption *options, PQExpBuffer errorMessage) { PQconninfoOption *option; char *tmp; --- 4417,4424 * error condition here --- we just leave the option's value as NULL. */ static bool ! conninfo_add_defaults(PQconninfoOption *options, PQExpBuffer errorMessage, ! bool ignore_missing_service) { PQconninfoOption *option; char *tmp; *** conninfo_add_defaults(PQconninfoOption * *** 4425,4431 * If there's a service spec, use it to obtain any not-explicitly-given * parameters. */ ! if (parseServiceInfo(options, errorMessage) != 0) return false
Re: [HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value
On 13-03-20 02:17 PM, Bruce Momjian wrote:
On Wed, Mar 20, 2013 at 01:30:20PM -0400, Tom Lane wrote: While this surely isn't the nicest answer, it doesn't seem totally unreasonable to me. A bad service name indeed does not contribute anything to the set of defaults available.

I think the concern is that the services file could easily change the defaults that are used for connecting, though you could argue that the real defaults for a bad service entry are properly returned.

Yes, my concern is that if I have a typo in the value of PGSERVICE I don't want to end up getting a connection to localhost instead of an error. From an end-user expectations point of view I am okay with somehow marking the structure returned by PQconndefaults in a way that the connect calls will later fail.
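For context, the behaviour being defended here is that an actual connection attempt with a bogus PGSERVICE fails with an error rather than silently falling back to built-in defaults. A minimal sketch (it assumes PGSERVICE is set in the environment to a nonexistent service name):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* An empty conninfo string lets libpq pick up PGSERVICE and friends. */
        PGconn *conn = PQconnectdb("");

        if (PQstatus(conn) != CONNECTION_OK)
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

        PQfinish(conn);
        return 0;
    }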
Re: [HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value
On 13-03-18 09:17 PM, Bruce Momjian wrote:
On Mon, Mar 18, 2013 at 12:08:09PM -0400, Steve Singer wrote: If you try running pg_upgrade with the PGSERVICE environment variable set to some invalid/non-existent service, pg_upgrade segfaults:

Program received signal SIGSEGV, Segmentation fault.
0x0040bdd1 in check_pghost_envvar () at server.c:304
304         for (option = start; option->keyword != NULL; option++)
(gdb) p start
$5 = (PQconninfoOption *) 0x0

PQconndefaults can return NULL if it has issues. The attached patch prints a minimally useful error message. I don't see a good way of getting a more useful error message out of PQconndefaults(). I checked this against master but it was reported to me as an issue in 9.2.

Well, that's interesting. There is no mention of PQconndefaults() returning NULL except for out of memory:

    Returns a connection options array. This can be used to determine all
    possible PQconnectdb options and their current default values. The
    return value points to an array of PQconninfoOption structures, which
    ends with an entry having a null keyword pointer. The null pointer is
    returned if memory could not be allocated. Note that the current
    default values (val fields) will depend on environment variables and
    other context. Callers must treat the connection options data as
    read-only.

Looking at libpq/fe-connect.c::PQconndefaults(), it calls conninfo_add_defaults(), which has this:

    /*
     * If there's a service spec, use it to obtain any not-explicitly-given
     * parameters.
     */
    if (parseServiceInfo(options, errorMessage) != 0)
        return false;

so it is clearly possible for PQconndefaults() to return NULL for service file failures. The questions are:

* Is this what we want? What other choices do we have?

I don't think PQconndefaults() should continue on as if PGSERVICE wasn't set in the environment after a failure from parseServiceInfo.

* Should we document this?

Yes, the documentation should indicate that PQconndefaults() can return NULL for more than just memory failures.

* Should we change this to just throw a warning?

How would we communicate warnings from PQconndefaults() back to the caller?

Also, it seems pg_upgrade isn't the only utility that is confused: contrib/postgres_fdw/option.c and contrib/dblink/dblink.c think PQconndefaults() returning NULL means out of memory and report that as the error string. bin/scripts/pg_isready.c and contrib/pg_upgrade/server.c have no check for a NULL return. libpq/test/uri-regress.c knows to throw a generic error message. So, we have some decisions and work to do.
[HACKERS] pg_upgrade segfaults when given an invalid PGSERVICE value
If you try running pg_upgrade with the PGSERVICE environment variable set to some invalid/non-existent service, pg_upgrade segfaults:

Program received signal SIGSEGV, Segmentation fault.
0x0040bdd1 in check_pghost_envvar () at server.c:304
304         for (option = start; option->keyword != NULL; option++)
(gdb) p start
$5 = (PQconninfoOption *) 0x0

PQconndefaults can return NULL if it has issues. The attached patch prints a minimally useful error message. I don't see a good way of getting a more useful error message out of PQconndefaults(). I checked this against master but it was reported to me as an issue in 9.2.

Steve

diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
index ed67759..f2e4d63 100644
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*** check_pghost_envvar(void)
*** 300,306
  	/* Get valid libpq env vars from the PQconndefaults function */
  	start = PQconndefaults();
! 	for (option = start; option->keyword != NULL; option++)
  	{
  		if (option->envvar && (strcmp(option->envvar, "PGHOST") == 0 ||
--- 300,309
  	/* Get valid libpq env vars from the PQconndefaults function */
  	start = PQconndefaults();
! 	if (start == NULL)
! 	{
! 		pg_log(PG_FATAL, "can not get default connection options\n");
! 	}
  	for (option = start; option->keyword != NULL; option++)
  	{
  		if (option->envvar && (strcmp(option->envvar, "PGHOST") == 0 ||
Re: [HACKERS] transforms
On 13-03-03 08:13 PM, Josh Berkus wrote: This (creating the extensions) works fine for me on a Ubuntu 10.x system Now if only we can work out the combinatorics issue ...

The plpython2u extensions worked fine, but I haven't been able to get this to work with plpython3u (python 3.1).

create extension hstore_plpython3u;
ERROR: could not load library /usr/local/pgsql93git/lib/hstore_plpython3.so: /usr/local/pgsql93git/lib/hstore_plpython3.so: undefined symbol: _Py_NoneStruct

Peter mentioned in the submission that the unit tests don't pass with python3; it doesn't work for me outside the tests either.

Steve
Re: [HACKERS] transforms
On 13-03-03 06:15 PM, Josh Berkus wrote: transforms=# create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: hstoreUniquePairs STATEMENT: create extension hstore_plperl; This surprised me, because make check for the extensions passed fine. Oh, this is on Ubuntu 12.10, not OSX. So possibly the fixes you made to fix linking on OSX broke other platforms. This (creating the extensions) works fine for me on a Ubuntu 10.x system template1=# create database test; CREATE DATABASE template1=# \c test You are now connected to database test as user ssinger. test=# create extension hstore; CREATE EXTENSION test=# create extension hstore_plpythonu; ERROR: required extension plpythonu is not installed STATEMENT: create extension hstore_plpythonu; ERROR: required extension plpythonu is not installed test=# create extension plpythonu; CREATE EXTENSION test=# create extension hstore_plpythonu; CREATE EXTENSION test=# test=# create extension plperl; CREATE EXTENSION test=# create extension hstore_plperl; CREATE EXTENSION test=# create extension plperlu; CREATE EXTENSION test=# create extension hstore_plperlu; CREATE EXTENSION test=# -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] json generation enhancements
On 13-02-25 05:37 PM, Andrew Dunstan wrote: On 02/24/2013 01:09 AM, Steve Singer wrote: On 13-01-11 11:03 AM, Andrew Dunstan wrote: On 01/11/2013 11:00 AM, Andrew Dunstan wrote: I have not had anyone follow up on this, so I have added docs and will add this to the commitfest. Recap: This adds the following: json_agg(anyrecord) - json to_json(any) - json hstore_to_json(hstore) - json (also used as a cast) hstore_to_json_loose(hstore) - json Also, in json generation, if any non-builtin type has a cast to json, that function is used instead of the type's output function. This time with a patch. Here is a review of this patch., Overview - This patch adds a set of functions to convert types to json. Specifically * An aggregate that take the elements and builds up a json array. * Conversions from an hstore to json For non-builtin types the text conversion is used unless a cast has specifically been defined from the type to json. There was some discussion last year on this patch that raised three issues a) Robert was concerned that if someone added a cast from a non-core type like citext to json that it would change the behaviour of this function. No one else offered an opinion on this at the time. I don't see this as a problem, if I add a cast between citext (or any other non-core datatype) to json I would expect it to effect how that datatype is generated as a json object in any functions that generate json. I don't see why this would be surprising behaviour. If I add a cast between my datatype and json to generate a json representation that differs from the textout representation then I would expect that representation to be used in the json_agg function as well. b) There was some talk in the json bikeshedding thread that json_agg() should be renamed to collect_json() but more people preferred json_agg(). c) Should an aggregate of an empty result produce NULL or an empty json element. Most people who commented on the thread seemed to feel that NULL is preferred because it is consistent with other aggregates I feel that the functionality of this patch is fine. Testing --- When I try running the regression tests with this patch I get: creating template1 database in /usr/local/src/postgresql/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not create unique index pg_proc_oid_index DETAIL: Key (oid)=(3171) is duplicated. child process exited with exit code 1 oid 3170, 3171 and 3172 appears to be used by lo_tell64 and lo_lseek64 If I fix those up the regression tests pass. I tried using the new functions in a few different ways and everything worked as expected. Code Review --- in contrib/hstore/hstore_io.c + /* not an int - try a double */ + double doubleres = strtod(src-data,endptr); + if (*endptr == '\0') + is_number = true; + else if (false) + { + /* shut the compiler up about unused variables */ + longres = (long) doubleres; + longres = longres / 2; I dislike that we have to do this to avoid compiler warnings. I am also worried the some other compiler might decide that the else if (false) is worthy of a warning. Does anyone have cleaner way of getting rid of the warning we get if we don't store the strtol/strtod result? Should we do something like: (void) ( strtol(src-data,endptr,10)+1); Other than that I don't see any issues. Thanks for the review. Revised patch attached with fresh OIDs and numeric detection code cleaned up along the lines you suggest. The opr_test unit test still fails. I think you forgot to include your update to pg_aggregate.h. See the attached diff. 
Other than that it looks fine. Craig is satisfied with the casting behaviour and no one else has objected to it. I think you're good to commit this.

Steve

diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 3c5f59b..6fb10a9 100644
*** a/src/include/catalog/pg_aggregate.h
--- b/src/include/catalog/pg_aggregate.h
*** DATA(insert ( 3538 string_agg_transfn st
*** 233,239
  DATA(insert ( 3545 bytea_string_agg_transfn bytea_string_agg_finalfn 0 2281 _null_ ));
  /* json */
! DATA(insert ( 3172 json_agg_transfn json_agg_finalfn 0 2281 _null_ ));
  /*
   * prototypes for functions in pg_aggregate.c
--- 233,239
  DATA(insert ( 3545 bytea_string_agg_transfn bytea_string_agg_finalfn 0 2281 _null_ ));
  /* json */
! DATA(insert ( 3175 json_agg_transfn json_agg_finalfn 0 2281 _null_ ));
  /*
   * prototypes for functions
Re: [HACKERS] json generation enhancements
On 13-01-11 11:03 AM, Andrew Dunstan wrote: On 01/11/2013 11:00 AM, Andrew Dunstan wrote: I have not had anyone follow up on this, so I have added docs and will add this to the commitfest. Recap: This adds the following: json_agg(anyrecord) - json to_json(any) - json hstore_to_json(hstore) - json (also used as a cast) hstore_to_json_loose(hstore) - json Also, in json generation, if any non-builtin type has a cast to json, that function is used instead of the type's output function. This time with a patch. Here is a review of this patch., Overview - This patch adds a set of functions to convert types to json. Specifically * An aggregate that take the elements and builds up a json array. * Conversions from an hstore to json For non-builtin types the text conversion is used unless a cast has specifically been defined from the type to json. There was some discussion last year on this patch that raised three issues a) Robert was concerned that if someone added a cast from a non-core type like citext to json that it would change the behaviour of this function. No one else offered an opinion on this at the time. I don't see this as a problem, if I add a cast between citext (or any other non-core datatype) to json I would expect it to effect how that datatype is generated as a json object in any functions that generate json. I don't see why this would be surprising behaviour. If I add a cast between my datatype and json to generate a json representation that differs from the textout representation then I would expect that representation to be used in the json_agg function as well. b) There was some talk in the json bikeshedding thread that json_agg() should be renamed to collect_json() but more people preferred json_agg(). c) Should an aggregate of an empty result produce NULL or an empty json element. Most people who commented on the thread seemed to feel that NULL is preferred because it is consistent with other aggregates I feel that the functionality of this patch is fine. Testing --- When I try running the regression tests with this patch I get: creating template1 database in /usr/local/src/postgresql/src/test/regress/./tmp_check/data/base/1 ... FATAL: could not create unique index pg_proc_oid_index DETAIL: Key (oid)=(3171) is duplicated. child process exited with exit code 1 oid 3170, 3171 and 3172 appears to be used by lo_tell64 and lo_lseek64 If I fix those up the regression tests pass. I tried using the new functions in a few different ways and everything worked as expected. Code Review --- in contrib/hstore/hstore_io.c + /* not an int - try a double */ + double doubleres = strtod(src-data,endptr); + if (*endptr == '\0') + is_number = true; + else if (false) + { + /* shut the compiler up about unused variables */ + longres = (long) doubleres; + longres = longres / 2; I dislike that we have to do this to avoid compiler warnings. I am also worried the some other compiler might decide that the else if (false) is worthy of a warning. Does anyone have cleaner way of getting rid of the warning we get if we don't store the strtol/strtod result? Should we do something like: (void) ( strtol(src-data,endptr,10)+1); Other than that I don't see any issues. Steve cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
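As a standalone illustration of the (void)-cast approach suggested above, the number check could be structured roughly like this (a sketch only, not the actual hstore_io.c code; empty-string and overflow handling are ignored):

    #include <stdlib.h>
    #include <stdbool.h>

    /*
     * Sketch: decide whether a string looks like a number by letting
     * strtol/strtod do the parsing and only inspecting endptr.  The parsed
     * values are discarded with an explicit (void) cast so the compiler
     * does not complain about unused results.
     */
    static bool
    looks_like_number(const char *s)
    {
        char   *endptr;

        (void) strtol(s, &endptr, 10);
        if (*endptr == '\0')
            return true;            /* parsed cleanly as an integer */

        (void) strtod(s, &endptr);
        return (*endptr == '\0');   /* parsed cleanly as a double? */
    }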
Re: [HACKERS] PL/Python result object str handler
On 13-01-07 09:58 PM, Peter Eisentraut wrote: By implementing a str handler for the result object, it now prints something like PLyResult status=5 nrows=2 rows=[{'foo': 1, 'bar': '11'}, {'foo': 2, 'bar': '22'}] Patch attached for review.

Here is a review:

This patch adds a function that pl/python functions can call to convert a query result hash into a string suitable for debug purposes. The use case for this feature is primarily debugging and logging; I feel that this is useful since a lot of debugging of stored functions is usually done with print/elog style debugging.

There was already some discussion on the thread about whether the number of rows printed should be limited; the consensus seemed to be 'no', since someone would be unhappy with any limit and printing everything is the same behaviour you get with the standard python print.

I've tested this with python 2.6 and 3.1 and it seems to work as described. I've looked through the code and everything looks fine.

The patch includes no documentation. Adding a few lines to the Utility Functions section of the plpython documentation so people know about this feature would be good.

Other than that I think it is fine to commit. I am setting this as ready for committer; I assume you'll commit this yourself and that you can add a paragraph to the docs as you commit it.

Steve
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 13-01-28 06:17 AM, Andres Freund wrote: Hi, 3. Pass the delete (with no key values) onto the replication client and let it deal with it (see 1 and 2) Hm. While I agree that nicer behaviour would be good I think the real enforcement should happen on a higher level, e.g. with event triggers or somesuch. It seems way too late to do anything about it when we're already decoding. The transaction will already have committed... Ideally the first line of enforcement would be with event triggers. The thing with user-level mechanisms for enforcing things is that they sometimes can be disabled or by-passed. I don't have a lot of sympathy for people who do this but I like the idea of at least having the option coding defensively to detect the situation and whine to the user. How do you plan on dealing with sequences? I don't see my plugin being called on sequence changes and I don't see XLOG_SEQ_LOG listed in DecodeRecordIntoReorderBuffer. Is there a reason why this can't be easily added? I basically was hoping for Simon's sequence-am to get in before doing anything real here. That didn't really happen yet. I am not sure whether there's a real usecase in decoding normal XLOG_SEQ_LOG records, their content isn't all that easy to interpet unless youre rather familiar with pg's innards. So, adding support wouldn't hard from a technical pov but it seems the semantics are a bit hard to nail down. Also what do we want to do about TRUNCATE support. I could always leave a TRUNCATE trigger in place that logged the truncate to a sl_truncates and have my replication daemon respond to the insert on a sl_truncates table by actually truncating the data on the replica. I have planned to add some generic table_rewrite handling, but I have to admit I haven't thought too much about it yet. Currently if somebody rewrites a table, e.g. with an ALTER ... ADD COLUMN .. DEFAULT .. or ALTER COLUMN ... USING ..., you will see INSERTs into a temporary table. That basically seems to be a good thing, but the user needs to be told about that ;) I've spent some time this weekend updating my prototype plugin that generates slony 2.2 style COPY output. I have attached my progress here (also https://github.com/ssinger/slony1-engine/tree/logical_repl). I have not gotten as far as modifying slon to act as a logical log receiver, or made a version of the slony apply trigger that would process these changes. I only gave it a quick look and have a couple of questions and remarks. The way you used the options it looks like youre thinking of specifying all the tables as options? I would have thought those would get stored queried locally and only something like the 'replication set' name or such would be set as an option. The way slony works today is that the list of tables to pull for a SYNC comes from the subscriber because the subscriber might be behind the provider, where a table has been removed from the set in the meantime. The subscriber still needs to receive data from that table until it is caught up to the point where that removal happens. Having a time-travelled version of a user table (sl_table) might fix that problem but I haven't yet figured out how that needs to work with cascading (since that is a feature of slony today I can't ignore the problem). I'm also not sure how that will work with table renames. Today if the user renames a table inside of an EXECUTE SCRIPT slony will update the name of the table in sl_table. This type of change wouldn't be visible (yet) in the time-travelled catalog. 
There might be a solution to this yet, but I haven't figured it out. Sticking with what slony does today seemed easier as a first step.

Iterating over a list with

    for (i = 0; i < options->length; i = i + 2)
    {
        DefElem *def_schema = (DefElem *) list_nth(options, i);

is not a good idea btw, that's quadratic in complexity ;)

Thanks, I'll rewrite this to walk a list of ListCell objects with next.

In the REORDER_BUFFER_CHANGE_UPDATE case I suggest using relation->rd_primary, just as in the DELETE case; that should always give you a consistent candidate key in an efficient manner.

I haven't looked into the details of what is involved in setting up a subscription with the snapshot exporting. That hopefully shouldn't be too hard... At least that's the idea :P

I couldn't get the options on the START REPLICATION command to parse, so I just hard coded some list building code in the init method. I do plan on passing the list of tables to replicate from the replica to the plugin (because this list comes from the replica). Passing what could be a few thousand table names as a list of arguments is a bit ugly, and I admit my list processing code is rough. Does this make us want to reconsider the format of the option_list?

Yea, something's screwed up there, sorry. Will push a fix later today.

I guess I should provide an opinion on whether I think that the patch in this CF, if committed
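For reference, a linear rewrite of that loop might look something like this (a rough fragment against the backend List API of that era, with illustrative variable names; it assumes the options arrive as schema/table pairs as in the prototype plugin):

    ListCell   *cell = list_head(options);

    while (cell != NULL)
    {
        DefElem    *def_schema = (DefElem *) lfirst(cell);
        ListCell   *next = lnext(cell);
        DefElem    *def_table;

        if (next == NULL)
            break;              /* odd number of options; ignore the last one */
        def_table = (DefElem *) lfirst(next);

        /* ... record (def_schema, def_table) as a table to replicate ... */

        cell = lnext(next);
    }

This walks the list once, so it is O(n) instead of the O(n^2) behaviour of calling list_nth() inside the loop.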
Re: [HACKERS] logical changeset generation v4
On 13-01-28 06:23 AM, Andres Freund wrote: The CF is also there to find UI warts and such, so something like this seems perfectly fine. Even more so as it doesn't look like this will get into 9.3 anyway. I wanted to add such an option, but I was too lazy^Wbusy to think about the semantics. Your current syntax doesn't really allow arguments to be specified in a nice way. I was thinking of -o name=value and allowing multiple specifications of -o to build the option string. Any arguments against that?

Multiple -o options sound fine to me.

 	/* Initiate the replication stream at specified location */
-	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X",
-			 slot, (uint32) (startpos >> 32), (uint32) startpos);
+	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X (%s)",
+			 slot, (uint32) (startpos >> 32), (uint32) startpos, plugin_opts);

ISTM that (%s) shouldn't be specified when there are no options, but as the options need to be pre-escaped anyway, that looks like a non-problem in a bit more complete implementation.

Greetings, Andres Freund
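A minimal sketch of how repeated -o name=value arguments could be folded into one option string (the helper name and memory handling here are purely illustrative and not part of the posted patch; out-of-memory handling is omitted):

    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>

    /*
     * Hypothetical helper: append one -o NAME=VALUE argument to a
     * comma-separated option string, so -o can be given more than once.
     */
    static char *
    append_plugin_option(char *opts, const char *newopt)
    {
        char   *result;

        if (opts == NULL || opts[0] == '\0')
            return strdup(newopt);

        result = malloc(strlen(opts) + strlen(newopt) + 2);
        sprintf(result, "%s,%s", opts, newopt);
        free(opts);
        return result;
    }

Each 'o' case in the getopt loop would then just do plugin_opts = append_plugin_option(plugin_opts, optarg).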
Re: [HACKERS] Event Triggers: adding information
On 13-01-26 11:11 PM, Robert Haas wrote: On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: My understanding is that if the command string we give to event triggers is ambiguous (sub-object names, schema qualifications, etc), it comes useless for logical replication use. I'll leave it to the consumers of that to speak up now. Yeah, that's probably true. I think it might be useful for other purposes, but I think we need a bunch of infrastructure we don't have yet to make logical replication of DDL a reality. I agree. Does anyone have a specific use case other than DDL replication where an ambiguous command string would be useful? Even for use cases like automatically removing a table from replication when it is dropped, I would want to be able to determine which table is being dropped unambiguously. Could I determine that from an oid? I suspect so, but parsing a command string and then trying to figure out the table from the search_path doesn't sound very appealing. Well, the point is that if you have a function that maps a parse tree onto an object name, any API or ABI changes can be reflected in an updated definition for that function. So suppose I have the command CREATE TABLE public.foo (a int). And we have a call pg_target_object_namespace(), which will return public given the parse tree for the foregoing command. And we have a call pg_target_object_name(), which will return foo. We can whack around the underlying parse tree representation all we want and still not break anything - because any imaginable parse tree representation will allow the object name and object namespace to be extracted. Were that not possible it could scarcely be called a parse tree any longer. How do you get the fully qualified type of the first column? col1=pg_target_get_column(x, 0) pg_target_get_type(col1); or something similar. I think that could work but we would be adding a lot of API functions to get all the various bits of info one would want the API to expose. I also suspect executing triggers that had to make lots of function calls to walk a tree would be much slower than an extension that could just walk the parse-tree or some other abstract tree like structure. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v4
On 13-01-22 11:30 AM, Andres Freund wrote: Hi, I pushed a new rebased version (the xlogreader commit made it annoying to merge). The main improvements are * way much coherent code internally for intializing logical rep * explicit control over slots * options for logical replication Exactly what is the syntax for using that. My reading your changes to repl_gram.y make me think that any of the following should work (but they don't). START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1') ERROR: syntax error: unexpected character ( START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1' 'val1') ERROR: syntax error: unexpected character ( START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1','opt2') ERROR: syntax error: unexpected character ( I'm also attaching a patch to pg_receivellog that allows you to specify these options on the command line. I'm not saying I think that it is appropriate to be adding more bells and whistles to the utilities two weeks into the CF but I found this useful for testing so I'm sharing it. Greetings, Andres Freund -- Andres Freundhttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 176087bacec6cbf0b86e4ffeb918f41b4a5b8d7a Mon Sep 17 00:00:00 2001 From: Steve Singer ssin...@ca.afilias.info Date: Sun, 27 Jan 2013 12:24:33 -0500 Subject: [PATCH] allow pg_receivellog to pass plugin options from the command line to the plugin --- src/bin/pg_basebackup/pg_receivellog.c | 14 ++ 1 files changed, 10 insertions(+), 4 deletions(-) diff --git a/src/bin/pg_basebackup/pg_receivellog.c b/src/bin/pg_basebackup/pg_receivellog.c index 04bedbe..30b3cea 100644 --- a/src/bin/pg_basebackup/pg_receivellog.c +++ b/src/bin/pg_basebackup/pg_receivellog.c @@ -54,7 +54,7 @@ static XLogRecPtr startpos; static bool do_init_slot = false; static bool do_start_slot = false; static bool do_stop_slot = false; - +static const char * plugin_opts=; static void usage(void); static void StreamLog(); @@ -84,6 +84,7 @@ usage(void) printf(_( -s, --status-interval=INTERVAL\n time between status packets sent to server (in seconds)\n)); printf(_( -S, --slot=SLOTuse existing replication slot SLOT instead of starting a new one\n)); + printf(_( -o --options=OPTIONS A comma separated list of options to the plugin\n)); printf(_(\nAction to be performed:\n)); printf(_( --init initiate a new replication slot (for the slotname see --slot)\n)); printf(_( --startstart streaming in a replication slot (for the slotname see --slot)\n)); @@ -264,8 +265,8 @@ StreamLog(void) slot); /* Initiate the replication stream at specified location */ - snprintf(query, sizeof(query), START_LOGICAL_REPLICATION '%s' %X/%X, - slot, (uint32) (startpos 32), (uint32) startpos); + snprintf(query, sizeof(query), START_LOGICAL_REPLICATION '%s' %X/%X (%s), + slot, (uint32) (startpos 32), (uint32) startpos,plugin_opts); res = PQexec(conn, query); if (PQresultStatus(res) != PGRES_COPY_BOTH) { @@ -560,6 +561,7 @@ main(int argc, char **argv) {init, no_argument, NULL, 1}, {start, no_argument, NULL, 2}, {stop, no_argument, NULL, 3}, + {options,required_argument,NULL,'o'}, {NULL, 0, NULL, 0} }; int c; @@ -584,7 +586,7 @@ main(int argc, char **argv) } } - while ((c = getopt_long(argc, argv, f:nvd:h:p:U:wWP:s:S:, + while ((c = getopt_long(argc, argv, f:nvd:h:p:U:wWP:s:S:o:, long_options, option_index)) != -1) { switch (c) @@ -659,6 +661,10 @@ main(int argc, char **argv) case 3: do_stop_slot = true; break; + case 'o': +if(optarg != NULL) + plugin_opts = pg_strdup(optarg); +break; /* action */ default: -- 1.7.0.4 -- Sent via pgsql-hackers 
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 13-01-24 11:15 AM, Steve Singer wrote: On 13-01-24 06:40 AM, Andres Freund wrote: Fair enough. I am also working on a user of this infrastructure but that doesn't help you very much. Steve Singer seemed to make some stabs at writing an output plugin as well. Steve, how far did you get there? I was able to get something that generated output for INSERT statements in a format similar to what a modified slony apply trigger would want. This was with the list of tables to replicate hard-coded in the plugin. This was with the patchset from the last commitfest.I had gotten a bit hung up on the UPDATE and DELETE support because slony allows you to use an arbitrary user specified unique index as your key. It looks like better support for tables with a unique non-primary key is in the most recent patch set. I am hoping to have time this weekend to update my plugin to use parameters passed in on the init and other updates in the most recent version. If I make some progress I will post a link to my progress at the end of the weekend. My big issue is that I have limited time to spend on this. A few more comments; In decode.c DecodeDelete + if (r-xl_len = (SizeOfHeapDelete + SizeOfHeapHeader)) + { + elog(DEBUG2, huh, no primary key for a delete on wal_level = logical?); + return; + } + I think we should be passing delete's with candidate key data logged to the plugin. If the table isn't a replicated table then ignoring the delete is fine. If the table is a replicated table but someone has deleted the unique index from the table then the plugin will receive INSERT changes on the table but not DELETE changes. If this happens the plugin would have any way of knowing that it is missing delete changes. If my plugin gets passed a DELETE change record but with no key data then my plugin could do any of 1. Start screaming for help (ie log errors) 2. Drop the table from replication 3. Pass the delete (with no key values) onto the replication client and let it deal with it (see 1 and 2) Also, 'huh' isn't one of our standard log message phrases :) How do you plan on dealing with sequences? I don't see my plugin being called on sequence changes and I don't see XLOG_SEQ_LOG listed in DecodeRecordIntoReorderBuffer. Is there a reason why this can't be easily added? Also what do we want to do about TRUNCATE support. I could always leave a TRUNCATE trigger in place that logged the truncate to a sl_truncates and have my replication daemon respond to the insert on a sl_truncates table by actually truncating the data on the replica. I've spent some time this weekend updating my prototype plugin that generates slony 2.2 style COPY output. I have attached my progress here (also https://github.com/ssinger/slony1-engine/tree/logical_repl). I have not gotten as far as modifying slon to act as a logical log receiver, or made a version of the slony apply trigger that would process these changes. I haven't looked into the details of what is involved in setting up a subscription with the snapshot exporting. I couldn't get the options on the START REPLICATION command to parse so I just hard coded some list building code in the init method. I do plan on pasing the list of tables to replicate from the replica to the plugin (because this list comes from the replica). Passing what could be a few thousand table names as a list of arguments is a bit ugly and I admit my list processing code is rough. Does this make us want to reconsider the format of the option_list ? 
I guess should provide an opinion on if I think that the patch in this CF, if committed could be used to act as a source for slony instead of the log trigger. The biggest missing piece I mentioned in my email yesterday, that we aren't logging the old primary key on row UPDATEs. I don't see building a credible replication system where you don't allow users to update any column of a row. The other issues I've raised (DecodeDelete hiding bad deletes, replication options not parsing for me) look like easy fixes no wal decoding support for sequences or truncate are things that I could work around by doing things much like slony does today. The SYNC can still capture the sequence changes in a table (where the INSERT's would be logged) and I can have a trigger capture truncates. I mostly did this review from the point of view of someone trying to use the feature, I haven't done a line-by-line review of the code. I suspect Andres can address these issues and get an updated patch out during this CF. I think a more detailed code review by someone more familiar with postgres internals will reveal a handful of other issues that hopefully can be fixed without a lot of effort. If this were the only patch in the commitfest I would encourage Andres to push to get these changes done. If the standard for CF4 is that a patch needs to be basically in a commitable state
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 13-01-24 11:15 AM, Steve Singer wrote: On 13-01-24 06:40 AM, Andres Freund wrote: Fair enough. I am also working on a user of this infrastructure but that doesn't help you very much. Steve Singer seemed to make some stabs at writing an output plugin as well. Steve, how far did you get there?

I was able to get something that generated output for INSERT statements in a format similar to what a modified slony apply trigger would want. This was with the list of tables to replicate hard-coded in the plugin. This was with the patchset from the last commitfest. I had gotten a bit hung up on the UPDATE and DELETE support because slony allows you to use an arbitrary user specified unique index as your key. It looks like better support for tables with a unique non-primary key is in the most recent patch set. I am hoping to have time this weekend to update my plugin to use parameters passed in on the init and other updates in the most recent version. If I make some progress I will post a link to my progress at the end of the weekend. My big issue is that I have limited time to spend on this.

This isn't a complete review, just a few questions I've hit so far that I thought I'd ask to see if I'm missing something related to updates.

*** a/src/include/catalog/index.h
--- b/src/include/catalog/index.h
*** extern bool ReindexIsProcessingHeap(Oid
*** 114,117
--- 114,121
  extern bool ReindexIsProcessingIndex(Oid indexOid);
  extern Oid IndexGetRelation(Oid indexId, bool missing_ok);
+ extern void relationFindPrimaryKey(Relation pkrel, Oid *indexOid,
+ 			int16 *nratts, int16 *attnums, Oid *atttypids,
+ 			Oid *opclasses);
+
  #endif   /* INDEX_H */

I don't see this defined anywhere; could it be left over from a previous version of the patch?

In decode.c DecodeUpdate:

+
+ 	/*
+ 	 * FIXME: need to get/save the old tuple as well if we want primary key
+ 	 * changes to work.
+ 	 */
+ 	change->newtuple = ReorderBufferGetTupleBuf(reorder);

I also don't see any code in heap_update to find and save the old primary key values like you added to heap_delete. You didn't list "Add ability to change the primary key on an UPDATE" in the TODO, so I'm wondering if I'm missing something. Is there another way I can get the primary key values for the old_tuple?

Also, I think the name of the test contrib module was changed but you didn't update the makefile. This fixes it:

diff --git a/contrib/Makefile b/contrib/Makefile
index 1cc30fe..36e6bfe 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -50,7 +50,7 @@ SUBDIRS = \
 	tcn \
 	test_parser \
 	test_decoding \
-	test_logical_replication \
+	test_logical_decoding \
 	tsearch2 \
 	unaccent \
 	vacuumlo \
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 13-01-24 06:40 AM, Andres Freund wrote: Fair enough. I am also working on a user of this infrastructure but that doesn't help you very much. Steve Singer seemed to make some stabs at writing an output plugin as well. Steve, how far did you get there? I was able to get something that generated output for INSERT statements in a format similar to what a modified slony apply trigger would want. This was with the list of tables to replicate hard-coded in the plugin. This was with the patchset from the last commitfest.I had gotten a bit hung up on the UPDATE and DELETE support because slony allows you to use an arbitrary user specified unique index as your key. It looks like better support for tables with a unique non-primary key is in the most recent patch set. I am hoping to have time this weekend to update my plugin to use parameters passed in on the init and other updates in the most recent version. If I make some progress I will post a link to my progress at the end of the weekend. My big issue is that I have limited time to spend on this. BTW, why does all the transaction reordering stuff has to be in core? It didn't use to, but people argued pretty damned hard that no undecoded data should ever allowed to leave the postgres cluster. And to be fair it makes writing an output plugin *way* much easier. Check http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/test_decoding/test_decoding.c;hb=xlog-decoding-rebasing-cf4 If you skip over tuple_to_stringinfo(), which is just pretty generic scaffolding for converting a whole tuple to a string, writing out the changes in some format by now is pretty damn simple. I think we will find that the replication systems won't be the only users of this feature. I have often seen systems that have a logging requirement for auditing purposes or to log then reconstruct the sequence of changes made to a set of tables in order to feed a downstream application. Triggers and a journaling table are the traditional way of doing this but it should be pretty easy to write a plugin to accomplish the same thing that should give better performance. If the reordering stuff wasn't in core this would be much harder. How much of this infrastructure is to support replicating DDL changes? IOW, if we drop that requirement, how much code can we slash? Unfortunately I don't think too much unless we add in other code that allows us to check whether the current definition of a table is still the same as it was back when the tuple was logged. Any other features or requirements that could be dropped? I think it's clear at this stage that this patch is not going to be committed as it is. If you can reduce it to a fraction of what it is now, that fraction might have a chance. Otherwise, it's just going to be pushed to the next commitfest as whole, and we're going to be having the same doubts and discussions then. One thing that reduces complexity is to declare the following as unsupported: - CREATE TABLE foo(data text); - DECODE UP TO HERE; - INSERT INTO foo(data) VALUES(very-long-to-be-externally-toasted-tuple); - DROP TABLE foo; - DECODE UP TO HERE; but thats just a minor thing. 
I think what we can do more realistically than to chop of required parts of changeset extraction is to start applying some of the preliminary patches independently: - the relmapper/relfilenode changes + pg_relation_by_filenode(spc, relnode) should be independently committable if a bit boring - allowing walsenders to connect to a database possibly needs an interface change but otherwise it should be fine to go in independently. It also has other potential use-cases, so I think thats fair. - logging xl_running_xact's more frequently could also be committed independently and makes sense independently as it allows a standby to enter HS faster if the master is busy - Introducing InvalidCommandId should be relatively uncontroversial. The fact that no invalid value for command ids exists is imo an oversight - the *Satisfies change could be applied and they are imo ready but there's no use-case for it without the rest, so I am not sure whether theres a point - currently not separately available, but we could add wal_level=logical independently. There would be no user of it, but it would be partial work. That includes the relcache support for keeping track of the primary key which already is available separately. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Event Triggers: adding information
On 13-01-24 05:43 AM, Dimitri Fontaine wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: - Context exposing and filtering

I'm not feeling very sanguine about any of this. I feel strongly that we should try to do something that's more principled than just throwing random junk into ProcessUtility.

The goal is to allow for advanced users of that feature to get at the sequence, constraints and index names that have been picked automatically by PostgreSQL when doing some CREATE TABLE statement that involves them:

CREATE TABLE foo (id serial PRIMARY KEY, f1 text);

Andres and Steve, as you're the next ones to benefit directly from that whole feature and at different levels, do you have a strong opinion on whether you really need that to operate at the logical level?

I haven't been following this thread very closely and I'm not exactly sure what you're asking.

If the question is what a replication system would need to replicate DDL commands, then I need a method of translating whatever the DDL trigger is passed into either the 'CREATE TABLE public.foo(id serial primary key, f1 text)' or some set of equivalent commands that will allow me to create the same objects on the replica. I don't have any brilliant insight on how I would go about doing it. Honestly I haven't spent a lot of time thinking about it in the past 8 months.

If you're asking what I would need for a trigger to automatically add a new table to replication, then I think I would only need to know (or be able to obtain) the fully qualified name of the table, and then in another trigger call be given the fully qualified name of the sequence. The triggers would need to be able to do DML to configuration tables. I don't need to know that a table and sequence are connected if all I want to do is replicate them.
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)
On 13-01-21 02:28 AM, Andres Freund wrote: I haven't removed it from the patch afaik, so it would be great to get a profile here! Its only for xlogdump, but that tool helped me immensely and I don't want to maintain it independently... Here is the output from tprof Here is the baseline: Average time 37862 Total Ticks For All Processes (./postgres_perftest) = 19089 Subroutine Ticks% Source Address Bytes == = == == === = .AllocSetAlloc 2270 0.99 aset.c 3bfb84b0 .base_yyparse 1217 0.53 gram.c fe644 168ec .text 1015 0.44 copyfuncs.c 11bfb4 4430 .MemoryContextAllocZero 535 0.23 mcxt.c 3b990 f0 .MemoryContextAlloc 533 0.23 mcxt.c 3b780 e0 .check_stack_depth 392 0.17 postgres.c 568ac100 .core_yylex 385 0.17 gram.c fb5b4 1c90 .expression_tree_walker 347 0.15 nodeFuncs.c 50da4750 .AllocSetFree308 0.13 aset.c 3c9981b0 .grouping_planner242 0.11 planner.c 2399f0 1d10 .SearchCatCache 198 0.09 catcache.c 7ec20550 ._SPI_execute_plan 195 0.09 spi.c 2f0c187c0 .pfree 195 0.09 mcxt.c 3b3b0 70 query_dependencies_walker185 0.08 setrefs.c 2559441b0 .GetSnapshotData 183 0.08 procarray.c 69efc460 .query_tree_walker 176 0.08 nodeFuncs.c 50ae4210 .strncpy 168 0.07 strncpy.s ba080130 .fmgr_info_cxt_security 166 0.07 fmgr.c 3f7b0850 .transformStmt 159 0.07 analyze.c 29091c 12d0 .text141 0.06 parse_collate.c 28ddf8 1 .ExecInitExpr137 0.06 execQual.c 17f18c 15f0 .fix_expr_common 132 0.06 setrefs.c 2557e4160 .standard_ExecutorStart 127 0.06 execMain.c 1d9a00940 .GetCachedPlan 125 0.05 plancache.c ce664310 .strcpy 121 0.05 noname 3bd401a8 With your changes (same test as I described before) Average: 37938 Total Ticks For All Processes (./postgres_perftest) = 19311 Subroutine Ticks% Source Address Bytes == = == == === = .AllocSetAlloc 2162 2.17 aset.c 3bfb84b0 .base_yyparse 1242 1.25 gram.c fdc7c 167f0 .text 1028 1.03 copyfuncs.c 11b4d0 4210 .palloc 553 0.56 mcxt.c 3b4c8 d0 .MemoryContextAllocZero 509 0.51 mcxt.c 3b9e8 f0 .core_yylex 413 0.41 gram.c fac2c 1c60 .check_stack_depth 404 0.41 postgres.c 56730100 .expression_tree_walker 320 0.32 nodeFuncs.c 50d28750 .AllocSetFree261 0.26 aset.c 3c9981b0 ._SPI_execute_plan 232 0.23 spi.c 2ee6247c0 .GetSnapshotData 221 0.22 procarray.c 69d54460 .grouping_planner211 0.21 planner.c 237b60 1cf0 .MemoryContextAlloc 190 0.19 mcxt.c 3b738 e0 .query_tree_walker 184 0.18 nodeFuncs.c 50a68210 .SearchCatCache 182 0.18 catcache.c 7ea08550 .transformStmt 181 0.18 analyze.c 28e774 12d0 query_dependencies_walker180 0.18 setrefs.c 25397c1b0 .strncpy 175 0.18 strncpy.s b9a60130 .MemoryContextCreate 167 0.17 mcxt.c 3bad8160 .pfree 151 0.15 mcxt.c 3b208 70 .strcpy 150 0.15 noname 3bd401a8 .fmgr_info_cxt_security 146 0.15 fmgr.c 3f790850 .text132 0.13 parse_collate.c 28bc50 1 .ExecInitExpr125 0.13 execQual.c 17de28 15e0 .expression_tree_mutator 124 0.12 nodeFuncs.c 53268 1080 .strcmp 122 0.12 noname 1e44158 .fix_expr_common 117 0.12 setrefs.c 25381c160 Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)
On 13-01-21 12:15 PM, Andres Freund wrote: On 2013-01-21 11:59:18 -0500, Steve Singer wrote: On 13-01-21 02:28 AM, Andres Freund wrote: I haven't removed it from the patch afaik, so it would be great to get a profile here! Its only for xlogdump, but that tool helped me immensely and I don't want to maintain it independently... Here is the output from tprof Here is the baseline: Any chance to do the test ontop of HEAD? The elog stuff has gone in only afterwards and might actually effect this enough to be relevant. HEAD(:fb197290) Average: 28877 .AllocSetAlloc 1442 1.90 aset.c 3a9384b0 .base_yyparse 1220 1.61 gram.c fdbc0 168ec .MemoryContextAlloc 485 0.64 mcxt.c 3a154 e0 .core_yylex 407 0.54 gram.c fab30 1c90 .AllocSetFree320 0.42 aset.c 3b3181b0 .MemoryContextAllocZero 316 0.42 mcxt.c 3a364 f0 .grouping_planner271 0.36 planner.c 2b0ce8 1d10 .strncpy 256 0.34 strncpy.s b8ca0130 .expression_tree_walker 222 0.29 nodeFuncs.c 4f734750 ._SPI_execute_plan 221 0.29 spi.c 2fb230840 .SearchCatCache 182 0.24 catcache.c 7d870550 .GetSnapshotData 161 0.21 procarray.c 68acc460 .fmgr_info_cxt_security 155 0.20 fmgr.c 3e130850 .pfree 152 0.20 mcxt.c 39d84 70 .expression_tree_mutator 151 0.20 nodeFuncs.c 51c84 1170 .check_stack_depth 142 0.19 postgres.c 5523c100 .text126 0.17 parse_collate.c 233d90 1 .transformStmt 125 0.16 analyze.c 289e88 12c0 .ScanKeywordLookup 123 0.16 kwlookup.c f7ab4154 .new_list120 0.16 list.c 40f74 b0 .subquery_planner115 0.15 planner.c 2b29f8dc0 .GetCachedPlan 115 0.15 plancache.c cdab0310 .ExecInitExpr114 0.15 execQual.c 17e690 15f0 .set_plan_refs 113 0.15 setrefs.c 252630cb0 .standard_ExecutorStart 110 0.14 execMain.c 1d9264940 .heap_form_tuple 107 0.14 heaptuple.c 177c842f0 .query_tree_walker 105 0.14 nodeFuncs.c 4f474210 HEAD + patch: Average: 29035 Total Ticks For All Processes (./postgres_perftest) = 15044 Subroutine Ticks% Source Address Bytes == = == == === = .AllocSetAlloc 1422 1.64 aset.c 3a9384b0 .base_yyparse 1201 1.38 gram.c fd1e8 167f0 .palloc 470 0.54 mcxt.c 39e04 d0 .core_yylex 364 0.42 gram.c fa198 1c60 .MemoryContextAllocZero 282 0.33 mcxt.c 3a324 f0 ._SPI_execute_plan 250 0.29 spi.c 2f8c18840 .AllocSetFree244 0.28 aset.c 3b3181b0 .strncpy 234 0.27 strncpy.s b86a0130 .expression_tree_walker 229 0.26 nodeFuncs.c 4f698750 .grouping_planner176 0.20 planner.c 2aea84 1d30 .SearchCatCache 170 0.20 catcache.c 7d638550 .GetSnapshotData 168 0.19 procarray.c 68904460 .assign_collations_walker155 0.18 parse_collate.c 231f4c7e0 .subquery_planner141 0.16 planner.c 2b07b4dc0 .fmgr_info_cxt_security 141 0.16 fmgr.c 3e110850 .check_stack_depth 140 0.16 postgres.c 550a0100 .ExecInitExpr138 0.16 execQual.c 17d2f8 15e0 .pfree 137 0.16 mcxt.c 39b44 70 .transformStmt 132 0.15 analyze.c 287dec 12c0 .new_list128 0.15 list.c 40f44 90 .expression_tree_mutator 125 0.14 nodeFuncs.c 51bd8 1080 .preprocess_expression 118 0.14 planner.c 2adf541a0 .strcmp 118 0.14 noname 1e44158 .standard_ExecutorStart 115 0.13 execMain.c 1d77c0940 .MemoryContextAlloc 111 0.13 mcxt.c 3a074 e0 .set_plan_refs 109 0.13 setrefs.c 2506c4ca0 Otherwise I have to say I am a bit confused by the mighty difference in costs attributed to AllocSetAlloc given the amount of calls should be exactly the same and the number of trampoline function calls should also stay the same. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)
On 13-01-09 03:07 PM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Well, I *did* benchmark it as noted elsewhere in the thread, but that's obviously just one machine (E5520 x 2) with one rather restricted workload (pgbench -S -jc 40 -T60). At least it's rather palloc heavy. Here are the numbers:

before: 101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
after: 101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992

So on my system if there is a difference, its positive (0.12%).

pgbench-based testing doesn't fill me with a lot of confidence for this --- its numbers contain a lot of communication overhead, not to mention that pgbench itself can be a bottleneck. It struck me that we have a recent test case that's known to be really palloc-intensive, namely Pavel's example here: http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwat8e2bd2pwnkm7i7...@mail.gmail.com

I set up a non-cassert build of commit 78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that reduced the data-copying overhead for that). On my Fedora 16 machine (dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2)) I get a runtime for Pavel's example of 17023 msec (average over five runs). I then applied oprofile and got a breakdown like this:

samples  %
108409  84.5083  /home/tgl/testversion/bin/postgres
13723   10.6975  /lib64/libc-2.14.90.so
3153     2.4579  /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %        symbol name
10960    10.1495  AllocSetAlloc
6325      5.8572  MemoryContextAllocZeroAligned
6225      5.7646  base_yyparse
3765      3.4866  copyObject
2511      2.3253  MemoryContextAlloc
2292      2.1225  grouping_planner
2044      1.8928  SearchCatCache
1956      1.8113  core_yylex
1763      1.6326  expression_tree_walker
1347      1.2474  MemoryContextCreate
1340      1.2409  check_stack_depth
1276      1.1816  GetCachedPlan
1175      1.0881  AllocSetFree
1106      1.0242  GetSnapshotData
1106      1.0242  _SPI_execute_plan
1101      1.0196  extract_query_dependencies_walker

I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt. Now I get an average runtime of 1 ms, a full 2% faster, which is a bit astonishing, particularly because the oprofile results haven't moved much:

107642  83.7427  /home/tgl/testversion/bin/postgres
14677   11.4183  /lib64/libc-2.14.90.so
3180     2.4740  /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %        symbol name
10038     9.3537  AllocSetAlloc
6392      5.9562  MemoryContextAllocZeroAligned
5763      5.3701  base_yyparse
4810      4.4821  copyObject
2268      2.1134  grouping_planner
2178      2.0295  core_yylex
1963      1.8292  palloc
1867      1.7397  SearchCatCache
1835      1.7099  expression_tree_walker
1551      1.4453  check_stack_depth
1374      1.2803  _SPI_execute_plan
1282      1.1946  MemoryContextCreate
1187      1.1061  AllocSetFree
...
653       0.6085  palloc0
...
552       0.5144  MemoryContextAlloc

The number of calls of AllocSetAlloc certainly hasn't changed at all, so how did that get faster? I notice that the postgres executable is about 0.2% smaller, presumably because a whole lot of inlined fetches of CurrentMemoryContext are gone. This makes me wonder if my result is due to chance improvements of cache line alignment for inner loops. I would like to know if other people get comparable results on other hardware (non-Intel hardware would be especially interesting). If this result holds up across a range of platforms, I'll withdraw my objection to making palloc a plain function. regards, tom lane

Sorry for the delay, I only read this thread today.
I just tried Pavel's test on a POWER5 machine with an older version of gcc (see the grebe buildfarm animal for details):

78a5e738e: 37874.855 (average of 6 runs)
78a5e738 + palloc.h + mcxt.c: 38076.8035

The functions do seem to slightly slow things down on POWER. I haven't bothered to run oprofile or tprof to get a breakdown of the functions since Andres has already removed this from his patch. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v4
On 13-01-14 08:38 PM, Andres Freund wrote: Hi everyone, Here is the newest version of logical changeset generation. 2) Currently the logical replication infrastructure assigns a 'slot-id' when a new replica is set up. That slot id isn't really nice (e.g. id-321578-3). It also requires that [18] keeps state in a global variable to make writing regression tests easy. I think it would be better to make the user specify those replication slot ids, but I am not really sure about it.

Shortly after trying out the latest version I hit the following scenario:
1. I started pg_receivellog but mistyped the name of my plugin
2. It looped and used up all of my logical replication slots

I killed pg_receivellog and restarted it with the correct plugin name but it won't do anything because I have no free slots. I can't free the slots with -F because I have no clue what the names of the slots are. I can figure the names out by looking in pg_llog, but my replication program can't do that, so it won't be able to clean up from a failed attempt.

I agree with you that we should make the user program specify a slot; we eventually might want to provide a view that shows the currently allocated slots. For a logical-based slony I would just generate the slot name based on the remote node id. If walsender generates the slot name then I would need to store a mapping between slot names and slons so when a slon restarted it would know which slot to resume using. I'd have to use a table in the slony schema on the remote database for this. There would always be a risk of losing track of a slot id if the slon crashed after getting the slot number but before committing the mapping on the remote database.

3) Currently no options can be passed to an output plugin. I am thinking about making INIT_LOGICAL_REPLICATION 'plugin' accept the now widely used ('option' ['value'], ...) syntax and pass that to the output plugin's initialization function.

I think we discussed this last CF, I like this idea.

4) Does anybody object to:
-- allocate a permanent replication slot
INIT_LOGICAL_REPLICATION 'plugin' 'slotname' (options);
-- stream data
START_LOGICAL_REPLICATION 'slotname' 'recptr';
-- deallocate a permanent replication slot
FREE_LOGICAL_REPLICATION 'slotname';

+1

5) Currently it's only allowed to access catalog tables; it's fairly trivial to extend this to additional tables if you can accept some (noticeable but not too big) overhead for modifications on those tables. I was thinking of making that an option for tables, that would be useful for replication solutions' configuration tables.

I think this will make the life of anyone developing a new replication system easier. Slony has a lot of infrastructure for allowing slonik scripts to wait for configuration changes to propagate everywhere before making other configuration changes because you can get race conditions. If I were designing a new replication system and I had this feature then I would try to use it to come up with a simpler model of propagating configuration changes.

Andres Freund Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
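To make item 4 concrete, the walsender command sequence a slon (or any other consumer) might issue would look roughly like this; the plugin name, slot name and option below are invented, and the exact grammar is still what is being discussed above:

  -- allocate a permanent replication slot, naming the output plugin and its options
  INIT_LOGICAL_REPLICATION 'slony_output' 'slon_node_2' ('cluster' 'mycluster');

  -- stream changes, resuming from a previously confirmed WAL position
  START_LOGICAL_REPLICATION 'slon_node_2' '0/17F5D58';

  -- deallocate the slot when the subscriber is dropped
  FREE_LOGICAL_REPLICATION 'slon_node_2';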
[HACKERS] AIX buildfarm member
The only animal in the buildfarm running AIX is grebe (http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=grebe&br=HEAD). It is likely that the server running this animal will go away sometime this year and the machine replacing it isn't running AIX. If someone else in the community is running PostgreSQL on AIX then it would be good if they set up a buildfarm member, perhaps with a more recent version of AIX. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding exported base snapshot
On 12-12-12 06:20 AM, Andres Freund wrote: Possible solutions: 1) INIT_LOGICAL_REPLICATION waits for an answer from the client that confirms that logical replication initialization is finished. Before that the walsender connection cannot be used for anything else. 2) we remove the snapshot as soon as any other command is received; this way the replication connection stays usable, e.g. to issue a START_LOGICAL_REPLICATION in parallel to the initial data dump. In that case the snapshot would have to be imported *before* the next command was received, as SET TRANSACTION SNAPSHOT requires the source transaction to be still open.

Option 2 sounds more flexible. Is it more difficult to implement?

No, I don't think so. It's a bit more intrusive in that it requires knowledge about logical replication in more parts of walsender, but it should be ok. Note btw, that my description of 1) was easy to misunderstand. The "that" in "Before that the walsender connection cannot be used for anything else" is the answer from the client, not the usage of the exported snapshot. Once the snapshot has been imported into other session(s) the source doesn't need to be alive anymore. Does that explanation change anything?

I think I understood you were saying the walsender connection can't be used for anything else (ie streaming WAL) until the exported snapshot has been imported. I think your clarification is still consistent with this? With option 2 I can still get the option 1 behaviour by not sending the next command to the walsender until I am done importing the snapshot. However if I want to start processing WAL before the snapshot has been imported I can do that with option 2. I am not sure I would need to do that, I'm just saying having the option is more flexible.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
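For what it's worth, the control flow I have in mind for option 2 would look roughly like this (the snapshot identifier and the pg_dump switch are only placeholders for whatever INIT_LOGICAL_REPLICATION ends up returning and supporting):

  -- walsender connection
  INIT_LOGICAL_REPLICATION 'my_plugin';
  -- ... returns an initial LSN, a slot identifier and a snapshot id such as 0333-1

  -- ordinary connection(s), before the walsender gets its next command
  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  SET TRANSACTION SNAPSHOT '0333-1';
  -- ... dump the initial data here (or run pg_dump --snapshot 0333-1) ...
  COMMIT;

  -- walsender connection, once the snapshot has been imported elsewhere
  START_LOGICAL_REPLICATION 'slotname' 'initial-lsn';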
Re: [HACKERS] Logical decoding exported base snapshot
On 12-12-11 06:52 PM, Andres Freund wrote: Hi, Problem 1: One problem I see is that while exporting a snapshot solves the visibility issues of the table's contents it does not protect against schema changes. I am not sure whether that's a problem. If somebody runs a CLUSTER or something like that, the table's contents will be preserved including MVCC semantics. That's fine. The more problematic cases I see are TRUNCATE, DROP and ALTER TABLE. Imagine the following:

S1: INIT_LOGICAL_REPLICATION
S1: get snapshot: 0333-1
S2: ALTER TABLE foo ALTER COLUMN blub TYPE text USING (blub::text);
S3: pg_dump --snapshot 0333-1
S1: START_LOGICAL_REPLICATION

In that case the pg_dump would dump foo using the schema *after* the ALTER TABLE but showing only rows visible to our snapshot. After START_LOGICAL_REPLICATION all changes after the xlog position from INIT_LOGICAL_REPLICATION will be returned though - including all the tuples from the ALTER TABLE and potentially - if some form of schema capturing was in place - the ALTER TABLE itself. The copied schema would have the new format already though. Does anybody see that as a problem or is it just a case of PEBKAC? One argument for the latter is that that's already a problematic case for normal pg_dumps. It's just that the window is a bit larger here.

Is there any way to detect this situation as part of the pg_dump? If I could detect this, abort my pg_dump and then go and get a new snapshot, then I don't see this as a big deal. I can live with telling users "don't do DDL-like things while subscribing a new node; if you do, the subscription will restart". I am less keen on telling users "don't do DDL-like things while subscribing a new node or the results will be unpredictable".

I'm trying to convince myself whether I will be able to take a pg_dump from an exported snapshot plus the changes made in between the snapshot id and some later time and turn the results into a consistent database. What if something like this comes along:

INIT REPLICATION
insert into foo (id,bar) values (1,2);
alter table foo drop column bar;
pg_dump --snapshot

The schema I get as part of the pg_dump won't have bar because it has been dropped, even though it will have the rows that existed with bar. I then go to process the INSERT statement. It will have a WAL record with column data for bar and the logical replication replay will look up the catalog rows from before the alter table, so it will generate a logical INSERT record with bar. That will fail on the replica. I'm worried that it will be difficult to programmatically stitch together the inconsistent snapshot from the pg_dump plus the logical records generated in between the snapshot and the dump (along with any record of the DDL if it exists).

Problem 2: Control Flow. To be able to do a SET TRANSACTION SNAPSHOT the source transaction needs to be alive. That's currently solved by exporting the snapshot in the walsender connection that did the INIT_LOGICAL_REPLICATION. The question is how long should we preserve that snapshot? There is no requirement - and there *cannot* be one in the general case, the slot needs to be usable after a restart - that START_LOGICAL_REPLICATION has to be run in the same replication connection as INIT. Possible solutions: 1) INIT_LOGICAL_REPLICATION waits for an answer from the client that confirms that logical replication initialization is finished. Before that the walsender connection cannot be used for anything else.
2) we remove the snapshot as soon as any other command is received; this way the replication connection stays usable, e.g. to issue a START_LOGICAL_REPLICATION in parallel to the initial data dump. In that case the snapshot would have to be imported *before* the next command was received, as SET TRANSACTION SNAPSHOT requires the source transaction to be still open. Opinions?

Option 2 sounds more flexible. Is it more difficult to implement?

Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 11/14] Introduce wal decoding via catalog timetravel
On 12-12-03 07:42 AM, Andres Freund wrote: Two things: 1) Which exact options are you using for pg_receivellog? Not -d replication by any chance?

Yes, that is exactly what I'm doing. Using a real database name instead makes this go away. Thanks.

This seems to produce exactly that kind of error message. I added some error checking around that. Pushed. Thanks! Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 11/14] Introduce wal decoding via catalog timetravel
On 12-12-03 09:48 AM, Andres Freund wrote: On 2012-12-03 09:35:55 -0500, Steve Singer wrote: On 12-12-03 07:42 AM, Andres Freund wrote: Yes, that is exactly what I'm doing. Using a real database name instead makes this go away. Was using replication an accident or do you think it makes sense in some way?

The 'replication' line in pg_hba.conf made me think that the database name had to be replication for walsender connections.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
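For anyone else who trips over the same thing: in pg_hba.conf the word replication is a keyword that matches walsender/streaming-replication connections, it is not a database name. A setup that allows both kinds of connections therefore needs separate lines, roughly like this (the user and address values are made up):

  # matches physical replication (walsender) connections
  host    replication    repuser    192.168.1.0/24    md5
  # matches ordinary connections to the database used for logical decoding
  host    mydb           repuser    192.168.1.0/24    md5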
Re: [HACKERS] [PATCH 11/14] Introduce wal decoding via catalog timetravel
On 12-11-14 08:17 PM, Andres Freund wrote: I am getting errors like the following when I try to use either your test_decoding plugin or my own (which does even less than yours):

LOG: database system is ready to accept connections
LOG: autovacuum launcher started
WARNING: connecting to
WARNING: Initiating logical rep
LOG: computed new xmin: 773
LOG: start reading from 0/17F5D58, scrolled back to 0/17F4000
LOG: got new xmin 773 at 25124280
LOG: found initial snapshot (via running xacts). Done: 1
WARNING: reached consistent point, stopping!
WARNING: Starting logical replication
LOG: start reading from 0/17F5D58, scrolled back to 0/17F4000
LOG: found initial snapshot (via running xacts). Done: 1
FATAL: cannot read pg_class without having selected a database
TRAP: FailedAssertion(!(SHMQueueEmpty((MyProc->myProcLocks[i]))), File: proc.c, Line: 759)

This seems to be happening under the calls at reorderbuffer.c:832: if (!SnapBuildHasCatalogChanges(NULL, xid, change->relnode))

The sequence of events I do is:
1. start pg_receivellog
2. run a checkpoint
3. Attach to the walsender process with gdb
4. Start a new client connection with psql and do 'INSERT INTO a values (1)' twice.
(skipping step 3 doesn't make a difference)

This introduces several things:
* 'reorderbuffer' module which reassembles transactions from a stream of interspersed changes
* 'snapbuilder' which builds catalog snapshots so that tuples from wal can be understood
* logging more data into wal to facilitate logical decoding
* wal decoding into a reorderbuffer
* shared library output plugins with 5 callbacks: init, begin, change, commit
* walsender infrastructure to stream out changes and to keep the global xmin low enough
* INIT_LOGICAL_REPLICATION $plugin; waits till a consistent snapshot is built and returns
  * initial LSN
  * replication slot identifier
  * id of a pg_export() style snapshot
* START_LOGICAL_REPLICATION $id $lsn; streams out changes
* uses named output plugins for output specification

Todo:
* testing infrastructure (isolationtester)
* persistence/spilling to disk of built snapshots, longrunning transactions
* user docs
* more frequent lowering of xmins
* more docs about the internals
* support for user declared catalog tables
* actual exporting of initial pg_export snapshots after INIT_LOGICAL_REPLICATION
* own shared memory segment instead of piggybacking on walsender's
* nicer interface between snapbuild.c, reorderbuffer.c, decode.c and the outside.
* more frequent xl_running_xid's so xmin can be upped more frequently
* add STOP_LOGICAL_REPLICATION $id

---
 src/backend/access/heap/heapam.c                |  280 +-
 src/backend/access/transam/xlog.c               |    1 +
 src/backend/catalog/index.c                     |   74 ++
 src/backend/replication/Makefile                |    2 +
 src/backend/replication/logical/Makefile        |   19 +
 src/backend/replication/logical/decode.c        |  496 ++
 src/backend/replication/logical/logicalfuncs.c  |  247 +
 src/backend/replication/logical/reorderbuffer.c | 1156 +++
 src/backend/replication/logical/snapbuild.c     | 1144 ++
 src/backend/replication/repl_gram.y             |   32 +-
 src/backend/replication/repl_scanner.l          |    2 +
 src/backend/replication/walsender.c             |  566 ++-
 src/backend/storage/ipc/procarray.c             |   23 +
 src/backend/storage/ipc/standby.c               |    8 +-
 src/backend/utils/cache/inval.c                 |    2 +-
 src/backend/utils/cache/relcache.c              |    3 +-
 src/backend/utils/misc/guc.c                    |   11 +
 src/backend/utils/time/tqual.c                  |  249 +
 src/bin/pg_controldata/pg_controldata.c         |    2 +
 src/include/access/heapam_xlog.h                |   23 +
 src/include/access/transam.h                    |    5 +
 src/include/access/xlog.h                       |    3 +-
 src/include/catalog/index.h                     |    4 +
 src/include/nodes/nodes.h                       |    2 +
 src/include/nodes/replnodes.h                   |   22 +
 src/include/replication/decode.h                |   21 +
 src/include/replication/logicalfuncs.h          |   44 +
 src/include/replication/output_plugin.h         |   76 ++
 src/include/replication/reorderbuffer.h         |  284 ++
 src/include/replication/snapbuild.h             |  128 +++
 src/include/replication/walsender.h             |    1 +
 src/include/replication/walsender_private.h     |   34 +-
 src/include/storage/itemptr.h                   |    3 +
 src/include/storage/sinval.h                    |    2 +
 src/include/utils/tqual.h                       |   31 +-
 35 files changed, 4966 insertions(+), 34 deletions(-)
 create mode 100644 src/backend/replication/logical/Makefile
 create mode 100644 src/backend/replication/logical/decode.c
Re: [HACKERS] logical changeset generation v3 - Source for Slony
First, you can add me to the list of people saying 'wow', I'm impressed. The approach I am taking to reviewing this is to try and answer the following questions:

1) How might a future version of slony be able to use logical replication as described by your patch and design documents, and what would that look like?
2) What functionality is missing from the patch set that would stop me from implementing or prototyping the above?

Connecting slon to the remote postgresql
========================================

Today the slony remote listener thread queries a bunch of events from sl_event for a batch of SYNC events. Then the remote helper thread queries data from sl_log_1 and sl_log_2. I see this changing: instead, the slony remote listener thread would connect to the remote system and get a logical replication stream.

1) Would slony connect as a normal client connection and call something like 'select pg_slony_process_xlog(...)' to get a bunch of logical replication change records to process, OR
2) Would slony connect as a replication connection, similar to how the pg_receivelog program does today, and then process the logical changeset outputs instead of writing them to a file (as pg_receivelog does)?

It seems that the second approach is what is encouraged. I think we would put a lot of the pg_receivelog functionality into slon and it would issue a command like INIT_LOGICAL_REPLICATION 'slony' to use the slony logical replication plugin. Slon would also have to provide feedback to the walsender about what it has processed so the origin database knows what catalog snapshots can be expired. Based on eyeballing pg_receivelog.c it seems that about half the code in the 700 line file is related to command line arguments etc, and the other half is related to looping over the copy out stream, sending feedback and other things that we would need to duplicate in slon.

pg_receivelog.c has this comment:

/*
 * We have to use postgres.h not postgres_fe.h here, because there's so much
 * backend-only stuff in the XLOG include files we need. But we need a
 * frontend-ish environment otherwise. Hence this ugly hack.
 */

This looks like more of a carryover from pg_receivexlog.c. From what I can tell we can eliminate the postgres.h include if we also eliminate the utils/datetime.h and utils/timestamp.h includes and instead add in:

#include "postgres_fe.h"
#define POSTGRES_EPOCH_JDATE 2451545
#define UNIX_EPOCH_JDATE 2440588
#define SECS_PER_DAY 86400
#define USECS_PER_SEC INT64CONST(1000000)
typedef int64 XLogRecPtr;
#define InvalidXLogRecPtr 0

If there is a better way of getting these defines someone should speak up. I recall that in the past slon actually did include postgres.h and it caused some issues (I think with MSVC win32 builds). Since pg_receivelog.c will be used as a starting point/sample for third parties to write client programs it would be better if it didn't encourage client programs to include postgres.h.

The Slony Output Plugin
=======================

Once we've modified slon to connect as a logical replication client we will need to write a slony plugin. As I understand the plugin API:

* A walsender is processing through WAL records; each time it sees a COMMIT WAL record it will call my plugin's .begin, .change (for each change in the transaction) and .commit callbacks.
* The plugin for a particular stream/replication client will see one transaction at a time passed to it in commit order. It won't see .change(t1) followed by .change(t2), followed by a second .change(t1).
The reorder buffer code hides me from all that complexity (yah).

From a slony point of view I think the output of the plugin will be rows, suitable to be passed to COPY IN, of the form: origin_id, table_namespace, table_name, command_type, cmd_updatencols, command_args. This is basically the Slony 2.2 sl_log format minus a few columns we no longer need (txid, actionseq). command_args is a postgresql text array of column=value pairs, ie [ {id=1},{name='steve'},{project='slony'} ].

I don't think our output plugin will be much more complicated than the test_decoding plugin. I suspect we will want to give it the ability to filter out non-replicated tables. We will also have to filter out change records that didn't originate on the local node and that aren't part of a cascaded subscription. Remember that in a two node cluster slony will have connections from A to B and from B to A even if user tables only flow one way. Data that is replicated from A into B will show up in the WAL stream for B.

Exactly how we do this filtering is an open question. I think the output plugin will at a minimum need to know:

a) What the slony node id is of the node it is running on. This is easy to figure out if the output plugin is able/allowed to query its database. Will this be possible? I would expect to be able to query the database as it exists now (at plugin invocation time), not as it existed in the past
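To illustrate the row format described above, a single decoded UPDATE might leave the output plugin as one COPY-ready row along these lines (the table, columns and values are invented, and the pair encoding follows the {column,value,column,value,...} style discussed later in this thread):

  origin_id | table_namespace | table_name | command_type | cmd_updatencols | command_args
  1         | public          | accounts   | U            | 1               | {id,42,balance,100.00}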
Re: [HACKERS] logical changeset generation v3 - Source for Slony
On 12-11-18 11:07 AM, Andres Freund wrote: Hi Steve! I think we should provide some glue code to do this, otherwise people will start replicating all the bugs I hacked into this... More seriously: I think we should have support code here, no user will want to learn the intricacies of feedback messages and such. Where that would live? No idea. libpglogicalrep.so?

I wholeheartedly agree.

It should also be cleaned up a fair bit before others copy it, should we not go for having some client side library. Imo the library could very roughly be something like:

state = SetupStreamingLLog(replication-slot, ...);
while ((message = StreamingLLogNextMessage(state)))
{
    write(outfd, message->data, message->length);
    if (received_100_messages)
    {
        fsync(outfd);
        StreamingLLogConfirm(message);
    }
}

Although I guess that's not good enough because StreamingLLogNextMessage would be blocking, but that shouldn't be too hard to work around.

How about we pass a timeout value to StreamingLLogNextMessage(..) where it returns if no data is available after the timeout, to give the caller a chance to do something else.

This is basically the Slony 2.2 sl_log format minus a few columns we no longer need (txid, actionseq). command_args is a postgresql text array of column=value pairs. Ie [ {id=1},{name='steve'},{project='slony'} ]. It seems to me that that makes escaping unnecessarily complicated, but given you already have all the code... ;)

When I look at the actual code/representation we picked it is closer to {column1,value1,column2,value2...}

I don't think our output plugin will be much more complicated than the test_decoding plugin. Good. That's the idea ;). Are you ok with the interface as it is now or would you like to change something?

I'm going to think about this some more and maybe try to write an example plugin before I can say anything with confidence.

Yes. We will also need something like that. If you remember the first prototype we sent to the list, it included the concept of an 'origin_node' in the wal record. I think you actually reviewed that one ;) That was exactly aimed at something like this... Since then my thoughts about how the origin_id looks like have changed a bit:

- origin id is internally still represented as an uint32/Oid
  - never visible outside of wal/system catalogs
- externally visible it gets
  - assigned an uuid
  - optionally assigned a user defined name
- user settable (permissions?) origin when executing sql:
  - SET change_origin_uuid = 'uuid';
  - SET change_origin_name = 'user-settable-name';
  - defaults to the local node
- decoding callbacks get passed the origin of a change
  - txn->{origin_uuid, origin_name, origin_internal?}
- the init decoding callback can setup an array of interesting origins, so the others don't even get the ReorderBuffer treatment

I have to thank the discussion on -hackers and a march through Prague with Marko here...

So would the uuid and optional name assignment be done in the output plugin or somewhere else? When/how does the uuid get generated and where do we store it so the same uuid gets returned when postgres restarts? Slony today stores all this type of stuff in user-level tables and user-level functions (because it has no other choice). What is the connection between these values and the 'slot-id' in your proposal for the init arguments? Does the slot-id need to be the external uuid of the other end or is there no direct connection?
Today slony allows us to replicate between two databases in the same postgresql cluster (I use this for testing all the time). Slony also allows for two different 'slony clusters' to be set up in the same database (or so I'm told, I don't think I have ever tried this myself). Plugin functions that let me query the local database and then return the uuid and origin_name would work in this model.

+1 on being able to mark the 'change origin' in a SET command when the replication process is pushing data into the replica.

Exactly how we do this filtering is an open question, I think the output plugin will at a minimum need to know: a) What the slony node id is of the node it is running on. This is easy to figure out if the output plugin is able/allowed to query its database. Will this be possible? I would expect to be able to query the database as it exists now (at plugin invocation time), not as it existed in the past when the WAL was generated. In addition to the node ID I can see us wanting to be able to query other slony tables (sl_table, sl_set etc...)

Hm. There is no fundamental reason not to allow normal database access to the current database but it won't be all that cheap, so doing it frequently is not a good idea. The reason it's not cheap is that you basically need to tear down the postgres internal caches if you switch the timestream in which you are working. Would go something like: TransactionContext =
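If the change origin does become settable per session as sketched above, the apply side of a cascaded or multi-node setup could tag everything it writes so that the decoding plugin on that node can filter it out again. A minimal sketch, assuming the hypothetical GUC names from the list above and a made-up node name:

  -- session used by the replication apply process, applying data that
  -- originated on the node named 'nodeB'
  BEGIN;
  SET LOCAL change_origin_name = 'nodeB';
  -- ... apply the replicated INSERT/UPDATE/DELETE statements here ...
  COMMIT;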
Re: [HACKERS] [PATCH 09/14] Adjust all *Satisfies routines to take a HeapTuple instead of a HeapTupleHeader
On 12-11-14 08:17 PM, Andres Freund wrote: For the regular satisfies routines this is needed in prepareation of logical decoding. I changed the non-regular ones for consistency as well. The naming between htup, tuple and similar is rather confused, I could not find any consistent naming anywhere. This is preparatory work for the logical decoding feature which needs to be able to get to a valid relfilenode when checking the visibility of a tuple.

I have taken a look at this patch. The patch does what it says: it changes a bunch of HeapTupleSatisfies routines to take a HeapTuple structure instead of a HeapTupleHeader. It also sets the HeapTuple.t_tableOid value before calling these routines. The patch does not modify these routines to actually do anything useful with the additional data fields, though it does add some assertions to make sure t_tableOid is set.

The patch compiles cleanly and the unit tests pass. This patch does not seem to depend on any of the other patches in this set and applies cleanly against master. The patch doesn't actually add any functionality; unless someone sees a reason for complaining about this that I don't see, then I think it can be committed.

Steve

---
 contrib/pgrowlocks/pgrowlocks.c      |  2 +-
 src/backend/access/heap/heapam.c     | 13 ++
 src/backend/access/heap/pruneheap.c  | 16 ++--
 src/backend/catalog/index.c          |  2 +-
 src/backend/commands/analyze.c       |  3 ++-
 src/backend/commands/cluster.c       |  2 +-
 src/backend/commands/vacuumlazy.c    |  3 ++-
 src/backend/storage/lmgr/predicate.c |  2 +-
 src/backend/utils/time/tqual.c       | 50 +---
 src/include/utils/snapshot.h         |  4 +--
 src/include/utils/tqual.h            | 20 +++
 11 files changed, 83 insertions(+), 34 deletions(-)
Re: [HACKERS] [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)
On 12-10-15 04:51 PM, Andres Freund wrote: Well, as a crosscheck, could you list your requirements? Do you need anything more than outputting data in a format compatible to what's stored in sl_log_*? You wouldn't have sl_actionseq, everything else should be there (well, you would need to do lookups to get the tableid, but that's not really much of a problem). The results would be ordered in complete transactions, in commit order. I guess the other tables would stay as they are as they contain the added value of slony? Greetings,

I actually had spent some time a few weeks ago looking over the documents and code. I never did get around to writing a review as elegant as Peter's. I have not seen any red flags that make me think that what you're proposing wouldn't be suitable for slony, but sometimes you don't see details until you start implementing something.

My initial approach to modifying slony to work with this might be something like:

* Leave sl_event as is for non-SYNC events; slon would still generate SYNC events in sl_event.

* We would modify the remote_worker thread in slon so that instead of selecting from sl_event it would get the next 'committed' transaction from your apply cache. For each ApplyChange record we would check to see if it is an insert into sl_event; if so we would trigger our existing event processing logic based on the contents of the ev_type column.

* If the change involves an insert/update/delete/truncate to a replicated table we would translate that change into SQL and apply it on the replica; we would not commit changes on the replica until we encounter a SYNC being added to sl_event for the current origin.

* SQL will be applied in a slightly different order than slony does today. Today if two concurrent transactions are inserting into the same replicated table and they commit one after the other, there is a good chance that the apply order on the replica will also be intermixed (assuming both commits were in between two SYNC events). My thinking is that we would just replay them one after the other on the replica in commit order. (Slony doesn't use commit order because we don't have it, not because we don't like it.) This would mean we do away with tracking the action id.

* If a node is configured as a 'forwarder' node it would store the processed output of each ApplyChange record in a table on the replica. If a slon is pulling data from a non-origin (ie if remoteWorkerThread_1 is pulling data from node 2) then it would need to query this table instead of calling the functions that process the ApplyCache contents.

* To subscribe a node we would generate a SYNC event on the provider and do the copy_set. We would keep track of that SYNC event. The remote worker would then ignore any data that comes before that SYNC event when it starts pulling data from the apply cache.

* DDL events in 2.2+ go into sl_ddl_script (or something like that); when we see INSERT commands to that table we would know to then apply the DDL on the node.

* We would need to continue to populate sl_confirm because knowing what SYNC events have already been processed by a node is pretty important in a MOVE SET or FAILOVER. It is possible that we might need to still track the xip lists of each SYNC for MOVE SET/FAILOVER but I'm not sure why/why not.

This is all easier said than implemented.

Steve

Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 8/8] Introduce wal decoding via catalog timetravel
On 12-10-11 06:27 PM, Josh Berkus wrote: On 10/10/12 7:26 PM, Bruce Momjian wrote: How does Slony write its changes without causing serialization replay conflicts? Since nobody from the Slony team answered this: a) Slony replicates *rows*, not *statements*

True, but the proposed logical replication also would replicate rows, not the original statements. I don't consider this proposal to be an example of 'statement' replication like pgpool is. If the original SQL was 'update foo set x=x+1 where id > 10' there will be a WAL record to decode for each row modified in the table. In a million row table I'd expect the replica will have to apply a million records (whether they be binary heap tuples or SQL statements).

b) Slony uses serializable mode internally for row replication

Actually, recent versions of slony apply transactions against the replica in read committed mode. Older versions used serializable mode, but with the SSI changes in 9.1 we found slony tended to have serialization conflicts with itself on the slony internal tables, resulting in a lot of aborted transactions. When slony applies changes on a replica table it does so in a single transaction. Slony finds a set of transactions that committed on the master in between two SYNC events. It then applies all of the rows changed by any of those transactions as part of a single transaction on the replica. Chris's post explains this in more detail. Conflicts with user transactions on the replica are possible.

c) it's possible (though difficult) for creative usage to get Slony into a deadlock situation

FWIW, I have always assumed that it is impossible (even theoretically) to have statement-based replication without some constraints on the statements you can run, or some replication failures. I think we should expect 9.3's logical replication out-the-gate to have some issues and impose constraints on users, and improve with time but never be perfect. The design Andres and Simon have advanced already eliminates a lot of the common failure cases (now(), random(), nextval()) suffered by pgPool and similar tools. But remember, this feature doesn't have to be *perfect*, it just has to be *better* than the alternatives.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
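To make the distinction concrete, here is roughly what that fan-out looks like (the table, operator and values are only illustrative, following the example above):

  -- executed once on the origin
  update foo set x = x + 1 where id > 10;

  -- what gets decoded from WAL and replayed on the replica is one change
  -- per modified row, conceptually something like:
  --   update foo set x = 8 where id = 11;
  --   update foo set x = 3 where id = 12;
  --   ... one record for every row the statement touched ...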
Re: [HACKERS] canceling autovacuum task woes
On 12-07-24 01:48 PM, Robert Haas wrote: I am running into a lot of customer situations where the customer reports that canceling autovacuum task shows up in the logs, and it's unclear whether this is happening often enough to matter, and even more unclear what's causing it.

Could autovacuum be compacting a lot of space at the end of the table? This is described in the thread http://archives.postgresql.org/message-id/4d8df88e.7080...@yahoo.com

Me: So, do you know what table it's getting cancelled on?
Customer: Nope.
Me: Are you running any DDL commands anywhere in the cluster?
Customer: Nope, absolutely none.
Me: Well you've got to be running something somewhere or it wouldn't be having a lock conflict.
Customer: OK, well I don't know of any. What should I do?

It would be awfully nice if the process that does the cancelling would provide the same kind of reporting that we do for a deadlock: the relevant lock tag, the PID of the process sending the cancel, and the query string. Personally, I'm starting to have a sneaky suspicion that there is something actually broken here - that is, that there are lock conflicts involved here other than the obvious one (SHARE UPDATE EXCLUSIVE on the table) that are allowing autovac to get cancelled more often than we realize. But whether that's true or not, the current logging is wholly inadequate. Thoughts? Anybody else have this problem?

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
On 12-06-21 04:37 AM, Andres Freund wrote: Hi Steve, Thanks! Attached is a detailed review of the patch. Very good analysis, thanks! Another reason why we cannot easily do 1) is that subtransactions aren't discernible from top-level transactions before the top-level commit happens; we can only properly merge in the right order (by sorting via lsn) once we have seen the commit record which includes a list of all committed subtransactions.

Based on that and your comments further down in your reply (and that no one spoke up and disagreed with you) it sounds like doing (1) isn't going to be practical.

I also don't think 1) would be particularly welcome by people trying to replicate into foreign systems. They could still sort the changes into transaction groups before applying to the foreign system.

I planned to have some cutoff 'max_changes_in_memory_per_txn' value. If it has been reached for one transaction all existing changes are spilled to disk. New changes again can be kept in memory till it's reached again.

Do you want max_changes_in_memory_per_txn or do you want to put a limit on the total amount of memory that the cache is able to use? How are you going to tell a DBA to tune max_changes_in_memory_per_txn? They know how much memory their system has and that they can devote to the apply cache versus other things; giving them guidance on estimating how many open transactions they might have at a point in time and how many WAL change records each transaction generates seems like a step backwards from the progress we've been making in getting Postgresql to be easier to tune. The maximum number of transactions that could be open at a time is governed by max_connections on the master at the time the WAL was generated, so I don't even see how the machine processing the WAL records could autotune/guess that.

We need to support serializing the cache for crash recovery + shutdown of the receiving side as well. Depending on how we do the wal decoding we will need it more frequently...

Have you described your thoughts on crash recovery on another thread? I am thinking that this module would have to serialize some state every time it calls cache->commit() to ensure that consumers don't get invoked twice on the same transaction. If the apply module is making changes to the same backend that the apply cache serializes to, then both the state for the apply cache and the changes that committed changes/transactions make will be persisted (or not persisted) together.

What if I am replicating from x86 to x86_64 via an apply module that does textout conversions?

  x86 --> proxy --> x86_64
  WAL --> apply cache (proxy catalog) --> apply module (textout) --> SQL statements

How do we ensure that the commits are all visible (or not visible) on the catalog on the proxy instance used for decoding WAL, the destination database, and the state + spill files of the apply cache stay consistent in the event of a crash of either the proxy or the target? I don't think you can (unless we consider two-phase commit, and I'd rather we didn't). Can we come up with a way of avoiding the need for them to be consistent with each other?

I think apply modules will need to be able to be passed the same transaction twice (once before the crash and again after) and come up with a way of deciding if that transaction has a) been applied to the translation/proxy catalog and b) been applied on the replica instance. How is the walreceiver going to decide which WAL segments it needs to re-process after a crash?
I would want to see more of these details worked out before we finalize the interface between the apply cache and the apply modules and how the serialization works.

Code Review
===========

applycache.h
------------

+typedef struct ApplyCacheTupleBuf
+{
+    /* position in preallocated list */
+    ilist_s_node node;
+
+    HeapTupleData tuple;
+    HeapTupleHeaderData header;
+    char data[MaxHeapTupleSize];
+} ApplyCacheTupleBuf;

Each ApplyCacheTupleBuf will be about 8k (BLCKSZ) big no matter how big the data in the transaction is? Wouldn't workloads with inserts of lots of small rows in a transaction eat up lots of memory that is allocated but stores nothing? The only alternative I can think of is dynamically allocating these and I don't know what the cost/benefit of that overhead will be versus spilling to disk sooner.

+ * FIXME: better name
+ */
+ApplyCacheChange*
+ApplyCacheGetChange(ApplyCache*);

How about:
ApplyCacheReserveChangeStruct(..)
ApplyCacheReserveChange(...)
ApplyCacheAllocateChange(...)
as ideas?

+/*
+ * Return an unused ApplyCacheChange struct
+ */
+void
+ApplyCacheReturnChange(ApplyCache*, ApplyCacheChange*);
Re: [HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
On 12-06-13 07:28 AM, Andres Freund wrote: From: Andres Freund and...@anarazel.de The individual changes need to be identified by an xid. The xid can be a subtransaction or a toplevel one; at commit those can be reintegrated by doing a k-way mergesort between the individual transactions. Callbacks for apply_begin, apply_change and apply_commit are provided to retrieve complete transactions. Missing: - spill-to-disk - correct subtransaction merge, current behaviour is simple/wrong - DDL handling (?) - resource usage controls

Here is an initial review of the ApplyCache patch.

This patch provides a module for taking actions in the WAL stream, grouping the actions by transaction, and then passing these change records to a set of plugin functions.

For each transaction it encounters it keeps a list of the actions in that transaction. The ilist included in an earlier patch is used; changes resulting from that patch review would affect the code here but not in a way that changes the design. When the module sees a commit for a transaction it calls the apply_change callback for each change.

I can think of three ways that a replication system like this could try to apply transactions:

1) Each time it sees a new transaction it could open up a new transaction on the replica and make that change. It leaves the transaction open and goes on applying the next change (which might be for the current transaction or might be for another one). When it comes across a commit record it would then commit the transaction. If 100 concurrent transactions were open on the origin then 100 concurrent transactions will be open on the replica.

2) Determine the commit order of the transactions, group all the changes for a particular transaction together and apply them in that order for the transaction that committed first, commit that transaction and then move on to the transaction that committed second.

3) Group the transactions in a way that you move the replica from one consistent snapshot to another. This is what Slony and Londiste do because they don't have the commit order or commit timestamps. Built-in replication can do better.

This patch implements option (2). If we had a way of implementing option (1) efficiently, would we be better off?

Option (2) requires us to put unparsed WAL data (HeapTuples) in the apply cache. You can't translate this to an independent LCR until you call the apply_change callback (which happens once the commit is encountered). The reason for this is because some of the changes might be DDL (or things generated by a DDL trigger) that will change the translation catalog, so you can't translate the HeapData to LCR's until you're at a stage where you can update the translation catalog. In both cases you might need to see later WAL records before you can convert an earlier one into an LCR (ie TOAST).

Some of my concerns with the apply cache are:

Big transactions (bulk loads, mass updates) will be cached in the apply cache until the commit comes along. One issue Slony has with bulk operations is that the replicas can't start processing the bulk INSERT until after it has committed. If it takes 10 hours to load the data on the master it will take another 10 hours (at best) to load the data into the replica (20 hours after you start the process). With binary streaming replication your replica is done processing the bulk update shortly after the master is.

Long running transactions can sit in the cache for a long time.
When you spill to disk we would want the long running but inactive ones spilled to disk first. This is solvable but adds to the complexity of this module, how were you planning on managing which items of the list get spilled to disk? The idea that we can safely reorder the commands into transactional groupings works (as far as I know) today because DDL commands get big heavy locks that are held until the end of the transaction. I think Robert mentioned earlier in the parent thread that maybe some of that will be changed one day. The downsides of (1) that I see are: We would want a single backend to keep open multiple transactions at once. How hard would that be to implement? Would subtransactions be good enough here? Applying (or even translating WAL to LCR's) the changes in parallel across transactions might complicate the catalog structure because each concurrent transaction might need its own version of the catalog (or can you depend on the locking at the master for this? I think you can today) With approach (1) changes that are part of a rolledback transaction would have more overhead because you would call apply_change on them. With approach (1) a later component could still group the LCR's by transaction before applying by running the LCR's through a data structure very similar to the ApplyCache. I think I need more convincing that approach (2), what this patch implements, is the best way doing
Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node
On 12-06-18 07:30 AM, Andres Freund wrote: Hrmpf #666. I will go through the series commit-by-commit again to make sure everything compiles again. Reordering this late definitely wasn't a good idea... I pushed a rebased version with all those fixups (and removal of the zeroRecPtr patch).

Where did you push that rebased version to? I don't see an attachment, or an updated patch in the commitfest app, and your repo at http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=summary hasn't been updated in 5 days.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node
On 12-06-18 11:50 AM, Andres Freund wrote: Hi Simon, I think we need to agree on the parameter name. It currently is 'multimaster_node_id'. In the discussion with Steve we got to replication_node_id. I don't particularly like either. Other suggestions?

Other things that come to mind (for naming this parameter in the postgresql.conf):
node_id
origin_node_id
local_node_id

I wished we had some flag bits available before as well. I find 256 nodes a pretty low value to start with; 4096 sounds better, so I would be happy with 4 flag bits. I think for cascading setups and such you want to add node ids for every node, not only masters... Any opinions from others on this?

256 sounds a bit low to me as well. Sometimes the use case of a retail chain comes up where people want each store to have a postgresql instance and replicate back to a central office. I can think of many chains with more than 256 stores.

Thanks, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node
On 12-06-13 01:27 PM, Andres Freund wrote: The previous mail contained a patch with a mismerge caused by reordering commits. Corrected version attached. Thanks to Steve Singer for noticing this quickly.

Attached is a more complete review of this patch.

I agree that we will need to identify the node a change originated at. We will not only want this for multi-master support but it might also be very helpful once we introduce things like cascaded replicas. Using a 16 bit integer for this purpose makes sense to me.

This patch (with the previous numbered patches already applied) still doesn't compile:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -I../../../../src/include -D_GNU_SOURCE -c -o xact.o xact.c
xact.c: In function 'xact_redo_commit':
xact.c:4678: error: 'xl_xact_commit' has no member named 'origin_lsn'
make[4]: *** [xact.o] Error 1

Your complete patch set did compile. origin_lsn gets added as part of your 12th patch. Managing so many related patches is going to be a pain, but it beats one big patch. I don't think this patch actually requires the origin_lsn change.

Code Review
-----------

src/backend/utils/misc/guc.c

@@ -1598,6 +1600,16 @@ static struct config_int ConfigureNamesInt[] =
 },
 {
+    {"multimaster_node_id", PGC_POSTMASTER, REPLICATION_MASTER,
+        gettext_noop("node id for multimaster."),
+        NULL
+    },
+    &guc_replication_origin_id,
+    InvalidMultimasterNodeId, InvalidMultimasterNodeId, MaxMultimasterNodeId,
+    NULL, assign_replication_node_id, NULL
+},

I'd rather see us refer to this as the 'node id for logical replication' over the multimaster node id. I think that terminology will be less controversial. Multi-master means different things to different people and it is still unclear what forms of multi-master we will have in-core. For example, most people don't consider slony to be multi-master replication. If a future version of slony were to feed off logical replication (instead of triggers) then I think it would need this node id to determine which node a particular change has come from. The description inside the gettext call should probably be "Sets the node id for ..." to be consistent with the descriptions of the rest of the GUCs.

BootStrapXLOG in xlog.c creates a XLogRecord structure; should it set xl_origin_id to InvalidMultimasterNodeId? WriteEmptyXLOG in pg_resetxlog.c might also need to set xl_origin_id to a well defined value. I think InvalidMultimasterNodeId should be safe even for a no-op record from a node that actually has a node_id set on real records.

backend/replication/logical/logical.c:

XLogRecPtr current_replication_origin_lsn = {0, 0};

This variable isn't used/referenced by this patch; it probably belongs as part of the later patch.

Steve

Andres
Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
On 12-06-15 04:03 PM, Robert Haas wrote: On Thu, Jun 14, 2012 at 4:13 PM, Andres Freund and...@2ndquadrant.com wrote: I don't plan to throw in loads of conflict resolution smarts. The aim is to get to the place where all the infrastructure is there so that a MM solution can be built by basically plugging in a conflict resolution mechanism. Maybe providing a very simple one. I think without in-core support its really, really hard to build a sensible MM implementation. Which doesn't mean it has to live entirely in core. Of course, several people have already done it, perhaps most notably Bucardo. Anyway, it would be good to get opinions from more people here. I am sure I am not the only person with an opinion on the appropriateness of trying to build a multi-master replication solution in core or, indeed, the only person with an opinion on any of these other issues.

This sounds like a good place for me to chime in.

I feel that in-core support to capture changes and turn them into change records that can be replayed on other databases, without relying on triggers and log tables, would be good to have. I think we want something flexible enough that people can write consumers of the LCRs to do conflict resolution for multi-master, but I am not sure that the conflict resolution support actually belongs in core.

Most of the complexity of slony (both in terms of lines of code, and issues people encounter using it) comes not from the log triggers or replay of the logged data but from the configuration of the cluster. Controlling things like:

* Which tables replicate from a node to which other nodes
* How you change the cluster configuration on a running system (adding nodes, removing nodes, moving the origin of a table, adding tables to replication etc...)

This is the harder part of the problem. I think we need to first get the infrastructure committed (that the current patch set deals with) for capturing, transporting and translating the LCR's into the system before we get too caught up in the configuration aspects. I think we will have a hard time agreeing on behaviours for some of that other stuff that are both flexible enough for most use cases and simple enough for administrators. I'd like to see in-core support for a lot of that stuff but I'm not holding my breath.

It is not good for those other opinions to be saved for a later date.

Hm. Yes, you could do that. But I have to say I don't really see a point. Maybe the fact that I do envision multimaster systems at some point is clouding my judgement though as its far less easy in that case. Why? I don't think that particularly changes anything. It also complicates the wal format as you now need to specify whether you transport a full or a primary-key only tuple... Why? If the schemas are in sync, the target knows what the PK is perfectly well. If not, you're probably in trouble anyway.

I think though that we do not want to enforce that mode of operation for tightly coupled instances. For those I was thinking of using command triggers to synchronize the catalogs. One of the big screwups of the current replication solutions is exactly that you cannot sensibly do DDL, which is not a big problem if you have a huge system with loads of different databases and very knowledgeable people et al. but at the beginning it really sucks. I have no problem with making one of the nodes the schema master in that case.
Also I would like to avoid the overhead of the proxy instance for use-cases where you really want one node replicated as fully as possible with the slight exception of being able to have summing tables, different indexes et al. In my view, a logical replication solution is precisely one in which the catalogs don't need to be in sync. If the catalogs have to be in sync, it's not logical replication. ISTM that what you're talking about is sort of a hybrid between physical replication (pages) and logical replication (tuples) - you want to ship around raw binary tuple data, but not entire pages. The problem with that is it's going to be tough to make robust. Users could easily end up with answers that are total nonsense, or probably even crash the server. I see three catalogs in play here. 1. The catalog on the origin 2. The catalog on the proxy system (this is the catalog used to translate the WAL records to LCR's). The proxy system will need essentially the same pgsql binaries (same architecture, important complie flags etc..) as the origin 3. The catalog on the destination system(s). The catalog 2 must be in sync with catalog 1, catalog 3 shouldn't need to be in-sync with catalog 1. I think catalogs 2 and 3 are combined in the current patch set (though I haven't yet looked at the code closely). I think the performance optimizations Andres has implemented to update tuples through low-level functions should be left for later and that we should be generating SQL in the apply cache so we don't start
Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
On 12-06-13 07:27 AM, Andres Freund wrote: Its also available in the 'cabal-rebasing' branch on git.postgresql.org/users/andresfreund/postgres.git . That branch will modify history though. That branch has a merge error in f685a11ce43b9694cbe61ffa42e396c9fbc32b05 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -I../../../../src/include -D_GNU_SOURCE -c -o xact.o xact.c xact.c:4684: error: expected identifier or ‘(’ before ‘’ token xact.c:4690:46: warning: character constant too long for its type xact.c:4712:46: warning: character constant too long for its type xact.c:4719: error: expected identifier or ‘(’ before ‘’ token xact.c:4740:46: warning: character constant too long for its type make[4]: *** [xact.o] Error 1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On 12-01-17 05:38 AM, Fujii Masao wrote: On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao masao.fu...@gmail.com wrote: The amount of code changes to allow pg_basebackup to make a backup from the standby seems to be small. So I ended up merging those changes and the infrastructure patch. WIP patch attached. But I'd be happy to split the patch again if you want. Attached is the updated version of the patch. I wrote the limitations of standby-only backup in the document and changed the error messages.

Here is my review of this version of the patch. I think this patch has been in every CF for 9.2 and I feel it is getting close to being committed. The only issue of significance is a crash I encountered while testing, see below. I am fine with including the pg_basebackup changes in the patch; it also makes testing some of the other changes possible. The documentation updates you have are good. I don't see any issues looking at the code.

Testing Review

I encountered this on my first replica (the one based on the master). I am not sure if it is related to this patch; it happened after the pg_basebackup against the replica finished. TRAP: FailedAssertion(!(((xid) != ((TransactionId) 0))), File: twophase.c, Line: 1238) LOG: startup process (PID 1) was terminated by signal 6: Aborted LOG: terminating any other active server processes

A little earlier this postmaster had printed: LOG: restored log file 0001001F from archive LOG: restored log file 00010020 from archive cp: cannot stat `/usr/local/pgsql92git/archive/00010021': No such file or directory LOG: unexpected pageaddr 0/1900 in log file 0, segment 33, offset 0 cp: cannot stat `/usr/local/pgsql92git/archive/00010021': No such file or directory

I have NOT been able to replicate this error and I am not sure exactly what I had done in my testing prior to that point.

In another test run I had
- set full_page_writes=off and did a checkpoint
- started the pg_basebackup
- set full_page_writes=on and did a HUP + some database activity that might have forced a checkpoint.
I got this message from pg_basebackup: ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438 pg_basebackup: could not get WAL end position from server

I point this out because the message is different than the normal "could not initiate base backup: FATAL: WAL generated with full_page_writes=off" that I normally see. We might want to add a PQerrorMessage(conn) to pg_basebackup to print the error details. Since this patch didn't actually change pg_basebackup I don't think you're required to improve the error messages in it. I am just mentioning this because it came up in testing.

The rest of the tests I did involving changing full_page_writes with/without checkpoints and sighups and promoting the replica seemed to work as expected. Regards,
Re: [HACKERS] static or dynamic libpgport
On 11-12-09 11:13 AM, Andrew Dunstan wrote: Recently I attempted to build an external package (pg_bulkload) against the latest Fedora packages. Unfortunately this fails, as pgxs adds -lpgport to any link line for an executable, and the corresponding libpgport.a isn't there. And in fact, pg_bulkload does use some of the functionality there (e.g. pg_strncasecmp), so just stripping -lpgport out doesn't work either. This happened because Fedora packaging guidelines http://fedoraproject.org/wiki/Packaging:Guidelines#Packaging_Static_Libraries are strongly against shipping static libraries, and so all the PostgreSQL static libraries are excluded from the distribution (and I believe there are similar restrictions for RHEL). Of these libraries, I believe the only one that is *only* built as a static library is libpgport. Is there any good reason why we shouldn't build and install a dynamic libpgport.so?

+1

We've struggled with slony and pgport because so many users have had problems with pgport not being included in some distributions. It has some useful functions; I think recent versions of slony use it on win32 but not elsewhere. We have had at least one patch floating around that conditionally includes certain small behaviours in slony depending on whether pgport is available, based on a configure check.

What package would a shared libpgport be installed with? Slony requires a server + headers to build, but slon and slonik only have a runtime dependency on libpq (I don't know if anyone installs slon/slonik on a machine without a postgresql server, but you could).

Steve cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpython SPI cursors
On 11-11-23 01:58 PM, Jan Urbański wrote: On 20/11/11 19:14, Steve Singer wrote: On 11-10-15 07:28 PM, Jan Urbański wrote: Hi, attached is a patch implementing the usage of SPI cursors in PL/Python. Currently when trying to process a large table in PL/Python you have to slurp it all into memory (that's what plpy.execute does). J I found a few bugs (see my testing section below) that will need fixing + a few questions about the code

Responding now to all questions and attaching a revised patch based on your comments.

I've looked over the revised version of the patch and it now seems fine. Ready for committer.

Do we like the name plpy.cursor or would we rather call it something like plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...) Since we will be mostly stuck with the API once we release 9.2 this is worth some opinions on. I like cursor() but if anyone disagrees now is the time.

We use plpy.subtransaction() to create Subxact objects, so I thought plpy.cursor() would be most appropriate.

This patch does not provide a wrapper around SPI_cursor_move. The patch is useful without that and I don't see anything that precludes someone else adding that later if they see a need.

My idea is to add keyword arguments to plpy.cursor() that will allow you to decide whether you want a scrollable cursor and after that provide a move() method.

The patch includes documentation updates that describe the new feature. The Database Access page doesn't provide an API-style list of database access functions like the plperl http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page does. I think the organization of the perl page is clearer than the python one and we should think about doing some documentation refactoring. That should be done as a separate patch and shouldn't be a barrier to committing this one.

Yeah, the PL/Python docs are a bit chaotic right now. I haven't yet summoned force to overhaul them.

in PLy_cursor_plan line 4080 + PG_TRY(); + { + Portal portal; + char *volatile nulls; + volatile int j; I am probably not seeing a code path or misunderstanding something about the setjmp/longjmp usages but I don't see why nulls and j need to be volatile here.

It looked like you could drop volatile there (and in PLy_spi_execute_plan, where this is copied from (did I mention there's quite some code duplication in PL/Python?)) but digging in git I found this commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2789b7278c11785750dd9d2837856510ffc67000 that added the original volatile qualification, so I guess there's a reason.

line 444 PLy_cursor(PyObject *self, PyObject *args) + { + char *query; + PyObject *plan; + PyObject *planargs = NULL; + + if (PyArg_ParseTuple(args, "s", &query)) + return PLy_cursor_query(query); + Should query be freed with PyMem_free()?

No, PyArg_ParseTuple returns a string on the stack. I checked that repeatedly creating a cursor with a plan argument does not leak memory and that adding PyMem_Free there promptly leads to a segfault.

I tested both python 2.6 and 3 on a Linux system [test cases demonstrating bugs]

Turns out it's a really bad idea to store pointers to Portal structures, because they get invalidated by the subtransaction abort hooks. I switched to storing the cursor name and looking it up in the appropriate hash table every time it's used. The examples you sent (which I included as regression tests) now cause a ValueError to be raised with a message stating that the cursor has been created in an aborted subtransaction.
Not sure about the wording of the error message, though. Thanks again for the review! Cheers, Jan
Re: [HACKERS] plpython SPI cursors
On 11-10-15 07:28 PM, Jan Urbański wrote: Hi, attached is a patch implementing the usage of SPI cursors in PL/Python. Currently when trying to process a large table in PL/Python you have to slurp it all into memory (that's what plpy.execute does). J

I found a few bugs (see my testing section below) that will need fixing + a few questions about the code

Overview Feature Review --- This patch adds cursor support to plpython. SPI cursors will allow a plpython function to read and process a large result set without having to read it all into memory at once. This is a good thing. Without this patch I think you could accomplish the same thing using SQL DECLARE CURSOR and SQL FETCH. This feature allows you to use a python cursor as an iterator, resulting in much cleaner python code than the SQL FETCH approach. I think the feature is worth having.

Usability Review -- The patch adds the user methods cursor=plpy.cursor(query_or_plan) cursor.fetch(100) cursor.close() Do we like the name plpy.cursor or would we rather call it something like plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...) Since we will be mostly stuck with the API once we release 9.2 this is worth some opinions on. I like cursor() but if anyone disagrees now is the time. This patch does not provide a wrapper around SPI_cursor_move. The patch is useful without that and I don't see anything that precludes someone else adding that later if they see a need.

Documentation Review - The patch includes documentation updates that describe the new feature. The Database Access page doesn't provide an API-style list of database access functions like the plperl http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page does. I think the organization of the perl page is clearer than the python one and we should think about doing some documentation refactoring. That should be done as a separate patch and shouldn't be a barrier to committing this one.

Code Review in PLy_cursor_plan line 4080 + PG_TRY(); + { + Portal portal; + char *volatile nulls; + volatile int j; + + if (nargs > 0) + nulls = palloc(nargs * sizeof(char)); + else + nulls = NULL; + + for (j = 0; j < nargs; j++) + { + PyObject *elem; I am probably not seeing a code path or misunderstanding something about the setjmp/longjmp usages but I don't see why nulls and j need to be volatile here.

line 444 PLy_cursor(PyObject *self, PyObject *args) + { + char *query; + PyObject *plan; + PyObject *planargs = NULL; + + if (PyArg_ParseTuple(args, "s", &query)) + return PLy_cursor_query(query); + Should query be freed with PyMem_free()?

Testing --- I tested both python 2.6 and 3 on a Linux system

create or replace function x() returns text as $$
cur=None
try:
    with plpy.subtransaction():
        cur=plpy.cursor('select generate_series(1,1000)')
        rows=cur.fetch(10);
        plpy.execute('select f()')
except plpy.SPIError:
    rows=cur.fetch(10);
    return rows[0]['generate_series']
return 'none'
$$ language plpythonu;
select x();

crashes the backend test=# select x(); The connection to the server was lost.
Attempting reset: LOG: server process (PID 3166) was terminated by signal 11: Segmentation fault The below test gives me a strange error message: create or replace function x1() returns text as $$ plan=None try: with plpy.subtransaction(): plpy.execute('CREATE TEMP TABLE z AS select generate_series(1,1000)') plan=plpy.prepare('select * FROM z') plpy.execute('select * FROM does_not_exist') except plpy.SPIError, e: cur=plpy.cursor(plan) rows=cur.fetch(10) return rows[0]['generate_series'] return '1' $$ language plpythonu; select x1(); test=# select x1() test-# ; ERROR: TypeError: Expected sequence of 82187072 arguments, got 0: NULL CONTEXT: Traceback (most recent call last): PL/Python function x1, line 9, in module cur=plpy.cursor(plan) PL/Python function x1 STATEMENT: select x1() I was expecting an error from the function just a bit more useful one. Performance --- I did not do any specific performance testing but I don't see this patch as having any impact to performance -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
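To make the proposed API concrete, here is a minimal sketch of how it reads in practice, assuming the plpy.cursor()/fetch()/close() methods described in the review above; the table name big_table is made up for illustration and is not part of the patch:

create or replace function count_big_rows() returns bigint as $$
total = 0
cur = plpy.cursor('select id from big_table')
while True:
    rows = cur.fetch(100)          # pull 100 rows at a time instead of slurping everything
    if not rows:
        break
    total += len(rows)
cur.close()
return total
$$ language plpythonu;

The point of the loop is that only one 100-row batch is ever held in memory, unlike plpy.execute(), which materializes the whole result set.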
Re: [HACKERS] Online base backup from the hot-standby
On 11-10-31 12:11 AM, Jun Ishiduka wrote: Agreed. I'll extract FPW stuff from the patch that I submitted, and revise it as the infrastructure patch. The changes of pg_start_backup() etc that Ishiduka-san did are also a server-side infrastructure. I will extract them as another infrastructure one. Ishiduka-san, if you have time, feel free to try the above, barring objection. Done. Changed the name of the patch. Modifications So changed to the positioning of infrastructure, * Removed the documentation. * changed to an error when you run pg_start/stop_backup() on the standby.

Here is my stab at reviewing this version of the patch.

Submission --- The purpose of this version of the patch is to provide some infrastructure needed for backups from the slave without having to solve some of the usability issues raised in previous versions of the patch. This patch applied fine to earlier versions of head but it doesn't today. Simon moved some of the code touched by this patch as part of the xlog refactoring. Please post an updated/rebased version of the patch.

I think the purpose of this patch is to provide a) the code changes to record changes to the fpw state of the master in WAL, and b) tracking of the state of FPW while in recovery mode. This version of the patch is NOT intended to allow SQL calls to pg_start_backup() on slaves to work. This patch lays the infrastructure for another patch (which I haven't seen) to allow pg_basebackup to do a base backup from a slave assuming fpw=on has been set on the master (my understanding of this patch is that it puts into place all of the pieces required for the pg_basebackup patch to detect if fpw!=on and abort). The consensus upthread was to get this infrastructure in and figure out a safe+usable way of doing a slave backup without pg_basebackup later.

The patch seems to do what I expect of it. I don't see any issues with most of the code changes in this patch. However I admit that even after reviewing many versions of this patch I still am not familiar enough with the recovery code to comment on a lot of the details.

One thing I did see: In pg_ctl.c ! if (stat(recovery_file, &statbuf) != 0) ! print_msg(_("WARNING: online backup mode is active\n" ! "Shutdown will not complete until pg_stop_backup() is called.\n\n")); ! else ! print_msg(_("WARNING: online backup mode is active if you can connect as a superuser to server\n" ! "If so, shutdown will not complete until pg_stop_backup() is called.\n\n"));

I am having difficulty understanding what this error message is trying to tell me. I think it is telling me (based on the code comments) that if I can't connect to the server because the server is not yet accepting connections then I shouldn't worry about anything. However if the server is accepting connections then I need to login and call pg_stop_backup(). Maybe: "WARNING: online backup mode is active. If your server is accepting connections then you must connect as superuser and run pg_stop_backup() before shutdown will complete."

I will wait on attempting to test the patch until you have sent a version that applies against the current HEAD. Regards. Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp
Re: [HACKERS] pg_dump 9.1.1 hanging (collectSecLabels gets 0 labels)
On 11-11-09 06:35 PM, Tom Lane wrote: Steve Singerssin...@ca.afilias.info writes: I've tracked the issue down to collectSecLabels in pg_dump.c SELECT label, provider, classoid, objoid, objsbid FROM pg_catalog.pg_seclabel; returns 0 rows. The code in collectSecLabels() is not prepared to deal with a zero row result and tries to malloc 0 bytes. pg_seclabel is almost always empty, so I'm not convinced that you've identified your problem correctly. regards, tom lane The attached patch seems to fix the issue. The man page for malloc on AIX is pretty clear on what happens when you try to malloc 0 bytes. It returns NULL. diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index fce9d3b..9e31767 100644 *** a/src/bin/pg_dump/pg_dump.c --- b/src/bin/pg_dump/pg_dump.c *** findSecLabels(Archive *fout, Oid classoi *** 11760,11766 /* Get security labels if we didn't already */ if (nlabels 0) nlabels = collectSecLabels(fout, labels); ! /* * Do binary search to find some item matching the object. */ --- 11760,11770 /* Get security labels if we didn't already */ if (nlabels 0) nlabels = collectSecLabels(fout, labels); ! if (nlabels == 0) ! { ! *items=NULL; ! return 0; ! } /* * Do binary search to find some item matching the object. */ *** collectSecLabels(Archive *fout, SecLabel *** 11858,11875 i_objsubid = PQfnumber(res, objsubid); ntups = PQntuples(res); ! ! labels = (SecLabelItem *) malloc(ntups * sizeof(SecLabelItem)); ! ! for (i = 0; i ntups; i++) { ! labels[i].label = PQgetvalue(res, i, i_label); ! labels[i].provider = PQgetvalue(res, i, i_provider); ! labels[i].classoid = atooid(PQgetvalue(res, i, i_classoid)); ! labels[i].objoid = atooid(PQgetvalue(res, i, i_objoid)); ! labels[i].objsubid = atoi(PQgetvalue(res, i, i_objsubid)); } /* Do NOT free the PGresult since we are keeping pointers into it */ destroyPQExpBuffer(query); --- 11862,11889 i_objsubid = PQfnumber(res, objsubid); ntups = PQntuples(res); ! if ( ntups == 0) { ! labels = NULL; } + else + { + labels = (SecLabelItem *) malloc(ntups * sizeof(SecLabelItem)); + if (labels == NULL ) + { + write_msg(NULL, out of memory); + exit(1); + } + for (i = 0; i ntups; i++) + { + labels[i].label = PQgetvalue(res, i, i_label); + labels[i].provider = PQgetvalue(res, i, i_provider); + labels[i].classoid = atooid(PQgetvalue(res, i, i_classoid)); + labels[i].objoid = atooid(PQgetvalue(res, i, i_objoid)); + labels[i].objsubid = atoi(PQgetvalue(res, i, i_objsubid)); + } + } /* Do NOT free the PGresult since we are keeping pointers into it */ destroyPQExpBuffer(query); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump 9.1.1 hanging (collectSecLabels gets 0 labels)
On 11-11-10 02:00 PM, Tom Lane wrote: Steve Singer ssin...@ca.afilias.info writes: The man page for malloc on AIX is pretty clear on what happens when you try to malloc 0 bytes. It returns NULL. Yes, that's a pretty common behavior for malloc(0). It should not cause a problem here AFAICS. ... Oh, I see, the problem is that labels[-1] might not compare to labels[0] the way we want. I think only the first hunk of your patch is actually necessary. regards, tom lane

Yes, the problem is still fixed if I only apply the first hunk. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_dump 9.1.1 hanging (collectSecLabels gets 0 labels)
We have a cluster running 9.1.1 where pg_dump hangs when we try to dump a database inside of the cluster. The server is running AIX. I can see this on a clean cluster where we do an initdb, followed by a createdb, and try running pg_dump. I've tracked the issue down to collectSecLabels in pg_dump.c. SELECT label, provider, classoid, objoid, objsubid FROM pg_catalog.pg_seclabel; returns 0 rows. The code in collectSecLabels() is not prepared to deal with a zero row result and tries to malloc 0 bytes. I am not yet sure if the problem is that my pg_seclabel is empty or if the issue is in collectSecLabels() or if collectSecLabels shouldn't even be called. Has anyone seen something similar? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
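A minimal standalone sketch of the hazard described above, assuming AIX-style malloc(0) semantics; the variable names are illustrative and this is not the pg_dump code itself:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    int     ntups = 0;          /* e.g. PQntuples() for an empty pg_seclabel */
    char  **labels;

    /*
     * On AIX (and any conforming libc), malloc(0) may return NULL, so a
     * zero-row result has to be handled explicitly rather than treated as
     * an out-of-memory failure.
     */
    if (ntups == 0)
        labels = NULL;
    else if ((labels = malloc(ntups * sizeof(char *))) == NULL)
    {
        fprintf(stderr, "out of memory\n");
        return 1;
    }

    printf("collected %d security labels\n", ntups);
    free(labels);               /* free(NULL) is defined to be a no-op */
    return 0;
}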
Re: [HACKERS] Online base backup from the hot-standby
On 11-10-25 02:44 AM, Heikki Linnakangas wrote: With pg_basebackup, we have a fighting chance of getting this right, because we have more control over how the backup is made. For example, we can co-operate with the buffer manager to avoid torn-pages, eliminating the need for full_page_writes=on, and we can include a control file with the correct end-of-backup location automatically, without requiring user intervention. pg_basebackup is less flexible than the pg_start/stop_backup method, and unfortunately you're more likely to need the flexibility in a more complicated setup with a hot standby server and all, but making the generic pg_start/stop_backup method work seems infeasible at the moment. Would pg_basebackup be able to work with the buffer manager on the slave to avoid full_page_writes=on needing to be set on the master? (the point of this is to be able to take the base backup without having the backup program contact the master). If so could pg_start_backup() not just put the buffer manager into the same state? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On 11-10-11 11:17 AM, Jun Ishiduka wrote: Done. Updated patch attached.

I have taken Jun's latest patch and applied it on top of Fujii's most recent patch. I did some testing with the result, but nothing thorough enough to stumble on any race conditions.

Some testing notes -- select pg_start_backup('x'); ERROR: full_page_writes on master is set invalid at least once since latest checkpoint

I think this error should be rewritten as ERROR: full_page_writes on master has been off at some point since latest checkpoint We should be using 'off' instead of 'invalid' since that is what the user sets it to.

I switched full_page_writes=on on the master and did a pg_start_backup() on slave1. Then I switched full_page_writes=off on the master and did a reload + checkpoint. I was able to then do my backup of slave1, copy the control file, and pg_stop_backup(). When I did the test slave2 started okay, but is this safe? Do we need a warning from pg_stop_backup() that is printed if it is detected that full_page_writes was turned off on the master during the backup period?

Code Notes - *** 6865,6870 --- 6871,6886 /* Pre-scan prepared transactions to find out the range of XIDs present */ oldestActiveXID = PrescanPreparedTransactions(NULL, NULL); + /* + * The startup updates FPW in shaerd-memory after REDO. However, it must + * perform before writing the WAL of the CHECKPOINT. The reason is that + * it uses a value of fpw in shared-memory when it writes a WAL of its + * CHECKPOTNT. + */ Minor typo above at 'CHECKPOTNT'

If my concern about full page writes being switched to off in the middle of a backup is unfounded then I think this patch is ready for a committer. They can clean up the two editorial changes when they apply the patches. If do_pg_stop_backup is going to need some logic to recheck the full page write status then an updated patch is required. Regards. Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp
Re: [HACKERS] Online base backup from the hot-standby
On 11-09-26 10:56 PM, Fujii Masao wrote: Looks weired. Though the WAL record starting from 0/6000298 was read successfully, then re-fetch of the same record fails at the end of recovery. One possible cause is the corruption of archived WAL file. What restore_command on the standby and archive_command on the master are you using? Could you confirm that there is no chance to overwrite archive WAL files in your environment? I tried to reproduce this problem several times, but I could not. Could you provide the test case which reproduces the problem? This is the test procedure I'm trying today, I wasn't able to reproduce the crash. What I was doing the other day was similar but I can't speak to unintentional differences. I have my master server data port=5439 wal_level=hot_standby archive_mode=on archive_command='cp -i %p /usr/local/pgsql92git/archive/%f' hot_standby=on I then run select pg_start_backup('foo'); $ rm -r ../data2 $ cp -r ../data ../data2 $ rm ../data2/postmaster.pid select pg_stop_backup(); I edit data2/postgresql.conf so port=5438 I commented out archive_mode and archive_command (or at least today I did) recovery.conf is standby_mode='on' primary_conninfo='host=127.0.0.1 port=5439 user=ssinger dbname=test' restore_command='cp /usr/local/pgsql92git/archive/%f %p' I then start up the second cluster. On it I run select pg_start_backup('1'); $ rm -r ../data3 $ rm -r ../archive2 $ cp -r ../data2 ../data3 $ cp ../data2/global/pg_control ../data3/global select pg_stop_backup(); I edit ../data2/postgresql.conf port=5437 archive_mode=on # (change requires restart) archive_command='cp -i %p /usr/local/pgsql92git/archive2/%f' recovery.conf is standby_mode='on' primary_conninfo='host=127.0.0.1 port=5439 user=ssinger dbname=test' restore_command='cp /usr/local/pgsql92git/archive/%f %p' trigger_file='/tmp/3' $ postgres -D ../data3 The first time I did this postgres came up quickly. $ touch /tmp/3 worked fine. I then stopped data3 $ rm -r ../data3 on data 2 I run pg_start_backup('1') $ cp -r ../data2 ../data3 $ cp ../data2/global/pg_control ../data3/global select pg_stop_backup() # on data2 $ rm ../data3/postmaster.pid vi ../data3/postgresql.conf # same changes as above for data3 vi ../data3/recovery.conf # same as above for data 3 postgres -D ../data3 This time I got ./postgres -D ../data3 LOG: database system was interrupted while in recovery at log time 2011-09-27 22:04:17 GMT HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. LOG: entering standby mode cp: cannot stat `/usr/local/pgsql92git/archive/0001000C': No such file or directory LOG: redo starts at 0/C20 LOG: record with incorrect prev-link 0/958 at 0/CB0 cp: cannot stat `/usr/local/pgsql92git/archive/0001000C': No such file or directory LOG: streaming replication successfully connected to primary FATAL: the database system is starting up FATAL: the database system is starting up LOG: consistent recovery state reached at 0/CE8 LOG: database system is ready to accept read only connections In order to get the database to come in read only mode I manually issued a checkpoint on the master (data) shortly after the checkpoint command the data3 instance went to read only mode. 
then touch /tmp/3 trigger file found: /tmp/3 FATAL: terminating walreceiver process due to administrator command cp: cannot stat `/usr/local/pgsql92git/archive/0001000C': No such file or directory LOG: record with incorrect prev-link 0/9000298 at 0/C0002F0 cp: cannot stat `/usr/local/pgsql92git/archive/0001000C': No such file or directory LOG: redo done at 0/C000298 cp: cannot stat `/usr/local/pgsql92git/archive/0001000C': No such file or directory cp: cannot stat `/usr/local/pgsql92git/archive/0002.history': No such file or directory LOG: selected new timeline ID: 2 cp: cannot stat `/usr/local/pgsql92git/archive/0001.history': No such file or directory LOG: archive recovery complete LOG: database system is ready to accept connections LOG: autovacuum launcher started It looks like data3 is still pulling files with the recovery command after it sees the touch file (is this expected behaviour?) $ grep archive ../data3/postgresql.conf #wal_level = minimal# minimal, archive, or hot_standby #archive_mode = off# allows archiving to be done archive_mode=on archive_command='cp -i %p /usr/local/pgsql92git/archive2/%f' I have NOT been able to make postgres crash during a recovery (today). It is *possible* that on some of my runs the other day I had skipped changing the archive command on data3 to write to archive2 instead of archive. I have also today not been able to get it to attempt to restore the same WAL file twice. If a base backup is in progress on a recovery database
Re: [HACKERS] Online base backup from the hot-standby
On 11-09-22 09:24 AM, Fujii Masao wrote: On Wed, Sep 21, 2011 at 11:50 AM, Fujii Masaomasao.fu...@gmail.com wrote: 2011/9/13 Jun Ishidukaishizuka@po.ntts.co.jp: Update patch. Changes: * set 'on' full_page_writes by user (in document) * read FROM: XX in backup_label (in xlog.c) * check status when pg_stop_backup is executed (in xlog.c) Thanks for updating the patch. Before reviewing the patch, to encourage people to comment and review the patch, I explain what this patch provides: Attached is the updated version of the patch. I refactored the code, fixed some bugs, added lots of source code comments, improved the document, but didn't change the basic design. Please check this patch, and let's use this patch as the base if you agree with that. I have looked at both Jun's patch from Sept 13 and Fujii's updates to the patch. I agree that Fujii's updated version should be used as the basis for changes going forward. My comments below refer to that version (unless otherwise noted). In backup.sgml the new section titled Making a Base Backup during Recovery I would prefer to see some mention in the title that this procedure is for standby servers ie Making a Base Backup from a Standby Database. Users who have setup a hot-standby database should be familiar with the 'standby' terminology. I agree that the during recovery description is technically correct but I'm not sure someone who is looking through the manual for instructions on making a base backup from here standby will realize this is the section they should read. Around line 969 where you give an example of copying the control file I would be a bit clearer that this is an example command. Ie (Copy the pg_control file from the cluster directory to the global sub-directory of the backup. For example cp $PGDATA/global/pg_control /mnt/server/backupdir/global) Testing Notes - I created a standby server from a base backup of another standby server. On this new standby server I then 1. Ran pg_start_backup('3'); and left the psql connection open 2. touch /tmp/3 -- my trigger_file ssinger@ssinger-laptop:/usr/local/pgsql92git/bin$ LOG: trigger file found: /tmp/3 FATAL: terminating walreceiver process due to administrator command LOG: restored log file 00010006 from archive LOG: record with zero length at 0/60002F0 LOG: restored log file 00010006 from archive LOG: redo done at 0/6000298 LOG: restored log file 00010006 from archive PANIC: record with zero length at 0/6000298 LOG: startup process (PID 19011) was terminated by signal 6: Aborted LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. The new postmaster (the one trying to be promoted) dies. This is somewhat repeatable. If a base backup is in progress on a recovery database and that recovery database is promoted to master, following the promotion (if you don't restart the postmaster). I see select pg_stop_backup(); ERROR: database system status mismatches between pg_start_backup() and pg_stop_backup() If you restart the postmaster this goes away. 
When the postmaster leaves recovery mode I think it should abort an existing base backup so pg_stop_backup() will say no backup in progress, or give an error message on pg_stop_backup() saying that the base backup won't be usable. The above error doesn't really tell the user why there is a mismatch. - In my testing a few times I got into a situation where a standby server coming from a recovery target took a while to finish recovery (this is on a database with no activity). Then when i tried promoting that server to master I got LOG: trigger file found: /tmp/3 FATAL: terminating walreceiver process due to administrator command LOG: restored log file 00010009 from archive LOG: restored log file 00010009 from archive LOG: redo done at 0/9E8 LOG: restored log file 00010009 from archive PANIC: unexpected pageaddr 0/600 in log file 0, segment 9, offset 0 LOG: startup process (PID 1804) was terminated by signal 6: Aborted LOG: terminating any other active server processes It is *possible* I mixed up the order of a step somewhere since my testing isn't script based. A standby server that 'looks' okay but can't actually be promoted is dangerous. This version of the patch (I was testing the Sept 22nd version) seems less stable than how I remember the version from the July CF. Maybe I'm just testing it harder or maybe something has been broken. In the current
Re: [HACKERS] postgesql-9.0.4 compile on AIX 6.1 using gcc 4.4.6
On 11-08-30 07:58 AM, Weiss, Wilfried wrote: Hello, I am just trying to compile postgresql-9.0.4 on AIX 6100-06-03-1048 using gcc 4.4.6. Unfortunately that was not all. There was also: [Bug target/46072] AIX linker chokes on debug info for uninitialized static variables This is an IBM bug in AIX's assembler (as) which causes corrupt object code that crashes when trying to execute it. As far as I know IBM has still not delivered a fix for this. It seems that they are not interested in this as IBM's xlc is not using the assembler to create object code. Does anyone know whether there is an alternate way to compile postgresql on AIX 6.1 using gcc??? I appreciate even the smallest hint!

I have compiled 9.0.4 on AIX 5.3 with GCC 4.1.1 without any issues. (Well, the regression tests hit an issue on REL9_0_STABLE builds that they don't hit with more recent branches, but that is due to a makefile related issue that I should post about in a different thread.) The buildfarm member grebe (http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=grebe&br=HEAD) does this. I do not have access to an AIX 6.1 machine.

Regards WW http://www.pilkington.com/nsg/disclaimer.htm
[HACKERS] skip WAL on COPY patch
The attached patch adds an option to the COPY command to skip writing WAL when the following conditions are all met: 1) The table is empty (zero size on disk) 2) The copy command can obtain an access exclusive lock on the table with out blocking. 3) The WAL isn't needed for replication For example COPY a FROM '/tmp/a.txt' (SKIP_WAL); A non-default option to the copy command is required because the copy will block out any concurrent access to the table which would be undesirable in some cases and is different from the current behaviour. This can safely be done because if the transaction does not commit the empty version of the data files are still available. The COPY command already skips WAL if the table was created in the current transaction. There was a discussion on something similar before[1] but I didn't see any discussion of having it only obtain the lock if it can do so without waiting (nor could I find in the archives what happened to that patch). I'm not attached to the SKIP_WAL vs LOCK as the option 1- see http://archives.postgresql.org/pgsql-patches/2005-12/msg00206.php Steve diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index a73b022..3a0e521 100644 *** a/doc/src/sgml/ref/copy.sgml --- b/doc/src/sgml/ref/copy.sgml *** COPY { replaceable class=parameterta *** 42,47 --- 42,48 FORCE_QUOTE { ( replaceable class=parametercolumn/replaceable [, ...] ) | * } FORCE_NOT_NULL ( replaceable class=parametercolumn/replaceable [, ...] ) | ENCODING 'replaceable class=parameterencoding_name/replaceable' + SKIP_WAL /synopsis /refsynopsisdiv *** COPY { replaceable class=parameterta *** 293,298 --- 294,312 for more details. /para /listitem + /varlistentry + varlistentry + termliteralSKIP_WAL//term + listitem + para + Specifies that the writing of WAL should be skipped if possible. + WAL can be skipped if the table being copied into is empty and + if an exclusive lock can be obtained without waiting. If this + option is specified and WAL is skipped then the transaction will + hold an exclusive lock on the table being copied until the transaction + commits. 
+ /para + /listitem /varlistentry /variablelist diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 528a3a1..bd81a4b 100644 *** a/src/backend/commands/copy.c --- b/src/backend/commands/copy.c *** *** 29,34 --- 29,35 #include commands/defrem.h #include commands/trigger.h #include executor/executor.h + #include commands/tablecmds.h #include libpq/libpq.h #include libpq/pqformat.h #include mb/pg_wchar.h *** *** 37,42 --- 38,44 #include parser/parse_relation.h #include rewrite/rewriteHandler.h #include storage/fd.h + #include storage/lmgr.h #include tcop/tcopprot.h #include utils/acl.h #include utils/builtins.h *** typedef struct CopyStateData *** 120,125 --- 122,128 bool *force_quote_flags; /* per-column CSV FQ flags */ List *force_notnull; /* list of column names */ bool *force_notnull_flags; /* per-column CSV FNN flags */ + bool skip_wal;/* skip WAL if able */ /* these are just for error messages, see CopyFromErrorCallback */ const char *cur_relname; /* table name for error messages */ *** ProcessCopyOptions(CopyState cstate, *** 965,970 --- 968,978 errmsg(argument to option \%s\ must be a valid encoding name, defel-defname))); } + else if (strcmp(defel-defname,skip_wal) == 0) + { + + cstate-skip_wal=true; + } else ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), *** CopyFrom(CopyState cstate) *** 1910,1915 --- 1918,1957 if (!XLogIsNeeded()) hi_options |= HEAP_INSERT_SKIP_WAL; } + + /* + * if SKIP_WAL was requested we try to avoid writing + * WAL if the table is 0 bytes on disk (empty) and + * that we can obtain an exclusive lock on it without blocking. + * + */ + if(cstate-skip_wal !XLogIsNeeded() + ConditionalLockRelationOid(cstate-rel-rd_id,AccessExclusiveLock)) + { + + Datum size = DirectFunctionCall2(pg_relation_size, + ObjectIdGetDatum(cstate-rel-rd_id), + PointerGetDatum(cstring_to_text(main))); + if ( DatumGetInt64(size)==0) + { + /** + * The relation is empty + unused. + * truncate it so that if this transaction + * rollsback then the changes to the relation files + * will dissapear (the current relation files will + * remain untouched) + */ + truncate_relation(cstate-rel); + hi_options |= HEAP_INSERT_SKIP_FSM; + hi_options |= HEAP_INSERT_SKIP_WAL; + } + else + { + UnlockRelation(cstate-rel,AccessExclusiveLock); + } + + } + /* * We need a ResultRelInfo so we can use
Re: [HACKERS] skip WAL on COPY patch
On 11-08-23 04:17 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: What I think would be really interesting is a way to make this work when the table *isn't* empty. In other words, have a COPY option that (1) takes an exclusive lock on the table, (2) writes the data being inserted into new pages beyond the old EOF, and (3) arranges for crash recovery or transaction abort to truncate the table back to its previous length. Then you could do fast bulk loads even into a table that's already populated, so long as you don't mind that the table will be exclusive-locked and freespace within existing heap pages won't be reused. What are you going to do with the table's indexes? regards, tom lane

What about not updating the indexes during the copy operation and then doing an automatic rebuild of the indexes after the copy (but during the same transaction)? If you're only adding a few rows to a large table this wouldn't be what you want, but if you're only adding a few rows then a small amount of WAL isn't a big concern either. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On 11-08-16 02:09 AM, Jun Ishiduka wrote: Thanks. This has the following two problems. * pg_start_backup() must set 'on' to full_page_writes of the master that is actual writing of the WAL, but not the standby. Is there any way to tell from the WAL segments if they contain the full page data? If so could you verify this on the second slave when it is brought up? Or can you track this on the first slave and produce an error in either pg_start_backup or pg_stop_backup() I see in xlog.h XLR_BKP_REMOVABLE, the comment above it says that this flag is used to indicate that the archiver can compress the full page blocks to non-full page blocks. I am not familiar with where in the code this actually happens but will this cause issues if the first standby is processing WAL files from the archive? * The standby doesn't need to connect to the master that's actual writing WAL. (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2) I'm worried how I should clear these problems. Regards. Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] walprotocol.h vs frontends
On 11-08-15 10:00 AM, Peter Geoghegan wrote: Without commenting on what should be done in your specific case, I wonder whether it's time to fully retire the deprecated double representation of timestamps. Is anyone actually expected to rely on their availability when 9.2 is released? This also caused difficulties for Johann Oskarsson recently, during work on PL/Java.

This would mean that anyone using the floating point timestamps today won't be able to use pg_upgrade to upgrade to whichever version we remove them from. 8.3 had float based timestamps as the default and I suspect many installations with the default 8.3 settings have been upgraded via pg_upgrade to a 9.0 built with the old timestamp representation. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] walprotocol.h vs frontends
On 11-08-15 12:33 PM, Peter Geoghegan wrote: On 15 August 2011 16:56, Steve Singer ssin...@ca.afilias.info wrote: This would mean that anyone using the floating point timestamps today won't be able to use pg_upgrade to upgrade to whichever version we remove them from. 8.3 had float based timestamps as the default and I suspect many installations with the default 8.3 settings have been upgraded via pg_upgrade to a 9.0 built with the old timestamp representation. Really? I find that slightly surprising, considering that a quick look at master's timestamp.c suggests that the choice to use the int64 representation over the double representation is entirely a case of compile time either/or. There is no apparent fall-back to the double representation available to binaries built without --disable-integer-datetimes.

I *think* the default on 8.3 was float based timestamps. If you want to upgrade a system running 8.3 (that uses float based timestamps) using pg_upgrade, you must compile 9.0 (or 8.4 or 9.1) with --disable-integer-datetimes. If at some point in the future you then want to upgrade to 9.2 with pg_upgrade you will again need to build 9.2 with --disable-integer-datetimes. If we remove the code for floating point representations of datetime then you won't be able to do that. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
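For illustration, the build step being described would look roughly like this (the install prefix is just an example) when preparing a 9.x tree to pg_upgrade an 8.3 cluster that was initialized with float timestamps:

./configure --disable-integer-datetimes --prefix=/usr/local/pgsql90fp
make
make install
# pg_upgrade checks that the old and new clusters agree on the datetime
# storage format and refuses to run if they do not, which is why the new
# binaries must be built with the same setting as the old cluster.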
Re: [HACKERS] Online base backup from the hot-standby
On 11-07-07 09:22 PM, Jun Ishiduka wrote: As you proposed, adding new field which stores the backup end location taken from minRecoveryPoint, into pg_control sounds good idea. Update patch. Here is a review of the updated patch This version of the patch adds a field into pg_controldata that tries to store the source of the base backup while in recovery mode. I think your ultimate goal with this patch is to be able to take a backup of a running hot-standby slave and recover it as another instance. This patch seems to provide the ability to have the second slave stop recovery at minRecoveryPoint from the control file. My understanding of the procedure you want to get to to take base backups off a slave is 1. execute pg_start_backup('x') on the slave (*) 2. take a backup of the data dir 3. call pg_stop_backup() on the slave 4. Copy the control file on the slave This patch only addresses the recovery portions. * - I think your goal is to be able to run pg_start_backup() on the slave, the patch so far doesn't do this. If your goal was require this to be run on the master, then correct me. Code Review --- A few comments on the code *** postgresql/src/include/catalog/pg_control.h 2011-06-30 10:04:48.0 +0900 --- postgresql_with_patch/src/include/catalog/pg_control.h 2011-07-07 18:23:56.0 +0900 *** *** 142,147 --- 142,157 XLogRecPtr backupStartPoint; /* + * Postgres keeps where to take a backup server. + * + * backupserver is none , master or slave, its default is none. + * When postgres starts and it is none, it is updated to either master + * or slave with minRecoveryPoint of the backup server. + * When postgres reaches backup end location, it is updated to none. + */ + int backupserver; + + /* * Parameter settings that determine if the WAL can be used for archival * or hot standby. */ I don't think the above comment is very clear on what backupserver is. Perhaps /** * backupserver is used while postgresql is in recovery mode to * store the location of where the backup comes from. * When Postgres starts recovery operations * it is set to none. During recovery it is updated to either master, or slave * When recovery operations finish it is updated back to none **/ Also shouldn't backupServer be the enum type of 'BackupServer' not int? Other enums in the structure such as DBState are defined this way. Testing Review -- Since I can't yet call pg_start_backup or pg_stop_backup() on the slave I am calling them on the master. (I also did some testing where I didn't put the system into backup mode). I admit that I am not sure what to look for as an indication that the system isn't recovering to the correct point. In much of my testing I was just verifying that the slave started and my data 'looked' okay. I seem to get this warning in my logs when I start up the instance based on the slave backup. LOG: 0: database system was interrupted while in recovery at log time 2011-07-08 18:40:20 EDT HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target I'm wondering if this warning is a bit misleading to users because it is an expected message when starting up an instance based on a slave backup (because the slave was already in recovery mode). If I shutdown this instance and start it up again I keep getting the warning. My understanding of your patch is that there shouldn't be any risk of corruption in that case (assuming your patch has no bugs). 
Can/should we be suppressing this message when we detect that we are recovering from a slave backup?

The direction of the patch has changed a bit during this commit fest. I think it would be good to provide an update on the rest of the changes you plan for this to be a complete usable feature. That would make it easier to comment on something you missed versus something you're planning on dealing with in the next stage. Steve Regards. Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp
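For what it's worth, a sketch of the standby-backup procedure as I read the current proposal (the port and paths are examples only, matching the test setups used earlier in these threads):

psql -p 5438 -c "SELECT pg_start_backup('label')"     # on the standby
cp -r $PGDATA /mnt/server/backupdir                    # copy the data directory
rm /mnt/server/backupdir/postmaster.pid
psql -p 5438 -c "SELECT pg_stop_backup()"
# copy the control file last, so its minimum recovery point covers the copy
cp $PGDATA/global/pg_control /mnt/server/backupdir/global

Copying pg_control after pg_stop_backup() is the step that replaces the end-of-backup location a backup from the master would normally get.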
Re: [HACKERS] libpq SSL with non-blocking sockets
On 11-06-28 02:14 PM, Martin Pihlak wrote: Thanks for the review! I have since simplified the patch to assume that partial SSL writes are disabled -- according to SSL_write(3) this is the default behaviour. Now the SSL retry buffer only holds the data to be retried, the remainder is moved to the new outBuffer. That sounds okay. Does it make sense to add in a check to verify that SSL didn't send a partial write? I don't know how bad openssl is about changing default behaviours or if we are concerned about protecting against someone changing the SSL parameters. My inclination is that this isn't needed but I'll raise the issue. Fixed. New version of the patch attached. Otherwise this version of the patch looks good to me. The only testing I have done is running the test program you sent earlier on in the thread and verified that the regression tests all pass. Other than something like your test program I'm not sure how else this bug can be induced. Since the original patch was submitted as a WIP patch and this version wasn't sent until well into the commit fest I am not sure if it qualifies for a committer during this commitfest or if it needs to wait until the next one. regards, Martin
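To make the retry requirement concrete, here is a standalone sketch (not libpq code) of why the pending data has to be kept around: with OpenSSL's defaults (partial writes disabled, moving write buffers not allowed), a retry after SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE must pass the same buffer and length as the call that failed.

#include <openssl/ssl.h>

/* Returns the number of bytes written, or -1 on a hard error. */
static int
ssl_write_all(SSL *ssl, const char *buf, int len)
{
    for (;;)
    {
        int n = SSL_write(ssl, buf, len);

        if (n > 0)
            return n;       /* with partial writes disabled this is all of len */

        switch (SSL_get_error(ssl, n))
        {
            case SSL_ERROR_WANT_READ:
            case SSL_ERROR_WANT_WRITE:
                /*
                 * A non-blocking caller would return to its event loop here
                 * and later retry with the SAME buf and len; for brevity
                 * this sketch just loops.
                 */
                continue;
            default:
                return -1;
        }
    }
}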
Re: [HACKERS] libpq SSL with non-blocking sockets
On 11-06-28 02:14 PM, Martin Pihlak wrote: Hmm, I thought I thought about that. There was a check in the original patch: if (conn-sslRetryBytes || (conn-outCount - remaining) 0) So if the SSL retry buffer was emptied it would return 1 if there was something left in the regular output buffer. Or did I miss something there? The issue I saw in the original patch was that at that point in the function, sslRetryBytes could be zero (if the data was sent) but conn-outCount - remaining would be an incorrect value since remaining is the number of bytes left to send from sslRetryBuf NOT conn-outBuffer. This is no longer an issue in the updated patch. I will try to take a closer look at your updated patch in the next few days. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On 11-06-28 01:52 AM, Jun Ishiduka wrote: Considering everything that has been discussed on this thread so far. Do you still think your patch is the best way to accomplish base backups from standby servers? If not what changes do you think should be made? I reconsider the way to not use pg_stop_backup(). Process of online base backup on standby server: 1. pg_start_backup('x'); 2. copy the data directory 3. copy *pg_control* Behavior while restore: * read Minimum recovery ending location of the copied pg_control. * use the value with the same purposes as the end-of-backup location. - When the value is equal to 0/0, this behavior do not do. This situation is to acquire backup from master server. The behaviour you describe above sounds okay to me, if someone else sees issues with this then they should speak up (ideally before you go off and write a new patch) I'm going to consolidate my other comments below so this can act as a more complete review. Usability Review - We would like to be able to perform base backups from slaves without having to call pg_start_backup() on the master. We can not currently do this. The patch tries to do this. From a useability point of view it would be nice if this could be done both manually with pg_start_backup() and with pg_basebackup. The main issue I have with the first patch you submitted is that it does not work for cases where you don't want to call pg_basebackup or you don't want to include the wal segments in the output of pg_basebackup. There are a number of these use-cases (examples include the wal is already available on an archive server, or you want to use filesystem/disk array level snapshots instead of tar) . I feel your above proposal to copy the control file as the last step in the basebackup and the get the minRecoveryEnding location from this solves these issues. It would be nice if things 'worked' when calling pg_basebackup against the slave (maybe by having perform_base_backup() resend the control file after it has sent the log?). Feature test Performance review - Skipped since a new patch is coming Coding Review -- I didn't look too closely at the code since a new patch that might change a lot of the code. I did like how you added comments to most of the larger code blocks that you added. Architecture Review --- There were some concerns with your original approach but using the control file was suggested by Heikki and seems sound to me. I'm marking this 'waiting for author' , if you don't think you'll be able to get a reworked patch out during this commitfest then you should move it to 'returned with feedback' Steve Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On 11-06-24 12:41 AM, Jun Ishiduka wrote: The logic that not use pg_stop_backup() would be difficult, because pg_stop_backup() is used to identify minRecoveryPoint. Considering everything that has been discussed on this thread so far. Do you still think your patch is the best way to accomplish base backups from standby servers? If not what changes do you think should be made? Steve Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq SSL with non-blocking sockets
On 11-06-15 03:20 PM, Martin Pihlak wrote: Yes, that sounds like a good idea -- especially considering that COPY is not the only operation that can cause SSL_write retries. This is of course still work in progress, needs cleaning up and definitely more testing. But at this point before going any further, I'd really appreciate a quick review from resident libpq gurus.

Martin, I'm not a libpq guru but I've given your patch a look over. The idea of storing the original buffer in new members of the connection structure for later re-use seems like a good way to approach this. A few things I noticed (that you might be aware of since you mentioned it needs cleanup):

- The patch doesn't compile with non-SSL builds; the debug at the bottom of pqSendSome isn't in an #ifdef.

- I don't think you're handling the return code properly. Consider this case:
pqSendSome(some data) -> sslRetryBuf = some data, returns 1
pqSendSome(more data) -> it sends all of 'some data', returns 0
I think 1 should be returned because all of 'more data' still needs to be sent. I think returning a 0 will break PQsetnonblocking if you call it when there is data in both sslRetryBuf and outputBuffer. We might even want to try sending the data in outputBuffer after we've sent all the data sitting in sslRetryBuf.

If you close the connection with an outstanding sslRetryBuf you need to free it.

Other than running your test program I didn't do any testing with this patch. Steve regards, Martin
Re: [HACKERS] Online base backup from the hot-standby
On 11-06-23 02:41 AM, Jun Ishiduka wrote: Having received this mail, I notice my understanding of what Heikki is proposing was wrong. My understanding: Before: * I thought Heikki was proposing to execute the SQL (pg_start_backup('x'); copy the data directory; pg_stop_backup();) from the standby server against the primary server. - I disliked this. Now: * Heikki is proposing both not using pg_basebackup and not specifying -x. So he thinks: * Use the output of pg_stop_backup(). * Don't use the backup history file. Right? What I think he is proposing would not require using pg_stop_backup(), but you could also modify pg_stop_backup() to work as well. What do you think of this idea? Do you see how the patch can be reworked to accomplish this? Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for 9.2: enhanced errors
On 11-06-20 03:44 PM, Pavel Stehule wrote: Hello You need to update config.sgml at the same time you update this format. You need to append a ',' after application name but before constraintName. As it stands the CSV log has something like: .nbtinsert.c:433,psqla_pkey,public,a,a fixed The CSV log seems fine now. nbtinsert.c: pg_get_indrelation is named differently than everything else in this file (i.e. _bt...). My guess is that this function belongs somewhere else, but I don't know the code well enough to say where you should move it to. I renamed this function to IndexRelationGetParentRelation and moved it to relcache.c Thanks, it looks less out of place there than it did in nbtinsert.c I don't call quote_identifier on plain data error properties like table_name or schema_name (but I am open to arguments for it or against it). quote_identifier is used for column names, because there can be more than one name and a comma is used between the names - and this is consistent with pg_get_indexdef_columns. Regards Okay. Pavel Stehule I'm going to mark this as ready for a committer.
Re: [HACKERS] Online base backup from the hot-standby
On 11-06-14 02:52 AM, Jun Ishiduka wrote: I still think that's headed in the wrong direction. (http://archives.postgresql.org/pgsql-hackers/2011-05/msg01405.php) Please check these mails, and explain why they are headed in the wrong direction. (http://archives.postgresql.org/pgsql-hackers/2011-06/msg00209.php) (http://archives.postgresql.org/pgsql-hackers/2011-05/msg01566.php) Jun, I've been reviewing these threads as a start to reviewing your patch (I haven't yet looked at the patch). I *think* the concern is that: 1) Today you can do a backup by just calling pg_start_backup('x'), copying the data directory and calling pg_stop_backup(); you do not need to use pg_basebackup to create a backup. The solution you are proposing would require pg_basebackup to be used to build backups from standby servers. 2) If I run pg_basebackup but do not specify '-x' then no pg_xlog segments are included in the output; the relevant pg_xlog segments are in the archive from the master. I can see situations where you are already copying the archive to the remote site that the new standby will be created in, so you don't want to have to copy the pg_xlog segments twice over your network. What Heikki is proposing will work both when you aren't using pg_basebackup (as long as the output of pg_stop_backup() is somehow captured in a way that it can be read) and will also work with pg_basebackup when '-x' isn't specified. Steve Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
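For reference, the manual procedure referred to in point (1) looks roughly like this today (the label and the copy method are arbitrary examples, not anything mandated):

-- on the master, in any session
SELECT pg_start_backup('nightly');
-- outside SQL: copy the data directory with tar, rsync, or a filesystem snapshot
SELECT pg_stop_backup();
-- the WAL needed to make the copy consistent is then available from the archive,
-- so the copy itself does not have to include pg_xlog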
Re: [HACKERS] patch for 9.2: enhanced errors
On Sun, 19 Jun 2011, Pavel Stehule wrote: Maybe there is a second issue (a small one - performance - you have to call an output function), but I agree that this information is very interesting and can help. I am concerned about the performance impact of doing that. Not all constraints are on int4 columns. Some constraints might be on a geometry type that is megabytes in size, taking a substantial chunk of CPU and bandwidth to convert into a text representation and then send back to the client. I am open to any ideas in this direction. Regards Pavel best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for 9.2: enhanced errors
On 11-06-08 04:14 PM, Pavel Stehule wrote: Hello Attached patch implements a new erros's fields that describes table, colums related to error. This enhanced info is limited to constraints and RI. Here is my review of this patch Submission Review: The patch applies cleanly against master The patch does not include any documentation updates (see note below to update config.sgml) The patch does not include any unit tests. At a minimum it should add a few tests with verbosity set to verbose Usability Review The patch adds the ability to get more information about the reasons a query failed. Pavel indicates that this is a building block for a later patch. This sounds like a worthwhile goal, without this patch I don't see another good way of getting the details on what columns make up the constraint that fails, other than making additional queries into the catalog. This patch should not impact pg_dump or pg_upgrade. Pavel has submitted a related patch that adds support for this feature to plpgsql, in theory other pl's might want to use the information this patch exposes. Feature Test --- The error messages behave as described with \set verbosity verbose. I tried this using both the 8.3 and 9.0 versions of psql (against a postgresql server with this patch) and things worked as expected (the extra error messages did not show). I also tried the patched psql against an 8.3 backend and verified that we don't get strange behaviour going against an older backend with verbosity set. I tried adding multiple constraints to a table and inserting a row that violates them, only one of the constraints showed up in the error message, this is fine and consistent with the existing behaviour Consider this example of an error that gets generated ERROR: 23505: duplicate key value violates unique constraint A_pkey DETAIL: Key (a)=(1) already exists. LOCATION: _bt_check_unique, nbtinsert.c:433 CONSTRAINT: A_pkey SCHEMA: public TABLE: A COLUMN: a STATEMENT: insert into A values (1); I think that both the CONSTRAINT, and TABLE name should be double quoted like A_pkey is above. When doing this make sure you don't break the quoting in the CSV format log. Performance Review - I don't see this patch impacting performance, I did not conduct any performance tests. Coding Review - In tupdesc.c line 202 the existing code is performing a deep copy of ConstrCheck. Do you need to copy nkeys and conkey here as well? Then at line 250 ccname is freed but not conkey postgres_ext.h line 55 + #define PG_DIAG_SCHEMA_NAME's' + #define PG_DIAG_TABLE_NAME't' + #define PG_DIAG_COLUMN_NAMES'c' + #define PG_DIAG_CONSTRAINT_NAME 'n' The assignment of letters to parameters seems arbitrary to me, I don't have a different non-arbitrary mapping in mind but if anyone else does they should speak up. I think it will be difficult to change this after 9.2 goes out. elog.c: *** *** 2197,2202 --- 2299,2319 if (application_name) appendCSVLiteral(buf, application_name); + /* constraint_name */ + appendCSVLiteral(buf, edata-constraint_name); + appendStringInfoChar(buf, ','); + + /* schema name */ + appendCSVLiteral(buf, edata-schema_name); + appendStringInfoChar(buf, ','); You need to update config.sgml at the same time you update this format. You need to append a , after application name but before constraintName. As it stands the CSV log has something like: .nbtinsert.c:433,psqla_pkey,public,a,a nbtinsert.c pg_get_indrelation is named differently than everything else in this file (ie _bt...). 
My guess is that this function belongs somewhere else, but I don't know the code well enough to say where you should move it to. Everything I've mentioned above is a minor issue; I will move the patch to 'waiting for author' and wait for you to release an updated patch. Steve Singer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for 9.2: enhanced errors
On 11-06-18 06:36 PM, Steve Singer wrote: On 11-06-08 04:14 PM, Pavel Stehule wrote: Here is my review of this patch Submission Review: The patch applies cleanly against master. The patch does not include any documentation updates (see note below to update config.sgml). The patch does not include any unit tests; at a minimum it should add a few tests with verbosity set to verbose. On second thought, tests might not work. Is the only way to get the constraint details in verbose mode, where line numbers from the C file are also included? If so then this won't work for the regression tests. Having the diff comparison fail every time someone makes an unrelated change to a source file isn't what we want. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_listener in 9.0
On 11-06-01 09:30 AM, Christopher Browne wrote: On Wed, Jun 1, 2011 at 8:29 AM, Dave Page dp...@pgadmin.org wrote: On Wed, Jun 1, 2011 at 12:27 PM, Andrew Dunstan and...@dunslane.net wrote: The whole point of the revamp was that pg_listener was a major performance bottleneck and needed to go, and without it being gone we would not have got notification payloads. Yeah, I know why it was replaced. That doesn't mean we cannot provide an alternative interface to the same info though (other things might of course). I suspect you're pretty much out of luck. Not me - our users. Note that in Slony 2.1, there's a table called sl_components, which is used to capture the state of the various database connections, checking in as the various threads do their various actions. Also, slon and slonik try to report their respective application, so it can be reported in pg_stat_activity. Slony 2.1 also sets application_name. If this were a big deal for pgAdmin we could consider backporting the application_name change to 2.0.x for users running against 9.0. Slony also has a table called sl_nodelock that each slon process adds a row to on startup. This includes the backend pid for one of the connections. Slony 1.2, 2.0 and 2.1 all use sl_nodelock. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink crash on PPC
On 11-05-27 12:35 PM, Tom Lane wrote: grebe, which is also a PPC64 machine, isn't showing the bug. And I just failed to reproduce the problem on a RHEL6 PPC64 box. About to go try it on RHEL5, which has a gcc version much closer to what wombat says it's using, but I'm not very hopeful about that. I think the more likely thing to be keeping in mind is that Gentoo is a platform with poor quality control. regards, tom lane As another data point, the dblink regression tests work fine for me on a PPC32 debian (squeeze,gcc 4.4.5) based system. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] branching for 9.2devel
On 11-04-25 03:40 PM, Robert Haas wrote: At the risk of getting a bit cranky, you haven't participated in a material way in any CommitFest we've had in well over a year. AFAICS, the first, last, and only time you are listed in the CommitFest application is as co-reviewer of a patch in July 2009, which means that the last time you really had a major roll in this process was during the 8.4 cycle. So I'm really rather suspicious that you know what's wrong with the process and how to fix it better than the people who are involved currently. I think we need here is more input from the people who are regularly submitting and reviewing patches, and those who have tried recently but been turned off by some aspect of the process. I reviewed a handful of patches during commitfests during the 9.1 release. I think a commitfest lasting one week will make me review fewer patches not more.At the start of a commitfest I would volunteer to review a single patch knowing that it wouldn't be hard to find 4-6 hours to review the patch during the next few weeks. Once I was done with the first patch when I thought I'd have sometime in the next few days to review another patch I would pick another one off the list. At the start of the commitfest I wasn't concerned about picking up a patch because I knew I had lots of time to get to it. Near the end of the last commitfest I wasn't concerned about picking up an unassigned patch because there were so many patches people picked up earlier on in the commitfest waiting for review that I didn't think I'd get pressured on a patch I had only picked up a day or two ago. If I knew I was expected to turn a review around in a short window I think I would only pick up a patch if I was 90% sure I'd have time to get to it during the week. I sometimes can spend $work time on reviewing patches but I can rarely block time off or schedule when that will be, and the same somewhat applies to the patch reviews I do on my own time (postgresql stuff comes after other commitments). It is easy to say four weeks in a row I won't have time to review one patch this week it is harder to say I won't have time to review a single patch in the next month I also should add that sometimes I'd review a patch and the only issue from the review might be is this really how we want the thing to work? and the commitfest app doesn't have a good state for this. If patch needs feedback or a decision from the community and it sometimes isn't clear when enough silence or +1's justify moving it to a committer or bouncing the patch to be reworked. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] JDBC connections to 9.1
I'm getting JDBC exceptions when I try to connect to 9.1 (master) with the postgresql-9.0-801.jdbc3.jar I don't have this issue with 9.0. There is nothing obvious at http://jdbc.postgresql.org or in the 9.1 alpha release notes that indicate a newer JDBC driver will be required. Have other people seen similar issues? If 9.1 does require a JDBC driver upgrade then it would be good if an updated driver was posted before we start encouraging people to test their applications with the beta. Caused by: org.postgresql.util.PSQLException: Protocol error. Session setup failed. at org.postgresql.core.v3.ConnectionFactoryImpl.readStartupMessages(ConnectionFactoryImpl.java:496) at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:112) at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66) at org.postgresql.jdbc2.AbstractJdbc2Connection.init(AbstractJdbc2Connection.java:125) at org.postgresql.jdbc3.AbstractJdbc3Connection.init(AbstractJdbc3Connection.java:30) at org.postgresql.jdbc3.Jdbc3Connection.init(Jdbc3Connection.java:24) at org.postgresql.Driver.makeConnection(Driver.java:393) at org.postgresql.Driver.connect(Driver.java:267) at java.sql.DriverManager.getConnection(DriverManager.java:620) at java.sql.DriverManager.getConnection(DriverManager.java:200) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.mozilla.javascript.MemberBox.invoke(MemberBox.java:161) ... 14 more -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JDBC connections to 9.1
On 11-04-18 09:44 AM, Tom Lane wrote: Steve Singerssin...@ca.afilias.info writes: I'm getting JDBC exceptions when I try to connect to 9.1 (master) with the postgresql-9.0-801.jdbc3.jar I don't have this issue with 9.0. Hmm, what shows up in the postmaster log? regards, tom lane LOG: unexpected EOF on client connection -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python custom exceptions for SPI
On 11-02-12 05:58 AM, Jan Urbański wrote: On 11/02/11 10:53, Jan Urbański wrote: On 10/02/11 22:26, Steve Singer wrote: Here's an updated patch with documentation. It's an incremental patch on top of the latest explicit-subxacts version. This looks fine. I've attached a one word documentation change to go o top of the patch. I'll let Peter decide if he likes those assert's or not. I don't have a good enough sense of if we often use asserts in that fashion elsewhere. Cheers, Jan diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml index aee54bf..4a90430 100644 *** a/doc/src/sgml/plpython.sgml --- b/doc/src/sgml/plpython.sgml *** $$ LANGUAGE plpythonu; *** 966,972 /programlisting /para para ! The actual class of the exception being raised corresponds to exact condition that caused the error (refer to xref linkend=errcodes-table for a list of possible conditions). The literalplpy.spiexceptions/literal module defines an exception class for --- 966,972 /programlisting /para para ! The actual class of the exception being raised corresponds to the exact condition that caused the error (refer to xref linkend=errcodes-table for a list of possible conditions). The literalplpy.spiexceptions/literal module defines an exception class for -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
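To illustrate what the spiexceptions module enables once the documentation lands, a small sketch (the table and function are made up; the class name assumes the CamelCase mapping of condition names that the patch describes):

CREATE FUNCTION insert_or_skip(val integer) RETURNS text AS $$
try:
    plpy.execute("INSERT INTO foo VALUES (%d)" % val)
except plpy.spiexceptions.UniqueViolation:
    return "duplicate, skipped"
return "inserted"
$$ LANGUAGE plpythonu;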
Re: [HACKERS] pl/python explicit subtransactions
On 11-02-10 05:20 AM, Jan Urbański wrote: D'oh, I was thinking about whether it's safe to skip the internal subxact if you're in an implicit one and somehow I always convinced myself that since you eventually close the explicit one, it is. Obviously my testing wasn't enough :( Attaching an updated patch with improved docs incorporating Steve's fixes, and fixes tests for not statring the implicit subxact. That actually makes the patch a bit smaller ;) OTOH I had to remove the section from the docs that claimed performance improvement due to only starting the explicit subxact... This version of the patch looks fine to me and seems to work as expected. Cheers, Jan
Re: [HACKERS] log_hostname and pg_stat_activity
On 11-02-10 10:13 AM, Robert Haas wrote: On Tue, Feb 1, 2011 at 1:33 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Feb 1, 2011 at 1:09 PM, Peter Eisentraut pete...@gmx.net wrote: On tis, 2011-01-18 at 19:24 -0500, Steve Singer wrote: However if I connect with a line in pg_hba that matches on an IP network then my client_hostname is always null unless log_hostname is set to true. This is consistent with the behavior you describe but I think the average user will find it a bit confusing. Having a column that is always null unless a GUC is set is less than ideal but I understand why log_hostname isn't on by default. Well, we have all these track_* variables, which also control what appears in the statistics views. After thinking about this some more, I think it might be better to be less cute and forget about the interaction with the pg_hba.conf hostname behavior. That is, the host name is set if and only if log_hostname is on. +1 for doing it that way. I think there are no outstanding issues with this patch of any significance, so I'm marking it Ready for Committer. Was there an updated version of this patch I missed? The original patch needed some sort of documentation saying that having something show up in the new pg_stat_activity columns is controlled by log_hostname. Above, Peter and you seem to agree that having the line matched in pg_hba be a controlling factor should be removed, but I haven't seen an updated patch that implements that. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] log_hostname and pg_stat_activity
On 11-02-10 10:32 AM, Robert Haas wrote: I was assuming those changes were sufficiently trivial that they could be made at commit-time, especially if Peter is committing it himself. Of course if he'd like a re-review, he can always post an updated patch, but I just thought that was overly pedantic in this particular case. Sounds reasonable. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python custom exceptions for SPI
On 11-02-10 03:13 PM, Jan Urbański wrote: On 10/02/11 20:24, Peter Eisentraut wrote: Here is the rest of my review. Submission Review --- Patch applies cleanly. Documentation is still outstanding but Jan has promised it soon. Usability Review --- We don't have this for plpython yet, though we have a similar idea in plpgsql. I think this feature is useful and worth having. The CamelCase naming of the exceptions is consistent with how the built-in python exceptions are named (camel case). Feature Test --- I did basic testing of the feature (catching a few exception types thrown from both direct SQL and prepared statements) and the feature worked as expected. Performance Impact The impact of mapping error codes to exception types shouldn't come into play unless an SPI error is returned, and with the hash it should still be minimal. Code Review - Ideally the char * members of ExceptionMap would be const, but since many versions of python take a non-const value to PyErr_NewException that won't work :( After you search for an exception in the hash you have: /* We really should find it, but just in case have a fallback */ Assert(entry != NULL); exc = entry ? entry->exc : PLy_exc_spi_error; I'm not sure the assert is needed here. Just falling back to the exception type seems reasonable, and more desirable than an assert, if somehow a new exception gets missed from the list. I don't feel that strongly on this. line 3575: PLy_elog(ERROR, Failed to add the spiexceptions module); Failed should be failed Other than that the patch looks fine to me. Updated again. Why do the error messages print spiexceptions.SyntaxError instead of plpy.spiexceptions.SyntaxError? Is this intentional or just the way it comes out of Python? That's how traceback.format_exception() works IIRC, which is what the Python interpreter uses and what PL/Python mimics in PLy_traceback. Please add some documentation. Not a list of all exceptions, but at least a paragraph that various kinds of specific exceptions may be generated, what package and module they are in, and how they relate. Sure, Steve already asked for docs in another thread, and I'm writing them. Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python explicit subtransactions
On 11-02-09 05:22 PM, Peter Eisentraut wrote: On tis, 2011-02-08 at 00:32 -0500, Steve Singer wrote: On 11-02-06 11:40 AM, Jan Urbański wrote: PFA an updated patch with documentation. Yeah, changed them. Those changes look fine. The tests now pass. I've attached a new version of the patch that fixes a few typos/wording issues I saw in the documentation. I also changed the link to the python reference manual section on context managers. I think it is better to link to that versus the original PEP. The documentation could probably still use more word-smithing but that can happen later. I'm marking this as ready for a committer. Is it necessarily a good idea that an explicit subtransaction disables the implicit sub-subtransactions? It might be conceivable that you'd still want to do some try/catch within explicit subtransactions. I had tested nested subtransactions but not a normal try/catch within a subtransaction. That sounds reasonable to allow. Unfortunately it leads to: test=# create table foo(a int4 primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index foo_pkey for table foo CREATE TABLE test=# DO $$ test$# try: test$# with plpy.subtransaction(): test$# plpy.execute(insert into foo values(1)) test$# try: test$# plpy.execute(insert into foo values(1)) test$# except: test$# plpy.notice('inside exception') test$# except plpy.SPIError: test$# f=0 test$# $$ language plpythonu; TRAP: FailedAssertion(!(afterTriggers-query_depth == afterTriggers-depth_stack[my_level]), File: trigger.c, Line: 3846) NOTICE: inside exception CONTEXT: PL/Python anonymous code block server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postponing some large patches to 9.2
On 11-02-08 10:07 AM, Jan Urbański wrote: * custom SPI exceptions - I'd really like this one to go in, because it allows writing UPSERT-kind functions in PL/Python very easily, and it's just a handful of lines of code I will try to do a review of this one (probably tomorrow night) since I've reviewed many of the related patches. * don't remove arguments - a bugfix, really, and a very small one So from the above, I'd say custom datatype parsers could get rejected if noone feels like having a discussion about it for 9.1. Table functions, custom SPI exceptions and tracebacks are niceties that if postponed to 9.2 will just mean that many features less in 9.1. The rest is bordering on bugfixes, and I think should go in. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postponing some large patches to 9.2
On 11-02-07 10:37 PM, Robert Haas wrote: - The PL/python extravaganza. I'm not really clear where we stand with this. There are a lot of patches here. Some of the patches have been committed a few others are ready (or almost ready) for a committer. The table function one is the only one in 'waiting for author' 4 of the patches haven't yet received any review. Jan Urbanski has been pretty good about posting updated patches as the dependent patches get updated. It would be good if a few people grabbed these. The individual patches tend to not be that large. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python explicit subtransactions
On 11-02-06 11:40 AM, Jan Urbański wrote: PFA an updated patch with documentation. Yeah, changed them. Those changes look fine. The tests now pass. I've attached a new version of the patch that fixes a few typos/wording issues I saw in the documentation. I also changed the link to the python reference manual section on context managers. I think it is better to link to that versus the original PEP. The documentation could probably still use more word-smithing but that can happen later. I'm marking this as ready for a committer. Thanks, Jan diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml index e05c293..9137ceb 100644 *** a/doc/src/sgml/plpython.sgml --- b/doc/src/sgml/plpython.sgml *** $$ LANGUAGE plpythonu; *** 943,949 /sect2 ! sect2 titleTrapping Errors/title para --- 943,949 /sect2 ! sect2 id=plpython-trapping titleTrapping Errors/title para *** $$ LANGUAGE plpythonu; *** 969,974 --- 969,1087 /sect2 /sect1 + sect1 id=plpython-subtransaction + titleExplicit subtransactions/title + para + Recovering from errors caused by database access as described + in xref linkend=plpython-trapping can lead to an undesirable situation + where some operations succeed before one of them fails and after recovering + from that error the data is left in an inconsistent state. PL/Python offers + a solution to this problem in the form of explicit subtransactions. + /para + + sect2 +titleSubtransaction context managers/title +para + Consider a function that implements a transfer between two accounts: + programlisting + CREATE FUNCTION transfer_funds() RETURNS void AS $$ + try: + plpy.execute(UPDATE accounts SET balance = balance - 100 WHERE account_name = 'joe') + plpy.execute(UPDATE accounts SET balance = balance + 100 WHERE account_name = 'mary') + except plpy.SPIError, e: + result = error transferring funds: %s % e.args + else: + result = funds transferred correctly + plpy.execute(INSERT INTO operations(result) VALUES ('%s') % result) + $$ LANGUAGE plpythonu; + /programlisting + If the second literalUPDATE/literal statement results in an exception + being raised, this function will report the error, but the result of the + first literalUPDATE/literal will nevertheless be committed. In other + words, the funds will be withdrawn from Joe's account, but will not be + transferred to Mary's account. +/para +para + To avoid such issues, you can wrap your literalplpy.execute/literal + calls in an explicit subtransaction. The literalplpy/literal module + provides a helper object to manage explicit subtransactions that gets + created with the literalplpy.subtransaction()/literal function. + Objects created by this function implement the ulink url=http://www.python.org/doc//current/library/stdtypes.html#context-manager-types;context manager interface/ulink. + + Using explicit subtransactions we can rewrite our function as: + programlisting + CREATE FUNCTION transfer_funds2() RETURNS void AS $$ + try: + with plpy.subtransaction(): + plpy.execute(UPDATE accounts SET balance = balance - 100 WHERE account_name = 'joe') + plpy.execute(UPDATE accounts SET balance = balance + 100 WHERE account_name = 'mary') + except plpy.SPIError, e: + result = error transferring funds: %s % e.args + else: + result = funds transferred correctly + plpy.execute(INSERT INTO operations(result) VALUES ('%s') % result) + $$ LANGUAGE plpythonu; + /programlisting + Note that the use of literaltry/catch/literal is still + required. 
Otherwise the exception would propagate to the top of the Python + stack and would cause the whole function to abort with + a productnamePostgreSQL/productname error. + The literaloperations/literal table would not have any row inserted + into it. The subtransaction context manager does not trap errors, it only + assures that all database operations executed inside its scope will be + atomically committed or rolled back. A rollback of the subtransaction + block occurs on any kind of exception exit, not only ones caused by + errors originating from database access. A regular Python exception raised + inside an explicit subtransaction block would also cause the + subtransaction to be rolled back. +/para +para + Another reason to use explicit subtransactions is that each + time literalplpy.execute/literal or literalplpy.prepare/literal is + used, it has to create its own internal subtransaction in order to be able + to recover from errors using the literaltry/catch/literal construct. If + you explicitly put your database access code in the scope of a +
Re: [HACKERS] pl/python explicit subtransactions
On 11-01-27 05:11 PM, Jan Urbański wrote: On 23/12/10 15:32, Jan Urbański wrote: Here's a patch implementing explicitly starting subtransactions mentioned in http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's an incremental patch on top of the spi-in-subxacts patch sent earlier. Updated to the spi-in-subxacts version sent earlier. Submission Review - The patch applies against master. Test updates are included. The patch doesn't include any documentation updates; the author did mention that he'll do these if it looks like the patch is going to be accepted. The plpython_subxact regression test you added is failing on both python3 and 2.4 for me. It seems to be creating functions with the same name twice, and the second time is failing with ERROR: function . already exists. I think this is just an issue with your expected files. Usability Review --- The patch implements a python context manager that allows plpython programs to control subtransactions with the python 'with' syntax. The patch implements what it describes. Using the subtransaction manager seems consistent with other examples of Python context managers. This feature seems useful for pl/python developers. The 'with' syntax was only officially added with python 2.6. I have confirmed that the patch does not break plpython going as far back as 2.5 and 2.4. I have no reason to think that earlier versions will be broken either; I just didn't test anything earlier than 2.4. I think this feature is useful for developing more complicated functions in pl/python, and we don't have an existing way of managing savepoints from pl/python. The context manager approach seems consistent with how recent python versions deal with this type of thing in other areas. Feature Test No issues found. Code Review PLy_abort_open_subtransactions(...) line 1215: ereport(WARNING, (errmsg(Forcibly aborting a subtransaction that has not been exited))); Forcibly should be forcibly (lower case) Similarly, in PLy_subxact_enter and PLy_subxact_exit a few PLy_exception_set calls start with an upper case character when I think we want them to be lower case. Other than that I don't see any issues. I am marking this as waiting for author since the documentation is still outstanding. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python SPI in subtransactions
On 11-01-27 04:33 PM, Jan Urbański wrote: Right, without the patch you can never catch errors originating from plpy.execute, so any error terminates the whole function, and so rolls back the statement. FWIW PL/Perl works the same: begin; create table foo(i int primary key); DO $$ spi_exec_query(insert into foo values ('1')); eval { spi_exec_query(insert into foo values ('1')); }; elog(LOG, $@) if $@; $$ language plperl; select * from foo; You will see a row in foo. AFAICS PL/Tcl also does it, but I don't have it complied it to try. It does consitute a behaviour change, but we didn't get any complains when the same thing happened for Perl. If we've made this type of behaviour change for pl/perl and no one complained then I don't see an issue with doing it for plpython (if anyone does they should speak up) I am finding the treatment of savepoints very strange. If as a function author I'm able to recover from errors then I'd expect (or maybe want) to be able to manage them through savepoints Ooops, you found a bug there. In the attached patch you get the same error (SPI_ERROR_TRANSACTION) as in master. Also added a unit test for that. I think you need to make the same change to PLy_spi_execute_plan. Try BEGIN; create table foo(a int4 primary key); DO $$ prep=plpy.prepare(savepoint save) plpy.execute(prep) r=plpy.execute(insert into foo values ('1')) try : r=plpy.execute(insert into foo values ('1')) except: prep=plpy.prepare(rollback to save) plpy.execute(prep) plpy.log(something went wrong) $$ language plpythonu; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python SPI in subtransactions
On 11-01-29 03:39 PM, Jan Urbański wrote: D'oh, you're right, thanks. Attached patch with fix. Curiosly, right now in master your example with plpy.prepare will result in savepoint being swallowed, but it's of course better to react with an error. Cheers, Jan This seems to fix it. You mentioned that you were going to add a few paragraphs to the documentation saying that you can now actually catch SPI errors. I haven't seen that yet. Other than that I don't see any issues with the patch and it should be ready for a committer. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/python SPI in subtransactions
On 10-12-23 08:45 AM, Jan Urbański wrote: Here's a patch implementing a executing SPI in an subtransaction mentioned in http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's an incremental patch on top of the plpython-refactor patch sent eariler. Git branch for this patch: https://github.com/wulczer/postgres/tree/spi-in-subxacts. Without it the error handling in PL/Python is really broken, as we jump between from a saught longjmp back into Python without any cleanup. As an additional bonus you no longer get all the ugly unrecognized error in PLy_spi_execute_query errors. I see you've merged the changes from the refactoring branch down but haven't yet posted an updated patch. This review is based on 2f2b4a33bf344058620a5c684d1f2459e505c727 As a disclaimer, I have worked python before but not used plpython for anything substantial. Submission Review --- I think Jan intends to post an updated patch once the refactor branch has been dealt with. The patch updates the excepted results of the regression tests so they no longer expect the 'unrecognized error' warnings. No new unit test are added to verify that behavior changes will continue to function as intended (though they could be) No documentation updates are included. The current documentation is silent on the behaviour of plpython when SPI calls generate errors so this change doesn't invalidate any documentation but it would be nice if we described what effect SQL invoked through SPI from the functions have on the transaction. Maybe a Trapping Errors section? Usability Review --- Does the patch implement what it says: yes Do we want this: yes I think so. This patch allows pl/python functions to catch errors from the SQL they issue and deal with them as the function author sees fit. I see this being useful. A concern I have is that some users might find this surprising. For plpgsql the exception handling will rollback SQL from before the exception and I suspect the other PL's are the same. It would be good if a few more people chimed in on if they see this as a good idea. Another concern is that we are probably breaking some peoples code. Consider the following: BEGIN; create table foo(a int4 primary key); DO $$ r=plpy.execute(insert into foo values ('1')) try : r=plpy.execute(insert into foo values ('1')) except: plpy.log(something went wrong) $$ language plpython2u; select * FROM foo; a --- 1 (1 row) This code is going to behave different with the patch. Without the patch the select fails because a) the transaction has rolled back which rollsback both insert and the create table. With the patch the first row shows up in the select. How concerned are we with changing the behaviour of existing plpython functions? This needs discussion. I am finding the treatment of savepoints very strange. If as a function author I'm able to recover from errors then I'd expect (or maybe want) to be able to manage them through savepoints BEGIN; create table foo(a int4 primary key); DO $$ plpy.execute(savepoint save) r=plpy.execute(insert into foo values ('1')) try : r=plpy.execute(insert into foo values ('1')) except: plpy.execute(rollback to save) plpy.log(something went wrong) $$ language plpython2u; select * FROM foo; a --- 1 (1 row) when I wrote the above I was expecting either an error when I tried to use the savepoint (like in plpgsql) or for it rollback the insert. Without the patch I get PL/Python: plpy.SPIError: SPI_execute failed: SPI_ERROR_TRANSACTION This is much better than silently ignoring the command. 
I've only glanced at your transaction manager patch; from what I can tell it will give me another way of managing the inner transaction, but I'm not sure it will make the savepoint calls work (will it?). I also wonder how useful in practice this patch will be if the other patch doesn't also get applied (function authors will be stuck with all-or-nothing as their only option for error handling). Code Review --- I don't see any issues with the code. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] log_hostname and pg_stat_activity
Here is my review for this patch Submission Review - Patch applies cleanly. - Patch does not include documentation changes. At a minimum: update the table that lists what pg_stat_activity and pg_stat_replication include in monitoring.sgml, but I propose more below. - No tests are included, but writing unit tests that depend on producing the same output involving the hostname of the client is not possible. Usability review See my comments below in the testing section. The patch does do what it says, but the log_hostname issue is a usability issue (it not being obvious why you get only nulls when log_hostname is turned off). Documenting it might be fine. If log_hostname were new to 9.1 I would encourage renaming it to something that implies it does more than just control logging output, but I don't see this as a good enough reason to rename a parameter. I think being able to see the hostnames that connections in pg_stat_activity come from is useful, and it is a feature we don't already have. Feature Test If my connection is authorized through a line in pg_hba that matches on a hostname then the column shows what I expect, even with log_hostname set to off. However if I connect with a line in pg_hba that matches on an IP network then my client_hostname is always null unless log_hostname is set to true. This is consistent with the behavior you describe, but I think the average user will find it a bit confusing. Having a column that is always null unless a GUC is set is less than ideal, but I understand why log_hostname isn't on by default. I would be inclined to add an entry for these views in catalogs.sgml where we can then give users a pointer that they need to set log_hostname to get anything out of this column. If I connect through unix sockets (say psql -h /tmp --port 5433) I was also expecting to see [local] in client_hostname, but I am just getting NULL. Since this lookup is static I don't see why it should be dependent on log_hostname (even with log_hostname set I'm not seeing [local]). I have not tested this with ipv6. Performance Review -- The lookup is done when the connection is established, not each time the view is queried. I don't see any performance issues. Coding Review - As Alvaro pointed out, BackendStatusShmemSize should be updated. To answer his question about why clientaddr works: clientaddr is a SockAddr, which is a structure not a pointer, so the data gets copied by value to beentry. That won't work for strings; I have no issues with how you're allocating space in beentry and then strlcpy'ing the values. I see no issues with the implementation. I'm marking this as 'Waiting for Author' pending documentation changes and maybe a fix for the behaviour with unix sockets Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
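For anyone trying the patch, the behaviour described above is easy to check with a query along these lines (9.1-era column names; client_hostname stays NULL for IP-matched connections unless log_hostname is on):

-- requires log_hostname = on for connections matched by IP in pg_hba.conf
SELECT procpid, usename, client_addr, client_hostname
FROM pg_stat_activity;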
Re: [HACKERS] Patch to add a primary key using an existing index
I've taken a look at this version of the patch. Submission Review This version of the patch applies cleanly to master. It matches your git repo and includes test + docs. Usability Review --- The command syntax now matches what was discussed during the last cf. The text of the notice: test=# alter table a add constraint acons unique using index aind2; NOTICE: ALTER TABLE / ADD UNIQUE USING INDEX will rename index aind2 to acons Documentation -- I've attached a patch (to be applied on top of your latest patch) with some editorial changes I'd recommend to your documentation. I feel it reads a bit clearer (but others should chime in if they disagree or have better wordings) A git tree with changes rebased to master + this change is available at https://github.com/ssinger/postgres/tree/ssinger/constraint_with_index Code Review --- src/backend/parser/parse_utilcmd.c: 1452 Your calling strdup on the attribute name. I don't have a good enough grasp on the code to be able to trace this through to where the memory gets free'd. Does it get freed? Should/could this be a call to pstrdup Feature Test - I wasn't able to find any issues in my testing I'm marking this as returned with feedback pending your answer on the possible memory leak above but I think the patch is very close to being ready. Steve Singer diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 83d2fbb..0b486ab 100644 *** a/doc/src/sgml/ref/alter_table.sgml --- b/doc/src/sgml/ref/alter_table.sgml *** ALTER TABLE replaceable class=PARAMETE *** 242,268 termliteralADD replaceable class=PARAMETERtable_constraint_using_index/replaceable/literal/term listitem para ! This form adds a new literalPRIMARY KEY/ or literalUNIQUE/ constraint to the table using an existing index. All the columns of the index will be included in the constraint. /para para ! The index should be UNIQUE, and should not be a firsttermpartial index/ ! or an firsttermexpressional index/. /para para ! This can be helpful in situations where one wishes to recreate or ! literalREINDEX/ the index of a literalPRIMARY KEY/ or a ! literalUNIQUE/ constraint, but dropping and recreating the constraint ! to acheive the effect is not desirable. See the illustrative example below. /para note para If a constraint name is provided then the index will be renamed to that ! name, else the constraint will be named to match the index name. /para /note --- 242,270 termliteralADD replaceable class=PARAMETERtable_constraint_using_index/replaceable/literal/term listitem para ! This form adds a new literalPRIMARY KEY/ or literalUNIQUE/literal constraint to the table using an existing index. All the columns of the index will be included in the constraint. /para para ! The index should be a UNIQUE index. A firsttermpartial index/firstterm ! or an firsttermexpressional index/firstterm is not allowed. /para para ! Adding a constraint using an existing index can be helpful in situations ! where you wishes to rebuild an index used for a ! literalPRIMARY KEY/literal or a literalUNIQUE/literal constraint, ! but dropping and recreating the constraint ! is not desirable. See the illustrative example below. /para note para If a constraint name is provided then the index will be renamed to that ! name of the constraint. Otherwise the constraint will be named to match ! the index name. /para /note -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
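For context, the workflow the documentation change is describing looks roughly like this (table, column and index names are made up; the syntax is the one shown in the patch under review):

-- rebuild the index behind a UNIQUE constraint without a long exclusive lock
CREATE UNIQUE INDEX CONCURRENTLY a_key_new ON a (x);
ALTER TABLE a DROP CONSTRAINT a_x_key;
ALTER TABLE a ADD CONSTRAINT a_x_key UNIQUE USING INDEX a_key_new;
-- per the NOTICE discussed above, a_key_new is renamed to match the constraint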
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
On 10-12-06 09:00 PM, Josh Berkus wrote: Steve, If you tell me which options to pgbench and which .conf file settings you'd like to see I can probably arrange to run some tests on AIX. Compile and run test_fsync in PGSRC/src/tools/fsync. Attached are runs against two different disk sub-systems from a server running AIX 5.3. The first one is against the local disks Loops = 1 Simple write: 8k write 60812.454/second Compare file sync methods using one write: open_datasync 8k write 162.160/second open_sync 8k write 158.472/second 8k write, fdatasync 158.157/second 8k write, fsync 45.382/second Compare file sync methods using two writes: 2 open_datasync 8k writes79.472/second 2 open_sync 8k writes80.095/second 8k write, 8k write, fdatasync 159.268/second 8k write, 8k write, fsync44.725/second Compare open_sync with different sizes: open_sync 16k write 162.017/second 2 open_sync 8k writes79.709/second Test if fsync on non-write file descriptor is honored: (If the times are similar, fsync() can sync data written on a different descriptor.) 8k write, fsync, close 45.361/second 8k write, close, fsync 36.311/second The below profile is from the same machine using an IBM DS 6800 SAN for storage. Loops = 1 Simple write: 8k write 75933.027/second Compare file sync methods using one write: open_datasync 8k write 2762.801/second open_sync 8k write 2453.822/second 8k write, fdatasync2867.331/second 8k write, fsync1094.048/second Compare file sync methods using two writes: 2 open_datasync 8k writes 1287.845/second 2 open_sync 8k writes 1332.084/second 8k write, 8k write, fdatasync 1966.411/second 8k write, 8k write, fsync 1048.354/second Compare open_sync with different sizes: open_sync 16k write2281.425/second 2 open_sync 8k writes 1401.561/second Test if fsync on non-write file descriptor is honored: (If the times are similar, fsync() can sync data written on a different descriptor.) 8k write, fsync, close 1298.404/second 8k write, close, fsync 1188.582/second -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
On 10-12-06 06:56 PM, Greg Smith wrote: Tom Lane wrote: The various testing that's been reported so far is all for Linux and thus doesn't directly address the question of whether other kernels will have similar performance properties. Survey of some popular platforms: snip So my guess is that some small percentage of Windows users might notice a change here, and some testing on FreeBSD would be useful too. That's about it for platforms that I think anybody needs to worry about. If you tell me which options to pgbench and which .conf file settings you'd like to see I can probably arrange to run some tests on AIX. -- Greg Smith 2ndQuadrant usg...@2ndquadrant.comBaltimore, MD PostgreSQL Training, Services and Supportwww.2ndQuadrant.us PostgreSQL 9.0 High Performance:http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to add a primary key using an existing index
On 10-11-22 03:24 PM, Steve Singer wrote: On 10-11-22 09:37 AM, Gurjeet Singh wrote: On Sat, Nov 20, 2010 at 9:00 AM, Steve Singer ssinger...@sympatico.ca Almost fixed. I still get an unexpected difference. ! DETAIL: cannot create PRIMARY KEY/UNIQUE constraint with a non-unique index. CREATE UNIQUE INDEX rpi_idx2 ON rpi_test(a , b); -- should fail; WITH INDEX option specified more than once. ALTER TABLE rpi_test ADD PRIMARY KEY (a, b) --- 35,41 -- should fail; non-unique ALTER TABLE rpi_test ADD primary key(a, b) WITH (INDEX = 'rpi_idx1'); ERROR: rpi_idx1 is not a unique index ! DETAIL: Cannot create PRIMARY KEY/UNIQUE constraint using a non-unique index. The attached version of the patch gets your regression tests to pass. I'm going to mark this as ready for a committer. replace_pkey_index.revised2.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to add a primary key using an existing index
On 10-11-22 09:37 AM, Gurjeet Singh wrote: On Sat, Nov 20, 2010 at 9:00 AM, Steve Singer ssinger...@sympatico.ca mailto:ssinger...@sympatico.ca wrote: Submission Review: Tests The expected output for the regression tests you added don't match what I'm getting when I run the tests with your patch applied. I think you just need to regenerate the expected results they seem to be from a previous version of the patch (different error messages etc..). Fixed. Also modified one test to cover the case where constraint name is provided. Almost fixed. I still get an unexpected difference. ! DETAIL: cannot create PRIMARY KEY/UNIQUE constraint with a non-unique index. CREATE UNIQUE INDEX rpi_idx2 ON rpi_test(a , b); -- should fail; WITH INDEX option specified more than once. ALTER TABLE rpi_test ADD PRIMARY KEY (a, b) --- 35,41 -- should fail; non-unique ALTER TABLE rpi_test ADD primary key(a, b) WITH (INDEX = 'rpi_idx1'); ERROR: rpi_idx1 is not a unique index ! DETAIL: Cannot create PRIMARY KEY/UNIQUE constraint using a non-unique index. Documentation --- I was able to generate the docs. The ALTER TABLE page under the synopsis has ADD table_constraint where table_constraint is defined on the CREATE TABLE page. On the CREATE TABLE page table_constraint isn't defined as having the WITH , the WITH is part of index_parameters. I propose the alter table page instead have ADD table_constraint [index_parameters] where index_parameters also references the CREATE TABLE page like table_constraint. IMHO index_parameters is an optional component of table_constraint, and hence can't be mentioned here, at least not the way shown above. My reading of CREATE TABLE is that index_parameters is an optional parameter that comes after table_constraint and isn't part of table_constraint. Any other opinions? Everything else I mentioned seems fixed in this version gurjeet.singh @ EnterpriseDB - The Enterprise Postgres Company http://www.EnterpriseDB.com singh.gurj...@{ gmail | yahoo }.com Twitter/Skype: singh_gurjeet Mail sent from my BlackLaptop device
Re: [HACKERS] Patch to add a primary key using an existing index
On 10-11-07 01:54 PM, Gurjeet Singh wrote: Attached is the patch that extends the same feature for UNIQUE indexes. It also includes some doc changes for the ALTER TABLE command, but I could not verify the resulting changes since I don't have the doc-building infrastructure installed. Regards, Gurjeet, I've taken a stab at reviewing this. Submission Review: Tests The expected output for the regression tests you added don't match what I'm getting when I run the tests with your patch applied. I think you just need to regenerate the expected results they seem to be from a previous version of the patch (different error messages etc..). Documentation --- I was able to generate the docs. The ALTER TABLE page under the synopsis has ADD table_constraint where table_constraint is defined on the CREATE TABLE page. On the CREATE TABLE page table_constraint isn't defined as having the WITH , the WITH is part of index_parameters. I propose the alter table page instead have ADD table_constraint [index_parameters] where index_parameters also references the CREATE TABLE page like table_constraint. Usability Review Behaviour - I feel that if the ALTER TABLE ... renames the the index a NOTICE should be generated. We generate notices about creating an index for a new pkey. We should give them a notice that we are renaming an index on them. Coding Review: == Error Messages - in tablecmds your errdetail messages often don't start with a capital letter. I belive the preference is to have the errdetail strings start with a capital letter and end with a period. tablecmds.c - get_constraint_index_oid contains the check /* Currently only B-tree indexes are suupported for primary keys */ if (index_rel-rd_rel-relam != BTREE_AM_OID) elog(ERROR, \%s\ is not a B-Tree index, index_name); but above we already validate that the index is a unique index with another check. Today only B-tree indexes support unique constraints. If this changed at some point and we could have a unique index of some other type, would something in this patch need to be changed to support them? If we are only depending on the uniqueness property then I think this check is covered by the uniquness one higher in the function. Also note the typo in your comment above (suupported) Comments - index.c: Line 671 and 694. Your indentation changes make the comments run over 80 characters. If you end up submitting a new version of the patch I'd reformat those two comments. Other than those issues the patch looks good to me. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: rollback sequence reset for TRUNCATE ... RESTART IDENTITY
On 10-11-17 03:00 PM, Marc Cousin wrote: The Wednesday 17 November 2010 19:41:19, Tom Lane wrote : Marc Cousincousinm...@gmail.com writes: - Does the feature work as advertised? Yes. It works consistently, isn't fooled by savepoints or multiple serials in a table, or concurrent transactions I think there's a rather nasty problem here, which is what to do with the cached nextval/currval state. As submitted, the patch does the same thing as ALTER SEQUENCE RESTART (to wit, clear any cached unissued nextval values, but don't touch currval) at the time of resetting the sequence. That's fine, but what if the transaction later rolls back? The cached state is untouched by rollback, so if the transaction had done any nextval()s meanwhile, the cache will be out of step with the rolled-back sequence contents. Yes, I completely missed testing with non default cache value. And it fails, of course, some values are generated a second time twice after a rollback I will look at addressing this in an updated patch. We never had to worry about this before because sequence operations didn't roll back, by definition. If we're going to add a situation where they do roll back, we need to consider the case. I think we can arrange to clear cached unissued values on the next attempt to nextval() the sequence, by dint of adding the relfilenode to SeqTable entries and clearing cached state whenever we note that it doesn't match the current relfilenode of the sequence. However, I'm unsure what ought to happen to currval. It doesn't seem too practical to try to roll it back to its pre-transaction value. Should we leave it alone (ie, possibly reflecting a value that was assigned inside the failed transaction)? The other alternative would be to clear it as though nextval had never been issued at all in the session. Should currval really be used after a failed transaction ? Right now, we can have a value that has been generated inside a rollbacked transaction too. I'd vote for leave it alone. I agree probably shouldn't be using curval after a failed transaction which is why having it return as if it hadn't been issued sounds like a more reasonable behaviour. If an application tries a currval following the rollback then at least the application won't get a bogus value. It is better to return an error than to let the application continuing on thinking it has a sequence value that won't be (or has not) been assigned to some other session. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
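To make the expectation concrete, this is the kind of sequence the patch is meant to handle (the table and values are made up; the comments describe the expected behaviour with the patch applied, not current released behaviour):

CREATE TABLE t (id serial PRIMARY KEY, v text);
INSERT INTO t (v) VALUES ('a'), ('b'), ('c');   -- t_id_seq has handed out 1..3
BEGIN;
TRUNCATE t RESTART IDENTITY;                    -- resets t_id_seq to 1
ROLLBACK;
INSERT INTO t (v) VALUES ('d');
SELECT max(id) FROM t;                          -- with the patch: 4, not a reused 1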
[HACKERS] Rollback sequence reset on TRUNCATE rollback patch
The attached patch modifies TRUNCATE ... RESTART IDENTITY so that if the transaction rolls back the restart of the sequence will also be rolled back. It follows the general outline discussed at http://archives.postgresql.org/pgsql-hackers/2008-05/msg00550.php of assigning a new reffilenode to the sequence. I will add this to the next commitfest. Steve diff --git a/doc/src/sgml/ref/truncate.sgml b/doc/src/sgml/ref/truncate.sgml index f32d255..137eade 100644 *** a/doc/src/sgml/ref/truncate.sgml --- b/doc/src/sgml/ref/truncate.sgml *** TRUNCATE [ TABLE ] [ ONLY ] replaceable *** 159,190 transaction does not commit. /para - warning -para - Any commandALTER SEQUENCE RESTART/ operations performed as a - consequence of using the literalRESTART IDENTITY/ option are - nontransactional and will not be rolled back on failure. To minimize - the risk, these operations are performed only after all the rest of - commandTRUNCATE/'s work is done. However, there is still a risk - if commandTRUNCATE/ is performed inside a transaction block that is - aborted afterwards. For example, consider - programlisting - BEGIN; - TRUNCATE TABLE foo RESTART IDENTITY; - COPY foo FROM ...; - COMMIT; - /programlisting - - If the commandCOPY/ fails partway through, the table data - rolls back correctly, but the sequences will be left with values - that are probably smaller than they had before, possibly leading - to duplicate-key failures or other problems in later transactions. - If this is likely to be a problem, it's best to avoid using - literalRESTART IDENTITY/, and accept that the new contents of - the table will have higher serial numbers than the old. -/para - /warning /refsect1 refsect1 --- 159,165 diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c index 04b0c71..4fb9093 100644 *** a/src/backend/commands/sequence.c --- b/src/backend/commands/sequence.c *** *** 35,41 #include utils/lsyscache.h #include utils/resowner.h #include utils/syscache.h ! /* * We don't want to log each fetching of a value from a sequence, --- 35,41 #include utils/lsyscache.h #include utils/resowner.h #include utils/syscache.h ! #include utils/snapmgr.h /* * We don't want to log each fetching of a value from a sequence, *** static void init_params(List *options, b *** 96,101 --- 96,104 static void do_setval(Oid relid, int64 next, bool iscalled); static void process_owned_by(Relation seqrel, List *owned_by); + static void init_seq_relation(Relation rel,TupleDesc tupDesc,Datum * value, + bool * null,List * owned_by); + /* * DefineSequence *** DefineSequence(CreateSeqStmt *seq) *** 109,119 CreateStmt *stmt = makeNode(CreateStmt); Oid seqoid; Relation rel; ! Buffer buf; ! Page page; ! sequence_magic *sm; ! HeapTuple tuple; ! TupleDesc tupDesc; Datum value[SEQ_COL_LASTCOL]; bool null[SEQ_COL_LASTCOL]; int i; --- 112,118 CreateStmt *stmt = makeNode(CreateStmt); Oid seqoid; Relation rel; ! TupleDesc tupDesc; Datum value[SEQ_COL_LASTCOL]; bool null[SEQ_COL_LASTCOL]; int i; *** DefineSequence(CreateSeqStmt *seq) *** 210,217 rel = heap_open(seqoid, AccessExclusiveLock); tupDesc = RelationGetDescr(rel); ! /* Initialize first page of relation with special magic number */ buf = ReadBuffer(rel, P_NEW); Assert(BufferGetBlockNumber(buf) == 0); --- 209,293 rel = heap_open(seqoid, AccessExclusiveLock); tupDesc = RelationGetDescr(rel); + init_seq_relation(rel, tupDesc,value,null,owned_by); + heap_close(rel, NoLock); + } ! /** ! * Resets the relation used by a sequence. ! * ! * The sequence is reset to its initial values, ! 
* the old sequence is left in place in case the ! * transaction rolls back. ! */ ! void ResetSequenceRelation(Oid seq_relid,List * options) ! { ! Relation seq_rel = relation_open(seq_relid,AccessExclusiveLock); ! SeqTable elm; ! Page page; ! Form_pg_sequence seq; ! Buffer buf; ! TupleDesc tupDesc; ! sequence_magic * sm; ! HeapTupleData tuple; ! ItemId lp; ! Datum * values; ! bool * isnull; ! ! /** ! * Read the old sequence. ! * ! */ ! init_sequence(seq_relid,elm,seq_rel); ! seq = read_info(elm,seq_rel,buf); ! page = BufferGetPage(buf); ! sm = (sequence_magic *) PageGetSpecialPointer(page); ! ! if (sm-magic != SEQ_MAGIC) ! elog(ERROR, bad magic number in sequence \%s\: %08X, ! RelationGetRelationName(seq_rel), sm-magic); ! ! lp = PageGetItemId(page, FirstOffsetNumber); ! Assert(ItemIdIsNormal(lp)); ! ! tuple.t_data = (HeapTupleHeader) PageGetItem(page, lp); ! tupDesc = RelationGetDescr(seq_rel); ! values=palloc(sizeof(Datum)*tupDesc-natts); ! isnull=palloc(sizeof(bool)*tupDesc-natts); !
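As a rough illustration of the user-visible behaviour this patch aims for (table and column names are made up for the example; the "before" behaviour is what the removed documentation warning described):

CREATE TABLE item (id serial, label text);
INSERT INTO item (label) VALUES ('x'), ('y');   -- sequence has advanced past 2

BEGIN;
TRUNCATE item RESTART IDENTITY;                 -- resets the owned sequence
ROLLBACK;                                       -- previously the reset survived the rollback

-- With the patch the reset sequence is written under a new relfilenode and the
-- old one is left in place, so rolling back restores the pre-TRUNCATE sequence
-- state and numbering continues where it left off instead of repeating 1.
INSERT INTO item (label) VALUES ('z');          -- gets id 3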
Re: [HACKERS] Review: Fix snapshot taking inconsistencies
On Sun, 10 Oct 2010, Marko Tiikkaja wrote: On 2010-10-07 5:21 AM +0300, Steve Singer wrote: Since no one else has proposed a better idea and the commit fest is ticking away, I think you should go ahead and do that.

Here's a new version of the patch, deprecating pg_parse_and_rewrite. I duplicated the parse/rewrite logic in the two places where pg_parse_and_rewrite is currently used, per comment from Tom.

That looks fine. I'm marking the patch as ready for a committer.

Regards, Marko Tiikkaja
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep at Oct 5
On 10-10-07 05:52 AM, Fujii Masao wrote: On Wed, Oct 6, 2010 at 4:06 PM, Simon Riggs si...@2ndquadrant.com wrote: The problem is how much WAL is stored on (any) node. Currently that is wal_keep_segments, which doesn't work very well, but I've seen no better ideas that cover all important cases.

What about allowing the master to read and send WAL from the archive? Regards,

Then you have to deal with telling the archive how long it needs to keep WAL segments, because the master might ask for them back. If the archive is remote from the master, there is also some extra network copying going on. It would be better to let the slave that is being reconfigured read the missing WAL from the archive.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers