Re: [HACKERS] xml_is_document and selective pg_re_throw
No, I don't see any particular risk there. The places that might throw ERRCODE_INVALID_XML_DOCUMENT are sufficiently few (as in, exactly one, in this usage) that we can have reasonable confidence we know what the system state is when we catch that error.

Hmmm, I was writing some code in which I happened to hold a LWLock when this function was called. The first catch/rethrow cleaned up the InterruptHoldoffCount value. A subsequent release of that LWLock tripped the Assert(InterruptHoldoffCount > 0) inside RESUME_INTERRUPTS(). I know holding an lwlock like this might not be a good idea, but this behavior just got me thinking about other probable issues. Regards, Nikhils

Would a better way have been to modify xml_parse to take an additional boolean argument, to_rethrow, and not rethrow if that is false?

We could do that, but it would greatly complicate xml_parse IMO, since it still needs its own PG_TRY block to handle other error cases, and only one of those error cases ought to optionally return failure instead of re-throwing. regards, tom lane
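For readers following along, the pattern under discussion in xml_is_document looks roughly like the sketch below (abbreviated; the real function manages memory contexts with a bit more care). The point is that exactly one anticipated error code is swallowed and everything else propagates:

    MemoryContext ccxt = CurrentMemoryContext;
    bool          result = false;

    PG_TRY();
    {
        doc = xml_parse(data, XMLOPTION_DOCUMENT, true, GetDatabaseEncoding());
        result = true;
    }
    PG_CATCH();
    {
        ErrorData  *errdata;

        /* must leave the error context before copying the error data */
        MemoryContextSwitchTo(ccxt);
        errdata = CopyErrorData();
        if (errdata->sqlerrcode == ERRCODE_INVALID_XML_DOCUMENT)
        {
            /* the one anticipated failure: report "not a document" */
            FreeErrorData(errdata);
            FlushErrorState();
            result = false;
        }
        else
        {
            /* anything else (e.g. out of memory) must be re-thrown */
            PG_RE_THROW();
        }
    }
    PG_END_TRY();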
Re: [HACKERS] hint bit i/o reduction
1) Keep track of the # of times the last transaction id was repeatedly seen in tqual.c (resetting as soon as a new xid is touched). We can do this just for xmin, or separately for both xmin and xmax.

Will this be done when we see a new xid during the scan of a tuple? If yes, then won't this logic have an impact in case there are very few records per transaction in the database? In that case the new optimization won't help much, but it still adds new instructions.

We can also try to be smart and disable the 'avoid setting the hint bit' logic if the page is already dirty.

What's the benefit of not setting the hint bit on an already-dirty page?

-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Merlin Moncure
Sent: Wednesday, June 13, 2012 4:28 AM
To: PostgreSQL-development
Subject: [HACKERS] hint bit i/o reduction

hackers, A while back I made an effort implementing a backend local hint bit cache with the intention of mitigating i/o impact for scan heavy workloads that involve moving a lot of records around per transaction. The basic concept was to keep some backend private memory that tracked the resolution of recently seen transactions and worked in a similar fashion to the hint bits:

    if (!commit_hint_bit_set && hint_bit_cache(xid) == committed) { ... }

The other interesting aspect was that, if the bit was found in the cache, the bit was set on the tuple but the page was not dirtied. If the xid was not found in the cache, the bit was set and the page was dirtied normally. From a strictly performance standpoint, limited testing seemed to indicate it more or less fixed the hint bit i/o problems. Still, injecting extra logic (especially a cache lookup) into the visibility routines is not to be taken lightly, and maybe there's a better way. There simply has to be a way to ameliorate hint bit i/o without spending a lot of cycles and, hopefully, without impacting other workloads too much. Unfortunately, assuming the CRC patch makes it in, any of the family of tricks along the lines of 'set the bit but avoid dirtying the page' aren't going to fly. I think, at the end of the day, we have to avoid setting the bit, but only in cases where we are fairly sure we would prefer not to do that. I'm thinking something fairly simple would get some good mileage:

1) Keep track of the # of times the last transaction id was repeatedly seen in tqual.c (resetting as soon as a new xid is touched). We can do this just for xmin, or separately for both xmin and xmax.
2) Right after checking the hint bit (assuming it wasn't set), test to see if our xid is the 'last xid', and iff #times_seen_same > #some_number (say, 100 or 1000), use it as if the hint bit was set. #some_number can be chosen as some fairly conservative defined value, or perhaps we can be sneaky and try to adjust it based on things like how many unforced clog faults we're doing -- maybe even keeping some accounting at the page level.

We can also try to be smart and disable the 'avoid setting the hint bit' logic if the page is already dirty. The presumption here is that if you're seeing the same xid over and over, there simply isn't a lot of value in recording it in page after page after page.
It's tempting to suggest that there is already an 'is this xid the same as the last one' (AKA last_transaction_id) check at the clog visibility checking level, but that doesn't help for the following reasons: 1) while it does avoid the jump out to clog, it occurs at the wrong place in the sequence of visibility checking (after you've already wasted cycles in TransactionIdIsInProgress() etc.), 2) by being at the clog level it can't influence the visibility behavior of whether or not to set the bit, and 3) it's not inline.

I see two potential downsides here: 1) maybe the extra logic in visibility is too expensive (but I doubt it), and 2) you're missing early opportunities to set the all-visible bit, which in turn will influence index-only scans. The above might be a small price to pay for saving all the i/o, and sooner or later, if the records are long lived, vacuum will swing around and tag them. merlin
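To make point 1 concrete, here is a rough sketch of the counter being proposed -- the names and the threshold constant are mine and purely illustrative, not from any patch:

    /*
     * Illustrative sketch only: track how often tqual.c keeps seeing the
     * same xmin; past a threshold, treat the tuple as hinted without
     * dirtying the page.  HINT_SKIP_THRESHOLD is a hypothetical constant.
     */
    #define HINT_SKIP_THRESHOLD 100

    static TransactionId last_seen_xid = InvalidTransactionId;
    static int           times_seen_same = 0;

    static bool
    xid_seen_repeatedly(TransactionId xid)
    {
        if (TransactionIdEquals(xid, last_seen_xid))
            times_seen_same++;
        else
        {
            /* new xid: reset the counter */
            last_seen_xid = xid;
            times_seen_same = 0;
        }
        return times_seen_same > HINT_SKIP_THRESHOLD;
    }

The visibility routines would consult this right after finding the hint bit unset, and skip dirtying the page when it returns true.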
Re: [HACKERS] Minimising windows installer password confusion
On Wed, Jun 13, 2012 at 2:12 AM, Craig Ringer cr...@postnewspapers.com.au wrote:

Users don't remember passwords, though. It's one of those constants, and is why practically every web site etc. out there offers password recovery. The installer IMO needs to store the postgres account password in a registry key with permissions set so that only users with local admin rights (ie: who can use the installer) can view it. I don't like the idea of storing a password, but it's only going to be accessible if you already have rights to the registry as local admin, in which case the attacker can just reset it themselves (or root your machine). So long as the installer warns that the password shouldn't be one you use elsewhere because it can be recovered from your computer, I don't see a problem.

The idea of storing the password in clear text in the registry gives me nervous twitches. Whilst it should be secure if done as you suggest, a) a simple mistake could leave it vulnerable and give us an embarrassing security issue to deal with, and b) it doesn't help us in the cases where users have another installation of PostgreSQL from somewhere that doesn't store the password (which is likely to be the case for years to come, even if it was our installer that was used previously).

-- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Minimising windows installer password confusion
On Wed, Jun 13, 2012 at 2:18 AM, Craig Ringer cr...@postnewspapers.com.au wrote:
On 06/12/2012 08:08 PM, Dave Page wrote:

Some background: By default the installer will use 'postgres' for both the service (OS) account and the database superuser account. It will use the same password for both (though users have complete control at the command line if they want it, which is why the account names are shown in the installer). Unlike *nix installations, the service account must have a password, which is required during both installation and upgrade to configure the PostgreSQL service. We therefore ask for the password during both installation and upgrade. If the user account exists (which can happen if there has previously been an installation of Postgres on the system), the user must specify the existing password for the account during installation (and of course, during all upgrades). This is where users normally get stuck, as they've forgotten the password they set in the past. They may not even have set it, because the EnterpriseDB installer can be run unattended and may've been used by some 3rd party package.

I'd be interested in seeing what Microsoft installers that create isolated user accounts do. I think .NET creates one for ASP, so that'd be one option to look into.

They tend to use the localsystem or networkservice accounts for most things, which don't have passwords. The reason we don't do that is that since the early days of 8.0 we've said we want to enable users to use the service account as if it were a regular account, so that they can do things like access network resources (useful for server-side copy, for example). I wasn't overly convinced back then that that was necessary, and I'm still not now. I suppose we potentially could use the networkservice account by default, and let advanced users override that on the command line if they need to. Then remembering the password becomes their responsibility.

-- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] errdetail_busy_db plural forms
I was looking for missing use of gettext plural forms, which led me to errdetail_busy_db(). While we can't do much about this:

    errdetail("There are %d other session(s) and %d prepared transaction(s) using the database.",
              notherbackends, npreparedxacts);

I think it's still worth pluralizing the other cases:

    errdetail("There are %d other session(s) using the database.", notherbackends);

and

    errdetail("There are %d prepared transaction(s) using the database.", npreparedxacts);

Especially the other sessions case is probably the one most seen by users. So I propose the attached patch.

diff --git i/src/backend/commands/dbcommands.c w/src/backend/commands/dbcommands.c
index b7224bd..c9b80ad 100644
--- i/src/backend/commands/dbcommands.c
+++ w/src/backend/commands/dbcommands.c
@@ -1804,20 +1804,21 @@ static bool get_db_info(const char *name, LOCKMODE lockmode,
 static int
 errdetail_busy_db(int notherbackends, int npreparedxacts)
 {
-	/*
-	 * We don't worry about singular versus plural here, since the English
-	 * rules for that don't translate very well.  But we can at least avoid
-	 * the case of zero items.
-	 */
 	if (notherbackends > 0 && npreparedxacts > 0)
+		/* We don't deal with singular versus plural here, since gettext
+		 * doesn't support multiple plurals in one string. */
 		errdetail("There are %d other session(s) and %d prepared transaction(s) using the database.",
 				  notherbackends, npreparedxacts);
 	else if (notherbackends > 0)
-		errdetail("There are %d other session(s) using the database.",
-				  notherbackends);
+		errdetail_plural("There is %d other session using the database.",
+						 "There are %d other sessions using the database.",
+						 notherbackends,
+						 notherbackends);
 	else
-		errdetail("There are %d prepared transaction(s) using the database.",
-				  npreparedxacts);
+		errdetail_plural("There is %d prepared transaction using the database.",
+						 "There are %d prepared transactions using the database.",
+						 npreparedxacts,
+						 npreparedxacts);
 	return 0;					/* just to keep ereport macro happy */
 }
Re: [HACKERS] Minimising windows installer password confusion
On Wed, Jun 13, 2012 at 3:07 AM, Craig Ringer cr...@postnewspapers.com.au wrote:
On 06/13/2012 01:19 AM, Sachin Srivastava wrote:
On Tue, Jun 12, 2012 at 7:43 PM, Dave Page dp...@pgadmin.org wrote:
On Tue, Jun 12, 2012 at 2:57 PM, Robert Haas robertmh...@gmail.com wrote:

What we need is to display a different dialogue based on the situation. If the account already exists, we should say: "Please enter the password for the existing postgres account. If you do not know the password, you can reset it using the Windows control panel."

Why use the Windows control panel?

Because when I wrote the email I was looking for a simple solution that wouldn't require writing code that has the potential to fail depending on how the user's environment is configured (the user account stuff tends to go wrong in weird ways, for example when used on domains in unusual (or high-security) configurations). We're spending a lot of effort at the moment getting the 9.2 buildfarm together, and updating all the StackBuilder add-on packages (think multiple man-months) - I'm trying not to add to that too much.

-- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Minimising windows installer password confusion
On Jun 13, 2012, at 11:10, Dave Page wrote:
On Wed, Jun 13, 2012 at 2:12 AM, Craig Ringer cr...@postnewspapers.com.au wrote:

Users don't remember passwords, though. It's one of those constants, and is why practically every web site etc. out there offers password recovery. The installer IMO needs to store the postgres account password in a registry key with permissions set so that only users with local admin rights (ie: who can use the installer) can view it. I don't like the idea of storing a password, but it's only going to be accessible if you already have rights to the registry as local admin, in which case the attacker can just reset it themselves (or root your machine). So long as the installer warns that the password shouldn't be one you use elsewhere because it can be recovered from your computer, I don't see a problem.

The idea of storing the password in clear text in the registry gives me nervous twitches. Whilst it should be secure if done as you suggest, a) a simple mistake could leave it vulnerable and give us an embarrassing security issue to deal with, and b) it doesn't help us in the cases where users have another installation of PostgreSQL from somewhere that doesn't store the password (which is likely to be the case for years to come, even if it was our installer that was used previously).

Hm, doesn't the registry already contain the postgres service account's password? AFAIR, on Windows you cannot really impersonate an account without knowing its password, which is the reason why a) the password of a user account is stored in the registry if you enable auto-logon, and b) you need to know the service account's password to create a service.

Some googling brought up a tool called isvcpwd [1] which seems to be able to change service account passwords without breaking services. Judging from a brief glance over the source code, it does so by iterating over all services domain-wide, and adjusting the service definition of those which rely on the modified account(s). So that seems to support the theory that the passwords are stored in the individual machines' registries.

Some further googling indicates that, yes, the service account passwords are stored in the registry, but are only accessible to the LocalSystem account [2]. Querying them from the postgres installer thus isn't really an option. But what you could do, I guess, is offer the user the ability to change the password, and use the approach from [1] to update the affected service definitions afterwards.

best regards, Florian Pflug

[1] https://www.itefix.no/i2/isvcpwd
[2] http://www.windowsnetworking.com/kbase/windowstips/windowsnt/registrytips/miscellaneous/LSASecrets.html
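For what it's worth, updating the password stored in a service definition is a single Win32 API call; a minimal sketch of what a tool like isvcpwd presumably does per service (the service name here is hypothetical, and all error handling is omitted):

    #include <windows.h>

    /* Sketch only: store a new password in an existing service definition. */
    SC_HANDLE scm = OpenSCManager(NULL, NULL, SC_MANAGER_CONNECT);
    SC_HANDLE svc = OpenService(scm, "postgresql-9.2", SERVICE_CHANGE_CONFIG);

    ChangeServiceConfig(svc,
                        SERVICE_NO_CHANGE,  /* service type unchanged */
                        SERVICE_NO_CHANGE,  /* start type unchanged */
                        SERVICE_NO_CHANGE,  /* error control unchanged */
                        NULL, NULL, NULL, NULL,
                        ".\\postgres",      /* account name, unchanged */
                        "new-password",     /* the new password to store */
                        NULL);

    CloseServiceHandle(svc);
    CloseServiceHandle(scm);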
Re: [HACKERS] initdb and fsync
On tis, 2012-06-12 at 21:09 -0700, Jeff Davis wrote:
On Sun, 2012-03-25 at 19:59 -0700, Jeff Davis wrote:
On Sat, 2012-03-17 at 17:48 +0100, Cédric Villemain wrote:

I agree with Andres. I believe we should use sync_file_range (_before?) with linux. And we can use posix_fadvise_dontneed on other kernels.

OK, updated patch attached. sync_file_range() is preferred, posix_fadvise() is a fallback.

Rebased patch attached. No other changes.

The --help output for the -N option was copy-and-pasted wrongly. The message issued when using -N is also a bit content-free. Maybe something like: "Running in nosync mode. The data directory might become corrupt if the operating system crashes.\n"

Which leads to the question: how does one get out of this state? Is running sync(1) enough? Is starting the postgres server enough? There are no updates to the initdb man page included in your patch, which would be a suitable place to discuss this briefly.
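For context, the fallback structure under discussion looks something like the sketch below -- an illustration of the idea rather than the patch itself (the helper name is an assumption of mine):

    /*
     * Ask the kernel to start writing a file out, so that the later
     * fsync() pass has less work to do.  Sketch only.
     */
    static void
    pre_sync_fname(const char *fname)
    {
        int  fd = open(fname, O_RDONLY, 0);

        if (fd < 0)
            return;

    #if defined(HAVE_SYNC_FILE_RANGE)
        /* Linux: initiate writeback of the whole file, without waiting */
        sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
    #elif defined(POSIX_FADV_DONTNEED)
        /* elsewhere: evicting the pages from cache forces them to disk */
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    #endif

        close(fd);
    }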
[HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
Hi everyone, This mail contains the high-level design description of how our prototype of in-core logical replication works. The individual patches will be posted as replies to this email. I obviously welcome all sorts of productive comments to both the individual patches and the architecture.

Unless somebody objects I will add most of the individual patches marked as RFC to the current commitfest. I hope that with comments stemming from that round we can get several of the patches into the first or second commitfest. As soon as the design is clear/accepted we will try very hard to get the following patches into the second or third round. If anybody disagrees with the procedural way we try to do this, please raise a hand now.

I tried to find the right balance between keeping the description short enough that anybody will read the design docs and verbose enough that it is understandable. I can go into much more detail on any part if wanted. Please keep in mind that those patches are *RFC* and a prototype, and not intended to be applied as-is. There is a short description of the individual patches and their relevance at the end of the email. Greetings, Andres

=== Design goals for logical replication ===

- in core
- fast
- async
- robust
- multi-master
- modular
- as unintrusive as possible implementation wise
- basis for other technologies (sharding, replication into other DBMSs, ...)

For reasons why we think this is an important set of features please check out the presentation from the in-core replication summit at pgcon: http://wiki.postgresql.org/wiki/File:BDR_Presentation_PGCon2012.pdf

While you may argue that most of the above design goals are already provided by various trigger based replication solutions like Londiste or Slony, we think that that's not enough for various reasons:

- not in core (and thus less trustworthy)
- duplication of writes due to an additional log
- performance in general (check the end of the above presentation)
- complex to use because there is no native administration interface

We want to emphasize that this proposed architecture is based on the experience of developing a minimal prototype which we developed with the above goals in mind. While we obviously hope that a good part of it is reusable for the community, we definitely do *not* expect that the community accepts this as-is. It is intended to be the basis upon which we, the community, can build and design the future logical replication.

=== Basic architecture ===

Very broadly speaking there are several major pieces common to most approaches to replication:

1. Source data generation
2. Transportation of that data
3. Applying the changes
4. Conflict resolution

1.: As we need a change stream that contains all required changes in the correct order, the requirement for this stream to reflect changes across multiple concurrent backends raises concurrency and scalability issues. Reusing the WAL stream for this seems a good choice since it is needed anyway and addresses those issues already, and it further means that we don't incur duplicate writes and that we reuse the locking already performed for its maintenance. Any other stream-generating component would introduce additional scalability issues.
Unfortunately the WAL is mostly a physical representation of the changes and thus does not, by itself, contain the necessary information in a convenient format to create logical changesets.

The biggest problem is that interpreting tuples in the WAL stream requires an up-to-date system catalog and needs to be done in a compatible backend and architecture. The requirement of an up-to-date catalog could be solved by adding more data to the WAL stream, but it seems likely that that would require relatively intrusive, complex changes. Instead we chose to require a synchronized catalog at the decoding site. That adds some complexity to use cases like replicating into a different database or cross-version replication. For those it is relatively straightforward to develop a proxy pg instance that only contains the catalog and does the transformation to textual changes.

This also is the solution to the other big problem, the need to work around architecture/version specific binary formats. The alternative, producing cross-version, cross-architecture compatible binary changes, or even more so textual changes, all the time seems to be prohibitively expensive, both from a CPU and a storage POV and also from the point of implementation effort.

The catalog on the site where changes originate can *not* be used for the decoding because at the time we decode the WAL
[HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode
From: Andres Freund and...@anarazel.de

This patch is problematic because, formally, indexes used by syscaches need to be unique; this one is not, though, because of the 0/InvalidOid relfilenode entries for nailed/shared catalog entries. Those values aren't allowed to be queried though. It might be nicer to add infrastructure to do this properly, I just don't have a clue what the best way for this would be.
---
 src/backend/utils/cache/syscache.c | 11 +++
 src/include/catalog/indexing.h     |  2 ++
 src/include/utils/syscache.h       |  1 +
 3 files changed, 14 insertions(+)

diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index c365ec7..9cfb013 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -588,6 +588,17 @@ static const struct cachedesc cacheinfo[] = {
 		},
 		1024
 	},
+	{RelationRelationId,		/* RELFILENODE */
+		ClassRelfilenodeIndexId,
+		1,
+		{
+			Anum_pg_class_relfilenode,
+			0,
+			0,
+			0
+		},
+		1024
+	},
 	{RewriteRelationId,			/* RULERELNAME */
 		RewriteRelRulenameIndexId,
 		2,

diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 450ec25..5c9419b 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -106,6 +106,8 @@ DECLARE_UNIQUE_INDEX(pg_class_oid_index, 2662, on pg_class using btree(oid oid_ops));
 #define ClassOidIndexId	2662
 DECLARE_UNIQUE_INDEX(pg_class_relname_nsp_index, 2663, on pg_class using btree(relname name_ops, relnamespace oid_ops));
 #define ClassNameNspIndexId	2663
+DECLARE_INDEX(pg_class_relfilenode_index, 2844, on pg_class using btree(relfilenode oid_ops));
+#define ClassRelfilenodeIndexId	2844
 DECLARE_UNIQUE_INDEX(pg_collation_name_enc_nsp_index, 3164, on pg_collation using btree(collname name_ops, collencoding int4_ops, collnamespace oid_ops));
 #define CollationNameEncNspIndexId	3164

diff --git a/src/include/utils/syscache.h b/src/include/utils/syscache.h
index d59dd4e..63a5042 100644
--- a/src/include/utils/syscache.h
+++ b/src/include/utils/syscache.h
@@ -73,6 +73,7 @@ enum SysCacheIdentifier
 	RANGETYPE,
 	RELNAMENSP,
 	RELOID,
+	RELFILENODE,
 	RULERELNAME,
 	STATRELATTINH,
 	TABLESPACEOID,
-- 
1.7.10.rc3.3.g19a6c.dirty
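To illustrate what the cache buys, a hypothetical lookup through it might read as follows (sketch only; per the caveat above, callers must never probe with the InvalidOid relfilenode of nailed/shared relations):

    HeapTuple	tup;

    tup = SearchSysCache1(RELFILENODE, ObjectIdGetDatum(relfilenode));
    if (HeapTupleIsValid(tup))
    {
        Form_pg_class relform = (Form_pg_class) GETSTRUCT(tup);

        elog(DEBUG1, "relfilenode %u belongs to relation \"%s\"",
             relfilenode, NameStr(relform->relname));
        ReleaseSysCache(tup);
    }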
[HACKERS] [PATCH 04/16] Add embedded list interface (header only)
From: Andres Freund and...@anarazel.de

Adds a singly and a doubly linked list which can easily be embedded into other datastructures and can be used without any additional allocations.

Problematic: It requires USE_INLINE to be used. It could be remade to fall back to externally defined functions if that is not available, but that hardly seems sensible at this day and age. Besides, the speed hit would be noticeable and it's only used in new code, which could be disabled on machines - given they still exist - without proper support for inline functions.
---
 src/include/utils/ilist.h | 253 +
 1 file changed, 253 insertions(+)
 create mode 100644 src/include/utils/ilist.h

diff --git a/src/include/utils/ilist.h b/src/include/utils/ilist.h
new file mode 100644
index 000..81d540e
--- /dev/null
+++ b/src/include/utils/ilist.h
@@ -0,0 +1,253 @@
+#ifndef ILIST_H
+#define ILIST_H
+
+#ifdef __GNUC__
+#define unused_attr __attribute__((unused))
+#else
+#define unused_attr
+#endif
+
+#ifndef USE_INLINE
+#error a compiler supporting static inlines is required
+#endif
+
+#include <assert.h>
+
+typedef struct ilist_d_node ilist_d_node;
+
+struct ilist_d_node
+{
+	ilist_d_node* prev;
+	ilist_d_node* next;
+};
+
+typedef struct
+{
+	ilist_d_node head;
+} ilist_d_head;
+
+typedef struct ilist_s_node ilist_s_node;
+
+struct ilist_s_node
+{
+	ilist_s_node* next;
+};
+
+typedef struct
+{
+	ilist_s_node head;
+} ilist_s_head;
+
+#ifdef ILIST_DEBUG
+void ilist_d_check(ilist_d_head* head);
+#else
+static inline void ilist_d_check(ilist_d_head* head)
+{
+}
+#endif
+
+static inline void ilist_d_init(ilist_d_head *head)
+{
+	head->head.next = head->head.prev = &head->head;
+	ilist_d_check(head);
+}
+
+/*
+ * adds a node at the beginning of the list
+ */
+static inline void ilist_d_push_front(ilist_d_head *head, ilist_d_node *node)
+{
+	node->next = head->head.next;
+	node->prev = &head->head;
+	node->next->prev = node;
+	head->head.next = node;
+	ilist_d_check(head);
+}
+
+
+/*
+ * adds a node at the end of the list
+ */
+static inline void ilist_d_push_back(ilist_d_head *head, ilist_d_node *node)
+{
+	node->next = &head->head;
+	node->prev = head->head.prev;
+	node->prev->next = node;
+	head->head.prev = node;
+	ilist_d_check(head);
+}
+
+
+/*
+ * adds a node after another *in the same list*
+ */
+static inline void ilist_d_add_after(unused_attr ilist_d_head *head, ilist_d_node *after, ilist_d_node *node)
+{
+	node->prev = after;
+	node->next = after->next;
+	after->next = node;
+	node->next->prev = node;
+	ilist_d_check(head);
+}
+
+/*
+ * adds a node before another *in the same list*
+ */
+static inline void ilist_d_add_before(unused_attr ilist_d_head *head, ilist_d_node *before, ilist_d_node *node)
+{
+	node->prev = before->prev;
+	node->next = before;
+	before->prev = node;
+	node->prev->next = node;
+	ilist_d_check(head);
+}
+
+
+/*
+ * removes a node from a list
+ */
+static inline void ilist_d_remove(unused_attr ilist_d_head *head, ilist_d_node *node)
+{
+	ilist_d_check(head);
+	node->prev->next = node->next;
+	node->next->prev = node->prev;
+	ilist_d_check(head);
+}
+
+/*
+ * removes the first node from a list or returns NULL
+ */
+static inline ilist_d_node* ilist_d_pop_front(ilist_d_head *head)
+{
+	ilist_d_node* ret;
+
+	if (&head->head == head->head.next)
+		return NULL;
+
+	ret = head->head.next;
+	ilist_d_remove(head, head->head.next);
+	return ret;
+}
+
+
+static inline bool ilist_d_has_next(ilist_d_head *head, ilist_d_node *node)
+{
+	return node->next != &head->head;
+}
+
+static inline bool ilist_d_has_prev(ilist_d_head *head, ilist_d_node *node)
+{
+	return node->prev != &head->head;
+}
+
+static inline bool ilist_d_is_empty(ilist_d_head *head)
+{
+	return head->head.next == head->head.prev;
+}
+
+#define ilist_d_front(type, membername, ptr) (&((ptr)->head) == (ptr)->head.next) ? \
+	NULL : ilist_container(type, membername, (ptr)->head.next)
+
+#define ilist_d_front_unchecked(type, membername, ptr) ilist_container(type, membername, (ptr)->head.next)
+
+#define ilist_d_back(type, membername, ptr) (&((ptr)->head) == (ptr)->head.prev) ? \
+	NULL : ilist_container(type, membername, (ptr)->head.prev)
+
+#define ilist_container(type, membername, ptr) ((type*)((char*)(ptr) - offsetof(type, membername)))
+
+#define ilist_d_foreach(name, ptr) for(name = (ptr)->head.next;	\
+	name != &(ptr)->head;											\
+	name = name->next)
+
+#define ilist_d_foreach_modify(name, nxt, ptr) for(name = (ptr)->head.next,	\
+	nxt = name->next;															\
+	name != &(ptr)->head \
[HACKERS] [PATCH 02/16] Add zeroRecPtr as a shortcut for initializing a local variable to {0, 0}
From: Andres Freund and...@anarazel.de

This is locally defined in lots of places and would get introduced frequently in the next commits. It is expected that this can be defined in a header-only manner as soon as the XLogInsert scalability groundwork from Heikki gets in.
---
 src/backend/access/transam/xlogutils.c | 1 +
 src/include/access/xlogdefs.h          | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 6ddcc59..3a2462b 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -51,6 +51,7 @@ typedef struct xl_invalid_page
 
 static HTAB *invalid_page_tab = NULL;
 
+XLogRecPtr zeroRecPtr = {0, 0};
 
 /* Report a reference to an invalid page */
 static void

diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index 5e6d7e6..2768427 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -35,6 +35,7 @@ typedef struct XLogRecPtr
 	uint32		xrecoff;		/* byte offset of location in log file */
 } XLogRecPtr;
 
+extern XLogRecPtr zeroRecPtr;
 
 #define XLogRecPtrIsInvalid(r)	((r).xrecoff == 0)
-- 
1.7.10.rc3.3.g19a6c.dirty
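The intended use is straightforward: a call site compares against the shared constant instead of declaring its own {0, 0} literal. A hypothetical helper, purely for illustration:

    /* Hypothetical helper, sketching the intended usage of zeroRecPtr. */
    static bool
    origin_lsn_is_unset(XLogRecPtr origin_lsn)
    {
        return XLByteEQ(origin_lsn, zeroRecPtr);
    }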
[HACKERS] [PATCH 08/16] Introduce the ApplyCache module which can reassemble transactions from a stream of interspersed changes
From: Andres Freund and...@anarazel.de

The individual changes need to be identified by an xid. The xid can be a subtransaction or a toplevel one; at commit those can be reintegrated by doing a k-way mergesort between the individual transactions. Callbacks for apply_begin, apply_change and apply_commit are provided to retrieve complete transactions.

Missing:
- spill-to-disk
- correct subtransaction merge, current behaviour is simple/wrong
- DDL handling (?)
- resource usage controls
---
 src/backend/replication/Makefile             |   2 +
 src/backend/replication/logical/Makefile     |  19 ++
 src/backend/replication/logical/applycache.c | 380 ++
 src/include/replication/applycache.h         | 185 +
 4 files changed, 586 insertions(+)
 create mode 100644 src/backend/replication/logical/Makefile
 create mode 100644 src/backend/replication/logical/applycache.c
 create mode 100644 src/include/replication/applycache.h

diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 9d9ec87..ae7f6b1 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -17,6 +17,8 @@ override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
 OBJS = walsender.o walreceiverfuncs.o walreceiver.o basebackup.o \
 	repl_gram.o syncrep.o
 
+SUBDIRS = logical
+
 include $(top_srcdir)/src/backend/common.mk
 
 # repl_scanner is compiled as part of repl_gram

diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
new file mode 100644
index 000..2eadab8
--- /dev/null
+++ b/src/backend/replication/logical/Makefile
@@ -0,0 +1,19 @@
+#-
+#
+# Makefile--
+#    Makefile for src/backend/replication/logical
+#
+# IDENTIFICATION
+#    src/backend/replication/logical/Makefile
+#
+#-
+
+subdir = src/backend/replication/logical
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
+
+OBJS = applycache.o
+
+include $(top_srcdir)/src/backend/common.mk

diff --git a/src/backend/replication/logical/applycache.c b/src/backend/replication/logical/applycache.c
new file mode 100644
index 000..b73b0ba
--- /dev/null
+++ b/src/backend/replication/logical/applycache.c
@@ -0,0 +1,380 @@
+/*-
+ *
+ * applycache.c
+ *
+ * PostgreSQL logical replay cache management
+ *
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/replication/applycache.c
+ *
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_control.h"
+#include "replication/applycache.h"
+
+#include "utils/ilist.h"
+#include "utils/memutils.h"
+#include "utils/relcache.h"
+#include "utils/syscache.h"
+
+const Size max_memtries = 1<<16;
+
+const size_t max_cached_changes = 1024;
+const size_t max_cached_tuplebufs = 1024;		/* ~8MB */
+const size_t max_cached_transactions = 512;
+
+typedef struct ApplyCacheTXNByIdEnt
+{
+	TransactionId xid;
+	ApplyCacheTXN* txn;
+} ApplyCacheTXNByIdEnt;
+
+static ApplyCacheTXN* ApplyCacheGetTXN(ApplyCache *cache);
+static void ApplyCacheReturnTXN(ApplyCache *cache, ApplyCacheTXN* txn);
+
+static ApplyCacheTXN* ApplyCacheTXNByXid(ApplyCache*, TransactionId xid, bool create);
+
+
+ApplyCache*
+ApplyCacheAllocate(void)
+{
+	ApplyCache* cache = (ApplyCache*)malloc(sizeof(ApplyCache));
+	HASHCTL hash_ctl;
+
+	if (!cache)
+		elog(ERROR, "Could not allocate the ApplyCache");
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+
+	cache->context = AllocSetContextCreate(TopMemoryContext,
+										   "ApplyCache",
+										   ALLOCSET_DEFAULT_MINSIZE,
+										   ALLOCSET_DEFAULT_INITSIZE,
+										   ALLOCSET_DEFAULT_MAXSIZE);
+
+	hash_ctl.keysize = sizeof(TransactionId);
+	hash_ctl.entrysize = sizeof(ApplyCacheTXNByIdEnt);
+	hash_ctl.hash = tag_hash;
+	hash_ctl.hcxt = cache->context;
+
+	cache->by_txn = hash_create("ApplyCacheByXid", 1000, &hash_ctl,
+								HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+
+	cache->nr_cached_transactions = 0;
+	cache->nr_cached_changes = 0;
+	cache->nr_cached_tuplebufs = 0;
+
+	ilist_d_init(&cache->cached_transactions);
+	ilist_d_init(&cache->cached_changes);
+	ilist_s_init(&cache->cached_tuplebufs);
+
+	return cache;
+}
+
+void ApplyCacheFree(ApplyCache* cache)
+{
+	/* FIXME: check for in-progress transactions */
+	/* FIXME: clean up cached transaction */
+	/* FIXME: clean up cached changes */
+	/* FIXME: clean
[HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
From: Andres Freund and...@anarazel.de

The previous coding could miss xlog writeouts at several places, e.g. when wal was written out by the background writer, or even after a commit if synchronous_commit=off. This could lead to delays in sending data to the standby of up to 7 seconds.

To fix this, move the responsibility for notification to the layer where the necessary information is actually present. We take some care not to do the notification while we hold contended locks like WALInsertLock or WALWriteLock.

Document the preexisting fact that we rely on SetLatch to be safe from within signal handlers and critical sections.

This removes the temporary bandaid from 2c8a4e9be2730342cbca85150a2a9d876aa77ff6
---
 src/backend/access/transam/twophase.c | 21 -------------
 src/backend/access/transam/xact.c     |  7 --
 src/backend/access/transam/xlog.c     | 24 +++++++++++++++---
 src/backend/port/unix_latch.c         |  3 +++
 src/backend/port/win32_latch.c        |  4 ++++
 src/backend/replication/walsender.c   | 41 +++++++++++++++++++++++-
 src/include/replication/walsender.h   |  2 ++
 7 files changed, 66 insertions(+), 36 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index b94fae3..bdb7bcd 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1042,13 +1042,6 @@ EndPrepare(GlobalTransaction gxact)
 
 	/* If we crash now, we have prepared: WAL replay will fix things */
 
-	/*
-	 * Wake up all walsenders to send WAL up to the PREPARE record immediately
-	 * if replication is enabled
-	 */
-	if (max_wal_senders > 0)
-		WalSndWakeup();
-
 	/* write correct CRC and close file */
 	if ((write(fd, &statefile_crc, sizeof(pg_crc32))) != sizeof(pg_crc32))
 	{
@@ -2045,13 +2038,6 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	/* Flush XLOG to disk */
 	XLogFlush(recptr);
 
-	/*
-	 * Wake up all walsenders to send WAL up to the COMMIT PREPARED record
-	 * immediately if replication is enabled
-	 */
-	if (max_wal_senders > 0)
-		WalSndWakeup();
-
 	/* Mark the transaction committed in pg_clog */
 	TransactionIdCommitTree(xid, nchildren, children);
 
@@ -2133,13 +2119,6 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	XLogFlush(recptr);
 
 	/*
-	 * Wake up all walsenders to send WAL up to the ABORT PREPARED record
-	 * immediately if replication is enabled
-	 */
-	if (max_wal_senders > 0)
-		WalSndWakeup();
-
-	/*
 	 * Mark the transaction aborted in clog. This is not absolutely necessary
 	 * but we may as well do it while we are here.
 	 */

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8f00186..3cc2bfa 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1141,13 +1141,6 @@ RecordTransactionCommit(void)
 		XLogFlush(XactLastRecEnd);
 
 		/*
-		 * Wake up all walsenders to send WAL up to the COMMIT record
-		 * immediately if replication is enabled
-		 */
-		if (max_wal_senders > 0)
-			WalSndWakeup();
-
-		/*
 		 * Now we may update the CLOG, if we wrote a COMMIT record above
 		 */
 		if (markXidCommitted)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bcb71c4..166efb0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1017,6 +1017,8 @@ begin:;
 
 	END_CRIT_SECTION();
 
+	/* wakeup the WalSnd now that we released the WALWriteLock */
+	WalSndWakeupProcess();
 
 	return RecPtr;
 }
@@ -1218,6 +1220,9 @@ begin:;
 
 	END_CRIT_SECTION();
 
+	/* wakeup the WalSnd now that we're outside contended locks */
+	WalSndWakeupProcess();
+
 	return RecPtr;
 }
@@ -1822,6 +1827,10 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
 			if (finishing_seg || (xlog_switch && last_iteration))
 			{
 				issue_xlog_fsync(openLogFile, openLogId, openLogSeg);
+
+				/* signal that we need to wakeup WalSnd later */
+				WalSndWakeupRequest();
+
 				LogwrtResult.Flush = LogwrtResult.Write;	/* end of page */
 
 				if (XLogArchivingActive())
@@ -1886,6 +1895,9 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
 				openLogOff = 0;
 			}
 			issue_xlog_fsync(openLogFile, openLogId, openLogSeg);
+
[HACKERS] [PATCH 07/16] Log enough data into the wal to reconstruct logical changes from it if wal_level=logical
From: Andres Freund and...@anarazel.de

This adds a new wal_level value 'logical'.

Missing cases:
- heap_multi_insert
- primary key changes for updates
- no primary key
- LOG_NEWPAGE
---
 src/backend/access/heap/heapam.c        | 135 ++++++++++++++++++++++---
 src/backend/access/transam/xlog.c       |   1 +
 src/backend/catalog/index.c             |  74 +++++++++++++
 src/bin/pg_controldata/pg_controldata.c |   2 +
 src/include/access/xlog.h               |   3 +-
 src/include/catalog/index.h             |   4 +
 6 files changed, 207 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9519e73..9149d53 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -52,6 +52,7 @@
 #include "access/xact.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
+#include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1937,10 +1938,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		XLogRecData rdata[3];
+		XLogRecData rdata[4];
 		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 
+		/*
+		 * For the logical replication case we need the tuple even if we're
+		 * doing a full page write. We could alternatively store a pointer into
+		 * the fpw though.
+		 * For that to work we add another rdata entry for the buffer in that
+		 * case.
+		 */
+		bool		need_tuple = wal_level == WAL_LEVEL_LOGICAL;
+
 		xlrec.all_visible_cleared = all_visible_cleared;
 		xlrec.target.node = relation->rd_node;
 		xlrec.target.tid = heaptup->t_self;
@@ -1960,18 +1970,32 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		 */
 		rdata[1].data = (char *) &xlhdr;
 		rdata[1].len = SizeOfHeapHeader;
-		rdata[1].buffer = buffer;
+		rdata[1].buffer = need_tuple ? InvalidBuffer : buffer;
 		rdata[1].buffer_std = true;
 		rdata[1].next = &(rdata[2]);
 
 		/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
 		rdata[2].data = (char *) heaptup->t_data + offsetof(HeapTupleHeaderData, t_bits);
 		rdata[2].len = heaptup->t_len - offsetof(HeapTupleHeaderData, t_bits);
-		rdata[2].buffer = buffer;
+		rdata[2].buffer = need_tuple ? InvalidBuffer : buffer;
 		rdata[2].buffer_std = true;
 		rdata[2].next = NULL;
 
+		/*
+		 * add record for the buffer without actual content that's removed if
+		 * a fpw is done for that buffer
+		 */
+		if(need_tuple){
+			rdata[2].next = &(rdata[3]);
+
+			rdata[3].data = NULL;
+			rdata[3].len = 0;
+			rdata[3].buffer = buffer;
+			rdata[3].buffer_std = true;
+			rdata[3].next = NULL;
+		}
+
+		/*
 		 * If this is the single and first tuple on page, we can reinit the
 		 * page instead of restoring the whole thing.  Set flag, and hide
 		 * buffer references from XLogInsert.
@@ -1980,7 +2004,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 			PageGetMaxOffsetNumber(page) == FirstOffsetNumber)
 		{
 			info |= XLOG_HEAP_INIT_PAGE;
-			rdata[1].buffer = rdata[2].buffer = InvalidBuffer;
+			rdata[1].buffer = rdata[2].buffer = rdata[3].buffer = InvalidBuffer;
 		}
 
 		recptr = XLogInsert(RM_HEAP_ID, info, rdata);
@@ -2568,7 +2592,9 @@ l1:
 	{
 		xl_heap_delete xlrec;
 		XLogRecPtr	recptr;
-		XLogRecData rdata[2];
+		XLogRecData rdata[4];
+
+		bool		need_tuple = wal_level == WAL_LEVEL_LOGICAL && relation->rd_id >= FirstNormalObjectId;
 
 		xlrec.all_visible_cleared = all_visible_cleared;
 		xlrec.target.node = relation->rd_node;
@@ -2584,6 +2610,73 @@ l1:
 		rdata[1].buffer_std = true;
 		rdata[1].next = NULL;
 
+		/*
+		 * XXX: We could decide not to log changes when the origin is not the
+		 * local node, that should reduce redundant logging.
+		 */
+		if(need_tuple){
+			xl_heap_header xlhdr;
+
+			Oid indexoid = InvalidOid;
+			int16 pknratts;
+
[HACKERS] [PATCH 12/16] Add state to keep track of logical replication
From: Andres Freund and...@anarazel.de

In order to have restartable replication with minimal additional writes it's very useful to know up to which point we have replayed/received changes from a foreign node. One representation of that is the lsn of changes at the originating cluster.

We need to keep track of the point up to which we received data and up to where we applied data. For that we added a field 'origin_lsn' to commit records. This allows us to keep track of the apply position across crash recovery with minimal additional io. We only added the field to non-compact commit records to reduce the overhead in case logical replication is not used.

Checkpoints need to keep track of the apply/receive positions as well, because otherwise it would be hard to determine the lsn from which to restart receive/apply after a shutdown/crash if no changes happened since the last shutdown/crash.

While running, the startup process, the walreceiver and a (future) apply process will need a coherent picture of those two states, so add shared memory state to keep track of it. Currently this is represented in the walreceiver's shared memory segment. This will likely need to change.

During crash recovery/physical replication the origin_lsn field of commit records is used to update the shared memory, and thus the next checkpoint's, notion of the apply state.

Missing:
- For correct crash recovery we need more state than the 'apply lsn', because transactions on the originating side can overlap. At the lsn we just applied, many other transactions can be in progress. To correctly handle that we need to keep track of the oldest start lsn of any transaction currently being reassembled (c.f. ApplyCache). Then we can start to reassemble the ApplyCache from that point on and throw away any transaction which committed before the recorded/recovered apply lsn. It should be sufficient to store that knowledge in shared memory and checkpoint records.
---
 src/backend/access/transam/xact.c          | 22 ++++++++-
 src/backend/access/transam/xlog.c          | 73 ++++++++++++++++++++++
 src/backend/replication/walreceiverfuncs.c |  8 +++
 src/include/access/xact.h                  |  1 +
 src/include/catalog/pg_control.h           | 13 ++++-
 src/include/replication/walreceiver.h      | 13 +++++
 6 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index dc30a17..40ac965 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -39,11 +39,13 @@
 #include "replication/logical.h"
 #include "replication/syncrep.h"
 #include "replication/walsender.h"
+#include "replication/walreceiver.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
 #include "storage/smgr.h"
+#include "storage/spin.h"
 #include "utils/combocid.h"
 #include "utils/guc.h"
 #include "utils/inval.h"
@@ -1015,7 +1017,8 @@ RecordTransactionCommit(void)
 		/*
 		 * Do we need the long commit record? If not, use the compact format.
 		 */
-		if (nrels > 0 || nmsgs > 0 || RelcacheInitFileInval || forceSyncCommit)
+		if (nrels > 0 || nmsgs > 0 || RelcacheInitFileInval || forceSyncCommit ||
+			(wal_level == WAL_LEVEL_LOGICAL && current_replication_origin_id != guc_replication_origin_id))
 		{
 			XLogRecData rdata[4];
 			int			lastrdata = 0;
@@ -1037,6 +1040,8 @@ RecordTransactionCommit(void)
 			xlrec.nrels = nrels;
 			xlrec.nsubxacts = nchildren;
 			xlrec.nmsgs = nmsgs;
+			xlrec.origin_lsn = current_replication_origin_lsn;
+
 			rdata[0].data = (char *) (&xlrec);
 			rdata[0].len = MinSizeOfXactCommit;
 			rdata[0].buffer = InvalidBuffer;
@@ -4575,6 +4580,21 @@ xact_redo_commit_internal(TransactionId xid, RepNodeId originating_node,
 		LWLockRelease(XidGenLock);
 	}
 
+	/*
+	 * record where we're at wrt recovery. We need that to know from where on
+	 * to restart applying logical change records
+	 */
+	if (LogicalWalReceiverActive() && !XLByteEQ(origin_lsn, zeroRecPtr))
+	{
+		/*
+		 * probably we don't need the locking because no lcr receiver can run
+		 * yet.
+		 */
+		SpinLockAcquire(&WalRcv->mutex);
+		WalRcv->mm_applyState[originating_node] = origin_lsn;
+		SpinLockRelease(&WalRcv->mutex);
+	}
+
 	if (standbyState == STANDBY_DISABLED)
 	{
 		/*

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0622726..20a4611 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
[HACKERS] [PATCH 09/16] Decode wal (with wal_level=logical) into changes in an ApplyCache instance
From: Andres Freund and...@anarazel.de

This requires an up-to-date catalog and can thus only be run on a replica.

Missing:
- HEAP_NEWPAGE support
- HEAP2_MULTI_INSERT support
- DDL integration. *No* DDL, including TRUNCATE, is possible atm
---
 src/backend/replication/logical/Makefile |   2 +-
 src/backend/replication/logical/decode.c | 439 ++++++++++++++++++
 src/include/replication/decode.h         |  23 ++
 3 files changed, 463 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/replication/logical/decode.c
 create mode 100644 src/include/replication/decode.h

diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index 2eadab8..7dd9663 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -14,6 +14,6 @@ include $(top_builddir)/src/Makefile.global
 
 override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)
 
-OBJS = applycache.o
+OBJS = applycache.o decode.o
 
 include $(top_srcdir)/src/backend/common.mk

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
new file mode 100644
index 000..7e07d50
--- /dev/null
+++ b/src/backend/replication/logical/decode.c
@@ -0,0 +1,439 @@
+/*-
+ *
+ * decode.c
+ *
+ * Decodes wal records from an xlogreader.h callback into an applycache
+ *
+ *
+ * Portions Copyright (c) 2010-2012, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/replication/logical/decode.c
+ *
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/transam.h"
+#include "access/xlog_internal.h"
+#include "access/xact.h"
+
+#include "replication/applycache.h"
+#include "replication/decode.h"
+
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+#include "utils/lsyscache.h"
+
+static void DecodeXLogTuple(char* data, Size len,
+							HeapTuple table, ApplyCacheTupleBuf* tuple);
+
+static void DecodeInsert(ApplyCache *cache, XLogRecordBuffer* buf);
+
+static void DecodeUpdate(ApplyCache *cache, XLogRecordBuffer* buf);
+
+static void DecodeDelete(ApplyCache *cache, XLogRecordBuffer* buf);
+
+static void DecodeNewpage(ApplyCache *cache, XLogRecordBuffer* buf);
+static void DecodeMultiInsert(ApplyCache *cache, XLogRecordBuffer* buf);
+
+static void DecodeCommit(ApplyCache* cache, XLogRecordBuffer* buf, TransactionId xid,
+						 TransactionId *sub_xids, int nsubxacts);
+
+
+void DecodeRecordIntoApplyCache(ApplyCache *cache, XLogRecordBuffer* buf)
+{
+	XLogRecord* r = buf->record;
+	uint8 info = r->xl_info & ~XLR_INFO_MASK;
+
+	switch (r->xl_rmid)
+	{
+		case RM_HEAP_ID:
+		{
+			info &= XLOG_HEAP_OPMASK;
+			switch (info)
+			{
+				case XLOG_HEAP_INSERT:
+					DecodeInsert(cache, buf);
+					break;
+
+				/* no guarantee that we get a HOT update again, so handle it as a normal update */
+				case XLOG_HEAP_HOT_UPDATE:
+				case XLOG_HEAP_UPDATE:
+					DecodeUpdate(cache, buf);
+					break;
+
+				case XLOG_HEAP_NEWPAGE:
+					DecodeNewpage(cache, buf);
+					break;
+
+				case XLOG_HEAP_DELETE:
+					DecodeDelete(cache, buf);
+					break;
+				default:
+					break;
+			}
+			break;
+		}
+		case RM_HEAP2_ID:
+		{
+			info &= XLOG_HEAP_OPMASK;
+			switch (info)
+			{
+				case XLOG_HEAP2_MULTI_INSERT:
+					/* this also handles the XLOG_HEAP_INIT_PAGE case */
+					DecodeMultiInsert(cache, buf);
+					break;
+				default:
+					/* everything else here is just physical stuff we're not interested in */
+					break;
+			}
+			break;
+		}
+
+		case RM_XACT_ID:
+		{
+			switch (info)
+			{
+				case XLOG_XACT_COMMIT:
+				{
+					TransactionId *sub_xids;
+					xl_xact_commit *xlrec = (xl_xact_commit*)buf->record_data;
+
[HACKERS] [PATCH 11/16] Add infrastructure for manipulating multiple streams of wal on a segment handling level
From: Andres Freund and...@anarazel.de

For that, add a 'node_id' parameter to most commands dealing with wal segments. A node_id that's 'InvalidMultimasterNodeId' references local wal; every other node_id refers to wal in a new pg_lcr directory.

Using duplicated code would reduce the impact of that change, but the long-term code-maintenance burden outweighs that by a fair bit. Besides the decision to add a 'node_id' parameter to several functions, the changes in this patch are fairly mechanical.
---
 src/backend/access/transam/xlog.c           | 54 +++++++++++++---------
 src/backend/replication/basebackup.c        |  4 +-
 src/backend/replication/walreceiver.c       |  2 +-
 src/backend/replication/walsender.c         |  9 +++--
 src/bin/initdb/initdb.c                     |  1 +
 src/bin/pg_resetxlog/pg_resetxlog.c         |  2 +-
 src/include/access/xlog.h                   |  2 +-
 src/include/access/xlog_internal.h          | 13 +++---
 src/include/replication/logical.h           |  2 +
 src/include/replication/walsender_private.h |  2 +-
 10 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 504b4d0..0622726 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -635,8 +635,8 @@ static bool XLogCheckBuffer(XLogRecData *rdata, bool doPageWrites,
 static bool AdvanceXLInsertBuffer(bool new_segment);
 static bool XLogCheckpointNeeded(uint32 logid, uint32 logseg);
 static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch);
-static bool InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
-					   bool find_free, int *max_advance,
+static bool InstallXLogFileSegment(RepNodeId node_id, uint32 *log, uint32 *seg,
+					   char *tmppath, bool find_free, int *max_advance,
 					   bool use_lock);
 static int	XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
 			 int source, bool notexistOk);
@@ -1736,8 +1736,8 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
 
 				/* create/use new log file */
 				use_existent = true;
-				openLogFile = XLogFileInit(openLogId, openLogSeg,
-										   &use_existent, true);
+				openLogFile = XLogFileInit(InvalidMultimasterNodeId, openLogId,
+										   openLogSeg, &use_existent, true);
 				openLogOff = 0;
 			}
 
@@ -2376,6 +2376,9 @@ XLogNeedsFlush(XLogRecPtr record)
  * place.  This should be TRUE except during bootstrap log creation.  The
  * caller must *not* hold the lock at call.
  *
+ * node_id: if != InvalidMultimasterNodeId this xlog file is actually a LCR
+ * file
+ *
  * Returns FD of opened file.
  *
  * Note: errors here are ERROR not PANIC because we might or might not be
@@ -2384,8 +2387,8 @@ XLogNeedsFlush(XLogRecPtr record)
  * in a critical section.
 */
 int
-XLogFileInit(uint32 log, uint32 seg,
-			 bool *use_existent, bool use_lock)
+XLogFileInit(RepNodeId node_id, uint32 log, uint32 seg,
+			 bool *use_existent, bool use_lock)
 {
 	char		path[MAXPGPATH];
 	char		tmppath[MAXPGPATH];
@@ -2396,7 +2399,7 @@ XLogFileInit(uint32 log, uint32 seg,
 	int			fd;
 	int			nbytes;
 
-	XLogFilePath(path, ThisTimeLineID, log, seg);
+	XLogFilePath(path, ThisTimeLineID, node_id, log, seg);
 
 	/*
 	 * Try to use existent file (checkpoint maker may have created it already)
@@ -2425,6 +2428,11 @@ XLogFileInit(uint32 log, uint32 seg,
 	 */
 	elog(DEBUG2, "creating and filling new WAL file");
 
+	/*
+	 * FIXME: to be safe we need to create the tempfile in the pg_lcr directory
+	 * if it's actually an lcr file because pg_lcr might be in a different
+	 * partition.
+	 */
 	snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());
 
 	unlink(tmppath);
@@ -2493,7 +2501,7 @@ XLogFileInit(uint32 log, uint32 seg,
 	installed_log = log;
 	installed_seg = seg;
 	max_advance = XLOGfileslop;
-	if (!InstallXLogFileSegment(&installed_log, &installed_seg, tmppath,
+	if (!InstallXLogFileSegment(node_id, &installed_log, &installed_seg, tmppath,
 								*use_existent, &max_advance,
 								use_lock))
 	{
@@ -2548,7 +2556,7 @@ XLogFileCopy(uint32 log, uint32 seg,
 	/*
 	 * Open the source file
 	 */
-	XLogFilePath(path, srcTLI, srclog, srcseg);
+	XLogFilePath(path, srcTLI,
[HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node
From: Andres Freund and...@anarazel.de

One solution to avoid loops when doing wal based logical replication in topologies which are more complex than one unidirectional transport is introducing the concept of an 'origin_id' into the wal stream. Luckily there is some padding in the XLogRecord struct that allows us to add that field without further bloating the struct. This solution was chosen because it allows for just about any topology and is unobtrusive.

This adds a new configuration parameter multimaster_node_id which determines the id used for wal originating in one cluster. When applying changes from wal from another cluster, code can set the variable current_replication_origin_id. This is a global variable because passing it through everything which can generate wal would be far too intrusive.
---
 src/backend/access/transam/xact.c             | 48 +++++++++++++-----
 src/backend/access/transam/xlog.c             |  3 +-
 src/backend/access/transam/xlogreader.c       |  2 ++
 src/backend/replication/logical/Makefile      |  2 +-
 src/backend/replication/logical/logical.c     | 19 ++++++
 src/backend/utils/misc/guc.c                  | 19 ++++++
 src/backend/utils/misc/postgresql.conf.sample |  3 ++
 src/include/access/xlog.h                     |  4 +--
 src/include/access/xlogdefs.h                 |  2 ++
 src/include/replication/logical.h             | 22 ++++++++
 10 files changed, 110 insertions(+), 14 deletions(-)
 create mode 100644 src/backend/replication/logical/logical.c
 create mode 100644 src/include/replication/logical.h

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 3cc2bfa..dc30a17 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -36,8 +36,9 @@
 #include "libpq/be-fsstubs.h"
 #include "miscadmin.h"
 #include "pgstat.h"
-#include "replication/walsender.h"
+#include "replication/logical.h"
 #include "replication/syncrep.h"
+#include "replication/walsender.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "storage/procarray.h"
@@ -4545,12 +4546,13 @@ xactGetCommittedChildren(TransactionId **ptr)
  * actions for which the order of execution is critical.
 */
 static void
-xact_redo_commit_internal(TransactionId xid, XLogRecPtr lsn,
-						  TransactionId *sub_xids, int nsubxacts,
-						  SharedInvalidationMessage *inval_msgs, int nmsgs,
-						  RelFileNode *xnodes, int nrels,
-						  Oid dbId, Oid tsId,
-						  uint32 xinfo)
+xact_redo_commit_internal(TransactionId xid, RepNodeId originating_node,
+						  XLogRecPtr lsn, XLogRecPtr origin_lsn,
+						  TransactionId *sub_xids, int nsubxacts,
+						  SharedInvalidationMessage *inval_msgs, int nmsgs,
+						  RelFileNode *xnodes, int nrels,
+						  Oid dbId, Oid tsId,
+						  uint32 xinfo)
 {
 	TransactionId max_xid;
 	int			i;
@@ -4659,8 +4661,13 @@ xact_redo_commit_internal(TransactionId xid, XLogRecPtr lsn,
 * Utility function to call xact_redo_commit_internal after breaking down xlrec
 */
 static void
+<<<<<<< HEAD
 xact_redo_commit(xl_xact_commit *xlrec,
 				 TransactionId xid, XLogRecPtr lsn)
+=======
+xact_redo_commit(xl_xact_commit *xlrec, RepNodeId originating_node,
+				 TransactionId xid, XLogRecPtr lsn)
+>>>>>>> Introduce the concept that wal has a 'origin' node
 {
 	TransactionId *subxacts;
 	SharedInvalidationMessage *inval_msgs;
@@ -4670,18 +4677,26 @@ xact_redo_commit(xl_xact_commit *xlrec,
 	/* invalidation messages array follows subxids */
 	inval_msgs = (SharedInvalidationMessage *) &(subxacts[xlrec->nsubxacts]);
+<<<<<<< HEAD
 	xact_redo_commit_internal(xid, lsn, subxacts, xlrec->nsubxacts,
 							  inval_msgs, xlrec->nmsgs,
 							  xlrec->xnodes, xlrec->nrels,
 							  xlrec->dbId, xlrec->tsId,
 							  xlrec->xinfo);
+=======
+	xact_redo_commit_internal(xid, originating_node, lsn, xlrec->origin_lsn,
+							  subxacts, xlrec->nsubxacts, inval_msgs,
+							  xlrec->nmsgs, xlrec->xnodes, xlrec->nrels,
+							  xlrec->dbId, xlrec->tsId, xlrec->xinfo);
+>>>>>>> Introduce the concept that wal has a 'origin' node
 }
 
 /*
 * Utility function to call xact_redo_commit_internal for compact form of message.
 */
 static void
+<<<<<<< HEAD
[HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader
From: Andres Freund and...@anarazel.de

Features:
- streaming reading/writing
- filtering
- reassembly of records

Reusing the ReadRecord infrastructure in situations where the code that wants to do so is not tightly integrated into xlog.c is rather hard and would require changes to rather integral parts of the recovery code, which doesn't seem to be a good idea.

Missing:
- compressing the stream when removing uninteresting records
- writing out correct CRCs
- validating CRCs
- separating reader/writer

---
 src/backend/access/transam/Makefile     |   2 +-
 src/backend/access/transam/xlogreader.c | 914 +++
 src/include/access/xlogreader.h         | 173 ++
 3 files changed, 1088 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/access/transam/xlogreader.c
 create mode 100644 src/include/access/xlogreader.h

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index f82f10e..660b5fc 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global

 OBJS = clog.o transam.o varsup.o xact.o rmgr.o slru.o subtrans.o multixact.o \
-	twophase.o twophase_rmgr.o xlog.o xlogfuncs.o xlogutils.o
+	twophase.o twophase_rmgr.o xlog.o xlogfuncs.o xlogreader.o xlogutils.o

 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
new file mode 100644
index 000..6f15d66
--- /dev/null
+++ b/src/backend/access/transam/xlogreader.c
@@ -0,0 +1,914 @@
+/*-
+ *
+ * xlogreader.c
+ *
+ * A somewhat generic xlog read interface
+ *
+ * Portions Copyright (c) 2010-2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	src/backend/access/transam/xlogreader.c
+ *
+ *-
+ *
+ * FIXME:
+ * * CRC computation
+ * * separation of reader/writer
+ */
+
+#include "postgres.h"
+
+#include "access/xlog_internal.h"
+#include "access/transam.h"
+#include "catalog/pg_control.h"
+#include "access/xlogreader.h"
+
+/* FIXME */
+#include "replication/walsender_private.h"
+#include "replication/walprotocol.h"
+
+//#define VERBOSE_DEBUG
+
+XLogReaderState*
+XLogReaderAllocate(void)
+{
+	XLogReaderState* state = (XLogReaderState*) malloc(sizeof(XLogReaderState));
+	int i;
+
+	if (!state)
+		goto oom;
+
+	memset(&state->buf.record, 0, sizeof(XLogRecord));
+	state->buf.record_data_size = XLOG_BLCKSZ * 8;
+	state->buf.record_data = malloc(state->buf.record_data_size);
+
+	if (!state->buf.record_data)
+		goto oom;
+
+	for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++)
+	{
+		state->buf.bkp_block_data[i] = malloc(BLCKSZ);
+
+		if (!state->buf.bkp_block_data[i])
+			goto oom;
+	}
+	XLogReaderReset(state);
+	return state;
+
+oom:
+	elog(ERROR, "could not allocate memory for XLogReaderState");
+	return NULL;
+}
+
+void
+XLogReaderReset(XLogReaderState* state)
+{
+	state->in_record = false;
+	state->in_bkp_blocks = 0;
+	state->in_bkp_block_header = false;
+	state->in_skip = false;
+	state->remaining_size = 0;
+	state->nbytes = 0;
+	state->incomplete = false;
+	state->initialized = false;
+	state->needs_input = false;
+	state->needs_output = false;
+}
+
+static inline bool
+XLogReaderHasInput(XLogReaderState* state, Size size)
+{
+	XLogRecPtr tmp = state->curptr;
+	XLByteAdvance(tmp, size);
+	if (XLByteLE(state->endptr, tmp))
+		return false;
+	return true;
+}
+
+static inline bool
+XLogReaderHasOutput(XLogReaderState* state, Size size)
+{
+	if (state->nbytes + size > MAX_SEND_SIZE)
+		return false;
+	return true;
+}
+
+static inline bool
+XLogReaderHasSpace(XLogReaderState* state, Size size)
+{
+	XLogRecPtr tmp = state->curptr;
+	XLByteAdvance(tmp, size);
+	if (XLByteLE(state->endptr, tmp))
+		return false;
+	else if (state->nbytes + size > MAX_SEND_SIZE)
+		return false;
+	return true;
+}
+
+void
+XLogReaderRead(XLogReaderState* state)
+{
+	XLogRecord* temp_record;
+
+	state->needs_input = false;
+	state->needs_output = false;
+
+	/*
+	 * Do some basic sanity checking and setup if we're starting anew.
+	 */
+	if (!state->initialized)
+	{
+		state->initialized = true;
+		/*
+		 * we need to start reading at the beginning of the page to understand
+		 * what we are currently reading. We will skip over that because we
+		 * check curptr < startptr
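Pieced together from this patch and the apply-side code in patch 15, driving the reader looks roughly like the following sketch; the callback field names on XLogReaderState are assumptions inferred from the replay_* functions, not verbatim from the header:

    XLogReaderState *state = XLogReaderAllocate();

    /* filtering, output and record reassembly are all driven by callbacks */
    state->is_record_interesting = replay_record_is_interesting; /* assumed name */
    state->writeout_data = replay_writeout_data;                 /* assumed name */
    state->finished_record = replay_finished_record;             /* assumed name */
    state->read_page = replay_read_page;                         /* assumed name */
    state->private_data = apply_state;

    state->startptr = from;   /* first LSN of interest */
    state->endptr = to;       /* read up to here */
    XLogReaderRead(state);    /* sets needs_input/needs_output when starved */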
[HACKERS] [PATCH 16/16] current version of the design document
From: Andres Freund and...@anarazel.de

---
 src/backend/replication/logical/DESIGN | 209
 1 file changed, 209 insertions(+)
 create mode 100644 src/backend/replication/logical/DESIGN

diff --git a/src/backend/replication/logical/DESIGN b/src/backend/replication/logical/DESIGN
new file mode 100644
index 000..2cf08ff
--- /dev/null
+++ b/src/backend/replication/logical/DESIGN
@@ -0,0 +1,209 @@
=== Design goals for logical replication ===:
- in core
- fast
- async
- robust
- multi-master
- modular
- as unintrusive as possible implementation wise
- basis for other technologies (sharding, replication into other DBMSs, ...)

For reasons why we think this is an important set of features please check out the presentation from the in-core replication summit at pgcon:
http://wiki.postgresql.org/wiki/File:BDR_Presentation_PGCon2012.pdf

While you may argue that most of the above design goals are already provided by various trigger based replication solutions like Londiste or Slony, we think that that's not enough for various reasons:

- not in core (and thus less trustworthy)
- duplication of writes due to an additional log
- performance in general (check the end of the above presentation)
- complex to use because there is no native administration interface

We want to emphasize that this proposed architecture is based on the experience of developing a minimal prototype which we developed with the above goals in mind. While we obviously hope that a good part of it is reusable for the community, we definitely do *not* expect that the community accepts this as-is. It is intended to be the basis upon which we, the community, can build and design the future logical replication.

=== Basic architecture ===:
Very broadly speaking there are several major pieces common to most approaches to replication:
1. Source data generation
2. Transportation of that data
3. Applying the changes
4. Conflict resolution

1.:

We need a change stream that contains all required changes in the correct order; the requirement that this stream reflect changes across multiple concurrent backends raises obvious concurrency and scalability issues. Reusing the WAL stream for this seems a good choice since it is needed anyway and already addresses those issues, and it further means that we don't duplicate the writes and locks already performed for its maintenance. Any other stream-generating component would introduce additional scalability issues.

Unfortunately, in this case, the WAL is mostly a physical representation of the changes and thus does not, by itself, contain the necessary information in a convenient format to create logical changesets.

The biggest problem is that interpreting tuples in the WAL stream requires an up-to-date system catalog and needs to be done in a compatible backend and architecture. The requirement of an up-to-date catalog could be solved by adding more data to the WAL stream, but it seems likely that that would require relatively intrusive, complex changes. Instead we chose to require a synchronized catalog at the decoding site.
That adds some complexity to use cases like replicating into a different database or cross-version replication. For those it is relatively straightforward to develop a proxy pg instance that only contains the catalog and does the transformation to textual changes.

This also is the solution to the other big problem, the need to work around architecture/version specific binary formats. The alternative, producing cross-version, cross-architecture compatible binary changes or even more so textual changes all the time, seems to be prohibitively expensive, both from a CPU and a storage POV and also from the point of implementation effort.

The catalog on the site where changes originate can *not* be used for the decoding because at the time we decode the WAL the catalog may have changed from the state it was in when the WAL was generated. A possible solution for this would be to have a fully versioned catalog, but that again seems to be rather complex and intrusive.

For some operations (UPDATE, DELETE) and corner-cases (e.g. full page writes) additional data needs to be logged, but the additional amount of data isn't that big. Requiring a primary key for any change but INSERT seems to be a sensible thing for now. The required changes are fully contained in heapam.c and are pretty simple so far.

2.:

For transport of the non-decoded data from the originating site to the decoding site we decided to
[HACKERS] [PATCH 15/16] Activate/Implement the apply process which applies received updates from another node
From: Andres Freund and...@anarazel.de

One apply process currently can only apply changes from one database in another cluster (with a specific node_id).

Currently synchronous_commit=off is statically set in the apply process because after a crash we can safely recover all changes which we didn't apply, so there is no point in incurring the overhead of synchronous commits. This might be problematic in combination with synchronous replication.

Missing/Todo:
- The foreign node_id currently is hardcoded (2, 1 depending on the local id) as is the database (postgres). This obviously needs to change.
- Proper mainloop with error handling, PROCESS_INTERRUPTS and everything
- Start multiple apply processes (per node_id per database)
- Possibly switch databases during runtime?

---
 src/backend/postmaster/bgworker.c         |  10 +-
 src/backend/replication/logical/logical.c | 194 +
 src/include/replication/logical.h         |   3 +
 3 files changed, 198 insertions(+), 9 deletions(-)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 8144050..bbb7e86 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -52,6 +52,7 @@
 #include "postmaster/bgworker.h"
 #include "postmaster/fork_process.h"
 #include "postmaster/postmaster.h"
+#include "replication/logical.h"
 #include "storage/bufmgr.h"
 #include "storage/ipc.h"
 #include "storage/latch.h"
@@ -91,8 +92,6 @@ static void bgworker_sigterm_handler(SIGNAL_ARGS);

 NON_EXEC_STATIC void BgWorkerMain(int argc, char *argv[]);

-static bool do_logicalapply(void);
-
 /****************************************************************
 *					  BGWORKER CODE
 ****************************************************************/
@@ -394,10 +393,3 @@ NumBgWorkers(void)
 	return numWorkers;
 #endif
 }
-
-static bool
-do_logicalapply(void)
-{
-	elog(LOG, "doing logical apply");
-	return false;
-}
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 4f34488..7fadafe 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -13,7 +13,201 @@
 *
 */
 #include "postgres.h"
+
+#include "access/xlogreader.h"
+
+#include "replication/applycache.h"
 #include "replication/logical.h"
+#include "replication/apply.h"
+#include "replication/decode.h"
+#include "replication/walreceiver.h"
+/* FIXME: XLogRead */
+#include "replication/walsender_private.h"
+
+#include "storage/ipc.h"
+#include "storage/proc.h"
+
 int guc_replication_origin_id = InvalidMultimasterNodeId;
 RepNodeId current_replication_origin_id = InvalidMultimasterNodeId;
 XLogRecPtr current_replication_origin_lsn = {0, 0};
+
+static XLogReaderState* xlogreader_state = NULL;
+
+static bool
+replay_record_is_interesting(XLogReaderState* state, XLogRecord* r)
+{
+	/*
+	 * we filtered in the sender, so no filtering necessary atm.
+	 *
+	 * If we want to introduce per-table filtering after a proxy or such this
+	 * would be the place.
+	 */
+	return true;
+}
+
+static void
+replay_writeout_data(XLogReaderState* state, char* data, Size len)
+{
+	/* no data needs to persist after this */
+	return;
+}
+
+static void
+replay_finished_record(XLogReaderState* state, XLogRecordBuffer* buf)
+{
+	ReaderApplyState* apply_state = state->private_data;
+	ApplyCache *cache = apply_state->apply_cache;
+
+	DecodeRecordIntoApplyCache(cache, buf);
+}
+
+static void
+replay_read_page(XLogReaderState* state, char* cur_page, XLogRecPtr startptr)
+{
+	XLogPageHeader page_header;
+
+	Assert((startptr.xrecoff % XLOG_BLCKSZ) == 0);
+
+	/* FIXME: more sensible/efficient implementation */
+	XLogRead(cur_page, receiving_from_node_id, startptr, XLOG_BLCKSZ);
+
+	page_header = (XLogPageHeader) cur_page;
+
+	if (page_header->xlp_magic != XLOG_PAGE_MAGIC)
+	{
+		elog(FATAL, "page header magic %x, should be %x at %X/%X",
+			 page_header->xlp_magic, XLOG_PAGE_MAGIC,
+			 startptr.xlogid, startptr.xrecoff);
+	}
+}
+
+bool
+do_logicalapply(void)
+{
+	XLogRecPtr *from;
+	XLogRecPtr *to;
+	ReaderApplyState *apply_state;
+	int res;
+
+	static bool initialized = false;
+
+	if (!initialized)
+	{
+		/*
+		 * FIXME: We need a sensible implementation for choosing this.
+		 */
+		if (guc_replication_origin_id == 1)
+		{
+			receiving_from_node_id = 2;
+		}
+		else
+		{
+			receiving_from_node_id = 1;
+		}
+	}
+
+	ResetLatch(&MyProc->procLatch);
+
+	SpinLockAcquire(&WalRcv->mutex);
+	from = &WalRcv->mm_applyState[receiving_from_node_id];
+	to = &WalRcv->mm_receiveState[receiving_from_node_id];
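The TODO list above still includes a proper mainloop. Purely as a hypothetical sketch (none of this is in the patch), the worker's loop around do_logicalapply() might eventually look like:

    for (;;)
    {
        /* TODO per the patch: error handling, PROCESS_INTERRUPTS, ... */
        if (!do_logicalapply())     /* assuming false means "no work left" */
            WaitLatch(&MyProc->procLatch,
                      WL_LATCH_SET | WL_POSTMASTER_DEATH, 0L);
        /* do_logicalapply() resets the latch itself before reading state */
    }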
[HACKERS] [PATCH 14/16] Add module to apply changes from an apply-cache using low-level functions
From: Andres Freund and...@anarazel.de

We decided to use low level functions to do the apply instead of producing sql statements containing the data (or using prepared statements) because both the text conversion and the full executor overhead introduce a significant cost which is unnecessary if you're using the same version of pg on the same architecture.

There are loads of use cases though that require different methods of applying - so the part doing the applying from an ApplyCache is just a bunch of well abstracted callbacks getting passed all the required knowledge to change the data representation into other formats.

Missing:
- TOAST handling. For physical apply not much needs to be done because the toast inserts will have been made beforehand. There needs to be an option in ApplyCache that helps reassembling TOAST datums to make it easier to write apply modules which convert to text.

---
 src/backend/replication/logical/Makefile |   2 +-
 src/backend/replication/logical/apply.c  | 313 ++
 src/include/replication/apply.h          |  24 +++
 3 files changed, 338 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/replication/logical/apply.c
 create mode 100644 src/include/replication/apply.h

diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index c2d6d82..d0e0b13 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -14,6 +14,6 @@ include $(top_builddir)/src/Makefile.global

 override CPPFLAGS := -I$(srcdir) $(CPPFLAGS)

-OBJS = applycache.o decode.o logical.o
+OBJS = apply.o applycache.o decode.o logical.o

 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/logical/apply.c b/src/backend/replication/logical/apply.c
new file mode 100644
index 000..646bd54
--- /dev/null
+++ b/src/backend/replication/logical/apply.c
@@ -0,0 +1,313 @@
+/*-
+ *
+ * apply.c
+ *
+ * Support functions for logical/multimaster replication
+ *
+ *
+ * Portions Copyright (c) 2010-2012, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	src/backend/replication/logical/apply.c
+ *
+ */
+#include "postgres.h"
+
+#include "access/xact.h"
+#include "access/heapam.h"
+#include "access/genam.h"
+
+#include "catalog/pg_control.h"
+#include "catalog/index.h"
+
+#include "executor/executor.h"
+
+#include "replication/applycache.h"
+#include "replication/apply.h"
+
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/lsyscache.h"
+
+
+static void
+UserTableUpdateIndexes(Relation heapRel, HeapTuple heapTuple);
+
+
+void
+apply_begin_txn(ApplyCache* cache, ApplyCacheTXN* txn)
+{
+	ApplyApplyCacheState *state = cache->private_data;
+
+	state->original_resource_owner = CurrentResourceOwner;
+
+	PreventTransactionChain(true, "Apply Process cannot be started inside a txn");
+
+	StartTransactionCommand();
+
+	PushActiveSnapshot(GetTransactionSnapshot());
+}
+
+void
+apply_commit_txn(ApplyCache* cache, ApplyCacheTXN* txn)
+{
+	ApplyApplyCacheState *state = cache->private_data;
+
+	current_replication_origin_lsn = txn->lsn;
+
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/*
+	 * for some reason after (Start|Commit)TransactionCommand we lose our
+	 * resource owner, restore it.
+	 * XXX: is that correct?
+	 */
+	CurrentResourceOwner = state->original_resource_owner;
+
+	current_replication_origin_lsn.xlogid = 0;
+	current_replication_origin_lsn.xrecoff = 0;
+}
+
+
+void
+apply_change(ApplyCache* cache, ApplyCacheTXN* txn, ApplyCacheTXN* subtxn, ApplyCacheChange* change)
+{
+	/* for inserting */
+	Relation	tuple_rel;
+
+	tuple_rel = heap_open(HeapTupleGetOid(change->table), RowExclusiveLock);
+
+	switch (change->action)
+	{
+		case APPLY_CACHE_CHANGE_INSERT:
+		{
+#ifdef VERBOSE_DEBUG
+			elog(LOG, "INSERT");
+#endif
+			simple_heap_insert(tuple_rel, change->newtuple->tuple);
+
+			UserTableUpdateIndexes(tuple_rel, change->newtuple->tuple);
+			break;
+		}
+		case APPLY_CACHE_CHANGE_UPDATE:
+		{
+			Oid indexoid = InvalidOid;
+			int16 pknratts;
+			int16 pkattnum[INDEX_MAX_KEYS];
+			Oid pktypoid[INDEX_MAX_KEYS];
+			Oid pkopclass[INDEX_MAX_KEYS];
+
+			ScanKeyData cur_skey[INDEX_MAX_KEYS];
+			int i;
+			bool isnull;
+			TupleDesc desc = RelationGetDescr(tuple_rel);
+
+			Relation index_rel;
+
+			HeapTuple old_tuple;
+			bool found = false;
+
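These callbacks are meant to be plugged into an ApplyCache. A sketch of the wiring (the allocator and the field names are guesses; the patch excerpt only shows the callback signatures):

    ApplyCache *cache = ApplyCacheAllocate();   /* hypothetical allocator */

    cache->begin = apply_begin_txn;             /* assumed field names */
    cache->commit = apply_commit_txn;
    cache->apply_change = apply_change;
    cache->private_data = state;                /* an ApplyApplyCacheState */

An apply module that converts to text would supply different callbacks here, rendering each ApplyCacheChange as an SQL statement instead of calling simple_heap_insert().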
[HACKERS] [PATCH 13/16] Introduction of pair of logical walreceiver/sender
From: Andres Freund and...@anarazel.de

A logical WALReceiver is started directly by Postmaster when we enter PM_RUN state and the new parameter multimaster_conninfo is set. For now only one of those is started, but the code doesn't rely on that. In future multiple ones should be allowed.

To transfer that data a new command, START_LOGICAL_REPLICATION, is introduced in the walsender, reusing most of the infrastructure for START_REPLICATION. The former uses the same on-the-wire format as the latter. To make initialization possible, IDENTIFY_SYSTEM returns two new columns: node_id returning the multimaster_node_id and last_checkpoint returning the RedoRecPtr.

The walreceiver writes that data into the previously introduced pg_lcr/$node_id directory.

Future Directions/TODO:
- pass the node_ids we're interested in to START_LOGICAL_REPLICATION to allow complex topologies
- allow to pass filters to reduce the transfer volume
- compress the transferred data by actually removing uninteresting records instead of replacing them by NOOP records. This adds some complexities because we still need to map the received lsn to the requested lsn so we know where to restart transferring data and such.
- check that wal on the sending side was generated with WAL_LEVEL_LOGICAL

---
 src/backend/postmaster/postmaster.c            |  10 +-
 .../libpqwalreceiver/libpqwalreceiver.c        | 104 -
 src/backend/replication/repl_gram.y            |  19 +-
 src/backend/replication/repl_scanner.l         |   1 +
 src/backend/replication/walreceiver.c          | 165 +++-
 src/backend/replication/walreceiverfuncs.c     |   1 +
 src/backend/replication/walsender.c            | 422 +++-
 src/backend/utils/misc/guc.c                   |   9 +
 src/backend/utils/misc/postgresql.conf.sample  |   1 +
 src/include/nodes/nodes.h                      |   1 +
 src/include/nodes/replnodes.h                  |  10 +
 src/include/replication/logical.h              |   4 +
 src/include/replication/walreceiver.h          |   9 +-
 13 files changed, 624 insertions(+), 132 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 71cfd6d..13e9592 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1449,6 +1449,11 @@ ServerLoop(void)
 			kill(AutoVacPID, SIGUSR2);
 		}

+		/* Restart walreceiver process in certain states only. */
+		if (WalReceiverPID == 0 && pmState == PM_RUN &&
+			LogicalWalReceiverActive())
+			WalReceiverPID = StartWalReceiver();
+
 		/* Check all the workers requested are running.
 */
		if (pmState == PM_RUN)
			StartBackgroundWorkers();
@@ -2169,7 +2174,8 @@ pmdie(SIGNAL_ARGS)
			/* and the walwriter too */
			if (WalWriterPID != 0)
				signal_child(WalWriterPID, SIGTERM);
-
+			if (WalReceiverPID != 0)
+				signal_child(WalReceiverPID, SIGTERM);
			/*
			 * If we're in recovery, we can't kill the startup process
			 * right away, because at present doing so does not release
@@ -2421,6 +2427,8 @@ reaper(SIGNAL_ARGS)
				PgArchPID = pgarch_start();
			if (PgStatPID == 0)
				PgStatPID = pgstat_start();
+			if (WalReceiverPID == 0 && LogicalWalReceiverActive())
+				WalReceiverPID = StartWalReceiver();
			StartBackgroundWorkers();

			/* at this point we are really open for business */
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 979b66b..0ea3fce 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -46,7 +46,8 @@ static PGconn *streamConn = NULL;
 static char *recvBuf = NULL;

 /* Prototypes for interface functions */
-static bool libpqrcv_connect(char *conninfo, XLogRecPtr startpoint);
+static bool libpqrcv_connect(char *conninfo, XLogRecPtr* redo, XLogRecPtr* where_at, bool startedDuringRecovery);
+static bool libpqrcv_start(char *conninfo, XLogRecPtr* startpoint, bool startedDuringRecovery);
 static bool libpqrcv_receive(int timeout, unsigned char *type, char **buffer, int *len);
 static void libpqrcv_send(const char *buffer, int nbytes);
@@ -63,10 +64,12 @@
 void
 _PG_init(void)
 {
 	/* Tell walreceiver how to reach us */
-	if (walrcv_connect != NULL || walrcv_receive !=
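Putting the walsender pieces together, the opening exchange on a logical replication connection would presumably look something like the following; the exact column layout and LSN syntax are guesses based on the description above:

    IDENTIFY_SYSTEM;
      -> systemid, timeline, xlogpos, node_id, last_checkpoint
    START_LOGICAL_REPLICATION 0/1234568;
      -> stream of WAL records, same on-the-wire format as START_REPLICATION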
[HACKERS] [PATCH 05/16] Preliminary: Introduce Bgworker process
From: Simon Riggs si...@2ndquadrant.com

Early prototype that allows for just 1 bgworker which calls a function called do_applyprocess(). Expect major changes in this, but not in ways that would affect the apply process.

---
 src/backend/postmaster/Makefile               |   4 +-
 src/backend/postmaster/bgworker.c             | 403 +
 src/backend/postmaster/postmaster.c           |  91 --
 src/backend/tcop/postgres.c                   |   5 +
 src/backend/utils/init/miscinit.c             |   5 +-
 src/backend/utils/init/postinit.c             |   3 +-
 src/backend/utils/misc/guc.c                  |  37 ++-
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/include/postmaster/bgworker.h             |  29 ++
 9 files changed, 550 insertions(+), 31 deletions(-)
 create mode 100644 src/backend/postmaster/bgworker.c
 create mode 100644 src/include/postmaster/bgworker.h

diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 3056b09..7b23353 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global

-OBJS = autovacuum.o bgwriter.o fork_process.o pgarch.o pgstat.o postmaster.o \
-	startup.o syslogger.o walwriter.o checkpointer.o
+OBJS = autovacuum.o bgworker.o bgwriter.o fork_process.o pgarch.o pgstat.o \
+	postmaster.o startup.o syslogger.o walwriter.o checkpointer.o

 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
new file mode 100644
index 000..8144050
--- /dev/null
+++ b/src/backend/postmaster/bgworker.c
@@ -0,0 +1,403 @@
+/*-
+ *
+ * bgworker.c
+ *
+ * PostgreSQL Integrated Worker Daemon
+ *
+ * Background workers can execute arbitrary user code. A shared library
+ * can request creation of a worker using RequestAddinBGWorkerProcess().
+ *
+ * The worker process is forked from the postmaster and then attaches
+ * to shared memory similarly to an autovacuum worker and finally begins
+ * executing the supplied WorkerMain function.
+ *
+ * If the fork() call fails in the postmaster, it will try again later.
+ * Note that the failure can only be transient (fork failure due to
+ * high load, memory pressure, too many processes, etc); more permanent
+ * problems, like failure to connect to a database, are detected later in the
+ * worker and dealt with just by having the worker exit normally. Postmaster
+ * will launch a new worker again later.
+ *
+ * Note that there can be more than one worker in a database concurrently.
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	src/backend/postmaster/bgworker.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include <signal.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "access/heapam.h"
+#include "access/reloptions.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "catalog/dependency.h"
+#include "catalog/namespace.h"
+#include "catalog/pg_database.h"
+#include "commands/dbcommands.h"
+#include "commands/vacuum.h"
+#include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/fork_process.h"
+#include "postmaster/postmaster.h"
+#include "storage/bufmgr.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/sinvaladt.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/ps_status.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "utils/timestamp.h"
+#include "utils/tqual.h"
+
+
+/*
+ * GUC parameters
+ */
+int		MaxWorkers;
+
+static int bgworker_addin_request = 0;
+static bool bgworker_addin_request_allowed = true;
+
+/* Flags to tell if we are in a worker process */
+static bool am_bgworker = false;
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+static volatile sig_atomic_t got_SIGTERM = false;
+
+static void bgworker_sigterm_handler(SIGNAL_ARGS);
+
+NON_EXEC_STATIC void BgWorkerMain(int argc, char *argv[]);
+
+static bool do_logicalapply(void);
+
+/****************************************************************
+ *					  BGWORKER CODE
+ ****************************************************************/
+
+/* SIGTERM: time to die */
+static void
+bgworker_sigterm_handler(SIGNAL_ARGS)
+{
+	int save_errno =
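As a purely hypothetical illustration of the registration API named in the header comment above (the excerpt does not show the signature of RequestAddinBGWorkerProcess(), so its shape here is a guess):

    /* in a preloaded shared library */
    void
    _PG_init(void)
    {
        /* hypothetical signature: request one worker running my_worker_main */
        RequestAddinBGWorkerProcess("my worker", my_worker_main);
    }

Like RequestAddinShmemSpace(), such a request presumably has to be made while the postmaster can still honor it, hence the bgworker_addin_request_allowed flag above.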
Re: [HACKERS] Ability to listen on two unix sockets
On 06/10/2012 12:37 AM, Peter Eisentraut wrote:

On sön, 2012-06-10 at 00:25 +0200, Andres Freund wrote:

We already have the ability to configure the Unix socket directory in postgresql.conf. All you need to do is turn that into a list.

That doesn't help libpq-using clients.

There is no mandate here to do anything about that.

Well, Martin Pitt/the debian package is patching exactly that. You're saying that everything that needs to be done to make that easier is to make unix_socket_dir a list. So there seems to be some disparity there ;)

The Debian package doesn't need any change or assistance, really. You can change the default location of the socket by patching pg_config_manual.h, and that's a one-liner. The only thing that would make that slightly easier or better is making it a configure option, but that was intentionally decided against in the old days (not by me).

Since systemd with its PrivateTmp feature is going to be used in more and more distros, there will probably be a bigger need to deal with the inaccessible default unix socket directory /tmp in the future. Thus, I'd like to ask if anybody is aware of any issue connected with simply patching pg_config_manual.h, same as Debian does it already? For example, is there any piece of software that simply relies on the /tmp location of the socket and doesn't use libpq for the communication?

Regards, Honza

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
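For reference, the Debian-style one-liner under discussion is the DEFAULT_PGSOCKET_DIR define in src/include/pg_config_manual.h:

    /* src/include/pg_config_manual.h -- the stock default is "/tmp" */
    #define DEFAULT_PGSOCKET_DIR  "/var/run/postgresql"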
Re: [HACKERS] hint bit i/o reduction
On Wed, Jun 13, 2012 at 3:57 AM, Amit Kapila amit.kap...@huawei.com wrote:

1) Keep track # times the last transaction id was repeatedly seen in tqual.c (resetting as soon as a new xid was touched). We can do this just for xmin, or separately for both xmin and xmax.

Will this be done when we see a new xid during scan of tuple? If yes, then won't this logic have an impact in case there are very few records per transaction in the database? In that case the new optimization won't help much, but it adds the new instructions.

Yes, but only in the unhinted case -- in oltp workloads tuples get hinted fairly quickly so I doubt this would be a huge impact. Hinted scans will work exactly as they do now. In the unhinted case for OLTP a few tests are added but that's noise compared to the other stuff going on.

We can also try to be smart and disable the 'avoid setting the hint bit' logic if the page is already dirty.

What's the benefit of not setting the hint bit for an already dirty page?

Nothing -- we are disabling the '*avoid* setting the hint bit' logic (double negative there). In other words we would be disabling the setting of hint bits if we see the same transaction many times in a row, but only if the page isn't dirty.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
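For clarity, a minimal sketch of the heuristic being discussed (not actual tqual.c code; SOME_NUMBER stands in for the proposal's '#some_number', and page dirtiness is passed in as a parameter because checking it would need bufmgr cooperation):

    #define SOME_NUMBER 100

    static TransactionId last_seen_xid = InvalidTransactionId;
    static int  times_seen_same = 0;

    static bool
    avoid_setting_hint_bit(TransactionId xid, bool page_is_dirty)
    {
        if (TransactionIdEquals(xid, last_seen_xid))
            times_seen_same++;
        else
        {
            last_seen_xid = xid;
            times_seen_same = 0;
        }

        /* on an already-dirty page setting the bit is free, so set it */
        return !page_is_dirty && times_seen_same > SOME_NUMBER;
    }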
Re: [HACKERS] Ability to listen on two unix sockets
On 06/10/2012 03:41 PM, Robert Haas wrote:

On Sun, Jun 10, 2012 at 8:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Peter Eisentraut pete...@gmx.net writes:

On lör, 2012-06-09 at 18:26 -0400, Tom Lane wrote:

That's not actually quite the same thing as what I suggest above. Currently, unix_socket_directory *overrides* the compiled-in choice. I'm suggesting that it would be better to invent a list that is *added to* the compiled-in choice. If we think it would be best to still be able to override that, then I'd vote for keeping unix_socket_directory as is, and then adding a list named something like secondary_socket_directories. But if we just turn unix_socket_directory into a list, I think the lack of separation between primary and secondary directories will be confusing.

By that logic, any list-valued parameter should be split into a primary and secondary setting.

Well, no: the key point here is that there will be one directory that is special because it's the one baked into libpq. I agree that for the purposes of the backend in isolation, we might as well just have a list. What's less clear is whether, when considering the backend+client ecosystem as a whole, the special status of the configure-time socket directory ought to be reflected in the way we set up the GUCs. I have to admit that I'm not totally sold on either approach.

I think we should consider this in the context of allowing both additional UNIX sockets and additional TCP ports. In the case of TCP ports, it's clearly no good to turn port into a list, because one port number has to be primary, since it goes into the PID file and also affects the naming of any UNIX sockets created. If we add secondary_socket_dirs, I think we will also need secondary_ports. One idea might be to have one new GUC that serves both purposes: additional_sockets = '12345, /foo'. I'm sure there are other ways to skin the cat as well.

Going through the thread, I'd like to sum it up, choosing the approach with the fewest potential issues, and find a consensus if possible. It seems unix_socket_directory could be turned into a list, and probably renamed to unix_socket_directories, since a list-valued setting with a singular name would be confusing. On the other hand, we probably don't want to specify listening ports together with additional unix sockets in one configuration option, so it seems better to add a new configuration option to distinguish the primary listening port from additional ports.

Regards, Honza

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
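If unix_socket_directory does become a list as summarized above, the postgresql.conf usage would presumably look like this (hypothetical syntax):

    unix_socket_directories = '/tmp, /var/run/postgresql'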
Re: [HACKERS] hint bit i/o reduction
Yes, but only in the unhinted case -- in oltp workloads tuples get hinted fairly quickly so I doubt this would be a huge impact. Hinted scans will work exactly as they do now. In the unhinted case for OLTP a few tests are added but that's noise compared to the other stuff going on.

I believe the design you have proposed is for the unhinted cases only, meaning if the tuple already has the hint set then it will not go through the new logic. Is that right?

In unhinted cases, there are 2 ways the hint bit can be set:
1. It gets the transaction status from Clog memory buffers.
2. It needs to do I/O to get the transaction status from Clog.

So, I think it will add overhead in case-1 where the processing is fast, but now we will add some new instructions to it. However, if you have already tested this scenario, then there should not be any problem.

-Original Message- From: Merlin Moncure [mailto:mmonc...@gmail.com] Sent: Wednesday, June 13, 2012 6:50 PM To: Amit Kapila Cc: PostgreSQL-development Subject: Re: [HACKERS] hint bit i/o reduction

On Wed, Jun 13, 2012 at 3:57 AM, Amit Kapila amit.kap...@huawei.com wrote:

1) Keep track # times the last transaction id was repeatedly seen in tqual.c (resetting as soon as a new xid was touched). We can do this just for xmin, or separately for both xmin and xmax.

Will this be done when we see a new xid during scan of tuple? If yes, then won't this logic have an impact in case there are very few records per transaction in the database? In that case the new optimization won't help much, but it adds the new instructions.

Yes, but only in the unhinted case -- in oltp workloads tuples get hinted fairly quickly so I doubt this would be a huge impact. Hinted scans will work exactly as they do now. In the unhinted case for OLTP a few tests are added but that's noise compared to the other stuff going on.

We can also try to be smart and disable the 'avoid setting the hint bit' logic if the page is already dirty.

What's the benefit of not setting the hint bit for an already dirty page?

Nothing -- we are disabling the '*avoid* setting the hint bit' logic (double negative there). In other words we would be disabling the setting of hint bits if we see the same transaction many times in a row, but only if the page isn't dirty.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink
I have created the patch by including fallback_application_name for dblink as well. In it, I have used 'dblink' as the fallback_application_name. Please let me know your suggestions regarding the same.

-Original Message- From: Robert Haas [mailto:robertmh...@gmail.com] Sent: Wednesday, June 13, 2012 12:13 AM To: Amit Kapila Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink

On Mon, Jun 11, 2012 at 5:30 AM, Amit Kapila amit.kap...@huawei.com wrote:

As per the previous discussion in the link below, it seems that fallback application name needs to be provided for only pgbench and oid2name. http://archives.postgresql.org/message-id/w2g9837222c1004070216u3bc46b3ahbddfdffdbfb46...@mail.gmail.com However the title of the Todo Item suggests it needs to be done for dblink as well. Please suggest if it needs to be done for dblink; if yes, then what should the fallback_application_name be for it?

Why not 'dblink'?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

fallback_application_name.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 16/16] current version of the design document
On Wed, Jun 13, 2012 at 6:28 AM, Andres Freund and...@2ndquadrant.com wrote:

+synchronized catalog at the decoding site. That adds some complexity to use +cases like replicating into a different database or cross-version +replication. For those it is relatively straight-forward to develop a proxy pg +instance that only contains the catalog and does the transformation to textual +changes.

wow. Anyway, could you elaborate a little on how this proxy instance concept would work? Let's take the case where I have N small-ish schema-identical database shards that I want to aggregate into a single warehouse -- something that HS/SR currently can't do. There's a lot of ways to do that obviously, but assuming the warehouse would have to have a unique schema, could it be done in your architecture?

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Robert Haas robertmh...@gmail.com writes: On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: The code for this is as attached. Note that I'd rip out the normal-path tracking of line boundaries; it seems better to have a second scan of the data in the error case and save the cycles in non-error cases. Really?! Um ... do you have a problem with that idea, and if so what? It would be considerably more complicated to do it without a second pass. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 16/16] current version of the design document
Hi Merlin,

On Wednesday, June 13, 2012 04:21:12 PM Merlin Moncure wrote:

On Wed, Jun 13, 2012 at 6:28 AM, Andres Freund and...@2ndquadrant.com wrote:

+synchronized catalog at the decoding site. That adds some complexity to use +cases like replicating into a different database or cross-version +replication. For those it is relatively straight-forward to develop a proxy pg +instance that only contains the catalog and does the transformation to textual +changes.

wow. Anyways, could you elaborate on a little on how this proxy instance concept would work?

To do the decoding into another form you need an up2date catalog + correct binaries. So the idea would be to have a minimal instance which is just a copy of the database with all the tables with an oid < FirstNormalObjectId, i.e. only the catalog tables. Then you can apply all xlog changes on system tables using the existing infrastructure for HS (or use the command trigger equivalent we need to build for BDR) and decode everything else into the ApplyCache just as done in the patch. Then you would fill out the callbacks for the ApplyCache (see patch 14/16 and 15/16 for an example) to do whatever you want with the data. I.e. generate plain sql statements or run some transform procedure.

Let's take the case where I have N small-ish schema identical database shards that I want to aggregate into a single warehouse -- something that HS/SR currently can't do. There's a lot of ways to do that obviously but assuming the warehouse would have to have a unique schema, could it be done in your architecture?

Not sure what you mean by the warehouse having a unique schema? It has the same schema as the OLTP counterparts? That would obviously be the easy case if you take care and guarantee uniqueness of keys upfront. That basically would be trivial ;)

It gets a bit more complex if you need to transform the data for the warehouse. I don't plan to put in work to make that possible without some C coding (filling out the callbacks and doing the work in there). It shouldn't need much though.

Does that answer your question?

Andres

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: The code for this is as attached. Note that I'd rip out the normal-path tracking of line boundaries; it seems better to have a second scan of the data in the error case and save the cycles in non-error cases. Really?! Um ... do you have a problem with that idea, and if so what? It would be considerably more complicated to do it without a second pass. Could you explain how it's broken now, and why it will be hard to fix? People may well want to use a cast to JSON within an exception block as a way of testing whether strings are valid JSON. We should not assume that the cost of an exception is totally irrelevant, because this might be iterated. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hint bit i/o reduction
On Wed, Jun 13, 2012 at 9:02 AM, Amit Kapila amit.kap...@huawei.com wrote:

Yes, but only in the unhinted case -- in oltp workloads tuples get hinted fairly quickly so I doubt this would be a huge impact. Hinted scans will work exactly as they do now. In the unhinted case for OLTP a few tests are added but that's noise compared to the other stuff going on.

I believe the design you have proposed is for the unhinted cases only, meaning if the tuple already has the hint set then it will not go through the new logic. Is that right? In unhinted cases, there are 2 ways the hint bit can be set: 1. It gets the transaction status from Clog memory buffers. 2. It needs to do I/O to get the transaction status from Clog. So, I think it will add overhead in case-1 where the processing is fast, but now we will add some new instructions to it.

yeah -- but you still have to call:

*) TransactionIdIsCurrentTransactionId(), which adds a couple of tests (and a bsearch in the presence of subtransactions)
*) TransactionIdIsInProgress() which adds a few tests and an out-of-line call in the typical case
*) TransactionIdDidCommit() which adds a few tests and two out-of-line calls in the typical case

and, while setting the hint bit:

*) Several more tests plus:
*) TransactionIdGetCommitLSN() another out-of-line call and a test
*) XLogNeedsFlush() two more out-of-line calls plus a few tests
*) SetBufferCommitInfoNeedsSave out-of-line call, several more tests

adding a few instructions to the above probably won't be measurable (naturally it would be tested to confirm that).

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:

On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Robert Haas robertmh...@gmail.com writes:

On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:

The code for this is as attached. Note that I'd rip out the normal-path tracking of line boundaries; it seems better to have a second scan of the data in the error case and save the cycles in non-error cases.

Really?!

Um ... do you have a problem with that idea, and if so what? It would be considerably more complicated to do it without a second pass.

Could you explain how it's broken now, and why it will be hard to fix? People may well want to use a cast to JSON within an exception block as a way of testing whether strings are valid JSON. We should not assume that the cost of an exception is totally irrelevant, because this might be iterated.

Exception blocks/subtransactions already are considerably expensive. I have a hard time believing this additional cost would be measurable.

Andres

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ability to listen on two unix sockets
On Jun13, 2012, at 15:14 , Honza Horak wrote:

Since systemd with its PrivateTmp feature is going to be used in more and more distros, there will probably be a bigger need to deal with the inaccessible default unix socket directory /tmp in the future. Thus, I'd like to ask if anybody is aware of any issue connected with simply patching pg_config_manual.h, same as Debian does it already? For example, is there any piece of software that simply relies on the /tmp location of the socket and doesn't use libpq for the communication?

I've used self-compiled postgres versions on debian for years, and debian's way of doing things is a major PITA in that situation. You end up having to specify the full path to the socket directory *everywhere*, because otherwise things start to break if you recompile something and it suddenly happens to use the debian-supplied libpq instead of your own.

Supporting sockets in multiple directories would solve that, once and for all.

best regards, Florian Pflug

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
On Wed, Jun 13, 2012 at 11:06 AM, Andres Freund and...@2ndquadrant.com wrote: On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote: On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: The code for this is as attached. Note that I'd rip out the normal-path tracking of line boundaries; it seems better to have a second scan of the data in the error case and save the cycles in non-error cases. Really?! Um ... do you have a problem with that idea, and if so what? It would be considerably more complicated to do it without a second pass. Could you explain how it's broken now, and why it will be hard to fix? People may well want to use a cast to JSON within an exception block as a way of testing whether strings are valid JSON. We should not assume that the cost of an exception is totally irrelevant, because this might be iterated. Exception blocks/subtransctions already are considerably expensive. I have a hard time believing this additional cost would be measureable. According to my testing, the main cost of an exception block catching a division-by-zero error is that of generating the error message, primarily sprintf()-type stuff. The cost of scanning a multi-megabyte string seems likely to be much larger. Mind you, I'm not going to spend a lot of time crying into my beer if it turns out that there's no other reasonable way to implement this, but I do think that it's entirely appropriate to ask why it's not possible to do better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote:

On Wed, Jun 13, 2012 at 11:06 AM, Andres Freund and...@2ndquadrant.com wrote:

On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:

Could you explain how it's broken now, and why it will be hard to fix? People may well want to use a cast to JSON within an exception block as a way of testing whether strings are valid JSON. We should not assume that the cost of an exception is totally irrelevant, because this might be iterated.

Exception blocks/subtransactions already are considerably expensive. I have a hard time believing this additional cost would be measurable.

According to my testing, the main cost of an exception block catching a division-by-zero error is that of generating the error message, primarily sprintf()-type stuff. The cost of scanning a multi-megabyte string seems likely to be much larger.

True. I ignored that there doesn't have to be an xid assigned yet... I still think it's not very relevant though.

Andres

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.3devel branch
Seeing as we're about to start a CommitFest, I think it's about time to create the 9.3devel branch. I'm happy to do it, but I believe it's usually Tom's gig. Tom? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 07/16] Log enough data into the wal to reconstruct logical changes from it if wal_level=logical
Andres Freund and...@2ndquadrant.com wrote:

This adds a new wal_level value 'logical'

Missing cases:
- heap_multi_insert
- primary key changes for updates
- no primary key
- LOG_NEWPAGE

First, Wow! I look forward to the point where we can replace our trigger-based replication with this!

Your missing cases for primary key issues would not cause us any pain for our current system, since we require a primary key and don't support updates to PKs for replicated tables. While I don't expect that the first cut of this will be able to replace our replication-related functionality, I'm interested in making sure it can be extended in that direction, so I have a couple things to consider:

(1) For our usage, with dozens of source databases feeding into multiple aggregate databases and interfaces, DDL replication is not of much if any interest. It should be easy enough to ignore as long as it is low volume, so that doesn't worry me too much; but if I'm missing something and you run across any logical WAL logging for DDL which does generate a lot of WAL traffic, it would be nice to have a way to turn that off at generation time rather than filtering it or ignoring it later. (Probably won't be an issue, just a heads-up.)

(2) To match the functionality we now have, we would need the logical stream to include the *before* image of the whole tuple for each row updated or deleted. I understand that this is not needed for the use cases you are initially targeting; I just hope the design leaves this option open without needing to disturb other use cases. Perhaps this would require yet another wal_level value. Perhaps rather than testing the current value directly for determining whether to log something, the GUC processing could set some booleans for faster testing and less code churn when the initial implementation is expanded to support other use cases (like ours).

(3) Similar to point 2, it would be extremely desirable to be able to determine table name and column names for the tuples in a stream from that stream, without needing to query a hot standby or do similar digging into other sources of information. Not only will the various source databases all have different OID values for the same objects, and the aggregate targets have different values from each other and the sources, but some targets don't have the tables at all. I'm talking about our database transaction repository and the interfaces to business partners which we currently drive off of the same transaction stream which drives replication.

Would it be helpful or just a distraction if I were to provide a more detailed description of our whole replication / transaction store / interface area? If it would be useful, I could also describe some other replication patterns I have seen over the years. In particular, one which might be interesting is where subsets of the data are distributed to multiple standalone machines which have intermittent or unreliable connections to a central site, which periodically collects data from all the remote sites, recalculates distribution, and sends transactions back out to those remote sites to add, remove, and update rows based on the distribution rules and the new data.

-Kevin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
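Kevin's suggestion in point (2), of having GUC processing set booleans, could look roughly like this sketch (the flag and hook names are made up; nothing here is from the patch):

    bool        XLogLogicalInfoActive = false;  /* hypothetical flag */

    /* assign hook for wal_level: precompute the test for the WAL hot paths */
    static void
    assign_wal_level(int newval, void *extra)
    {
        XLogLogicalInfoActive = (newval >= WAL_LEVEL_LOGICAL);
    }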
Re: [HACKERS] [PATCH 16/16] current version of the design document
On Wed, Jun 13, 2012 at 9:40 AM, Andres Freund and...@2ndquadrant.com wrote:

To do the decoding into another form you need an up2date catalog + correct binaries. So the idea would be to have a minimal instance which is just a copy of the database with all the tables with an oid < FirstNormalObjectId, i.e. only the catalog tables. Then you can apply all xlog changes on system tables using the existing infrastructure for HS (or use the command trigger equivalent we need to build for BDR) and decode everything else into the ApplyCache just as done in the patch. Then you would fill out the callbacks for the ApplyCache (see patch 14/16 and 15/16 for an example) to do whatever you want with the data. I.e. generate plain sql statements or run some transform procedure.

Not sure what you mean by the warehouse having a unique schema? It has the same schema as the OLTP counterparts? That would obviously be the easy case if you take care and guarantee uniqueness of keys upfront. That basically would be trivial ;)

by unique I meant 'not the same as the shards' -- presumably this would mean one of a) each shard's data would be in a private schema folder or b) you'd have one set of tables but decorated with an extra shard-identifying column that would have to be present in all keys to get around uniqueness issues

It gets a bit more complex if you need to transform the data for the warehouse. I don't plan to put in work to make that possible without some C coding (filling out the callbacks and doing the work in there). It shouldn't need much though. Does that answer your question?

yes. Do you envision it would be possible to wrap the ApplyCache callbacks in a library that could be exposed as an extension? For example, a library that would stick the replication data into a queue that a userland (non-C) process could walk, transform, etc? I know that's vague -- my general thrust here is that I find the transformation features particularly interesting and I'm wondering how much C coding would be needed to access them in the long term.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
On Wed, Jun 13, 2012 at 7:27 AM, Andres Freund and...@2ndquadrant.com wrote: Unless somebody objects I will add most of the individual marked as RFC to the current commitfest. I hope that with comments stemming from that round we can get several of the patches into the first or second commitfest. As soon as the design is clear/accepted we will try very hard to get the following patches into the second or third round. I made a logical replication topic within the CommitFest for this patch series. I think you should add them all there. I have some substantive thoughts about the design as well, which I will write up in a separate email. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Andres Freund and...@2ndquadrant.com writes: On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote: According to my testing, the main cost of an exception block catching a division-by-zero error is that of generating the error message, primarily sprintf()-type stuff. The cost of scanning a multi-megabyte string seems likely to be much larger. True. I ignored that there doesn't have to be an xid assigned yet... I still think its not very relevant though. I don't think it's relevant either. In the first place, I don't think that errors occurring multi megabytes into a JSON blob are going to occur often enough to be performance critical. In the second place, removing cycles from the non-error case is worth something too, probably more than making the error case faster is worth. In any case, the proposed scheme for providing context requires that you know where the error is before you can identify the context. I considered schemes that would keep track of the last N characters or line breaks in case one of them proved to be the one we need. That would add enough cycles to non-error cases to almost certainly not be desirable. I also considered trying to back up, but that doesn't look promising either for arbitrary multibyte encodings. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3devel branch
Robert Haas robertmh...@gmail.com writes: Seeing as we're about to start a CommitFest, I think it's about time to create the 9.3devel branch. I'm happy to do it, but I believe it's usually Tom's gig. There are a couple open issues (like reversion of that checkpoint-skipping patch) that probably ought to get dealt with before we branch, just to save double patching. Barring objections I'll cut the branch tonight or tomorrow though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 07/16] Log enough data into the wal to reconstruct logical changes from it if wal_level=logical
On Wednesday, June 13, 2012 05:27:06 PM Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: This adds a new wal_level value 'logical' Missing cases:
- heap_multi_insert
- primary key changes for updates
- no primary key
- LOG_NEWPAGE
First, Wow! Thanks ;) I hope you will still be convinced after reading some of the code :P I look forward to the point where we can replace our trigger-based replication with this! Your missing cases for primary key issues would not cause us any pain for our current system, since we require a primary key and don't support updates to PKs for replicated tables. While I don't expect that the first cut of this will be able to replace our replication-related functionality, I'm interested in making sure it can be extended in that direction, so I have a couple things to consider: Ok. (1) For our usage, with dozens of source databases feeding into multiple aggregate databases and interfaces, DDL replication is not of much if any interest. It should be easy enough to ignore as long as it is low volume, so that doesn't worry me too much; but if I'm missing something and you run across any logical WAL logging for DDL which does generate a lot of WAL traffic, it would be nice to have a way to turn that off at generation time rather than filtering it or ignoring it later. (Probably won't be an issue, just a heads-up.) I don't really see a problem there. I don't yet have a mental image of the API/parameters to START_LOGICAL_REPLICATION to specify filters on the source side, but this should be possible. (2) To match the functionality we now have, we would need the logical stream to include the *before* image of the whole tuple for each row updated or deleted. I understand that this is not needed for the use cases you are initially targeting; I just hope the design leaves this option open without needing to disturb other use cases. Perhaps this would require yet another wal_level value. Perhaps rather than testing the current value directly for determining whether to log something, the GUC processing could set some booleans for faster testing and less code churn when the initial implementation is expanded to support other use cases (like ours). Hm. I don't see a big problem implementing this although I have to say that I am a bit hesitant to do this without in-core users of it for fear of silent breakage. WAL is kind of a central thing... ;). But then, the implementation should be relatively easy. I don't see a need to break the wal level down into some booleans: changing the test from wal_level >= WAL_LEVEL_LOGICAL into something else shouldn't result in any measurable difference in that codepath. I definitely have the use-case of replicating into databases where you need to do some transformation in mind. (3) Similar to point 2, it would be extremely desirable to be able to determine table name and column names for the tuples in a stream from that stream, without needing to query a hot standby or similar digging into other sources of information. Not only will the various source databases all have different OID values for the same objects, and the aggregate targets have different values from each other and the sources, but some targets don't have the tables at all. I'm talking about our database transaction repository and the interfaces to business partners which we currently drive off of the same transaction stream which drives replication. I don't foresee this as a realistic thing. I think the required changes would be way too intrusive for too little gain. 
The performance and space requirements would probably also be rather noticeable. I think you will have to live with the mentioned 'proxy' pg instances which only contain the catalog (which normally isn't very big anyway) doing the decoding into your target database (which then doesn't need the same oids). For how I imagine those proxy instances, check my mail to Merlin earlier today; I described it in some more detail there. If I can find the time I should possibly develop (another) prototype of such a proxy instance. I don't foresee it needing much more infrastructure/code. Is that a problem for your use-case? If yes, why? Would it be helpful or just a distraction if I were to provide a more detailed description of our whole replication / transaction store / interface area? If you have it ready, yes. Otherwise I think I have a good enough image already. I tried to listen to you at pgcon and I have developed similar things before. To be honest, I would prefer you spending the time on checking some of the code if not ;) If it would be useful, I could also describe some other replication patterns I have seen over the years. In particular, one which might be interesting is where subsets of the data are distributed to multiple standalone machines which have intermittent or unreliable connections to a central site, which
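On the wal_level point above: the hot-path test Andres expects to stay cheap is a single integer comparison, so a hypothetical before-image option could hide behind the same kind of check. A sketch only, not patch code; log_logical_update_info is a made-up helper:

  /* logging decision in the heap-update path under wal_level=logical */
  if (wal_level >= WAL_LEVEL_LOGICAL)
  {
      /* log enough to reconstruct the logical change; Kevin's case (2)
       * would additionally log the whole old tuple here */
      log_logical_update_info(reln, oldtup, newtup);   /* hypothetical */
  }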
Re: [HACKERS] 9.3devel branch
On Wed, Jun 13, 2012 at 11:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Seeing as we're about to start a CommitFest, I think it's about time to create the 9.3devel branch. I'm happy to do it, but I believe it's usually Tom's gig. There are a couple open issues (like reversion of that checkpoint-skipping patch) that probably ought to get dealt with before we branch, just to save double patching. Barring objections I'll cut the branch tonight or tomorrow though. Yeah, that'd be nice to get cleaned up. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote: According to my testing, the main cost of an exception block catching a division-by-zero error is that of generating the error message, primarily sprintf()-type stuff. The cost of scanning a multi-megabyte string seems likely to be much larger. True. I ignored that there doesn't have to be an xid assigned yet... I still think its not very relevant though. I don't think it's relevant either. In the first place, I don't think that errors occurring multi megabytes into a JSON blob are going to occur often enough to be performance critical. In the second place, removing cycles from the non-error case is worth something too, probably more than making the error case faster is worth. In any case, the proposed scheme for providing context requires that you know where the error is before you can identify the context. I considered schemes that would keep track of the last N characters or line breaks in case one of them proved to be the one we need. That would add enough cycles to non-error cases to almost certainly not be desirable. I also considered trying to back up, but that doesn't look promising either for arbitrary multibyte encodings. Oh, I see. :-( Well, I guess I'll have to suck it up, then. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 16/16] current version of the design document
Hi, On Wednesday, June 13, 2012 05:39:36 PM Merlin Moncure wrote: On Wed, Jun 13, 2012 at 9:40 AM, Andres Freund and...@2ndquadrant.com wrote: Let's take the case where I have N small-ish schema-identical database shards that I want to aggregate into a single warehouse -- something that HS/SR currently can't do. There's a lot of ways to do that obviously but assuming the warehouse would have to have a unique schema, could it be done in your architecture? Not sure what you mean by the warehouse having a unique schema? It has the same schema as the OLTP counterparts? That would obviously be the easy case if you take care and guarantee uniqueness of keys upfront. That basically would be trivial ;) by unique I meant 'not the same as the shards' -- presumably this would mean one of a) each shard's data would be in a private schema folder or b) you'd have one set of tables but decorated with an extra shard-identifying column that would need to be present in all keys to get around uniqueness issues I think it would have to mean a) and that you have N of those logical import processes hanging around. We really need an identical TupleDesc to do the decoding. It gets a bit more complex if you need to transform the data for the warehouse. I don't plan to put in work to make that possible without some C coding (filling out the callbacks and doing the work in there). It shouldn't need much though. Does that answer your question? yes. Do you envision it would be possible to wrap the ApplyCache callbacks in a library that could be exposed as an extension? For example, a library that would stick the replication data into a queue that a userland (non-C) process could walk, transform, etc? I know that's vague -- my general thrust here is that I find the transformation features particularly interesting and I'm wondering how much C coding would be needed to access them in the long term. I can definitely imagine the callbacks calling some wrapper around a higher-level language. Not sure how that fits into an extension (if you mean it as in CREATE EXTENSION) though. I don't think you will be able to start the replication process from inside a normal backend. I imagine something like specifying a shared object + parameters in the config or such. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
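If that shared object + parameters idea pans out, the configuration might end up looking something like this -- entirely hypothetical parameter names, nothing like this exists in the patch set yet:

  # hypothetical postgresql.conf entries for a logical apply plugin
  logical_apply_plugin = 'queue_writer.so'
  logical_apply_options = 'queue=replication_events'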
Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
On Wednesday, June 13, 2012 05:53:31 PM Robert Haas wrote: On Wed, Jun 13, 2012 at 7:27 AM, Andres Freund and...@2ndquadrant.com wrote: Unless somebody objects I will add most of the individual patches marked as RFC to the current commitfest. I hope that with comments stemming from that round we can get several of the patches into the first or second commitfest. As soon as the design is clear/accepted we will try very hard to get the following patches into the second or third round. I made a logical replication topic within the CommitFest for this patch series. I think you should add them all there. Thanks. Added all but the bgworker patch (which is not ready) and the WalSndWakeup one (different category), which is not really relevant to the topic. I have the feeling I am due quite some reviewing this round... I have some substantive thoughts about the design as well, which I will write up in a separate email. Thanks. Looking forward to it. At least now, before I have read it. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [GENERAL] pg_upgrade from 9.0.7 to 9.1.3: duplicate key pg_authid_oid_index
On Mon, Jun 04, 2012 at 10:16:45AM -0400, Bruce Momjian wrote: I think the checks that are actually needed here are (1) bootstrap superusers are named the same, and (2) there are no roles other than the bootstrap superuser in the new cluster. You are right that it is more complex than I stated, but given the limited feedback I got on the pg_upgrade/plpython issue, I figured people didn't want to hear the details. Here they are: There are three failure modes for pg_upgrade: 1. check failure 2. schema restore failure 3. silent failure/corruption Of course, the latter items are worse than the earlier ones. The reporter got a schema restore failure while still following the pg_upgrade instructions. My initial patch changed that #2 error to a #1 error. Tom is right that creating users in the new cluster (against instructions) can still generate a #2 error if a new/old pg_authid.oid match and they are not the install user, but seeing that is something that is against the instructions, I was going to leave that as a #2. Applied and back-patched to Postgres 9.1. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
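The two checks Tom describes could be approximated with queries like these against the new cluster (a sketch only; pg_upgrade does the equivalent in C, and OID 10 is the bootstrap superuser):

  -- (1) is the bootstrap superuser named the same as in the old cluster?
  SELECT rolname FROM pg_authid WHERE oid = 10;
  -- (2) are there any other roles in the new cluster?
  SELECT count(*) FROM pg_authid WHERE oid <> 10;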
Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Robert Haas robertmh...@gmail.com writes: On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: In any case, the proposed scheme for providing context requires that you know where the error is before you can identify the context. I considered schemes that would keep track of the last N characters or line breaks in case one of them proved to be the one we need. That would add enough cycles to non-error cases to almost certainly not be desirable. I also considered trying to back up, but that doesn't look promising either for arbitrary multibyte encodings. Oh, I see. :-( Attached is a complete proposed patch for this, with some further adjustments to make the output look a bit more like what we use for SQL error context printouts (in particular, "..." at both ends of the excerpt when appropriate). One thing I did not touch, but am less than happy with, is the wording of this detail message: errdetail("Expected string, number, object, array, true, false, or null, but found \"%s\".", token), This seems uselessly verbose to me. It could be argued that enumerating all the options is helpful to rank JSON novices ... but unless you know exactly what an object is and why that's different from the other options, it's not all that helpful. I'm inclined to think that something like this would be better: errdetail("Expected JSON value, but found \"%s\".", token), Thoughts? regards, tom lane

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index e79c294..59335db 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct /* state of JSON lexe
*** 43,50 ****
  	char	   *token_start;	/* start of current token within input */
  	char	   *token_terminator; /* end of previous or current token */
  	JsonValueType token_type;	/* type of current token, once it's known */
- 	int			line_number;	/* current line number (counting from 1) */
- 	char	   *line_start;		/* start of current line within input (BROKEN!!) */
  } JsonLexContext;

  typedef enum	/* states of JSON parser */
--- 43,48 ----
*************** static void json_lex_string(JsonLexConte
*** 78,83 ****
--- 76,82 ----
  static void json_lex_number(JsonLexContext *lex, char *s);
  static void report_parse_error(JsonParseStack *stack, JsonLexContext *lex);
  static void report_invalid_token(JsonLexContext *lex);
+ static int	report_json_context(JsonLexContext *lex);
  static char *extract_mb_char(char *s);
  static void composite_to_json(Datum composite, StringInfo result, bool use_line_feeds);
*************** json_validate_cstring(char *input)
*** 185,192 ****
  	/* Set up lexing context. */
  	lex.input = input;
  	lex.token_terminator = lex.input;
- 	lex.line_number = 1;
- 	lex.line_start = input;

  	/* Set up parse stack. */
  	stacksize = 32;
--- 184,189 ----
*************** json_lex(JsonLexContext *lex)
*** 335,345 ****
  	/* Skip leading whitespace. */
  	s = lex->token_terminator;
  	while (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r')
- 	{
- 		if (*s == '\n')
- 			lex->line_number++;
  		s++;
- 	}
  	lex->token_start = s;

  	/* Determine token type. */
--- 332,338 ----
*************** json_lex(JsonLexContext *lex)
*** 350,356 ****
  	{
  		/* End of string. */
  		lex->token_start = NULL;
! 		lex->token_terminator = NULL;
  	}
  	else
  	{
--- 343,349 ----
  	{
  		/* End of string. */
  		lex->token_start = NULL;
! 		lex->token_terminator = s;
  	}
  	else
  	{
*************** json_lex(JsonLexContext *lex)
*** 397,403 ****
  		/*
  		 * We got some sort of unexpected punctuation or an otherwise
  		 * unexpected character, so just complain about that one
! 		 * character.
  		 */
  		lex->token_terminator = s + 1;
  		report_invalid_token(lex);
--- 390,397 ----
  		/*
  		 * We got some sort of unexpected punctuation or an otherwise
  		 * unexpected character, so just complain about that one
! 		 * character.  (It can't be multibyte because the above loop
! 		 * will advance over any multibyte characters.)
  		 */
  		lex->token_terminator = s + 1;
  		report_invalid_token(lex);
*************** json_lex_string(JsonLexContext *lex)
*** 443,453 ****
  			lex->token_terminator = s;
  			report_invalid_token(lex);
  		}
  		ereport(ERROR,
  				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  				 errmsg("invalid input syntax for type json"),
! 				 errdetail("line %d: Character with value \"0x%02x\" must be escaped.",
! 						   lex->line_number, (unsigned char) *s)));
  	}
  	else if (*s == '\\')
  	{
--- 437,449 ----
  			lex->token_terminator = s;
  			report_invalid_token(lex);
  		}
+ 		lex->token_terminator = s + pg_mblen(s);
  		ereport(ERROR,
  				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  				 errmsg("invalid input syntax for type json"),
! 				 errdetail("Character with value 0x%02x must be escaped.",
!
[HACKERS] uncataloged tables are a vestigial husk
While working on some code today, I noticed that RELKIND_UNCATALOGED appears to serve no useful purpose. In the few places where we check for it at all, we treat it in exactly the same way as RELKIND_RELATION. It seems that its only purpose is to serve as a placeholder inside each newly-created relcache entry until the real relkind is filled in. But this seems pretty silly, because RelationBuildLocalRelation(), where the relcache entry is created, is called in only one place, heap_create(), which already knows the relkind. So, essentially, right now, we're relying on the callers of heap_create() to pass in a relkind and then, after heap_create() returns, stick that same relkind into the relcache entry before inserting the pg_class tuple. The only place where that doesn't happen is in the bootstrap code, which instead allows RELKIND_UNCATALOGED to stick around in the relcache entry even though we have RELKIND_RELATION in the pg_class tuple. But we don't actually rely on that for anything, so it seems this is just an unnecessary complication. The attached patch cleans it up by removing RELKIND_UNCATALOGED and teaching RelationBuildLocalRelation() to set the relkind itself. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company remove-relkind-uncataloged.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
On Wed, Jun 13, 2012 at 12:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: In any case, the proposed scheme for providing context requires that you know where the error is before you can identify the context. I considered schemes that would keep track of the last N characters or line breaks in case one of them proved to be the one we need. That would add enough cycles to non-error cases to almost certainly not be desirable. I also considered trying to back up, but that doesn't look promising either for arbitrary multibyte encodings. Oh, I see. :-( Attached is a complete proposed patch for this, with some further adjustments to make the output look a bit more like what we use for SQL error context printouts (in particular, "..." at both ends of the excerpt when appropriate). I like some of these changes - in particular, the use of errcontext(), but some of them still seem off. ! DETAIL: Token "'" is invalid. ! CONTEXT: JSON data, line 1: '... This doesn't make sense to me. ! DETAIL: Character with value 0x0a must be escaped. ! CONTEXT: JSON data, line 1: "abc ! ... This seems an odd way to present this, especially if the goal is to NOT include the character needing escaping in the log unescaped, which I thought was the point of saying 0x0a. ! DETAIL: "\u" must be followed by four hexadecimal digits. ! CONTEXT: JSON data, line 1: "\u000g... Why does this end in "..." even though there's nothing further in the input string? One thing I did not touch, but am less than happy with, is the wording of this detail message: errdetail("Expected string, number, object, array, true, false, or null, but found \"%s\".", token), This seems uselessly verbose to me. It could be argued that enumerating all the options is helpful to rank JSON novices ... but unless you know exactly what an object is and why that's different from the other options, it's not all that helpful. I'm inclined to think that something like this would be better: errdetail("Expected JSON value, but found \"%s\".", token), Thoughts? Good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] initdb and fsync
On Wed, 2012-06-13 at 13:53 +0300, Peter Eisentraut wrote: The --help output for the -N option was copy-and-pasted wrongly. The message issued when using -N is also a bit content-free. Maybe something like Running in nosync mode. The data directory might become corrupt if the operating system crashes.\n Thank you, fixed. Which leads to the question, how does one get out of this state? Is running sync(1) enough? Is starting the postgres server enough? sync(1) calls sync(2), and the man page says: According to the standard specification (e.g., POSIX.1-2001), sync() schedules the writes, but may return before the actual writing is done. However, since version 1.3.20 Linux does actually wait. (This still does not guarantee data integrity: modern disks have large caches.) So it looks like sync is enough if you are on linux *and* any unprotected write cache is disabled. I don't think starting the postgres server is enough. Before, I think we were safe because we could assume that the OS would flush the buffers before you had time to store any important data. But now, that window can be much larger. There are no updates to the initdb man page included in your patch, which would be a suitable place to discuss this briefly. Thank you, I added that. Regards, Jeff Davis initdb-fsync-20120613.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
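For reference, a nosync initialization followed by a manual flush would look like this (using the option spelling from the patch; per the discussion above, sync(1) blocks on Linux, but an unprotected disk write cache can still lose the data):

  $ initdb -N -D /path/to/datadir   # skip fsync during initdb
  $ sync                            # flush OS buffers before trusting the result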
Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Robert Haas robertmh...@gmail.com writes: I like some of these changes - in particular, the use of errcontext(), but some of them still seem off. ! DETAIL: Token "'" is invalid. ! CONTEXT: JSON data, line 1: '... This doesn't make sense to me. Well, the input is two single quotes and the complaint is that the first one of them doesn't constitute a valid JSON token. What would you expect to see instead? I considered putting double quotes around the excerpt text, but we do not do that in SQL error-context reports, and it seems likely to just be confusing given the prevalence of double quotes in JSON data. But happy to consider opposing arguments. Another thing that could be done here is to print the rest of the line, rather than "...", if there's not very much of it. I'm not sure that's an improvement though. The advantage of the proposed design is that the point where the excerpt ends is exactly where the error was detected; lacking an error cursor, I don't see how else to present that info. ! DETAIL: Character with value 0x0a must be escaped. ! CONTEXT: JSON data, line 1: "abc ! ... This seems an odd way to present this, especially if the goal is to NOT include the character needing escaping in the log unescaped, which I thought was the point of saying 0x0a. Do you think it would be better to present something that isn't what the user typed? Again, I don't see an easy improvement here. If you don't want newlines in the logged context, what will we do for something like {"foo": { "bar":44 } ] There basically isn't any useful context to present unless we are willing to back up several lines, and I don't think it'll be more readable if we escape all the newlines. ! DETAIL: "\u" must be followed by four hexadecimal digits. ! CONTEXT: JSON data, line 1: "\u000g... Why does this end in "..." even though there's nothing further in the input string? There is something further, namely the trailing double quote. The examples in the regression tests are not really designed to show cases where this type of error reporting is an improvement, because there's hardly any context around the error sites. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
On 12-06-13 07:27 AM, Andres Freund wrote: It's also available in the 'cabal-rebasing' branch on git.postgresql.org/users/andresfreund/postgres.git . That branch will modify history though. That branch has a merge error in f685a11ce43b9694cbe61ffa42e396c9fbc32b05 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -I../../../../src/include -D_GNU_SOURCE -c -o xact.o xact.c xact.c:4684: error: expected identifier or ‘(’ before ‘’ token xact.c:4690:46: warning: character constant too long for its type xact.c:4712:46: warning: character constant too long for its type xact.c:4719: error: expected identifier or ‘(’ before ‘’ token xact.c:4740:46: warning: character constant too long for its type make[4]: *** [xact.o] Error 1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] uncataloged tables are a vestigial husk
Robert Haas robertmh...@gmail.com writes: While working on some code today, I noticed that RELKIND_UNCATALOGED appears to serve no useful purpose. In the few places where we check for it at all, we treat it in exactly the same way as RELKIND_RELATION. It seems that it's only purpose is to serve as a placeholder inside each newly-created relcache entry until the real relkind is filled in. I suspect that it had some actual usefulness back in Berkeley days. But now that catalogs are created with the correct relkind to start with during initdb, I agree it's probably just inertia keeping that around. The attached patch cleans it up by removing RELKIND_UNCATALOGED and teaching RelationBuildLocalRelation() to set the relkind itself. I think there are probably some places to fix in the docs too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
On Wednesday, June 13, 2012 07:11:40 PM Steve Singer wrote: On 12-06-13 07:27 AM, Andres Freund wrote: It's also available in the 'cabal-rebasing' branch on git.postgresql.org/users/andresfreund/postgres.git . That branch will modify history though. That branch has a merge error in f685a11ce43b9694cbe61ffa42e396c9fbc32b05 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -I../../../../src/include -D_GNU_SOURCE -c -o xact.o xact.c xact.c:4684: error: expected identifier or ‘(’ before ‘’ token xact.c:4690:46: warning: character constant too long for its type xact.c:4712:46: warning: character constant too long for its type xact.c:4719: error: expected identifier or ‘(’ before ‘’ token xact.c:4740:46: warning: character constant too long for its type make[4]: *** [xact.o] Error 1 Aw crap. Will fix that. Sorry. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] uncataloged tables are a vestigial husk
On Wed, Jun 13, 2012 at 1:13 PM, Tom Lane t...@sss.pgh.pa.us wrote: The attached patch cleans it up by removing RELKIND_UNCATALOGED and teaching RelationBuildLocalRelation() to set the relkind itself. I think there are probably some places to fix in the docs too. catalogs.sgml doesn't include it in the list of possible relkinds, since it never hits the disk. And grep -i uncatalog doc/src/sgml comes up empty. Where else should I be looking? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Is cachedFetchXidStatus provably valid?
It's probably an academic concern, but what happens if a backend saves off cachedFetchXidStatus and then sleeps for a very long time? During that time an xid wraparound happens, and the backend wakes up and happens to read another unhinted tuple with the same xid and a different commit status. This is obviously incredibly unlikely, but shouldn't cachedFetchXid be cleared at some appropriate point -- perhaps end of transaction? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
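A sketch of the kind of fix being floated here -- reset transam.c's one-entry cache at transaction end. This is hypothetical code, not anything in the tree; the variable declarations mirror the existing ones in transam.c:

  static TransactionId cachedFetchXid = InvalidTransactionId;
  static XidStatus cachedFetchXidStatus;

  /* hypothetical: call from end-of-transaction cleanup so a wrapped-around
   * xid can never match a commit status cached before the wraparound */
  void
  AtEOXact_TransamCache(void)
  {
      cachedFetchXid = InvalidTransactionId;
  }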
Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
On Wednesday, June 13, 2012 07:11:40 PM Steve Singer wrote: On 12-06-13 07:27 AM, Andres Freund wrote: It's also available in the 'cabal-rebasing' branch on git.postgresql.org/users/andresfreund/postgres.git . That branch will modify history though. That branch has a merge error in f685a11ce43b9694cbe61ffa42e396c9fbc32b05 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -I../../../../src/include -D_GNU_SOURCE -c -o xact.o xact.c xact.c:4684: error: expected identifier or ‘(’ before ‘’ token xact.c:4690:46: warning: character constant too long for its type xact.c:4712:46: warning: character constant too long for its type xact.c:4719: error: expected identifier or ‘(’ before ‘’ token xact.c:4740:46: warning: character constant too long for its type make[4]: *** [xact.o] Error 1 Hrmpf. I reordered the patch series one last time to be clearer and I somehow didn't notice this. I compile-tested the now pushed head (7e0340a3bef927f79b3d97a11f94ede4b911560c) and will submit an updated patch [10/16]. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node
The previous mail contained a patch with a mismerge caused by reordering commits. Corrected version attached. Thanks to Steve Singer for noticing this quickly. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

From efb28b1f931a30738faac83703810b598a82a6ee Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sat, 9 Jun 2012 14:49:34 +0200
Subject: [PATCH] Introduce the concept that wal has an 'origin' node

One solution to avoid loops when doing wal based logical replication in
topologies which are more complex than one unidirectional transport is
introducing the concept of an 'origin_id' into the wal stream. Luckily there
is some padding in the XLogRecord struct that allows us to add that field
without further bloating the struct. This solution was chosen because it
allows for just about any topology and is unobtrusive.

This adds a new configuration parameter multimaster_node_id which determines
the id used for wal originating in one cluster. When applying changes from
wal from another cluster, code can set the variable
current_replication_origin_id. This is a global variable because passing it
through everything which can generate wal would be far too intrusive.
---
 src/backend/access/transam/xact.c             | 54 +
 src/backend/access/transam/xlog.c             |  3 +-
 src/backend/access/transam/xlogreader.c       |  2 +
 src/backend/replication/logical/Makefile      |  2 +-
 src/backend/replication/logical/logical.c     | 19 +
 src/backend/utils/misc/guc.c                  | 19 +
 src/backend/utils/misc/postgresql.conf.sample |  3 ++
 src/include/access/xlog.h                     |  4 +-
 src/include/access/xlogdefs.h                 |  2 +
 src/include/replication/logical.h             | 22 ++
 10 files changed, 100 insertions(+), 30 deletions(-)
 create mode 100644 src/backend/replication/logical/logical.c
 create mode 100644 src/include/replication/logical.h

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 3cc2bfa..79ec19a 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -36,8 +36,9 @@
 #include "libpq/be-fsstubs.h"
 #include "miscadmin.h"
 #include "pgstat.h"
-#include "replication/walsender.h"
+#include "replication/logical.h"
 #include "replication/syncrep.h"
+#include "replication/walsender.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "storage/procarray.h"
@@ -4545,12 +4546,13 @@ xactGetCommittedChildren(TransactionId **ptr)
  * actions for which the order of execution is critical.
  */
 static void
-xact_redo_commit_internal(TransactionId xid, XLogRecPtr lsn,
-						  TransactionId *sub_xids, int nsubxacts,
-						  SharedInvalidationMessage *inval_msgs, int nmsgs,
-						  RelFileNode *xnodes, int nrels,
-						  Oid dbId, Oid tsId,
-						  uint32 xinfo)
+xact_redo_commit_internal(TransactionId xid, RepNodeId originating_node,
+						  XLogRecPtr lsn, XLogRecPtr origin_lsn,
+						  TransactionId *sub_xids, int nsubxacts,
+						  SharedInvalidationMessage *inval_msgs, int nmsgs,
+						  RelFileNode *xnodes, int nrels,
+						  Oid dbId, Oid tsId,
+						  uint32 xinfo)
 {
 	TransactionId max_xid;
 	int			i;
@@ -4659,8 +4661,8 @@ xact_redo_commit_internal(TransactionId xid, XLogRecPtr lsn,
  * Utility function to call xact_redo_commit_internal after breaking down xlrec
  */
 static void
-xact_redo_commit(xl_xact_commit *xlrec,
-				 TransactionId xid, XLogRecPtr lsn)
+xact_redo_commit(xl_xact_commit *xlrec, RepNodeId originating_node,
+				 TransactionId xid, XLogRecPtr lsn)
 {
 	TransactionId *subxacts;
 	SharedInvalidationMessage *inval_msgs;
@@ -4670,27 +4672,26 @@ xact_redo_commit(xl_xact_commit *xlrec,
 	/* invalidation messages array follows subxids */
 	inval_msgs = (SharedInvalidationMessage *) &(subxacts[xlrec->nsubxacts]);
-	xact_redo_commit_internal(xid, lsn, subxacts, xlrec->nsubxacts,
-							  inval_msgs, xlrec->nmsgs,
-							  xlrec->xnodes, xlrec->nrels,
-							  xlrec->dbId,
-							  xlrec->tsId,
-							  xlrec->xinfo);
+	xact_redo_commit_internal(xid, originating_node, lsn, xlrec->origin_lsn,
+							  subxacts, xlrec->nsubxacts, inval_msgs,
+							  xlrec->nmsgs, xlrec->xnodes, xlrec->nrels,
+							  xlrec->dbId, xlrec->tsId, xlrec->xinfo);
 }

 /*
  * Utility function to call xact_redo_commit_internal for compact form of message.
  */
 static void
-xact_redo_commit_compact(xl_xact_commit_compact *xlrec,
-						 TransactionId xid, XLogRecPtr lsn)
+xact_redo_commit_compact(xl_xact_commit_compact *xlrec, RepNodeId originating_node,
+						 TransactionId xid, XLogRecPtr lsn)
 {
-	xact_redo_commit_internal(xid, lsn, xlrec->subxacts, xlrec->nsubxacts,
-							  NULL, 0,
Re: [HACKERS] uncataloged tables are a vestigial husk
Robert Haas robertmh...@gmail.com writes: On Wed, Jun 13, 2012 at 1:13 PM, Tom Lane t...@sss.pgh.pa.us wrote: The attached patch cleans it up by removing RELKIND_UNCATALOGED and teaching RelationBuildLocalRelation() to set the relkind itself. I think there are probably some places to fix in the docs too. catalogs.sgml doesn't include it in the list of possible relkinds, since it never hits the disk. And grep -i uncatalog doc/src/sgml comes up empty. Where else should I be looking? Huh. Okay, there probably isn't anyplace then. I'm surprised we didn't list it in catalogs.sgml, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC][PATCH] BDR Prototype startup instructions (not prematurely sent this time)
Hi, The patch as of yet doesn't contain how you actually can use the prototype... Obviously at this point it's not very convenient:

I have two config files:

Node 1:
  port = 5501
  wal_level = logical
  max_wal_senders = 10
  wal_keep_segments = 200
  multimaster_conninfo = 'port=5502 host=/tmp'
  multimaster_node_id = 1

Node 2:
  port = 5502
  wal_level = logical
  max_wal_senders = 10
  wal_keep_segments = 200
  multimaster_conninfo = 'port=5501 host=/tmp'
  multimaster_node_id = 2

after initdb'ing the first cluster (initdb required):

  $ ~/src/postgresql/build/assert/src/backend/postgres -D ~/tmp/postgres/bdr/1/datadir/ -c config_file=~/tmp/postgres/bdr/1/postgresql.conf -c hba_file=~/tmp/postgres/bdr/1/pg_hba.conf -c ident_file=~/tmp/postgres/bdr/1/pg_ident.conf

  $ psql -p 5501 -U andres postgres
  CREATE TABLE data(id serial primary key, data bigint);
  ALTER SEQUENCE data_id_seq INCREMENT 2;
  SELECT setval('data_id_seq', 1);

shutdown cluster

  $ rsync -raxv --delete /home/andres/tmp/postgres/bdr/1/datadir/* /home/andres/tmp/postgres/bdr/2/datadir

start both clusters, which should sync after some output.

  $ psql -p 5501 -U andres postgres
  INSERT INTO data(data) VALUES (random()*1) RETURNING *;
  INSERT INTO data(data) VALUES (random()*1) RETURNING *;
  INSERT INTO data(data) VALUES (random()*1) RETURNING *;
  INSERT INTO data(data) VALUES (random()*1) RETURNING *;

  $ psql -p 5502 -U andres postgres
  SELECT setval('data_id_seq', 2);
  INSERT INTO data(data) VALUES (random()*1) RETURNING *;
  INSERT INTO data(data) VALUES (random()*1) RETURNING *;
  INSERT INTO data(data) VALUES (random()*1) RETURNING *;
  INSERT INTO data(data) VALUES (random()*1) RETURNING *;
  INSERT INTO data(data) VALUES (random()*1) RETURNING *;

  postgres=# SELECT * FROM data;
   id | data
  ----+------
    3 |  396
    5 | 2522
    7 |  275
    9 | 9632
   11 | 1176
    4 | 6977
    6 | 7339
    8 | 6383
   10 | 4600
   12 | 8878
   14 | 1987
  (11 rows)

  $ psql -p 5501 -U andres postgres
  postgres=# SELECT * FROM data;
   id | data
  ----+------
    3 |  396
    5 | 2522
    7 |  275
    9 | 9632
   11 | 1176
    4 | 6977
    6 | 7339
    8 | 6383
   10 | 4600
   12 | 8878
   14 | 1987
  (11 rows)

  DELETE FROM data;

  $ psql -p 5502 -U andres postgres
  SELECT * FROM data;
   id | data
  ----+------
  (0 rows)

There is not much more you can do at this time. You can run an unmodified pgbench if you do it before the rsync and if you prohibit vacuum (-n), which actually does a truncate on the history table.

Hope that helps the brave souls already trying this, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
On Wed, Jun 13, 2012 at 1:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I like some of these changes - in particular, the use of errcontext(), but some of them still seem off. ! DETAIL: Token "'" is invalid. ! CONTEXT: JSON data, line 1: '... This doesn't make sense to me. Well, the input is two single quotes and the complaint is that the first one of them doesn't constitute a valid JSON token. What would you expect to see instead? Oh, I see. Another thing that could be done here is to print the rest of the line, rather than "...", if there's not very much of it. I'm not sure that's an improvement though. The advantage of the proposed design is that the point where the excerpt ends is exactly where the error was detected; lacking an error cursor, I don't see how else to present that info. OK. ! DETAIL: Character with value 0x0a must be escaped. ! CONTEXT: JSON data, line 1: "abc ! ... This seems an odd way to present this, especially if the goal is to NOT include the character needing escaping in the log unescaped, which I thought was the point of saying 0x0a. Do you think it would be better to present something that isn't what the user typed? Again, I don't see an easy improvement here. If you don't want newlines in the logged context, what will we do for something like {"foo": { "bar":44 } ] There basically isn't any useful context to present unless we are willing to back up several lines, and I don't think it'll be more readable if we escape all the newlines. Hmm. If your plan is to trace back to the opening brace you were expecting to match, I don't think that's going to work either. What if there are three pages (or 3MB) of data in between? The examples in the regression tests are not really designed to show cases where this type of error reporting is an improvement, because there's hardly any context around the error sites. Yeah, true. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Add ERROR msg for GLOBAL/LOCAL TEMP is not yet implemented
On Mon, Jun 11, 2012 at 09:18:39PM -0400, Robert Haas wrote: I guess the remaining question is whether to do it only for LOCAL TEMP tables or also for GLOBAL TEMP ones. A survey of what other products do might be of some value. Thanks for investigating. Sybase ASE, which I include only because it is one of the few systems that actually support the CREATE LOCAL TEMPORARY TABLE syntax, appears to give them the same semantics as our existing temp tables: session local. Sybase ASE also includes two kinds of global temporary tables: non-shared - i.e. permanent tables with session-local contents - and shared - i.e. what we call unlogged tables, except that they don't survive a clean shutdown. http://dcx.sybase.com/1200/en/dbreference/create-local-temporary-table-statement.html http://dcx.sybase.com/1200/en/dbusage/temporary-tables.html FWIW, that's SQL Anywhere, not ASE. ASE is closer to Microsoft SQL Server in this area. So I can't find any evidence that any database product in existence uses CREATE LOCAL TEMPORARY TABLE to mean anything other than what CREATE TEMPORARY TABLE does in PostgreSQL, and there's at least one where it means exactly the thing that we do. Given that, I am inclined to think that we should only warn about using GLOBAL TEMP, and not LOCAL TEMP. It seems needlessly hard-headed to warn about using a syntax for which there are no existing, incompatible implementations and for which we have no plans to change the existing semantics. YMMV, of course. Oracle Rdb implemented the SQL standard behavior: http://www.oracle.com/technetwork/products/rdb/implementing-procedure-result-sets-091225.html So, one implementation mirrors our current CREATE LOCAL TEMPORARY TABLE semantics and another implements SQL standard semantics. No classic migration source product implements the syntax at all. Given that, I think we should make the decision independent of migration concerns. Our continuing users will be quicker to accept the need to remove GLOBAL than LOCAL; the table is not at all global but is, informally, local. Future users will benefit from a self-consistent system. Though it's difficult to quantify, future users also benefit from a system following the SQL standard. Given that, how about warning on GLOBAL only but having the documentation equally discourage use of both? nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 14/16] Add module to apply changes from an apply-cache using low-level functions
On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com wrote: From: Andres Freund and...@anarazel.de We decided to use low level functions to do the apply instead of producing sql statements containing the data (or using prepared statements) because both the text conversion and the full executor overhead introduce significant overhead which is unnecessary if you're using the same version of pg on the same architecture. There are loads of use cases that require different methods of applying, though - so the part doing the applying from an ApplyCache is just a bunch of well abstracted callbacks getting passed all the required knowledge to change the data representation into other formats. It's important to make sure that it's not going to be *too* difficult to jump through the hoops necessary to apply changes on a different version. While pg_upgrade has diminished the need to use replication to handle cross-version/architecture upgrades, I don't think it has brought that to zero. One other complication I'll observe... The assumption is being made that UPDATE/DELETE will be handled via The Primary Key. For the most part, I approve of this. Once upon a time, Slony had a misfeature where you could tell it to add in a surrogate primary key, and that caused no end of trouble. However, the alternative, that *does* seem to work alright, is to allow selecting a candidate primary key, that is, a set of columns that have UNIQUE + NOT NULL characteristics. I could see people having a problem switching over to use this system if they MUST begin with a 'Right Proper Primary Key.' If they start with a system with a 2TB table full of data that lacks that particular constraint, that could render them unable to use the facility. Missing: - TOAST handling. For physical apply not much needs to be done because the toast inserts will have been made beforehand. There needs to be an option in ApplyCache that helps reassemble TOAST datums to make it easier to write apply modules which convert to text. Dumb question: Is it possible that two nodes would have a different idea as to which columns need to get toasted? I should think it possible for nodes to be configured with different values for TOAST policies, and while it's likely pretty dumb to set them to have different handlings, it would seem undesirable to not bother looking, and find the backend crashing due to an unnoticed mismatch. -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?" -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Robert Haas robertmh...@gmail.com writes: On Wed, Jun 13, 2012 at 1:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: ! DETAIL: Character with value 0x0a must be escaped. ! CONTEXT: JSON data, line 1: "abc ! ... This seems an odd way to present this, especially if the goal is to NOT include the character needing escaping in the log unescaped, which I thought was the point of saying 0x0a. Do you think it would be better to present something that isn't what the user typed? Again, I don't see an easy improvement here. If you don't want newlines in the logged context, what will we do for something like {"foo": { "bar":44 } ] Hmm. If your plan is to trace back to the opening brace you were expecting to match, I don't think that's going to work either. What if there are three pages (or 3MB) of data in between? No, that's not the proposal; I only anticipate printing a few dozen characters of context. But that could still mean printing something like DETAIL: expected "," or "}", but found "]". CONTEXT: JSON data, line 123: ..."bar":44 } ] which I argue is much more useful than just seeing the "]". So the question is whether it's still as useful if we mangle the whitespace. I'm thinking it's not. We don't mangle whitespace when printing SQL statements into the log, anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
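Put together, the proposed reporting for the example above would come out roughly like this (illustrative session only; whitespace in the data preserved, as argued):

  =# SELECT '{"foo": { "bar":44 } ]'::json;
  ERROR:  invalid input syntax for type json
  DETAIL:  Expected "," or "}", but found "]".
  CONTEXT:  JSON data, line 1: ..."bar":44 } ]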
Re: [HACKERS] [COMMITTERS] pgsql: Add ERROR msg for GLOBAL/LOCAL TEMP is not yet implemented
Noah Misch n...@leadboat.com writes: Given that, how about warning on GLOBAL only but having the documentation equally discourage use of both? Yeah, that's about what I was thinking, too. Any thoughts on the wording of the WARNING message? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 14/16] Add module to apply changes from an apply-cache using low-level functions
On Wednesday, June 13, 2012 08:50:42 PM Christopher Browne wrote: On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com wrote: From: Andres Freund and...@anarazel.de We decided to use low level functions to do the apply instead of producing sql statements containing the data (or using prepared statements) because both the text conversion and the full executor overhead introduce significant overhead which is unnecessary if you're using the same version of pg on the same architecture. There are loads of use cases that require different methods of applying, though - so the part doing the applying from an ApplyCache is just a bunch of well abstracted callbacks getting passed all the required knowledge to change the data representation into other formats. It's important to make sure that it's not going to be *too* difficult to jump through the hoops necessary to apply changes on a different version. I agree. But I don't see it as a feature of the first version for the moment. Getting a base set of features into 9.3 is going to be hard enough as-is. But I think there is enough interest from all sides to make that possible cross-version. While pg_upgrade has diminished the need to use replication to handle cross-version/architecture upgrades, I don't think it has brought that to zero. Agreed. One other complication I'll observe... The assumption is being made that UPDATE/DELETE will be handled via The Primary Key. For the most part, I approve of this. Once upon a time, Slony had a misfeature where you could tell it to add in a surrogate primary key, and that caused no end of trouble. However, the alternative, that *does* seem to work alright, is to allow selecting a candidate primary key, that is, a set of columns that have UNIQUE + NOT NULL characteristics. I could see people having a problem switching over to use this system if they MUST begin with a 'Right Proper Primary Key.' If they start with a system with a 2TB table full of data that lacks that particular constraint, that could render them unable to use the facility. It wouldn't need that much code to allow candidate keys. The data representation in the catalogs is a bit unfriendly for that, but there has been talk about changing that for some time now. I am not convinced that it's worth the cpu cycles though. Btw, you can convert a unique key to a primary key since 9.1; the unique index can be created CONCURRENTLY beforehand. Missing: - TOAST handling. For physical apply not much needs to be done because the toast inserts will have been made beforehand. There needs to be an option in ApplyCache that helps reassemble TOAST datums to make it easier to write apply modules which convert to text. Dumb question: Is it possible that two nodes would have a different idea as to which columns need to get toasted? I should think it possible for nodes to be configured with different values for TOAST policies, and while it's likely pretty dumb to set them to have different handlings, it would seem undesirable to not bother looking, and find the backend crashing due to an unnoticed mismatch. I don't think it should be possible to configure the toast configurations differently if you use the binary apply mode. But even if it were, a value which is toasted although the local policy says it should not be wouldn't cause any problems as far as I can see. The one thing that could cause problems there is different page sizes et al., but that needs to be prohibited anyway. 
Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
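The conversion Andres mentions looks like this on 9.1 or later; the unique index can be built without blocking writes, then promoted (this assumes the key column is already NOT NULL):

  CREATE UNIQUE INDEX CONCURRENTLY data_pkey ON data (id);
  ALTER TABLE data ADD CONSTRAINT data_pkey PRIMARY KEY USING INDEX data_pkey;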
Re: [HACKERS] Backup docs
Magnus Hagander mag...@hagander.net writes: I would like to see that page changed to list pg_basebackup as the default way of doing base backups, and then list the manual way as an option if you need more flexibility. +1 I'd also like to add pg_basebackup -x under standalone hot backups, again as the main option. +1 I also wonder if we need a tl;dr section of that whole page that just goes through *what to do*, rather than why we do it? Of course not removing the details, just showing the simplest case in, um, a simpler way? +1 Come to think of it, that is the perfect occasion to have the tutorial open itself up to dealing with admin tasks, right? Please let's apply that documentation patch to 9.2 too. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
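For that tl;dr section, the simplest case really is one line -- options as of the 9.2-era tool, with -x including the WAL needed to make the backup standalone and -P showing progress:

  $ pg_basebackup -D /backups/base -x -P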
Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.
Tom Lane t...@sss.pgh.pa.us writes: I thought I already pointed that out, but: we have *extensions*. What we don't have is a convenient method of dealing with functions that need to be migrated across extensions, or from an extension to core, between one major release and the next. It would clearly be nice to have that someday, but we don't have it now. Designing on the assumption that 9.3 Well, my patch for 9.2, called "Finer Extension Dependencies", was all about supporting that. The idea is to be able to name a set of functions (or other objects), then set up a dependency graph towards that arbitrary name. Then it's possible to migrate that named set of objects from one extension to another, and it's possible for core to publish a list of the names of the object sets it provides. https://commitfest.postgresql.org/action/patch_view?id=727 Drawing inspiration from some other development facilities I enjoy spending my time with, I called such a set a "feature" and added ways for extensions to provide and require them, as has been done in some lisps for more than three decades now. I'm not wedded to those terms, which have brought confusion to the table before; please suggest other ones if you think you like the feature. I'm going to have this patch in the next CF so that we can talk about it again, as I think it is actually designed to help us fix the problem here. The only missing part in the patch is allowing core to declare the sets of objects (a set of features, in its current terminology) that it brings to the table. Such a list already exists, though, and is using the same terminology as in my patch: http://www.postgresql.org/docs/9.2/static/features-sql-standard.html We wouldn't publish only the standard-compliant feature list with such a mechanism, though, or it would be quite useless for our operations here. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Add ERROR msg for GLOBAL/LOCAL TEMP is not yet implemented
On Wed, Jun 13, 2012 at 02:56:58PM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: Given that, how about warning on GLOBAL only but having the documentation equally discourage use of both? Yeah, that's about what I was thinking, too. Any thoughts on the wording of the WARNING message? My patch used "GLOBAL is deprecated in temporary table creation", which still seems fine to me. Here's an update based on this discussion.

*** a/doc/src/sgml/ref/create_table.sgml
--- b/doc/src/sgml/ref/create_table.sgml
***************
*** 163,169 ****
  CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
    <para>
     Optionally, <literal>GLOBAL</literal> or <literal>LOCAL</literal>
     can be written before <literal>TEMPORARY</> or <literal>TEMP</>.
!    This makes no difference in <productname>PostgreSQL</>, but see
     <xref linkend="sql-createtable-compatibility"
     endterm="sql-createtable-compatibility-title">.
    </para>
--- 163,170 ----
    <para>
     Optionally, <literal>GLOBAL</literal> or <literal>LOCAL</literal>
     can be written before <literal>TEMPORARY</> or <literal>TEMP</>.
!    This presently makes no difference in <productname>PostgreSQL</>
!    and is deprecated; see
     <xref linkend="sql-createtable-compatibility"
     endterm="sql-createtable-compatibility-title">.
    </para>
***************
*** 1310,1316 ****
  CREATE TABLE employees OF employee_type (
    <productname>PostgreSQL</productname> does not have.
    For compatibility's sake, <productname>PostgreSQL</productname> will
    accept the <literal>GLOBAL</literal> and <literal>LOCAL</literal> keywords
!   in a temporary table declaration, but they have no effect.
    </para>
    <para>
--- 1311,1318 ----
    <productname>PostgreSQL</productname> does not have.
    For compatibility's sake, <productname>PostgreSQL</productname> will
    accept the <literal>GLOBAL</literal> and <literal>LOCAL</literal> keywords
!   in a temporary table declaration, but they have no effect.  Use in new
!   applications is discouraged.
    </para>
    <para>
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
***************
*** 2514,2521 ****
  OptTemp:    TEMPORARY           { $$ = RELPERSISTENCE_TEMP; }
              | TEMP              { $$ = RELPERSISTENCE_TEMP; }
              | LOCAL TEMPORARY   { $$ = RELPERSISTENCE_TEMP; }
              | LOCAL TEMP        { $$ = RELPERSISTENCE_TEMP; }
!             | GLOBAL TEMPORARY  { $$ = RELPERSISTENCE_TEMP; }
!             | GLOBAL TEMP       { $$ = RELPERSISTENCE_TEMP; }
              | UNLOGGED          { $$ = RELPERSISTENCE_UNLOGGED; }
              | /*EMPTY*/         { $$ = RELPERSISTENCE_PERMANENT; }
          ;
--- 2514,2533 ----
              | TEMP              { $$ = RELPERSISTENCE_TEMP; }
              | LOCAL TEMPORARY   { $$ = RELPERSISTENCE_TEMP; }
              | LOCAL TEMP        { $$ = RELPERSISTENCE_TEMP; }
!             | GLOBAL TEMPORARY
!                 {
!                     ereport(WARNING,
!                             (errmsg("GLOBAL is deprecated in temporary table creation"),
!                              parser_errposition(@1)));
!                     $$ = RELPERSISTENCE_TEMP;
!                 }
!             | GLOBAL TEMP
!                 {
!                     ereport(WARNING,
!                             (errmsg("GLOBAL is deprecated in temporary table creation"),
!                              parser_errposition(@1)));
!                     $$ = RELPERSISTENCE_TEMP;
!                 }
              | UNLOGGED          { $$ = RELPERSISTENCE_UNLOGGED; }
              | /*EMPTY*/         { $$ = RELPERSISTENCE_PERMANENT; }
          ;
***************
*** 8930,8940 ****
  OptTempTableName:
--- 8942,8958 ----
          }
          | GLOBAL TEMPORARY opt_table qualified_name
              {
+                 ereport(WARNING,
+                         (errmsg("GLOBAL is deprecated in temporary table creation"),
+                          parser_errposition(@1)));
                  $$ = $4;
                  $$->relpersistence = RELPERSISTENCE_TEMP;
Re: [HACKERS] temporal support patch
There would be no problem making my solution compatible with SQL 2011, but the standard is not freely available. Can anybody provide me with this standard? 2012/5/20 Pavel Stehule pavel.steh...@gmail.com Hello 2012/5/18 Miroslav Šimulčík simulcik.m...@gmail.com: Hello. The SQL 2011 standard wasn't available at the time I started this project, so I built my implementation on the older standards TSQL2 and SQL/Temporal, which were the only ones available. None of these were accepted by the ANSI/ISO commissions, however. SQL 2011 uses different syntax, which looks like the one IBM DB2 had been using even before the standard was published. So my implementation differs in syntax, but the features are the same as stated in the system-versioned tables part of the slideshow. I would like to see temporal functionality in pg, but only in SQL 2011 syntax. Using syntax from deprecated proposals makes no sense. I am not sure the history table concept is best from a performance view - it is simpler for implementation, but you duplicate all indexes - there will be a lot of redundant fields in the history table. An important query is the difference in cost between some non-trivial query on actual data and the same query on historic data. Regards Pavel Stehule Regards Miroslav Simulcik 2012/5/17 Pavel Stehule pavel.steh...@gmail.com Hello what is the conformance of your solution with the temporal extension in ANSI SQL 2011? http://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438 Regards Pavel Stehule 2012/5/16 Miroslav Šimulčík simulcik.m...@gmail.com: Hi all, as a part of my master's thesis I have created a temporal support patch for PostgreSQL. It enables the creation of special temporal tables with entry versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on these tables don't cause permanent changes to entries, but create new versions of them. Thus the user can easily get to past states of the table. Basic information on temporal databases can be found at http://en.wikipedia.org/wiki/Temporal_database In the field of temporal databases, only proprietary solutions are available. During the analysis I found these: - IBM DB2 10 for z/OS - Oracle 11g Workspace Manager - Teradata Database 13.10 The primary goal of my work was the creation of an open-source solution that is easy to use and is backward compatible with existing applications, so that changing the original tables to temporal ones does not require changes to the applications that work with them. This patch is built on the standard SQL/Temporal with some minor modifications inspired by commercial temporal database systems. Currently it only deals with transaction-time support. Here is a simple description of how it works: 1. 
A user can create a transaction time table using the modified CREATE TABLE command:

    CREATE TABLE person(name varchar(50)) AS TRANSACTIONTIME;

This command automatically creates all objects required for transaction time support:

                 List of relations
 Schema |         Name         |   Type   |  Owner
--------+----------------------+----------+----------
 public | person               | table    | tester
 public | person__entry_id_seq | sequence | tester
 public | person_hist          | table    | postgres

                                   Table "public.person"
   Column   |            Type             |                        Modifiers
------------+-----------------------------+-------------------------------------------------------------
 name       | character varying(50)       |
 _entry_id  | bigint                      | not null default nextval('person__entry_id_seq'::regclass)
 _sys_start | timestamp without time zone | not null default clock_timestamp()
 _sys_end   | timestamp without time zone | not null default '294276-12-31 23:59:59.99'::timestamp without time zone
Indexes:
    "person__entry_id_idx" btree (_entry_id)
    "person__sys_start__sys_end_idx" btree (_sys_start, _sys_end)

             Table "public.person_hist"
   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
 name       | character varying(50)       |
 _entry_id  | bigint                      | not null
 _sys_start | timestamp without time zone | not null
 _sys_end   | timestamp without time zone | not null
Indexes:
    "person_hist__entry_id_idx" btree (_entry_id)
    "person_hist__sys_start__sys_end_idx" btree (_sys_start, _sys_end)

Table person stores current versions of entries. 3 additional columns are added: _entry_id - id of entry. It groups together
Re: [HACKERS] [PATCH] Support for foreign keys with arrays
Hi Noah, Il 10/06/12 22:53, Noah Misch ha scritto: This has bitrotted; please refresh. Also, please evaluate Peter's feedback: http://archives.postgresql.org/message-id/1333693277.32606.9.ca...@vanquo.pezone.net Our goal is to work on this patch from the next commit fest. What we are about to do for this commit fest is to split the previous patch and send a small one just for the array_remove() and array_replace() functions. Then we will sit down and organise the development of the feature according to Peter's feedback. It is important indeed that we find a commonly accepted terminology and syntax for this feature. I hope this sounds like a reasonable plan. Thank you. Cheers, Gabriele -- Gabriele Bartolini - 2ndQuadrant Italia PostgreSQL Training, Services and Support gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
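To make the scope of that smaller patch concrete, the intended behavior of the two functions is roughly the following (signatures and result values assumed from the discussion, not from a committed API):

    -- remove all occurrences of an element (assumed semantics)
    SELECT array_remove(ARRAY[1,2,3,2], 2);       -- expected: {1,3}
    -- replace all occurrences of one element with another
    SELECT array_replace(ARRAY[1,2,3,2], 2, 5);   -- expected: {1,5,3,5}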
Re: [HACKERS] temporal support patch
2012/5/30 Jim Nasby j...@nasby.net On 5/18/12 2:06 AM, Miroslav Šimulčík wrote: - no data redundancy - in my extension current versions of entries are stored only once, in the original table (in table_log, entries are inserted into both the original and the log table) That's not necessarily a benefit... it makes querying for both history *and* current data a lot more complex. Table inheritance might be an elegant solution to that, but I doubt you could just bolt that on top of what you've created. Yes, querying for history data is more complex, but I focused on preserving the performance of current queries. That's the reason why I use a separate table for old versions. Table inheritance is a very good idea and it will not require much effort to use it in my solution. Currently, when a user queries the whole history of entries, the table reference in the FROM clause is replaced with a subselect, which accesses data in both tables. For example, when the user executes the command: NONSEQUENCED TRANSACTIONTIME SELECT * FROM person; the actually executed command is: SELECT * FROM (SELECT * FROM person UNION ALL SELECT * FROM person_hist) AS person Use of table inheritance can make things simpler and more elegant, but I'm not sure how it affects performance. Will it cause a gain in performance? The timestamp fields need to have timezone info. If you change the timezone for a connection you will get inconsistent results without it. _sys_end should either be NULLable or, if it's going to have a magic value, that magic value should be Infinity: Good point. I will use timestamp with timezone and the value Infinity instead of the max timestamp value
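A minimal sketch of the inheritance idea being discussed (hypothetical; table names follow the example above, and whether the planner handles this as well as the UNION ALL rewrite is exactly the open performance question):

    -- hypothetical: make the history table a child of the current table
    CREATE TABLE person_hist () INHERITS (person);
    SELECT * FROM person;        -- current rows plus history, via inheritance
    SELECT * FROM ONLY person;   -- current rows only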
Re: [HACKERS] Is cachedFetchXidStatus provably valid?
Merlin Moncure mmonc...@gmail.com writes: It's probably an academic concern, but what happens if a backend saves off cachedFetchXidStatus and then sleeps for a very long time. During that time an xid wraparound happens and the backend wakes up and happens to read another unhinted tuple with the same xid and a different commit status. This is obviously incredibly unlikely, but shouldn't cachedFetchXid be cleared at some appropriate point -- perhaps end of transaction? Well, aside from what the odds might be of hitting the case if you did manage to sleep through an XID wraparound, I think it's impossible for a backend to sleep that long, because of cache inval signals. (Or, to put it differently, a backend has typically got a whole lot of XIDs cached within tuples in its syscaches. cachedFetchXidStatus is the least of its worries if it fails to engage in cache inval activity.) If we had a multiple-entry cache in place of the single-entry cache, I think this would be a more realistic concern. You'd need some way to flush old entries from that cache, rather than being able to expect that the single entry would get overwritten with newer values anytime something happened. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] temporal support patch
* I'd very much like to see you make use of Range Types from 9.2; in particular, TSTZRANGE would be much better than holding two timestamps. If a standard requires you to display two timestamps in certain situations, perhaps you could use ranges internally and display the boundaries as timestamps when needed. I agree, the new range types will be ideal for this. * There is other useful information that could be recorded, such as the user who inserted/updated/deleted the record. Yes, I considered adding user ID and transaction ID columns, because they can be useful in some cases (for example to find all changes made by a transaction). However it wasn't necessary, so I omitted it. It can be easily added. * For some purposes, it's very useful to keep track of the columns that changed. For instance, a query like "show me any time a salary was changed over the last month" (or some other rare event) would be very slow to run if there was not some explicit annotation on the historical records (e.g. a "columns changed" bitmap or something). Another useful feature. I can take a look at it. * As Jim mentioned, it might make sense to use something resembling inheritance so that selecting from the historical table includes the current data (but with no upper bound for the range). See my reply to Jim's post. * It might make sense to hammer out as many of the details as we can with an extension. For instance, exactly what options will be available, what data types will be used, what objects will be created, the trigger code, etc. Then, it will be more obvious exactly what we need to add extra core support for (e.g. if we are going to use some inheritance-like mechanism), and what we need to add syntax sugar for. I recommend that you start posting more detailed designs on http://wiki.postgresql.org In which section of the wiki can I post the detailed design of my solution? If you already have code, feel free to submit it for the next commitfest ( http://commitfest.postgresql.org ), but this is a relatively large project, so it will most likely take several commitfest cycles. I have a working patch for PostgreSQL version 9.0.4, but it needs refactoring before I can submit it, because some parts don't meet the formatting requirements yet. And yes, the changes are large, so it will be better to discuss the design first and then deal with the code. Do you insist on compatibility with standard SQL 2011, as Pavel wrote?
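A rough sketch of the TSTZRANGE idea from the first point above, collapsing the two timestamp columns into one range column (column names and defaults assumed, following the table layout shown earlier in the thread):

    CREATE TABLE person (
        name       varchar(50),
        _entry_id  bigint NOT NULL,
        -- half-open range; an unbounded upper end marks the current version
        _sys_when  tstzrange NOT NULL DEFAULT tstzrange(clock_timestamp(), NULL)
    );
    SELECT * FROM person WHERE upper_inf(_sys_when);  -- current versions only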
Re: [HACKERS] temporal support patch
On Wed, Jun 13, 2012 at 4:10 PM, Miroslav Šimulčík simulcik.m...@gmail.com wrote: I have a working patch for PostgreSQL version 9.0.4, but it needs refactoring before I can submit it, because some parts don't meet the formatting requirements yet. And yes, the changes are large, so it will be better to discuss the design first and then deal with the code. Do you insist on compatibility with standard SQL 2011, as Pavel wrote? Standards compliance is always going to make things easier in terms of gaining community acceptance if you're targeting in-core adoption. At the very least it will remove one barrier, although you might be in for a slog for other reasons. You may not have known this, but postgres had a time travel feature waaay back in the day (see: http://www.postgresql.org/docs/6.3/static/c0503.htm). It was removed for performance reasons, and the first thing I'm wondering is how your stuff performs in various scenarios, among various other interesting things. Also, +1 on use of range types. Anyways, thanks for submitting and good luck! merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is cachedFetchXidStatus provably valid?
On Wed, Jun 13, 2012 at 3:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: It's probably an academic concern, but what happens if a backend saves off cachedFetchXidStatus and then sleeps for a very long time. During that time an xid wraparound happens and the backend wakes up and happens to read another unhinted tuple with the same xid and a different commit status. This is obviously incredibly unlikely, but shouldn't cachedFetchXid be cleared at some appropriate point -- perhaps end of transaction? Well, aside from what the odds might be of hitting the case if you did manage to sleep through an XID wraparound, I think it's impossible for a backend to sleep that long, because of cache inval signals. (Or, to put it differently, a backend has typically got a whole lot of XIDs cached within tuples in its syscaches. cachedFetchXidStatus is the least of its worries if it fails to engage in cache inval activity.) If we had a multiple-entry cache in place of the single-entry cache, I think this would be a more realistic concern. You'd need some way to flush old entries from that cache, rather than being able to expect that the single entry would get overwritten with newer values anytime something happened. Right -- thanks for that -- I figured as much. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add ERROR msg for GLOBAL/LOCAL TEMP is not yet implemented
Noah Misch n...@leadboat.com writes: On Wed, Jun 13, 2012 at 02:56:58PM -0400, Tom Lane wrote: Any thoughts on the wording of the WARNING message? My patch used "GLOBAL is deprecated in temporary table creation", which still seems fine to me. Here's an update based on this discussion. Applied with some further wordsmithing on docs and comments. We can still tweak this if anyone objects, of course, but I thought it'd probably save work to get it in before the branch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
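With that commit in, the visible behavior is roughly the following (exact message wording per Noah's patch, modulo the wordsmithing Tom mentions):

    CREATE GLOBAL TEMPORARY TABLE t (x int);
    -- WARNING:  GLOBAL is deprecated in temporary table creation
    -- CREATE TABLE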
Re: [HACKERS] [PATCH] Support for foreign keys with arrays
On Wed, Jun 13, 2012 at 10:12:18PM +0200, Gabriele Bartolini wrote: Our goal is to work on this patch from the next commit fest. What we are about to do for this commit fest is to split the previous patch and send a small one just for the array_remove() and array_replace() functions. Then we will sit down and organise the development of the feature according to Peter's feedback. It is important indeed that we find a commonly accepted terminology and syntax for this feature. I hope this sounds like a reasonable plan. Thank you. Sounds fine. The 2012-01 CF entry for this patch has been moved to the 2012-06 CF. Please mark that entry Returned with Feedback and create a new entry for the subset you're repackaging for this CommitFest. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ability to listen on two unix sockets
On ons, 2012-06-13 at 15:14 +0200, Honza Horak wrote: Thus, I'd like to ask if anybody is aware of any issue connected with simply patching pg_config_manual.h, same as Debian already does? For example, is there any piece of software that simply relies on the /tmp location of the socket and doesn't use libpq for the communication? If you're asking "should Red Hat/Fedora simply fix the issue locally by patching pg_config_manual.h", then yes, that would work, see Debian, but it has its annoyances, especially with additional software compiled from source, or third-party binary installers. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ability to listen on two unix sockets
On mån, 2012-06-11 at 18:07 -0400, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: On sön, 2012-06-10 at 17:24 -0400, Robert Haas wrote: Why would that matter? If you configure M ports and N Unix socket locations, you get M*N actual sockets created. ...I *seriously* doubt that this is the behavior anyone wants. Creating M sockets per directory seems patently silly. How else would it work? If I say, syntax aside, listen on ports 5432 and 5433, and use socket directories /tmp and /var/run/postgresql, then a libpq-using client would expect to be able to connect using This argument seems quite circular to me: you are assuming that we will adopt exactly the behavior that Robert is questioning. What would make more sense to me is (1) there is still a *single* port parameter, which is what we use for things like shared memory keys; (2) listen_addresses (and the hypothetical socket_directories list) grows the ability to specify a port number in any list element. The primary port number parameter sets the default. So for instance port = 5432 listen_addresses = '*, 127.0.0.1:5433' results in listening on *:5432 and 127.0.0.1:5433. So you do need to create M*N sockets. I don't really see a problem with that. I do: first, it's a lotta sockets, and second, it's not real hard to foresee cases where somebody actively doesn't want that cross-product. Well, it's fine if we provide ways not to have the cross-product, but there should also be an easy way to get it. I can easily see cases in systems I have administered where I would have liked to use two unix sockets, two IP sockets, and two ports. Maybe I actually would have needed only 7 out of those 8 sockets, but it's far easier to configure, document, and explain if I just set up all 8 of them. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
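To make the cross-product concrete (parameter spellings hypothetical, since the syntax is exactly what's being debated; socket file names follow the usual .s.PGSQL.<port> convention):

    # hypothetical configuration: two ports, two IPs, two socket directories
    port = 5432                               # plus 5433, however a second port ends up being spelled
    listen_addresses = '192.0.2.1, 192.0.2.2'
    unix_socket_directories = '/tmp, /var/run/postgresql'
    # full cross-product with ports 5432 and 5433: 4 TCP sockets, plus
    #   /tmp/.s.PGSQL.5432                  /tmp/.s.PGSQL.5433
    #   /var/run/postgresql/.s.PGSQL.5432   /var/run/postgresql/.s.PGSQL.5433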
[HACKERS] Re: [COMMITTERS] pgsql: Add ERROR msg for GLOBAL/LOCAL TEMP is not yet implemented
On Wed, Jun 13, 2012 at 05:50:36PM -0400, Tom Lane wrote: Applied with some further wordsmithing on docs and comments. We can still tweak this if anyone objects, of course, but I thought it'd probably save work to get it in before the branch. Thanks. The SQL standard also distinguishes between global and local temporary tables, where a local temporary table is only visible within a specific SQL module, though its definition is still shared across sessions. Since PostgreSQL does not support SQL modules, this distinction is not relevant in PostgreSQL. That new documentation paragraph describes the standard behavior for DECLARE LOCAL TEMPORARY TABLE. CREATE LOCAL TEMPORARY TABLE produces a table available to all modules but having one instance of its contents per module, per session. With GLOBAL, by contrast, all modules see the same table contents during a given session. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
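Restated as code, the standard's semantics Noah describes would be (illustrative only; PostgreSQL treats both spellings as plain temporary tables, as discussed above):

    CREATE GLOBAL TEMPORARY TABLE gt (x int); -- standard: one set of contents per session, shared by all modules
    CREATE LOCAL TEMPORARY TABLE lt (x int);  -- standard: one set of contents per module, per session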
Re: [HACKERS] Ability to listen on two unix sockets
On mån, 2012-06-11 at 18:45 -0500, Michael Nolan wrote: What about entries in pg_hba.conf? Will they need to be able to specify both the directory and the port number? I think the philosophy behind pg_hba.conf is that you distinguish client *systems*. So one client might be Kerberos-capable, or one client might be Windows and use SSPI. For that, it doesn't matter which of multiple ports or local sockets it uses. So I don't think there are any changes needed in this area. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ability to listen on two unix sockets
On tis, 2012-06-12 at 14:47 +0200, Honza Horak wrote: This could be true in case all listening ports are equal, which I guess isn't a good idea, because IIUIC the port number, as part of the socket name, is used to distinguish sockets of various postmasters in the same directory. In that scenario every client should know which port to connect to and also which one is primary. Why? The client won't care which port is primary or secondary or whatever, as long as it reaches the server. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ability to listen on two unix sockets
On ons, 2012-06-13 at 15:25 +0200, Honza Horak wrote: It seems unix_socket_directory could be turned into a list and probably renamed to unix_socket_directories, since it would be confusing if a list value were in the singular. Well, it would also be annoying to rename the parameter for a marginal change in functionality. On the other hand, we probably don't want to specify listening ports together with additional unix sockets in one configuration option, so it seems better to add a new configuration option to distinguish the primary listening port from additional ports. I think it would be wiser if you left the port business out of this. There are more issues hidden in there than you need to deal with. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add ERROR msg for GLOBAL/LOCAL TEMP is not yet implemented
Noah Misch n...@leadboat.com writes: On Wed, Jun 13, 2012 at 05:50:36PM -0400, Tom Lane wrote: The SQL standard also distinguishes between global and local temporary tables, where a local temporary table is only visible within a specific SQL module, though its definition is still shared across sessions. Since PostgreSQL does not support SQL modules, this distinction is not relevant in PostgreSQL. That new documentation paragraph describes the standard behavior for DECLARE LOCAL TEMPORARY TABLE. CREATE LOCAL TEMPORARY TABLE produces a table available to all modules but having one instance of its contents per module, per session. With GLOBAL, by contrast, all modules see the same table contents during a given session. [ reads spec more closely... ] Yeah, you're right. Will fix, thanks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
Hackers, the attached patch implements a quad-tree on ranges. Some performance results in comparison with the current GiST indexing follow. Index creation is slightly slower. Probably it needs some investigation. Search queries on SP-GiST use many more pages. However, this comparison may not be entirely accurate, because SP-GiST can pin the same buffer several times during one scan. In CPU terms, search queries on SP-GiST seem to be slightly faster. The dramatic difference in the "column <@ const" query is thanks to the 2d-mapping.

test=# create index period_idx on period_test using gist (period);
CREATE INDEX
Time: 49141,148 ms
test=# create index period_idx2 on period_test2 using spgist (period);
CREATE INDEX
Time: 59503,215 ms

test=# explain (analyze, buffers) select * from period_test where period && daterange('2011-12-10', '2011-12-11');
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on period_test  (cost=471.63..13716.45 rows=11866 width=32) (actual time=107.258..147.139 rows=365451 loops=1)
   Recheck Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
   Buffers: shared read=7636
   ->  Bitmap Index Scan on period_idx  (cost=0.00..468.67 rows=11866 width=0) (actual time=106.697..106.697 rows=365451 loops=1)
         Index Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
         Buffers: shared read=3694
 Total runtime: 160.670 ms
(7 rows)

test=# explain (analyze, buffers) select * from period_test2 where period && daterange('2011-12-10', '2011-12-11');
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on period_test2  (cost=414.52..13659.34 rows=11866 width=32) (actual time=88.793..129.608 rows=365451 loops=1)
   Recheck Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
   Buffers: shared hit=7623 read=7687
   ->  Bitmap Index Scan on period_idx2  (cost=0.00..411.55 rows=11866 width=0) (actual time=88.285..88.285 rows=365451 loops=1)
         Index Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
         Buffers: shared hit=7623 read=3745
 Total runtime: 142.971 ms
(7 rows)

test=# explain (analyze, buffers) select * from period_test where period <@ daterange('2011-12-10', '2011-12-11');
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on period_test  (cost=102.06..6140.66 rows=2373 width=32) (actual time=85.437..85.437 rows=0 loops=1)
   Recheck Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
   Buffers: shared read=3694
   ->  Bitmap Index Scan on period_idx  (cost=0.00..101.47 rows=2373 width=0) (actual time=85.431..85.431 rows=0 loops=1)
         Index Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
         Buffers: shared read=3694
 Total runtime: 85.493 ms
(7 rows)

test=# explain (analyze, buffers) select * from period_test2 where period <@ daterange('2011-12-10', '2011-12-11');
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on period_test2  (cost=88.95..6127.54 rows=2373 width=32) (actual time=18.666..18.666 rows=0 loops=1)
   Recheck Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
   Buffers: shared hit=1404
   ->  Bitmap Index Scan on period_idx2  (cost=0.00..88.35 rows=2373 width=0) (actual time=18.660..18.660 rows=0 loops=1)
         Index Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
         Buffers: shared hit=1404
 Total runtime: 18.717 ms
(7 rows)

test=# explain (analyze, buffers) select * from period_test where period <@ daterange('2011-08-10', '2011-12-31');
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on period_test  (cost=102.06..6140.66 rows=2373 width=32) (actual time=56.114..73.785 rows=118097 loops=1)
   Recheck Cond: (period <@ '[2011-08-10,2011-12-31)'::daterange)
   Buffers: shared read=4125
   ->  Bitmap Index Scan on period_idx  (cost=0.00..101.47 rows=2373 width=0) (actual time=55.740..55.740 rows=118097 loops=1)
         Index Cond: (period <@ '[2011-08-10,2011-12-31)'::daterange)
         Buffers: shared read=1391
 Total runtime: 78.469 ms
(7 rows)

test=# explain (analyze, buffers) select * from period_test2 where period <@ daterange('2011-08-10', '2011-12-31');
                                                          QUERY PLAN
Re: [HACKERS] Minimising windows installer password confusion
On 06/13/2012 05:10 PM, Dave Page wrote: The idea of storing the password in clear text in the registry gives me nervous twitches. Me too. It's horrible, and I really dislike the idea. I can't imagine that Microsoft don't have a better solution to this. I talked to some Microsoft people at an event yesterday, and they said that they just don't use completely isolated user accounts for services. Microsoft's services install into the three standard service access levels:

  LocalService
  NetworkService
  LocalSystem

as mentioned in:

  http://msdn.microsoft.com/en-us/library/ms143504.aspx
  http://msdn.microsoft.com/en-us/library/windows/desktop/ms686005(v=vs.85).aspx

... so maybe the answer is that we're trying to do it too UNIX-ish (ie: securely) and we should by default use NetworkService, allowing users to change the service account if they want to as an advanced feature. Personally I think that'd be better than the current situation (which is not user friendly), and it has a much lower squick-factor than storing passwords in the registry. This'd also solve issues with other Pg installs; we just switch smoothly over to installing as NetworkService by default, giving users a radiobox to switch to a custom service user account where the name postgres is prefilled. -- Craig Ringer POST Newspapers 276 Onslow Rd, Shenton Park Ph: 08 9381 3088 Fax: 08 9388 2258 ABN: 50 008 917 717 http://www.postnewspapers.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Robert Haas robertmh...@gmail.com writes: ! DETAIL: Character with value 0x0a must be escaped. ! CONTEXT: JSON data, line 1: abc ! ... This seems an odd way to present this, especially if the goal is to NOT include the character needing escaping in the log unescaped, which I thought was the point of saying 0x0a. I thought of a simple way to address that objection for this particular case: we can just truncate the context display *at* the offending character, instead of *after* it. This is playing a bit fast and loose with the general principle that the context should end at the point of detection of the error; but given that the character in question is always unprintable, I think it's probably not going to bother anyone. I've gone ahead and committed this before branching, though I'm certainly still willing to entertain suggestions for further improvement. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
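For the archives, the case being discussed can be reproduced roughly like this (a literal newline embedded in a JSON string; output abbreviated, and per the commit the context now stops at the offending character):

    SELECT E'"abc\ndef"'::json;
    -- ERROR:  invalid input syntax for type json
    -- DETAIL:  Character with value 0x0a must be escaped.
    -- CONTEXT:  JSON data, line 1: "abc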
Re: [HACKERS] Minimising windows installer password confusion
On 06/13/2012 05:14 PM, Dave Page wrote: On Wed, Jun 13, 2012 at 2:18 AM, Craig Ringer cr...@postnewspapers.com.au wrote: On 06/12/2012 08:08 PM, Dave Page wrote: Some background: By default the installer will use 'postgres' for both the service (OS) account, and the database superuser account. It will use the same password for both (though, users have complete control at the command line if they want it, which is why the account names are shown in the installer). Unlike *nix installations, the service account must have a password, which is required during both installation and upgrade to configure the PostgreSQL service. We therefore ask for the password during both installation and upgrade. If the user account exists (which can happen if there has previously been an installation of Postgres on the system), the user must specify the existing password for the account during installation (and of course, during all upgrades). This is where users normally get stuck, as they've forgotten the password they set in the past. They may not even have set it, because the EnterpriseDB installer can be run unattended and may've been used by some 3rd party package. I'd be interested in seeing what Microsoft installers that create isolated user accounts do. I think .NET creates one for ASP, so that'd be one option to look into. They tend to use the localsystem or networkservice accounts for most things, which don't have passwords. The reason we don't do that is that since the early days of 8.0 we've said we want to enable users to use the service account as if it were a regular account, so that they can do things like access network resources (useful for server-side copy for example). I wasn't overly convinced back then that that was necessary, and I'm still not now. I suppose we potentially could use the networkservice account by default, and let advanced users override that on the command line if they need to. Then remembering the password becomes their responsibility. +1 from me on this, though I think making the service account part of the install process makes sense.

SERVICE ACCOUNT - You can probably just accept the default here. PostgreSQL runs in the background as a network service in a user account that only has limited access to the files and programs on the computer. This is fine for most purposes, but will prevent certain operations like server-side COPY and direct access by the server to resources on shared folders from working. If you need these capabilities, we recommend that you create a postgres user account below and have the PostgreSQL server run in that instead of the default NetworkService account.

  --[ service account ]-----------------------
  |                                          |
  |  [*] LocalService                        |
  |                                          |
  |  [ ] Custom Service Account              |
  |                                          |
  |    --[ custom account ]--------------    |
  |    |                                |    |
  |    |  Username: [            ]      |    |
  |    |  Password: [            ]      |    |
  |    |  Domain:   [ THISCOMPUTER ]    |    |
  |    |                                |    |
  |    ----------------------------------    |
  --------------------------------------------

Reasonable option? -- Craig Ringer POST Newspapers 276 Onslow Rd, Shenton Park Ph: 08 9381 3088 Fax: 08 9388 2258 ABN: 50 008 917 717 http://www.postnewspapers.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ability to listen on two unix sockets
On Thu, Jun 14, 2012 at 01:31:31AM +0300, Peter Eisentraut wrote: On ons, 2012-06-13 at 15:25 +0200, Honza Horak wrote: It seems unix_socket_directory could be turned into list and probably renamed to unix_socket_directories, since it would be confusing if a list value is in singular. Well, it would also be annoying to rename the parameter name for a marginal change in functionality. We often rename admin-only config variables for clarity, and this seems to be such a case. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minimising windows installer password confusion
On 06/13/2012 05:18 PM, Dave Page wrote: On Wed, Jun 13, 2012 at 3:07 AM, Craig Ringer Why use the Windows control panel? Because when I wrote the email I was looking for a simple solution that wouldn't require writing code that has the potential to fail depending on how the user's environment is configured (the user account stuff tends to go wrong in weird ways, for example when used on domains in unusual (or high-security) configurations). We're spending a lot of effort at the moment getting the 9.2 buildfarm together, and updating all the StackBuilder add-on packages (think multiple man-months) - I'm trying not to add to that too much. Ah, sorry. I'm *not* trying to say that any of this is stuff that EDB should just up and do. I have no say over what you do and how; I'm just trying to raise possible usability points that might be useful, either soon or to inform the design of later releases. -- Craig Ringer POST Newspapers 276 Onslow Rd, Shenton Park Ph: 08 9381 3088 Fax: 08 9388 2258 ABN: 50 008 917 717 http://www.postnewspapers.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minimising windows installer password confusion
On 06/13/2012 06:32 PM, Florian Pflug wrote: Some further googling indicates that, yes, the service account passwords are stored in the registry, but are only accessible to the LocalSystem account [2]. Querying them from the postgres installer thus isn't really an option. But what you could do, I guess, is to offer the user the ability to change the password, and using the approach from [1] to update the affected service definitions afterwards. Yep, that fits with how MS SQL server does things: Always use SQL Server tools such as SQL Server Configuration Manager to change the account used by the SQL Server Database Engine or SQL Server Agent services, or to change the password for the account. In addition to changing the account name, SQL Server Configuration Manager performs additional configuration such as updating the Windows local security store which protects the service master key for the Database Engine. Other tools such as the Windows Services Control Manager can change the account name but do not change all the required settings. http://msdn.microsoft.com/en-us/library/ms143504.aspx -- Craig Ringer POST Newspapers 276 Onslow Rd, Shenton Park Ph: 08 9381 3088 Fax: 08 9388 2258 ABN: 50 008 917 717 http://www.postnewspapers.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.
On Mon, Jun 4, 2012 at 8:49 PM, Christopher Browne cbbro...@gmail.com wrote: What if the two servers are in different time zones? NTP shouldn't have any problem; it uses UTC underneath. As does PostgreSQL, underneath. As an aside, this is not strictly speaking true. NTP doesn't use UTC -- afaik it doesn't know about time zones at all. Likewise Postgres's underlying representation is not UTC either. They both use the number of seconds that have passed since the epoch. That's simply a number, not a time at all, and the number is the same regardless of what time zone you're in. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
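One way to see that from SQL (the specific number below is invented; the point is that the epoch value does not change with the TimeZone setting):

    SET timezone = 'UTC';
    SELECT extract(epoch FROM now());   -- e.g. 1339600000.123456
    SET timezone = 'America/New_York';
    SELECT extract(epoch FROM now());   -- same count of seconds since the
                                        -- epoch (modulo elapsed time)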
Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Tom Lane t...@sss.pgh.pa.us writes: I thought I already pointed that out, but: we have *extensions*. What we don't have is a convenient method of dealing with functions that need to be migrated across extensions, or from an extension to core, between one major release and the next. It would clearly be nice to have that someday, but we don't have it now. Designing on the assumption that 9.3 Well, my patch for 9.2 called Finer Extension Dependencies was all about supporting that. The idea is to be able to name a set of functions (or other objects) then setup a dependency graph towards that arbitrary name. Then it's possible to migrate that named set of objects from an extension to another one, and it's possible for core to publish a list of provided names of set of objects provided. Well, TBH I didn't think that concept was a useful solution for this at all. I can't imagine that we would define features at a sufficiently fine granularity, or with enough advance foresight, to solve the sort of problem that's being faced here. How would you deal with the need for, say, some of contrib/xml2's functions to get migrated to core in 9.4 or so? When you didn't know that would happen, much less exactly which functions, when 9.3 came out? AFAICS the only way that features could fix this would be if we always created a feature for every exposed function, which is unmanageable. The only missing part in the patch is allowing for the core to declare a set of set of objects (a set of features in its current terminology) that it brings on the table. Such a list already exists though, and is using the same terminology as in my patch: http://www.postgresql.org/docs/9.2/static/features-sql-standard.html We wouldn't only publish the standard compliant feature list with such a mechanism though or it would be quite useless for our operations here. AFAICS, the SQL-standard features list is just about entirely irrelevant to this discussion. How often is it going to happen that we implement a standard feature in a contrib module before migrating it into core? I think almost every case in which we'll face this problem will involve a PG-specific feature not mentioned in the SQL feature taxonomy. The case at hand (some proposed new functions for managing replication) certainly isn't going to be found there. And, quite aside from whether we could invent feature names that match what we want to move from contrib to core, exactly how would having a feature name help? The problem we're actually facing is getting pg_upgrade to not dump particular functions when it's doing a binary-upgrade dump of an extension. Maybe I've forgotten, but I do not recall any exposed connection between feature names and particular SQL objects in your proposal. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers