Re: [HACKERS] Oid registry
On Mon, Sep 24, 2012 at 8:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> So, yeah, we could reserve a couple hundred OIDs for a scheme like this
> and (probably) not regret it later. But a couple thousand would scare
> me ... and I'm not exactly convinced that a couple hundred is enough,
> if there's any demand out there at all.

I think some way to compose extension objects -- this includes and goes
beyond just the things in pg_type, but operators and so on -- would
greatly increase Postgres' power without making the community effort
unscalable. By this metric, I think PostGIS is the largest success so
far: although certain requirements spill back into pgsql-hackers, it's a
pretty huge island of functionality out there that neatly progresses on
its own without coordination. That's fantastic.

I am fairly certain that if some form of in-line extensions were
supported, we would see people building smaller extensions that use many
little types (such as composites) and rely on other extensions to do
things or manage certain tables, increasing convenience overall.

Things like the JSON data type (in spite of my own opinion that it
should be added) are a highly useful concession. Yet it remains a
concession to the fact that an extension type simply cannot compose as
well as a core data type. In Python, Ruby, et al, the gap between
pre-JSON-in-the-standard-library and post-JSON-in-stdlib was much
smaller. Even for statically typed languages like Java, the central
registries that exist are mostly for the convenience of distribution and
deployment, not the means of composition, and I think that is a good
thing.

However, I find an IANA-style OID registry pretty uncompelling. Is there
no way we can use symbols, the type system, and invalidation messages to
a client to do this? I feel like there is, and it is probably worth
doing to minimize coordination among extension authors and to enable
smaller extensions.
If reserved OID ranges for extensions are to become a thing, I think the
right scope is to presume that these extensions are not bundled with
Postgres -- but, practically, whoever uses that space is probably going
to have to be the kind of person who corresponds with -hackers anyway.
Ad-hoc composition among small or live-and-die-fast user-space libraries
(written in trusted languages, for example) would be out, which is kind
of a bummer for what I think plv8 can enable.

--
fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql gram.y make rule
On Mon, Sep 24, 2012 at 10:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Peter Eisentraut pete...@gmx.net writes:
>> I wanted to refactor the highly redundant flex and bison rules
>> throughout the source into common pattern rules. (Besides saving some
>> redundant code, this could also help some occasionally flaky code in
>> pgxs modules.) The only outlier that breaks this is in plpgsql:
>>
>> pl_gram.c: gram.y
>>
>> I would like to either rename the intermediate file(s) to gram.{c,h},
>> or possibly rename the source file to pl_gram.y. Any preferences or
>> other comments?
>
> Hmmm ... it's annoyed me for a long time that that file is named the
> same as the core backend's gram.y. So renaming to pl_gram.y might be
> better. On the other hand I have very little confidence in git's
> ability to preserve change history if we do that. Has anyone actually
> done a file rename in a project with lots of history, and how well did
> it turn out? (For instance, does git blame still provide any useful
> tracking of pre-rename changes? If you try to cherry-pick a patch
> against the new file into a pre-rename branch, does it work?)

git handles renaming just fine with cherry-picks, no special options
necessary. (Well, there are probably corner cases, but it's code, there
are always corner cases!)

For git log, you'll want to add the --follow parameter if you're asking
for the history of a specific file or directory beyond a renaming event.
git blame will show you the commit that renamed the file, by default,
but you can then request the revision prior to that by appending '^' to
the commit hash, for example:

git blame 2fb6cc90^ -- src/backend/parser/gram.y

to work your way back through history.

--
Dan Scott
Laurentian University
Re: [HACKERS] Switching timeline over streaming replication
On 24.09.2012 16:33, Amit Kapila wrote:
> On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote:
>> I've been working on the often-requested feature to handle timeline
>> changes over streaming replication. At the moment, if you kill the
>> master and promote a standby server, and you have another standby
>> server that you'd like to keep following the new master server, you
>> need a WAL archive in addition to streaming replication to make it
>> cross the timeline change. Streaming replication will just error out.
>> Having a WAL archive is usually a good idea in complex replication
>> scenarios anyway, but it would be good to not require it.
>
> Confirm my understanding of this feature:
> This feature is for the case when standby-1, who is going to be
> promoted to master, has archive mode 'on'.

No. This is for the case where there is no WAL archive, i.e.
archive_mode='off' on all servers. Or to be precise, you can also have a
WAL archive, but this patch doesn't affect that in any way. This is
strictly about streaming replication.

> As in that case only its timeline will change.

The timeline changes whenever you promote a standby. It's not related to
whether you have a WAL archive or not.

> If the above is right, then there can be other similar scenarios where
> it can be used:
> Scenario-1 (1 Master, 1 Stand-by)
> 1. Master (archive_mode=on) goes down.
> 2. Master comes up again.
> 3. Stand-by tries to follow it.
> Now in the above scenario too, due to timeline mismatch it gives an
> error, but your patch should fix it.

If the master simply crashes or is shut down, and then restarted, the
timeline doesn't change. The standby will reconnect / poll the archive,
and sync up just fine, even without this patch.

> However I am not sure about splitting RestoreArchivedFile() and
> ExecuteRecoveryCommand() into a separate file.
> How about splitting out all archive-related functions:
>
> static void XLogArchiveNotify(const char *xlog);
> static void XLogArchiveNotifySeg(XLogSegNo segno);
> static bool XLogArchiveCheckDone(const char *xlog);
> static bool XLogArchiveIsBusy(const char *xlog);
> static void XLogArchiveCleanup(const char *xlog);

Hmm, sounds reasonable.

> In any case, it will be better if you can split it into multiple
> patches:
> 1. The new functionality of switching timeline over streaming
>    replication.
> 2. Refactoring-related changes.
> It can make my testing and review of the new feature patch a little
> easier.

Yep, I'll go ahead and split the patch. Thanks!

- Heikki
Re: [HACKERS] Configuration include directory
On 24.09.2012 22:13, Gavin Flower wrote:
> On 25/09/12 02:41, Heikki Linnakangas wrote:
>> Multiple files within an include directory are processed in filename
>> order. The filenames are ordered by C locale rules, ie. numbers before
>> letters, and uppercase letters before lowercase ones.
>
> Even I can understand that! :-)
> More to the point: are fullstops '.' sorted before digits?

Yes. But files beginning with a fullstop are ignored.

- Heikki
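The C-locale ordering described above is plain byte-wise comparison. For
ASCII filenames this can be sketched with Python's default string sort,
which compares by code point and therefore matches the byte order (the
filenames are made up for illustration):

```python
# C-locale (byte-wise) ordering, as used when processing an include
# directory: fullstop < digits < uppercase < lowercase in ASCII.
names = ["aa.conf", "10-local.conf", "AA.conf", "00-base.conf"]
print(sorted(names))
# -> ['00-base.conf', '10-local.conf', 'AA.conf', 'aa.conf']
```

A file named "05-early.conf" would thus always be processed before
"50-late.conf", which is why numeric prefixes are a convenient way to
control include order.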
Re: [HACKERS] Oid registry
On 24 September 2012 21:26, Andrew Dunstan and...@dunslane.net wrote:
> On 09/24/2012 09:37 PM, Peter Eisentraut wrote:
>> On Mon, 2012-09-24 at 18:59 -0400, Andrew Dunstan wrote:
>>> This rather overdue mail arises out of the developer's meeting back
>>> in May, where we discussed an item I raised suggesting an Oid
>>> registry. The idea came from some difficulties I encountered when I
>>> did the backport of the JSON work we did in 9.2 to 9.1, but has wider
>>> application. Say someone writes an extension that defines type A.
>>> You want to write an extension that takes advantage of that type, but
>>> it's difficult if you don't know the type's Oid.
>>
>> Could you fill the rest of us in with some technical details about why
>> this might be necessary and what it aims to achieve?
>
> Well, an obvious case is how record_to_json handles fields. If it knows
> nothing about the type, all it can do is output the string value. That
> doesn't work well for types such as hstore. If it could reliably
> recognize a field as an hstore it might well be able to do lots better.

I think we're missing something in the discussion here. Why can't you
find out the oid of the type by looking it up by name? That mechanism is
used throughout postgres already and seems to work just fine. Why would
we need to hardcode it?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
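For reference, the name-based lookup Simon has in mind is just a catalog
query or a regtype cast; the example assumes the contrib/hstore
extension is installed:

```sql
-- Resolve a type's oid by name at runtime instead of hardcoding it.
SELECT oid FROM pg_type WHERE typname = 'hstore';

-- Or let the parser do a search_path-aware lookup:
SELECT 'hstore'::regtype::oid;
```

The second form respects schema qualification, so it finds the hstore
type wherever the extension happened to be installed.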
Re: [HACKERS] Oid registry
On 09/25/2012 04:26 AM, Andrew Dunstan wrote:
> On 09/24/2012 09:37 PM, Peter Eisentraut wrote:
>> On Mon, 2012-09-24 at 18:59 -0400, Andrew Dunstan wrote:
>>> This rather overdue mail arises out of the developer's meeting back
>>> in May, where we discussed an item I raised suggesting an Oid
>>> registry. The idea came from some difficulties I encountered when I
>>> did the backport of the JSON work we did in 9.2 to 9.1, but has wider
>>> application. Say someone writes an extension that defines type A.
>>> You want to write an extension that takes advantage of that type, but
>>> it's difficult if you don't know the type's Oid.
>>
>> Could you fill the rest of us in with some technical details about why
>> this might be necessary and what it aims to achieve?
>
> Well, an obvious case is how record_to_json handles fields. If it knows
> nothing about the type, all it can do is output the string value. That
> doesn't work well for types such as hstore. If it could reliably
> recognize a field as an hstore it might well be able to do lots better.

During the earlier to_json() discussions I already proposed a solution
to this: try a cast to the JSON type before falling back to outputting
the text version. As I understand it, this is how typecasts are supposed
to be used. In addition to being how PostgreSQL is intended to work, it
also allows users to override it and have their own json representation
for types which have no standard json format.

Just for the record, I'm not opposed to having a PostgreSQL OID
Registry, though I think that for anything in the backend the lookup by
name should be fast enough.

cheers

andrew
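The cast-based approach could look roughly like the following sketch.
hstore_to_json here is a hypothetical function that the hstore extension
(or a user) would have to supply; it is not assumed to exist in any
released version:

```sql
-- Sketch: give hstore its own json representation via a cast, which
-- record_to_json could then look up and apply before falling back to
-- plain text output.
CREATE FUNCTION hstore_to_json(hstore) RETURNS json
    AS 'MODULE_PATHNAME', 'hstore_to_json'   -- hypothetical C function
    LANGUAGE C IMMUTABLE STRICT;

CREATE CAST (hstore AS json) WITH FUNCTION hstore_to_json(hstore);
```

The appeal of this design is exactly what the mail argues: the cast is
discoverable through the ordinary pg_cast machinery, so no oid registry
is needed, and users can replace the cast to change the representation.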
Re: [HACKERS] pg_reorg in core?
On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
> On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund and...@2ndquadrant.com wrote:
>> On Tuesday, September 25, 2012 12:55:35 AM Josh Berkus wrote:
>>> On 9/24/12 3:43 PM, Simon Riggs wrote:
>>>> On 24 September 2012 17:36, Josh Berkus j...@agliodbs.com wrote:
>>>>> For me, the Postgres user interface should include
>>>>> * REINDEX CONCURRENTLY
>>>> I don't see why we don't have REINDEX CONCURRENTLY now.
>>> Same reason for everything on (anyone's) TODO list.
>>> Yes, I'm just pointing out that it would be a very small patch for
>>> someone, and that AFAIK it didn't make it onto the TODO list yet.
>>
>> It's not *that* small.
>>
>> 1. You need more than you can do with CREATE INDEX CONCURRENTLY and
>> DROP INDEX CONCURRENTLY, because the index can e.g. be referenced by a
>> foreign key constraint. So you need to replace the existing index oid
>> with a new one by swapping the relfilenodes of both after verifying
>> several side conditions (indcheckxmin, indisvalid, indisready). It
>> would probably have to look like:
>> - build new index with indisready = false
>> - newindex.indisready = true
>> - wait
>> - newindex.indisvalid = true
>> - wait
>> - swap(oldindex.relfilenode, newindex.relfilenode)
>> - oldindex.indisvalid = false
>> - wait
>> - oldindex.indisready = false
>> - wait
>> - drop new index with old relfilenode
>> Every wait indicates an externally visible state which you might
>> encounter/need to clean up...
>
> Could you clarify what you mean here by cleanup? I am afraid I do not
> get your point here.

Sorry, I was a bit tired when writing the above. The point is that to
work concurrently, the CONCURRENT operations commit/start multiple
transactions internally. They can be interrupted (user, shutdown, error,
crash) and leave transient state behind every time they do so. What I
wanted to say is that we need to take care that each of those states can
easily be cleaned up afterwards.

>> 2. No support for concurrent operation on system tables (not easy for
>> shared catalogs).
>
> Doesn't this exclude all the tables that are in the schema catalog?

No. Only

SELECT array_to_string(array_agg(relname), ', ')
FROM pg_class WHERE relisshared AND relkind = 'r';

their toast tables, and their indexes are shared. The problem is that
for those you cannot create a separate index and let it update
concurrently, because you cannot write into each database's
pg_class/pg_index.

>> 3. No support for the indexes of exclusion constraints (not hard I
>> think).
>
> This just consists in a check of indisready in pg_index.

It will probably be several places, but yes, I don't think it's hard.

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
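For an ordinary index that backs no constraint, the closest thing
available today is the CREATE/DROP CONCURRENTLY dance that the step list
above generalizes (index and table names are illustrative):

```sql
-- What REINDEX CONCURRENTLY would automate, for the simple case only:
CREATE INDEX CONCURRENTLY my_index_new ON my_table (my_col);
DROP INDEX CONCURRENTLY my_index;
ALTER INDEX my_index_new RENAME TO my_index;

-- This breaks down when the index backs a constraint or is referenced
-- by a foreign key, since those point at the index's oid; that is why
-- the relfilenode swap between old and new index is needed.
```

The swap keeps the old oid (so all dependent objects stay valid) while
substituting the freshly built storage underneath it.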
Re: [HACKERS] Oid registry
On Tue, Sep 25, 2012 at 1:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
> On 24 September 2012 21:26, Andrew Dunstan and...@dunslane.net wrote:
>> Well, an obvious case is how record_to_json handles fields. If it
>> knows nothing about the type, all it can do is output the string
>> value. That doesn't work well for types such as hstore. If it could
>> reliably recognize a field as an hstore it might well be able to do
>> lots better.
>
> I think we're missing something in the discussion here. Why can't you
> find out the oid of the type by looking it up by name? That mechanism
> is used throughout postgres already and seems to work just fine.

Sure, but how do you know the type named hstore is what you know as
hstore? We don't have a global consensus that a specific type name means
exactly what everyone thinks it is.

Thanks,
--
Hitoshi Harada
Re: [HACKERS] Oid registry
On 25.09.2012 12:19, Hitoshi Harada wrote:
> On Tue, Sep 25, 2012 at 1:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
>> On 24 September 2012 21:26, Andrew Dunstan and...@dunslane.net wrote:
>>> Well, an obvious case is how record_to_json handles fields. If it
>>> knows nothing about the type, all it can do is output the string
>>> value. That doesn't work well for types such as hstore. If it could
>>> reliably recognize a field as an hstore it might well be able to do
>>> lots better.

I'm not at all familiar with record_to_json or the json datatype, but
wouldn't it be appropriate to create a cast from hstore to json to
handle that case?

That brings us to another question: should the cast be part of the
hstore extension, or json? (json is built-in, but imagine for a moment
that it was an extension too, so that there was a choice.) IIRC someone
started a discussion on that recently on pgsql-hackers, but I don't
think there was any conclusion on that.

>> I think we're missing something in the discussion here. Why can't you
>> find out the oid of the type by looking it up by name? That mechanism
>> is used throughout postgres already and seems to work just fine.
>
> Sure, but how do you know the type named hstore is what you know as
> hstore? We don't have a global consensus that a specific type name
> means exactly what everyone thinks it is.

If you create a type called hstore that isn't the one in contrib/hstore,
you're just asking for trouble. Having a central registry of assigned
oid numbers doesn't really make that problem go away either; you could
still assign one of the registered oids to a different datatype if you
wanted to. So whether you rely on the oid or the type name, your code
needs to behave sanely even if the datatype doesn't behave the way you'd
expect. At a minimum, there need to be sanity checks so that you don't
crash, and give a meaningful error message. Beyond that, I think it's
the user's responsibility not to confuse things by using well-known
datatype names like hstore for something else.
Here's another thought: imagine that someone creates a datatype that's
more or less compatible with contrib/hstore, but the internal
implementation is different. Would you call that hstore? Would you use
the same assigned oid for it? If you have some other extension that
knows about hstore and treats it specially, would you want the new
datatype to be treated specially too?

And what about domains? If you have code that knows the oid of hstore
and gives it special treatment, should domains over hstore also be given
special treatment?

- Heikki
Re: [HACKERS] Switching timeline over streaming replication
On Tuesday, September 25, 2012 12:39 PM Heikki Linnakangas wrote:
> On 24.09.2012 16:33, Amit Kapila wrote:
>> On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote:
>>> I've been working on the often-requested feature to handle timeline
>>> changes over streaming replication. At the moment, if you kill the
>>> master and promote a standby server, and you have another standby
>>> server that you'd like to keep following the new master server, you
>>> need a WAL archive in addition to streaming replication to make it
>>> cross the timeline change. Streaming replication will just error out.
>>> Having a WAL archive is usually a good idea in complex replication
>>> scenarios anyway, but it would be good to not require it.
>>
>> Confirm my understanding of this feature:
>> This feature is for the case when standby-1, who is going to be
>> promoted to master, has archive mode 'on'.
>
> No. This is for the case where there is no WAL archive, i.e.
> archive_mode='off' on all servers. Or to be precise, you can also have
> a WAL archive, but this patch doesn't affect that in any way. This is
> strictly about streaming replication.
>
>> As in that case only its timeline will change.
>
> The timeline changes whenever you promote a standby. It's not related
> to whether you have a WAL archive or not.

Yes, that is correct. I thought a timeline change happened only when
somebody does PITR. Can you please tell me why we change the timeline
after promotion? The original timeline concept was for PITR, and I am
not able to trace in the code the reason why it is required on
promotion.

>> If the above is right, then there can be other similar scenarios where
>> it can be used:
>> Scenario-1 (1 Master, 1 Stand-by)
>> 1. Master (archive_mode=on) goes down.
>> 2. Master comes up again.
>> 3. Stand-by tries to follow it.
>> Now in the above scenario too, due to timeline mismatch it gives an
>> error, but your patch should fix it.
>
> If the master simply crashes or is shut down, and then restarted, the
> timeline doesn't change. The standby will reconnect / poll the archive,
> and sync up just fine, even without this patch.

How about when the master does PITR when it comes up again?

With Regards,
Amit Kapila.
Re: [HACKERS] Switching timeline over streaming replication
On 25.09.2012 14:10, Amit Kapila wrote:
> On Tuesday, September 25, 2012 12:39 PM Heikki Linnakangas wrote:
>> On 24.09.2012 16:33, Amit Kapila wrote:
>>> On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote:
>>>> I've been working on the often-requested feature to handle timeline
>>>> changes over streaming replication. At the moment, if you kill the
>>>> master and promote a standby server, and you have another standby
>>>> server that you'd like to keep following the new master server, you
>>>> need a WAL archive in addition to streaming replication to make it
>>>> cross the timeline change. Streaming replication will just error
>>>> out. Having a WAL archive is usually a good idea in complex
>>>> replication scenarios anyway, but it would be good to not require
>>>> it.
>>>
>>> Confirm my understanding of this feature:
>>> This feature is for the case when standby-1, who is going to be
>>> promoted to master, has archive mode 'on'.
>>
>> No. This is for the case where there is no WAL archive, i.e.
>> archive_mode='off' on all servers. Or to be precise, you can also have
>> a WAL archive, but this patch doesn't affect that in any way. This is
>> strictly about streaming replication.
>>
>>> As in that case only its timeline will change.
>>
>> The timeline changes whenever you promote a standby. It's not related
>> to whether you have a WAL archive or not.
>
> Yes, that is correct. I thought a timeline change happened only when
> somebody does PITR. Can you please tell me why we change the timeline
> after promotion? The original timeline concept was for PITR, and I am
> not able to trace in the code the reason why it is required on
> promotion.

Bumping the timeline helps to avoid confusion if, for example, the
master crashes and the standby isn't fully in sync with it. In that
situation, there are some WAL records in the master that are not in the
standby, so promoting the standby is effectively the same as doing PITR.
If you promote the standby, and later try to turn the old master into a
standby server that connects to the new master, things will go wrong.
Assigning the new master a new timeline ID helps the system and the
administrator notice that. It's not bulletproof (for example, you can
easily avoid the timeline change if you just remove recovery.conf and
restart the server), but timelines help to manage such situations.

>>> If the above is right, then there can be other similar scenarios
>>> where it can be used:
>>> Scenario-1 (1 Master, 1 Stand-by)
>>> 1. Master (archive_mode=on) goes down.
>>> 2. Master comes up again.
>>> 3. Stand-by tries to follow it.
>>> Now in the above scenario too, due to timeline mismatch it gives an
>>> error, but your patch should fix it.
>>
>> If the master simply crashes or is shut down, and then restarted, the
>> timeline doesn't change. The standby will reconnect / poll the
>> archive, and sync up just fine, even without this patch.
>
> How about when the master does PITR when it comes up again?

Then the timeline will be bumped, and this patch will be helpful.
Assuming the standby is behind the point in time that the master was
recovered to, it will be able to follow the master to the new timeline.

- Heikki
Re: [HACKERS] pg_reorg in core?
On Tue, Sep 25, 2012 at 5:55 PM, Andres Freund and...@2ndquadrant.com wrote:
> On Tuesday, September 25, 2012 04:37:05 AM Michael Paquier wrote:
>> On Tue, Sep 25, 2012 at 8:13 AM, Andres Freund and...@2ndquadrant.com wrote:
>> Could you clarify what you mean here by cleanup? I am afraid I do not
>> get your point here.
>
> Sorry, I was a bit tired when writing the above. The point is that to
> work concurrently, the CONCURRENT operations commit/start multiple
> transactions internally. They can be interrupted (user, shutdown,
> error, crash) and leave transient state behind every time they do so.
> What I wanted to say is that we need to take care that each of those
> states can easily be cleaned up afterwards.

Sure, many errors may happen. But in the case of CREATE INDEX
CONCURRENTLY, there is no cleanup method implemented as far as I know
(I might be missing something though). Isn't an index only marked as
invalid in case of a concurrent creation failure? In the case of REINDEX
it would be essential to create such a cleanup mechanism, as I cannot
imagine a production database keeping an index that has been marked
invalid due to a concurrent reindex failure (assuming here, of course,
that REINDEX CONCURRENTLY would use the same error handling as CREATE
INDEX CONCURRENTLY). One possible cleanup mechanism off the top of my
head is a callback at transaction abort; each callback would need to be
different for each subtransaction used during the concurrent operation.
In case the callback itself fails, well, the old and/or new indexes
remain invalid.

>>> 2. No support for concurrent operation on system tables (not easy for
>>> shared catalogs).
>>
>> Doesn't this exclude all the tables that are in the schema catalog?
>
> No. Only
>
> SELECT array_to_string(array_agg(relname), ', ')
> FROM pg_class WHERE relisshared AND relkind = 'r';
>
> their toast tables, and their indexes are shared. The problem is that
> for those you cannot create a separate index and let it update
> concurrently, because you cannot write into each database's
> pg_class/pg_index.

Yes indeed, I didn't think about things that are shared among databases.
Blocking that is pretty simple; it's only a matter of checking in the
right places.

--
Michael Paquier
http://michael.otacoo.com
Re: [HACKERS] Patch for option in pg_resetxlog for restore from WAL files
On 24 September 2012 04:00, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> On 18.07.2012 16:47, Amit kapila wrote:
>> Patch implementing the design in the below mail chain is attached with
>> this mail.
>
> This patch copies the ReadRecord() function and a bunch of related
> functions from xlog.c into pg_resetxlog.c. There's a separate patch in
> the current commitfest to make that code reusable, without having to
> copy-paste it into every tool that wants to parse the WAL. See
> https://commitfest.postgresql.org/action/patch_view?id=860. This patch
> should be refactored to make use of that framework, as soon as it's
> committed.

Agreed, moving to the next commitfest. Amit, I suggest reviewing the
patch that this now depends upon.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed
On Tue, Sep 25, 2012 at 12:22:43PM +0800, Rural Hunter wrote:
>> OK, that is good to know. I developed the attached C program that does
>> the setlocale canonical test. On Debian Squeeze, I could not see any
>> change: if I pass en_US.UTF-8, I get en_US.UTF-8 returned; if I pass
>> en_US.UTF8, I get en_US.UTF8 returned. Can anyone test this and find a
>> case where the locale is canonicalized? Run it this way:
>>
>> $ canonical
>> LC_COLLATE = 3
>> LC_CTYPE = 0
>> $ canonical 0 en_US.UTF8
>> en_US.UTF8
>>
>> We are looking for cases where the second argument produces a
>> non-matching locale name as output.
>
> It matches on my system (Ubuntu 10.10 server):
>
> $ ./canonical
> LC_COLLATE = 3
> LC_CTYPE = 0
> $ ./canonical 0 zh_CN.UTF-8
> zh_CN.UTF-8
> $ ./canonical 0 zh_CN.UTF8
> zh_CN.UTF8
> $ ./canonical 0 zh_CN.utf8
> zh_CN.utf8
> $ ./canonical 0 zh_CN.utf-8
> zh_CN.utf-8
>
> I tested the checker with the patch:
>
> $ /opt/PostgreSQL/9.2/bin/pg_upgrade -b /opt/PostgreSQL/9.1/bin -B /opt/PostgreSQL/9.2/bin -d /raid/pgsql -D /raid/pg92data -c
> Performing Consistency Checks
> Checking current, bin, and data directories                 ok
> Checking cluster versions                                   ok
> Checking database user is a superuser                       ok
> Checking for prepared transactions                          ok
> Checking for reg* system OID user data types                ok
> Checking for contrib/isn with bigint-passing mismatch       ok
> lc_collate cluster values do not match:  old zh_CN.utf8, new zh_CN.UTF-8
> Failure, exiting
>
> zh_CN.utf8 is provided by the installer and zh_CN.UTF-8 is my system
> default.

OK, this tells us that the canonicalization code used in initdb is not
going to help us in pg_upgrade, at least not on your system, and not on
mine. I think we should apply the patch that fixes the TOAST problem
with information_schema, and the patch that outputs the old/new values
for easier debugging. Other than that, I don't know what else we can do
except ignore dashes when comparing locale names, which I am told is
unacceptable.

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
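Bruce's test program itself is not included in this archive. Its
behavior can be sketched with Python's locale module, which wraps the
same C setlocale() call (the category numbers printed above,
LC_COLLATE = 3 and LC_CTYPE = 0, are glibc's values):

```python
import locale

def canonical(category, name):
    """Return what setlocale() reports back for a locale name,
    or None if the locale is not installed."""
    try:
        return locale.setlocale(category, name)
    except locale.Error:
        return None

# The "C" locale always exists and is never rewritten by the C library:
print(canonical(locale.LC_CTYPE, "C"))  # prints: C
```

On glibc, setlocale() hands back the spelling it was given rather than a
canonical form, which is exactly why pg_upgrade ends up comparing
zh_CN.utf8 against zh_CN.UTF-8 and seeing a mismatch.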
Re: [HACKERS] DROP INDEX CONCURRENTLY is not really concurrency safe and leaves around undroppable indexes
On Tuesday, September 25, 2012 01:58:31 AM Andres Freund wrote:
> On Monday, September 24, 2012 01:37:59 PM Andres Freund wrote:
>> On Monday, September 24, 2012 01:27:54 PM Andres Freund wrote:
>>> Problem 2: undroppable indexes:
>>>
>>> Session 1:
>>> CREATE TABLE test_drop_concurrently(id serial primary key, data int);
>>> CREATE INDEX test_drop_concurrently_data ON test_drop_concurrently(data);
>>> BEGIN;
>>> EXPLAIN ANALYZE SELECT * FROM test_drop_concurrently WHERE data = 34343;
>>>
>>> Session 2:
>>> DROP INDEX CONCURRENTLY test_drop_concurrently_data;
>>> waiting
>>> ^CCancel request sent
>>> ERROR: canceling statement due to user request
>>>
>>> Session 1:
>>> ROLLBACK;
>>> DROP TABLE test_drop_concurrently;
>>> SELECT indexrelid, indrelid, indexrelid::regclass,
>>>        indrelid::regclass, indisvalid, indisready
>>> FROM pg_index
>>> WHERE indexrelid = 'test_drop_concurrently_data'::regclass;
>>>  indexrelid | indrelid |         indexrelid          | indrelid | indisvalid | indisready
>>> ------------+----------+-----------------------------+----------+------------+------------
>>>       24703 |    24697 | test_drop_concurrently_data |    24697 | f          | f
>>> (1 row)
>>>
>>> DROP INDEX test_drop_concurrently_data;
>>> ERROR: could not open relation with OID 24697
>>>
>>> Haven't looked at this one at all.
>>
>> That's because it has to commit transactions in between to make the
>> catalog changes visible. Unfortunately at that point it has already
>> deleted the dependencies in deleteOneObject...
>
> Seems to be solvable with some minor reshuffling in deleteOneObject. We
> can only perform the deletion of outgoing pg_depend records *after* we
> have dropped the object with doDeletion in the concurrent case. As the
> actual drop of the relation happens in the same transaction that will
> then go on to drop the dependency records, that seems to be fine. I
> don't see any problems with that right now; I will write a patch
> tomorrow. We will see if that's problematic...

Patch attached. Review appreciated; there might be consequences of
moving the deletion of dependencies I didn't foresee.
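A session investigating such leftovers can find them the same way the
reproduction above does; this generalizes to any index left behind by an
interrupted concurrent operation:

```sql
-- List indexes left not-valid (e.g. by an interrupted CONCURRENTLY
-- operation), with the flags discussed in this thread:
SELECT indexrelid::regclass AS index_name,
       indrelid::regclass  AS table_name,
       indisvalid, indisready
FROM pg_index
WHERE NOT indisvalid;
```

In the undroppable-index case shown above, however, the table row is
already gone, so the indrelid cast would fail; querying the raw oid
columns without the regclass casts is the safe variant there.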
Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 8891fcd59496483793aecc21a096fc0119369e73 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Tue, 25 Sep 2012 01:41:29 +0200 Subject: [PATCH] Fix concurrency issues in concurrent index drops Previously a DROP INDEX CONCURRENTLY started with unsetting indisvalid *and* indisready. Thats problematic if some transaction is still looking at the index and another transction makes changes. See the example below. Now we do the drop in three stages, just as a concurrent index build. First unset indivalid, wait, unset indisready, wait, drop index. Example: Session 1: CREATE TABLE test_drop_concurrently(id serial primary key, data int); INSERT INTO test_drop_concurrently(data) SELECT * FROM generate_series(1, 10); CREATE INDEX test_drop_concurrently_data ON test_drop_concurrently(data); BEGIN; EXPLAIN ANALYZE SELECT * FROM test_drop_concurrently WHERE data = 34343; SELECT * FROM test_drop_concurrently WHERE data = 34343; (1 row) Session 2: BEGIN; SELECT * FROM test_drop_concurrently WHERE data = 34343; Session 3: DROP INDEX CONCURRENTLY test_drop_concurrently_data; (in-progress) Session 2: INSERT INTO test_drop_concurrently(data) SELECT * FROM generate_series(1, 10); COMMIT; Session 1: SELECT * FROM test_drop_concurrently WHERE data = 34343; (1 row) SET enable_bitmapscan = false; SET enable_indexscan = false; SELECT * FROM test_drop_concurrently WHERE data = 34343; (2 rows) --- src/backend/catalog/index.c | 99 ++--- 1 file changed, 84 insertions(+), 15 deletions(-) diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index a61b424..3e1794f 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -1318,6 +1318,10 @@ index_drop(Oid indexId, bool concurrent) * table lock strong enough to prevent all queries on the table from * proceeding until we commit and send out a shared-cache-inval notice * that 
will make them update their index lists.
+ *
+ * In the concurrent case we make sure that nobody can be looking at the
+ * indexes by dropping the index in multiple steps, so we don't need a
+ * full-fledged AccessExclusiveLock yet.
 */
 heapId = IndexGetRelation(indexId, false);
 if (concurrent)
@@ -1338,7 +1342,19 @@ index_drop(Oid indexId, bool concurrent)
 /*
 * Drop Index concurrently is similar in many ways to creating an index
- * concurrently, so some actions are similar to DefineIndex()
+ * concurrently, so some actions are similar to DefineIndex(), just in the
+ * reverse order.
+ *
+ * First we unset indisvalid so queries starting afterwards don't use the
+ * index to answer queries anymore. We have to keep indisready = true
+ *
Re: [HACKERS] Oid registry
On 09/25/2012 06:13 AM, Heikki Linnakangas wrote: On 25.09.2012 12:19, Hitoshi Harada wrote: On Tue, Sep 25, 2012 at 1:06 AM, Simon Riggs si...@2ndquadrant.com wrote: On 24 September 2012 21:26, Andrew Dunstan and...@dunslane.net wrote: Well, an obvious case is how record_to_json handles fields. If it knows nothing about the type, all it can do is output the string value. That doesn't work well for types such as hstore. If it could reliably recognize a field as an hstore it might well be able to do lots better. I'm not at all familiar with record_to_json or the json datatype, but wouldn't it be appropriate to create a cast from hstore to json to handle that case? No, the difficulty (or at least the first difficulty) is in having the code recognize that it has an hstore at all. The code picks apart the record field by field at run time and takes various actions depending on the field's type. For any type it doesn't recognize it just outputs the value as a string, and so that's what it does with hstore. Mostly this is the right thing, but in the hstore case it's rather sad. That brings us to another question: should the cast be part of the hstore extension, or json? (json is built-in, but imagine for a moment that it was an extension, too, so that there was a choice). IIRC someone started a discussion on that recently on pgsql-hackers, but I don't think there was any conclusion. I think we're missing something in the discussion here. Why can't you find out the oid of the type by looking it up by name? That mechanism is used throughout postgres already and seems to work just fine. Sure, but how do you know the type named hstore is what you know as hstore? We don't have a global consensus that a specific type name means exactly what everyone thinks it is. If you create a type called hstore that isn't the one in contrib/hstore, you're just asking for trouble.
Having a central registry of assigned oid numbers doesn't really make that problem go away either; you could still assign one of the registered oids for a different datatype if you wanted to. That's true to some extent, but names are worse, since they are schema qualified. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] One patch, one review
As part of the PostgreSQL development process, we agreed that everybody who supplies a patch would also review a patch. Doing that gives the process balance and avoids focusing the review burden on particular individuals. It's Peer Review, not Free Review. In the Sept 2012 review process we have 5 people doing more reviews than patches and 14 people who've submitted more patches than reviews. From that list, I've excluded 7 people who've submitted their first patch (or at least I don't recognise the name) and who can be forgiven for not signing up, yet. I haven't put myself down to review anything yet either, but this mail is part of me thinking about that. There are still 35 patches in need of review. We must remember that if we don't do this then the alternative is simply longer delays in the patch acceptance process, and that affects developers worse than anyone else. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Patch for option in pg_resetxlog for restore from WAL files
On Tuesday, September 25, 2012 6:27 PM Simon Riggs wrote: On 24 September 2012 04:00, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 18.07.2012 16:47, Amit kapila wrote: Patch implementing the design in the below mail chain is attached with this mail. This patch copies the ReadRecord() function and a bunch of related functions from xlog.c into pg_resetxlog.c. There's a separate patch in the current commitfest to make that code reusable, without having to copy-paste it to every tool that wants to parse the WAL. See https://commitfest.postgresql.org/action/patch_view?id=860. This patch should be refactored to make use of that framework, as soon as it's committed. Agreed, moving to next commitfest. Amit, I suggest reviewing the patch that this now depends upon. Earlier I had thought I would try to finish it in this CommitFest if the XLogReader patch got committed by next week. However, if you feel it is better to work on it for the next CommitFest, I shall do it that way. With Regards, Amit Kapila.
Re: [HACKERS] Oid registry
On 09/24/2012 11:39 PM, Tom Lane wrote: My recollection of the PGCon discussion is that people wanted to allow client-side code to hard-wire type OIDs for add-on types, in more or less the same way that things like JDBC know that 25 is text. That's not unreasonable, since the alternatives are complicated and not all that reliable. I'm not sure the usage applies to anything except data types though, so at least for starters I'd only want to accept reservations of OIDs for data types. Yes, I certainly think that's a sufficient case. And just to be clear, I don't care about the private range suggestion. I was just reporting that as it came up in the discussion and at least wasn't immediately shouted down. But I'm happy to abandon it at least for now. If there is ever actual demand for it we could revisit the matter. Given your previous comments, perhaps we could just start handing out Oids (if there is any demand) numbered, say, 9000 and up. That should keep us well clear of any existing use. Regarding the use of pre-allocated Oids: unless we provide some facility to use them when creating types, extension authors will be reduced to using the pretty ugly tricks I had to use when backporting JSON for binary upgrade-ability. This used some of pg_upgrade's tricks to force use of particular Oids, but I don't think this should be recommended practice. The suggestion I made was based on something you suggested (albeit with some caveats) in the recent discussion of pg_upgrade with extensions. cheers andrew
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
Tom, I actually plan on doing a lot of work on the frontend pg_basebackup for my employer. pg_basebackup is 90% of the way to a solution that I need for doing backups of *large* databases while allowing the database to continue to work. The problem is a lack of secondary disk space to save a replication of the original database cluster. I want to modify pg_basebackup to include the WAL files in the tar output. I have several ideas but I need to code and test them. That was the main reason I was examining the backend code. If you're willing to wait a bit on me to code and test my extensions to pg_basebackup I will try to address some of the deficiencies as well as add new features. I agree the checksum algorithm could definitely use some refactoring. I was already working on that before I retired last night. -- Brian

On Mon, Sep 24, 2012 at 10:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: Brian Weaver cmdrcluel...@gmail.com writes: Here are lines 321 through 329 of 'archive_read_support_format_tar.c' from libarchive

321    /* Recognize POSIX formats. */
322    if ((memcmp(header->magic, "ustar\0", 6) == 0)
323        && (memcmp(header->version, "00", 2) == 0))
324            bid += 56;
325
326    /* Recognize GNU tar format. */
327    if ((memcmp(header->magic, "ustar ", 6) == 0)
328        && (memcmp(header->version, " \0", 2) == 0))
329            bid += 56;

I'm wondering if the original committer put the 'ustar00\0' string in by design? The second part of that looks to me like it matches "ustar  \0", not "ustar00\0". I think the pg_dump coding is just wrong. I've already noticed that its code for writing the checksum is pretty brain-dead too :-( Note that according to the wikipedia page, tar programs typically accept files as pre-POSIX format if the checksum is okay, regardless of what is in the magic field; and the fields that were added by POSIX are noncritical so we'd likely never notice that they were being ignored.
(In fact, looking closer, pg_dump isn't even filling those fields anyway, so the fact that it's not producing a compliant magic field may be a good thing ...) regards, tom lane -- /* insert witty comment here */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
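The magic/version and checksum rules being debated in this thread are easy to illustrate outside of C. Below is a minimal sketch in Python — not PostgreSQL or libarchive code, and the helper names are made up — showing how a POSIX ustar header lays out the magic ("ustar\0" + "00", versus GNU tar's "ustar  \0"), and how the checksum is computed by summing all 512 header bytes with the 8-byte chksum field treated as if it contained spaces:

```python
def tar_checksum(header: bytes) -> int:
    """Sum all 512 header bytes, with the chksum field (offset 148,
    length 8) treated as if it were filled with ASCII spaces."""
    assert len(header) == 512
    blanked = header[:148] + b" " * 8 + header[156:]
    return sum(blanked)

def make_header(name: bytes) -> bytes:
    """Build a skeletal POSIX ustar header for illustration only."""
    hdr = bytearray(512)
    hdr[0:len(name)] = name
    # POSIX magic and version: "ustar\0" then "00".
    # (GNU tar instead writes "ustar  \0" across these two fields.)
    hdr[257:263] = b"ustar\x00"
    hdr[263:265] = b"00"
    chk = tar_checksum(bytes(hdr))
    # The checksum is stored as a 6-digit octal number, NUL, space.
    hdr[148:156] = b"%06o\x00 " % chk
    return bytes(hdr)
```

A writer that computes the sum over the real (zeroed or stale) chksum bytes instead of blanking them, or that mashes magic and version together as "ustar00\0", produces headers that strict readers score differently, which is exactly the bidding logic in the libarchive excerpt above.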
Re: [HACKERS] Switching timeline over streaming replication
On Tuesday, September 25, 2012 6:29 PM Heikki Linnakangas wrote: On 25.09.2012 10:08, Heikki Linnakangas wrote: On 24.09.2012 16:33, Amit Kapila wrote: In any case, it will be better if you can split it into multiple patches: 1. Having new functionality of Switching timeline over streaming replication 2. Refactoring related changes. It can make my testing and review for new feature patch little easier. Yep, I'll go ahead and split the patch. Thanks! Ok, here you go. xlog-c-split-1.patch contains the refactoring of existing code, with no user-visible changes. streaming-tli-switch-2.patch applies over xlog-c-split-1.patch, and contains the new functionality. Thanks, it will make my review easier than previous. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [WIP] Performance Improvement by reducing WAL for Update Operation
On 24.09.2012 13:57, Amit kapila wrote: Rebased version of patch based on latest code. When HOT was designed, we decided that heap_update needs to compare the old and new attributes directly, with memcmp(), to determine whether any of the indexed columns have changed. It was deemed infeasible to pass down that information from the executor. I don't remember the details of why that was, but you seem to be trying the same thing in this patch, passing the bitmap of modified cols from the executor to heap_update(). I'm pretty sure that won't work, for the same reasons we didn't do it for HOT. I still feel that it would probably be better to use a generic delta encoding scheme, instead of inventing one. How about VCDIFF (http://tools.ietf.org/html/rfc3284), for example? Or you could reuse the LZ compressor that we already have in the source tree. You can use LZ for delta compression by initializing the history buffer of the algorithm with the old tuple, and then compressing the new tuple as usual. Or you could still use the knowledge of where the attributes begin and end and which attributes were updated, and do the encoding similar to how you did in the patch, but use LZ as the output format. That way the decoding would be the same as LZ decompression. - Heikki
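The LZ-with-history idea sketched above — seed the compressor's history buffer with the old tuple, then compress the new tuple — can be demonstrated with any deflate implementation that supports a preset dictionary. A rough illustration using Python's zlib (not the Postgres LZ compressor; the function names are made up):

```python
import zlib

def delta_encode(old: bytes, new: bytes) -> bytes:
    """Compress `new` with the compressor's history preloaded from `old`,
    so matches against the old tuple cost only back-references."""
    c = zlib.compressobj(zdict=old)
    return c.compress(new) + c.flush()

def delta_decode(old: bytes, delta: bytes) -> bytes:
    """Decompression must preload the same dictionary to resolve the
    back-references into the old tuple."""
    d = zlib.decompressobj(zdict=old)
    return d.decompress(delta) + d.flush()
```

The appeal of this scheme for WAL is that the "delta" is just ordinary compressed output: replay needs only the old tuple (already on the page) plus the standard decompressor, with no bespoke diff format to version.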
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
On 9/25/12 3:38 PM, Brian Weaver wrote: I want to modify pg_basebackup to include the WAL files in the tar output. Doesn't pg_basebackup -x do exactly that? Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xlog filename formatting functions in recovery
On 21 September 2012 02:25, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 03.07.2012 15:13, Robert Haas wrote: On the substance of the patch, I believe the reason why this is currently disallowed is because the TLI is implicitly taken from the running system, and on the standby that might be the wrong value. Yeah, I believe that's the reason. So the question is, what timeline should the functions use on a standby? With the patch as it is, they use 0: postgres=# select pg_xlogfile_name_offset('3/FF02'); pg_xlogfile_name_offset --- (000300FF,131072) (1 row)

There's a few different options:
1. current recovery_target_timeline (XLogCtl->recoveryTargetTLI)
2. current ThisTimeLineID, which is bumped every time a timeline-bumping checkpoint record is replayed (this is not currently visible to backends, but we could easily add a shared memory variable for it)
3. curFileTLI, that is, the TLI of the current file that we're replaying; this is usually the same as ThisTimeLineID, except when replaying a WAL segment where the timeline changes
4. Something else?

What do you use these functions for? Which option would make the most sense? I would say there is no sensible solution. So we keep pg_xlogfile_name_offset() banned in recovery, as it is now. We introduce pg_xlogfile_name_offset_timeline() where you have to manually specify the timeline, then introduce 3 functions that map onto the 3 options above, forcing the user to choose which one they mean: pg_recovery_target_timeline(), pg_recovery_current_timeline(), pg_recovery_current_file_timeline(). Usage would then be pg_xlogfile_name_offset_timeline( my_choice_of_timeline(), '3/FF02') -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Oid registry
Andrew Dunstan and...@dunslane.net writes: Given your previous comments, perhaps we could just start handing out Oids (if there is any demand) numbered, say, 9000 and up. That should keep us well clear of any existing use. No, I think you missed my point entirely: handing out OIDs at the top of the manual assignment range is approximately the worst possible scenario. I foresee having to someday move FirstBootstrapObjectId down to 9000, or 8000, or even less, to cope with growth of the auto-assigned OID set. So we need to keep manually assigned OIDs reasonably compact near the bottom of the range, and it doesn't matter at all whether such OIDs are used internally or reserved for external developers. Nor do I see a need for such reserved OIDs to look different from internally-used OIDs. Reserved is reserved. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
Brian Weaver cmdrcluel...@gmail.com writes: If you're willing to wait a bit on me to code and test my extensions to pg_basebackup I will try to address some of the deficiencies as well as add new features. I think it's a mistake to try to handle these issues in the same patch as feature extensions. If you want to submit a patch for them, I'm happy to let you do the legwork, but please keep it narrowly focused on fixing file-format deficiencies. The notes I had last night after examining pg_dump were:
* magic number written incorrectly, but POSIX fields aren't filled anyway (which is why tar tvf doesn't show them)
* checksum code is brain-dead; no use in lastSum nor in looping
* per spec, there should be 1024 zeroes not 512 at end of file; this explains why tar whines about a lone zero block ...
Not sure which of these apply to pg_basebackup. As far as the backwards compatibility issue goes, what seems like a good idea after sleeping on it is (1) fix pg_dump in HEAD to emit standard-compliant tar files; (2) fix pg_restore in HEAD and all back branches to accept both the standard and the incorrect magic field. This way, the only people with a compatibility problem would be those trying to use by-then-ancient pg_restore versions to read 9.3-or-later pg_dump output. regards, tom lane
[HACKERS] pg_upgrade does not completely honor --new-port
Hi, I just performed a test upgrade from 9.1 to 9.2, and used the --new-port option. However, analyze_new_cluster.sh does not include the new port, so when I run it, it fails. Any chance we could add the port number to the script? Also, would it be worth adding the value specified in --new-bindir as a prefix to the vacuumdb command in the same script? Regards, -- Devrim GÜNDÜZ Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz
Re: [HACKERS] Oid registry
On 09/25/2012 10:23 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: Given your previous comments, perhaps we could just start handing out Oids (if there is any demand) numbered, say, 9000 and up. That should keep us well clear of any existing use. No, I think you missed my point entirely: handing out OIDs at the top of the manual assignment range is approximately the worst possible scenario. I foresee having to someday move FirstBootstrapObjectId down to 9000, or 8000, or even less, to cope with growth of the auto-assigned OID set. So we need to keep manually assigned OIDs reasonably compact near the bottom of the range, and it doesn't matter at all whether such OIDs are used internally or reserved for external developers. Nor do I see a need for such reserved OIDs to look different from internally-used OIDs. Reserved is reserved. OK, point taken. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
Unless I misread the code, the tar format and streaming xlog are mutually exclusive. Considering my normal state of fatigue, it's not unlikely. I don't want to have to set my wal_keep_segments artificially high just for the backup. On Tue, Sep 25, 2012 at 10:05 AM, Marko Tiikkaja pgm...@joh.to wrote: On 9/25/12 3:38 PM, Brian Weaver wrote: I want to modify pg_basebackup to include the WAL files in the tar output. Doesn't pg_basebackup -x do exactly that? Regards, Marko Tiikkaja -- /* insert witty comment here */
Re: [HACKERS] Patch: incorrect array offset in backend replication tar header
Tom, I'm fine with submitting highly focused patches first. I was just explaining my end-goal. Still I will need time to patch, compile, and test before submitting so you're not going to see any output from me for a few days. That's all assuming my employer can leave me alone long enough to focus on a single task. I'm far too interrupt driven at work. -- Brian On Tue, Sep 25, 2012 at 10:30 AM, Tom Lane t...@sss.pgh.pa.us wrote: Brian Weaver cmdrcluel...@gmail.com writes: If you're willing to wait a bit on me to code and test my extensions to pg_basebackup I will try to address some of the deficiencies as well add new features. I think it's a mistake to try to handle these issues in the same patch as feature extensions. If you want to submit a patch for them, I'm happy to let you do the legwork, but please keep it narrowly focused on fixing file-format deficiencies. The notes I had last night after examining pg_dump were: magic number written incorrectly, but POSIX fields aren't filled anyway (which is why tar tvf doesn't show them) checksum code is brain-dead; no use in lastSum nor in looping per spec, there should be 1024 zeroes not 512 at end of file; this explains why tar whines about a lone zero block ... Not sure which of these apply to pg_basebackup. As far as the backwards compatibility issue goes, what seems like a good idea after sleeping on it is (1) fix pg_dump in HEAD to emit standard-compliant tar files; (2) fix pg_restore in HEAD and all back branches to accept both the standard and the incorrect magic field. This way, the only people with a compatibility problem would be those trying to use by-then-ancient pg_restore versions to read 9.3 or later pg_dump output. regards, tom lane -- /* insert witty comment here */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
Excerpts from Hitoshi Harada's message of mar sep 25 02:11:14 -0300 2012: Of course you can look up the type name consulting SysCache and see if the type name is hstore, but it's expensive to do it for every function invocation, so caching the hstore's oid in plv8 is the current workaround, which is not truly safe because the hstore's oid may change while cached. Hm, couldn't you just register an invalidation callback with CacheRegisterSyscacheCallback()? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
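The cache-plus-invalidation-callback pattern Alvaro describes looks roughly like this in miniature. This is plain Python with made-up names, not backend code — in the backend, the invalidate step would be a callback registered via CacheRegisterSyscacheCallback() firing on a syscache invalidation message:

```python
class TypeOidCache:
    """Cache looked-up type OIDs; drop them when invalidated."""
    def __init__(self, lookup):
        self._lookup = lookup   # stands in for a syscache/catalog query
        self._cache = {}

    def get_oid(self, type_name):
        # Only pay the lookup cost on a cache miss.
        if type_name not in self._cache:
            self._cache[type_name] = self._lookup(type_name)
        return self._cache[type_name]

    def invalidate(self):
        # Analogous to the registered callback firing: the cached OIDs
        # may be stale (e.g. the extension was dropped and recreated),
        # so forget them and re-look-up on next use.
        self._cache.clear()
```

As Tom notes downthread, this helps backend-side code; a client-side cache has no such invalidation channel, which is part of why hard-wired OIDs are attractive there.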
Re: [HACKERS] Re: [WIP] Performance Improvement by reducing WAL for Update Operation
On Tuesday, September 25, 2012 7:30 PM Heikki Linnakangas wrote: On 24.09.2012 13:57, Amit kapila wrote: Rebased version of patch based on latest code. When HOT was designed, we decided that heap_update needs to compare the old and new attributes directly, with memcmp(), to determine whether any of the indexed columns have changed. It was deemed infeasible to pass down that information from the executor. I don't remember the details of why that was, but you seem to be trying the same thing in this patch, passing the bitmap of modified cols from the executor to heap_update(). I'm pretty sure that won't work, for the same reasons we didn't do it for HOT. I think the reason for not relying on modified columns is that there can be cases where the modified columns don't give the correct information. It may be because BEFORE triggers can change the modified columns; that's why we need to do the comparison for HOT updates. In our case we have taken care of that by not doing the optimization, so we are not relying on the modified columns. If you feel the comparison is a must, we can do it the same way as we identify it for HOT? I still feel that it would probably be better to use a generic delta encoding scheme, instead of inventing one. How about VCDIFF (http://tools.ietf.org/html/rfc3284), for example? Or you could reuse the LZ compressor that we already have in the source tree. You can use LZ for delta compression by initializing the history buffer of the algorithm with the old tuple, and then compressing the new tuple as usual. Or you could still use the knowledge of where the attributes begin and end and which attributes were updated, and do the encoding similar to how you did in the patch, but use LZ as the output format. That way the decoding would be the same as LZ decompression. Can you please explain why you think doing LZ compression on top of the encoding is better, as we have already reduced the amount of WAL for updates by storing only the changed column information?
a. Is it to further reduce the size of WAL?
b. To store the diff WAL in some standard format?
c. Or does it give some other kind of benefit?
With Regards, Amit Kapila.
Re: [HACKERS] Oid registry
Alvaro Herrera alvhe...@2ndquadrant.com writes: Excerpts from Hitoshi Harada's message of mar sep 25 02:11:14 -0300 2012: Of course you can look up type name conlusting SysCache and see if the type name is hstore, but it's expensive to do it for every function invocation, so caching the hstore's oid in plv8 is the current workaround, which is not truly safe because the hstore's oid may change while caching. Hm, couldn't you just register an invalidation callback with CacheRegisterSyscacheCallback()? Yeah, if the code that needs the value is in the backend; but if it is, the cost of looking the value up again probably isn't that much either. The need for a cache is much greater in client-side code. The bigger issues in my mind have to do with how you look up the type in the first place (considering schema search path and so forth) and how you know that hstore is what you think it is. I recall some discussion of assigning a UUID to each datatype, which might alleviate the second part of that at least. Does nothing for malicious impersonation of a datatype of course, but should prevent accidental collisions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xlog filename formatting functions in recovery
On Tue, Sep 25, 2012 at 11:05 PM, Simon Riggs si...@2ndquadrant.com wrote: On 21 September 2012 02:25, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 03.07.2012 15:13, Robert Haas wrote: On the substance of the patch, I believe the reason why this is currently disallowed is because the TLI is implicitly taken from the running system, and on the standby that might be the wrong value. Yeah, I believe that's the reason. So the question is, what timeline should the functions use on a standby? With the patch as it is, they use 0: postgres=# select pg_xlogfile_name_offset('3/FF02'); pg_xlogfile_name_offset --- (000300FF,131072) (1 row) There's a few different options: 1. current recovery_target_timeline (XLogCtl-recoveryTargetTLI) 2. current ThisTimeLineID, which is bumped every time a timeline-bumping checkpoint record is replayed. (this is not currently visible to backends, but we could easily add a shared memory variable for it) 3. curFileTLI. That is, the TLI of the current file that we're replaying. This is usually the same as ThisTimeLineID, except when replaying a WAL segment where the timeline changes 4. Something else? What do you use these functions for? Which option would make the most sense? I would say there is no sensible solution. So we keep pg_xlogfile_name_offset() banned in recovery, as it is now. We introduce pg_xlogfile_name_offset_timeline() where you have to manually specify the timeline, I agree to introduce such function, but what about extending pg_xlogfile_name and pg_xlogfile_name_offset so that they accept the timeline ID as the second argument, instead of adding new functions? If the second argument is omitted, current timeline ID is used to calculate the WAL file name. then introduce 3 functions that map onto the 3 options above, forcing the user to choose which one they mean. 
pg_recovery_target_timeline() pg_recovery_current_timeline() pg_recovery_current_file_timeline() It seems to me that it's not easy for a user to understand the difference between those functions. Since I think that pg_xlogfile_name is likely to be used together with pg_last_xlog_receive_location and pg_last_xlog_replay_location on the standby, it seems better to introduce something like pg_last_xlog_receive_timeline and pg_last_xlog_replay_timeline. That is, a user would execute, for example, SELECT pg_xlogfile_name(pg_last_xlog_replay_location(), pg_last_xlog_replay_timeline()); Regards, -- Fujii Masao
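The file-name arithmetic these functions perform is trivial once a timeline is chosen, which is why the whole debate above is about picking the TLI and nothing else. A sketch of the mapping (Python; assumes the default 16 MB segment size and the pre-9.3 naming scheme, and the helper name is made up):

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # default WAL segment size

def xlogfile_name_offset(tli: int, location: str):
    """Map a timeline ID plus a WAL location 'X/Y' (two hex words) to
    the 24-hex-digit segment file name and the byte offset inside it."""
    hi, lo = (int(part, 16) for part in location.split("/"))
    seg = lo // XLOG_SEG_SIZE       # segment number within the 'hi' word
    offset = lo % XLOG_SEG_SIZE     # byte position within that segment
    # File name = TLI, high word, segment number, each as 8 hex digits.
    return "%08X%08X%08X" % (tli, hi, seg), offset
```

With TLI 0 — what the patch currently substitutes on a standby — the name's leading eight digits are all zeros, which matches no real timeline's files; that is the "wrong value" problem the thread is circling.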
Re: [HACKERS] Oid registry
On 09/25/2012 12:14 PM, Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Excerpts from Hitoshi Harada's message of mar sep 25 02:11:14 -0300 2012: Of course you can look up type name conlusting SysCache and see if the type name is hstore, but it's expensive to do it for every function invocation, so caching the hstore's oid in plv8 is the current workaround, which is not truly safe because the hstore's oid may change while caching. Hm, couldn't you just register an invalidation callback with CacheRegisterSyscacheCallback()? Yeah, if the code that needs the value is in the backend; but if it is, the cost of looking the value up again probably isn't that much either. The need for a cache is much greater in client-side code. The bigger issues in my mind have to do with how you look up the type in the first place (considering schema search path and so forth) and how you know that hstore is what you think it is. I recall some discussion of assigning a UUID to each datatype, which might alleviate the second part of that at least. Does nothing for malicious impersonation of a datatype of course, but should prevent accidental collisions. One nice thing about that idea (if I understand it correctly) is that we could use it for existing types. The horse has long bolted on Oids for hstore, because it has lots of existing deployments with unknown Oids. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On Tue, Sep 25, 2012 at 09:18:30AM -0400, Andrew Dunstan wrote: I'm not at all familiar with record_to_json or the json datatype, but wouldn't it be appropriate to create a cast from hstore to json to handle that case? No, the difficulty (or at least the first difficulty) is in having the code recognize that it has an hstore at all. The code picks apart the record field by field at run time and takes various actions depending on the field's type. For any type it doesn't recognize it just outputs the value as a string, and so that's what it does with hstore. Mostly this is the right thing but in the hstore case it's rather sad. Is there a particular reason to special-case hstore though? Wouldn't it be sufficient to simply look for a cast from the given type oid to json and use that if present? If you don't like casts, allow other extensions to define a function to_json_hook(yourtype) -> json which is used. Hardcoding IDs in extensions just doesn't seem right somehow... (Hmm, I see someone else in the thread pointed this out too). Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
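The to_json_hook idea amounts to a per-type dispatch table keyed by type oid, with output-as-string as the fallback that record_to_json uses today. A toy sketch (Python, purely illustrative — the names and the example oid 16384 are made up, not real catalog values):

```python
# Registry of per-type JSON serializers, keyed by type oid.
_json_hooks = {}

def register_json_hook(type_oid, fn):
    """An extension registers a serializer for its own type."""
    _json_hooks[type_oid] = fn

def field_to_json(type_oid, value):
    """What a generic record-to-json routine would do per field:
    use a registered hook if one exists, else fall back to quoting
    the value's string form."""
    hook = _json_hooks.get(type_oid)
    if hook is not None:
        return hook(value)
    return '"%s"' % value
```

The point of the pattern is that neither the core serializer nor the client needs to hard-wire anyone's oid: each extension volunteers its hook at load time under whatever oid it happened to be assigned.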
Re: [HACKERS] Switching timeline over streaming replication
Amit: At some point, every master-slave replicator gets to the point where they need to start thinking about master-master replication. Instead of getting stuck in the weeds and only then realizing that master-master is the ONLY way to go, many developers do not start out planning for master-master, but they should, out of habit. You can save yourself a lot of grief just by starting with a master-master architecture. But you don't have to USE it; you can just not send WRITE traffic to the servers that you do not want to WRITE to, but all of them should be WRITE servers. That way, the only timeline you ever need is your decision to send WRITE traffic to them, and there is nothing that prevents you from running MASTER-MASTER all the time and skipping the whole slave thing entirely. At this point, I think synchronous replication is only for immediate local replication needs and async for all the master-master stuff. cheers, marco On 9/24/2012 9:44 PM, Amit Kapila wrote: On Monday, September 24, 2012 9:08 PM m...@rpzdesign.com wrote: What a disaster waiting to happen. Maybe the only replication should be master-master replication so there is no need to sequence timelines or anything, all servers are ready masters, no backups or failovers. If you really do not want a master serving, then it should only be handled in the routing of traffic to that server and not the replication logic itself. The only thing that ever came about from failovers was the failure to turn over. The above is opinion only. This feature is for users who want to use master-standby configurations. What do you mean by : then it should only be handled in the routing of traffic to that server and not the replication logic itself. Do you have any idea other than proposed implementation or do you see any problem in currently proposed solution?
On 9/24/2012 7:33 AM, Amit Kapila wrote: On Tuesday, September 11, 2012 10:53 PM Heikki Linnakangas wrote: I've been working on the often-requested feature to handle timeline changes over streaming replication. At the moment, if you kill the master and promote a standby server, and you have another standby server that you'd like to keep following the new master server, you need a WAL archive in addition to streaming replication to make it cross the timeline change. Streaming replication will just error out. Having a WAL archive is usually a good idea in complex replication scenarios anyway, but it would be good to not require it. Confirm my understanding of this feature: this feature is for the case when standby-1, which is going to be promoted to master, has archive mode 'on', as only in that case will its timeline change. If the above is right, then there can be other similar scenarios where it can be used: Scenario-1 (1 Master, 1 Standby) 1. Master (archive_mode=on) goes down. 2. Master comes up again. 3. Standby tries to follow it. Now in the above scenario, due to the timeline mismatch it also gives an error, but your patch should fix it. Some parts of this patch are just refactoring that probably make sense regardless of the new functionality. For example, I split off the timeline history file related functions to a new file, timeline.c. That's not very much code, but it's fairly isolated, and xlog.c is massive, so I feel that anything that we can move off from xlog.c is a good thing. I also moved off the two functions RestoreArchivedFile() and ExecuteRecoveryCommand() to a separate file. Those are also not much code, but are fairly isolated. If no-one objects to those changes, and the general direction this work is going in, I'm going to split off those refactorings to separate patches and commit them separately. I also made the timeline history file a bit more detailed: instead of recording just the WAL segment where the timeline was changed, it now records the exact XLogRecPtr.
That was required for the walsender to know the switchpoint without having to parse the XLOG records (it reads and parses the history file instead). IMO separating the timeline history file related functions into a new file is good. However, I am not sure about splitting RestoreArchivedFile() and ExecuteRecoveryCommand() into a separate file. How about splitting out all archive-related functions: static void XLogArchiveNotify(const char *xlog); static void XLogArchiveNotifySeg(XLogSegNo segno); static bool XLogArchiveCheckDone(const char *xlog); static bool XLogArchiveIsBusy(const char *xlog); static void XLogArchiveCleanup(const char *xlog); .. .. In any case, it will be better if you can split it into multiple patches: 1. The new functionality of switching timeline over streaming replication. 2. The refactoring-related changes. That would make my testing and review of the new feature patch a little easier. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
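The detail Heikki added — recording the exact switch LSN in the history file, not just the WAL segment — is what lets a walsender answer "which timeline does this WAL position belong to?" without parsing WAL records. A rough Python sketch of that lookup (the LSN numbers are invented; real LSNs are byte positions in the WAL stream):

```python
# Timeline history for timeline 3: timeline 1 ended at LSN 1000,
# timeline 2 ended at LSN 2500. Each entry is (parent_tli, switchpoint).
history = [(1, 1000), (2, 2500)]
current_tli = 3

def tli_of_lsn(lsn):
    """Return the timeline containing WAL position lsn. The switchpoint
    itself is the first position on the *next* timeline."""
    for tli, switchpoint in history:
        if lsn < switchpoint:
            return tli
    return current_tli

assert tli_of_lsn(500) == 1
assert tli_of_lsn(1000) == 2   # the switchpoint starts the next timeline
assert tli_of_lsn(3000) == 3
```

With only the segment recorded, the server would instead have to scan WAL inside that segment to find the exact boundary, which is what the XLogRecPtr addition avoids.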
Re: [HACKERS] New statistics for WAL buffer dirty writes
Hi, 2012/08/12 7:11, Jeff Janes wrote: On Sat, Jul 28, 2012 at 3:33 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Sat, Jul 7, 2012 at 9:17 PM, Satoshi Nagayasu sn...@uptime.jp wrote: Hi, Jeff Janes has pointed out that my previous patch could hold a number of the dirty writes only in a single local backend, and it could not hold them all over the cluster, because the counter was allocated in the local process memory. That's true, and I have fixed it by moving the counter into shared memory, as a member of XLogCtlWrite, to keep total dirty writes in the cluster. ... The comment "XLogCtrlWrite must be protected with WALWriteLock" mis-spells XLogCtlWrite. The final patch will need to add a section to the documentation. Thanks to Robert and Tom for addressing my concerns about the pointer volatility. I think there is enough consensus that this is useful without adding more things to it, like histograms or high water marks. However, I do think we will want to add a way to query for the time of the last reset, as other monitoring features are going that way. Is it OK that the count is reset upon a server restart? pg_stat_bgwriter, for example, does not do that. Unfortunately I think fixing this in an acceptable way will be harder than the entire rest of the patch was. The coding looks OK to me; it applies and builds, passes make check, and does what it says. I didn't do performance testing, as it is hard to believe it would have a meaningful effect. I'll mark it as waiting on author, for the documentation and reset time. I'd ask a more senior hacker to comment on the durability over restarts. I have rewritten the patch to deal with dirty write statistics through the pgstat collector, as bgwriter does. Yeah, it's a somewhat bigger rewrite. With this patch, the walwriter process and each backend process sum up dirty writes and send them to the stat collector. So the value can be saved in the stats file, and can be kept across restarts.
The statistics can be retrieved using the pg_stat_get_xlog_dirty_writes() function, and can be reset by calling pg_stat_reset_shared('walwriter'). Now, I have one concern. The reset time could be captured in globalStats.stat_reset_timestamp, but this value is the same as the bgwriter one. So, once pg_stat_reset_shared('walwriter') is called, the stats_reset column in pg_stat_bgwriter represents the reset time for walwriter, not for bgwriter. How should we handle this? Should we split this value? And should we have a new system view for walwriter? Of course, I will work on documentation next. Regards, Cheers, Jeff -- Satoshi Nagayasu sn...@uptime.jp Uptime Technologies, LLC. http://www.uptime.jp diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index ff56c26..234d568 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -1518,6 +1518,7 @@ AdvanceXLInsertBuffer(bool new_segment) WriteRqst.Write = OldPageRqstPtr; WriteRqst.Flush = 0; XLogWrite(WriteRqst, false, false); + WalWriterStats.m_xlog_dirty_writes++; LWLockRelease(WALWriteLock); TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE(); } diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 8389d5c..f031be1 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -125,6 +125,15 @@ char *pgstat_stat_tmpname = NULL; */ PgStat_MsgBgWriter BgWriterStats; +/* + * WalWriter global statistics counter. + * Despite its name, this counter is actually used not only in walwriter, + * but also in each backend process to sum up xlog dirty writes. + * Those processes would increment this counter in each XLogWrite call, + * then send it to the stat collector process.
+ */ +PgStat_MsgWalWriter WalWriterStats; + /* -- * Local data * -- @@ -279,6 +288,7 @@ static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len); static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len); static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len); static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len); +static void pgstat_recv_walwriter(PgStat_MsgWalWriter *msg, int len); static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len); static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len); static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len); @@ -1188,6 +1198,8 @@ pgstat_reset_shared_counters(const char *target) if (strcmp(target, "bgwriter") == 0) msg.m_resettarget = RESET_BGWRITER; + else if (strcmp(target, "walwriter") == 0) + msg.m_resettarget = RESET_WALWRITER; else ereport(ERROR,
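The flow the patch implements — a process-local WalWriterStats counter bumped inside XLogWrite (under WALWriteLock) and later flushed to the stats collector — can be simulated roughly like this. Python sketch with invented class names; it illustrates the aggregation pattern, not the actual pgstat API:

```python
class StatsCollector:
    """Stand-in for the pgstat collector: accumulates the cluster-wide
    total, which can then be persisted across restarts."""
    def __init__(self):
        self.xlog_dirty_writes = 0
    def recv_walwriter(self, count):
        self.xlog_dirty_writes += count

class Backend:
    """Stand-in for a backend or the walwriter process."""
    def __init__(self, collector):
        self.local = 0          # like WalWriterStats.m_xlog_dirty_writes
        self.collector = collector
    def xlog_write_dirty(self):
        self.local += 1         # bumped while holding WALWriteLock
    def report(self):
        # like pgstat_send: ship the local delta, then reset it
        self.collector.recv_walwriter(self.local)
        self.local = 0

c = StatsCollector()
b1, b2 = Backend(c), Backend(c)
b1.xlog_write_dirty(); b1.xlog_write_dirty(); b2.xlog_write_dirty()
b1.report(); b2.report()
assert c.xlog_dirty_writes == 3
```

Because only the collector's running total matters, counts survive individual process exits, which is exactly why moving from a shared-memory counter to the pgstat machinery makes the statistic durable.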
Re: [HACKERS] pg_reorg in core?
Simon Riggs si...@2ndquadrant.com writes: For me, the Postgres user interface should include * REINDEX CONCURRENTLY * CLUSTER CONCURRENTLY * ALTER TABLE CONCURRENTLY and also that autovacuum would be expanded to include REINDEX and CLUSTER, renaming it to automaint. FWIW, +1 to all those user requirements, and for not having pg_reorg simply moved as-is nearer to core. I would paint the shed autoheal, maybe. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
Tom Lane t...@sss.pgh.pa.us writes: Andrew Dunstan and...@dunslane.net writes: Given your previous comments, perhaps we could just start handing out Oids (if there is any demand) numbered, say, 9000 and up. That should keep us well clear of any existing use. No, I think you missed my point entirely: handing out OIDs at the top of the manual assignment range is approximately the worst possible scenario. I foresee having to someday move FirstBootstrapObjectId. If that's the only problem, there's an easy way out: registering from the top of the Oid range down to some constant over there. Now it seems like the problem here is that we both want extensions to be managed as disconnected as possible from PostgreSQL, and at the same time we want to be able to trust them from within the backend, with it being the only trusted authority. I'm not security-oriented enough to devise a working scheme here, but it makes me think about some shared secret or GnuPG signature. Is it possible to automatically verify that an extension's ID card (containing some type's OID, name, etc.) has been signed with PostgreSQL's own private key, without ever having to ship that key in the backend code or binary? To me it's all about how to set up a distributed network of trust… Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] ldap_err2string
Is there any reason we don't use ldap_err2string() to get readable error messages from LDAP, instead of something like "error code 49"? It's already used in some places, so it's apparently OK to use. Proposed patch attached.

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index c765454..74036e2 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -2037,8 +2037,7 @@ static int pam_passwd_conv_proc(int num_msg, const struct pam_message ** msg,
 #ifndef WIN32
 		ereport(LOG,
-				(errmsg("could not initialize LDAP: error code %d",
-						errno)));
+				(errmsg("could not initialize LDAP: %m")));
 #else
 		ereport(LOG,
 				(errmsg("could not initialize LDAP: error code %d",
@@ -2051,7 +2050,7 @@ static int pam_passwd_conv_proc(int num_msg, const struct pam_message ** msg,
 	{
 		ldap_unbind(*ldap);
 		ereport(LOG,
-				(errmsg("could not set LDAP protocol version: error code %d", r)));
+				(errmsg("could not set LDAP protocol version: %s", ldap_err2string(r))));
 		return STATUS_ERROR;
 	}
@@ -2104,7 +2103,7 @@ static int pam_passwd_conv_proc(int num_msg, const struct pam_message ** msg,
 	{
 		ldap_unbind(*ldap);
 		ereport(LOG,
-				(errmsg("could not start LDAP TLS session: error code %d", r)));
+				(errmsg("could not start LDAP TLS session: %s", ldap_err2string(r))));
 		return STATUS_ERROR;
 	}
 }
@@ -2193,8 +2192,8 @@ static int pam_passwd_conv_proc(int num_msg, const struct pam_message ** msg,
 	if (r != LDAP_SUCCESS)
 	{
 		ereport(LOG,
-				(errmsg("could not perform initial LDAP bind for ldapbinddn \"%s\" on server \"%s\": error code %d",
-						port->hba->ldapbinddn, port->hba->ldapserver, r)));
+				(errmsg("could not perform initial LDAP bind for ldapbinddn \"%s\" on server \"%s\": %s",
+						port->hba->ldapbinddn, port->hba->ldapserver, ldap_err2string(r))));
 		return STATUS_ERROR;
 	}
@@ -2218,8 +2217,8 @@ static int pam_passwd_conv_proc(int num_msg, const struct pam_message ** msg,
 	if (r != LDAP_SUCCESS)
 	{
 		ereport(LOG,
-				(errmsg("could not search LDAP for filter \"%s\" on server \"%s\": error code %d",
-						filter, port->hba->ldapserver, r)));
+				(errmsg("could not search LDAP for filter \"%s\" on server \"%s\": %s",
+						filter, port->hba->ldapserver, ldap_err2string(r))));
 		pfree(filter);
 		return STATUS_ERROR;
 	}
@@ -2306,8 +2305,8 @@ static int pam_passwd_conv_proc(int num_msg, const struct pam_message ** msg,
 	if (r != LDAP_SUCCESS)
 	{
 		ereport(LOG,
-				(errmsg("LDAP login failed for user \"%s\" on server \"%s\": error code %d",
-						fulluser, port->hba->ldapserver, r)));
+				(errmsg("LDAP login failed for user \"%s\" on server \"%s\": %s",
+						fulluser, port->hba->ldapserver, ldap_err2string(r))));
 		pfree(fulluser);
 		return STATUS_ERROR;
 	}

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
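For readers unfamiliar with the motivation: ldap_err2string() turns an LDAP result code into human-readable text, so the log says what went wrong instead of forcing a lookup. A toy stand-in (the meaning of code 49, invalidCredentials, follows RFC 4511; the exact message strings depend on the client library):

```python
# Minimal stand-in table; a real LDAP library covers all result codes.
LDAP_ERR = {
    0: "Success",
    49: "Invalid credentials",
}

def err2string(code):
    # Fall back to the old-style numeric message for unknown codes.
    return LDAP_ERR.get(code, "error code %d" % code)

assert err2string(49) == "Invalid credentials"
assert err2string(94) == "error code 94"
```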
Re: [HACKERS] Oid registry
On 9/24/12 10:26 PM, Andrew Dunstan wrote: Well, an obvious case is how record_to_json handles fields. If it knows nothing about the type all it can do is output the string value. That doesn't work well for types such as hstore. If it could reliably recognize a field as an hstore it might well be able to do lots better. I think the fix there is that the type supplies a "to JSON" function as part of its type definition (just like input and output functions). "to XML" would also be useful. Most types wouldn't need them and would default to string output. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On 9/24/12 11:39 PM, Tom Lane wrote: My recollection of the PGCon discussion is that people wanted to allow client-side code to hard-wire type OIDs for add-on types, in more or less the same way that things like JDBC know that 25 is text. If I write a custom uint type and want to explain to JDBC that it is number-like, I don't think it's very attractive to have to register a hard-wired OID for it. Not only would there be the burden of maintaining this OID database; JDBC and every other driver would also have to maintain its own database of all custom types in the world. I would rather imagine a system where the type communicates its properties to the client using some sort of label system, like number, string, pair of number (point), pair of string (hstore). Or something with more or less detail. Maybe something like that already exists (maybe somewhere in Thrift, protocol buffers, etc.?). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
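Peter's label idea can be mocked up in a few lines. The labels and type names below are invented; this only illustrates the shape of the mechanism (server describes semantics, driver maps them to client-side handling), not a proposal for the wire protocol:

```python
# Server side: each type declares a semantic label instead of relying
# on clients hard-wiring its OID.
labels = {
    "uint":   "number",
    "point":  "pair of number",
    "hstore": "map of string",
}

def client_class(typname):
    """Driver side: pick client handling from the label; anything
    unlabelled degrades to text, like unknown types do today."""
    return labels.get(typname, "string")

assert client_class("uint") == "number"      # number-like without a fixed OID
assert client_class("mystery") == "string"   # unknown type: treat as text
```

The attraction over an OID registry is that the mapping travels with the type definition, so neither the server project nor any driver has to maintain a global database of extension types.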
Re: [HACKERS] Oid registry
On 9/25/12 1:11 AM, Hitoshi Harada wrote: Say, if plv8 wants to convert hstore into a javascript object. It is arbitrary for users to define such a function that accepts hstore as arguments, but how does plv8 know the input is actually hstore? That's what the proposed transforms feature would be for: https://commitfest.postgresql.org/action/patch_view?id=879 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
Peter Eisentraut pete...@gmx.net writes: On 9/24/12 10:26 PM, Andrew Dunstan wrote: Well, an obvious case is how record_to_json handles fields. If it knows nothing about the type all it can do is output the string value. That doesn't work well for types such as hstore. If it could reliably recognize a field as an hstore it might well be able to do lots better. I think the fix there is that the type supplies a to JSON function as part of its type definition (just like input and output functions). to XML would also be useful. Most types wouldn't need them and default to string output. Yes ... but I really don't want to go down the path of treating those as new type properties; it doesn't scale. (And please don't tell me that JSON is the last word in container types and there will never be requests for any more of these.) Can we define these functions as being the cast-from-foo-to-json and cast-from-foo-to-xml functions? That would let us use the existing cast infrastructure to manage them. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On 9/25/12 6:13 AM, Heikki Linnakangas wrote: That brings us to another question: should the cast be part of the hstore extension, or json? (json is built-in, but imagine for a moment that it was an extension, too, so that there was a choice). IIRC someone started a discussion on that recently on pgsql-hackers, but I don't think there was any conclusion on that. That just depends on how you want to direct the dependency arrows between these things. Either you make it a third extension, or you put it into one or the other, for example, if you feel that json support is integral to the hstore functionality. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On 9/25/12 9:18 AM, Andrew Dunstan wrote: No, the difficulty (or at least the first difficulty) is in having the code recognize that it has an hstore at all. The code picks apart the record field by field at run time and takes various actions depending on the field's type. For any type it doesn't recognize it just outputs the value as a string, and so that's what it does with hstore. Mostly this is the right thing but in the hstore case it's rather sad. But if you have a cast defined from hstore to json, then it is unambiguous, because the pg_cast entry binds the types together by OID. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On 9/25/12 5:58 PM, Tom Lane wrote: Yes ... but I really don't want to go down the path of treating those as new type properties; it doesn't scale. (And please don't tell me that JSON is the last word in container types and there will never be requests for any more of these.) Yeah, I didn't like that part either, but we only add one every five years or so. Can we define these functions as being the cast-from-foo-to-json and cast-from-foo-to-xml functions? That would let us use the existing cast infrastructure to manage them. Sounds attractive, but there might be some problems in the details. For example, you can't cast scalar values to valid json values, because a valid json value can only be a dictionary or an array. If we had a flag of some kind saying "cast from foo to json, but only when part of a larger json serialization, not by itself", then it might work. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
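The "only as part of a larger serialization" flag Peter describes might look roughly like this. Python sketch under the strict RFC 4627 reading that a bare scalar is not a valid top-level JSON text (the function name and flag are illustrative, not an API proposal):

```python
import json

def to_json(value, top_level=True):
    """Per-type serialization routine guarded by a flag: scalars are
    rejected at the top level but allowed when nested in a larger
    serialization."""
    if top_level and not isinstance(value, (dict, list)):
        raise ValueError("scalar is not a valid top-level json value")
    return json.dumps(value)

assert to_json({"a": 1}) == '{"a": 1}'
assert to_json(1, top_level=False) == "1"  # fine inside a larger document
try:
    to_json(1)                             # rejected at top level
except ValueError:
    pass
```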
Re: [HACKERS] Switching timeline over streaming replication
On Tue, Sep 25, 2012 at 11:01 AM, m...@rpzdesign.com m...@rpzdesign.com wrote: Amit: At some point, every master-slave replicator gets to the point where it needs to start thinking about master-master replication. Even in a master-master system, the ability to cleanly swap leaders managing a member of the master-master cluster is very useful. This patch can make writing HA software for Postgres a lot less ridiculous. Instead of getting stuck in the weeds only to finally realize that master-master is the ONLY way to go, many developers do not start out planning for master-master, but they should, out of habit. You can save yourself a lot of grief just by starting with a master-master architecture. I've seen more projects get stuck spinning their wheels on the one master-master system to rule them all than succeed and move on. It doesn't help that master-master does not have a single definition, and different properties are possible with different logical models, too, so that pervades its way up to the language layer. As-is, managing single-master HA Postgres is a huge pain without this patch. If there is work to be done on master-master, the logical replication and event trigger work are probably more relevant, and I know the authors of those projects are keen to make it more feasible to experiment. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
Peter Eisentraut pete...@gmx.net writes: On 9/25/12 5:58 PM, Tom Lane wrote: Can we define these functions as being the cast-from-foo-to-json and cast-from-foo-to-xml functions? That would let us use the existing cast infrastructure to manage them. Sounds attractive, but there might be some problems in the details. For example, you can't cast scalar values to valid json values, because a valid json value can only be a dictionary or an array. If we had a flag of some kind saying cast from foo to json, but only when part of a larger json serialization, not by itself, then it might work. Actually, after reading another message you sent, I thought you were going to respond that your proposed transforms feature would cover it. If there's some reason that's not what to use, I guess we could add another optional argument to cast support functions; but that interface is already rather overloaded. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Switching timeline over streaming replication
On 09/25/12 11:01 AM, m...@rpzdesign.com wrote: At some point, every master-slave replicator gets to the point where they need to start thinking about master-master replication. master-master and transactional integrity are mutually exclusive, except perhaps in special cases like Oracle RAC, where the masters share a coherent cache and implement global locks. -- john r pierce, N 37, W 122, santa cruz ca, mid-left coast -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On 09/25/2012 12:13 PM, Heikki Linnakangas wrote: On 25.09.2012 12:19, Hitoshi Harada wrote: On Tue, Sep 25, 2012 at 1:06 AM, Simon Riggs si...@2ndquadrant.com wrote: On 24 September 2012 21:26, Andrew Dunstan and...@dunslane.net wrote: Well, an obvious case is how record_to_json handles fields. If it knows nothing about the type all it can do is output the string value. That doesn't work well for types such as hstore. If it could reliably recognize a field as an hstore it might well be able to do lots better. I'm not at all familiar with record_to_json or the json datatype, but wouldn't it be appropriate to create a cast from hstore to json to handle that case? That brings us to another question: should the cast be part of the hstore extension, or json? (json is built-in, but imagine for a moment that it was an extension, too, so that there was a choice). IIRC someone started a discussion on that recently on pgsql-hackers, but I don't think there was any conclusion on that. I would make the CAST part of both extensions, and try to do it so that if the other type does not exist, the cast will not be created. So the extension that comes later will be the one creating it. The most sensible option would be to make it into a 3rd extension which depends on both json and hstore being installed: CREATE EXTENSION hstore_to_json_cast; - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On 09/26/2012 12:06 AM, Peter Eisentraut wrote: On 9/25/12 5:58 PM, Tom Lane wrote: Yes ... but I really don't want to go down the path of treating those as new type properties; it doesn't scale. (And please don't tell me that JSON is the last word in container types and there will never be requests for any more of these.) Yeah, I didn't like that part either, but we only add one every five years or so. Can we define these functions as being the cast-from-foo-to-json and cast-from-foo-to-xml functions? That would let us use the existing cast infrastructure to manage them. Sounds attractive, but there might be some problems in the details. For example, you can't cast scalar values to valid json values, because a valid json value can only be a dictionary or an array. Nope. A json _value_ can be anything you can put in these dicts or arrays. That was just the definition of something called a "json generator", which was defined to return an array or object/dict; the json type can already hold all the scalar values, and the input conversion functions happily accept these and generate corresponding json values. It was all hashed through when I woke up on this too late in the 9.2 dev cycle and proposed all the CAST and to_json functions, and also a single to_json for any json generation instead of array_to_json and row_to_json. So currently we both do and don't have this "json is an array or dict" functionality: we can't convert anything else directly to json using a to_json function, but we can do it through input/output functions, and thus we can output scalar-valued json values:

jt=# select '1'::json, '1'::json;
 json | json
------+------
 1    | 1
(1 row)

IIRC it was something about the to_json functions being generators and type I/O functions not being that :P If we had a flag of some kind saying cast from foo to json, but only when part of a larger json serialization, not by itself, then it might work.
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
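Hannu's observation — that scalar json values are perfectly acceptable once you get past the "generator" functions — holds for most JSON implementations outside PostgreSQL as well. Python's json module, for instance, accepts scalars in both directions:

```python
import json

# A bare scalar parses as a json value...
assert json.loads('1') == 1
# ...and serializes back to one.
assert json.dumps(1) == '1'
# A quoted scalar is a json string, distinct from the number.
assert json.loads('"1"') == '1'
```

The RFC 4627 restriction that a top-level JSON *text* must be an object or array is separate from what counts as a json *value*, which is exactly the distinction being argued in this subthread.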
Re: [HACKERS] Switching timeline over streaming replication
John: Who has the money for Oracle RAC, or for funding arrogant bastard Oracle CEO Ellison to purchase another island? Postgres needs CHEAP, easy to set up, self-healing master-master-master-master, and it needs it yesterday. I was able to patch the 9.2.0 code base in 1 day and change my entire architecture strategy for replication into self-healing async master-master-master and the tiniest bit of sharding code imaginable. That is why I suggest something to replace OIDs with ROIDs for a replication ID. (CREATE TABLE with ROIDS) I implement ROIDs as a uniform design pattern for the table structures. Synchronous replication may be OK between 2 local machines if absolutely no local hardware failure is acceptable, but cheap, scalable, synchronous, TRANSACTIONAL master-master-master-master is a real tough slog. I could implement global locks in the external replication layer if I chose, but there are much easier ways, routing requests through the load balancer and request sharding, than trying to manage global locks across the WAN. Good luck with your HA patch for Postgres. Thanks for all of the responses! You guys are 15 times more active than the MySQL developer group, likely because they do not have a single db engine that meets all the requirements like PG. marco On 9/25/2012 5:10 PM, John R Pierce wrote: On 09/25/12 11:01 AM, m...@rpzdesign.com wrote: At some point, every master-slave replicator gets to the point where they need to start thinking about master-master replication. master-master and transactional integrity are mutually exclusive, except perhaps in special cases like Oracle RAC, where the masters share a coherent cache and implement global locks. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] htup header reorganization breaks many extension modules
I haven't followed the details of the htup header reorganization, but I have noticed that a number of external extension modules will be broken because of the move of GETSTRUCT() and to a lesser extent heap_getattr(). Of course some #ifdefs can fix that, but it seems annoying to make everyone do that. Maybe this could be reconsidered to reduce the impact on other projects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On Tue, 2012-09-25 at 18:22 -0400, Tom Lane wrote: Actually, after reading another message you sent, I thought you were going to respond that your proposed transforms feature would cover it. I had thought about this some time ago, but it's clearer to think of casts as associating two types, versus transforms associating a type and a language. JSON and XML tend to be types. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On Mon, 2012-09-24 at 21:18 -0700, Daniel Farina wrote: The gap between pre-JSON-in-the-standard-library in Python, Ruby, et al and post-JSON-in-stdlib was much smaller. Except in Python they renamed the thing. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_reorg in core?
On Wed, Sep 26, 2012 at 4:42 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Simon Riggs si...@2ndquadrant.com writes: For me, the Postgres user interface should include * REINDEX CONCURRENTLY * CLUSTER CONCURRENTLY * ALTER TABLE CONCURRENTLY and also that autovacuum would be expanded to include REINDEX and CLUSTER, renaming it to automaint. FWIW, +1 to all those user requirements, and for not having pg_reorg simply moved as-is nearer to core. I would paint the shed autoheal, maybe. Yes, completely agreed. Based on what Simon is suggesting, the REINDEX and CLUSTER extensions are prerequisites for the autovacuum extension. It would need to use a mechanism that is slightly different from pg_reorg's. ALTER TABLE could use something close to pg_reorg by creating a new table and then swapping the two tables. The cases of column drop and addition particularly need some thought. I would like to work on such features and provide patches for the first two. This will of course strongly depend on the time I can spend on it in the next couple of months. -- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
On Fri, Sep 21, 2012 at 10:41 AM, Andres Freund and...@2ndquadrant.com wrote: Hrm. I retract my earlier statement about the low likelihood of corruption due to this. Yeah. :-( We've recently had at least one report of autovacuum failing to terminate due to a series of index pages forming a circular loop, and at least one case where it appears that the data became not-unique on a column upon which a unique index existed, in releases that contain this bug. It seems therefore that REINDEX + VACUUM with vacuum_freeze_table_age=0 is not quite sufficient to recover from this problem. If your index has come to contain a circularity, vacuum will fail to terminate, and you'll need to drop it completely to recover. And if you were relying on your index to enforce a unique constraint and it didn't, you'll need to do manual data repair before it will be possible to rebuild or replace that index. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On 09/25/2012 08:35 PM, Peter Eisentraut wrote: On Tue, 2012-09-25 at 18:22 -0400, Tom Lane wrote: Actually, after reading another message you sent, I thought you were going to respond that your proposed transforms feature would cover it. I had thought about this some time ago, but it's clearer to think of casts as associating two types, versus transforms associating a type and a language. JSON and XML tend to be types. OK, I think this solves the object_to_json problem after all - we'll look for a cast to json, and if it's not there use the string form of the object. Nice. That still leaves the other uses for having well-known Oids (or possibly UUIDs) for non-builtin types (e.g. so drivers don't have to look them up in the catalogs, or to avoid issues when types are added to the core.) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Oid registry
On Tue, Sep 25, 2012 at 10:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: Given your previous comments, perhaps we could just start handing out Oids (if there is any demand) numbered, say, 9000 and up. That should keep us well clear of any existing use. No, I think you missed my point entirely: handing out OIDs at the top of the manual assignment range is approximately the worst possible scenario. I foresee having to someday move FirstBootstrapObjectId down to 9000, or 8000, or even less, to cope with growth of the auto-assigned OID set. So we need to keep manually assigned OIDs reasonably compact near the bottom of the range, and it doesn't matter at all whether such OIDs are used internally or reserved for external developers. Nor do I see a need for such reserved OIDs to look different from internally-used OIDs. Reserved is reserved. I'm not sure how much anyone cares, but just in case anyone does... it would be mighty useful to EnterpriseDB to have a range of OIDs that are guaranteed not to be assigned to anyone else, because we're maintaining a fork of PostgreSQL that regularly merges with the mainline. We're actually likely to get crunched in our fork well before PostgreSQL itself does. There are enough other forks of PostgreSQL out there that there may be other people in a similar situation, though I am not sure how much we want to cater to people doing such things. That having been said, I can't really see how it would be practical anyway unless we raise the 16384 lower limit for user-assigned OIDs considerably. And I'm not sure how to do that without breaking pg_upgrade.  I am somewhat opposed to the idea of an OID registry. I can't see why the space of things people want to reserve OIDs for would be only tens rather than thousands. There are surely plenty of extensions that would like to depend on specific other extensions, or on core.
If we legislate that hard-coded OIDs are the way to do that, I think we're going to end up with a lot of 'em. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Review for pg_dump: Sort overloaded functions in deterministic order
Patch looks good; all concerns that had been raised previously have been addressed in this version of the patch. The only thing that IMO needs to change is a stylistic issue:

if (fout->remoteVersion >= 80200)
{
    [...]
    (fout->remoteVersion >= 80400) ? pg_catalog.pg_get_function_identity_arguments(oid) : NULL::text,
    [...]
}

Please just create a whole new if (fout->remoteVersion >= 80400) { [...] } here. Other than that, the feature works as advertised and does not negatively affect runtime or memory consumption (the new field is only added to functions / aggregates). Please send a new version of the patch that changes the above-mentioned item; the patch also needs rebasing (off by a couple of lines).
Re: [HACKERS] _FORTIFY_SOURCE by default?
On Mon, 2012-09-17 at 23:10 -0400, Tom Lane wrote: I had suggested to put it into src/include/port/linux.h OK, that's probably reasonable. It turns out that doesn't work because port.h is included after all the interesting libc header files (stdio.h etc.). I think the best place to put it is going to be src/template/linux. That would mirror the existing practice of putting it on the compiler command line.
Re: [HACKERS] htup header reorganization breaks many extension modules
On Tue, Sep 25, 2012 at 5:30 PM, Peter Eisentraut pete...@gmx.net wrote: I haven't followed the details of the htup header reorganization, but I have noticed that a number of external extension modules will be broken because of the move of GETSTRUCT() and to a lesser extent heap_getattr(). Of course some #ifdefs can fix that, but it seems annoying to make everyone do that. Maybe this could be reconsidered to reduce the impact on other projects. But isn't it only adding #include "access/htup_details.h"? I'd not argue it's harmful unless I missed your point. Thanks, -- Hitoshi Harada
[HACKERS] system_information.triggers truncate triggers
Hello hackers: I need a list of all triggers created in my database, but the view system_information.triggers does not show truncate triggers, though it does show insert, update and delete triggers. The same problem is found in versions 9.1 and 9.2. Regards. 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSITY OF INFORMATICS SCIENCES... CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION http://www.uci.cu http://www.facebook.com/universidad.uci http://www.flickr.com/photos/universidad_uci
Re: [HACKERS] system_information.triggers truncate triggers
On Wed, Sep 26, 2012 at 12:17 AM, Daymel Bonne Solís dbo...@uci.cu wrote: Hello hackers: I need a list of all triggers created in my database, but the view system_information.triggers does not show truncate triggers, though it does show insert, update and delete triggers. The same problem is found in versions 9.1 and 9.2. The definition of information_schema.triggers contains this:

FROM pg_namespace n, pg_class c, pg_trigger t,
     -- hard-wired refs to TRIGGER_TYPE_INSERT, TRIGGER_TYPE_DELETE,
     -- TRIGGER_TYPE_UPDATE; we intentionally omit TRIGGER_TYPE_TRUNCATE
     (VALUES (4, 'INSERT'), (8, 'DELETE'), (16, 'UPDATE')) AS em (num, text)

so it seems that we are not showing TRUNCATE triggers intentionally, but that comment fails to explain why. -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: 24x7 support and training
Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
I'm afraid I'm exactly in this situation now. The last entry from the 9.1.6-recommended VACUUM (FREEZE, VERBOSE, ANALYZE) was:

INFO: meta_version_chunks: found 55363 removable, 32566245 nonremovable row versions in 450292 out of 450292 pages
DETAIL: 0 dead row versions cannot be removed yet. There were 588315 unused item pointers. 0 pages are entirely empty. CPU 2.44s/5.77u sec elapsed 2150.18 sec.
INFO: vacuuming pg_toast.pg_toast_16582

And here are the locks held by the VACUUM backend:

select oid, relname, relkind, relpages, reltuples::numeric(15,0), reltoastrelid, reltoastidxid
from pg_class
where oid in (select relation from pg_locks where pid = 1380);

  oid  |       relname        | relkind | relpages | reltuples | reltoastrelid | reltoastidxid
-------+----------------------+---------+----------+-----------+---------------+---------------
 16585 | pg_toast_16582       | t       | 16460004 |  58161600 |             0 |         16587
 16587 | pg_toast_16582_index | i       |   188469 |  58161600 |             0 |             0
 16582 | meta_version_chunks  | r       |   450292 |  32566200 |         16585 |             0

I will not touch anything and would like to get some recommendations on how to proceed.

2012/9/26 Robert Haas robertmh...@gmail.com: It seems therefore that REINDEX + VACUUM with vacuum_freeze_table_age=0 is not quite sufficient to recover from this problem. If your index has come to contain a circularity, vacuum will fail to terminate, and you'll need to drop it completely to recover. And if you were relying on your index to enforce a unique constraint and it didn't, you'll need to do manual data repair before it will be possible to rebuild or replace that index. [...]

-- Victor Y. Yegorov