Re: [HACKERS] New Event Trigger: table_rewrite
On 16 November 2014 06:59, Michael Paquier michael.paqu...@gmail.com wrote:
> Note that this patch has been submitted but there have been no real
> discussion around it. This seems a bit too fast to commit it, no?

Committing uncontentious patches at the end of the commitfest seems normal, no?

--
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New Event Trigger: table_rewrite
On 16 November 2014 06:59, Michael Paquier michael.paqu...@gmail.com wrote:

> 1) This patch is authorizing VACUUM and CLUSTER to use the event triggers
> ddl_command_start and ddl_command_end, but aren't those commands actually
> not DDLs but control commands?

I could go either way on that. I'm happy to remove those from this commit.

> 2) The documentation of src/sgml/event-trigger.sgml can be improved;
> particularly I think that the example function should use a maximum of
> upper-case letters for reserved keywords, and also this bit: "you're not
> allowed to rewrite the table foo" should be rewritten to something like:
> "Rewrite of table foo not allowed"
> 3) A typo, missing a plural: "provides two built-in event trigger helper functionS"

I thought the documentation was very good, in comparison to most other feature submissions. Given that this is one of the areas I moan about a lot, that says something.

> 4) pg_event_trigger_table_rewrite_oid is able to return only one OID, which
> is the one of the table being rewritten, and it is limited to one OID
> because VACUUM, CLUSTER and ALTER TABLE can only run on one object at the
> same time in a single transaction. What about thinking that we may have in
> the future multiple objects rewritten in a single transaction, hence
> multiple OIDs could be fetched?

Why would this API support something which the normal trigger API doesn't, just in case we support a feature that hadn't ever been proposed or discussed? Why can't such a change wait until that feature arrives?

> 5) parsetree is passed to cluster_rel only for EventTriggerTableRewrite. I
> am not sure if there are any extensions using cluster_rel as is, but
> wouldn't it make more sense to call EventTriggerTableRewrite before the
> calls to cluster_rel instead? ISTM that this patch is breaking cluster_rel's
> way of doing things.

I will remove the call to CLUSTER and VACUUM as proposed above.

> 6) in_table_rewrite seems unnecessary.
>
> typedef struct EventTriggerQueryState
> {
>     slist_head  SQLDropList;
>     bool        in_sql_drop;
> +   bool        in_table_rewrite;
> +   Oid         tableOid;
>
> We could simplify that by renaming tableOid to rewriteTableOid or
> rewriteObjOid and check if its value is InvalidOid to determine if the event
> table_rewrite is in use or not. Each code path setting those variables sets
> them all the time similarly:
> +   state->in_table_rewrite = false;
> +   state->tableOid = InvalidOid;
> And if tableOid is InvalidOid, in_table_rewrite is false. If it is a valid
> Oid, in_table_rewrite is set to true.

Well, that seems a minor change. I'm happy to accept the original coding, but also happy to receive suggested changes.

> 7) table_rewrite is kicked in ALTER TABLE only when ATRewriteTables is used.
> The list of commands that actually go through this code path should be
> clarified in the documentation IMO to help the user apprehend this function.

That is somewhat orthogonal to the patch. The rules for rewriting are quite complex, which is why this is needed and why documentation isn't really the answer. A separate doc patch on that would be welcome.

--
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Review of Refactoring code for sync node detection
On 31 October 2014 00:27, Michael Paquier michael.paqu...@gmail.com wrote:
> On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby jim.na...@bluetreble.com wrote:
>> If we stick with this version I'd argue it makes more sense to just stick
>> the sync_node = and priority = statements into the if block and ditch the
>> continue. /nitpick
> Let's go with the cleaner version then, I'd prefer code that can be read
> easily for this code path. Switching back is not much complicated either.

I like where you are headed with this feature set. I will do my best to comment and review so we get something in 9.5.

Some review comments:

* The new function calls them a "Node" rather than a "Standby". That is a good change, but it needs to be applied consistently, so we no longer use the word Standby anywhere near the sync rep code. I'd prefer if we continue to use the abbreviation "sync" for synchronous, since it's less typing and avoids spelling mistakes in code and comments.

* Rationale... Robert Haas wrote:
> Interestingly, the syncrep code has in some of its code paths the idea that
> a synchronous node is unique, while other code paths assume that there can
> be multiple synchronous nodes. If that's fine I think that it would be
> better to just make the code multiple-sync-node aware, by having a single
> function call in walsender.c and syncrep.c that returns an integer array of
> WAL sender positions (WalSndCtl), as that seems more extensible long-term.
I thought this was the rationale for the patch, but it doesn't do this. If you disagree with Robert, it would be best to say so now, since I would guess he's expecting the patch to be doing as he asked. Or perhaps I misunderstand.

* Multiple sync standbys were originally avoided because of the code cost and complexity at ReleaseWaiters(). I'm very keen to keep the code at that point very robust, so simplicity is essential. It would also be useful to have some kind of micro-benchmark that allows us to understand the overhead of various configs. Jim mentions he isn't sure of the performance there. I don't see too much of a problem with this patch, but the later patches will require this, so we may as well do this soon.

* We probably don't need a comment like /* Cleanup */ inserted above a pfree. ;-)

I see this should get committed in the next few days, even outside the CF.

--
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Review of Refactoring code for sync node detection
Thanks for the comments!

On Sun, Nov 16, 2014 at 8:32 PM, Simon Riggs si...@2ndquadrant.com wrote:

> * The new function calls them a "Node" rather than a "Standby". That is a
> good change, but it needs to be applied consistently, so we no longer use
> the word Standby anywhere near the sync rep code. I'd prefer if we continue
> to use the abbreviation "sync" for synchronous, since it's less typing and
> avoids spelling mistakes in code and comments.

Yes, I guess this makes sense.

> * Rationale... Robert Haas wrote:
>> Interestingly, the syncrep code has in some of its code paths the idea
>> that a synchronous node is unique, while other code paths assume that
>> there can be multiple synchronous nodes. If that's fine I think that it
>> would be better to just make the code multiple-sync-node aware, by having
>> a single function call in walsender.c and syncrep.c that returns an
>> integer array of WAL sender positions (WalSndCtl), as that seems more
>> extensible long-term.
> I thought this was the rationale for the patch, but it doesn't do this. If
> you disagree with Robert, it would be best to say so now, since I would
> guess he's expecting the patch to be doing as he asked. Or perhaps I
> misunderstand.

This comment is originally from me :)
http://www.postgresql.org/message-id/CAB7nPqTtmSo0YX1_3_PykpKbdGUi3UeFd0_=2-6vrqo5n_t...@mail.gmail.com
Changing my older opinion, I wrote the patch on this thread with in mind the idea of keeping the code as simple as possible, so for now it is better to still think that we have a single sync node. Let's work on the multiple-node thing once we have a better spec of how to do it, visibly using a dedicated micro-language within s_s_names.

> * Multiple sync standbys were originally avoided because of the code cost
> and complexity at ReleaseWaiters(). I'm very keen to keep the code at that
> point very robust, so simplicity is essential. It would also be useful to
> have some kind of micro-benchmark that allows us to understand the overhead
> of various configs. Jim mentions he isn't sure of the performance there. I
> don't see too much of a problem with this patch, but the later patches will
> require this, so we may as well do this soon.

Yes, I have been playing with that with the patch series of the previous commit fest, with something not that much kludgy.

> * We probably don't need a comment like /* Cleanup */ inserted above a
> pfree. ;-)

Sure. I'll send an updated patch tomorrow.

Regards,
--
Michael
Re: [HACKERS] Failback to old master
Hi,

On Sat, Nov 15, 2014 at 5:31 PM, Maeldron T. maeld...@gmail.com wrote:
>> A safely shut down master (-m fast is safe) can be safely restarted as a
>> slave to the newly promoted master. Fast shutdown shuts down all normal
>> connections, does a shutdown checkpoint and then waits for this checkpoint
>> to be replicated to all active streaming clients. Promoting a slave to
>> master creates a timeline switch, which prior to version 9.3 was only
>> possible to replicate using the archive mechanism. As of version 9.3 you
>> don't need to configure archiving to follow timeline switches; just add a
>> recovery.conf to the old master to start it up as a slave and it will
>> fetch everything it needs from the new master.
>
> I took your advice and I understood that removing the recovery.conf
> followed by a restart is wrong. I will not do that on my production servers.
>
> However, I can't make it work with promotion. What did I do wrong? It was
> 9.4beta3.
>
> mkdir 1
> mkdir 2
> initdb -D 1/
> edit config: change port, wal_level to hot_standby, hot_standby to on,
>   max_wal_senders = 7, wal_keep_segments = 100, uncomment replication in hba.conf
> pg_ctl -D 1/ start
> createdb -p 5433
> psql -p 5433
> pg_basebackup -p 5433 -R -D 2/
> mcedit 2/postgresql.conf  (change port)
> chmod -R 700 1
> chmod -R 700 2
> pg_ctl -D 2/ start
> psql -p 5433
> psql -p 5434
> everything works
> pg_ctl -D 1/ stop
> pg_ctl -D 2/ promote
> psql -p 5434
> cp 2/recovery.done 1/recovery.conf
> mcedit 1/recovery.conf  (change port)
> pg_ctl -D 1/ start
>
> LOG:  replication terminated by primary server
> DETAIL:  End of WAL reached on timeline 1 at 0/3000AE0.
> LOG:  restarted WAL streaming at 0/300 on timeline 1
> LOG:  replication terminated by primary server
> DETAIL:  End of WAL reached on timeline 1 at 0/3000AE0.
>
> This is what I experienced in the past when I tried with promote. The old
> master disconnects from the new. What am I missing?

I think you have to add recovery_target_timeline = '2' in recovery.conf, with '2' being the new primary timeline.
cf http://www.postgresql.org/docs/9.4/static/recovery-target-settings.html

Didier
Re: [HACKERS] alternative model for handling locking in parallel groups
On Fri, Nov 14, 2014 at 12:03 PM, Andres Freund and...@2ndquadrant.com wrote:
> Note that you'd definitely not want to do this naively - currently there's
> baked-in assumptions in the vacuum code that only one backend is doing
> parts of it. I think there's

Did something you intended get left out here?

>> 4. Parallel query on a locked relation. Parallel query should work on a
>> table created in the current transaction, or one explicitly locked by user
>> action. It's not acceptable for that to just randomly deadlock, and
>> skipping parallelism altogether, while it'd probably be acceptable for a
>> first version, is not going to be a good long-term solution.
> FWIW, I think it's perfectly acceptable to refuse to work in parallel in
> that scenario. And not just for now.

I don't agree with that, but my point is that I think that fixing it so it works is probably no more work than detecting that it isn't going to work, whether the specific proposal in this email pans out or not. The biggest argument I can see to that is parallel index creation.

>> It also sounds buggy and fragile for the query planner to try to guess
>> whether the lock requests in the parallel workers will succeed or fail
>> when issued.
> I don't know. Checking whether we hold a self-exclusive lock on that
> relation doesn't sound very problematic to me.

That seems like a gross oversimplification of what we need to check. For example, suppose we want to do a parallel scan. We grab AccessShareLock. Now, another backend waits for AccessExclusiveLock. We start workers, who all try to get AccessShareLock. They wait behind the AccessExclusiveLock, while, outside the view of the current lock manager, we wait for them. No process in our group acquired more than AccessShareLock, yet we've got problems. We need a simple and elegant way to avoid this kind of situation.

After thinking about these cases for a bit, I came up with a new possible approach to this problem. Suppose that, at the beginning of parallelism, when we decide to start up workers, we grant all of the locks already held by the master to each worker (ignoring the normal rules for lock conflicts). Thereafter, we do everything the same as now, with no changes to the deadlock detector. That allows the lock conflicts to happen normally in the first two cases above, while preventing the unwanted lock conflicts in the second two cases.

> I don't think that's safe enough. There's e.g. a reason why ANALYZE
> requires SUE lock. It'd definitely not be safe to simply grant the worker a
> SUE lock, just because the master backend already analyzed it or something
> like that. You could end up with the master and worker backends ANALYZEing
> the same relation.

Well, in the first version of this, I expect to prohibit parallel workers from doing any DML or DDL whatsoever - they will be strictly read-only. So you won't have that problem. Now, eventually, we might relax that, but I would expect that a prohibition on the workers starting new utility commands while in parallel mode wouldn't be very high on anyone's list of restrictions to relax.

> That said, I can definitely see use for infrastructure where we explicitly
> can grant another backend a lock that'd conflict with one we're already
> holding. But I think it'll need to be more explicit than just granting all
> currently held locks at the highest current level. And I think it's not
> necessarily going to be granting them all the locks at their current
> levels. What I'm thinking of is basically adding a step to
> execMain.c:InitPlan() that goes through the locks and grants the child
> process all the locks that are required for the statement to run. But not
> possibly preexisting higher locks.

This doesn't actually solve the problem, because we can be incidentally holding locks on other relations - system catalogs, in particular - that will cause the child processes to fail.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Failback to old master
On 16/11/14 13:13, didier wrote:
> I think you have to add recovery_target_timeline = '2' in recovery.conf,
> with '2' being the new primary timeline.
> cf http://www.postgresql.org/docs/9.4/static/recovery-target-settings.html

Thank you. Based on the link I have added:

recovery_target_timeline = 'latest'

And now it works.
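For reference, a minimal recovery.conf for this kind of failback on 9.4 might look like the following. The connection values are examples only; only the last line is the fix confirmed in this thread:

```
# recovery.conf on the old master, started as a standby of the new primary
standby_mode = 'on'
primary_conninfo = 'host=localhost port=5434 user=replication'

# Follow whatever timeline the new primary is on, instead of pinning a
# specific timeline id such as '2'. This is what makes the old master
# cross the timeline switch created by the promotion.
recovery_target_timeline = 'latest'
```

'latest' is generally more robust than a hard-coded timeline id, since it keeps working across subsequent promotions as well.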
Re: [HACKERS] Index scan optimization
On 30 October 2014 08:43, Haribabu Kommi kommi.harib...@gmail.com wrote:
> I marked the patch as ready for committer.

This looks very interesting. The value of the patch seems to come from skipping costly checks. Your performance test has a leading VARCHAR column, so in that specific case skipping the leading column is a big win. I don't see that it would be in all cases. Can we see a few tests on similar things to check for other opportunities and regressions?

* Single column bigint index
* Multi-column bigint index
* 5 column index with mixed text and integers

The explanatory comments need some work to more clearly explain what this patch does.

--
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Order of views in stats docs
On Sun, Nov 9, 2014 at 3:46 PM, Magnus Hagander mag...@hagander.net wrote:
> On Thu, Nov 6, 2014 at 3:01 PM, Peter Eisentraut pete...@gmx.net wrote:
>> On 11/6/14 6:16 AM, Magnus Hagander wrote:
>>> Another thought I had in that case is maybe we need to break out the
>>> pg_stat_activity and pg_stat_replication views into their own table. They
>>> are really the only two views that are different in a lot of ways. Maybe
>>> call the splits "session statistics views" and "object statistics views",
>>> instead of just "standard statistics views"? The difference being that
>>> they show snapshots of *right now*, whereas all the other views show
>>> accumulated statistics over time.
>> Yeah, splitting this up a bit would help, too.
> Here's an initial run of this. It still feels wrong that we have the
> dynamic views under the statistics collector. Perhaps longterm we should
> look at actually splitting them out to a completely different sect1? I only
> fixed the obvious parts in this - the split out, and the moving of
> pg_stat_database_conflicts. But it's a first step at least. Objections?

Hearing no objections, I've pushed this. There's more to do, but it's a start.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
* Alvaro Herrera wrote:
> Michael Paquier wrote:
>> Btw, perhaps this diff should be pushed as a different patch as this is a
>> rather different thing:
>> -   if (heapRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
>> +   if (indexRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
>>         !smgrexists(indexRelation->rd_smgr, INIT_FORKNUM)
> As this is a correctness fix, it does not seem necessary to back-patch it:
> the parent relation always has the same persistence as its indexes. There
> was an argument for doing it this way that only applies if this patch went
> in, but I can't remember now what it was. Anyway I pushed the patch after
> some slight additional changes. Thanks!

The buildfarm says -DCLOBBER_CACHE_ALWAYS does not like this patch.

--
Christian
Re: [HACKERS] logical decoding - reading a user catalog table
On 11/13/2014 02:44 PM, Andres Freund wrote:
> Hi Steve,
> If it still happens, could you send me instructions for how to reproduce
> the problem after cloning the necessary source repositories? It's quite
> hard to validate a possible fix otherwise.
> Greetings, Andres Freund

1. Install PG 9.4

2. Perform an initdb, then set in postgresql.conf:
   max_connections = 200
   wal_level = logical
   max_wal_senders = 50
   wal_keep_segments = 100
   wal_sender_timeout = 600s
   max_replication_slots = 120
   Create a user slon with superuser, and a user test (set passwords for
   them; you'll use them later). Configure pg_hba.conf to allow slon to
   connect as a replication user.

3. Download slony from github (https://github.com/ssinger/slony1-engine.git)
   and checkout the branch logical_remote_helper:
   ./configure --with-pgconfigdir=/path_to_your_pgcbindir
   make
   make install

4. Download and compile clustertest from
   https://github.com/clustertest/clustertest-framework

5. In the slony source tree, cd to the clustertest directory.

6. cp conf/disorder.properties.sample conf/disorder.properties
   Edit the file to have the correct paths, ports, users, passwords.

7. cp conf/java.properties.sample conf/java.properties
   Edit the file to point at the JAR you downloaded earlier.

I run with all 5 databases on the same PG cluster. Performance will be much better with 5 different clusters, but I recommend trying to emulate my configuration as much as possible to replicate this.

To run the tests then do:
   ./run_all_disorder_tests.sh

At the moment (commit df5acfd1a3) it is configured to just run the Failover node test cases where I am seeing this, not the entire suite. It typically takes between 30 minutes and an hour to run, though.

I installed things following the above steps on a different system than my usual development laptop and I have been unable to reproduce the error so far (on that system). But I am still able to reproduce it on occasion on my normal development laptop.
Re: [HACKERS] printing table in asciidoc with psql
Hi

2014-11-07 22:37 GMT+01:00 Alvaro Herrera alvhe...@2ndquadrant.com:
> I did \o /tmp/tst, then \dS; create table "eh | oh" (); \dS; and then
> filtered the output file to HTML. The CREATE TABLE tag ended up in the same
> line as the title of the following table. I think there's a newline missing
> somewhere. The good news is that the | in the table name was processed
> correctly; but I noticed that the table title is not using the escaped
> print, so I would imagine that if I put a | in the title, things would go
> wrong. (I don't know how to put titles on tables other than editing the
> hardcoded titles for \ commands in psql.)
>
> Another thing is that spaces around the | seem gratuitous and produce bad
> results. I tried select * from pg_class which results in a very wide table,
> and then the HTML output contains some cells with the values in the second
> line; this makes all rows taller than they must be, because some cells use
> the first line and other cells in the same row use the second line for the
> text. I hand-edited the asciidoc and removed the spaces around | which
> makes the result nicer. (Maybe removing the trailing space is enough.)

I see trailing spaces, but I don't see the described effect. Please, can you send some more specific test case?

I fixed the status view and removed trailing spaces.

Regards

Pavel

> --
> Álvaro Herrera   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

commit 76e033f3a287115f15f249c657b544c148e2efd2
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Sun Nov 16 23:38:46 2014 +0100

    every table is in own paragraph

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index e7fcc73..cd64b88 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2092,8 +2092,8 @@ lo_import 152801
           <literal>aligned</literal>, <literal>wrapped</literal>,
           <literal>html</literal>, <literal>latex</literal> (uses <literal>tabular</literal>),
-          <literal>latex-longtable</literal>, or
-          <literal>troff-ms</literal>.
+          <literal>latex-longtable</literal>,
+          <literal>troff-ms</literal>, or <literal>asciidoc</literal>.
           Unique abbreviations are allowed.  (That would mean one letter
           is enough.)
           </para>
@@ -2120,7 +2120,8 @@ lo_import 152801
           <para>
           The <literal>html</>, <literal>latex</>,
-          <literal>latex-longtable</literal>, and <literal>troff-ms</>
+          <literal>latex-longtable</literal>, <literal>troff-ms</>,
+          and <literal>asciidoc</>
           formats put out tables that are intended to be included in documents
           using the respective mark-up language. They are not complete documents!
           This might not be
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 26089352..e00e47b 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -2248,6 +2248,9 @@ _align2string(enum printFormat in)
 		case PRINT_TROFF_MS:
 			return "troff-ms";
 			break;
+		case PRINT_ASCIIDOC:
+			return "asciidoc";
+			break;
 	}
 	return "unknown";
 }
@@ -2321,9 +2324,11 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 		popt->topt.format = PRINT_LATEX_LONGTABLE;
 	else if (pg_strncasecmp("troff-ms", value, vallen) == 0)
 		popt->topt.format = PRINT_TROFF_MS;
+	else if (pg_strncasecmp("asciidoc", value, vallen) == 0)
+		popt->topt.format = PRINT_ASCIIDOC;
 	else
 	{
-		psql_error("\\pset: allowed formats are unaligned, aligned, wrapped, html, latex, troff-ms\n");
+		psql_error("\\pset: allowed formats are unaligned, aligned, wrapped, html, latex, troff-ms, asciidoc\n");
 		return false;
 	}
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 66d80b5..ea8b9c1 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -858,6 +858,10 @@ PrintQueryStatus(PGresult *results)
 		html_escaped_print(PQcmdStatus(results), pset.queryFout);
 		fputs("</p>\n", pset.queryFout);
 	}
+	else if (pset.popt.topt.format == PRINT_ASCIIDOC)
+	{
+		fprintf(pset.queryFout, "\n%s\n", PQcmdStatus(results));
+	}
 	else
 		fprintf(pset.queryFout, "%s\n", PQcmdStatus(results));
 }
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index ae5fe88..b14b313 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -351,7 +351,7 @@ helpVariables(unsigned short int pager)
 	fprintf(output, _("  expanded (or x)    toggle expanded output\n"));
 	fprintf(output, _("  fieldsep           field separator for unaligned output (default '|')\n"));
 	fprintf(output, _("  fieldsep_zero      set field separator in unaligned mode to zero\n"));
-	fprintf(output, _("  format             set output format [unaligned, aligned, wrapped, html, latex, ..]\n"));
+	fprintf(output, _("  format             set output format [unaligned, aligned, wrapped, html, latex, asciidoc, ..]\n"));
 	fprintf(output, _("  footer             enable or disable display of the table footer [on, off]\n"));
Re: [HACKERS] PostgreSQL doesn't stop propley when --slot option is specified with pg_receivexlog.
On Sat, Nov 15, 2014 at 9:10 PM, Michael Paquier michael.paqu...@gmail.com wrote:
> On Sat, Nov 15, 2014 at 3:42 AM, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-11-15 03:25:16 +0900, Fujii Masao wrote:
> On Fri, Nov 14, 2014 at 7:22 PM, furu...@pm.nttdata.co.jp wrote:
> pg_ctl stop doesn't work properly if the --slot option is specified; WAL
> is flushed only when it has switched. These processes still continue even
> after the postmaster failed: pg_receivexlog, walsender and logger.
> I could reproduce this problem. At normal shutdown, walsender keeps
> waiting for the last WAL record to be replicated and flushed in
> pg_receivexlog. But pg_receivexlog issues a sync command only when the WAL
> file is switched. Thus, since pg_receivexlog may never flush the last WAL
> record, walsender may have to keep waiting infinitely.
> Right. It is surprising that nobody complained about that before,
> pg_receivexlog has been released two years ago.
> It's not so surprising because the problem can happen only when a
> replication slot is specified, i.e., the version is 9.4 or later.
> pg_recvlogical handles this problem by calling fsync() when it receives
> the request of immediate reply from the server. That is, at shutdown,
> walsender sends the request, pg_receivexlog receives it, flushes the last
> WAL record, and sends the flush location back to the server. Since
> walsender can see that the last WAL record is successfully flushed in
> pg_receivexlog, it can exit cleanly. One idea for the problem is to
> introduce the same logic as pg_recvlogical has, to pg_receivexlog. Thoughts?
> Sounds sane to me. Are you looking into doing that?
> Yep, sounds a good thing to do if the master requested an answer from the
> client in the keepalive message. Something like the patch attached would
> make the deal.

Isn't it better to do this only when a replication slot is used?

Regards,
--
Fujii Masao
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 13 November 2014 15:35, Amit Kapila wrote:
> As you mentioned offlist that you are not able to reproduce this issue, I
> have tried again and what I observe is that I am able to reproduce it only
> on *release* builds, and some cases work without this issue as well, for
> example:
> ./vacuumdb --analyze-in-stages -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -j 8 -d postgres
> Generating minimal optimizer statistics (1 target)
> Generating medium optimizer statistics (10 targets)
> Generating default (full) optimizer statistics
> So to me, it looks like this is a timing issue; please notice why in the
> error the statement looks like "ANALYZE ng minimal optimizer statistics
> (1 target)". I think this is not a valid statement. Let me know if you
> still could not reproduce it.

Thank you for looking into it once again. I have tried with the release mode, but could not reproduce the same. Looking at the server LOG you sent, "ANALYZE ng minimal optimizer statistics (1 target)" seems like some corruption. So actually it looks like two issues here:
1. The query string sent to the server is getting corrupted.
2. The client is crashing.
I will review the code and try to find the cause; meanwhile, if you can find some time to debug this, it will be really helpful.

Regards,
Dilip
Re: [HACKERS] PostgreSQL doesn't stop propley when --slot option is specified with pg_receivexlog.
On Mon, Nov 17, 2014 at 10:02 AM, Fujii Masao masao.fu...@gmail.com wrote:
> On Sat, Nov 15, 2014 at 9:10 PM, Michael Paquier michael.paqu...@gmail.com wrote:
>> Yep, sounds a good thing to do if the master requested an answer from the
>> client in the keepalive message. Something like the patch attached would
>> make the deal.
> Isn't it better to do this only when a replication slot is used?

Makes sense. What about a check using reportFlushPosition then?
--
Michael

diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index c6c90fb..b426cfe 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -149,6 +149,34 @@ open_walfile(XLogRecPtr startpoint, uint32 timeline, char *basedir,
 }

 /*
+ * Flush the current WAL file to disk and update the last WAL flush
+ * positions as well as the last fsync timestamp. On failure, print
+ * a message to stderr and return false, otherwise return true.
+ */
+static bool
+fsync_walfile(XLogRecPtr blockpos, int64 timestamp)
+{
+	/* nothing to do if no WAL file open */
+	if (walfile == -1)
+		return true;
+
+	/* nothing to do if data has already been flushed */
+	if (blockpos <= lastFlushPosition)
+		return true;
+
+	if (fsync(walfile) != 0)
+	{
+		fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+				progname, current_walfile_name, strerror(errno));
+		return false;
+	}
+
+	lastFlushPosition = blockpos;
+	last_fsync = timestamp;
+	return true;
+}
+
+/*
  * Close the current WAL file (if open), and rename it to the correct
  * filename if it's complete. On failure, prints an error message to stderr
  * and returns false, otherwise returns true.
@@ -787,21 +815,12 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
 		 * If fsync_interval has elapsed since last WAL flush and we've written
 		 * some WAL data, flush them to disk.
 		 */
-		if (lastFlushPosition < blockpos &&
-			walfile != -1 &&
-			((fsync_interval > 0 &&
-			  feTimestampDifferenceExceeds(last_fsync, now, fsync_interval)) ||
-			 fsync_interval < 0))
+		if ((fsync_interval > 0 &&
+			 feTimestampDifferenceExceeds(last_fsync, now, fsync_interval)) ||
+			fsync_interval < 0)
 		{
-			if (fsync(walfile) != 0)
-			{
-				fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
-						progname, current_walfile_name, strerror(errno));
+			if (!fsync_walfile(blockpos, now))
 				goto error;
-			}
-
-			lastFlushPosition = blockpos;
-			last_fsync = now;
 		}

 		/*
@@ -999,7 +1018,6 @@ ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
 {
 	int			pos;
 	bool		replyRequested;
-	int64		now;

 	/*
 	 * Parse the keepalive message, enclosed in the CopyData message.
@@ -1021,7 +1039,15 @@ ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
 	/* If the server requested an immediate reply, send one. */
 	if (replyRequested && still_sending)
 	{
-		now = feGetCurrentTimestamp();
+		int64		now = feGetCurrentTimestamp();
+
+		/*
+		 * Flush data, so a fresh position is sent, but do this only if a
+		 * flush position needs to be reported.
+		 */
+		if (reportFlushPosition && !fsync_walfile(blockpos, now))
+			return false;
+
 		if (!sendFeedback(conn, blockpos, now, false))
 			return false;
 		*last_status = now;
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
On Sat, Nov 8, 2014 at 1:26 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> I just pushed this, after some more minor tweaks. Thanks, and please do
> continue testing!

A few typos and a few questions:

1.
 * range. Need to an extra flag in mmtuples for that. */
Datum
brinbulkdelete(PG_FUNCTION_ARGS)
Isn't the part of the comment referring to *mmtuples* in need of some change, as I think mmtuples was used in the initial version of the patch?

2.
/* ---
 * mt_info is laid out in the following fashion:
 *
 * 7th (high) bit: has nulls
 * 6th bit: is placeholder tuple
 * 5th bit: unused
 * 4-0 bit: offset of data
 * ---
 */
uint8		bt_info;
} BrinTuple;
Here in the comments, bt_info is referred to as mt_info.

3.
/*
 * t_info manipulation macros
 */
#define BRIN_OFFSET_MASK	0x1F
I think in the above comment it should be bt_info, rather than t_info.

4.
static void
revmap_physical_extend(BrinRevmap *revmap)
{
..
	START_CRIT_SECTION();
	/* the rm_tids array is initialized to all invalid by PageInit */
	brin_page_init(page, BRIN_PAGETYPE_REVMAP);
	MarkBufferDirty(buf);

	metadata->lastRevmapPage = mapBlk;
	MarkBufferDirty(revmap->rm_metaBuf);
..
}
Can't we update revmap->rm_lastRevmapPage along with metadata->lastRevmapPage?

5.
typedef struct BrinMemTuple
{
	bool		bt_placeholder;	/* this is a placeholder tuple */
	BlockNumber	bt_blkno;	/* heap blkno that the tuple is for */
	MemoryContext bt_context;	/* memcxt holding the dt_column values */
..
}
How is this memory context getting used? I could see that it is used in brin_deform_tuple(), which gets called from 3 other places in core code: bringetbitmap(), brininsert() and union_tuples(), and in all 3 places there is already another temporary memory context used to avoid any form of memory leaks.

6.
Is there any way to force a BRIN index scan to be off? If not, do we need one, given it is present for other types of scan, like set enable_indexscan=off;?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com