Re: [HACKERS] Unportable implementation of background worker start
On 2017-04-21 23:50:41 -0400, Tom Lane wrote:
> Attached are a couple of patches that represent a plausible Plan B.
> The first one changes the postmaster to run its signal handlers without
> specifying SA_RESTART.  I've confirmed that that seems to fix the
> select_parallel-test-takes-a-long-time problem on gaur/pademelon.

> The second one uses pselect, if available, to replace the unblock-signals/
> select()/block-signals dance in ServerLoop.  On platforms where pselect
> exists and works properly, that should fix the race condition I described
> previously.  On platforms where it doesn't, we're no worse off than
> before.

We probably should note somewhere prominently that pselect isn't
actually race-free on a number of platforms.

> I think that these patches represent something we could back-patch
> without a lot of trepidation, unlike the WaitEventSet-based approach.
> Therefore, my proposal is to apply and backpatch these changes, and
> call it good for v10.  For v11, we could work on changing the postmaster
> to not do work in signal handlers, as discussed upthread.  That would
> supersede these two patches completely, though I'd still advocate for
> keeping the change in maybe_start_bgworker.

Not yet having looked at your patches, that sounds like a reasonable
plan.  I'd still like to get something like your CLOEXEC patch applied
independently however.

> < patches >

Looks reasonable on a quick skim.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Replication lag tracking for walsenders
On Sat, Apr 22, 2017 at 9:04 AM, Tom Lane wrote:
> Alvaro Herrera writes:
>> Simon Riggs wrote:
>>> Replication lag tracking for walsenders
>>>
>>> Adds write_lag, flush_lag and replay_lag cols to pg_stat_replication.
>
>> Did anyone notice that this seems to be causing buildfarm member 'tern'
>> to fail the recovery check?  See here:
>
>> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=tern&dt=2017-04-21%2012%3A48%3A09&stg=recovery-check
>> which has
>> TRAP: FailedAssertion("!(lsn >= prev.lsn)", File: "walsender.c", Line: 3331)
>
>> Line 3331 was added by this commit.
>
> Note that while that commit was some time back, tern has only just started
> running recovery-check, following its update to the latest buildfarm
> script.  It looks like it's run that test four times and failed twice,
> so far.  So, not 100% reproducible, but there's something rotten there.
> Timing-dependent, maybe?

Thanks.  I'm away from my computer right now but will investigate this
and send a fix later today.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] snapbuild woes
On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
> I've since the previous update reviewed Petr's patch, which he since has
> updated over the weekend.  I'll do another round tomorrow, and will see
> how it looks.  I think we might need some more tests for this to be
> committable, so it might not become committable tomorrow.  I hope we'll
> have something in tree by end of this week, if not I'll send an update.

I was less productive this week than I'd hoped, and creating a testsuite
was more work than I'd anticipated, so I'm slightly lagging behind.  I
hope to have a patchset tomorrow, aiming to commit something
Monday/Tuesday.

- Andres
Re: [HACKERS] snapbuild woes
On 2017-04-20 13:32:10 +0200, Petr Jelinek wrote:
> On 20/04/17 02:09, Andres Freund wrote:
>> On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
>> I'm working on some infrastructure around this.  Not sure if it needs to
>> be committed, but it's certainly useful for evaluation.  Basically it's
>> a small UDF that:
>> 1) creates a slot via walsender protocol (to some dsn)
>> 2) imports that snapshot into yet another connection to that dsn
>> 3) runs some query over that new connection
>>
>> That makes it reasonably easy to run e.g. pgbench and continually create
>> slots, and use the snapshot to run queries "verifying" that things look
>> good.  It's a bit shoestring-ed together, but everything else seems to
>> require more code.  And it's just a test.
>>
>> Unless somebody has a better idea?

> I don't.  I mean it would be nice to have isolation tester support
> walsender protocol, but I don't know anything about isolation tester
> internals so no idea how much work that is.

Now that the replication protocol supports normal queries, it's actually
not much of an issue on its own.  The problem is more that
isolationtester's client-side language isn't powerful enough: you can't
extract the snapshot name from one session and import it in another.
While that might be something we want to address, I certainly don't want
to tackle it for v10.

I'd started to develop a C toolkit as above, but after I got the basics
running I actually noticed it's pretty much unnecessary: you can just as
well do it with dblink and some plpgsql.  I can reliably reproduce
several of the bugs in this thread in a relatively short amount of time
before applying the patch, and so far not after.  That's great!

> I guess you plan to make that as one of the test/modules or something
> similar (the UDF)?

I've a bunch of tests, but I don't quite know whether we can expose all
of them via classical tests.  There are several easy ones that I
definitely want to add (import "empty" snapshot; import snapshot with
running xacts; create snapshot, perform some ddl, import snapshot,
perform some ddl, check things work reasonably), but there's enough
others that are just probabilistic.  I was wondering about adding a loop
that simply runs for like 30s and then quits or such, but who knows.

Testing around this made me wonder whether we need to make bgwriter.c's
LOG_SNAPSHOT_INTERVAL_MS configurable: for efficient testing, reducing
it is quite valuable, and on busier machines it'll also almost always be
a win to log more frequently...  Opinions?

Greetings,

Andres Freund
Re: [HACKERS] pgbench - allow to store select results into variables
>> Here is a v3 with a more precise comment.
>
> Looks good to me.  I have marked the patch status as "Ready for
> committer".

Ok.  Thanks.

When/if committed, it might trigger a few rebases of other pending
patches.  I'll see about that then.

--
Fabien.
Re: [HACKERS] Unportable implementation of background worker start
I wrote:
> Attached is a lightly-tested draft patch that converts the postmaster to
> use a WaitEventSet for waiting in ServerLoop.  I've got mixed emotions
> about whether this is the direction to proceed, though.

Attached are a couple of patches that represent a plausible Plan B.

The first one changes the postmaster to run its signal handlers without
specifying SA_RESTART.  I've confirmed that that seems to fix the
select_parallel-test-takes-a-long-time problem on gaur/pademelon.

The second one uses pselect, if available, to replace the unblock-signals/
select()/block-signals dance in ServerLoop.  On platforms where pselect
exists and works properly, that should fix the race condition I described
previously.  On platforms where it doesn't, we're no worse off than
before.

As mentioned in the comments for the second patch, even if we don't have
working pselect(), the only problem is that ServerLoop's response to an
interrupt might be delayed by as much as the up-to-1-minute timeout.  The
only existing case where that's really bad is launching multiple
bgworkers.  I would therefore advocate also changing maybe_start_bgworker
to start up to N bgworkers per call, where N is large enough to pretty
much always satisfy simultaneously-arriving requests.  I'd pick 100 or
so, but am willing to negotiate.

I think that these patches represent something we could back-patch
without a lot of trepidation, unlike the WaitEventSet-based approach.
Therefore, my proposal is to apply and backpatch these changes, and call
it good for v10.  For v11, we could work on changing the postmaster to
not do work in signal handlers, as discussed upthread.  That would
supersede these two patches completely, though I'd still advocate for
keeping the change in maybe_start_bgworker.

Note: for testing purposes, these patches are quite independent; just
ignore the hunk in the second patch that changes a comment added by the
first one.

			regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index c382345..52b5e2c 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*** PostmasterMain(int argc, char *argv[])
*** 610,615 ****
--- 610,624 ----
  	/*
  	 * Set up signal handlers for the postmaster process.
  	 *
+ 	 * In the postmaster, we want to install non-ignored handlers *without*
+ 	 * SA_RESTART.  This is because they'll be blocked at all times except
+ 	 * when ServerLoop is waiting for something to happen, and during that
+ 	 * window, we want signals to exit the select(2) wait so that ServerLoop
+ 	 * can respond if anything interesting happened.  On some platforms,
+ 	 * signals marked SA_RESTART would not cause the select() wait to end.
+ 	 * Child processes will generally want SA_RESTART, but we expect them to
+ 	 * set up their own handlers before unblocking signals.
+ 	 *
  	 * CAUTION: when changing this list, check for side-effects on the signal
  	 * handling setup of child processes.  See tcop/postgres.c,
  	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
*** PostmasterMain(int argc, char *argv[])
*** 620,635 ****
  	pqinitmask();
  	PG_SETMASK(&BlockSig);
  
! 	pqsignal(SIGHUP, SIGHUP_handler);	/* reread config file and have
! 										 * children do same */
! 	pqsignal(SIGINT, pmdie);	/* send SIGTERM and shut down */
! 	pqsignal(SIGQUIT, pmdie);	/* send SIGQUIT and die */
! 	pqsignal(SIGTERM, pmdie);	/* wait for children and shut down */
  	pqsignal(SIGALRM, SIG_IGN); /* ignored */
  	pqsignal(SIGPIPE, SIG_IGN); /* ignored */
! 	pqsignal(SIGUSR1, sigusr1_handler); /* message from child process */
! 	pqsignal(SIGUSR2, dummy_handler);	/* unused, reserve for children */
! 	pqsignal(SIGCHLD, reaper);	/* handle child termination */
  	pqsignal(SIGTTIN, SIG_IGN); /* ignored */
  	pqsignal(SIGTTOU, SIG_IGN); /* ignored */
  	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
--- 629,648 ----
  	pqinitmask();
  	PG_SETMASK(&BlockSig);
  
! 	pqsignal_no_restart(SIGHUP, SIGHUP_handler);	/* reread config file
! 													 * and have children do
! 													 * same */
! 	pqsignal_no_restart(SIGINT, pmdie); /* send SIGTERM and shut down */
! 	pqsignal_no_restart(SIGQUIT, pmdie);	/* send SIGQUIT and die */
! 	pqsignal_no_restart(SIGTERM, pmdie);	/* wait for children and shut
! 											 * down */
  	pqsignal(SIGALRM, SIG_IGN); /* ignored */
  	pqsignal(SIGPIPE, SIG_IGN); /* ignored */
! 	pqsignal_no_restart(SIGUSR1, sigusr1_handler);	/* message from child
! 													 * process */
! 	pqsignal_no_restart(SIGUSR2, dummy_handler);	/* unused, reserve for
! 													 * children */
! 	pqsignal_no_restart(SIGCHLD, reaper);	/* handle child termination */
  	pqsignal(SIGTTIN, SIG_IGN); /* ignored */
  	pqsignal(SIGTTOU, SIG_IGN); /* ignored */
  	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
diff --git a/src/include/port.h
Re: [HACKERS] Old versions of Test::More
On 22 Apr. 2017 4:23 am, "Tom Lane" wrote:

> Peter Eisentraut writes:
>> On 4/21/17 14:49, Andrew Dunstan wrote:
>>> I'll add a comment, but doing it in PostgresNode.pm would mean jacana
>>> (for instance) couldn't run any of the TAP tests.  I'm looking at
>>> installing a sufficiently modern Test::Simple package (includes
>>> Test::More and Test::Builder) there, but other oldish machines could
>>> also be affected.
>
>> Or you could define note() as an empty function if it doesn't exist.
>
> +1.  I'm really not at all happy with the prospect that every time
> somebody adds a use of "note" to some new TAP test, we're going to get
> a complaint later that that test no longer works on jacana.  We need to
> either decide that non-ancient Test::More is a hard requirement for all
> the tests

That seems like a no-brainer TBH.  Why are we bothering with backwards
compat with ancient versions of test frameworks?  It seems like a
colossal waste of time for no benefit.

> or fix things with a centralized solution.  A dummy (or not so dummy?)
> implementation would serve for the latter.
>
> 			regards, tom lane
Re: [HACKERS] multithreading in Batch/pipelining mode for libpq
On 2017-04-22 09:14:50 +0800, Craig Ringer wrote:
>> 2) if the answer to the previous question is negative, is it possible
>> to send asynchronous queries in one thread while reading results in
>> another thread?
>
> Not right now.  libpq's state tracking wouldn't cope.
>
> I imagine it could be modified to work with some significant
> refactoring.  You'd need to track state with a shared fifo of some
> kind, where dispatch puts queries on the fifo as it sends them and
> receive pops them from it.

FWIW, I think it'd be a *SERIOUSLY* bad idea trying to make individual
PGconn interactions threadsafe.  It'd imply significant overhead in a
lot of situations, and programming it would have to become a lot more
complicated (since you need to synchronize command submission between
threads).  For almost all cases it's better to either use multiple
connections or use a coarse-grained mutex around all of libpq.

- Andres
Re: [HACKERS] multithreading in Batch/pipelining mode for libpq
On 22 Apr. 2017 6:04 am, "Ilya Roublev" wrote:

> 1) is it possible technically (possibly by changing some part of libpq
> code) to ignore results (especially for this sort of queries like
> insert), processing somehow separately the situation when some error
> occurs?

There is a patch out there to allow libpq result processing by callback,
I think.  Might be roughly what you want.

> 2) if the answer to the previous question is negative, is it possible
> to send asynchronous queries in one thread while reading results in
> another thread?

Not right now.  libpq's state tracking wouldn't cope.

I imagine it could be modified to work with some significant
refactoring.  You'd need to track state with a shared fifo of some kind,
where dispatch puts queries on the fifo as it sends them and receive
pops them from it.  I started on that for the batch mode stuff, but it's
not in any way thread safe there.

> Naturally I failed to implement this without locking, info in PGconn
> very quickly becomes inconsistent, the number of queries sent does not
> correspond to the number of results to be read, etc.  So I'd like to
> know at first is it possible at all (possibly by some changes to be
> made in libpq)?  Sorry if my idea sounds rather naive.  And thanks for
> your answer and advice.

Yeah, it's possible.  The protocol can handle it, it's just libpq that
can't.
Re: [HACKERS] Unportable implementation of background worker start
Alvaro Herrera writes:
> Tom Lane wrote:
>> It also appears to me that do_start_bgworker's treatment of fork
>> failure is completely brain dead.  Did anyone really think about
>> that case?

> Hmm, I probably modelled it on autovacuum without giving that case much
> additional consideration.

Attached is a proposed patch that should make fork failure behave
sanely, ie it works much the same as a worker crash immediately after
launch.  I also refactored things a bit to make do_start_bgworker fully
responsible for updating the RegisteredBgWorker's state, rather than
doing just some of it as before.

I tested this by hot-wiring the fork_process call to fail some of the
time, which showed that the postmaster now seems to recover OK, but
parallel.c's logic is completely innocent of the idea that worker-startup
failure is possible.  The leader backend just freezes, and nothing short
of kill -9 on that backend will get you out of it.  Fixing that seems
like material for a separate patch though.

			regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index c382345..8461c24 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*** static void TerminateChildren(int signal
*** 420,425 ****
--- 420,426 ----
  #define SignalChildren(sig) SignalSomeChildren(sig, BACKEND_TYPE_ALL)
  
  static int	CountChildren(int target);
+ static bool assign_backendlist_entry(RegisteredBgWorker *rw);
  static void maybe_start_bgworker(void);
  static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
  static pid_t StartChildProcess(AuxProcType type);
*** bgworker_forkexec(int shmem_slot)
*** 5531,5543 ****
   * Start a new bgworker.
   * Starting time conditions must have been checked already.
   *
   * This code is heavily based on autovacuum.c, q.v.
   */
! static void
  do_start_bgworker(RegisteredBgWorker *rw)
  {
  	pid_t		worker_pid;
  
  	ereport(DEBUG1,
  			(errmsg("starting background worker process \"%s\"",
  					rw->rw_worker.bgw_name)));
--- 5532,5564 ----
   * Start a new bgworker.
   * Starting time conditions must have been checked already.
   *
+  * Returns true on success, false on failure.
+  * In either case, update the RegisteredBgWorker's state appropriately.
+  *
   * This code is heavily based on autovacuum.c, q.v.
   */
! static bool
  do_start_bgworker(RegisteredBgWorker *rw)
  {
  	pid_t		worker_pid;
  
+ 	Assert(rw->rw_pid == 0);
+ 
+ 	/*
+ 	 * Allocate and assign the Backend element.  Note we must do this before
+ 	 * forking, so that we can handle out of memory properly.
+ 	 *
+ 	 * Treat failure as though the worker had crashed.  That way, the
+ 	 * postmaster will wait a bit before attempting to start it again; if it
+ 	 * tried again right away, most likely it'd find itself repeating the
+ 	 * out-of-memory or fork failure condition.
+ 	 */
+ 	if (!assign_backendlist_entry(rw))
+ 	{
+ 		rw->rw_crashed_at = GetCurrentTimestamp();
+ 		return false;
+ 	}
+ 
  	ereport(DEBUG1,
  			(errmsg("starting background worker process \"%s\"",
  					rw->rw_worker.bgw_name)));
*** do_start_bgworker(RegisteredBgWorker *rw
*** 5549,5557 ****
  #endif
  	{
  		case -1:
  			ereport(LOG,
  					(errmsg("could not fork worker process: %m")));
! 			return;
  
  #ifndef EXEC_BACKEND
  		case 0:
--- 5570,5586 ----
  #endif
  	{
  		case -1:
+ 			/* in postmaster, fork failed ... */
  			ereport(LOG,
  					(errmsg("could not fork worker process: %m")));
! 			/* undo what assign_backendlist_entry did */
! 			ReleasePostmasterChildSlot(rw->rw_child_slot);
! 			rw->rw_child_slot = 0;
! 			free(rw->rw_backend);
! 			rw->rw_backend = NULL;
! 			/* mark entry as crashed, so we'll try again later */
! 			rw->rw_crashed_at = GetCurrentTimestamp();
! 			break;
  
  #ifndef EXEC_BACKEND
  		case 0:
*** do_start_bgworker(RegisteredBgWorker *rw
*** 5575,5588 ****
  			PostmasterContext = NULL;
  
  			StartBackgroundWorker();
  			break;
  #endif
  		default:
  			rw->rw_pid = worker_pid;
  			rw->rw_backend->pid = rw->rw_pid;
  			ReportBackgroundWorkerPID(rw);
! 			break;
  	}
  }
  
  /*
--- 5604,5627 ----
  			PostmasterContext = NULL;
  
  			StartBackgroundWorker();
+ 
+ 			exit(1);			/* should not get here */
  			break;
  #endif
  		default:
+ 			/* in postmaster, fork successful ... */
  			rw->rw_pid = worker_pid;
  			rw->rw_backend->pid = rw->rw_pid;
  			ReportBackgroundWorkerPID(rw);
! 			/* add new worker to lists of backends */
! 			dlist_push_head(&BackendList, &rw->rw_backend->elem);
! #ifdef EXEC_BACKEND
! 			ShmemBackendArrayAdd(rw->rw_backend);
! #endif
! 			return true;
  	}
+ 
+ 	return false;
  }
  
  /*
*** bgworker_should_start_now(BgWorkerStartT
*** 5629,5634 ****
--- 5668,5675 ----
   * Allocate the Backend struct for a connected background worker, but don't
   * add it to the list of backends just yet.
   *
Re: [HACKERS] pg_dump emits ALTER TABLE ONLY partitioned_table
On Fri, Apr 21, 2017 at 1:43 AM, Stephen Frost wrote:
>> +      Once
>> +      partitions exist, we do not support using ONLY to
>> +      add or drop constraints on only the partitioned table.
>>
>> I wonder if the following sounds a bit more informative: Once
>> partitions exist, using ONLY will result in an error,
>> because the constraint being added or dropped must be added or dropped
>> from the partitions too.
>
> My thinking here is that we may add support later on down the line for
> using the ONLY keyword, when the constraint has already been created on
> the partitions themselves, so I would rather just clarify that we don't
> (yet) support that.  Your wording, at least to me, seems to imply that
> if the constraint was added to or dropped from the partitions then the
> ONLY keyword could be used, which isn't accurate today.

Well, I think that Amit's wording has the virtue of giving a reason
which is basically valid, whereas someone reading your text might
conceivably be left wondering what the reason for the restriction is.
Also, I think saying that it will result in an error is a bit more
helpful than saying that it is not supported, since the latter might
lead someone to believe that it could result in undefined behavior
(i.e. arbitrarily awful misbehavior), which is not the case.  So I like
what he wrote, for whatever that's worth.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Why is get_cheapest_parallel_safe_total_inner() in pathkeys.c?
On Fri, Apr 21, 2017 at 12:10 PM, David Rowley wrote:
> This probably ended up here because there's a bunch of other functions
> named get_cheapest* in that file, but all of those relate to getting a
> path for specific PathKeys.  get_cheapest_parallel_safe_total_inner()
> does not do that.

Yes, I just put it near to the code from which it was derived.

> Maybe allpaths.c is a better home for it?

I will let others be the judge of that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Fri, Apr 21, 2017 at 8:41 AM, Ashutosh Bapat wrote:
> I don't see how is that fixed.  For a join relation we need to come up
> with one set of partition bounds by merging partition bounds of the
> joining relations, and in order to understand how to interpret the
> datums in the partition bounds, we need to associate data types.  The
> question is which data type we should use if the relations being
> joined have different data types associated with their respective
> partition bounds.
>
> Or are you saying that we don't need to associate data type with
> merged partition bounds?  In that case, I don't know how do we compare
> the partition bounds of two relations?

Well, since there is no guarantee that a datatype exists which can be
used to "merge" the partition bounds in the sense that you are
describing, and even if there is one we have no opfamily infrastructure
to find out which one it is, I think it would be smart to try to set
things up so that we don't need to do that.  I believe that's probably
possible.

> In your example, A has partition key of type int8, has bound datums
> X1 .. X10.  B has partition key of type int4 and has bounds datums
> X1 .. X11.  C has partition key type int2 and bound datums X1 .. X12.

OK, sure.

> The binary representation of X's is going to differ between A, B and C
> although each Xk for A, B and C is equal, wherever it exists.

Agreed.

> Join between A and B will have merged bound datums X1 .. X10 (and X11
> depending upon the join type).  In order to match bounds of AB with C,
> we need to know the data type of bounds of AB, so that we can choose
> the appropriate equality operator.  The question is what should we
> choose as the data type of partition bounds of AB, int8 or int4.  This
> is different from applying join conditions between AB and C, which can
> choose the right opfamily operator based on the join conditions.

Well, the join is actually being performed either on A.keycol = C.keycol
or on B.keycol = C.keycol, right?  It has to be one or the other;
there's no "merged" join column in any relation's targetlist, but only
columns derived from the various baserels.  So let's use that set of
bounds for the matching.  It makes sense to use the set of bounds for
the matching that corresponds to the column actually being joined, I
think.

It's late here and I'm tired, but it seems like it should be possible to
relate the child joinrels of the AB join back to the child joinrels of
either A or B.  (AB)1 .. (AB)10 relate back to A1 .. A10 and B1 .. B10.
(AB)11 relates back to B11 but, of course, not to A11, which doesn't
exist.  If the join is INNER, (AB)11 is a dummy rel anyway and actually
we should probably see whether we can omit it altogether.  If the join
is an outer join of some kind, there's an interesting case where the
user wrote A LEFT JOIN B or B RIGHT JOIN A so that A is not on the
nullable side of the join; in that case, too, (AB)11 is dummy or
nonexistent.  Otherwise, assuming A is nullable, (AB)11 maps only to B11
and not to A11.  But that's absolutely right: if the join to C uses
A.keycol, either the join operator is strict and (AB)11 won't match
anything anyway, or it's not and partition-wise join is illegal because
A.keycol in (AB)11 can include not only values from X11 but also nulls.

So, it seems to me that what you can do is loop over the childrels on
the outer side of the join.  For each one, you've got a join clause that
relates the outer rel to the inner rel, and that join clause mentions
some baserel which is contained in the joinrel.  So drill down through
the childrel to the corresponding partition of the baserel and get those
bounds.  Then if you do the same thing for the inner childrels, you've
now got two lists of bounds, and the type on the left matches the outer
side of the join and the type on the right matches the inner side of the
join and the opfamily of the operator in the join clause gives you a
comparison operator that relates those two types, and now you can match
them up.  (We should also keep in mind the case where there are multiple
columns in the partition key.)

> This seems to suggest that we can not come up with merged bounds for
> join if the partition key types of joining relations differ.

Yes, I think that would be difficult.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] scram and \password
On Sat, Apr 22, 2017 at 5:04 AM, Heikki Linnakangas wrote:
> I'll continue reviewing the rest of the patch on Monday, but one glaring
> issue is that we cannot add an argument to the existing libpq
> PQencryptPassword() function, because that's part of the public API.  It
> would break all applications that use PQencryptPassword().
>
> What we need to do is to add a new function.  What should we call that?
> We don't have a tradition of using "Ex" or such suffix to mark extended
> versions of existing functions.  PQencryptPasswordWithMethod(user, pass,
> method) ?

Do we really want to add a new function, or have a hard failure?  Any
application calling PQencryptPassword may silently end up doing the
wrong thing if the server uses scram in its hba rules, or if the default
is switched to that; from this point of view, extending the function
makes sense as well.

--
Michael
[HACKERS] multithreading in Batch/pipelining mode for libpq
Hello,

I'm investigating Batch/pipelining mode for libpq, connected to this
thread:

https://www.postgresql.org/message-id/flat/CADT4RqA6XoDCVY-G13ME1oRVshE2oNk4fRHKZC0K-jJymQJV0Q%40mail.gmail.com#cadt4rqa6xodcvy-g13me1orvshe2onk4frhkzc0k-jjymqj...@mail.gmail.com

And I'm wondering what the possibilities are to speed up execution of
asynchronous queries yet more.  What I need is to make a huge amount of
inserts (or other sorts of queries either returning no results or whose
results I'd like somehow to ignore completely) very quickly, so the idea
to use batch mode sounds rather good to me, taking into account

https://blog.2ndquadrant.com/postgresql-latency-pipelining-batching/

So I'd like to create the corresponding prepared statement and to
perform all these queries asynchronously through this prepared
statement.  What I'm trying to do now is to understand (at least in
general):

1) is it possible technically (possibly by changing some part of libpq
code) to ignore results (especially for this sort of queries like
insert), processing somehow separately the situation when some error
occurs?

2) if the answer to the previous question is negative, is it possible to
send asynchronous queries in one thread while reading results in another
thread?  I understand that this immediately contradicts the requirement
not to use the same PGconn object in different threads.  But maybe some
locks of this PGconn object are possible, so that the state of PGconn
cannot be changed in one thread while the other locks PGconn, and vice
versa?

Naturally I failed to implement this without locking: info in PGconn
very quickly becomes inconsistent, the number of queries sent does not
correspond to the number of results to be read, etc.  So I'd like to
know at first whether it is possible at all (possibly by some changes to
be made in libpq)?  Sorry if my idea sounds rather naive.  And thanks
for your answer and advice.

Only in the case all this sounds not so stupid and something is possible
to make, I'd like to ask for some details, if you do not mind.  Thank
you very much again in advance.

With my best regards,
Ilya
Re: [HACKERS] [COMMITTERS] pgsql: Replication lag tracking for walsenders
On 2017-04-21 17:04:08 -0400, Tom Lane wrote:
> Some excavation in the buildfarm database says that the coverage for
> the recovery-check test has been mighty darn thin up until just
> recently.

Hm, not good.  Just enabled it for most of my animals (there seems to be
no point in having it on the C89 animal).  Very curious what it'll do to
skink's runtime :/.  I'll add a few more cores to the machine, otherwise
it'll probably wreak havoc.

Greetings,

Andres Freund
Re: [HACKERS] [COMMITTERS] pgsql: Replication lag tracking for walsenders
Alvaro Herrera writes:
> Simon Riggs wrote:
>> Replication lag tracking for walsenders
>>
>> Adds write_lag, flush_lag and replay_lag cols to pg_stat_replication.

> Did anyone notice that this seems to be causing buildfarm member 'tern'
> to fail the recovery check?  See here:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=tern&dt=2017-04-21%2012%3A48%3A09&stg=recovery-check
> which has
> TRAP: FailedAssertion("!(lsn >= prev.lsn)", File: "walsender.c", Line: 3331)
> Line 3331 was added by this commit.

Note that while that commit was some time back, tern has only just
started running recovery-check, following its update to the latest
buildfarm script.  It looks like it's run that test four times and
failed twice, so far.  So, not 100% reproducible, but there's something
rotten there.  Timing-dependent, maybe?

Some excavation in the buildfarm database says that the coverage for the
recovery-check test has been mighty darn thin up until just recently.
These are all the reports we have:

pgbfprod=> select sysname, min(snapshot) as oldest, count(*)
           from build_status_log
           where log_stage = 'recovery-check.log'
           group by 1 order by 2;
 sysname  |       oldest        | count
----------+---------------------+-------
 hamster  | 2016-03-01 02:34:26 |   182
 crake    | 2017-04-09 01:58:15 |    80
 nightjar | 2017-04-11 15:54:34 |    52
 longfin  | 2017-04-19 16:29:39 |     9
 hornet   | 2017-04-20 14:12:08 |     4
 mandrill | 2017-04-20 14:14:08 |     4
 sungazer | 2017-04-20 14:16:08 |     4
 tern     | 2017-04-20 14:18:08 |     4
 prion    | 2017-04-20 14:23:05 |     8
 jacana   | 2017-04-20 15:00:17 |     3
(10 rows)

So, other than hamster, which is certainly going to have its own spin on
the timing question, we have next to no track record for this test.  I
wouldn't bet that this issue is unique to tern; more likely, that's just
the first critter to show an intermittent issue.

			regards, tom lane
Re: [HACKERS] Old versions of Test::More
Peter Eisentraut writes: > On 4/21/17 14:49, Andrew Dunstan wrote: >> I'll add a comment, but doing it in PostgresNode.pm would mean jacana >> (for instance) couldn't run any of the TAP tests. I'm looking at >> installing a sufficiently modern Test::Simple package (includes >> Test::More and Test::Builder) there, but other oldish machines could also >> be affected. > Or you could define note() as an empty function if it doesn't exist. +1. I'm really not at all happy with the prospect that every time somebody adds a use of "note" to some new TAP test, we're going to get a complaint later that that test no longer works on jacana. We need to either decide that non-ancient Test::More is a hard requirement for all the tests, or fix things with a centralized solution. A dummy (or not so dummy?) implementation would serve for the latter. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportable implementation of background worker start
Tom Lane writes: > Andres Freund writes: >> On 2017-04-21 14:08:21 -0400, Tom Lane wrote: >>> but I see that SUSv2 >>> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own >>> coding rules it ought to be okay to assume they're there. I'm tempted to >>> rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see >>> if anything in the buildfarm complains. >> +1 > Done, we'll soon see what happens. Should have seen this coming, I guess: some of the Windows critters are falling over, apparently because they lack fcntl() altogether. So the #ifdef F_SETFD was really acting as a proxy for "#ifdef HAVE_FCNTL". There's no HAVE_FCNTL test in configure ATM, and I'm thinking it would be pretty silly to add one, since surely it would succeed on anything Unix-y enough to run the configure script. I'm guessing the best thing to do is put back #ifdef F_SETFD; alternatively we might spell it like "#ifndef WIN32", but I'm unsure if that'd do what we want on Cygwin or MinGW. In non-Windows code paths in latch.c, we probably wouldn't need to bother with #ifdef F_SETFD. Hopefully we can leave in the removal of "#define FD_CLOEXEC". Will wait a bit longer for more results. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] scram and \password
On 04/10/2017 08:42 AM, Michael Paquier wrote: As there have been some conflicts because of the commit of SASLprep, here is a rebased set of patches. I've committed a modified version of the first patch, to change the on-disk format to RFC 5803, as mentioned on the other thread (https://www.postgresql.org/message-id/351ba574-85ea-d9b8-9689-8c928dd09...@iki.fi). I'll continue reviewing the rest of the patch on Monday, but one glaring issue is that we cannot add an argument to the existing libpq PQencryptPassword() function, because that's part of the public API. It would break all applications that use PQencryptPassword(). What we need to do is to add a new function. What should we call that? We don't have a tradition of using "Ex" or such suffix to mark extended versions of existing functions. PQencryptPasswordWithMethod(user, pass, method) ? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On-disk format of SCRAM verifiers
On 04/21/2017 05:33 PM, Simon Riggs wrote: On 21 April 2017 at 14:42, Heikki Linnakangas wrote: SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey> Could you explain where you are looking? I don't see that in RFC 5803, from 1. Overview. Yeah, it's not easy to see, I missed it earlier too. You have to look at RFC 5803 and RFC 3112 together. RFC 3112 says that the overall format is "<scheme>$<authInfo>$<authValue>", and RFC 5803 says that for SCRAM, scheme is "SCRAM-SHA-256" (for our variant), authInfo is "<iteration count>:<salt>" and authValue is "<StoredKey>:<ServerKey>". They really should've included examples in those RFCs. Thanks, +1 for the change. Committed, thanks. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Old versions of Test::More
On 4/21/17 14:49, Andrew Dunstan wrote: > I'll add a comment, but doing it in PostgresNode.pm would mean jacana > (for instance) couldn't run any of the TAP tests. I'm looking at > installing a sufficiently modern Test::Simple package (includes > Test::More and Test::Builder) there, but other oldish machines could also > be affected. Or you could define note() as an empty function if it doesn't exist. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Replication lag tracking for walsenders
Simon Riggs wrote: > Replication lag tracking for walsenders > > Adds write_lag, flush_lag and replay_lag cols to pg_stat_replication. Did anyone notice that this seems to be causing buildfarm member 'tern' to fail the recovery check? See here: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=tern=2017-04-21%2012%3A48%3A09=recovery-check which has TRAP: FailedAssertion("!(lsn >= prev.lsn)", File: "walsender.c", Line: 3331) Line 3331 was added by this commit. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] some review comments on logical rep code
On Fri, Apr 21, 2017 at 4:02 PM, Masahiko Sawada wrote: > On Fri, Apr 21, 2017 at 1:19 AM, Fujii Masao wrote: >> On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada >> wrote: >>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI >>> wrote: Hi, Thank you for the revised version. At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada wrote in > On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada > wrote: > > On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI > > wrote: > >> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada > >> wrote in > >>
Re: [HACKERS] Unportable implementation of background worker start
Andres Freund writes: > On 2017-04-21 14:08:21 -0400, Tom Lane wrote: >> but I see that SUSv2 >> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own >> coding rules it ought to be okay to assume they're there. I'm tempted to >> rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see >> if anything in the buildfarm complains. > +1 Done, we'll soon see what happens. In the same area, I noticed that POSIX does not say that the success result for fcntl(F_SETFD) and related cases is 0. It says that the failure result is -1 and the success result is some other value. We seem to have this right in most places, but e.g. port/noblock.c gets it wrong. The lack of field complaints implies that just about everybody actually does return 0 on success, but I still think it would be a good idea to run around and make all the calls test specifically for -1. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Old versions of Test::More
On 04/21/2017 10:45 AM, Tom Lane wrote: > Andrew Dunstan writes: >> As we discovered yesterday via jacana, very old versions of Test::More >> don't support the note() function. It therefore seems prudent to specify >> a minimum version number for the module in those scripts that use the >> function. According to the changelog, version 0.82 (released in Oct >> 2008) should be sufficient. So I suggest the attached patch. > Maybe it'd be better to put the minimum-version check somewhere central, > like PostgresNode.pm? I suppose that our use of "note" will expand, > and I can't see us remembering to put this into all and only the specific > tests that use "note". Especially if there's no comment about it. > > I'll add a comment, but doing it in PostgresNode.pm would mean jacana (for instance) couldn't run any of the TAP tests. I'm looking at installing a sufficiently modern Test::Simple package (includes Test::More and Test::Builder) there, but other oldish machines could also be affected. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Missing feature in Phrase Search?
Oleg, Teodor, folks: I was demo'ing phrase search for a meetup yesterday, and the user feedback I got showed that there's a missing feature with phrase search. Let me explain by example: 'fix <-> error' will match 'fixed error', 'fixing error' but not 'fixed language error' or 'fixed a small error' 'fix <2> error' will match 'fixed language error', but not 'fixing error' or 'fixed a small error' 'fix <3> error' will match 'fixed a small error', but not any of the other strings. This is because the # in <#> is an exact match. Seems like we could really use a way for users to indicate that they want a range of word gaps. Like, in the example above, users could search on: 'fix <1:3> error' ... which would search for any phrase where "error" followed "fix" by between 1 and 3 words. Not wedded to any particular syntax for that, of course. -- Josh Berkus Containers & Databases Oh My! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportable implementation of background worker start
Tom, On 2017-04-21 13:49:27 -0400, Tom Lane wrote: > >> * If we are in PM_WAIT_DEAD_END state, then we don't want to accept > >> - * any new connections, so we don't call select(), and just > >> sleep. > >> + * any new connections, so we don't call WaitEventSetWait(), > >> and just > >> + * sleep. XXX not ideal > >> */ > > > Couldn't we just deactivate the sockets in the set instead? > > Yeah, I think it'd be better to do something like that. The pg_usleep > call has the same issue of possibly not responding to interrupts. The > risks are a lot less, since it's a much shorter wait, but I would rather > eliminate the separate code path in favor of doing it honestly. Didn't > seem like something to fuss over in the first draft though. Ok, cool. > >> + ServerWaitSet = CreateWaitEventSet(PostmasterContext, MAXLISTEN + 1); > > > Why are we using MAXLISTEN, rather than the actual number of things to > > listen to? > > It'd take more code (ie, an additional scan of the array) to predetermine > that. I figured the space-per-item in the WaitEventSet wasn't enough to > worry about ... do you think differently? I'm not sure. We do create an epoll handler with enough space, and that has some overhead. Don't know whether that's worthwhile to care about. > > I kind of would like, in master, to take a chance on replacing all the work > > done in signal handlers with just a SetLatch(), and do it outside of > > signal handlers instead. Forking from signal handlers is just plain > > weird. > > Yeah, maybe it's time. But in v11, and not for back-patch. Agreed. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportable implementation of background worker start
On 2017-04-21 14:08:21 -0400, Tom Lane wrote: > but I see that SUSv2 > mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own > coding rules it ought to be okay to assume they're there. I'm tempted to > rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see > if anything in the buildfarm complains. +1 - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportable implementation of background worker start
I wrote: > Andres Freund writes: >> On 2017-04-21 12:50:04 -0400, Tom Lane wrote: >>> +#ifndef FD_CLOEXEC >>> +#define FD_CLOEXEC 1 >>> +#endif >> Hm? Are we sure this is portable? Are there really cases that have >> F_SETFD, but not CLOEXEC? > Copied-and-pasted from our only existing use of FD_CLOEXEC, in libpq. Looking closer, that code dates to Author: Tom Lane Branch: master Release: REL8_0_BR [7627b91cd] 2004-10-21 20:23:19 + Set the close-on-exec flag for libpq's socket to the backend, to avoid any possible problems from child programs executed by the client app. Per suggestion from Elliot Lee of Red Hat. and while the public discussion about it https://www.postgresql.org/message-id/flat/18172.1098382248%40sss.pgh.pa.us doesn't really say, I suspect the specific coding was Elliot's suggestion as well. It probably had some value at one time ... but I see that SUSv2 mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own coding rules it ought to be okay to assume they're there. I'm tempted to rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see if anything in the buildfarm complains. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportable implementation of background worker start
Andres Freund writes: > On 2017-04-21 12:50:04 -0400, Tom Lane wrote: >> Attached is a lightly-tested draft patch that converts the postmaster to >> use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions >> about whether this is the direction to proceed, though. It adds at least >> a couple of kernel calls per postmaster signal delivery, and probably to >> every postmaster connection acceptance (ServerLoop iteration), to fix >> problems that are so corner-casey that we've never even known we had them >> till now. > I'm not concerned much about the signal delivery paths, and I can't > quite imagine that another syscall in the accept path is going to be > measurable - worth ensuring though. > ... > On the other hand most types of our processes do SetLatch() in just > nearly all the relevant signal handlers anyway, so they're pretty close > to behaviour already. True. Maybe I'm being too worried. >> * If we are in PM_WAIT_DEAD_END state, then we don't want to accept >> - * any new connections, so we don't call select(), and just >> sleep. >> + * any new connections, so we don't call WaitEventSetWait(), >> and just >> + * sleep. XXX not ideal >> */ > Couldn't we just deactivate the sockets in the set instead? Yeah, I think it'd be better to do something like that. The pg_usleep call has the same issue of possibly not responding to interrupts. The risks are a lot less, since it's a much shorter wait, but I would rather eliminate the separate code path in favor of doing it honestly. Didn't seem like something to fuss over in the first draft though. >> +ServerWaitSet = CreateWaitEventSet(PostmasterContext, MAXLISTEN + 1); > Why are we using MAXLISTEN, rather than the actual number of things to > listen to? It'd take more code (ie, an additional scan of the array) to predetermine that. I figured the space-per-item in the WaitEventSet wasn't enough to worry about ... do you think differently? 
> Random note: Do we actually have any code that errors out if too many > sockets are being listened to? Yes, see StreamServerPort, about line 400. > I kind of would like, in master, take a chance of replace all the work > done in signal handlers, by just a SetLatch(), and do it outside of > signal handlers instead. Forking from signal handlers is just plain > weird. Yeah, maybe it's time. But in v11, and not for back-patch. >> +#ifndef FD_CLOEXEC >> +#define FD_CLOEXEC 1 >> +#endif > Hm? Are we sure this is portable? Is there really cases that have > F_SETFD, but not CLOEXEC? Copied-and-pasted from our only existing use of FD_CLOEXEC, in libpq. Might well be obsolete but I see no particular reason not to do it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportable implementation of background worker start
Alvaro Herrera writes: > Tom Lane wrote: >> I'm coming back to the idea that at least in the back branches, the >> thing to do is allow maybe_start_bgworker to start multiple workers. >> >> Is there any actual evidence for the claim that that might have >> bad side effects? > Well, I ran tests with a few dozen thousand sample workers and the > neglect for other things (such as connection requests) was visible, but > that's probably not a scenario many servers run often currently. Indeed. I'm pretty skeptical that that's an interesting case, and if it is, the current coding is broken anyway, because with that many workers you are going to start noticing that running maybe_start_bgworker over again for each worker is an O(N^2) proposition. Admittedly, iterating the loop in maybe_start_bgworker is really cheap compared to a fork(), but eventually the big-O problem is going to eat your lunch. > I don't strongly object to the idea of removing the "return" in older > branches, since it's evidently a problem. However, as bgworkers start > to be used more, I think we should definitely have some protection. In > a system with a large number of workers available for parallel queries, > it seems possible for a high velocity server to get stuck in the loop > for some time. (I haven't actually verified this, though. My > experiments were with the early kind, static bgworkers.) It might be sensible to limit the number of workers launched per call, but I think the limit should be quite a bit higher than 1 ... something like 100 or 1000 might be appropriate. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportable implementation of background worker start
Hi, On 2017-04-21 12:50:04 -0400, Tom Lane wrote: > Attached is a lightly-tested draft patch that converts the postmaster to > use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions > about whether this is the direction to proceed, though. It adds at least > a couple of kernel calls per postmaster signal delivery, and probably to > every postmaster connection acceptance (ServerLoop iteration), to fix > problems that are so corner-casey that we've never even known we had them > till now. I'm not concerned much about the signal delivery paths, and I can't quite imagine that another syscall in the accept path is going to be measurable - worth ensuring though. I do agree that it's a bit of a big stick for the back-branches... > A different line of thought is to try to provide a bulletproof solution, > but push the implementation problems down into latch.c --- that is, the > goal would be to provide a pselect-like variant of WaitEventSetWait that > is guaranteed to return if interrupted, as opposed to the current behavior > where it's guaranteed not to. But that seems like quite a bit of work. Seems like a sane idea to me. The use of latches has already grown due to parallelism and it'll likely grow further - some of them seem likely to also have to deal with such concerns. I'd much rather centralize things down to a common place. On the other hand most types of our processes do SetLatch() in just nearly all the relevant signal handlers anyway, so they're pretty close to behaviour already. > Whether or not we decide to change over the postmaster.c code, I think > it'd likely be a good idea to apply most or all of the attached changes > in latch.c. Setting CLOEXEC on the relevant FDs is clearly a good thing, > and the other changes will provide some safety if some preloaded extension > decides to create a latch in the postmaster process. On the principle, I agree. Reading through the changes now. 
> @@ -1667,76 +1656,64 @@ ServerLoop(void) >* do nontrivial work. >* >* If we are in PM_WAIT_DEAD_END state, then we don't want to > accept > - * any new connections, so we don't call select(), and just > sleep. > + * any new connections, so we don't call WaitEventSetWait(), > and just > + * sleep. XXX not ideal >*/ Couldn't we just deactivate the sockets in the set instead? > /* > - * Initialise the masks for select() for the ports we are listening on. > - * Return the number of sockets to listen on. > + * Create a WaitEventSet for ServerLoop() to wait on. This includes the > + * sockets we are listening on, plus the PostmasterLatch. > */ > -static int > -initMasks(fd_set *rmask) > +static void > +initServerWaitSet(void) > { > - int maxsock = -1; > int i; > > - FD_ZERO(rmask); > + ServerWaitSet = CreateWaitEventSet(PostmasterContext, MAXLISTEN + 1); Why are we using MAXLISTEN, rather than the actual number of things to listen to? The only benefit of this seems to be that we could theoretically allow dynamic reconfiguration of the sockets a bit more easily in the future, but that could just as well be done by recreating the set. Random note: Do we actually have any code that errors out if too many sockets are being listened to? 
> @@ -2553,6 +2543,9 @@ SIGHUP_handler(SIGNAL_ARGS) > #endif > } > > + /* Force ServerLoop to iterate */ > + SetLatch(&PostmasterLatch); > + > PG_SETMASK(&UnBlockSig); > > errno = save_errno; > @@ -2724,6 +2717,9 @@ pmdie(SIGNAL_ARGS) > break; > } > > + /* Force ServerLoop to iterate */ > + SetLatch(&PostmasterLatch); > + > PG_SETMASK(&UnBlockSig); > > errno = save_errno; > @@ -3037,6 +3033,9 @@ reaper(SIGNAL_ARGS) > */ > PostmasterStateMachine(); > > + /* Force ServerLoop to iterate */ > + SetLatch(&PostmasterLatch); > + > /* Done with signal handler */ > PG_SETMASK(&UnBlockSig); > > @@ -5078,6 +5077,9 @@ sigusr1_handler(SIGNAL_ARGS) > signal_child(StartupPID, SIGUSR2); > } > > + /* Force ServerLoop to iterate */ > + SetLatch(&PostmasterLatch); > + > PG_SETMASK(&UnBlockSig); > > errno = save_errno; I kind of would like, in master, to take a chance on replacing all the work done in signal handlers with just a SetLatch(), and do it outside of signal handlers instead. Forking from signal handlers is just plain weird. > diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c > index 4798370..92d2ff0 100644 > --- a/src/backend/storage/ipc/latch.c > +++ b/src/backend/storage/ipc/latch.c > @@ -62,6 +62,10 @@ > #include "storage/pmsignal.h" > #include "storage/shmem.h" > > +#ifndef FD_CLOEXEC > +#define FD_CLOEXEC 1 > +#endif Hm? Are we sure this is portable? Are there really cases that have F_SETFD, but not CLOEXEC? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your
Re: [HACKERS] Range Merge Join v1
Hi, Jeff! I'm planning to review this patch in the future. I've done some performance testing (see attachment). All in all, nothing surprising: everything works fine. Tests were executed on current HEAD, on Ubuntu 16 on a MacBook Air. I observe that: 1. The patch brings a performance improvement for the specified query. 2. The performance improvement of Range Merge Join vs. a GiST Nested Loop Join (on an Index Only Scan) is between 4x and 6x. 3. The performance improvement of RMJ with a B-tree index vs. GiST is between 4x and 10x. (Due to the lack of actually competing cases, I omit T-tests here; they are not that important when speaking about a 4x difference.) 4. Range Merge Join can consume an expression index with exactly the same effect as a direct index. 5. Other attributes residing in the joined tables do not affect the performance improvement. I'll do a code review some time later, probably next week. (I hope there is no urgency in this, since the commitfest is in July; or is it urgent?) BTW, I fixed a typo in the index specification of the original performance test (both indexes were for the same table). Best regards, Andrey Borodin. perf2.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportable implementation of background worker start
Attached is a lightly-tested draft patch that converts the postmaster to use a WaitEventSet for waiting in ServerLoop. I've got mixed emotions about whether this is the direction to proceed, though. It adds at least a couple of kernel calls per postmaster signal delivery, and probably to every postmaster connection acceptance (ServerLoop iteration), to fix problems that are so corner-casey that we've never even known we had them till now. I'm looking longingly at pselect(2), ppoll(2), and epoll_pwait(2), which would solve the problem without need for a self-pipe. But the latter two are Linux-only, and while pselect does exist in recent POSIX editions, it's nonetheless got portability issues. Googling suggests that on a number of platforms, pselect is non-atomic, ie it's nothing but a wrapper for sigprocmask/select/sigprocmask, which would mean that the race condition I described in my previous message still exists. Despite that, a pretty attractive proposal is to do, essentially,

#ifdef HAVE_PSELECT
    selres = pselect(nSockets, &rmask, NULL, NULL, &ts, &UnBlockSig);
#else
    PG_SETMASK(&UnBlockSig);
    selres = select(nSockets, &rmask, NULL, NULL, &timeout);
    PG_SETMASK(&BlockSig);
#endif

This fixes the race on platforms where pselect exists and is correctly implemented, and we're no worse off than before where that's not true. The other component of the problem is the possibility that select() will restart if the signal is marked SA_RESTART. (Presumably that would apply to pselect too.) I am thinking that maybe the answer is "if it hurts, don't do it" --- that is, in the postmaster maybe we shouldn't use SA_RESTART, at least not for these signals. A different line of thought is to try to provide a bulletproof solution, but push the implementation problems down into latch.c --- that is, the goal would be to provide a pselect-like variant of WaitEventSetWait that is guaranteed to return if interrupted, as opposed to the current behavior where it's guaranteed not to. But that seems like quite a bit of work. 
Whether or not we decide to change over the postmaster.c code, I think it'd likely be a good idea to apply most or all of the attached changes in latch.c. Setting CLOEXEC on the relevant FDs is clearly a good thing, and the other changes will provide some safety if some preloaded extension decides to create a latch in the postmaster process. regards, tom lane diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 6831342..658ba73 100644 *** a/src/backend/postmaster/postmaster.c --- b/src/backend/postmaster/postmaster.c *** *** 79,88 #include #include - #ifdef HAVE_SYS_SELECT_H - #include - #endif - #ifdef USE_BONJOUR #include #endif --- 79,84 *** static bool LoadedSSL = false; *** 379,384 --- 375,386 static DNSServiceRef bonjour_sdref = NULL; #endif + /* WaitEventSet that ServerLoop uses to wait for connections */ + static WaitEventSet *ServerWaitSet = NULL; + + /* Latch used to ensure that signals interrupt the wait in ServerLoop */ + static Latch PostmasterLatch; + /* * postmaster.c - function prototypes */ *** static int ServerLoop(void); *** 409,415 static int BackendStartup(Port *port); static int ProcessStartupPacket(Port *port, bool SSLdone); static void processCancelRequest(Port *port, void *pkt); ! static int initMasks(fd_set *rmask); static void report_fork_failure_to_client(Port *port, int errnum); static CAC_state canAcceptConnections(void); static bool RandomCancelKey(int32 *cancel_key); --- 411,417 static int BackendStartup(Port *port); static int ProcessStartupPacket(Port *port, bool SSLdone); static void processCancelRequest(Port *port, void *pkt); ! static void initServerWaitSet(void); static void report_fork_failure_to_client(Port *port, int errnum); static CAC_state canAcceptConnections(void); static bool RandomCancelKey(int32 *cancel_key); *** checkDataDir(void) *** 1535,1541 } /* ! * Determine how long should we let ServerLoop sleep. 
* * In normal conditions we wait at most one minute, to ensure that the other * background tasks handled by ServerLoop get done even when no requests are --- 1537,1543 } /* ! * Determine how long should we let ServerLoop sleep (in msec). * * In normal conditions we wait at most one minute, to ensure that the other * background tasks handled by ServerLoop get done even when no requests are *** checkDataDir(void) *** 1543,1551 * we don't actually sleep so that they are quickly serviced. Other exception * cases are as shown in the code. */ ! static void ! DetermineSleepTime(struct timeval * timeout) { TimestampTz next_wakeup = 0; /* --- 1545,1554 * we don't actually sleep so that they are quickly serviced. Other exception * cases are as shown in the code. */ ! static long !
Re: [HACKERS] Unportable implementation of background worker start
Tom Lane wrote: > After sleeping and thinking more, I've realized that the > slow-bgworker-start issue actually exists on *every* platform, it's just > harder to hit when select() is interruptible. But consider the case > where multiple bgworker-start requests arrive while ServerLoop is > actively executing (perhaps because a connection request just came in). > The postmaster has signals blocked, so nothing happens for the moment. > When we go around the loop and reach > > PG_SETMASK(&UnBlockSig); > > the pending SIGUSR1 is delivered, and sigusr1_handler reads all the > bgworker start requests, and services just one of them. Then control > returns and proceeds to > > selres = select(nSockets, &rmask, NULL, NULL, &timeout); > > But now there's no interrupt pending. So the remaining start requests > do not get serviced until (a) some other postmaster interrupt arrives, > or (b) the one-minute timeout elapses. They could be waiting awhile. > > Bottom line is that any request for more than one bgworker at a time > faces a non-negligible risk of suffering serious latency. Interesting. It's hard to hit, for sure. > I'm coming back to the idea that at least in the back branches, the > thing to do is allow maybe_start_bgworker to start multiple workers. > > Is there any actual evidence for the claim that that might have > bad side effects? Well, I ran tests with a few tens of thousands of sample workers and the neglect for other things (such as connection requests) was visible, but that's probably not a scenario many servers run often currently. I don't strongly object to the idea of removing the "return" in older branches, since it's evidently a problem. However, as bgworkers start to be used more, I think we should definitely have some protection. In a system with a large number of workers available for parallel queries, it seems possible for a high-velocity server to get stuck in the loop for some time. (I haven't actually verified this, though. 
My experiments were with the early kind, static bgworkers.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportable implementation of background worker start
Andres Freund writes: > On 2017-04-20 00:50:13 -0400, Tom Lane wrote: >> My first reaction was that that sounded like a lot more work than removing >> two lines from maybe_start_bgworker and adjusting some comments. But on >> closer inspection, the slow-bgworker-start issue isn't the only problem >> here. > FWIW, I vaguely remember somewhat related issues on x86/linux too. After sleeping and thinking more, I've realized that the slow-bgworker-start issue actually exists on *every* platform, it's just harder to hit when select() is interruptible. But consider the case where multiple bgworker-start requests arrive while ServerLoop is actively executing (perhaps because a connection request just came in). The postmaster has signals blocked, so nothing happens for the moment. When we go around the loop and reach PG_SETMASK(&UnBlockSig); the pending SIGUSR1 is delivered, and sigusr1_handler reads all the bgworker start requests, and services just one of them. Then control returns and proceeds to selres = select(nSockets, &rmask, NULL, NULL, &timeout); But now there's no interrupt pending. So the remaining start requests do not get serviced until (a) some other postmaster interrupt arrives, or (b) the one-minute timeout elapses. They could be waiting awhile. Bottom line is that any request for more than one bgworker at a time faces a non-negligible risk of suffering serious latency. I'm coming back to the idea that at least in the back branches, the thing to do is allow maybe_start_bgworker to start multiple workers. Is there any actual evidence for the claim that that might have bad side effects? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Old versions of Test::More
Andrew Dunstan writes: > As we discovered yesterday via jacana, very old versions of Test::More > don't support the note() function. It therefore seems prudent to specify > a minimum version number for the module in those scripts that use the > function. According to the changelog, version 0.82 (released in Oct > 2008) should be sufficient. So I suggest the attached patch. Maybe it'd be better to put the minimum-version check somewhere central, like PostgresNode.pm? I suppose that our use of "note" will expand, and I can't see us remembering to put this into all and only the specific tests that use "note". Especially if there's no comment about it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Old versions of Test::More
On 04/21/2017 10:36 AM, Daniel Gustafsson wrote: >> On 21 Apr 2017, at 15:08, Andrew Dunstan >> wrote: >> >> As we discovered yesterday via jacana, very old versions of Test::More >> don't support the note() function. It therefore seems prudent to specify >> a minimum version number for the module in those scripts that use the >> function. According to the changelog, version 0.82 (released in Oct >> 2008) should be sufficient. So I suggest the attached patch. > +1 for specifying version (0.82 was replaced with 0.84 within hours but it > doesn’t really matter for this). However, since src/test/ssl/ServerSetup.pm > also invokes note() doesn’t it make sense to add the version requirement there > as well as a guard in case a test script using ServerSetup isn’t explicitly > specifying the Test::More version? > Yeah, good catch. Will do. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Old versions of Test::More
> On 21 Apr 2017, at 15:08, Andrew Dunstan > wrote: > > As we discovered yesterday via jacana, very old versions of Test::More > don't support the note() function. It therefore seems prudent to specify > a minimum version number for the module in those scripts that use the > function. According to the changelog, version 0.82 (released in Oct > 2008) should be sufficient. So I suggest the attached patch. +1 for specifying version (0.82 was replaced with 0.84 within hours but it doesn’t really matter for this). However, since src/test/ssl/ServerSetup.pm also invokes note() doesn’t it make sense to add the version requirement there as well as a guard in case a test script using ServerSetup isn’t explicitly specifying the Test::More version? cheers ./daniel test-more-version-v2.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On-disk format of SCRAM verifiers
On 21 April 2017 at 14:42, Heikki Linnakangas wrote: SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey> >>> >>> Could you explain where you are looking? I don't see that in RFC5803 >> > >From 1. Overview: > > Yeah, it's not easy to see, I missed it earlier too. You have to look at RFC > 5803 and RFC 3112 together. RFC 3112 says that the overall format is > "<scheme>$<authInfo>$<authValue>", and RFC5803 says that for SCRAM, scheme is > "SCRAM-SHA-256" (for our variant), authInfo is "<iteration count>:<salt>" and > authValue is "<StoredKey>:<ServerKey>" > > They really should've included examples in those RFCs. Thanks +1 for change -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?
On 21/04/17 16:23, Peter Eisentraut wrote: > On 4/21/17 10:11, Petr Jelinek wrote: >> On 21/04/17 16:09, Peter Eisentraut wrote: >>> On 4/20/17 14:29, Petr Jelinek wrote: + /* Find unused worker slot. */ + if (!w->in_use) { - worker = &LogicalRepCtx->workers[slot]; - break; + worker = w; + slot = i; + } >>> >>> Doesn't this still need a break? Otherwise it always picks the last slot. >>> >> >> Yes it will pick the last slot, does that matter though, is the first >> one better somehow? >> >> We can't break because we also need to continue the counter (I think the >> issue that the counter solves is probably just theoretical, but still). > > I see. I think the code would be less confusing if we break the loop > like before and call logicalrep_sync_worker_count() separately. > >> Hmm actually, maybe the if (!w->in_use) should be if (worker == NULL && >> !w->in_use)? > > That would also do it. But it's getting a bit fiddly. > I just wanted to avoid looping twice, especially since the garbage collecting code has to also do the loop. I guess I'll go with my original coding for this then which was to put retry label above the loop first, then try finding worker slot, if found call the logicalrep_sync_worker_count and if not found do the garbage collection and if we cleaned up something then goto retry. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
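[Editor's note] For readers following along, the control flow Petr describes (a retry label above the slot search, the per-subscription sync-worker count folded into the same loop, and garbage collection plus goto on failure) can be sketched roughly as below. All names and the garbage-collection stub are illustrative stand-ins, not the actual launcher.c code:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 4

typedef struct Worker
{
    bool        in_use;
    int         subid;          /* owning subscription */
} Worker;

static Worker workers[MAX_WORKERS];
static bool have_dead_worker;   /* stand-in for "a worker exited without cleanup" */

/* Illustrative stand-in: reclaim slots whose process died; true if any freed. */
static bool
garbage_collect_workers(void)
{
    if (!have_dead_worker)
        return false;
    workers[0].in_use = false;  /* pretend slot 0's process had died */
    have_dead_worker = false;
    return true;
}

static Worker *
worker_launch(int subid, int max_sync_workers_per_subscription)
{
    Worker     *worker;
    int         nsyncworkers;

retry:
    worker = NULL;
    nsyncworkers = 0;
    for (int i = 0; i < MAX_WORKERS; i++)
    {
        Worker     *w = &workers[i];

        /* remember the first free slot, but keep scanning to finish the count */
        if (!w->in_use && worker == NULL)
            worker = w;
        if (w->in_use && w->subid == subid)
            nsyncworkers++;
    }

    if (worker == NULL)
    {
        /* no free slot: reclaim dead workers and retry if that freed one */
        if (garbage_collect_workers())
            goto retry;
        return NULL;
    }

    if (nsyncworkers >= max_sync_workers_per_subscription)
        return NULL;            /* subscription already at its limit */

    worker->in_use = true;
    worker->subid = subid;
    return worker;
}
```

The `worker == NULL` guard is the fix the thread converges on: without it (or a break) the loop keeps overwriting `worker` and hands back the last free slot rather than the first, while breaking early would abandon the counting.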
Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?
On 4/21/17 10:11, Petr Jelinek wrote: > On 21/04/17 16:09, Peter Eisentraut wrote: >> On 4/20/17 14:29, Petr Jelinek wrote: >>> + /* Find unused worker slot. */ >>> + if (!w->in_use) >>> { >>> - worker = &LogicalRepCtx->workers[slot]; >>> - break; >>> + worker = w; >>> + slot = i; >>> + } >> >> Doesn't this still need a break? Otherwise it always picks the last slot. >> > > Yes it will pick the last slot, does that matter though, is the first > one better somehow? > > We can't break because we also need to continue the counter (I think the > issue that the counter solves is probably just theoretical, but still). I see. I think the code would be less confusing if we break the loop like before and call logicalrep_sync_worker_count() separately. > Hmm actually, maybe the if (!w->in_use) should be if (worker == NULL && > !w->in_use)? That would also do it. But it's getting a bit fiddly. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interval for launching the table sync worker
On Fri, Apr 21, 2017 at 5:33 PM, Kyotaro HORIGUCHI wrote: > Hello, > > At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada > wrote in
Re: [HACKERS] subscription worker doesn't start immediately on eabled
On 4/6/17 08:24, Kyotaro HORIGUCHI wrote: > Hello. I found dubious behavior while playing with logical > replication. > > When we disable a subscription, replication worker immediately stops. > > =# ALTER SUBSCRIPTION s1 DISABLE; > > On the other hand even if we enable a subscription, worker > doesn't start immediately. It takes 3 minutes in the worst > case. (DEFAULT_NAPTIME_PER_CYCLE) > > The attached patch wakes up launcher when a subscription is > enabled. This fails when a subscription is enabled immediately > after disabling but it won't be a matter. What do you mean by "this fails"? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?
On 21/04/17 16:09, Peter Eisentraut wrote: > On 4/20/17 14:29, Petr Jelinek wrote: >> +/* Find unused worker slot. */ >> +if (!w->in_use) >> { >> -worker = &LogicalRepCtx->workers[slot]; >> -break; >> +worker = w; >> +slot = i; >> +} > > Doesn't this still need a break? Otherwise it always picks the last slot. > Yes it will pick the last slot, does that matter though, is the first one better somehow? We can't break because we also need to continue the counter (I think the issue that the counter solves is probably just theoretical, but still). Hmm actually, maybe the if (!w->in_use) should be if (worker == NULL && !w->in_use)? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?
On 4/20/17 22:24, Noah Misch wrote: > On Mon, Apr 17, 2017 at 02:09:54PM -0400, Peter Eisentraut wrote: >> I think we're not really sure yet what to do about this. Discussion is >> ongoing. I'll report back on Wednesday. > > This PostgreSQL 10 open item is past due for your status update, and this is > overall the seventh time you have allowed one of your open items to go out > of compliance. Kindly start complying with the policy on open item ownership: > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com We are actively working on it. It should be resolved within a few days. Next check-in Monday. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?
On 4/20/17 14:29, Petr Jelinek wrote: > + /* Find unused worker slot. */ > + if (!w->in_use) > { > - worker = &LogicalRepCtx->workers[slot]; > - break; > + worker = w; > + slot = i; > + } Doesn't this still need a break? Otherwise it always picks the last slot. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interval for launching the table sync worker
On 21/04/17 10:33, Kyotaro HORIGUCHI wrote: > Hello, > > At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada > wrote in
Re: [HACKERS] Interval for launching the table sync worker
On 21/04/17 04:38, Masahiko Sawada wrote: > On Thu, Apr 20, 2017 at 8:43 PM, Petr Jelinek > wrote: >> On 20/04/17 06:21, Masahiko Sawada wrote: >>> On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek >>> wrote: On 19/04/17 15:57, Masahiko Sawada wrote: > On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek > wrote: >> On 19/04/17 14:42, Masahiko Sawada wrote: >>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI >>> wrote: At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek wrote in > On 18/04/17 18:14, Peter Eisentraut wrote: >> On 4/18/17 11:59, Petr Jelinek wrote: >>> Hmm if we create hashtable for this, I'd say create hashtable for >>> the >>> whole table_states then. The reason why it's list now was that it >>> seemed >>> unnecessary to have hashtable when it will be empty almost always >>> but >>> there is no need to have both hashtable + list IMHO. I understand that but I also don't like the frequent palloc/pfree in long-lasting context and double loop like Peter. >> The difference is that we blow away the list of states when the >> catalog >> changes, but we keep the hash table with the start times around. We >> need two things with different life times. On the other hand, hash seems overdone. Addition to that, the hash-version leaks stale entries while subscriptions are modified. But vacuuming them costs high. > Why can't we just update the hashtable based on the catalog? I mean > once > the record is not needed in the list, the table has been synced so > there > is no need for the timestamp either since we'll not try to start the > worker again. >>> >>> I guess the table sync worker for the same table could need to be >>> started again. For example, please imagine a case where the table >>> belonging to the publication is removed from it and the corresponding >>> subscription is refreshed, and then the table is added to it again. 
We >>> have the record of the table with timestamp in the hash table when the >>> table sync in the first time, but the table sync after refreshed could >>> have to wait for the interval. >>> >> >> But why do we want to wait in such case where user has explicitly >> requested refresh? >> > > Yeah, sorry, I meant that we don't want to wait but cannot launch the > tablesync worker in such case. > > But after more thought, the latest patch Peter proposed has the same > problem. Perhaps we need to scan always whole pg_subscription_rel and > remove the entry if the corresponding table get synced. > Yes that's what I mean by "Why can't we just update the hashtable based on the catalog". And if we do that then I don't understand why do we need both hastable and linked list if we need to update both based on catalog reads anyway. >>> >>> Thanks, I've now understood correctly. Yes, I think you're right. If >>> we update the hash table based on the catalog whenever table state is >>> invalidated, we don't need to have both hash table and list. >>> >>> BTW, in current HEAD the SUBREL_STATE_SYNCWAIT is not stored in the >>> pg_subscription_catalog. So the following condition seems not correct. >>> We should use "syncworker->relstate == SUBSCRIPTION_STATE_SYNCWAIT" >>> instead? >>> >>> /* >>> * There is a worker synchronizing the relation and waiting for >>> * apply to do something. >>> */ >>> if (syncworker && rstate->state == SUBREL_STATE_SYNCWAIT) >>> { >>> /* >>> * There are three possible synchronization situations here. >>> * >>> * a) Apply is in front of the table sync: We tell the table >>> *sync to CATCHUP. >>> * >>> * b) Apply is behind the table sync: We tell the table sync >>> *to mark the table as SYNCDONE and finish. >>> >>> * c) Apply and table sync are at the same position: We tell >>> *table sync to mark the table as READY and finish. >>> * >>> * In any case we'll need to wait for table sync to change >>> * the state in catalog and only then continue ourselves. 
>>> */ >>> >> >> Good catch. Although it's not comment that's wrong, it's the if. We >> should not compare rstate->state but syncworker->relstate. > > I've attached a patch to fix this bug. >
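[Editor's note] The per-relation last-start-time bookkeeping debated in the thread above reduces to a small amount of logic; the open question is only where that state lives (list vs. hash table) and when it is pruned. Here is a hedged sketch of the core check, with illustrative names and a flat array in place of the real catalog-backed structures:

```c
#include <stdbool.h>

#define SYNC_RETRY_INTERVAL 5   /* seconds; stand-in for the real retry interval */
#define MAX_TRACKED_RELS    32

typedef struct SyncStartTime
{
    unsigned    relid;
    long        last_start;     /* when we last launched a sync worker for it */
} SyncStartTime;

static SyncStartTime start_times[MAX_TRACKED_RELS];
static int  n_tracked;

/*
 * Decide whether a tablesync worker for this relation may be launched now,
 * recording the launch time when it may.  A real implementation would key a
 * hash table by relid and drop entries once the table reaches SYNCDONE or
 * disappears from pg_subscription_rel, which is exactly the pruning problem
 * discussed above.
 */
static bool
sync_worker_can_start(unsigned relid, long now)
{
    for (int i = 0; i < n_tracked; i++)
    {
        if (start_times[i].relid == relid)
        {
            if (now - start_times[i].last_start < SYNC_RETRY_INTERVAL)
                return false;   /* launched too recently; wait */
            start_times[i].last_start = now;
            return true;
        }
    }
    /* first attempt for this relation */
    start_times[n_tracked].relid = relid;
    start_times[n_tracked].last_start = now;
    n_tracked++;
    return true;
}
```

The refresh scenario Sawada raises maps onto this directly: if a stale entry survives a SUBSCRIPTION ... REFRESH, a re-added table hits the "launched too recently" branch even though the user explicitly asked for it, which is why the entries must be updated from the catalog.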
Re: [HACKERS] Triggers and logical replication (10devel)
On 21.04.2017 16:29, Merlin Moncure wrote: On Fri, Apr 21, 2017 at 5:08 AM, Egor Rogov wrote: Hello, It seems that triggers don't fire on subscriber's tables during logical replication. Is it a bug? Reading the documentation (which is TBH a bit hard to follow) it appears that it is expected behavior. https://www.postgresql.org/docs/devel/static/sql-altertable.html states: "The trigger firing mechanism is also affected by the configuration variable session_replication_role. Simply enabled triggers will fire when the replication role is “origin” (the default) or “local”." Ah, ALTER TABLE ENABLE REPLICA TRIGGER is what I missed. Thanks! Regards, Egor Rogov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Moin, On Fri, April 21, 2017 7:04 am, David Rowley wrote: > On 21 April 2017 at 22:30, Tels wrote: >> I'd rather see: >> >> CREATE STATISTICS stats_name ON table(col); >> >> as this both mirrors CREATE INDEX and foreign keys with REFERENCES. It >> could also be extended to both more columns, expressions or other tables >> like so: >> >> CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b); >> >> and even: >> >> CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b) WITH (options) >> WHERE t2.a > 4; > > How would you express a join condition with that syntax? > >> This looks easy to remember, since it compares to: >> >> CREATE INDEX idx_name ON t2(a,b) WITH (options) WHERE t2.a > 4; >> >> Or am I missing something? > > Sadly yes, you are, and it's not the first time. > > I seem to remember mentioning this to you already in [1]. > > Please, can you read over [2], it mentions exactly what you're > proposing and why it's not any good. > > [1] > https://www.postgresql.org/message-id/cakjs1f9hmet+7adiceau8heompob5pkkcvyzliezje3dvut...@mail.gmail.com > [2] > https://www.postgresql.org/message-id/CAEZATCUtGR+U5+QTwjHhe9rLG2nguEysHQ5NaqcK=vbj78v...@mail.gmail.com Ah, ok, thank you, somehow I missed your answer the first time. So, just ignore me :) Best wishes, Tels -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On-disk format of SCRAM verifiers
Michael Paquier writes: > On Fri, Apr 21, 2017 at 10:02 PM, Simon Riggs wrote: >> On 21 April 2017 at 10:20, Heikki Linnakangas wrote: >>> But looking more closely, I think I misunderstood RFC 5803. It *does* in >>> fact specify a single string format to store the verifier in. And the format >>> looks like: >>> >>> SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey> >> >> Could you explain where you are looking? I don't see that in RFC5803 > > From 1. Overview: > >Syntax of the attribute can be expressed using ABNF [RFC5234]. Non- >terminal references in the following ABNF are defined in either >[AUTHPASS], [RFC4422], or [RFC5234]. > >scram-mech = "SCRAM-SHA-1" / scram-mech-ext > ; Complies with ABNF for <scheme> > ; defined in [AUTHPASS]. > >scram-authInfo = iter-count ":" salt > ; Complies with ABNF for <authInfo> > ; defined in [AUTHPASS]. > >scram-authValue = stored-key ":" server-key > ; Complies with ABNF for <authValue> > ; defined in [AUTHPASS]. And scram-mech, scram-authInfo and scram-authValue are used as the "scheme", "authInfo" and "authValue" parts as specified in [AUTHPASS] (RFC3112): authPasswordValue = w scheme s authInfo s authValue w scheme = %x30-39 / %x41-5A / %x2D-2F / %x5F ; 0-9, A-Z, "-", ".", "/", or "_" authInfo = schemeSpecificValue authValue = schemeSpecificValue schemeSpecificValue = *( %x21-23 / %x25-7E ) ; printable ASCII less "$" and " " s = w SEP w w = *SP SEP = %x24 ; "$" SP = %x20 ; " " (space) > Thanks, > -- > Michael - ilmari -- "A disappointingly low fraction of the human race is, at any given time, on fire." - Stig Sandbeck Mathisen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On-disk format of SCRAM verifiers
On 21 April 2017 16:20:56 EEST, Michael Paquier wrote: >On Fri, Apr 21, 2017 at 10:02 PM, Simon Riggs > wrote: >> On 21 April 2017 at 10:20, Heikki Linnakangas > wrote: >>> But looking more closely, I think I misunderstood RFC 5803. It >*does* in >>> fact specify a single string format to store the verifier in. And >the format >>> looks like: >>> >>> SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey> >> >> Could you explain where you are looking? I don't see that in RFC5803 > >From 1. Overview: Yeah, it's not easy to see, I missed it earlier too. You have to look at RFC 5803 and RFC 3112 together. RFC 3112 says that the overall format is "<scheme>$<authInfo>$<authValue>", and RFC5803 says that for SCRAM, scheme is "SCRAM-SHA-256" (for our variant), authInfo is "<iteration count>:<salt>" and authValue is "<StoredKey>:<ServerKey>" They really should've included examples in those RFCs. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On-disk format of SCRAM verifiers
On 21 April 2017 at 14:20, Michael Paquier wrote: > On Fri, Apr 21, 2017 at 10:02 PM, Simon Riggs wrote: >> On 21 April 2017 at 10:20, Heikki Linnakangas wrote: >>> But looking more closely, I think I misunderstood RFC 5803. It *does* in >>> fact specify a single string format to store the verifier in. And the format >>> looks like: >>> >>> SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey> >> >> Could you explain where you are looking? I don't see that in RFC5803 > > From 1. Overview: > >Syntax of the attribute can be expressed using ABNF [RFC5234]. Non- >terminal references in the following ABNF are defined in either >[AUTHPASS], [RFC4422], or [RFC5234]. > >scram-mech = "SCRAM-SHA-1" / scram-mech-ext > ; Complies with ABNF for <scheme> > ; defined in [AUTHPASS]. > >scram-authInfo = iter-count ":" salt > ; Complies with ABNF for <authInfo> > ; defined in [AUTHPASS]. > >scram-authValue = stored-key ":" server-key > ; Complies with ABNF for <authValue> > ; defined in [AUTHPASS]. > > Thanks, The above text, which I've already read, does not explain the suggested change from : to $. Could you explain? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Triggers and logical replication (10devel)
On Fri, Apr 21, 2017 at 5:08 AM, Egor Rogov wrote: > Hello, > It seems that triggers don't fire on subscriber's tables during logical > replication. Is it a bug? Reading the documentation (which is TBH a bit hard to follow) it appears that it is expected behavior. https://www.postgresql.org/docs/devel/static/logical-replication-architecture.html#logical-replication-snapshot states: "The apply process on the subscriber database always runs with session_replication_role set to replica, which produces the usual effects on triggers and constraints." https://www.postgresql.org/docs/devel/static/sql-altertable.html states: "The trigger firing mechanism is also affected by the configuration variable session_replication_role. Simply enabled triggers will fire when the replication role is “origin” (the default) or “local”." merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On-disk format of SCRAM verifiers
On Fri, Apr 21, 2017 at 9:25 PM, Stephen Frost wrote: > * Heikki Linnakangas (hlinn...@iki.fi) wrote: >> I think we should adopt that exact format, so that our verifiers are >> compatible with RFC 5803. It doesn't make any immediate difference, >> but since there is a standard out there, might as well follow it. > > +1 > >> And just in case we get support for looking up SCRAM verifiers from >> an LDAP server in the future, it will come handy as we won't need to >> parse two different formats. > > Agreed. +1 to all that. Consistency is a good thing. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On-disk format of SCRAM verifiers
On Fri, Apr 21, 2017 at 10:02 PM, Simon Riggs wrote: > On 21 April 2017 at 10:20, Heikki Linnakangas wrote: >> But looking more closely, I think I misunderstood RFC 5803. It *does* in >> fact specify a single string format to store the verifier in. And the format >> looks like: >> >> SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey> > > Could you explain where you are looking? I don't see that in RFC5803 >From 1. Overview: Syntax of the attribute can be expressed using ABNF [RFC5234]. Non- terminal references in the following ABNF are defined in either [AUTHPASS], [RFC4422], or [RFC5234]. scram-mech = "SCRAM-SHA-1" / scram-mech-ext ; Complies with ABNF for <scheme> ; defined in [AUTHPASS]. scram-authInfo = iter-count ":" salt ; Complies with ABNF for <authInfo> ; defined in [AUTHPASS]. scram-authValue = stored-key ":" server-key ; Complies with ABNF for <authValue> ; defined in [AUTHPASS]. Thanks, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Old versions of Test::More
As we discovered yesterday via jacana, very old versions of Test::More don't support the note() function. It therefore seems prudent to specify a minimum version number for the module in those scripts that use the function. According to the changelog, version 0.82 (released in Oct 2008) should be sufficient. So I suggest the attached patch. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl index ccd5943..29e648e 100644 --- a/src/test/recovery/t/001_stream_rep.pl +++ b/src/test/recovery/t/001_stream_rep.pl @@ -3,7 +3,7 @@ use strict; use warnings; use PostgresNode; use TestLib; -use Test::More tests => 28; +use Test::More 0.82 tests => 28; # Initialize master node my $node_master = get_new_node('master'); diff --git a/src/test/recovery/t/010_logical_decoding_timelines.pl b/src/test/recovery/t/010_logical_decoding_timelines.pl index cdddb4d..bd353ca 100644 --- a/src/test/recovery/t/010_logical_decoding_timelines.pl +++ b/src/test/recovery/t/010_logical_decoding_timelines.pl @@ -23,7 +23,7 @@ use warnings; use PostgresNode; use TestLib; -use Test::More tests => 13; +use Test::More 0.82 tests => 13; use RecursiveCopy; use File::Copy; use IPC::Run (); diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl index 0c69bf0..4bd905d 100644 --- a/src/test/ssl/t/001_ssltests.pl +++ b/src/test/ssl/t/001_ssltests.pl @@ -2,7 +2,7 @@ use strict; use warnings; use PostgresNode; use TestLib; -use Test::More tests => 40; +use Test::More 0.82 tests => 40; use ServerSetup; use File::Copy; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On-disk format of SCRAM verifiers
On 21 April 2017 at 10:20, Heikki Linnakangas wrote: > But looking more closely, I think I misunderstood RFC 5803. It *does* in > fact specify a single string format to store the verifier in. And the format > looks like: > > SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey> Could you explain where you are looking? I don't see that in RFC5803 -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DROP SUBSCRIPTION, query cancellations and slot handling
On 4/20/17 22:57, Michael Paquier wrote: > On Fri, Apr 21, 2017 at 11:34 AM, Petr Jelinek > wrote: >> On 20/04/17 23:30, Peter Eisentraut wrote: >>> On 4/20/17 10:19, Petr Jelinek wrote: Hmm well since this only affects the synchronization of table states/names, I guess we could just simply do that before we create the slot as there is no expectancy of consistency between slot and the table list snapshot. >>> >>> I suppose that wouldn't hurt. >>> >>> Prior to the table sync patch, a missing target relation would just show >>> up as an error later on in the logs. So having the error sooner >>> actually seems like a good change. >>> >> >> Very simple patch to make. > > +1 for that. Committed that. I don't think there is anything else open here. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dtrace probes
On 04/20/2017 10:30 AM, Jesper Pedersen wrote: I think this fix is harmless and has some value in terms of consistency. One minor suggestion is that you should leave a space after typecasting. - TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int)mode); There should be a space like "(int) mode". v2 attached. I managed to attach the same patch again, so here is v3. Best regards, Jesper From 0d964df84950ca90c08ed6dd77a575d4b70ea7db Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Tue, 18 Apr 2017 11:44:18 -0400 Subject: [PATCH] Explicit cast LWLockMode, which is an enum, to int in order to match the dtrace definition of the lwlock methods. Thereby all call sites will have the same definition instead of a mix between signed and unsigned. Author: Jesper Pedersen --- src/backend/storage/lmgr/lwlock.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 3e13394..c551be2 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -1226,7 +1226,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode) #endif LWLockReportWaitStart(lock); - TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), (int) mode); for (;;) { @@ -1248,7 +1248,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode) } #endif - TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int) mode); LWLockReportWaitEnd(); LOG_LWDEBUG("LWLockAcquire", lock, "awakened"); @@ -1257,7 +1257,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode) result = false; } - TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), (int) mode); /* Add lock to list of locks held by this backend */ held_lwlocks[num_held_lwlocks].lock = lock; @@ -1308,14 +1308,14 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode) 
RESUME_INTERRUPTS(); LOG_LWDEBUG("LWLockConditionalAcquire", lock, "failed"); - TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), (int) mode); } else { /* Add lock to list of locks held by this backend */ held_lwlocks[num_held_lwlocks].lock = lock; held_lwlocks[num_held_lwlocks++].mode = mode; - TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), (int) mode); } return !mustwait; } @@ -1387,7 +1387,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) #endif LWLockReportWaitStart(lock); - TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), (int) mode); for (;;) { @@ -1405,7 +1405,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) Assert(nwaiters < MAX_BACKENDS); } #endif - TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int) mode); LWLockReportWaitEnd(); LOG_LWDEBUG("LWLockAcquireOrWait", lock, "awakened"); @@ -1435,7 +1435,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) /* Failed to get lock, so release interrupt holdoff */ RESUME_INTERRUPTS(); LOG_LWDEBUG("LWLockAcquireOrWait", lock, "failed"); - TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), (int) mode); } else { @@ -1443,7 +1443,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) /* Add lock to list of locks held by this backend */ held_lwlocks[num_held_lwlocks].lock = lock; held_lwlocks[num_held_lwlocks++].mode = mode; - TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode); + TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), (int) mode); } return !mustwait; -- 2.7.4 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On-disk format of SCRAM verifiers
Heikki, * Heikki Linnakangas (hlinn...@iki.fi) wrote: > I think we should adopt that exact format, so that our verifiers are > compatible with RFC 5803. It doesn't make any immediate difference, > but since there is a standard out there, might as well follow it. +1 > And just in case we get support for looking up SCRAM verifiers from > an LDAP server in the future, it will come handy as we won't need to > parse two different formats. Agreed. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 21 April 2017 at 22:30, Tels wrote:
> I'd rather see:
>
>  CREATE STATISTICS stats_name ON table(col);
>
> as this mirrors both CREATE INDEX and foreign keys with REFERENCES. It
> could also be extended to more columns, expressions, or other tables,
> like so:
>
>  CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b);
>
> and even:
>
>  CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b) WITH (options)
>  WHERE t2.a > 4;

How would you express a join condition with that syntax?

> This looks easy to remember, since it compares to:
>
>  CREATE INDEX idx_name ON t2(a,b) WITH (options) WHERE t2.a > 4;
>
> Or am I missing something?

Sadly yes, you are, and it's not the first time. I seem to remember
mentioning this to you already in [1].

Please, can you read over [2]? It mentions exactly what you're proposing
and why it's not any good.

[1] https://www.postgresql.org/message-id/cakjs1f9hmet+7adiceau8heompob5pkkcvyzliezje3dvut...@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAEZATCUtGR+U5+QTwjHhe9rLG2nguEysHQ5NaqcK=vbj78v...@mail.gmail.com

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WITH clause in CREATE STATISTICS
Moin,

On Thu, April 20, 2017 8:21 pm, Tomas Vondra wrote:
> On 04/21/2017 12:13 AM, Tom Lane wrote:
>> Alvaro Herrera writes:
>>> Simon just pointed out that having the WITH clause appear in the middle
>>> of the CREATE STATISTICS command looks odd; apparently somebody else
>>> already complained on list about the same. Other commands put the WITH
>>> clause at the end, so perhaps we should do likewise in the new command.
>>
>>> Here's a patch to implement that. I verified that if I change
>>> qualified_name to qualified_name_list, bison does not complain about
>>> conflicts, so this new syntax should support extension to multiple
>>> relations without a problem.
>>
>> Yeah, WITH is fully reserved, so as long as the clause looks like
>> WITH ( stuff... ) you're pretty much gonna be able to drop it
>> wherever you want.
>>
>>> Discuss.
>>
>> +1 for WITH at the end; the existing syntax looks weird to me too.
>
> -1 from me
>
> I like the current syntax more, and WHERE ... WITH seems a bit weird to
> me. But more importantly, one thing Dean probably considered when
> proposing the current syntax was that we may add support for partial
> statistics, pretty much like partial indexes. And we don't allow WITH at
> the end (after WHERE) for indexes:
>
> test=# create index on t (a) where a < 100 with (fillfactor=10);
> ERROR:  syntax error at or near "with"
> LINE 1: create index on t (a) where a < 100 with (fillfactor=10);
>                                             ^
> test=# create index on t (a) with (fillfactor=10) where a < 100;

While I'm not sure about where to put the WITH, so to speak, I do favour
consistency. So I'm inclined to keep the syntax like for CREATE INDEX.

More importantly, however, I'd rather see the "ON (columns) FROM relname"
part of the syntax changed to match the existing examples.

(I already wrote this to Simon; not sure if my email made it to the list.)

So instead of:

 CREATE STATISTICS stats_name ON (columns) FROM relname

I'd rather see:

 CREATE STATISTICS stats_name ON table(col);

as this mirrors both CREATE INDEX and foreign keys with REFERENCES. It
could also be extended to more columns, expressions, or other tables,
like so:

 CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b);

and even:

 CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b) WITH (options)
 WHERE t2.a > 4;

This looks easy to remember, since it compares to:

 CREATE INDEX idx_name ON t2(a,b) WITH (options) WHERE t2.a > 4;

Or am I missing something?

Regards,

Tels
[HACKERS] Why is get_cheapest_parallel_safe_total_inner() in pathkeys.c?
This probably ended up here because there's a bunch of other functions
named get_cheapest* in that file, but all of those relate to getting a
path for specific PathKeys. get_cheapest_parallel_safe_total_inner()
does not do that. Maybe allpaths.c is a better home for it?

It seems to have been added in [1].

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a71f10189dc10a2fe422158a2c9409e0f77c6b9e

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Triggers and logical replication (10devel)
Hello,

It seems that triggers don't fire on the subscriber's tables during
logical replication. Is it a bug?

#
# publisher: simple table and publication
#
postgres@publisher=# SELECT version();
                                                     version
------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit
(1 row)

postgres@publisher=# CREATE TABLE t(n integer PRIMARY KEY);
postgres@publisher=# CREATE PUBLICATION testpub FOR TABLE t;

#
# subscriber: the same table, triggers to write some information into a
# log table, and the subscription
#
postgres@subscriber=# CREATE TABLE t(n integer PRIMARY KEY);
postgres@subscriber=# CREATE TABLE log(tg_table_name text, tg_when text, tg_op text, tg_level text, tg_str text);
postgres@subscriber=# CREATE OR REPLACE FUNCTION public.describe()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
DECLARE
    rec record;
    str text := '';
BEGIN
    IF TG_LEVEL = 'ROW' THEN
        CASE TG_OP
            WHEN 'DELETE' THEN rec := OLD; str := OLD::text;
            WHEN 'UPDATE' THEN rec := NEW; str := OLD || ' -> ' || NEW;
            WHEN 'INSERT' THEN rec := NEW; str := NEW::text;
        END CASE;
    END IF;
    INSERT INTO log(tg_table_name, tg_when, tg_op, tg_level, tg_str)
        VALUES (TG_TABLE_NAME, TG_WHEN, TG_OP, TG_LEVEL, str);
    RETURN rec;
END;
$function$;
postgres@subscriber=# CREATE TRIGGER t_before_row BEFORE INSERT OR UPDATE OR DELETE ON t
    FOR EACH ROW EXECUTE PROCEDURE describe();
postgres@subscriber=# CREATE TRIGGER t_after_row AFTER INSERT OR UPDATE OR DELETE ON t
    FOR EACH ROW EXECUTE PROCEDURE describe();
postgres@subscriber=# CREATE TRIGGER t_before_stmt BEFORE INSERT OR UPDATE OR DELETE ON t
    FOR EACH STATEMENT EXECUTE PROCEDURE describe();
postgres@subscriber=# CREATE TRIGGER t_after_stmt AFTER INSERT OR UPDATE OR DELETE ON t
    FOR EACH STATEMENT EXECUTE PROCEDURE describe();
postgres@subscriber=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost port=5432 user=postgres dbname=postgres' PUBLICATION testpub;

#
# publisher
#
postgres@publisher=# INSERT INTO t VALUES (1);
INSERT 0 1

#
# subscriber
#
postgres@subscriber=# SELECT * FROM t;
 n
---
 1
(1 row)

postgres@subscriber=# SELECT * FROM log;
 tg_table_name | tg_when | tg_op | tg_level | tg_str
---------------+---------+-------+----------+--------
(0 rows)

#
# so replication works, but triggers don't.
# now check that triggers work alright locally:
#
postgres@subscriber=# INSERT INTO t VALUES (100);
INSERT 0 1
postgres@subscriber=# SELECT * FROM log;
 tg_table_name | tg_when | tg_op  | tg_level  | tg_str
---------------+---------+--------+-----------+--------
 t             | BEFORE  | INSERT | STATEMENT |
 t             | BEFORE  | INSERT | ROW       | (100)
 t             | AFTER   | INSERT | ROW       | (100)
 t             | AFTER   | INSERT | STATEMENT |
(4 rows)

Regards,
Egor Rogov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
[HACKERS] On-disk format of SCRAM verifiers
The current format for SCRAM verifiers in pg_authid is:

scram-sha-256:&lt;iteration count&gt;:&lt;salt&gt;:&lt;StoredKey&gt;:&lt;ServerKey&gt;

While reviewing Michael's patch to change that so that StoredKey and
ServerKey are stored base64-encoded, rather than hex-encoded as they are
currently [1], I looked again at RFC 5803.

RFC 5803 specifies the format to use when storing SCRAM verifiers in
LDAP. I looked at it earlier, and it was a source of inspiration for the
current format, but I didn't think that it was directly applicable. I
thought that in RFC 5803 the different fields were stored as separate
fields or attributes, not as a single string.

But looking more closely, I think I misunderstood RFC 5803. It *does* in
fact specify a single string format to store the verifier in. And the
format looks like:

SCRAM-SHA-256$&lt;iteration count&gt;:&lt;salt&gt;$&lt;StoredKey&gt;:&lt;ServerKey&gt;

Alternating '$' and ':' as the separators seems a bit wonky, but it
actually makes sense. "&lt;iteration count&gt;:&lt;salt&gt;" is treated as one
field, and "&lt;StoredKey&gt;:&lt;ServerKey&gt;" is treated as another, which is
logical, since the iteration count and salt are sent together to the
client in the SCRAM challenge, while StoredKey and ServerKey must be
kept secret.

I think we should adopt that exact format, so that our verifiers are
compatible with RFC 5803. It doesn't make any immediate difference, but
since there is a standard out there, might as well follow it.

And just in case we get support for looking up SCRAM verifiers from an
LDAP server in the future, it will come in handy as we won't need to
parse two different formats.

Barring objections, I'll go change our on-disk format for SCRAM
verifiers to follow RFC 5803.

[1] https://www.postgresql.org/message-id/CAB7nPqSbsCBCxy8-DtwzRxYgTnbGUtY4uFEkLQhG%3DR%3Duo%3Dg8Fw%40mail.gmail.com

- Heikki
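To make the field grouping concrete, here is a small Python sketch of how
the four fields fit together in an RFC 5803-style verifier string, with the
keys derived per RFC 5802 (SaltedPassword via PBKDF2, then ClientKey,
StoredKey and ServerKey). This is illustrative only, not PostgreSQL code;
the function names and sample values are mine.

```python
import base64
import hashlib
import hmac


def build_verifier(password, salt, iterations):
    """Derive StoredKey/ServerKey per RFC 5802 and format them in the
    RFC 5803-style string discussed above:
    SCRAM-SHA-256$<iterations>:<salt>$<StoredKey>:<ServerKey>."""
    salted = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()       # kept secret
    server_key = hmac.new(salted, b"Server Key", hashlib.sha256).digest()
    b64 = lambda b: base64.b64encode(b).decode()
    return "SCRAM-SHA-256$%d:%s$%s:%s" % (
        iterations, b64(salt), b64(stored_key), b64(server_key))


def parse_verifier(verifier):
    """Split a verifier back into its fields.  '$' separates the public
    part (iterations:salt, sent to the client in the challenge) from the
    secret part (StoredKey:ServerKey); ':' splits within each part."""
    mechanism, public, secret = verifier.split("$")
    iterations, salt = public.split(":")
    stored_key, server_key = secret.split(":")
    return (mechanism, int(iterations), base64.b64decode(salt),
            base64.b64decode(stored_key), base64.b64decode(server_key))
```

Note how the '$'/':' alternation falls out naturally: a consumer that only
needs to issue the challenge can stop at the first '$' and never touch the
secret half.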
Re: [HACKERS] Interval for launching the table sync worker
Hello,

At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada wrote in
Re: [HACKERS] Declarative partitioning - another take
Hi,

I have observed the below behaviour with statement triggers. I am able to
create statement triggers on the root partition, but these triggers are
not getting fired on updating a partition.

CREATE TABLE pt (a INT, b INT) PARTITION BY RANGE(a);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) to (7);
CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (7) to (11);
INSERT INTO pt (a,b) SELECT i,i FROM generate_series(1,10)i;
CREATE TABLE pt_trigger(TG_NAME varchar,TG_TABLE_NAME varchar,TG_LEVEL varchar,TG_WHEN varchar);

CREATE OR REPLACE FUNCTION process_pt_trigger() RETURNS TRIGGER AS $pttg$
BEGIN
    IF (TG_OP = 'UPDATE') THEN
        INSERT INTO pt_trigger SELECT TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN;
        RETURN NEW;
    END IF;
    RETURN NULL;
END;
$pttg$ LANGUAGE plpgsql;

CREATE TRIGGER pt_trigger_after_p0 AFTER UPDATE ON pt
    FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
CREATE TRIGGER pt_trigger_before_p0 BEFORE UPDATE ON pt
    FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();

postgres=# UPDATE pt SET a = 2 WHERE a = 1;
UPDATE 1
postgres=# SELECT * FROM pt_trigger ORDER BY 1;
 tg_name | tg_table_name | tg_level | tg_when
---------+---------------+----------+---------
(0 rows)

No statement-level trigger fired in this case. Is this expected
behaviour?

But if I create the triggers on a leaf partition, they do get fired:

CREATE TRIGGER pt_trigger_after_p1 AFTER UPDATE ON pt1
    FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();
CREATE TRIGGER pt_trigger_before_p1 BEFORE UPDATE ON pt1
    FOR EACH STATEMENT EXECUTE PROCEDURE process_pt_trigger();

postgres=# UPDATE pt SET a = 5 WHERE a = 4;
UPDATE 1
postgres=# SELECT * FROM pt_trigger ORDER BY 1;
       tg_name        | tg_table_name | tg_level  | tg_when
----------------------+---------------+-----------+---------
 pt_trigger_after_p1  | pt1           | STATEMENT | AFTER
 pt_trigger_before_p1 | pt1           | STATEMENT | BEFORE
(2 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Re: [HACKERS] some review comments on logical rep code
On Fri, Apr 21, 2017 at 11:55 AM, Masahiko Sawada wrote:
> On Fri, Apr 21, 2017 at 4:41 AM, Peter Eisentraut wrote:
>> On 4/20/17 12:30, Fujii Masao wrote:
>>> I've pushed several patches, and there is now only one remaining patch.
>>> I posted the review comment on that patch, and I'm expecting that
>>> Masahiko-san will update the patch. So what about waiting for the
>>> updated version of the patch by next Monday (April 24th)?
>>
>> Thank you for pitching in.
>>
>> Where is your latest patch?
>
> The remaining patch is 002_Reset_on_commit_launcher_wakeup_v3.patch

Oops, there is one more remaining issue, reported on this thread and on
another one [1]. The latest patch is attached on [1].

[1] https://www.postgresql.org/message-id/20170406.212441.177025736.horiguchi.kyotaro%40lab.ntt.co.jp

Regards,

-- 
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] some review comments on logical rep code
On Fri, Apr 21, 2017 at 1:19 AM, Fujii Masao wrote:
> On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada wrote:
>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI wrote:
>>> Hi,
>>>
>>> Thank you for the revised version.
>>>
>>> At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada wrote in
>>>> On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada wrote:
>>>>> On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI wrote:
>>>>>> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada wrote in
Re: [HACKERS] identity columns
On Tue, Apr 18, 2017 at 11:53:36AM -0400, Peter Eisentraut wrote:
> On 4/18/17 02:07, Noah Misch wrote:
>> This PostgreSQL 10 open item is past due for your status update.
>> Kindly send a status update within 24 hours, and include a date for
>> your subsequent status update. Refer to the policy on open item
>> ownership:
>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>
> I am in disagreement with the submitter that this is a problem or what
> the solution should be. The discussion is ongoing. Note that the
> current state isn't actually broken, so it doesn't have to hold
> anything up.

I see. If anyone in addition to the submitter thinks making a change in
this area qualifies as a PostgreSQL 10 open item, please speak up.
Otherwise, on Monday, I'll reclassify this as a non-bug.
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Fri, Apr 21, 2017 at 1:34 AM, Robert Haas wrote:
>> You seem to be suggesting that we keep as many sets of
>> partition bounds as there are base relations participating in the join
>> and then use appropriate partition bounds based on the columns in the
>> join conditions, so that we can use the same operator as used in the
>> join condition. That doesn't seem to be a good option since the
>> partition bounds will all have same values, only differing in their
>> binary representation because of differences in data types.
>
> Well, actually, I think it is a good option, as I wrote in
> http://postgr.es/m/CA+TgmoY-LiJ+_S7OijNU_r2y=dhsj539wtqa7cayj-hcecc...@mail.gmail.com

I guess you are now confusing partition bounds for a join relation with
partition bounds of a base relation. The above paragraph is about the
partition bounds of a join relation. I have already agreed that we need
to store partition bounds in RelOptInfo. For a base relation this is
trivial; its RelOptInfo has to store the partition bounds as stored in
the partition descriptor of the corresponding partitioned table. I am
talking about the partition bounds of a join relation. See below for
more explanation.

> In that email, my principal concern was allowing partition-wise join
> to succeed even with slightly different sets of partition boundaries
> on the two sides of the join; in particular, if we've got A with A1 ..
> A10 and B with B1 .. B10 and the DBA adds A11, I don't want
> performance to tank until the DBA gets around to adding B11. Removing
> the partition bounds from the PartitionScheme and storing them
> per-RelOptInfo fixes that problem;

We have an agreement on this.

> the fact that it also solves this
> problem of what happens when we have different data types on the two
> sides looks to me like a second reason to go that way.

I don't see how that is fixed.
For a join relation we need to come up with one set of partition bounds
by merging the partition bounds of the joining relations, and in order
to understand how to interpret the datums in the partition bounds, we
need to associate a data type with them. The question is which data type
we should use if the relations being joined have different data types
associated with their respective partition bounds. Or are you saying
that we don't need to associate a data type with the merged partition
bounds? In that case, I don't know how we would compare the partition
bounds of two relations.

In your example, A has a partition key of type int8 and bound datums
X1 .. X10. B has a partition key of type int4 and bound datums X1 .. X11.
C has a partition key of type int2 and bound datums X1 .. X12. The
binary representation of the X's is going to differ between A, B and C,
although each Xk for A, B and C is equal, wherever it exists. A join
between A and B will have merged bound datums X1 .. X10 (and X11,
depending upon the join type). In order to match the bounds of AB with
C, we need to know the data type of the bounds of AB, so that we can
choose the appropriate equality operator. The question is what we should
choose as the data type of the partition bounds of AB: int8 or int4.
This is different from applying join conditions between AB and C, which
can choose the right opfamily operator based on the join conditions.

> And there's a third reason, too, which is that the opfamily mechanism
> doesn't currently provide any mechanism for reasoning about which data
> types are "wider" or "narrower" in the way that you want. In general,
> there's not even a reason why such a relationship has to exist;
> consider two data types t1 and t2 with opclasses t1_ops and t2_ops
> that are part of the same opfamily t_ops, and suppose that t1 can
> represent any positive integer and t2 can represent any even integer,
> or in general that each data type can represent some but not all of
> the values that can be represented by the other data type. In such a
> case, neither would be "wider" than the other in the sense that you
> need; you essentially want to find a data type within the opfamily to
> which all values of any of the types involved in the query can be cast
> without error, but there is nothing today which requires such a data
> type to exist, and no way to identify which one it is. In practice,
> for all of the built-in opfamilies that have more than one opclass,
> such a data type always exists but is not always unique -- in
> particular, datetime_ops contains date_ops, timestamptz_ops, and
> timestamp_ops, and either of the latter two is a plausible choice for
> the "widest" data type of the three. But there's no way to figure
> that out from the opfamily or opclass information we have today.
>
> In theory, it would be possible to modify the opfamily machinery so
> that every opfamily designates an optional ordering of types from
> "narrowest" to "widest", such that saying t1 is-narrower-than t2 is a
> guarantee that every value of type t1 can be cast without error to a
> value of type t2. But I think that's a bad plan. It
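The bound-merging step being debated above can be illustrated with a small
Python sketch: pair up range partitions of two relations so that a
partition of A joins only with the partitions of B whose ranges overlap
it, even when the boundaries don't line up exactly (the A11-without-B11
case). This is only an illustration, not PostgreSQL's implementation, and
it simply assumes the two sides' bound values are directly comparable —
which is exactly the cross-type question the thread is wrestling with.

```python
def merge_range_bounds(bounds_a, bounds_b):
    """Pair up range partitions of two relations for an inner
    partition-wise join.  Bounds are sorted lists of half-open
    (lower, upper) tuples; returns (index_in_a, index_in_b) pairs
    for every overlapping partition pair."""
    pairs = []
    i = j = 0
    while i < len(bounds_a) and j < len(bounds_b):
        a_lo, a_hi = bounds_a[i]
        b_lo, b_hi = bounds_b[j]
        if a_hi <= b_lo:        # A's partition ends before B's begins
            i += 1
        elif b_hi <= a_lo:      # B's partition ends before A's begins
            j += 1
        else:                   # ranges overlap: these partitions can join
            pairs.append((i, j))
            if a_hi <= b_hi:    # advance whichever side ends first
                i += 1
            else:
                j += 1
    return pairs
```

With identical bound sets this degenerates to a one-to-one pairing; with an
extra partition on one side, the extra partition is simply never paired, so
the join can still be done partition-wise instead of falling back to a plain
join of the whole tables.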
Re: [HACKERS] Quorum commit for multiple synchronous replication.
On Fri, Apr 21, 2017 at 01:20:05PM +0900, Masahiko Sawada wrote:
> On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch wrote:
>> On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote:
>>> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch wrote:
>>>> On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
>>>>> As I said at first, this is not a bug. There are some proposals for
>>>>> a better design of the priority column in pg_stat_replication, but
>>>>> we've not reached consensus yet. So I think that it's better to
>>>>> move this open item to the "Design Decisions to Recheck Mid-Beta"
>>>>> section so that we can hear more opinions.
>>>>
>>>> I'm reading that some people want to report NULL priority, some
>>>> people want to report a constant 1 priority, and nobody wants the
>>>> current behavior. Is that an accurate summary?
>>>
>>> Yes, I think that's correct.
>>
>> Okay, but ...
>>
>>> FWIW the reason for the current behavior is that it would be useful
>>> for a user who is willing to switch from ANY to FIRST. They can know
>>> which standbys will become sync or potential.
>>
>> ... does this mean you personally want to keep the current behavior?
>> If not, has some other person stated a wish to keep the current
>> behavior?
>
> No, I want to change the current behavior. IMO it's better to set
> priority 1 for all standbys in the quorum set. I guess there is no
> longer anyone who supports the current behavior.

In that case, this open item is not eligible for the section "Design
Decisions to Recheck Mid-Beta". That section is for items where we'll
probably change nothing, but we plan to recheck later just in case.
Here, we expect to change the behavior; the open question is which
replacement behavior to prefer.

Fujii, as the owner of this open item, you are responsible for
moderating the debate until there's adequate consensus to make a
particular change or to keep the current behavior after all.
Please proceed to do that. Beta testers deserve a UI they may like, not
a UI you already plan to change later.

Thanks,
nm