Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
On 2017-04-21 23:50:41 -0400, Tom Lane wrote:
> Attached are a couple of patches that represent a plausible Plan B.
> The first one changes the postmaster to run its signal handlers without
> specifying SA_RESTART.  I've confirmed that that seems to fix the
> select_parallel-test-takes-a-long-time problem on gaur/pademelon.

> The second one uses pselect, if available, to replace the unblock-signals/
> select()/block-signals dance in ServerLoop.  On platforms where pselect
> exists and works properly, that should fix the race condition I described
> previously.  On platforms where it doesn't, we're no worse off than
> before.

We probably should note somewhere prominently that pselect isn't
actually race-free on a number of platforms.


> I think that these patches represent something we could back-patch
> without a lot of trepidation, unlike the WaitEventSet-based approach.
> Therefore, my proposal is to apply and backpatch these changes, and
> call it good for v10.  For v11, we could work on changing the postmaster
> to not do work in signal handlers, as discussed upthread.  That would
> supersede these two patches completely, though I'd still advocate for
> keeping the change in maybe_start_bgworker.

Not yet having looked at your patches, that sounds like a reasonable
plan.  I'd still like to get something like your CLOEXEC patch applied
independently however.


< patches >

Looks reasonable on a quick skim.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Replication lag tracking for walsenders

2017-04-21 Thread Thomas Munro
On Sat, Apr 22, 2017 at 9:04 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Simon Riggs wrote:
>>> Replication lag tracking for walsenders
>>>
>>> Adds write_lag, flush_lag and replay_lag cols to pg_stat_replication.
>
>> Did anyone notice that this seems to be causing buildfarm member 'tern'
>> to fail the recovery check?  See here:
>
>> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=tern=2017-04-21%2012%3A48%3A09=recovery-check
>> which has
>> TRAP: FailedAssertion("!(lsn >= prev.lsn)", File: "walsender.c", Line: 3331)
>
>> Line 3331 was added by this commit.
>
> Note that while that commit was some time back, tern has only just started
> running recovery-check, following its update to the latest buildfarm
> script.  It looks like it's run that test four times and failed twice,
> so far.  So, not 100% reproducible, but there's something rotten there.
> Timing-dependent, maybe?

Thanks.  I'm away from my computer right now but will investigate this
and send a fix later today.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] snapbuild woes

2017-04-21 Thread Andres Freund
On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
> I've since the previous update reviewed Petr's patch, which he since has
> updated over the weekend.  I'll do another round tomorrow, and will see
> how it looks.  I think we might need some more tests for this to be
> committable, so it might not become committable tomorrow.  I hope we'll
> have something in tree by end of this week, if not I'll send an update.

I was less productive this week than I'd hoped, and creating a testsuite
was more work than I'd anticipated, so I'm slightly lagging behind.  I
hope to have a patchset tomorrow, aiming to commit something
Monday/Tuesday.

- Andres




Re: [HACKERS] snapbuild woes

2017-04-21 Thread Andres Freund
On 2017-04-20 13:32:10 +0200, Petr Jelinek wrote:
> On 20/04/17 02:09, Andres Freund wrote:
> > On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
> > I'm working on some infrastructure around this.  Not sure if it needs to
> > be committed, but it's certainly useful for evaluation.  Basically it's
> > a small UDF that:
> > 1) creates a slot via walsender protocol (to some dsn)
> > 2) imports that snapshot into yet another connection to that dsn
> > 3) runs some query over that new connection
> > 
> > That makes it reasonably easy to run e.g. pgbench and continually create
> > slots, and use the snapshot to run queries "verifying" that things look
> > good.  It's a bit shoestringed together, but everything else seems to
> > require more code.  And it's just a test.
> > 
> > Unless somebody has a better idea?

> I don't. I mean it would be nice to have isolation tester support
> walsender protocol, but I don't know anything about isolation tester
> internals so no idea how much work that is.

Now that the replication protocol supports normal queries, it's actually
not much of an issue on its own.  The problem is more that
isolationtester's client-side language isn't powerful enough - you
can't extract the snapshot name from one session and import it in
another.  While that might be something we want to address, I certainly
don't want to tackle it for v10.

I'd started to develop a C toolkit as above, but after I got the basics
running I actually noticed it's pretty much unnecessary: You can just as
well do it with dblink and some plpgsql.

I can reliably reproduce several of the bugs in this thread in a
relatively short amount of time before applying the patch, and so far
not after.  That's great!


> I guess you plan to make that as one of the test/modules or something
> similar (the UDF)?

I've a bunch of tests, but I don't quite know whether we can expose all
of them via classical tests.  There are several easy ones that I
definitely want to add (import "empty" snapshot; import snapshot with
running xacts; create snapshot, perform some ddl, import snapshot,
perform some ddl, check things still work reasonably), but there's
enough others that are just probabilistic.  I was wondering about adding
a loop that simply runs for like 30s and then quits or such, but who
knows.

Testing around this made me wonder whether we need to make bgwriter.c's
LOG_SNAPSHOT_INTERVAL_MS configurable - for efficient testing reducing
it is quite valuable, and on busier machines it'll also almost always be
a win to log more frequently...  Opinions?

Greetings,

Andres Freund




Re: [HACKERS] pgbench - allow to store select results into variables

2017-04-21 Thread Fabien COELHO



Here is a v3 with a more precise comment.


Looks good to me. I have marked the patch status as "Ready for
committer".


Ok. Thanks. When/if committed, it might trigger a few rebases of other 
pending patches. I'll see about that then.


--
Fabien.




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
I wrote:
> Attached is a lightly-tested draft patch that converts the postmaster to
> use a WaitEventSet for waiting in ServerLoop.  I've got mixed emotions
> about whether this is the direction to proceed, though.

Attached are a couple of patches that represent a plausible Plan B.
The first one changes the postmaster to run its signal handlers without
specifying SA_RESTART.  I've confirmed that that seems to fix the
select_parallel-test-takes-a-long-time problem on gaur/pademelon.
The second one uses pselect, if available, to replace the unblock-signals/
select()/block-signals dance in ServerLoop.  On platforms where pselect
exists and works properly, that should fix the race condition I described
previously.  On platforms where it doesn't, we're no worse off than
before.

As mentioned in the comments for the second patch, even if we don't
have working pselect(), the only problem is that ServerLoop's response
to an interrupt might be delayed by as much as the up-to-1-minute timeout.
The only existing case where that's really bad is launching multiple
bgworkers.  I would therefore advocate also changing maybe_start_bgworker
to start up to N bgworkers per call, where N is large enough to pretty
much always satisfy simultaneously-arriving requests.  I'd pick 100 or
so, but am willing to negotiate.

I think that these patches represent something we could back-patch
without a lot of trepidation, unlike the WaitEventSet-based approach.
Therefore, my proposal is to apply and backpatch these changes, and
call it good for v10.  For v11, we could work on changing the postmaster
to not do work in signal handlers, as discussed upthread.  That would
supersede these two patches completely, though I'd still advocate for
keeping the change in maybe_start_bgworker.

Note: for testing purposes, these patches are quite independent; just
ignore the hunk in the second patch that changes a comment added by
the first one.

regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index c382345..52b5e2c 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*** PostmasterMain(int argc, char *argv[])
*** 610,615 
--- 610,624 
  	/*
  	 * Set up signal handlers for the postmaster process.
  	 *
+ 	 * In the postmaster, we want to install non-ignored handlers *without*
+ 	 * SA_RESTART.  This is because they'll be blocked at all times except
+ 	 * when ServerLoop is waiting for something to happen, and during that
+ 	 * window, we want signals to exit the select(2) wait so that ServerLoop
+ 	 * can respond if anything interesting happened.  On some platforms,
+ 	 * signals marked SA_RESTART would not cause the select() wait to end.
+ 	 * Child processes will generally want SA_RESTART, but we expect them to
+ 	 * set up their own handlers before unblocking signals.
+ 	 *
  	 * CAUTION: when changing this list, check for side-effects on the signal
  	 * handling setup of child processes.  See tcop/postgres.c,
  	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/walwriter.c,
*** PostmasterMain(int argc, char *argv[])
*** 620,635 
  	pqinitmask();
  	PG_SETMASK(&BlockSig);
  
! 	pqsignal(SIGHUP, SIGHUP_handler);	/* reread config file and have
! 		 * children do same */
! 	pqsignal(SIGINT, pmdie);	/* send SIGTERM and shut down */
! 	pqsignal(SIGQUIT, pmdie);	/* send SIGQUIT and die */
! 	pqsignal(SIGTERM, pmdie);	/* wait for children and shut down */
  	pqsignal(SIGALRM, SIG_IGN); /* ignored */
  	pqsignal(SIGPIPE, SIG_IGN); /* ignored */
! 	pqsignal(SIGUSR1, sigusr1_handler); /* message from child process */
! 	pqsignal(SIGUSR2, dummy_handler);	/* unused, reserve for children */
! 	pqsignal(SIGCHLD, reaper);	/* handle child termination */
  	pqsignal(SIGTTIN, SIG_IGN); /* ignored */
  	pqsignal(SIGTTOU, SIG_IGN); /* ignored */
  	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
--- 629,648 
  	pqinitmask();
  	PG_SETMASK(&BlockSig);
  
! 	pqsignal_no_restart(SIGHUP, SIGHUP_handler);		/* reread config file
! 		 * and have children do
! 		 * same */
! 	pqsignal_no_restart(SIGINT, pmdie); /* send SIGTERM and shut down */
! 	pqsignal_no_restart(SIGQUIT, pmdie);		/* send SIGQUIT and die */
! 	pqsignal_no_restart(SIGTERM, pmdie);		/* wait for children and shut
!  * down */
  	pqsignal(SIGALRM, SIG_IGN); /* ignored */
  	pqsignal(SIGPIPE, SIG_IGN); /* ignored */
! 	pqsignal_no_restart(SIGUSR1, sigusr1_handler);		/* message from child
! 		 * process */
! 	pqsignal_no_restart(SIGUSR2, dummy_handler);		/* unused, reserve for
! 		 * children */
! 	pqsignal_no_restart(SIGCHLD, reaper);		/* handle child termination */
  	pqsignal(SIGTTIN, SIG_IGN); /* ignored */
  	pqsignal(SIGTTOU, SIG_IGN); /* ignored */
  	/* ignore SIGXFSZ, so that ulimit violations work like disk full */
diff --git a/src/include/port.h 

Re: [HACKERS] Old versions of Test::More

2017-04-21 Thread Craig Ringer
On 22 Apr. 2017 4:23 am, "Tom Lane"  wrote:

Peter Eisentraut  writes:
> On 4/21/17 14:49, Andrew Dunstan wrote:
>> I'll add a comment, but doing it in PostgresNode.pm would mean jacana
>> (for instance) couldn't run any of the TAP tests. I'm looking at
>> installing a sufficiently modern Test::Simple package (includes
>> Test::More and Test::Builder) there, but other oldish machines could also
>> be affected.

> Or you could define note() as an empty function if it doesn't exist.

+1.  I'm really not at all happy with the prospect that every time
somebody adds a use of "note" to some new TAP test, we're going to
get a complaint later that that test no longer works on jacana.
We need to either decide that non-ancient Test::More is a hard
requirement for all the tests


That seems like a no-brainer TBH. Why are we bothering with backwards
compat with ancient versions of test frameworks? It seems like a colossal
waste of time for no benefit.

or fix things with a centralized
solution.  A dummy (or not so dummy?) implementation would serve
for the latter.

regards, tom lane




Re: [HACKERS] multithreading in Batch/pipelining mode for libpq

2017-04-21 Thread Andres Freund
On 2017-04-22 09:14:50 +0800, Craig Ringer wrote:
> 2) if the answer to the previous question is negative, is it possible to
> send asynchronous queries in one thread while reading results in another
> thread?
> 
> 
> Not right now. libpq's state tracking wouldn't cope.
> 
> I imagine it could be modified to work with some significant refactoring.
> You'd need to track state with a shared fifo of some kind, where dispatch
> puts queries on the fifo as it sends them and the receiver pops them off it.

FWIW, I think it'd be a *SERIOUSLY* bad idea trying to make individual
PGconn interactions threadsafe. It'd imply significant overhead in a lot
of situations, and programming it would have to become a lot more
complicated (since you need to synchronize command submission between
threads).   For almost all cases it's better to either use multiple
connections or use a coarse grained mutex around all of libpq.

- Andres




Re: [HACKERS] multithreading in Batch/pipelining mode for libpq

2017-04-21 Thread Craig Ringer
On 22 Apr. 2017 6:04 am, "Ilya Roublev"  wrote:


1) is it possible technically (possibly by changing some part of libpq
code) to ignore results (especially for this sort of queries like insert),
processing somehow separately the situation when some error occurs?


There is a patch out there to allow libpq result processing by callback I
think. Might be roughly what you want.

2) if the answer to the previous question is negative, is it possible to
send asynchronous queries in one thread while reading results in another
thread?


Not right now. libpq's state tracking wouldn't cope.

I imagine it could be modified to work with some significant refactoring.
You'd need to track state with a shared fifo of some kind, where dispatch
puts queries on the fifo as it sends them and the receiver pops them off it.

I started on that for the batch mode stuff but it's not in any way thread
safe there.


 locking, info in PGconn very quickly becomes inconsistent, the number of
queries sent does not correspond to the number of results to be read, etc.
So I'd like to know at first is it possible at all (possibly by some
changes to be made in libpq)? Sorry if my idea sounds rather naive. And
thanks for your answer and advice.


Yeah, it's possible. The protocol can handle it, it's just libpq that can't.


Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> It also appears to me that do_start_bgworker's treatment of fork
>> failure is completely brain dead.  Did anyone really think about
>> that case?

> Hmm, I probably modelled it on autovacuum without giving that case much
> additional consideration.

Attached is a proposed patch that should make fork failure behave
sanely, ie it works much the same as a worker crash immediately after
launch.  I also refactored things a bit to make do_start_bgworker
fully responsible for updating the RegisteredBgWorker's state,
rather than doing just some of it as before.

I tested this by hot-wiring the fork_process call to fail some of
the time, which showed that the postmaster now seems to recover OK,
but parallel.c's logic is completely innocent of the idea that
worker-startup failure is possible.  The leader backend just freezes,
and nothing short of kill -9 on that backend will get you out of it.
Fixing that seems like material for a separate patch though.

regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index c382345..8461c24 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*** static void TerminateChildren(int signal
*** 420,425 
--- 420,426 
  #define SignalChildren(sig)			   SignalSomeChildren(sig, BACKEND_TYPE_ALL)
  
  static int	CountChildren(int target);
+ static bool assign_backendlist_entry(RegisteredBgWorker *rw);
  static void maybe_start_bgworker(void);
  static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
  static pid_t StartChildProcess(AuxProcType type);
*** bgworker_forkexec(int shmem_slot)
*** 5531,5543 
   * Start a new bgworker.
   * Starting time conditions must have been checked already.
   *
   * This code is heavily based on autovacuum.c, q.v.
   */
! static void
  do_start_bgworker(RegisteredBgWorker *rw)
  {
  	pid_t		worker_pid;
  
  	ereport(DEBUG1,
  			(errmsg("starting background worker process \"%s\"",
  	rw->rw_worker.bgw_name)));
--- 5532,5564 
   * Start a new bgworker.
   * Starting time conditions must have been checked already.
   *
+  * Returns true on success, false on failure.
+  * In either case, update the RegisteredBgWorker's state appropriately.
+  *
   * This code is heavily based on autovacuum.c, q.v.
   */
! static bool
  do_start_bgworker(RegisteredBgWorker *rw)
  {
  	pid_t		worker_pid;
  
+ 	Assert(rw->rw_pid == 0);
+ 
+ 	/*
+ 	 * Allocate and assign the Backend element.  Note we must do this before
+ 	 * forking, so that we can handle out of memory properly.
+ 	 *
+ 	 * Treat failure as though the worker had crashed.  That way, the
+ 	 * postmaster will wait a bit before attempting to start it again; if it
+ 	 * tried again right away, most likely it'd find itself repeating the
+ 	 * out-of-memory or fork failure condition.
+ 	 */
+ 	if (!assign_backendlist_entry(rw))
+ 	{
+ 		rw->rw_crashed_at = GetCurrentTimestamp();
+ 		return false;
+ 	}
+ 
  	ereport(DEBUG1,
  			(errmsg("starting background worker process \"%s\"",
  	rw->rw_worker.bgw_name)));
*** do_start_bgworker(RegisteredBgWorker *rw
*** 5549,5557 
  #endif
  	{
  		case -1:
  			ereport(LOG,
  	(errmsg("could not fork worker process: %m")));
! 			return;
  
  #ifndef EXEC_BACKEND
  		case 0:
--- 5570,5586 
  #endif
  	{
  		case -1:
+ 			/* in postmaster, fork failed ... */
  			ereport(LOG,
  	(errmsg("could not fork worker process: %m")));
! 			/* undo what assign_backendlist_entry did */
! 			ReleasePostmasterChildSlot(rw->rw_child_slot);
! 			rw->rw_child_slot = 0;
! 			free(rw->rw_backend);
! 			rw->rw_backend = NULL;
! 			/* mark entry as crashed, so we'll try again later */
! 			rw->rw_crashed_at = GetCurrentTimestamp();
! 			break;
  
  #ifndef EXEC_BACKEND
  		case 0:
*** do_start_bgworker(RegisteredBgWorker *rw
*** 5575,5588 
  			PostmasterContext = NULL;
  
  			StartBackgroundWorker();
  			break;
  #endif
  		default:
  			rw->rw_pid = worker_pid;
  			rw->rw_backend->pid = rw->rw_pid;
  			ReportBackgroundWorkerPID(rw);
! 			break;
  	}
  }
  
  /*
--- 5604,5627 
  			PostmasterContext = NULL;
  
  			StartBackgroundWorker();
+ 
+ 			exit(1);			/* should not get here */
  			break;
  #endif
  		default:
+ 			/* in postmaster, fork successful ... */
  			rw->rw_pid = worker_pid;
  			rw->rw_backend->pid = rw->rw_pid;
  			ReportBackgroundWorkerPID(rw);
! 			/* add new worker to lists of backends */
! 			dlist_push_head(&BackendList, &rw->rw_backend->elem);
! #ifdef EXEC_BACKEND
! 			ShmemBackendArrayAdd(rw->rw_backend);
! #endif
! 			return true;
  	}
+ 
+ 	return false;
  }
  
  /*
*** bgworker_should_start_now(BgWorkerStartT
*** 5629,5634 
--- 5668,5675 
   * Allocate the Backend struct for a connected background worker, but don't
   * add it to the list of backends just yet.
   *

Re: [HACKERS] pg_dump emits ALTER TABLE ONLY partitioned_table

2017-04-21 Thread Robert Haas
On Fri, Apr 21, 2017 at 1:43 AM, Stephen Frost  wrote:
>> + Once
>> +   partitions exist, we do not support using ONLY to
>> +   add or drop constraints on only the partitioned table.
>>
>> I wonder if the following sounds a bit more informative: Once partitions
>> exist, using ONLY will result in an error, because the
>> constraint being added or dropped must be added or dropped from the
>> partitions too.
>
> My thinking here is that we may add support later on down the line for
> using the ONLY keyword, when the constraint has already been created on
> the partitions themselves, so I would rather just clarify that we don't
> (yet) support that.  Your wording, at least to me, seems to imply that
> if the constraint was added to or dropped from the partitions then the
> ONLY keyword could be used, which isn't accurate today.

Well, I think that Amit's wording has the virtue of giving a reason
which is basically valid, whereas someone reading your text might
conceivably be left wondering what the reason for the restriction is.
Also, I think saying that it will result in an error is a bit more
helpful than saying that it is not supported, since the latter
might lead someone to believe that it could result in undefined
behavior (i.e. arbitrarily awful misbehavior), which is not the case.
So I like what he wrote, for whatever that's worth.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Why is get_cheapest_parallel_safe_total_inner() in pathkeys.c?

2017-04-21 Thread Robert Haas
On Fri, Apr 21, 2017 at 12:10 PM, David Rowley
 wrote:
> This probably ended up here because there's a bunch of other functions
> named get_cheapest* in that file, but all of those relate to getting a
> path for specific PathKeys. get_cheapest_parallel_safe_total_inner()
> does not do that.

Yes, I just put it near to the code from which it was derived.

> Maybe allpaths.c is a better home for it?

I will let others be the judge of that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-04-21 Thread Robert Haas
On Fri, Apr 21, 2017 at 8:41 AM, Ashutosh Bapat
 wrote:
> I don't see how that is fixed. For a join relation we need to come up
> with one set of partition bounds by merging the partition bounds of the
> joining relations, and in order to understand how to interpret the
> datums in the partition bounds, we need to associate data types. The
> question is which data type we should use if the relations being
> joined have different data types associated with their respective
> partition bounds.
>
> Or are you saying that we don't need to associate data type with
> merged partition bounds? In that case, I don't know how we compare
> the partition bounds of two relations?

Well, since there is no guarantee that a datatype exists which can be
used to "merge" the partition bounds in the sense that you are
describing, and even if there is one we have no opfamily
infrastructure to find out which one it is, I think it would be smart
to try to set things up so that we don't need to do that.  I believe
that's probably possible.

> In your example, A has partition key of type int8, has bound datums
> X1.. X10. B has partition key of type int4 and has bounds datums X1 ..
> X11. C has partition key type int2 and bound datums X1 .. X12.

OK, sure.

> The binary representation of X's is going to differ between A, B and C,
> although each Xk for A, B and C is equal, wherever it exists.

Agreed.

> Join
> between A and B will have merged bound datums X1 .. X10 (and X11
> depending upon the join type). In order to match bounds of AB with C,
> we need to know the data type of bounds of AB, so that we can choose
> appropriate equality operator. The question is what should we choose
> as data type of partition bounds of AB, int8 or int4. This is
> different from applying join conditions between AB and C, which can
> choose the right opfamily operator based on the join conditions.

Well, the join is actually being performed either on A.keycol =
C.keycol or on B.keycol = C.keycol, right?  It has to be one or the
other; there's no "merged" join column in any relation's targetlist,
but only columns derived from the various baserels.  So let's use that
set of bounds for the matching.  It makes sense to use the set of
bounds for the matching that corresponds to the column actually being
joined, I think.

It's late here and I'm tired, but it seems like it should be possible
to relate the child joinrels of the AB join back to the child joinrels
of either A or B.  (AB)1 .. (AB)10 related back to A1 .. A10 and B1 ..
B10.  (AB)11 relates back to B11 but, of course not to A11, which
doesn't exist.  If the join is INNER, (AB)11 is a dummy rel anyway and
actually we should probably see whether we can omit it altogether.  If
the join is an outer join of some kind, there's an interesting case
where the user wrote A LEFT JOIN B or B RIGHT JOIN A so that A is not
on the nullable side of the join; in that case, too, (AB)11 is dummy
or nonexistent.  Otherwise, assuming A is nullable, (AB)11 maps only
to B11 and not to A11.  But that's absolutely right: if the join to C
uses A.keycol, either the join operator is strict and (AB)11 won't
match anything anyway, or it's not and partition-wise join is illegal
because A.keycol in (AB)11 can include not only values from X11 but
also nulls.

So, it seems to me that what you can do is loop over the childrels on
the outer side of the join.  For each one, you've got a join clause
that relates the outer rel to the inner rel, and that join clause
mentions some baserel which is contained in the joinrel.  So drill
down through the childrel to the corresponding partition of the
baserel and get those bounds.  Then if you do the same thing for the
inner childrels, you've now got two lists of bounds, and the type on
the left matches the outer side of the join and the type on the right
matches the inner side of the join and the opfamily of the operator in
the join clause gives you a comparison operator that relates those two
types, and now you can match them up.

(We should also keep in mind the case where there are multiple columns
in the partition key.)

> This seems to suggest that we can not come up with merged bounds for
> join if the partition key types of joining relations differ.

Yes, I think that would be difficult.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] scram and \password

2017-04-21 Thread Michael Paquier
On Sat, Apr 22, 2017 at 5:04 AM, Heikki Linnakangas  wrote:
> I'll continue reviewing the rest of the patch on Monday, but one glaring
> issue is that we cannot add an argument to the existing libpq
> PQencryptPassword() function, because that's part of the public API. It
> would break all applications that use PQencryptPassword().
>
> What we need to do is to add a new function. What should we call that? We
> don't have a tradition of using "Ex" or such suffix to mark extended
> versions of existing functions. PQencryptPasswordWithMethod(user, pass,
> method) ?

Do we really want to add a new function or have a hard failure? Any
application calling PQencryptPassword may silently trap itself if the
server uses scram in its hba configuration or if the default is switched
to that; from this point of view extending the function makes sense as well.
-- 
Michael




[HACKERS] multithreading in Batch/pipelining mode for libpq

2017-04-21 Thread Ilya Roublev
Hello, 

I'm investigating Batch/pipelining mode for libpq connected to this thread: 
https://www.postgresql.org/message-id/flat/CADT4RqA6XoDCVY-G13ME1oRVshE2oNk4fRHKZC0K-jJymQJV0Q%40mail.gmail.com#cadt4rqa6xodcvy-g13me1orvshe2onk4frhkzc0k-jjymqj...@mail.gmail.com
And I'm wondering what the possibilities are to speed up execution of 
asynchronous queries even more. What I need is to make a huge number of inserts 
(or other sorts of queries either returning no results or whose results 
I'd like to ignore completely) very quickly, so the idea of using batch 
mode sounds rather good to me, taking into account 
https://blog.2ndquadrant.com/postgresql-latency-pipelining-batching/

So I'd like to create the corresponding prepared statement and to perform 
asynchronously all these queries through this prepared statement. What I'm 
trying to do now is to understand (at least in general):
1) is it possible technically (possibly by changing some part of libpq code) to 
ignore results (especially for queries like insert), handling 
separately the situation when some error occurs? 
2) if the answer to the previous question is negative, is it possible to send 
asynchronous queries in one thread while reading results in another thread? I 
understand that this immediately contradicts the requirement not to use the 
same PGconn object in different threads. But maybe some locking of the PGconn 
object is possible, so that the state of PGconn cannot be modified in one 
thread while another holds the lock, and vice versa?

Naturally I failed to implement this without locking: info in PGconn very 
quickly becomes inconsistent, the number of queries sent does not correspond to 
the number of results to be read, etc. So I'd like to know first whether it is 
possible at all (possibly by some changes to be made in libpq). Sorry if my 
idea sounds rather naive. And thanks for your answer and advice. 

Only if all this does not sound too stupid and something can be done, I'd 
like to ask for some details, if you do not mind.

Thank you very much again in advance.

With my best regards,
Ilya



Re: [HACKERS] [COMMITTERS] pgsql: Replication lag tracking for walsenders

2017-04-21 Thread Andres Freund
On 2017-04-21 17:04:08 -0400, Tom Lane wrote:
> Some excavation in the buildfarm database says that the coverage for
> the recovery-check test has been mighty darn thin up until just
> recently.

Hm, not good. Just enabled it for most of my animals (there seems to be
no point in having it on the C89 animal).  Very curious what it'll do to
skink's runtime :/.  I'll add a few more cores to the machine, otherwise
it'll probably wreak havoc.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Replication lag tracking for walsenders

2017-04-21 Thread Tom Lane
Alvaro Herrera  writes:
> Simon Riggs wrote:
>> Replication lag tracking for walsenders
>> 
>> Adds write_lag, flush_lag and replay_lag cols to pg_stat_replication.

> Did anyone notice that this seems to be causing buildfarm member 'tern'
> to fail the recovery check?  See here:

> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=tern&dt=2017-04-21%2012%3A48%3A09&stg=recovery-check
> which has
> TRAP: FailedAssertion("!(lsn >= prev.lsn)", File: "walsender.c", Line: 3331)

> Line 3331 was added by this commit.

Note that while that commit was some time back, tern has only just started
running recovery-check, following its update to the latest buildfarm
script.  It looks like it's run that test four times and failed twice,
so far.  So, not 100% reproducible, but there's something rotten there.
Timing-dependent, maybe?

Some excavation in the buildfarm database says that the coverage for
the recovery-check test has been mighty darn thin up until just recently.
These are all the reports we have:

pgbfprod=> select sysname, min(snapshot) as oldest, count(*) from 
build_status_log where log_stage = 'recovery-check.log' group by 1 order by 2;
 sysname  |       oldest        | count 
----------+---------------------+-------
 hamster  | 2016-03-01 02:34:26 |   182
 crake| 2017-04-09 01:58:15 |80
 nightjar | 2017-04-11 15:54:34 |52
 longfin  | 2017-04-19 16:29:39 | 9
 hornet   | 2017-04-20 14:12:08 | 4
 mandrill | 2017-04-20 14:14:08 | 4
 sungazer | 2017-04-20 14:16:08 | 4
 tern | 2017-04-20 14:18:08 | 4
 prion| 2017-04-20 14:23:05 | 8
 jacana   | 2017-04-20 15:00:17 | 3
(10 rows)

So, other than hamster which is certainly going to have its own spin
on the timing question, we have next to no track record for this test.
I wouldn't bet that this issue is unique to tern; more likely, that's
just the first critter to show an intermittent issue.

regards, tom lane




Re: [HACKERS] Old versions of Test::More

2017-04-21 Thread Tom Lane
Peter Eisentraut  writes:
> On 4/21/17 14:49, Andrew Dunstan wrote:
>> I'll add a comment, but doing it in PostgresNode.pm would mean jacana
>> (for instance) couldn't run any of the TAP tests. I'm looking at
>> installing a sufficiently modern Test::Simple package (includes
>> Test::More and Test::Builder) there, but other oldish machines could also
>> be affected.

> Or you could define note() as an empty function if it doesn't exist.

+1.  I'm really not at all happy with the prospect that every time
somebody adds a use of "note" to some new TAP test, we're going to
get a complaint later that that test no longer works on jacana.
We need to either decide that non-ancient Test::More is a hard
requirement for all the tests, or fix things with a centralized
solution.  A dummy (or not so dummy?) implementation would serve
for the latter.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Tom Lane  writes:
> Andres Freund  writes:
>> On 2017-04-21 14:08:21 -0400, Tom Lane wrote:
>>> but I see that SUSv2
>>> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own
>>> coding rules it ought to be okay to assume they're there.  I'm tempted to
>>> rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see
>>> if anything in the buildfarm complains.

>> +1

> Done, we'll soon see what happens.

Should have seen this coming, I guess: some of the Windows critters are
falling over, apparently because they lack fcntl() altogether.  So the
#ifdef F_SETFD was really acting as a proxy for "#ifdef HAVE_FCNTL".

There's no HAVE_FCNTL test in configure ATM, and I'm thinking it would
be pretty silly to add one, since surely it would succeed on anything
Unix-y enough to run the configure script.

I'm guessing the best thing to do is put back #ifdef F_SETFD;
alternatively we might spell it like "#ifndef WIN32", but I'm unsure
if that'd do what we want on Cygwin or MinGW.

In non-Windows code paths in latch.c, we probably wouldn't need to
bother with #ifdef F_SETFD.

Hopefully we can leave in the removal of "#define FD_CLOEXEC".
Will wait a bit longer for more results.

regards, tom lane




Re: [HACKERS] scram and \password

2017-04-21 Thread Heikki Linnakangas

On 04/10/2017 08:42 AM, Michael Paquier wrote:

As there have been some conflicts because of the commit of SASLprep,
here is a rebased set of patches.


I've committed a modified version of the first patch, to change the 
on-disk format to RFC 5803, as mentioned on the other thread 
(https://www.postgresql.org/message-id/351ba574-85ea-d9b8-9689-8c928dd09...@iki.fi).


I'll continue reviewing the rest of the patch on Monday, but one glaring 
issue is that we cannot add an argument to the existing libpq 
PQencryptPassword() function, because that's part of the public API. It 
would break all applications that use PQencryptPassword().


What we need to do is to add a new function. What should we call that? 
We don't have a tradition of using "Ex" or such suffix to mark extended 
versions of existing functions. PQencryptPasswordWithMethod(user, pass, 
method) ?


- Heikki





Re: [HACKERS] On-disk format of SCRAM verifiers

2017-04-21 Thread Heikki Linnakangas

On 04/21/2017 05:33 PM, Simon Riggs wrote:

On 21 April 2017 at 14:42, Heikki Linnakangas  wrote:


SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey>


Could you explain where you are looking? I don't see that in RFC5803



From 1.  Overview:

Yeah, it's not easy to see, I missed it earlier too. You have to look at RFC 5803 and RFC 3112 together. RFC 3112 says that the overall format is 
"<scheme>$<authInfo>$<authValue>", and RFC 5803 says that for SCRAM, scheme is "SCRAM-SHA-256" (for our variant), authInfo 
is "<iteration count>:<salt>" and authValue is "<StoredKey>:<ServerKey>".

They really should've included examples in those RFCs.
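Since neither RFC shows a worked example, here is a hedged sketch of how the pieces compose. The helper name and the base64 payloads are placeholders invented for illustration, not output of a real SCRAM computation:

```c
#include <stdio.h>
#include <string.h>

/*
 * Compose an RFC 3112 / RFC 5803-style verifier string:
 *   <scheme>$<authInfo>$<authValue>
 * where, for SCRAM-SHA-256, authInfo is "<iterations>:<salt>" and
 * authValue is "<StoredKey>:<ServerKey>".  The base64 arguments are
 * assumed to be already-encoded placeholders.
 */
static int
format_scram_verifier(char *buf, size_t buflen, int iterations,
                      const char *salt_b64, const char *storedkey_b64,
                      const char *serverkey_b64)
{
    int n = snprintf(buf, buflen, "SCRAM-SHA-256$%d:%s$%s:%s",
                     iterations, salt_b64, storedkey_b64, serverkey_b64);

    /* report truncation or encoding failure as -1 */
    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

With placeholder base64 values, `format_scram_verifier(buf, sizeof(buf), 4096, "c2FsdA==", "c3RvcmVk", "c2VydmVy")` produces `SCRAM-SHA-256$4096:c2FsdA==$c3RvcmVk:c2VydmVy`.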


Thanks

+1 for change


Committed, thanks.

- Heikki





Re: [HACKERS] Old versions of Test::More

2017-04-21 Thread Peter Eisentraut
On 4/21/17 14:49, Andrew Dunstan wrote:
> I'll add a comment, but doing it in PostgresNode.pm would mean jacana
> (for instance) couldn't run any of the TAP tests. I'm looking at
> installing a sufficiently modern Test::Simple package (includes
> Test::More and Test::Builder) there, but other oldish machines could also
> be affected.

Or you could define note() as an empty function if it doesn't exist.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Replication lag tracking for walsenders

2017-04-21 Thread Alvaro Herrera
Simon Riggs wrote:
> Replication lag tracking for walsenders
> 
> Adds write_lag, flush_lag and replay_lag cols to pg_stat_replication.

Did anyone notice that this seems to be causing buildfarm member 'tern'
to fail the recovery check?  See here:

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=tern&dt=2017-04-21%2012%3A48%3A09&stg=recovery-check
which has
TRAP: FailedAssertion("!(lsn >= prev.lsn)", File: "walsender.c", Line: 3331)

Line 3331 was added by this commit.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] some review comments on logical rep code

2017-04-21 Thread Fujii Masao
On Fri, Apr 21, 2017 at 4:02 PM, Masahiko Sawada  wrote:
> On Fri, Apr 21, 2017 at 1:19 AM, Fujii Masao  wrote:
>> On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada  
>> wrote:
>>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>>>  wrote:
 Hi,

 Thank you for the revised version.

 At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada 
  wrote in 
 
> On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada  
> wrote:
> > On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
> >  wrote:
> >> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada 
> >>  wrote in 
> >> 

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-21 14:08:21 -0400, Tom Lane wrote:
>> but I see that SUSv2
>> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own
>> coding rules it ought to be okay to assume they're there.  I'm tempted to
>> rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see
>> if anything in the buildfarm complains.

> +1

Done, we'll soon see what happens.

In the same area, I noticed that POSIX does not say that the success
result for fcntl(F_SETFD) and related cases is 0.  It says that the
failure result is -1 and the success result is some other value.
We seem to have this right in most places, but e.g. port/noblock.c
gets it wrong.  The lack of field complaints implies that just about
everybody actually does return 0 on success, but I still think it
would be a good idea to run around and make all the calls test
specifically for -1.
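The point about testing specifically for -1 can be sketched concretely. This is an illustrative helper (the name `set_cloexec` is invented here, not PostgreSQL code); per POSIX, a successful fcntl(F_SETFD) may return any value other than -1:

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Set FD_CLOEXEC on a descriptor.  POSIX only promises that fcntl()
 * returns -1 on failure; a successful F_SETFD may return a nonzero
 * value, so test for == -1 rather than != 0.
 */
static int
set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
        return -1;
    return 0;
}
```

Reading the existing flags first and OR-ing in FD_CLOEXEC also avoids clobbering any other descriptor flags that might be set.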

regards, tom lane




Re: [HACKERS] Old versions of Test::More

2017-04-21 Thread Andrew Dunstan


On 04/21/2017 10:45 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> As we discovered yesterday via jacana, very old versions of Test::More
>> don't support the note() function. It therefore seems prudent to specify
>> a minimum version number for the module in those scripts that use the
>> function. According to the changelog, version 0.82 (released in Oct
>> 2008) should be sufficient. So I suggest the attached patch.
> Maybe it'd be better to put the minimum-version check somewhere central,
> like PostgresNode.pm?  I suppose that our use of "note" will expand,
> and I can't see us remembering to put this into all and only the specific
> tests that use "note".  Especially if there's no comment about it.
>
>   


I'll add a comment, but doing it in PostgresNode.pm would mean jacana
(for instance) couldn't run any of the TAP tests. I'm looking at
installing a sufficiently modern Test::Simple package (includes
Test::More and Test::Builder) there, but other oldish machines could also
be affected.


cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





[HACKERS] Missing feature in Phrase Search?

2017-04-21 Thread Josh Berkus
Oleg, Teodor, folks:

I was demo'ing phrase search for a meetup yesterday, and the user
feedback I got showed that there's a missing feature with phrase search.
 Let me explain by example:


'fix <-> error' will match 'fixed error', 'fixing error'
but not 'fixed language error' or 'fixed a small error'

'fix <2> error' will match 'fixed language error',
but not 'fixing error' or 'fixed a small error'

'fix <3> error' will match 'fixed a small error',
but not any of the other strings.


This is because the # in <#> is an exact match.

Seems like we could really use a way for users to indicate that they
want a range of word gaps.  Like, in the example above, users could
search on:

'fix <1:3> error'

... which would search for any phrase where "error" followed "fix" by
between 1 and 3 words.

Not wedded to any particular syntax for that, of course.

-- 
Josh Berkus
Containers & Databases Oh My!




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
Tom,

On 2017-04-21 13:49:27 -0400, Tom Lane wrote:
> >> * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
> >> -   * any new connections, so we don't call select(), and just 
> >> sleep.
> >> +   * any new connections, so we don't call WaitEventSetWait(), 
> >> and just
> >> +   * sleep.  XXX not ideal
> >> */
> 
> > Couldn't we just deactivate the sockets in the set instead?
> 
> Yeah, I think it'd be better to do something like that.  The pg_usleep
> call has the same issue of possibly not responding to interrupts.  The
> risks are a lot less, since it's a much shorter wait, but I would rather
> eliminate the separate code path in favor of doing it honestly.  Didn't
> seem like something to fuss over in the first draft though.

Ok, cool.


> >> +  ServerWaitSet = CreateWaitEventSet(PostmasterContext, MAXLISTEN + 1);
> 
> > Why are we using MAXLISTEN, rather than the actual number of things to
> > listen to?
> 
> It'd take more code (ie, an additional scan of the array) to predetermine
> that.  I figured the space-per-item in the WaitEventSet wasn't enough to
> worry about ... do you think differently?

I'm not sure.  We do create an epoll handler with enough space, and that
has some overhead. Don't know whether that's worthwhile to care about.


> > I kind of would like, in master, to take a chance on replacing all the
> > work done in signal handlers with just a SetLatch(), and do the work
> > outside of signal handlers instead.  Forking from signal handlers is
> > just plain weird.
> 
> Yeah, maybe it's time.  But in v11, and not for back-patch.

Agreed.


- Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
On 2017-04-21 14:08:21 -0400, Tom Lane wrote:
> but I see that SUSv2
> mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own
> coding rules it ought to be okay to assume they're there.  I'm tempted to
> rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see
> if anything in the buildfarm complains.

+1

- Andres





Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> On 2017-04-21 12:50:04 -0400, Tom Lane wrote:
>>> +#ifndef FD_CLOEXEC
>>> +#define FD_CLOEXEC 1
>>> +#endif

>> Hm? Are we sure this is portable?  Is there really cases that have
>> F_SETFD, but not CLOEXEC?

> Copied-and-pasted from our only existing use of FD_CLOEXEC, in libpq.

Looking closer, that code dates to 

Author: Tom Lane 
Branch: master Release: REL8_0_BR [7627b91cd] 2004-10-21 20:23:19 +

Set the close-on-exec flag for libpq's socket to the backend, to avoid
any possible problems from child programs executed by the client app.
Per suggestion from Elliot Lee of Red Hat.

and while the public discussion about it

https://www.postgresql.org/message-id/flat/18172.1098382248%40sss.pgh.pa.us

doesn't really say, I suspect the specific coding was Elliot's suggestion
as well.  It probably had some value at one time ... but I see that SUSv2
mandates that fcntl.h provide both F_SETFD and FD_CLOEXEC, so by our own
coding rules it ought to be okay to assume they're there.  I'm tempted to
rip out the quoted bit, as well as the #ifdef F_SETFD, from libpq and see
if anything in the buildfarm complains.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-21 12:50:04 -0400, Tom Lane wrote:
>> Attached is a lightly-tested draft patch that converts the postmaster to
>> use a WaitEventSet for waiting in ServerLoop.  I've got mixed emotions
>> about whether this is the direction to proceed, though.  It adds at least
>> a couple of kernel calls per postmaster signal delivery, and probably to
>> every postmaster connection acceptance (ServerLoop iteration), to fix
>> problems that are so corner-casey that we've never even known we had them
>> till now.

> I'm not concerned much about the signal delivery paths, and I can't
> quite imagine that another syscall in the accept path is going to be
> measurable - worth ensuring though.
> ...
> On the other hand most types of our processes do SetLatch() in just
> nearly all the relevant signal handlers anyway, so they're pretty close
> to behaviour already.

True.  Maybe I'm being too worried.

>> * If we are in PM_WAIT_DEAD_END state, then we don't want to accept
>> - * any new connections, so we don't call select(), and just 
>> sleep.
>> + * any new connections, so we don't call WaitEventSetWait(), 
>> and just
>> + * sleep.  XXX not ideal
>> */

> Couldn't we just deactivate the sockets in the set instead?

Yeah, I think it'd be better to do something like that.  The pg_usleep
call has the same issue of possibly not responding to interrupts.  The
risks are a lot less, since it's a much shorter wait, but I would rather
eliminate the separate code path in favor of doing it honestly.  Didn't
seem like something to fuss over in the first draft though.

>> +ServerWaitSet = CreateWaitEventSet(PostmasterContext, MAXLISTEN + 1);

> Why are we using MAXLISTEN, rather than the actual number of things to
> listen to?

It'd take more code (ie, an additional scan of the array) to predetermine
that.  I figured the space-per-item in the WaitEventSet wasn't enough to
worry about ... do you think differently?

> Random note: Do we actually have any code that errors out if too many
> sockets are being listened to?

Yes, see StreamServerPort, about line 400.

> I kind of would like, in master, to take a chance on replacing all the
> work done in signal handlers with just a SetLatch(), and do the work
> outside of signal handlers instead.  Forking from signal handlers is
> just plain weird.

Yeah, maybe it's time.  But in v11, and not for back-patch.

>> +#ifndef FD_CLOEXEC
>> +#define FD_CLOEXEC 1
>> +#endif

> Hm? Are we sure this is portable?  Is there really cases that have
> F_SETFD, but not CLOEXEC?

Copied-and-pasted from our only existing use of FD_CLOEXEC, in libpq.
Might well be obsolete but I see no particular reason not to do it.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> I'm coming back to the idea that at least in the back branches, the
>> thing to do is allow maybe_start_bgworker to start multiple workers.
>> 
>> Is there any actual evidence for the claim that that might have
>> bad side effects?

> Well, I ran tests with a few dozen thousand sample workers and the
> neglect for other things (such as connection requests) was visible, but
> that's probably not a scenario many servers run often currently.

Indeed.  I'm pretty skeptical that that's an interesting case, and if it
is, the current coding is broken anyway, because with that many workers
you are going to start noticing that running maybe_start_bgworker over
again for each worker is an O(N^2) proposition.  Admittedly, iterating
the loop in maybe_start_bgworker is really cheap compared to a fork(),
but eventually the big-O problem is going to eat your lunch.

> I don't strongly object to the idea of removing the "return" in older
> branches, since it's evidently a problem.  However, as bgworkers start
> to be used more, I think we should definitely have some protection.  In
> a system with a large number of workers available for parallel queries,
> it seems possible for a high velocity server to get stuck in the loop
> for some time.  (I haven't actually verified this, though.  My
> experiments were with the early kind, static bgworkers.)

It might be sensible to limit the number of workers launched per call,
but I think the limit should be quite a bit higher than 1 ... something
like 100 or 1000 might be appropriate.
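A per-call launch cap of that sort might look like the following toy sketch. The names and the counter standing in for the real bgworker request list are invented for illustration; actual code would walk the registered-worker list and fork:

```c
/*
 * Toy sketch of draining a start-request queue with a per-invocation
 * cap, so that one pass cannot monopolize the server loop.  The queue
 * here is just a counter; real code would scan the bgworker list.
 */
#define MAX_STARTS_PER_INVOCATION 100

static int pending_starts = 0;  /* stand-in for queued start requests */
static int started_total = 0;

static int
drain_start_requests(void)
{
    int started = 0;

    while (pending_starts > 0 && started < MAX_STARTS_PER_INVOCATION)
    {
        pending_starts--;       /* real code: start one worker here */
        started_total++;
        started++;
    }
    /* caller should re-arm itself if pending_starts is still nonzero */
    return started;
}
```

The key property is that each invocation makes bounded progress, so other postmaster duties (like accepting connections) still get serviced between batches.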

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Andres Freund
Hi,

On 2017-04-21 12:50:04 -0400, Tom Lane wrote:
> Attached is a lightly-tested draft patch that converts the postmaster to
> use a WaitEventSet for waiting in ServerLoop.  I've got mixed emotions
> about whether this is the direction to proceed, though.  It adds at least
> a couple of kernel calls per postmaster signal delivery, and probably to
> every postmaster connection acceptance (ServerLoop iteration), to fix
> problems that are so corner-casey that we've never even known we had them
> till now.

I'm not concerned much about the signal delivery paths, and I can't
quite imagine that another syscall in the accept path is going to be
measurable - worth ensuring though.

I do agree that it's a bit of a big stick for the back-branches...


> A different line of thought is to try to provide a bulletproof solution,
> but push the implementation problems down into latch.c --- that is, the
> goal would be to provide a pselect-like variant of WaitEventSetWait that
> is guaranteed to return if interrupted, as opposed to the current behavior
> where it's guaranteed not to.  But that seems like quite a bit of work.

Seems like a sane idea to me.  The use of latches has already grown due
to parallelism and it'll likely grow further - some of them seem likely
to also have to deal with such concerns.  I'd much rather centralize
things down to a common place.

On the other hand most types of our processes do SetLatch() in just
nearly all the relevant signal handlers anyway, so they're pretty close
to behaviour already.



> Whether or not we decide to change over the postmaster.c code, I think
> it'd likely be a good idea to apply most or all of the attached changes
> in latch.c.  Setting CLOEXEC on the relevant FDs is clearly a good thing,
> and the other changes will provide some safety if some preloaded extension
> decides to create a latch in the postmaster process.

On the principle, I agree.  Reading through the changes now.


> @@ -1667,76 +1656,64 @@ ServerLoop(void)
>* do nontrivial work.
>*
>* If we are in PM_WAIT_DEAD_END state, then we don't want to 
> accept
> -  * any new connections, so we don't call select(), and just 
> sleep.
> +  * any new connections, so we don't call WaitEventSetWait(), 
> and just
> +  * sleep.  XXX not ideal
>*/

Couldn't we just deactivate the sockets in the set instead?


>  /*
> - * Initialise the masks for select() for the ports we are listening on.
> - * Return the number of sockets to listen on.
> + * Create a WaitEventSet for ServerLoop() to wait on.  This includes the
> + * sockets we are listening on, plus the PostmasterLatch.
>   */
> -static int
> -initMasks(fd_set *rmask)
> +static void
> +initServerWaitSet(void)
>  {
> - int maxsock = -1;
>   int i;
>  
> - FD_ZERO(rmask);
> + ServerWaitSet = CreateWaitEventSet(PostmasterContext, MAXLISTEN + 1);

Why are we using MAXLISTEN, rather than the actual number of things to
listen to?  The only benefit of this seems to be that we could
theoretically allow dynamic reconfiguration of the sockets a bit more
easily in the future, but that could just as well be done by recreating the set.

Random note: Do we actually have any code that errors out if too many
sockets are being listened to?


> @@ -2553,6 +2543,9 @@ SIGHUP_handler(SIGNAL_ARGS)
>  #endif
>   }
>  
> + /* Force ServerLoop to iterate */
> + SetLatch(&PostmasterLatch);
> +
>   PG_SETMASK(&UnBlockSig);
>  
>   errno = save_errno;
> @@ -2724,6 +2717,9 @@ pmdie(SIGNAL_ARGS)
>   break;
>   }
>  
> + /* Force ServerLoop to iterate */
> + SetLatch(&PostmasterLatch);
> +
>   PG_SETMASK(&UnBlockSig);
>  
>   errno = save_errno;
> @@ -3037,6 +3033,9 @@ reaper(SIGNAL_ARGS)
>*/
>   PostmasterStateMachine();
>  
> + /* Force ServerLoop to iterate */
> + SetLatch(&PostmasterLatch);
> +
>   /* Done with signal handler */
>   PG_SETMASK(&UnBlockSig);
>  
> @@ -5078,6 +5077,9 @@ sigusr1_handler(SIGNAL_ARGS)
>   signal_child(StartupPID, SIGUSR2);
>   }
>  
> + /* Force ServerLoop to iterate */
> + SetLatch(&PostmasterLatch);
> +
>   PG_SETMASK(&UnBlockSig);
>  
>   errno = save_errno;

I kind of would like, in master, to take a chance on replacing all the
work done in signal handlers with just a SetLatch(), and do the work
outside of signal handlers instead.  Forking from signal handlers is
just plain weird.



> diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
> index 4798370..92d2ff0 100644
> --- a/src/backend/storage/ipc/latch.c
> +++ b/src/backend/storage/ipc/latch.c
> @@ -62,6 +62,10 @@
>  #include "storage/pmsignal.h"
>  #include "storage/shmem.h"
>  
> +#ifndef FD_CLOEXEC
> +#define FD_CLOEXEC 1
> +#endif

Hm? Are we sure this is portable?  Is there really cases that have
F_SETFD, but not CLOEXEC?

Greetings,

Andres Freund



Re: [HACKERS] Range Merge Join v1

2017-04-21 Thread Andrew Borodin
Hi, Jeff!

I'm planning to provide a review for this patch in the future. I've done
some performance testing (see attachment).
All in all, nothing important, everything works fine.
Tests were executed under current HEAD on Ubuntu 16 over MacBook Air.

I observe that:
1. Patch brings performance improvement for specified query
2. Performance improvement of Range Merge Join vs GiST Nested Loop
Join (on Index Only Scan) is between 4x and 6x
3. Performance improvement of RMJ with B-tree index vs GiST is between
4x and 10x
(due to lack of actually competing cases, here I omit T-tests, they
are not that important when speaking about 4x difference)
4. Range Merge Join is capable to consume expressional index with
exactly same effect as direct index
5. Other attributes residing in joined tables do not affect
performance improvement

I'll do the code review some time later, probably next week. (I hope
there is no urgency in this, since the commitfest is in July, or is it
urgent?) BTW, I fixed a typo in the index specification of the original
performance test (both indexes were for the same table).

Best regards, Andrey Borodin.


perf2.sql
Description: Binary data



Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Attached is a lightly-tested draft patch that converts the postmaster to
use a WaitEventSet for waiting in ServerLoop.  I've got mixed emotions
about whether this is the direction to proceed, though.  It adds at least
a couple of kernel calls per postmaster signal delivery, and probably to
every postmaster connection acceptance (ServerLoop iteration), to fix
problems that are so corner-casey that we've never even known we had them
till now.

I'm looking longingly at pselect(2), ppoll(2), and epoll_pwait(2), which
would solve the problem without need for a self-pipe.  But the latter two
are Linux-only, and while pselect does exist in recent POSIX editions,
it's nonetheless got portability issues.  Googling suggests that on a
number of platforms, pselect is non-atomic, ie it's nothing but a
wrapper for sigprocmask/select/sigprocmask, which would mean that the
race condition I described in my previous message still exists.

Despite that, a pretty attractive proposal is to do, essentially,

#ifdef HAVE_PSELECT
selres = pselect(nSockets, &rmask, NULL, NULL, &timeout, &UnBlockSig);
#else
PG_SETMASK(&UnBlockSig);
selres = select(nSockets, &rmask, NULL, NULL, &timeout);
PG_SETMASK(&BlockSig);
#endif

This fixes the race on platforms where pselect exists and is correctly
implemented, and we're no worse off than before where that's not true.

The other component of the problem is the possibility that select() will
restart if the signal is marked SA_RESTART.  (Presumably that would apply
to pselect too.)  I am thinking that maybe the answer is "if it hurts,
don't do it" --- that is, in the postmaster maybe we shouldn't use
SA_RESTART, at least not for these signals.

A different line of thought is to try to provide a bulletproof solution,
but push the implementation problems down into latch.c --- that is, the
goal would be to provide a pselect-like variant of WaitEventSetWait that
is guaranteed to return if interrupted, as opposed to the current behavior
where it's guaranteed not to.  But that seems like quite a bit of work.

Whether or not we decide to change over the postmaster.c code, I think
it'd likely be a good idea to apply most or all of the attached changes
in latch.c.  Setting CLOEXEC on the relevant FDs is clearly a good thing,
and the other changes will provide some safety if some preloaded extension
decides to create a latch in the postmaster process.

regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6831342..658ba73 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***
*** 79,88 
  #include 
  #include 
  
- #ifdef HAVE_SYS_SELECT_H
- #include <sys/select.h>
- #endif
- 
  #ifdef USE_BONJOUR
  #include <dns_sd.h>
  #endif
--- 79,84 
*** static bool LoadedSSL = false;
*** 379,384 
--- 375,386 
  static DNSServiceRef bonjour_sdref = NULL;
  #endif
  
+ /* WaitEventSet that ServerLoop uses to wait for connections */
+ static WaitEventSet *ServerWaitSet = NULL;
+ 
+ /* Latch used to ensure that signals interrupt the wait in ServerLoop */
+ static Latch PostmasterLatch;
+ 
  /*
   * postmaster.c - function prototypes
   */
*** static int	ServerLoop(void);
*** 409,415 
  static int	BackendStartup(Port *port);
  static int	ProcessStartupPacket(Port *port, bool SSLdone);
  static void processCancelRequest(Port *port, void *pkt);
! static int	initMasks(fd_set *rmask);
  static void report_fork_failure_to_client(Port *port, int errnum);
  static CAC_state canAcceptConnections(void);
  static bool RandomCancelKey(int32 *cancel_key);
--- 411,417 
  static int	BackendStartup(Port *port);
  static int	ProcessStartupPacket(Port *port, bool SSLdone);
  static void processCancelRequest(Port *port, void *pkt);
! static void initServerWaitSet(void);
  static void report_fork_failure_to_client(Port *port, int errnum);
  static CAC_state canAcceptConnections(void);
  static bool RandomCancelKey(int32 *cancel_key);
*** checkDataDir(void)
*** 1535,1541 
  }
  
  /*
!  * Determine how long should we let ServerLoop sleep.
   *
   * In normal conditions we wait at most one minute, to ensure that the other
   * background tasks handled by ServerLoop get done even when no requests are
--- 1537,1543 
  }
  
  /*
!  * Determine how long should we let ServerLoop sleep (in msec).
   *
   * In normal conditions we wait at most one minute, to ensure that the other
   * background tasks handled by ServerLoop get done even when no requests are
*** checkDataDir(void)
*** 1543,1551 
   * we don't actually sleep so that they are quickly serviced.  Other exception
   * cases are as shown in the code.
   */
! static void
! DetermineSleepTime(struct timeval * timeout)
  {
  	TimestampTz next_wakeup = 0;
  
  	/*
--- 1545,1554 
   * we don't actually sleep so that they are quickly serviced.  Other exception
   * cases are as shown in the code.
   */
! static long
! 

Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Alvaro Herrera
Tom Lane wrote:

> After sleeping and thinking more, I've realized that the
> slow-bgworker-start issue actually exists on *every* platform, it's just
> harder to hit when select() is interruptable.  But consider the case
> where multiple bgworker-start requests arrive while ServerLoop is
> actively executing (perhaps because a connection request just came in).
> The postmaster has signals blocked, so nothing happens for the moment.
> When we go around the loop and reach
> 
> PG_SETMASK(&UnBlockSig);
> 
> the pending SIGUSR1 is delivered, and sigusr1_handler reads all the
> bgworker start requests, and services just one of them.  Then control
> returns and proceeds to
> 
> selres = select(nSockets, &rmask, NULL, NULL, &timeout);
> 
> But now there's no interrupt pending.  So the remaining start requests
> do not get serviced until (a) some other postmaster interrupt arrives,
> or (b) the one-minute timeout elapses.  They could be waiting awhile.
> 
> Bottom line is that any request for more than one bgworker at a time
> faces a non-negligible risk of suffering serious latency.

Interesting.  It's hard to hit, for sure.

> I'm coming back to the idea that at least in the back branches, the
> thing to do is allow maybe_start_bgworker to start multiple workers.
>
> Is there any actual evidence for the claim that that might have
> bad side effects?

Well, I ran tests with a few dozen thousand sample workers and the
neglect for other things (such as connection requests) was visible, but
that's probably not a scenario many servers run often currently.  I
don't strongly object to the idea of removing the "return" in older
branches, since it's evidently a problem.  However, as bgworkers start
to be used more, I think we should definitely have some protection.  In
a system with a large number of workers available for parallel queries,
it seems possible for a high velocity server to get stuck in the loop
for some time.  (I haven't actually verified this, though.  My
experiments were with the early kind, static bgworkers.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unportable implementation of background worker start

2017-04-21 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-20 00:50:13 -0400, Tom Lane wrote:
>> My first reaction was that that sounded like a lot more work than removing
>> two lines from maybe_start_bgworker and adjusting some comments.  But on
>> closer inspection, the slow-bgworker-start issue isn't the only problem
>> here.

> FWIW, I vaguely remember somewhat related issues on x86/linux too.

After sleeping and thinking more, I've realized that the
slow-bgworker-start issue actually exists on *every* platform, it's just
harder to hit when select() is interruptable.  But consider the case
where multiple bgworker-start requests arrive while ServerLoop is
actively executing (perhaps because a connection request just came in).
The postmaster has signals blocked, so nothing happens for the moment.
When we go around the loop and reach

PG_SETMASK(&UnBlockSig);

the pending SIGUSR1 is delivered, and sigusr1_handler reads all the
bgworker start requests, and services just one of them.  Then control
returns and proceeds to

selres = select(nSockets, &rmask, NULL, NULL, &timeout);

But now there's no interrupt pending.  So the remaining start requests
do not get serviced until (a) some other postmaster interrupt arrives,
or (b) the one-minute timeout elapses.  They could be waiting awhile.

Bottom line is that any request for more than one bgworker at a time
faces a non-negligible risk of suffering serious latency.

I'm coming back to the idea that at least in the back branches, the
thing to do is allow maybe_start_bgworker to start multiple workers.
Is there any actual evidence for the claim that that might have
bad side effects?

regards, tom lane




Re: [HACKERS] Old versions of Test::More

2017-04-21 Thread Tom Lane
Andrew Dunstan  writes:
> As we discovered yesterday via jacana, very old versions of Test::More
> don't support the note() function. It therefore seems prudent to specify
> a minimum version number for the module in those scripts that use the
> function. According to the changelog, version 0.82 (released in Oct
> 2008) should be sufficient. So I suggest the attached patch.

Maybe it'd be better to put the minimum-version check somewhere central,
like PostgresNode.pm?  I suppose that our use of "note" will expand,
and I can't see us remembering to put this into all and only the specific
tests that use "note".  Especially if there's no comment about it.

regards, tom lane




Re: [HACKERS] Old versions of Test::More

2017-04-21 Thread Andrew Dunstan


On 04/21/2017 10:36 AM, Daniel Gustafsson wrote:
>> On 21 Apr 2017, at 15:08, Andrew Dunstan  
>> wrote:
>>
>> As we discovered yesterday via jacana, very old versions of Test::More
>> don't support the note() function. It therefore seems prudent to specify
>> a minimum version number for the module in those scripts that use the
>> function. According to the changelog, version 0.82 (released in Oct
>> 2008) should be sufficient. So I suggest the attached patch.
> +1 for specifying version (0.82 was replaced with 0.84 within hours but it
> doesn’t really matter for this).  However, since src/test/ssl/ServerSetup.pm
> also invoke note() doesn’t it make sense to add the version requirement there
> as well as a guard in case a testscript using ServerSetup isn’t explicitly
> specifying the Test::More version?
>


Yeah, good catch. Will do.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Old versions of Test::More

2017-04-21 Thread Daniel Gustafsson
> On 21 Apr 2017, at 15:08, Andrew Dunstan  
> wrote:
> 
> As we discovered yesterday via jacana, very old versions of Test::More
> don't support the note() function. It therefore seems prudent to specify
> a minimum version number for the module in those scripts that use the
> function. According to the changelog, version 0.82 (released in Oct
> 2008) should be sufficient. So I suggest the attached patch.

+1 for specifying version (0.82 was replaced with 0.84 within hours but it
doesn’t really matter for this).  However, since src/test/ssl/ServerSetup.pm
also invoke note() doesn’t it make sense to add the version requirement there
as well as a guard in case a testscript using ServerSetup isn’t explicitly
specifying the Test::More version?

cheers ./daniel



test-more-version-v2.diff
Description: Binary data



Re: [HACKERS] On-disk format of SCRAM verifiers

2017-04-21 Thread Simon Riggs
On 21 April 2017 at 14:42, Heikki Linnakangas  wrote:

 SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey>
>>>
>>> Could you explain where you are looking? I don't see that in RFC5803
>>
> From 1.  Overview:
>
> Yeah, it's not easy to see, I missed it earlier too. You have to look at RFC 
> 5803 and RFC 3112 together. RFC 3112 says that the overall format is 
> "<scheme>$<authInfo>$<authValue>", and RFC 5803 says that for SCRAM, scheme 
> is "SCRAM-SHA-256" (for our variant), authInfo is "<iteration count>:<salt>" 
> and authValue is "<StoredKey>:<ServerKey>"
>
> They really should've included examples in those RFCs.

Thanks

+1 for change

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?

2017-04-21 Thread Petr Jelinek
On 21/04/17 16:23, Peter Eisentraut wrote:
> On 4/21/17 10:11, Petr Jelinek wrote:
>> On 21/04/17 16:09, Peter Eisentraut wrote:
>>> On 4/20/17 14:29, Petr Jelinek wrote:
 +  /* Find unused worker slot. */
 +  if (!w->in_use)
{
 -  worker = &LogicalRepCtx->workers[slot];
 -  break;
 +  worker = w;
 +  slot = i;
 +  }
>>>
>>> Doesn't this still need a break?  Otherwise it always picks the last slot.
>>>
>>
>> Yes it will pick the last slot, does that matter though, is the first
>> one better somehow?
>>
>> We can't break because we also need to continue the counter (I think the
>> issue that the counter solves is probably just theoretical, but still).
> 
> I see.  I think the code would be less confusing if we break the loop
> like before and call logicalrep_sync_worker_count() separately.
> 
>> Hmm actually, maybe the if (!w->in_use) should be if (worker == NULL &&
>> !w->in_use)?
> 
> That would also do it.  But it's getting a bit fiddly.
> 

I just wanted to avoid looping twice, especially since the garbage
collecting code has to also do the loop. I guess I'll go with my
original coding for this then which was to put retry label above the
loop first, then try finding worker slot, if found call the
logicalrep_sync_worker_count and if not found do the garbage collection
and if we cleaned up something then goto retry.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?

2017-04-21 Thread Peter Eisentraut
On 4/21/17 10:11, Petr Jelinek wrote:
> On 21/04/17 16:09, Peter Eisentraut wrote:
>> On 4/20/17 14:29, Petr Jelinek wrote:
>>> +   /* Find unused worker slot. */
>>> +   if (!w->in_use)
>>> {
>>> -   worker = &LogicalRepCtx->workers[slot];
>>> -   break;
>>> +   worker = w;
>>> +   slot = i;
>>> +   }
>>
>> Doesn't this still need a break?  Otherwise it always picks the last slot.
>>
> 
> Yes it will pick the last slot, does that matter though, is the first
> one better somehow?
> 
> We can't break because we also need to continue the counter (I think the
> issue that the counter solves is probably just theoretical, but still).

I see.  I think the code would be less confusing if we break the loop
like before and call logicalrep_sync_worker_count() separately.

> Hmm actually, maybe the if (!w->in_use) should be if (worker == NULL &&
> !w->in_use)?

That would also do it.  But it's getting a bit fiddly.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-21 Thread Masahiko Sawada
On Fri, Apr 21, 2017 at 5:33 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] subscription worker doesn't start immediately on eabled

2017-04-21 Thread Peter Eisentraut
On 4/6/17 08:24, Kyotaro HORIGUCHI wrote:
> Hello. I found dubious behavior while playing with logical
> replication.
> 
> When we disable a subscription, replication worker immediately stops.
> 
> =# ALTER SUBSCRIPTION s1 DISABLE;
> 
> On the other hand even if we enable a subscription, worker
> doesn't start immediately. It takes 3 minutes in the worst
> case. (DEFAULT_NAPTIME_PER_CYCLE)
> 
> The attached patch wakes up launcher when a subscription is
> enabled. This fails when a subscription is enabled immediately
> after disabling but it won't be a matter.

What do you mean by "this fails"?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?

2017-04-21 Thread Petr Jelinek
On 21/04/17 16:09, Peter Eisentraut wrote:
> On 4/20/17 14:29, Petr Jelinek wrote:
>> +/* Find unused worker slot. */
>> +if (!w->in_use)
>>  {
>> -worker = &LogicalRepCtx->workers[slot];
>> -break;
>> +worker = w;
>> +slot = i;
>> +}
> 
> Doesn't this still need a break?  Otherwise it always picks the last slot.
> 

Yes it will pick the last slot, does that matter though, is the first
one better somehow?

We can't break because we also need to continue the counter (I think the
issue that the counter solves is probably just theoretical, but still).

Hmm actually, maybe the if (!w->in_use) should be if (worker == NULL &&
!w->in_use)?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?

2017-04-21 Thread Peter Eisentraut
On 4/20/17 22:24, Noah Misch wrote:
> On Mon, Apr 17, 2017 at 02:09:54PM -0400, Peter Eisentraut wrote:
>> I think we're not really sure yet what to do about this.  Discussion is
>> ongoing.  I'll report back on Wednesday.
> 
> This PostgreSQL 10 open item is past due for your status update, and this is
> overall the seventh time you have you allowed one of your open items to go out
> of compliance.  Kindly start complying with the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

We are actively working on it.  It should be resolved within a few days.
 Next check-in Monday.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?

2017-04-21 Thread Peter Eisentraut
On 4/20/17 14:29, Petr Jelinek wrote:
> + /* Find unused worker slot. */
> + if (!w->in_use)
>   {
> - worker = &LogicalRepCtx->workers[slot];
> - break;
> + worker = w;
> + slot = i;
> + }

Doesn't this still need a break?  Otherwise it always picks the last slot.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-21 Thread Petr Jelinek
On 21/04/17 10:33, Kyotaro HORIGUCHI wrote:
> Hello,
> 
> At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-21 Thread Petr Jelinek
On 21/04/17 04:38, Masahiko Sawada wrote:
> On Thu, Apr 20, 2017 at 8:43 PM, Petr Jelinek
>  wrote:
>> On 20/04/17 06:21, Masahiko Sawada wrote:
>>> On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek
>>>  wrote:
 On 19/04/17 15:57, Masahiko Sawada wrote:
> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek
>  wrote:
>> On 19/04/17 14:42, Masahiko Sawada wrote:
>>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
>>>  wrote:
 At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek 
  wrote in 
 
> On 18/04/17 18:14, Peter Eisentraut wrote:
>> On 4/18/17 11:59, Petr Jelinek wrote:
>>> Hmm if we create hashtable for this, I'd say create hashtable for 
>>> the
>>> whole table_states then. The reason why it's list now was that it 
>>> seemed
>>> unnecessary to have hashtable when it will be empty almost always 
>>> but
>>> there is no need to have both hashtable + list IMHO.

 I understand that but I also don't like the frequent palloc/pfree
 in long-lasting context and double loop like Peter.

>> The difference is that we blow away the list of states when the 
>> catalog
>> changes, but we keep the hash table with the start times around.  We
>> need two things with different life times.

 On the other hand, hash seems overdone. Addition to that, the
 hash-version leaks stale entries while subscriptions are
 modified. But vacuuming them costs high.

> Why can't we just update the hashtable based on the catalog? I mean 
> once
> the record is not needed in the list, the table has been synced so 
> there
> is no need for the timestamp either since we'll not try to start the
> worker again.
>>>
>>> I guess the table sync worker for the same table could need to be
>>> started again. For example, please image a case where the table
>>> belonging to the publication is removed from it and the corresponding
>>> subscription is refreshed, and then the table is added to it again. We
>>> have the record of the table with timestamp in the hash table when the
>>> table sync in the first time, but the table sync after refreshed could
>>> have to wait for the interval.
>>>
>>
>> But why do we want to wait in such case where user has explicitly
>> requested refresh?
>>
>
> Yeah, sorry, I meant that we don't want to wait but cannot launch the
> tablesync worker in such case.
>
> But after more thought, the latest patch Peter proposed has the same
> problem. Perhaps we need to scan always whole pg_subscription_rel and
> remove the entry if the corresponding table get synced.
>

 Yes that's what I mean by "Why can't we just update the hashtable based
 on the catalog". And if we do that then I don't understand why do we
 need both hastable and linked list if we need to update both based on
 catalog reads anyway.
>>>
>>> Thanks, I've now understood correctly. Yes, I think you're right. If
>>> we update the hash table based on the catalog whenever table state is
>>> invalidated, we don't need to have both hash table and list.
>>>
>>> BTW, in current HEAD the SUBREL_STATE_SYNCWAIT is not stored in the
>>> pg_subscription_catalog. So the following condition seems not correct.
>>> We should use "syncworker->relstate == SUBSCRIPTION_STATE_SYNCWAIT"
>>> instead?
>>>
>>> /*
>>>  * There is a worker synchronizing the relation and waiting for
>>>  * apply to do something.
>>>  */
>>> if (syncworker && rstate->state == SUBREL_STATE_SYNCWAIT)
>>> {
>>> /*
>>>  * There are three possible synchronization situations here.
>>>  *
>>>  * a) Apply is in front of the table sync: We tell the table
>>>  *sync to CATCHUP.
>>>  *
>>>  * b) Apply is behind the table sync: We tell the table sync
>>>  *to mark the table as SYNCDONE and finish.
>>>
>>>  * c) Apply and table sync are at the same position: We tell
>>>  *table sync to mark the table as READY and finish.
>>>  *
>>>  * In any case we'll need to wait for table sync to change
>>>  * the state in catalog and only then continue ourselves.
>>>  */
>>>
>>
>> Good catch. Although it's not comment that's wrong, it's the if. We
>> should not compare rstate->state but syncworker->relstate.
> 
> I've attached a patch to fix this bug.
> 


Re: [HACKERS] Triggers and logical replication (10devel)

2017-04-21 Thread Egor Rogov

On 21.04.2017 16:29, Merlin Moncure wrote:

On Fri, Apr 21, 2017 at 5:08 AM, Egor Rogov  wrote:

Hello,
It seems that triggers don't fire on subscriber's tables during logical
replication. Is it a bug?

Reading the documentation (which is TBH a bit hard to follow) it
appears that it is expected behavior.

https://www.postgresql.org/docs/devel/static/sql-altertable.html states:

"The trigger firing mechanism is also affected by the configuration
variable session_replication_role. Simply enabled triggers will fire
when the replication role is “origin” (the default) or “local”."

Ah, ALTER TABLE ENABLE REPLICA TRIGGER is what I missed.
Thanks!

Regards,
Egor Rogov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] WITH clause in CREATE STATISTICS

2017-04-21 Thread Tels
Moin,

On Fri, April 21, 2017 7:04 am, David Rowley wrote:
> On 21 April 2017 at 22:30, Tels  wrote:
>> I'd rather see:
>>
>>  CREATE STATISTICS stats_name ON table(col);
>>
>> as this both mirrors CREATE INDEX and foreign keys with REFERENCES. It
>> could also be extended to both more columns, expressions or other tables
>> like so:
>>
>>  CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b);
>>
>> and even:
>>
>>  CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b) WITH (options)
>> WHERE t2.a > 4;
>
> How would you express a join condition with that syntax?
>
>> This looks easy to remember, since it compares to:
>>
>>  CREATE INDEX idx_name ON t2(a,b) WITH (options) WHERE t2.a > 4;
>>
>> Or am I'm missing something?
>
> Sadly yes, you are, and it's not the first time.
>
> I seem to remember mentioning this to you already in [1].
>
> Please, can you read over [2], it mentions exactly what you're
> proposing and why it's not any good.
>
> [1]
> https://www.postgresql.org/message-id/cakjs1f9hmet+7adiceau8heompob5pkkcvyzliezje3dvut...@mail.gmail.com
> [2]
> https://www.postgresql.org/message-id/CAEZATCUtGR+U5+QTwjHhe9rLG2nguEysHQ5NaqcK=vbj78v...@mail.gmail.com

Ah, ok, thank you, somehow I missed your answer the first time. So, just
ignore me :)

Best wishes,

Tels




Re: [HACKERS] On-disk format of SCRAM verifiers

2017-04-21 Thread Dagfinn Ilmari Mannsåker
Michael Paquier  writes:

> On Fri, Apr 21, 2017 at 10:02 PM, Simon Riggs  wrote:
>> On 21 April 2017 at 10:20, Heikki Linnakangas  wrote:
>>> But looking more closely, I think I misunderstood RFC 5803. It *does* in
>>> fact specify a single string format to store the verifier in. And the format
>>> looks like:
>>>
>>> SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey>
>>
>> Could you explain where you are looking? I don't see that in RFC5803
>
> From 1.  Overview:
>
>Syntax of the attribute can be expressed using ABNF [RFC5234].  Non-
>terminal references in the following ABNF are defined in either
>[AUTHPASS], [RFC4422], or [RFC5234].
>
>scram-mech = "SCRAM-SHA-1" / scram-mech-ext
>   ; Complies with ABNF for <scheme>
>   ; defined in [AUTHPASS].
>
>scram-authInfo = iter-count ":" salt
>   ; Complies with ABNF for <authInfo>
>   ; defined in [AUTHPASS].
>
>scram-authValue = stored-key ":" server-key
>   ; Complies with ABNF for <authValue>
>   ; defined in [AUTHPASS].

And scram-mech, scram-authInfo and scram-authValue are used as the
"scheme", "authInfo" and "authValue" parts as specified in [AUTHPASS]
(RFC3112):

authPasswordValue   = w scheme s authInfo s authValue w
scheme  = %x30-39 / %x41-5A / %x2D-2F / %x5F
; 0-9, A-Z, "-", ".", "/", or "_"
authInfo= schemeSpecificValue
authValue   = schemeSpecificValue
schemeSpecificValue = *( %x21-23 / %x25-7E )
; printable ASCII less "$" and " "
s   = w SEP w
w   = *SP
SEP = %x24 ; "$"
SP  = %x20 ; " " (space)

> Thanks,
> -- 
> Michael


- ilmari
-- 
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen




Re: [HACKERS] On-disk format of SCRAM verifiers

2017-04-21 Thread Heikki Linnakangas


On 21 April 2017 16:20:56 EEST, Michael Paquier  
wrote:
>On Fri, Apr 21, 2017 at 10:02 PM, Simon Riggs 
>wrote:
>> On 21 April 2017 at 10:20, Heikki Linnakangas 
>wrote:
>>> But looking more closely, I think I misunderstood RFC 5803. It
>*does* in
>>> fact specify a single string format to store the verifier in. And
>the format
>>> looks like:
>>>
>>> SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey>
>>
>> Could you explain where you are looking? I don't see that in RFC5803
>
>From 1.  Overview:

Yeah, it's not easy to see, I missed it earlier too. You have to look at RFC 
5803 and RFC 3112 together. RFC 3112 says that the overall format is 
"<scheme>$<authInfo>$<authValue>", and RFC 5803 says that for SCRAM, scheme is 
"SCRAM-SHA-256" (for our variant), authInfo is "<iteration count>:<salt>" and 
authValue is "<StoredKey>:<ServerKey>"

They really should've included examples in those RFCs.

- Heikki




Re: [HACKERS] On-disk format of SCRAM verifiers

2017-04-21 Thread Simon Riggs
On 21 April 2017 at 14:20, Michael Paquier  wrote:
> On Fri, Apr 21, 2017 at 10:02 PM, Simon Riggs  wrote:
>> On 21 April 2017 at 10:20, Heikki Linnakangas  wrote:
>>> But looking more closely, I think I misunderstood RFC 5803. It *does* in
>>> fact specify a single string format to store the verifier in. And the format
>>> looks like:
>>>
>>> SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey>
>>
>> Could you explain where you are looking? I don't see that in RFC5803
>
> From 1.  Overview:
>
>Syntax of the attribute can be expressed using ABNF [RFC5234].  Non-
>terminal references in the following ABNF are defined in either
>[AUTHPASS], [RFC4422], or [RFC5234].
>
>scram-mech = "SCRAM-SHA-1" / scram-mech-ext
>   ; Complies with ABNF for <scheme>
>   ; defined in [AUTHPASS].
>
>scram-authInfo = iter-count ":" salt
>   ; Complies with ABNF for <authInfo>
>   ; defined in [AUTHPASS].
>
>scram-authValue = stored-key ":" server-key
>   ; Complies with ABNF for <authValue>
>   ; defined in [AUTHPASS].
>
> Thanks,

The above text, which I've already read, does not explain the
suggested change from : to $.

Could you explain?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Triggers and logical replication (10devel)

2017-04-21 Thread Merlin Moncure
On Fri, Apr 21, 2017 at 5:08 AM, Egor Rogov  wrote:
> Hello,
> It seems that triggers don't fire on subscriber's tables during logical
> replication. Is it a bug?

Reading the documentation (which is TBH a bit hard to follow) it
appears that it is expected behavior.

https://www.postgresql.org/docs/devel/static/logical-replication-architecture.html#logical-replication-snapshot
states:

"The apply process on the subscriber database always runs with
session_replication_role set to replica, which produces the usual
effects on triggers and constraints."

https://www.postgresql.org/docs/devel/static/sql-altertable.html states:

"The trigger firing mechanism is also affected by the configuration
variable session_replication_role. Simply enabled triggers will fire
when the replication role is “origin” (the default) or “local”."

merlin




Re: [HACKERS] On-disk format of SCRAM verifiers

2017-04-21 Thread Michael Paquier
On Fri, Apr 21, 2017 at 9:25 PM, Stephen Frost  wrote:
> * Heikki Linnakangas (hlinn...@iki.fi) wrote:
>> I think we should adopt that exact format, so that our verifiers are
>> compatible with RFC 5803. It doesn't make any immediate difference,
>> but since there is a standard out there, might as well follow it.
>
> +1
>
>> And just in case we get support for looking up SCRAM verifiers from
>> an LDAP server in the future, it will come handy as we won't need to
>> parse two different formats.
>
> Agreed.

+1 to all that. Consistency is a good thing.
-- 
Michael




Re: [HACKERS] On-disk format of SCRAM verifiers

2017-04-21 Thread Michael Paquier
On Fri, Apr 21, 2017 at 10:02 PM, Simon Riggs  wrote:
> On 21 April 2017 at 10:20, Heikki Linnakangas  wrote:
>> But looking more closely, I think I misunderstood RFC 5803. It *does* in
>> fact specify a single string format to store the verifier in. And the format
>> looks like:
>>
>> SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey>
>
> Could you explain where you are looking? I don't see that in RFC5803

From 1.  Overview:

   Syntax of the attribute can be expressed using ABNF [RFC5234].  Non-
   terminal references in the following ABNF are defined in either
   [AUTHPASS], [RFC4422], or [RFC5234].

   scram-mech = "SCRAM-SHA-1" / scram-mech-ext
  ; Complies with ABNF for <scheme>
  ; defined in [AUTHPASS].

   scram-authInfo = iter-count ":" salt
  ; Complies with ABNF for <authInfo>
  ; defined in [AUTHPASS].

   scram-authValue = stored-key ":" server-key
  ; Complies with ABNF for <authValue>
  ; defined in [AUTHPASS].

Thanks,
-- 
Michael




[HACKERS] Old versions of Test::More

2017-04-21 Thread Andrew Dunstan

As we discovered yesterday via jacana, very old versions of Test::More
don't support the note() function. It therefore seems prudent to specify
a minimum version number for the module in those scripts that use the
function. According to the changelog, version 0.82 (released in Oct
2008) should be sufficient. So I suggest the attached patch.


cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index ccd5943..29e648e 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 28;
+use Test::More 0.82 tests => 28;
 
 # Initialize master node
 my $node_master = get_new_node('master');
diff --git a/src/test/recovery/t/010_logical_decoding_timelines.pl b/src/test/recovery/t/010_logical_decoding_timelines.pl
index cdddb4d..bd353ca 100644
--- a/src/test/recovery/t/010_logical_decoding_timelines.pl
+++ b/src/test/recovery/t/010_logical_decoding_timelines.pl
@@ -23,7 +23,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 13;
+use Test::More 0.82 tests => 13;
 use RecursiveCopy;
 use File::Copy;
 use IPC::Run ();
diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
index 0c69bf0..4bd905d 100644
--- a/src/test/ssl/t/001_ssltests.pl
+++ b/src/test/ssl/t/001_ssltests.pl
@@ -2,7 +2,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 40;
+use Test::More 0.82 tests => 40;
 use ServerSetup;
 use File::Copy;
 



Re: [HACKERS] On-disk format of SCRAM verifiers

2017-04-21 Thread Simon Riggs
On 21 April 2017 at 10:20, Heikki Linnakangas  wrote:

> But looking more closely, I think I misunderstood RFC 5803. It *does* in
> fact specify a single string format to store the verifier in. And the format
> looks like:
>
> SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey>

Could you explain where you are looking? I don't see that in RFC5803

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] DROP SUBSCRIPTION, query cancellations and slot handling

2017-04-21 Thread Peter Eisentraut
On 4/20/17 22:57, Michael Paquier wrote:
> On Fri, Apr 21, 2017 at 11:34 AM, Petr Jelinek
>  wrote:
>> On 20/04/17 23:30, Peter Eisentraut wrote:
>>> On 4/20/17 10:19, Petr Jelinek wrote:
 Hmm well since this only affects the synchronization of table
 states/names, I guess we could just simply do that before we create the
 slot as there is no expectancy of consistency between slot and the table
 list snapshot.
>>>
>>> I suppose that wouldn't hurt.
>>>
>>> Prior to the table sync patch, a missing target relation would just show
>>> up as an error later on in the logs.  So having the error sooner
>>> actually seems like a good change.
>>>
>>
>> Very simple patch to make.
> 
> +1 for that.

Committed that.

I don't think there is anything else open here.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] dtrace probes

2017-04-21 Thread Jesper Pedersen

On 04/20/2017 10:30 AM, Jesper Pedersen wrote:

I think this fix is harmless and has some value in terms of
consistency.  One minor suggestion is that you should leave a space
after typecasting.

- TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
+ TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int)mode);

There should be a space like "(int) mode".




v2 attached.



I managed to attach the same patch again, so here is v3.

Best regards,
 Jesper


From 0d964df84950ca90c08ed6dd77a575d4b70ea7db Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Tue, 18 Apr 2017 11:44:18 -0400
Subject: [PATCH] Explicitly cast LWLockMode, which is an enum, to int in order to
 match the dtrace definition of the lwlock methods. Thereby all call sites
 will have the same definition instead of a mix between signed and unsigned.

Author: Jesper Pedersen 
---
 src/backend/storage/lmgr/lwlock.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3e13394..c551be2 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1226,7 +1226,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
 #endif
 
LWLockReportWaitStart(lock);
-   TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
+   TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), (int) mode);
 
for (;;)
{
@@ -1248,7 +1248,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
}
 #endif
 
-   TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
+   TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int) mode);
LWLockReportWaitEnd();
 
LOG_LWDEBUG("LWLockAcquire", lock, "awakened");
@@ -1257,7 +1257,7 @@ LWLockAcquire(LWLock *lock, LWLockMode mode)
result = false;
}
 
-   TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode);
+   TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), (int) mode);
 
/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks].lock = lock;
@@ -1308,14 +1308,14 @@ LWLockConditionalAcquire(LWLock *lock, LWLockMode mode)
RESUME_INTERRUPTS();
 
LOG_LWDEBUG("LWLockConditionalAcquire", lock, "failed");
-   TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), mode);
+   TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(T_NAME(lock), (int) mode);
}
else
{
/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks].lock = lock;
held_lwlocks[num_held_lwlocks++].mode = mode;
-   TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), mode);
+   TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(T_NAME(lock), (int) mode);
}
return !mustwait;
 }
@@ -1387,7 +1387,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 #endif
 
LWLockReportWaitStart(lock);
-   TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
+   TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), (int) mode);
 
for (;;)
{
@@ -1405,7 +1405,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
Assert(nwaiters < MAX_BACKENDS);
}
 #endif
-   TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);
+   TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), (int) mode);
LWLockReportWaitEnd();
 
LOG_LWDEBUG("LWLockAcquireOrWait", lock, "awakened");
@@ -1435,7 +1435,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
/* Failed to get lock, so release interrupt holdoff */
RESUME_INTERRUPTS();
LOG_LWDEBUG("LWLockAcquireOrWait", lock, "failed");
-   TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), mode);
+   TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT_FAIL(T_NAME(lock), (int) mode);
}
else
{
@@ -1443,7 +1443,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks].lock = lock;
held_lwlocks[num_held_lwlocks++].mode = mode;
-   TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), mode);
+   TRACE_POSTGRESQL_LWLOCK_ACQUIRE_OR_WAIT(T_NAME(lock), (int) mode);
}
 
return !mustwait;
-- 
2.7.4




Re: [HACKERS] On-disk format of SCRAM verifiers

2017-04-21 Thread Stephen Frost
Heikki,

* Heikki Linnakangas (hlinn...@iki.fi) wrote:
> I think we should adopt that exact format, so that our verifiers are
> compatible with RFC 5803. It doesn't make any immediate difference,
> but since there is a standard out there, might as well follow it.

+1

> And just in case we get support for looking up SCRAM verifiers from
> an LDAP server in the future, it will come handy as we won't need to
> parse two different formats.

Agreed.

Thanks!

Stephen




Re: [HACKERS] WITH clause in CREATE STATISTICS

2017-04-21 Thread David Rowley
On 21 April 2017 at 22:30, Tels  wrote:
> I'd rather see:
>
>  CREATE STATISTICS stats_name ON table(col);
>
> as this both mirrors CREATE INDEX and foreign keys with REFERENCES. It
> could also be extended to both more columns, expressions or other tables
> like so:
>
>  CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b);
>
> and even:
>
>  CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b) WITH (options)
> WHERE t2.a > 4;

How would you express a join condition with that syntax?

> This looks easy to remember, since it compares to:
>
>  CREATE INDEX idx_name ON t2(a,b) WITH (options) WHERE t2.a > 4;
>
> Or am I missing something?

Sadly yes, you are, and it's not the first time.

I seem to remember mentioning this to you already in [1].

Please, can you read over [2], it mentions exactly what you're
proposing and why it's not any good.

[1] 
https://www.postgresql.org/message-id/cakjs1f9hmet+7adiceau8heompob5pkkcvyzliezje3dvut...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAEZATCUtGR+U5+QTwjHhe9rLG2nguEysHQ5NaqcK=vbj78v...@mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WITH clause in CREATE STATISTICS

2017-04-21 Thread Tels
Moin,

On Thu, April 20, 2017 8:21 pm, Tomas Vondra wrote:
> On 04/21/2017 12:13 AM, Tom Lane wrote:
>> Alvaro Herrera  writes:
>>> Simon just pointed out that having the WITH clause appear in the middle
>>> of the CREATE STATISTICS command looks odd; apparently somebody else
>>> already complained on list about the same.  Other commands put the WITH
>>> clause at the end, so perhaps we should do likewise in the new command.
>>
>>> Here's a patch to implement that.  I verified that if I change
>>> qualified_name to qualified_name_list, bison does not complain about
>>> conflicts, so this new syntax should support extension to multiple
>>> relations without a problem.
>>
>> Yeah, WITH is fully reserved, so as long as the clause looks like
>> WITH ( stuff... ) you're pretty much gonna be able to drop it
>> wherever you want.
>>
>>> Discuss.
>>
>> +1 for WITH at the end; the existing syntax looks weird to me too.
>>
>
> -1 from me
>
> I like the current syntax more, and  WHERE ... WITH seems a bit weird to
> me. But more importantly, one thing Dean probably considered when
> proposing the current syntax was that we may add support for partial
> statistics, pretty much like partial indexes. And we don't allow WITH at
> the end (after WHERE) for indexes:
>
> test=# create index on t (a) where a < 100 with (fillfactor=10);
> ERROR:  syntax error at or near "with"
> LINE 1: create index on t (a) where a < 100 with (fillfactor=10);
>  ^
> test=# create index on t (a) with (fillfactor=10) where a < 100;

While I'm not sure about where to put the WITH, so to speak, I do favour
consistency.

So I'm inclined to keep the syntax like that of "create index".

More importantly, however, I'd rather see the "ON (column) FROM relname"
syntax changed to match the existing examples. (I already wrote this to
Simon; not sure if my email made it to the list.)

So instead:

 CREATE STATISTICS stats_name ON (columns) FROM relname

I'd rather see:

 CREATE STATISTICS stats_name ON table(col);

as this both mirrors CREATE INDEX and foreign keys with REFERENCES. It
could also be extended to both more columns, expressions or other tables
like so:

 CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b);

and even:

 CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b) WITH (options)
WHERE t2.a > 4;

This looks easy to remember, since it compares to:

 CREATE INDEX idx_name ON t2(a,b) WITH (options) WHERE t2.a > 4;

Or am I missing something?

Regards,

Tels




[HACKERS] Why is get_cheapest_parallel_safe_total_inner() in pathkeys.c?

2017-04-21 Thread David Rowley
This probably ended up here because there's a bunch of other functions
named get_cheapest* in that file, but all of those relate to getting a
path for specific PathKeys. get_cheapest_parallel_safe_total_inner()
does not do that.

Maybe allpaths.c is a better home for it?

It seems to have been added in [1]

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a71f10189dc10a2fe422158a2c9409e0f77c6b9e

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Triggers and logical replication (10devel)

2017-04-21 Thread Egor Rogov

Hello,
It seems that triggers don't fire on subscribers' tables during logical
replication. Is it a bug?



#
# publisher: simple table and publication
#

postgres@publisher=# SELECT version();
version
---
 PostgreSQL 10devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit
(1 row)

postgres@publisher=# CREATE TABLE t(n integer PRIMARY KEY);

postgres@publisher=# CREATE PUBLICATION testpub FOR TABLE t;

#
# subscriber: the same table, triggers to write some information into 
log table, and subscription

#

postgres@subscriber=# CREATE TABLE t(n integer PRIMARY KEY);

postgres@subscriber=# CREATE TABLE log(tg_table_name text, tg_when text, 
tg_op text, tg_level text, tg_str text);


postgres@subscriber=# CREATE OR REPLACE FUNCTION public.describe()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
DECLARE
rec record;
str text := '';
BEGIN
IF TG_LEVEL = 'ROW' THEN
CASE TG_OP
WHEN 'DELETE' THEN rec := OLD; str := OLD::text;
WHEN 'UPDATE' THEN rec := NEW; str := OLD || ' -> ' || NEW;
WHEN 'INSERT' THEN rec := NEW; str := NEW::text;
END CASE;
END IF;
INSERT INTO log(tg_table_name, tg_when, tg_op, tg_level, tg_str) 
VALUES (TG_TABLE_NAME, TG_WHEN, TG_OP, TG_LEVEL, str);

RETURN rec;
END;
$function$;

postgres@subscriber=# CREATE TRIGGER t_before_row BEFORE INSERT OR 
UPDATE OR DELETE ON t FOR EACH ROW EXECUTE PROCEDURE describe();


postgres@subscriber=# CREATE TRIGGER t_after_row AFTER INSERT OR UPDATE 
OR DELETE ON t FOR EACH ROW EXECUTE PROCEDURE describe();


postgres@subscriber=# CREATE TRIGGER t_before_stmt BEFORE INSERT OR 
UPDATE OR DELETE ON t FOR EACH STATEMENT EXECUTE PROCEDURE describe();


postgres@subscriber=# CREATE TRIGGER t_after_stmt AFTER INSERT OR UPDATE 
OR DELETE ON t FOR EACH STATEMENT EXECUTE PROCEDURE describe();


postgres@subscriber=# CREATE SUBSCRIPTION testsub CONNECTION 
'host=localhost port=5432 user=postgres dbname=postgres' PUBLICATION 
testpub;


#
# publisher
#

postgres@publisher=# INSERT INTO t VALUES (1);
INSERT 0 1

#
# subscriber
#

postgres@subscriber=# SELECT * FROM t;
 n
---
 1
(1 row)

postgres@subscriber=# SELECT * FROM log;
 tg_table_name | tg_when | tg_op | tg_level | tg_str
---+-+---+--+
(0 rows)

#
# so replication works, but triggers don't.
# now check that triggers work alright locally:
#

postgres@subscriber=# INSERT INTO t VALUES (100);
INSERT 0 1

postgres@subscriber=# SELECT * FROM log;
 tg_table_name | tg_when | tg_op  | tg_level  | tg_str
---+-++---+
 t | BEFORE  | INSERT | STATEMENT |
 t | BEFORE  | INSERT | ROW   | (100)
 t | AFTER   | INSERT | ROW   | (100)
 t | AFTER   | INSERT | STATEMENT |
(4 rows)


Regards,
Egor Rogov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





[HACKERS] On-disk format of SCRAM verifiers

2017-04-21 Thread Heikki Linnakangas

The current format for SCRAM verifiers in pg_authid is:

scram-sha-256

While reviewing Michael's patch to change that so that StoredKey and 
ServerKey are stored base64-encoded, rather than hex-encoded as they are 
currently [1], I looked again at RFC 5803. RFC 5803 specifies the format 
to use when storing SCRAM verifiers in LDAP. I looked at it earlier, and 
it was a source of inspiration for the current format, but I didn't 
think that it was directly applicable. I thought that in RFC 5803 the 
different fields were stored as separate fields or attributes, not as a 
single string.


But looking more closely, I think I misunderstood RFC 5803. It *does* in 
fact specify a single string format to store the verifier in. And the 
format looks like:


SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey>

Alternating '$' and ':' as the separators seems a bit wonky, but it 
actually makes sense. "<iteration count>:<salt>" is treated as one 
field, and "<StoredKey>:<ServerKey>" is treated as another, which is 
logical, since the iteration count and salt are sent together to the 
client in the SCRAM challenge, while StoredKey and ServerKey must be 
kept secret.
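As a toy illustration of that two-level split (a hypothetical Python helper, not code from any patch on this thread):

```python
import base64

def parse_rfc5803_verifier(verifier: str):
    """Split an RFC 5803-style verifier of the form
    SCRAM-SHA-256$<iteration count>:<salt>$<StoredKey>:<ServerKey>
    into its fields; salt and keys are assumed base64-encoded.
    '$' separates the scheme, the public challenge part, and the
    secret keys; ':' splits each part into its two fields."""
    scheme, challenge, secrets = verifier.split('$')
    iterations, salt = challenge.split(':')
    stored_key, server_key = secrets.split(':')
    return (scheme, int(iterations), base64.b64decode(salt),
            base64.b64decode(stored_key), base64.b64decode(server_key))
```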


I think we should adopt that exact format, so that our verifiers are 
compatible with RFC 5803. It doesn't make any immediate difference, but 
since there is a standard out there, might as well follow it. And just 
in case we get support for looking up SCRAM verifiers from an LDAP 
server in the future, it will come handy as we won't need to parse two 
different formats.


Barring objections, I'll go change our on-disk format for SCRAM 
verifiers to follow RFC 5803.


[1] 
https://www.postgresql.org/message-id/CAB7nPqSbsCBCxy8-DtwzRxYgTnbGUtY4uFEkLQhG%3DR%3Duo%3Dg8Fw%40mail.gmail.com


- Heikki




Re: [HACKERS] Interval for launching the table sync worker

2017-04-21 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 20 Apr 2017 13:21:14 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] Declarative partitioning - another take

2017-04-21 Thread Rajkumar Raghuwanshi
Hi,

I have observed below with the statement triggers.

I am able to create statement triggers on the root partitioned table, but
these triggers are not getting fired when updating a partition.

CREATE TABLE pt (a INT, b INT) PARTITION BY RANGE(a);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) to (7);
CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (7) to (11);
INSERT INTO pt (a,b) SELECT i,i FROM generate_series(1,10)i;
CREATE TABLE pt_trigger(TG_NAME varchar,TG_TABLE_NAME varchar,TG_LEVEL
varchar,TG_WHEN varchar);
CREATE OR REPLACE FUNCTION process_pt_trigger() RETURNS TRIGGER AS $pttg$
BEGIN
IF (TG_OP = 'UPDATE') THEN
INSERT INTO pt_trigger SELECT
TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN;
RETURN NEW;
END IF;
RETURN NULL;
END;
$pttg$ LANGUAGE plpgsql;
CREATE TRIGGER pt_trigger_after_p0 AFTER UPDATE ON pt FOR EACH STATEMENT
EXECUTE PROCEDURE process_pt_trigger();
CREATE TRIGGER pt_trigger_before_p0 BEFORE UPDATE ON pt FOR EACH STATEMENT
EXECUTE PROCEDURE process_pt_trigger();

postgres=# UPDATE pt SET a = 2 WHERE a = 1;
UPDATE 1
postgres=# SELECT * FROM pt_trigger ORDER BY 1;
 tg_name | tg_table_name | tg_level | tg_when
-+---+--+-
(0 rows)

No statement-level trigger fired in this case. Is this expected behaviour?

But if I create triggers on the leaf partition, the trigger is fired:

CREATE TRIGGER pt_trigger_after_p1 AFTER UPDATE ON pt1 FOR EACH STATEMENT
EXECUTE PROCEDURE process_pt_trigger();
CREATE TRIGGER pt_trigger_before_p1 BEFORE UPDATE ON pt1 FOR EACH STATEMENT
EXECUTE PROCEDURE process_pt_trigger();

postgres=# UPDATE pt SET a = 5 WHERE a = 4;
UPDATE 1
postgres=# SELECT * FROM pt_trigger ORDER BY 1;
   tg_name| tg_table_name | tg_level  | tg_when
--+---+---+-
 pt_trigger_after_p1  | pt1   | STATEMENT | AFTER
 pt_trigger_before_p1 | pt1   | STATEMENT | BEFORE
(2 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] some review comments on logical rep code

2017-04-21 Thread Masahiko Sawada
On Fri, Apr 21, 2017 at 11:55 AM, Masahiko Sawada  wrote:
> On Fri, Apr 21, 2017 at 4:41 AM, Peter Eisentraut
>  wrote:
>> On 4/20/17 12:30, Fujii Masao wrote:
>>> I've pushed several patches, and there is now only one remaining patch.
>>> I posted the review comment on that patch, and I'm expecting that
>>> Masahiko-san will update the patch. So what about waiting for the updated
>>> version of the patch by next Monday (April 24th)?
>>
>> Thank you for pitching in.
>>
>> Where is your latest patch?
>>
>
> The remaining patch is 002_Reset_on_commit_launcher_wakeup_v3.patch

Oops, there is one more remaining issue that was reported on this thread
and another one [1]. The latest patch is attached in [1].

[1] 
https://www.postgresql.org/message-id/20170406.212441.177025736.horiguchi.kyotaro%40lab.ntt.co.jp

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] some review comments on logical rep code

2017-04-21 Thread Masahiko Sawada
On Fri, Apr 21, 2017 at 1:19 AM, Fujii Masao  wrote:
> On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada  
> wrote:
>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>>  wrote:
>>> Hi,
>>>
>>> Thank you for the revised version.
>>>
>>> At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada  
>>> wrote in 
>>> 
 On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada  
 wrote:
 > On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
 >  wrote:
 >> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada 
 >>  wrote in 
 >> 

Re: [HACKERS] identity columns

2017-04-21 Thread Noah Misch
On Tue, Apr 18, 2017 at 11:53:36AM -0400, Peter Eisentraut wrote:
> On 4/18/17 02:07, Noah Misch wrote:
> > This PostgreSQL 10 open item is past due for your status update.  Kindly 
> > send
> > a status update within 24 hours, and include a date for your subsequent 
> > status
> > update.  Refer to the policy on open item ownership:
> > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> I am in disagreement with the submitter that this is a problem or what
> the solution should be.  The discussion is ongoing.  Note that the
> current state isn't actually broken, so it doesn't have to hold anything up.

I see.  If anyone in addition to the submitter thinks making a change in this
area qualifies as a PostgreSQL 10 open item, please speak up.  Otherwise, on
Monday, I'll reclassify this as a non-bug.




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-04-21 Thread Ashutosh Bapat
On Fri, Apr 21, 2017 at 1:34 AM, Robert Haas  wrote:
>> You seem to be suggesting that we keep as many sets of
>> partition bounds as there are base relations participating in the join
>> and then use appropriate partition bounds based on the columns in the
>> join conditions, so that we can use the same operator as used in the
>> join condition. That doesn't seem to be a good option since the
>> partition bounds will all have same values, only differing in their
>> binary representation because of differences in data types.
>
> Well, actually, I think it is a good option, as I wrote in
> http://postgr.es/m/CA+TgmoY-LiJ+_S7OijNU_r2y=dhsj539wtqa7cayj-hcecc...@mail.gmail.com

I guess you are now confusing partition bounds for a join relation with
partition bounds of a base relation. The above paragraph is about the
partition bounds of a join relation. I have already agreed that we need
to store partition bounds in RelOptInfo. For a base relation this is
trivial; its RelOptInfo has to store the partition bounds as stored in
the partition descriptor of the corresponding partitioned table. I am
talking about the partition bounds of a join relation. See below for
more explanation.

>
> In that email, my principal concern was allowing partition-wise join
> to succeed even with slightly different sets of partition boundaries
> on the two sides of the join; in particular, if we've got A with A1 ..
> A10 and B with B1 .. B10 and the DBA adds A11, I don't want
> performance to tank until the DBA gets around to adding B11.  Removing
> the partition bounds from the PartitionScheme and storing them
> per-RelOptInfo fixes that problem;

We have an agreement on this.

> the fact that it also solves this
> problem of what happens when we have different data types on the two
> sides looks to me like a second reason to go that way.

I don't see how that is fixed. For a join relation we need to come up
with one set of partition bounds by merging the partition bounds of the
joining relations, and in order to understand how to interpret the
datums in those partition bounds, we need to associate a data type with
them. The question is which data type we should use if the relations
being joined have different data types associated with their respective
partition bounds.

Or are you saying that we don't need to associate a data type with the
merged partition bounds? In that case, I don't know how we would compare
the partition bounds of two relations.

In your example, A has a partition key of type int8 and bound datums
X1 .. X10. B has a partition key of type int4 and bound datums X1 ..
X11. C has a partition key of type int2 and bound datums X1 .. X12. The
binary representation of the X's is going to differ between A, B and C,
although each Xk for A, B and C is equal, wherever it exists. A join
between A and B will have merged bound datums X1 .. X10 (and X11,
depending upon the join type). In order to match the bounds of AB with
C, we need to know the data type of the bounds of AB, so that we can
choose the appropriate equality operator. The question is what we should
choose as the data type of the partition bounds of AB, int8 or int4.
This is different from applying join conditions between AB and C, which
can choose the right opfamily operator based on the join conditions.

>
> And there's a third reason, too, which is that the opfamily mechanism
> doesn't currently provide any mechanism for reasoning about which data
> types are "wider" or "narrower" in the way that you want.  In general,
> there's not even a reason why such a relationship has to exist;
> consider two data types t1 and t2 with opclasses t1_ops and t2_ops
> that are part of the same opfamily t_ops, and suppose that t1 can
> represent any positive integer and t2 can represent any even integer,
> or in general that each data type can represent some but not all of
> the values that can be represented by the other data type.  In such a
> case, neither would be "wider" than the other in the sense that you
> need; you essentially want to find a data type within the opfamily to
> which all values of any of the types involved in the query can be cast
> without error, but there is nothing today which requires such a data
> type to exist, and no way to identify which one it is.  In practice,
> for all of the built-in opfamilies that have more than one opclass,
> such a data type always exists but is not always unique -- in
> particular, datetime_ops contains date_ops, timestamptz_ops, and
> timestamp_ops, and either of the latter two is a plausible choice for
> the "widest" data type of the three.  But there's no way to figure
> that out from the opfamily or opclass information we have today.
>
> In theory, it would be possible to modify the opfamily machinery so
> that every opfamily designates an optional ordering of types from
> "narrowest" to "widest", such that saying t1 is-narrower-than t2 is a
> guarantee that every value of type t1 can be cast without error to a
> value of type t2.  But I think that's a bad plan.  It 

Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-04-21 Thread Noah Misch
On Fri, Apr 21, 2017 at 01:20:05PM +0900, Masahiko Sawada wrote:
> On Fri, Apr 21, 2017 at 12:02 PM, Noah Misch  wrote:
> > On Wed, Apr 19, 2017 at 01:52:53PM +0900, Masahiko Sawada wrote:
> >> On Wed, Apr 19, 2017 at 12:34 PM, Noah Misch  wrote:
> >> > On Sun, Apr 16, 2017 at 07:25:28PM +0900, Fujii Masao wrote:
> >> >> As I told firstly this is not a bug. There are some proposals for 
> >> >> better design
> >> >> of priority column in pg_stat_replication, but we've not reached the 
> >> >> consensus
> >> >> yet. So I think that it's better to move this open item to "Design 
> >> >> Decisions to
> >> >> Recheck Mid-Beta" section so that we can hear more opinions.
> >> >
> >> > I'm reading that some people want to report NULL priority, some people 
> >> > want to
> >> > report a constant 1 priority, and nobody wants the current behavior.  Is 
> >> > that
> >> > an accurate summary?
> >>
> >> Yes, I think that's correct.
> >
> > Okay, but ...
> >
> >> FWIW the reason of current behavior is that it would be useful for the
> >> user who is willing to switch from ANY to FIRST. They can know which
> >> standbys will become sync or potential.
> >
> > ... does this mean you personally want to keep the current behavior?  If 
> > not,
> > has some other person stated a wish to keep the current behavior?
> 
> No, I want to change the current behavior. IMO it's better to set
> priority 1 for all standbys in a quorum set. I guess there is no longer
> anyone who supports the current behavior.

In that case, this open item is not eligible for section "Design Decisions to
Recheck Mid-Beta".  That section is for items where we'll probably change
nothing, but we plan to recheck later just in case.  Here, we expect to change
the behavior; the open question is which replacement behavior to prefer.

Fujii, as the owner of this open item, you are responsible for moderating the
debate until there's adequate consensus to make a particular change or to keep
the current behavior after all.  Please proceed to do that.  Beta testers
deserve a UI they may like, not a UI you already plan to change later.

Thanks,
nm

