Re: [HACKERS] [PATCH] Fixed malformed error message on malformed SCRAM message.

2017-06-01 Thread Noah Misch
On Mon, May 29, 2017 at 01:43:26PM -0700, Michael Paquier wrote:
> On Mon, May 29, 2017 at 1:38 PM, Daniele Varrazzo wrote:
> > Patch attached
> 
> Right. I am adding that to the list of open items, and Heikki in CC
> will likely take care of it.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Heikki,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-01 Thread Noah Misch
On Thu, Jun 01, 2017 at 12:00:33AM -0400, Peter Eisentraut wrote:
> On 5/31/17 02:51, Noah Misch wrote:
> > On Tue, May 30, 2017 at 01:30:35AM +, Noah Misch wrote:
> >> On Thu, May 18, 2017 at 10:27:51PM -0400, Peter Eisentraut wrote:
> >>> On 5/18/17 11:11, Noah Misch wrote:
>  IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is past due for
>  your status update.  Please reacquaint yourself with the policy on open item
>  ownership[1] and then reply immediately.  If I do not hear from you by
>  2017-05-19 16:00 UTC, I will transfer this item to release management team
>  ownership without further notice.
> >>>
> >>> There is no progress on this issue at the moment.  I will report again
> >>> next Wednesday.
> >>
> >> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
> >> for your status update.  If I do not hear one from you by 2017-05-31 02:00
> >> UTC, I will transfer this item to release management team ownership without
> >> further notice.
> > 
> > This PostgreSQL 10 open item now needs a permanent owner.  Would any other
> > committer like to take ownership?  If this role interests you, please read
> > this thread and the policy linked above, then send an initial status update
> > bearing a date for your subsequent status update.  If the item does not have a
> > permanent owner by 2017-06-03 07:00 UTC, I will investigate feature removals
> > that would resolve the item.
> 
> It seems I lost track of this item between all the other ones.  I will
> continue to work on this item.  We have patches proposed and I will work
> on committing them until Friday.

If any other committer cares about logical replication features in v10, I'd
recommend he take ownership of this open item despite your plan to work on it.
Otherwise, if you miss fixing this on Friday, it will be too late for others
to volunteer.

> I think I now have updates posted on all my items.

As of your writing that, two of your open items had no conforming status
update, and they still don't:

- Background worker display in pg_stat_activity (logical replication especially)
- ALTER SUBSCRIPTION REFRESH and table sync worker




[HACKERS] Fix a typo in predicate.c

2017-06-01 Thread Masahiko Sawada
Hi,

While reading predicate lock source code, I found a comment typo in
predicate.c file.
Attached patch fixes it.

s/scrach/scratch/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_typo_in_predicate_c.patch
Description: Binary data



Re: [HACKERS] Hash Functions

2017-06-01 Thread Jeff Davis
On Thu, Jun 1, 2017 at 11:25 AM, Andres Freund  wrote:
> Secondly, I think that's to a significant degree caused by
> the fact that in practice people way more often partition on types like
> int4/int8/date/timestamp/uuid rather than text - there are rarely good
> reasons to do the latter.

Once we support more pushdowns to partitions, the only question is:
what are your join keys and what are your grouping keys?

Text is absolutely a normal join key or group key. Consider joins on a
user ID or grouping by a model number.

Regards,
 Jeff Davis




Re: [HACKERS] Hash Functions

2017-06-01 Thread Jeff Davis
On Thu, Jun 1, 2017 at 10:59 AM, Robert Haas  wrote:
> 1. Are the new problems worse than the old ones?
>
> 2. What could we do about it?

Exactly the right questions.

1. For range partitioning, I think it's "yes, a little". As you point
out, there are already some weird edge cases -- the main way range
partitioning would make the problem worse is simply by having more
users.

But for hash partitioning I think the problems will become more
substantial. Different encodings, endian issues, etc. will be a
headache for someone, and potentially a bad day if they are urgently
trying to restore on a new machine. We should remember that not
everyone is a full-time postgres DBA, and users might reasonably think
that the default options to pg_dump[all] will give them a portable
dump.

2. I basically see two approaches to solve the problem:
  (a) Tom suggested at PGCon that we could have a GUC that
automatically causes inserts to the partition to be re-routed through
the parent. We could discuss whether to always route through the
parent, or do a recheck on the partition constrains and only reroute
tuples that will fail it. If the user gets into trouble, the worst
that would happen is a helpful error message telling them to set the
GUC. I like this idea.
  (b) I had suggested before that we could make the default text dump
(and the default output from pg_restore, for consistency) route
through the parent. Advanced users would dump with -Fc, and pg_restore
would support an option to do partition-wise loading. To me, this is
simpler, but users might forget to use (or not know about) the
pg_restore option and then it would load more slowly. Also, the ship
is sailing on range partitioning, so we might prefer option (a) just
to avoid making any changes.

I am fine with either option.
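
To make the routing both options rely on concrete, here is a minimal
illustration (the table and values are mine, not from this thread):
inserting through the parent lets the server pick the partition, while
loading a partition directly fails on any row that violates its
partition constraint, which is exactly the portability risk with
per-partition dump data.

CREATE TABLE measurements (city_id int, logdate date)
    PARTITION BY RANGE (logdate);
CREATE TABLE measurements_2017 PARTITION OF measurements
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');

-- Routed through the parent: the server picks the right partition.
INSERT INTO measurements VALUES (1, '2017-06-01');

-- Direct load into the partition: errors out because the row does not
-- satisfy this partition's constraint.
INSERT INTO measurements_2017 VALUES (1, '2016-06-01');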

> 2. Add an option like --dump-partition-data-with-parent.  I'm not sure
> who originally proposed this, but it seems that everybody likes it.
> What we disagree about is the degree to which it's sufficient.  Jeff
> Davis thinks it doesn't go far enough: what if you have an old
> plain-format dump that you don't want to hand-edit, and the source
> database is no longer available?  Most people involved in the
> unconference discussion of partitioning at PGCon seemed to feel that
> wasn't really something we should worry about too much.  I had been
> taking that position also, more or less because I don't see that there
> are better alternatives.

If the suggestions above are unacceptable, and we don't come up with
anything better, then of course we have to move on. I am worrying now
primarily because now is the best time to worry; I don't expect any
horrible outcome.

> 3. Implement portable hash functions (Jeff Davis or me, not sure
> which).  Andres scoffed at this idea, but I still think it might have
> legs.

I think it reduces the problem, which has value, but it's hard to make
it rock-solid.
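
To make "portable" concrete, here is a sketch of the idea for an int4
key (my illustration, not a proposed patch): normalize the value to a
fixed byte order, then mix the bytes with any fixed function, FNV-1a
here.  Big- and little-endian machines then agree on the hash, and
therefore on the partition, for a given value.

#include <stdint.h>

/* Sketch only: endianness-insensitive hash of an int4. */
static uint32_t
portable_hash_int4(int32_t value)
{
    uint32_t    v = (uint32_t) value;
    uint8_t     bytes[4];
    uint32_t    h = 2166136261u;        /* FNV-1a offset basis */

    bytes[0] = (uint8_t) (v & 0xff);    /* least-significant byte first */
    bytes[1] = (uint8_t) ((v >> 8) & 0xff);
    bytes[2] = (uint8_t) ((v >> 16) & 0xff);
    bytes[3] = (uint8_t) ((v >> 24) & 0xff);

    for (int i = 0; i < 4; i++)
    {
        h ^= bytes[i];
        h *= 16777619u;                 /* FNV-1a prime */
    }
    return h;
}

Collation- and encoding-sensitive types like text are the hard part,
which is why "reduces the problem" rather than "solves it" seems like
the right framing.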

> make fast.  Those two things also solve different parts of the
> problem; one is insulating the user from a difference in hardware
> architecture, while the other is insulating the user from a difference
> in user-selected settings.  I think that the first of those things is
> more important than the second, because it's easier to change your
> settings than it is to change your hardware.

Good point.

Regards,
 Jeff Davis




Re: [HACKERS] Race conditions with WAL sender PID lookups

2017-06-01 Thread Noah Misch
On Sun, May 21, 2017 at 08:19:34AM +0200, Erik Rijkers wrote:
> On 2017-05-21 06:37, Erik Rijkers wrote:
> >With this patch on current master my logical replication tests
> >(pgbench-over-logical-replication) run without errors for the first
> >time in many days (even weeks).
> 
> Unfortunately, just now another logical-replication failure occurred.  The
> same as I have seen all along:

This creates a rebuttable presumption of logical replication being the cause
of this open item.  (I am not stating an opinion on whether this rebuttable
presumption is true or is false.)  As long as that stands and Alvaro has not
explicitly requested ownership of this open item, Peter owns it.

On Tue, May 30, 2017 at 11:13:37AM -0700, Michael Paquier wrote:
> On Tue, May 30, 2017 at 11:09 AM, Peter Eisentraut wrote:
> > I don't think this is my item.  Most of the behavior is old, and
> > pg_stat_get_wal_receiver() is from commit
> > b1a9bad9e744857291c7d5516080527da8219854.
> >
> > I would appreciate if another committer can take the lead on this.
> 
> Those things are on Alvaro's plate for the WAL receiver portion, and I
> was the author of those patches. The WAL sender portion is older
> though, but it seems crazy to me to not fix both things at the same
> time per their similarities.

As a 9.6 commit, b1a9bad cannot be the cause of a v10 open item.  If a v10
commit expanded the consequences of a pre-existing bug, the committer of that
v10 work owns this open item.  If the bug's consequences are the same in v9.6
and v10, this is ineligible to be an open item.  Which applies?




Re: [HACKERS] logical replication - still unstable after all these months

2017-06-01 Thread Erik Rijkers

On 2017-06-02 00:46, Mark Kirkwood wrote:

On 31/05/17 21:16, Petr Jelinek wrote:

I'm seeing a new failure with the patch applied - this time the
history table has missing rows. Petr, I'll put back your access :-)


Is this error during 1-minute runs?

I'm asking because I've moved back to longer (1-hour) runs (no errors so 
far), and I'd like to keep track of what the most 'vulnerable' 
parameters are.


thanks,

Erik Rijkers




Re: [HACKERS] BEFORE trigger can cause undetected partition constraint violation

2017-06-01 Thread Amit Langote
On 2017/06/02 10:36, Robert Haas wrote:
> On Thu, Jun 1, 2017 at 6:05 PM, Tom Lane  wrote:
>> Without having checked the code, I imagine the reason for this is
>> that BEFORE triggers are fired after tuple routing occurs.
> 
> Yep.
> 
>> Re-ordering that seems problematic, because what if you have different
>> triggers on different partitions?  We'd have to forbid that, probably
>> by saying that only the parent table's BEFORE ROW triggers are fired,
>> but that seems not very nice.
> 
> The parent hasn't got any triggers; that's forbidden.
> 
>> The alternative is to forbid BEFORE triggers on partitions to alter
>> the routing result, probably by rechecking the partition constraint
>> after trigger firing.
> 
> That's what we need to do.  Until we do tuple routing, we don't know
> which partition we're addressing, so we don't know which triggers
> we're firing.  So the only way to prevent this is to recheck.  Which I
> think is supposed to work already, but apparently doesn't.

That has to do with the assumption written down in the following portion
of a comment in InitResultRelInfo():

/*
 * If partition_root has been specified, that means we are building the
 * ResultRelInfo for one of its leaf partitions.  In that case, we need
 * *not* initialize the leaf partition's constraint, but rather the
 * partition_root's (if any).

which, as it turns out, is wrong.  It completely disregards the fact that
BR triggers are executed after tuple-routing and can change the row.

Attached patch makes InitResultRelInfo() *always* initialize the
partition's constraint, that is, regardless of whether insert/copy is
through the parent or directly on the partition.  I'm wondering if
ExecInsert() and CopyFrom() should check if it actually needs to execute
the constraint?  I mean it's needed if there exists BR insert triggers
which may change the row, but not otherwise.  The patch currently does not
implement that check.
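
For what it's worth, the check could look something like this (a sketch
only, not part of the attached patch; the helper name is made up, though
ri_TrigDesc and trig_insert_before_row are the existing fields):

#include "postgres.h"
#include "nodes/execnodes.h"

/*
 * Sketch: the partition constraint only needs rechecking after
 * tuple-routing if a BEFORE ROW INSERT trigger could have changed
 * the row.
 */
static bool
NeedsPartitionCheckAfterRouting(ResultRelInfo *resultRelInfo)
{
    TriggerDesc *trigdesc = resultRelInfo->ri_TrigDesc;

    /* No triggers at all: routing already proved the constraint. */
    if (trigdesc == NULL)
        return false;

    return trigdesc->trig_insert_before_row;
}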

Thanks,
Amit
From db67c73ea1924dc205ac088eaa63c93e20eb3fa0 Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 2 Jun 2017 12:11:17 +0900
Subject: [PATCH] Check the partition constraint even after tuple-routing

Partition constraint expressions are not initialized when inserting
through the parent because tuple-routing is said to implicitly preserve
the constraint.  But BR triggers may change a row into one that violates
the partition constraint and they are executed after tuple-routing, so
any such change must be detected by checking the partition constraint
explicitly.  So, initialize them regardless of whether inserting through
the parent or directly into the partition.
---
 src/backend/commands/copy.c            |  2 +-
 src/backend/executor/execMain.c        | 37 --
 src/backend/executor/nodeModifyTable.c |  3 ++-
 src/test/regress/expected/insert.out   | 13 
 src/test/regress/sql/insert.sql        | 10 +
 5 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 84b1a54cb9..cc2d75d167 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2642,7 +2642,7 @@ CopyFrom(CopyState cstate)
{
/* Check the constraints of the tuple */
if (cstate->rel->rd_att->constr ||
-   resultRelInfo->ri_PartitionCheck)
+   (resultRelInfo->ri_PartitionCheck != NIL))
ExecConstraints(resultRelInfo, slot, estate);
 
if (useHeapMultiInsert)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4a899f1eb5..72872a2420 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1339,35 +1339,14 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
resultRelInfo->ri_projectReturning = NULL;
 
/*
-* If partition_root has been specified, that means we are building the
-* ResultRelInfo for one of its leaf partitions.  In that case, we need
-* *not* initialize the leaf partition's constraint, but rather the
-* partition_root's (if any).  We must do that explicitly like this,
-* because implicit partition constraints are not inherited like user-
-* defined constraints and would fail to be enforced by ExecConstraints()
-* after a tuple is routed to a leaf partition.
-*/
-   if (partition_root)
-   {
-   /*
-* Root table itself may or may not be a partition; partition_check
-* would be NIL in the latter case.
-*/
-   partition_check = RelationGetPartitionQual(partition_root);
-
-   /*
-* This is not our own partition constraint, but rather an ancestor's.
-

Re: [HACKERS] walsender & parallelism

2017-06-01 Thread Andres Freund
On 2017-06-01 22:17:57 -0400, Peter Eisentraut wrote:
> On 6/1/17 00:06, Andres Freund wrote:
> > On 2017-05-31 23:51:08 -0400, Peter Eisentraut wrote:
> >> I think the easiest and safest thing to do now is to just prevent
> >> parallel plans in the walsender.  See attached patch.  This prevents the
> >> hang in the select_parallel tests run under your new test setup.
> > I'm not quite sure I can buy this.  The lack of wired up signals has
> > more problems than just hurting parallelism.
> 
> Which problems are those?

For example, not wiring up SIGUSR1, which is the cause of the parallelism
hang, breaks several things, including recovery conflict interrupts.
That means hot standby is affected. Just forbidding parallelism doesn't
address this.

> I can see from the code what they might be,
> but which ones actually happen in practice?  And are there any
> regressions from 9.6?

Yes. For example, the issue that walsender backends currently never quit
while executing SQL statements is new.

> If someone wants to work all this out, that would be great.  But I'm
> here mainly to fix the one reported problem.

I'll deal with the issues in this thread.

- Andres




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-06-01 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > On Tue, May 30, 2017 at 8:55 PM, David G. Johnston wrote:
> >>> Having --no-comments seems generally useful to me, in any case.
> 
> >> It smacks of being excessive to me.
> 
> > It sounds perfectly sensible to me.  It's not exactly an elegant
> > solution to the original problem, but it's a reasonable switch on its
> > own merits.
> 
> I dunno.  What's the actual use-case, other than as a bad workaround
> to a problem we should fix a different way?

Perhaps it's a bit of a stretch, I'll admit, but certainly "minimization"
and "obfuscation" come to mind, which are often done in other fields and
might well apply in very specific cases to PG schemas.  I can certainly
also see a case being made that you'd like to extract a schema-only dump
which doesn't include any comments because the comments have information
that you'd rather not share publicly, while the schema itself is fine to
share.  Again, a bit of a stretch, but not unreasonable.

Otherwise, well, for my 2c anyway, feels like it's simply fleshing out
the options which correspond to the different components of an object.
We provide similar for ACLs, security labels, and tablespace
association.  If there are other components of an object which we should
consider adding an option to exclude, I'm all ears, personally
(indexes?).  Also, with the changes that I've made to pg_dump, I'm
hopeful that such options will end up requiring a very minor amount of
code to implement.  There's more work to be done in that area too,
certainly, but I do feel like it's better than it was.

I definitely would like to see more flexibility in this area in general.

Thanks!

Stephen




Re: [HACKERS] walsender & parallelism

2017-06-01 Thread Peter Eisentraut
On 6/1/17 00:06, Andres Freund wrote:
> On 2017-05-31 23:51:08 -0400, Peter Eisentraut wrote:
>> I think the easiest and safest thing to do now is to just prevent
>> parallel plans in the walsender.  See attached patch.  This prevents the
>> hang in the select_parallel tests run under your new test setup.
> I'm not quite sure I can buy this.  The lack of wired up signals has
> more problems than just hurting parallelism.

Which problems are those?  I can see from the code what they might be,
but which ones actually happen in practice?  And are there any
regressions from 9.6?

If someone wants to work all this out, that would be great.  But I'm
here mainly to fix the one reported problem.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-06-01 Thread Tom Lane
Robert Haas  writes:
> On Tue, May 30, 2017 at 8:55 PM, David G. Johnston wrote:
>>> Having --no-comments seems generally useful to me, in any case.

>> It smacks of being excessive to me.

> It sounds perfectly sensible to me.  It's not exactly an elegant
> solution to the original problem, but it's a reasonable switch on its
> own merits.

I dunno.  What's the actual use-case, other than as a bad workaround
to a problem we should fix a different way?

regards, tom lane




Re: [HACKERS] Error while creating subscription when server is running in single user mode

2017-06-01 Thread Andres Freund
On 2017-06-01 21:42:41 -0400, Peter Eisentraut wrote:
> We should look at what the underlying problem is before we prohibit
> anything at a high level.

I'm not sure there's any underlying issue here, except being in
single-user mode.


> When I try it, I get a
> 
> TRAP: FailedAssertion("!(event->fd != (-1))", File: "latch.c", Line: 861)
> 
> which might indicate that there is a more general problem with latch use
> in single-user mode.

That just means that the latch isn't initialized.  Which makes:

> If I remove that assertion, things work fine after that.  The originally
> reported error "epoll_ctl() failed: Bad file descriptor" might indicate
> that there is platform-dependent behavior.

quite unsurprising.  I'm not sure how this hints at platform-dependent
behaviour?

libpqrcv_connect() uses MyProc->procLatch, which doesn't exist/isn't
initialized in single user mode.  I'm very unclear why that code uses
MyProc->procLatch rather than MyLatch, but that'd not change anything -
the tablesync stuff etc would still not work.

- Andres




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-06-01 Thread Robert Haas
On Tue, May 30, 2017 at 8:55 PM, David G. Johnston wrote:
>> Having --no-comments seems generally useful to me, in any case.
>
> It smacks of being excessive to me.

It sounds perfectly sensible to me.  It's not exactly an elegant
solution to the original problem, but it's a reasonable switch on its
own merits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Error while creating subscription when server is running in single user mode

2017-06-01 Thread Peter Eisentraut
On 6/1/17 04:49, Dilip Kumar wrote:
> On Thu, Jun 1, 2017 at 1:02 PM, Michael Paquier wrote:
>> Thanks, this looks correct to me at quick glance.
>>
>> +if (!IsUnderPostmaster)
>> +ereport(FATAL,
>> +(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> +errmsg("subscription commands are not supported by
>> single-user servers")));
>> The messages could be more detailed, like directly the operation of
>> CREATE/ALTER/DROP SUBCRIPTION in each error message. But that's a nit.
> 
> Thanks for looking into it.  Yeah, I think it's better to give a
> specific message instead of a generic one because we still support some
> of the subscription commands even in single-user mode, i.e. ALTER
> SUBSCRIPTION OWNER.  My patch doesn't block this command, and there is
> no harm in supporting it in single-user mode, but does this make any
> sense?  One might create a subscription in normal mode and then run
> ALTER OWNER in single-user mode, but that makes little sense to me.

We should look at what the underlying problem is before we prohibit
anything at a high level.

When I try it, I get a

TRAP: FailedAssertion("!(event->fd != (-1))", File: "latch.c", Line: 861)

which might indicate that there is a more general problem with latch use
in single-user mode.

If I remove that assertion, things work fine after that.  The originally
reported error "epoll_ctl() failed: Bad file descriptor" might indicate
that there is platform-dependent behavior.

I think the general problem is that the latch code that checks for
postmaster death does not handle single-user mode well.
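
As a sketch of that direction (my illustration, not a patch): latch
waits could simply drop the postmaster-death event when there is no
postmaster, instead of tripping the assertion on an uninitialized file
descriptor.  Something like this inside backend code (a fragment,
assuming storage/latch.h and miscadmin.h):

int         events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;

/* Single-user mode has no postmaster pipe to watch. */
if (!IsUnderPostmaster)
    events &= ~WL_POSTMASTER_DEATH;

(void) WaitLatch(MyLatch, events, 1000L, 0);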

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] logical replication busy-waiting on a lock

2017-06-01 Thread Robert Haas
On Thu, Jun 1, 2017 at 2:28 PM, Andres Freund  wrote:
> On 2017-06-01 14:17:44 +0200, Petr Jelinek wrote:
>> Thinking more about this, I am not convinced it's a good idea to change
>> exports this late in the cycle. I still think it's best to do the xid
>> assignment only when the snapshot is actually exported but don't assign
>> xid when the export is only used by the local session (the new option in
>> PG10). That's one line change which impacts only logical
>> replication/decoding as opposed to everything else which uses exported
>> snapshots.
>
> I'm not quite convinced by this argument.  Exported snapshot contents
> are ephemeral, we can change the format at any time.  The wait is fairly
> annoying for every user of logical decoding.  For me the combination of
> those two facts implies that we should rather fix this properly.

+1.  The change Andres is proposing doesn't sound like it will be
terribly high-impact, and I think we'll be happier in the long run if
we install real fixes instead of kludging it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] BEFORE trigger can cause undetected partition constraint violation

2017-06-01 Thread Robert Haas
On Thu, Jun 1, 2017 at 6:05 PM, Tom Lane  wrote:
> Without having checked the code, I imagine the reason for this is
> that BEFORE triggers are fired after tuple routing occurs.

Yep.

> Re-ordering that seems problematic, because what if you have different
> triggers on different partitions?  We'd have to forbid that, probably
> by saying that only the parent table's BEFORE ROW triggers are fired,
> but that seems not very nice.

The parent hasn't got any triggers; that's forbidden.

> The alternative is to forbid BEFORE triggers on partitions to alter
> the routing result, probably by rechecking the partition constraint
> after trigger firing.

That's what we need to do.  Until we do tuple routing, we don't know
which partition we're addressing, so we don't know which triggers
we're firing.  So the only way to prevent this is to recheck.  Which I
think is supposed to work already, but apparently doesn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-06-01 Thread Robert Haas
On Thu, Jun 1, 2017 at 6:05 PM, Andres Freund  wrote:
> I'm unhappy with how this is reusing SIGINT for WalSndLastCycleHandler.
> Normally INT is used for cancel interrupts, and since walsender is now also
> working as a normal backend, this overlap is bad.

Yep, that's bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-01 Thread Robert Haas
On Thu, Jun 1, 2017 at 9:13 PM, Michael Paquier wrote:
> On Fri, Jun 2, 2017 at 10:08 AM, Stephen Frost  wrote:
>> What I find somewhat objectionable is the notion that if we don't have 5
>> different TLS/SSL implementations supported in PG and that we've tested
>> that channel binding works correctly among all combinations of all of
>> them, then we can't accept a patch implementing it.
>
> It seems to me that any testing in this area won't fly high as long as
> there is no way to enforce the list of TLS implementations that a
> server allows. There have been discussions about being able to control
> that after the OpenSSL vulnerabilities that were protocol-specific and
> there were even patches adding GUCs for this purpose. In the end,
> everything has been rejected as Postgres enforces the use of the
> newest one when doing the SSL handshake.

TLS implementations, or TLS versions?  What does the TLS version have
to do with this issue?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] "create publication..all tables" ignore 'partition not supported' error

2017-06-01 Thread Peter Eisentraut
On 5/31/17 21:26, Peter Eisentraut wrote:
> On 5/31/17 02:17, Kuntal Ghosh wrote:
>> On Wed, May 31, 2017 at 12:58 AM, Masahiko Sawada wrote:
>>>
>>> I'd say we can fix this issue by just changing the query. Attached
>>> patch changes the query so that it can handle publication name
>>> correctly, the query gets complex, though.
>>>
>> In is_publishable_class function, there are four conditions to decide
>> whether this is a publishable class or not.
>>
>> 1. relkind == RELKIND_RELATION
>> 2. IsCatalogClass()
>> 3. relpersistence == 'p'
>> 4. relid >= FirstNormalObjectId /* Skip tables created during initdb */
>>
>> I think the modified query should have a check for the fourth condition as 
>> well.
> 
> The query should be fixed like in the attached patch.
> pg_get_publication_tables() ends up calling is_publishable_class()
> internally.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-06-01 Thread Andres Freund
On 2017-06-02 10:05:21 +0900, Michael Paquier wrote:
> On Fri, Jun 2, 2017 at 9:29 AM, Andres Freund  wrote:
> > On 2017-06-02 08:38:51 +0900, Michael Paquier wrote:
> >> On Fri, Jun 2, 2017 at 7:05 AM, Andres Freund  wrote:
> >> > I'm unhappy with how this is reusing SIGINT for WalSndLastCycleHandler.
> >> > Normally INT is used for cancel interrupts, and since walsender is now also
> >> > working as a normal backend, this overlap is bad.  Even for plain
> >> > walsender backends this seems bad, because now non-superuser replication
> >> > users will terminate replication connections when they do
> >> > pg_cancel_backend().  For replication=dbname users it's especially bad
> >> > because there can be completely legitimate uses of pg_cancel_backend().
> >>
> >> Signals for WAL senders are set in WalSndSignals() which uses SIG_IGN
> >> for SIGINT now in ~9.6, and StatementCancelHandler does not get set up
> >> for a non-am_walsender backend. Am I missing something?
> >
> > Yes, but nothing in those observations actually addresses my point?
> 
> I am still confused by your previous email, which, at least it seems
> to me, implies that logical WAL senders have been working correctly
> with query cancellations. Now SIGINT is just ignored, which means that
> pg_cancel_backend() has never worked for WAL senders until now, and
> this behavior has not changed with 086221c. So there is no new
> breakage introduced by this commit. I get your point to reuse SIGINT
> for query cancellations though, but that's a new feature.

The issue is that the commit made a non-existent feature
(pg_cancel_backend() to walsenders) into a broken one (pg_cancel_backend
terminates walsenders).  Additionally v10 added something new
(walsenders executing SQL), and that will need at least some signal
handling fixes - hard to do if e.g. SIGINT is reused for something else.

> > a) upon shutdown checkpointer (so we can use procsignal), not
> >postmaster, sends PROCSIG_WALSND_INIT_STOPPING to all walsenders (so
> >we don't have to use up a normal signal handler)
> 
> You'll need a second one that wakes up the latch of the WAL senders to
> send more WAL records.

Don't think so, procsignal_sigusr1_handler serves quite well for that.
There's nearby discussion that we need to do so anyway, to fix recovery
conflict interrupts, parallelism interrupts and such.


> > b) Upon reception walsenders immediately exit if !replication_active,
> >otherwise sets got_STOPPING
> 
> Okay, that's what happens now anyway, any new replication command
> received results in an error. I actually prefer the way of doing in
> HEAD, which at least reports an error.

Err, no. What happens right now is that plainly nothing is done if a
connection is idle or busy executing things.  Only if a new command is
sent do we error out - that makes very little sense.


> > c) Logical decoding will finish if got_STOPPING is sent, *WITHOUT*, as
> >currently done, moving to WALSNDSTATE_STOPPING.  Not yet quite sure
> >how to best handle the WALSNDSTATE_STOPPING transition in WalSndLoop().
> 
> Wouldn't it make sense to have the logical receivers be able to
> receive WAL up to the end of checkpoint record?

Yea, that's what I'm doing.  For that we really only need to change the
check in WalSndWaitForWal() check of got_SIGINT to got_STOPPING, and add
a XLogSendLogical() check in the WalSndCaughtUp if() that sets
got_SIGUSR2 *without* setting WALSNDSTATE_STOPPING (otherwise we'd
possibly continue to trigger WAL records until the send buffer is
emptied).

- Andres




Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-01 Thread Michael Paquier
On Fri, Jun 2, 2017 at 10:08 AM, Stephen Frost  wrote:
> What I find somewhat objectionable is the notion that if we don't have 5
> different TLS/SSL implementations supported in PG and that we've tested
> that channel binding works correctly among all combinations of all of
> them, then we can't accept a patch implementing it.

It seems to me that any testing in this area won't fly high as long as
there is no way to enforce the list of TLS implementations that a
server allows. There have been discussions about being able to control
that after the OpenSSL vulnerabilities that were protocol-specific and
there were even patches adding GUCs for this purpose. In the end,
everything has been rejected as Postgres enforces the use of the
newest one when doing the SSL handshake.
-- 
Michael




Re: [HACKERS] Perfomance bug in v10

2017-06-01 Thread Tom Lane
David Rowley  writes:
> On 1 June 2017 at 04:16, Teodor Sigaev  wrote:
>> I found an example where v10 chooses extremely non-optimal plan:
>> ...

> This is all caused by get_variable_numdistinct() deciding that all
> values are distinct because ntuples < DEFAULT_NUM_DISTINCT.

Uh, no.  I traced through this and found that with your hack in place,
final_cost_nestloop was costing the desired nestloop paths at less
than they were costed in HEAD.  That makes no sense: surely, accounting
for the fact that the executor might stop early should never result in
a higher cost estimate than ignoring that possibility does.  After
some navel-gazing I realized that there is an ancient oversight in
final_cost_nestloop's cost estimate for semi/anti-join cases.  To wit,
in its attempt to ensure that it always charges inner_run_cost at least
once, it may end up charging that twice.  Specifically, what we get in
this case is outer_path_rows = 1, outer_matched_rows = 0 (implying one
unmatched outer row) which will cause the existing logic to add both
inner_run_cost and inner_rescan_run_cost to the cost estimate, as if
we needed to run the inner plan twice not once.  Correcting that, as in
the attached draft patch, fixes Teodor's example.
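
To see the double charge with toy numbers (mine, not from Teodor's
case), this stand-alone snippet mirrors the old and fixed formulas for
the situation above, where there is at least one unmatched outer row:

#include <stdio.h>

int
main(void)
{
    /* Toy numbers: one outer row, no matches, so one unmatched row. */
    double      inner_run_cost = 100.0;
    double      inner_rescan_run_cost = 80.0;
    double      outer_path_rows = 1.0;
    double      outer_matched_rows = 0.0;

    /* Old logic: charge the forced first scan, then ALSO charge a
     * rescan for every unmatched outer row: two scans for one row. */
    double      old_cost = inner_run_cost +
        (outer_path_rows - outer_matched_rows) * inner_rescan_run_cost;

    /* Fixed logic: attribute the forced first scan to the first
     * unmatched row, so one fewer rescan is charged. */
    double      new_cost = inner_run_cost +
        (outer_path_rows - outer_matched_rows - 1.0) * inner_rescan_run_cost;

    printf("old = %g, fixed = %g\n", old_cost, new_cost);  /* 180 vs 100 */
    return 0;
}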

Now, this patch also causes some changes in the regression test outputs
that are a bit like your patch's side-effects, but on close inspection
I am not at all convinced that these changes are wrong.  As an example,
looking at the first change without "costs off":

regression=# explain (verbose)
  select * from j1
  inner join (select distinct id from j3) j3 on j1.id = j3.id;
QUERY PLAN 
---
 Nested Loop  (cost=1.03..2.12 rows=1 width=8)
   Output: j1.id, j3.id
   Inner Unique: true
   Join Filter: (j1.id = j3.id)
   ->  Unique  (cost=1.03..1.04 rows=1 width=4)
 Output: j3.id
 ->  Sort  (cost=1.03..1.03 rows=2 width=4)
   Output: j3.id
   Sort Key: j3.id
   ->  Seq Scan on public.j3  (cost=0.00..1.02 rows=2 width=4)
 Output: j3.id
   ->  Seq Scan on public.j1  (cost=0.00..1.03 rows=3 width=4)
 Output: j1.id
(13 rows)

Note that both sides of this join are known unique, so that we'd produce
an inner-unique-true join from either direction of joining.  I don't think
it's so insane to put the larger rel on the inside, because with a size-1
rel on the inside, there is nothing to be gained from stop-early behavior.
Moreover, this way we don't need a Materialize node (since we're not
predicting the inner to be scanned more than once anyway).  So the fact
that this way is estimated to be cheaper than the other way is not that
surprising after all.  Yeah, it's a bit brittle in the face of the outer
rel producing more rows than we expect, but that's true of every nestloop
join plan we ever make.  Fixing that is not a task for v10.

Teodor, could you check if this patch fixes your real-world problem?

regards, tom lane

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index cdb18d9..324c8c1 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*** final_cost_nestloop(PlannerInfo *root, N
*** 2287,2306 
  			 * difficult to estimate whether that will happen (and it could
  			 * not happen if there are any unmatched outer rows!), so be
  			 * conservative and always charge the whole first-scan cost once.
  			 */
  			run_cost += inner_run_cost;
  
  			/* Add inner run cost for additional outer tuples having matches */
! 			if (outer_matched_rows > 1)
! run_cost += (outer_matched_rows - 1) * inner_rescan_run_cost * inner_scan_frac;
! 
! 			/* Add inner run cost for unmatched outer tuples */
! 			run_cost += (outer_path_rows - outer_matched_rows) *
! inner_rescan_run_cost;
  
! 			/* And count the unmatched join tuples as being processed */
! 			ntuples += (outer_path_rows - outer_matched_rows) *
! inner_path_rows;
  		}
  	}
  	else
--- 2287,2317 
  			 * difficult to estimate whether that will happen (and it could
  			 * not happen if there are any unmatched outer rows!), so be
  			 * conservative and always charge the whole first-scan cost once.
+ 			 * We consider this charge to correspond to the first unmatched
+ 			 * outer row, unless there isn't one in our estimate, in which
+ 			 * case blame it on the first matched row.
  			 */
+ 			double		outer_unmatched_rows;
+ 
+ 			outer_unmatched_rows = outer_path_rows - outer_matched_rows;
+ 
+ 			/* While at it, count unmatched join tuples as being processed */
+ 			ntuples += outer_unmatched_rows * inner_path_rows;
+ 
+ 			/* Now add the forced full scan, and decrement appropriate count */
  			run_cost += inner_run_cost;
+ 			if 

Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-01 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Jun 1, 2017 at 11:50 AM, Stephen Frost  wrote:
> > I certainly wouldn't object to someone working on this, but I feel like
> > it's a good deal more work than perhaps you're realizing (and I tend to
> > think trying to use the Windows SSL implementation would increase the
> > level of effort, not minimize it).
> 
> I agree that it's a fair amount of work, but if nobody does it, then I
> think it's pretty speculative to suppose that the feature actually
> works correctly.

We've considered "works with OpenSSL" (and, to some extent, JDBC, but
I'm pretty sure that came later and just happened to be able to work...)
to be sufficient to include things like client-side certificate based
authentication, so this is raising the bar quite a bit for a feature
that, while important and valuable, frankly isn't as important as
client-side cert auth.

> > Perhaps it wouldn't be too bad to
> > write a one-off piece of code which just connects and then returns the
> > channel binding information on each side and then one could hand-check
> > that what's returned matches what it's supposed to based on the RFC, but
> > if it doesn't, then what?
> 
> Then something's broken and we need to fix it before we start
> committing patches.

... Or that implementation doesn't follow the RFC, which is what I was
getting at.

> > In the specific case we're talking about,
> > there's two approaches in the RFC and it seems like we should support
> > both because at least our current JDBC implementation only works with
> > one, and ideally we can allow for that to be extended to other methods,
> > but I wouldn't want to implement a method that only works on Windows SSL
> > because that implementation, for whatever reason, doesn't follow either
> > of the methods available in the RFC.
> 
> I am very skeptical that if two people on two different SSL
> interpretations both do something that appears to comply with the RFC,
> we can just assume those two people will get the same answer.  In an
> ideal world, that would definitely work, but in the real world, I
> think it needs to be tested or you don't really know.  I agree that if
> a given SSL implementation is such that it can't support either of the
> possible channel binding methods, then that's not our problem; people
> using that SSL implementation just can't get channel binding, and if
> they don't like that they can switch SSL implementations.  But I also
> think that if two SSL implementations both claim to support what's
> needed to make channel binding work and we don't do any testing that
> they actually interoperate, then I don't think we can really know that
> we've got it right.

I'm all for doing testing, as I've tried to point out a few times, and I
would like to see an implementation which is able to be extended in the
future to other channel binding methods, as we clearly need to support
at least the two listed in the RFC based on this discussion and there
might be a still better way down the road anyway.

> Another way of validating Michael's problem which I would find
> satisfactory is to go look at some other OpenSSL-based implementations
> of channel binding.  If in each case they are using the same functions
> that Michael used in the same way, then that's good evidence that his
> implementation is doing the right thing, especially if any of those
> implementations also support other SSL implementations and have
> verified that the OpenSSL mechanism does in fact interoperate.

I don't have any issue with asking that Michael, or someone, to go look
at other OpenSSL-using implementations which support channel binding.

> I don't really know why we're arguing about this.  It seems blindingly
> obvious to me that we can't just take it on faith that the functions
> Michael identified for this purpose are the right ones and will work
> perfectly in complete compliance with the RFC.  We are in general very
> reluctant to make such assumptions and if we were going to start,
> changes that affect wire protocol compatibility wouldn't be my first
> choice.  Is it really all that revolutionary or controversial to
> suggest that this patch has not yet had enough validation to really
> know that it does what we want?  To me, it seems like verifying that a
> patch which supposedly implements a standard interoperates with
> something other than itself is so obvious that it should barely need
> to be mentioned, let alone become a bone of contention.

As I said in an earlier reply, I'm hopeful that we're closer to
agreement here than we are disagreement, but there seems to be some
confusion regarding my position on this specific patch.  I'm not
advocating for this patch to be committed as-is or even anytime soon,
and I'm unsure of where I gave that impression.  I'm encouraged by the
ongoing discussion between Michael and Alvaro and hope to see a new
patch down the road which incorporates the results of 

Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-06-01 Thread Michael Paquier
On Fri, Jun 2, 2017 at 9:29 AM, Andres Freund  wrote:
> On 2017-06-02 08:38:51 +0900, Michael Paquier wrote:
>> On Fri, Jun 2, 2017 at 7:05 AM, Andres Freund  wrote:
>> > I'm unhappy with how this is reusing SIGINT for WalSndLastCycleHandler.
>> > Normally INT is used for cancel interrupts, and since walsender is now also
>> > working as a normal backend, this overlap is bad.  Even for plain
>> > walsender backends this seems bad, because now non-superuser replication
>> > users will terminate replication connections when they do
>> > pg_cancel_backend().  For replication=dbname users it's especially bad
>> > because there can be completely legitimate uses of pg_cancel_backend().
>>
>> Signals for WAL senders are set in WalSndSignals() which uses SIG_IGN
>> for SIGINT now in ~9.6, and StatementCancelHandler does not get set up
>> for a non-am_walsender backend. Am I missing something?
>
> Yes, but nothing in those observations actually addresses my point?

I am still confused by your previous email, which, at least it seems
to me, implies that logical WAL senders have been working correctly
with query cancellations. Now SIGINT is just ignored, which means that
pg_cancel_backend() has never worked for WAL senders until now, and
this behavior has not changed with 086221c. So there is no new
breakage introduced by this commit. I get your point to reuse SIGINT
for query cancellations though, but that's a new feature.

> Some points:
>
> 1) 086221cf6b1727c2baed4703c582f657b7c5350e changes things so walsender
>backends use SIGINT for WalSndLastCycleHandler(), which is now
>triggerable by pg_cancel_backend(). Especially for logical rep
>walsenders it's not absurd sending that.
> 2) Walsenders now can run normal queries.
> 3) Afaict 086221cf6b1727c2baed4703c582f657b7c5350e doesn't really
>address the PANIC problem for database connected walsenders at all,
>because it doesn't even cancel non-replication commands.  I.e. an
>already running query can just continue to run.  Which afaict just
>entirely breaks shutdown.  If the connection is idle, or running a
>query, we'll just wait forever.
> 4) the whole logic introduced in the above commit doesn't actually
>appear to handle logical decoding senders properly - wasn't the whole
>issue at hand that those can write WAL in some case?  But
>nevertheless WalSndWaitForWal() does a
>WalSndSetState(WALSNDSTATE_STOPPING); *and then continues decoding
>and waiting* - which seems to obviate the entire point of that commit.
>
> I'm working on a patch rejiggering things so:
>
> a) upon shutdown checkpointer (so we can use procsignal), not
>postmaster, sends PROCSIG_WALSND_INIT_STOPPING to all walsenders (so
>we don't have to use up a normal signal handler)

You'll need a second one that wakes up the latch of the WAL senders to
send more WAL records.

> b) Upon reception walsenders immediately exit if !replication_active,
>otherwise sets got_STOPPING

Okay, that's what happens now anyway, any new replication command
received results in an error. I actually prefer the way of doing in
HEAD, which at least reports an error.

> c) Logical decoding will finish if got_STOPPING is sent, *WITHOUT*, as
>currently done, moving to WALSNDSTATE_STOPPING.  Not yet quite sure
>how to best handle the WALSNDSTATE_STOPPING transition in WalSndLoop().

Wouldn't it make sense to have the logical receivers be able to
receive WAL up to the end of checkpoint record?

> d) Once all remaining walsenders are in stopping state, postmaster sends
>SIGUSR2 to trigger shutdown (basically as before)

OK.
-- 
Michael




Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-06-01 Thread Andres Freund
On 2017-06-02 08:38:51 +0900, Michael Paquier wrote:
> On Fri, Jun 2, 2017 at 7:05 AM, Andres Freund  wrote:
> > I'm unhappy with how this is reusing SIGINT for WalSndLastCycleHandler.
> > Normally INT is used for cancel interrupts, and since walsender is now also
> > working as a normal backend, this overlap is bad.  Even for plain
> > walsender backends this seems bad, because now non-superuser replication
> > users will terminate replication connections when they do
> > pg_cancel_backend().  For replication=dbname users it's especially bad
> > because there can be completely legitimate uses of pg_cancel_backend().
>
> Signals for WAL senders are set in WalSndSignals() which uses SIG_IGN
> for SIGINT now in ~9.6, and StatementCancelHandler does not get set up
> for a non-am_walsender backend. Am I missing something?

Yes, but nothing in those observations actually addresses my point?

Some points:

1) 086221cf6b1727c2baed4703c582f657b7c5350e changes things so walsender
   backends use SIGINT for WalSndLastCycleHandler(), which is now
   triggerable by pg_cancel_backend(). Especially for logical rep
   walsenders it's not absurd sending that.
2) Walsenders now can run normal queries.
3) Afaict 086221cf6b1727c2baed4703c582f657b7c5350e doesn't really
   address the PANIC problem for database connected walsenders at all,
   because it doesn't even cancel non-replication commands.  I.e. an
   already running query can just continue to run.  Which afaict just
   entirely breaks shutdown.  If the connection is idle, or running a
   query, we'll just wait forever.
4) the whole logic introduced in the above commit doesn't actually
   appear to handle logical decoding senders properly - wasn't the whole
   issue at hand that those can write WAL in some case?  But
   nevertheless WalSndWaitForWal() does a
   WalSndSetState(WALSNDSTATE_STOPPING); *and then continues decoding
   and waiting* - which seems to obviate the entire point of that commit.

I'm working on a patch rejiggering things so:

a) upon shutdown checkpointer (so we can use procsignal), not
   postmaster, sends PROCSIG_WALSND_INIT_STOPPING to all walsenders (so
   we don't have to use up a normal signal handler)
b) Upon reception walsenders immediately exit if !replication_active,
   otherwise sets got_STOPPING
c) Logical decoding will finish if got_STOPPING is sent, *WITHOUT*, as
   currently done, moving to WALSNDSTATE_STOPPING.  Not yet quite sure
   how to best handle the WALSNDSTATE_STOPPING transition in WalSndLoop().
d) Once all remaining walsenders are in stopping state, postmaster sends
   SIGUSR2 to trigger shutdown (basically as before)
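
As a sketch of step (b), the handler would be roughly this (got_STOPPING
and replication_active are names from the proposal above, not committed
code):

#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_STOPPING = 0;
static volatile sig_atomic_t replication_active = 0;

/* Called from the procsignal handler on PROCSIG_WALSND_INIT_STOPPING. */
static void
HandleWalSndInitStopping(void)
{
    if (!replication_active)
        kill(getpid(), SIGTERM);  /* nothing to drain: die as on SIGTERM */
    else
        got_STOPPING = 1;  /* main loop drains remaining WAL, then exits */
}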

Does that seem to make sense?

- Andres




Re: [HACKERS] Perfomance bug in v10

2017-06-01 Thread David Rowley
On 2 June 2017 at 03:46, Teodor Sigaev  wrote:
> I don't see why the presence of the index could influence that. Removing
> the index produces a good plan, although the index isn't used in either plan.

Unique indexes are used as proofs when deciding if a join to the
relation is "inner_unique". A nested loop unique join is costed more
cheaply than a non-unique one since we can skip to the next outer
tuple once we've matched the current outer tuple to an inner tuple. In
theory that's half as many comparisons for a non-parameterised nested
loop.
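
A quick way to see this (a contrived example, not Teodor's case): give
the inner rel a unique index and EXPLAIN should report the join as
inner unique.

CREATE TABLE outer_t (id int);
CREATE TABLE inner_t (id int PRIMARY KEY);  -- the uniqueness proof

EXPLAIN (COSTS OFF)
SELECT * FROM outer_t o JOIN inner_t i ON o.id = i.id;
-- The plan shows "Inner Unique: true"; the executor moves to the next
-- outer tuple as soon as the current one finds its single match.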


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-06-01 Thread Michael Paquier
On Fri, Jun 2, 2017 at 7:05 AM, Andres Freund  wrote:
> I'm unhappy with how this is reusing SIGINT for WalSndLastCycleHandler.
> Normally INT is used for cancel interrupts, and since walsender is now also
> working as a normal backend, this overlap is bad.  Even for plain
> walsender backends this seems bad, because now non-superuser replication
> users will terminate replication connections when they do
> pg_cancel_backend().  For replication=dbname users it's especially bad
> because there can be completely legitimate uses of pg_cancel_backend().

Signals for WAL senders are set in WalSndSignals() which uses SIG_IGN
for SIGINT now in ~9.6, and StatementCancelHandler does not get set up
for a non-am_walsender backend. Am I missing something?
-- 
Michael




Re: [HACKERS] logical replication - still unstable after all these months

2017-06-01 Thread Mark Kirkwood

On 31/05/17 21:16, Petr Jelinek wrote:


On 29/05/17 23:06, Mark Kirkwood wrote:

On 29/05/17 23:14, Petr Jelinek wrote:


On 29/05/17 03:33, Jeff Janes wrote:


What would you want to look at?  Would saving the WAL from the master be
helpful?


Useful info is, logs from provider (mainly the logical decoding logs
that mention LSNs), logs from subscriber (the lines about when sync
workers finished), contents of the pg_subscription_rel (with srrelid
casted to regclass so we know which table is which), and pg_waldump
output around the LSNs found in the logs and in the pg_subscription_rel
(+ few lines before and some after to get context). It's enough to only
care about LSNs for the table(s) that are out of sync.


I have a run that aborted with failure (accounts table md5 mismatch).
Petr - would you like to have access to the machine ? If so send me you
public key and I'll set it up.

Thanks to Mark's offer I was able to study the issue as it happened and
found the cause of this.

The busy loop in apply stops at the point when worker shmem state
indicates that table synchronization has finished, but that might not be
visible in the next transaction if it takes a long time to flush the final
commit to disk, so we might ignore a couple of transactions for a given
table in the main apply because we think it's still being synchronized. This
also explains why I could not reproduce it on my testing machine (fast
ssd disk array where flushes were always fast) and why it happens
relatively rarely because it's one specific commit during the whole
synchronization process that needs to be slow.

So as solution I changed the busy loop in the apply to wait for
in-catalog status rather than in-memory status to make sure things are
really there and flushed.
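
In code terms the waiting side now looks roughly like this (a simplified
sketch of the idea, not the committed patch; error handling omitted):

/* Poll the catalog, not shared memory: SUBREL_STATE_READY is only
 * trusted once the sync worker's final commit is flushed and the
 * catalog row is visible. */
for (;;)
{
    XLogRecPtr  statelsn;
    char        state = GetSubscriptionRelState(subid, relid,
                                                &statelsn, true);

    if (state == SUBREL_STATE_READY)
        break;

    WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT, 1000L, 0);
    ResetLatch(MyLatch);
}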

While working on this I realized that the handover itself is bit more
complex than necessary (especially for debugging and for other people
understanding it) so I did some small changes as part of this patch to
make the sequences of states table goes during synchronization process
to always be the same. This might cause an unnecessary update per table
synchronization process in some cases, but that seems like a small enough
price to pay for clearer logic. And it also fixes another potential bug
that I identified where we might write wrong state to catalog if main
apply crashed while sync worker was waiting for status update.

I've been running tests on this overnight on another machine where I was
able to reproduce the original issue within a few runs (once I found what
causes it), and so far it looks good.





I'm seeing a new failure with the patch applied - this time the history 
table has missing rows. Petr, I'll put back your access :-)


regards

Mark





Re: [HACKERS] [PATCH] quiet conversion warning in DatumGetFloat4

2017-06-01 Thread Tom Lane
Chapman Flack  writes:
> It might be fun to see how big a chunk of the 4106 would vanish just
> with the first tweak to one of the causes that's mentioned in a lot of
> them. (Unless your figures were already after culling to distinct causes,
> which would sound like a more-than-casual effort.)

No, I just did "make 2>&1 | grep 'warning: conversion' | wc".

I did look through the warnings a little bit.  A lot of them seem to be
caused by our being cavalier about using "int" parameters and/or loop
variables to represent attribute numbers; as soon as you pass one of those
to an API that's declared AttrNumber, warning.  Another large batch are
from conversions from size_t to int, a practice that's perfectly safe for
palloc'd values.  And I saw some in the planner from conversions of
rowcounts to double --- yes, I know that's imprecise, thank you very much.
Based on what I saw, there are hardly any places where a single touch
would remove a large number of these warnings; it'd be more like fixing
them retail, and it would be pretty pointless.
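
To illustrate the two big offenders (a self-contained toy, not actual
PostgreSQL code):

#include <stdint.h>
#include <string.h>

typedef int16_t AttrNumber;        /* mirrors the real typedef */

static void use_attr(AttrNumber attnum) { (void) attnum; }

void demo(const char *s, int natts)
{
    for (int i = 1; i <= natts; i++)
        use_attr(i);               /* -Wconversion: int -> int16 */

    for (int i = 1; i <= natts; i++)
        use_attr((AttrNumber) i);  /* explicit cast silences it */

    int len = strlen(s);           /* -Wconversion: size_t -> int */
    (void) len;
}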

regards, tom lane




Re: [HACKERS] [PATCH] quiet conversion warning in DatumGetFloat4

2017-06-01 Thread Chapman Flack
On 06/01/17 17:41, Tom Lane wrote:
> 12169 warnings generated by -Wconversion
> 4106 warnings generated by -Wconversion -Wno-sign-conversion
> ...
> So it's better with -Wno-sign-conversion, but I'd say we're still not
> going there anytime soon.

On an optimistic note, there might not turn out to be anywhere near
as many distinct causes; there's typically a lot of amplification.
The one patch I sent in eliminated screens upon screens of warning
output from the PL/Java build (I made no effort to count them, I just
listened to the noise in my speakers until I heard the scrolling stop).

It might be fun to see how big a chunk of the 4106 would vanish just
with the first tweak to one of the causes that's mentioned in a lot of
them. (Unless your figures were already after culling to distinct causes,
which would sound like a more-than-casual effort.)

Trouble is, after that first taste of success beyond expectation,
it gets like a drug.

-Chap




Re: [HACKERS] BEFORE trigger can cause undetected partition constraint violation

2017-06-01 Thread Tom Lane
Robert Haas  writes:
> I just discovered that a BEFORE trigger can allow data into a
> partition that violates the relevant partition constraint.  This is
> bad.

Without having checked the code, I imagine the reason for this is
that BEFORE triggers are fired after tuple routing occurs.

Re-ordering that seems problematic, because what if you have different
triggers on different partitions?  We'd have to forbid that, probably
by saying that only the parent table's BEFORE ROW triggers are fired,
but that seems not very nice.

The alternative is to forbid BEFORE triggers on partitions to alter
the routing result, probably by rechecking the partition constraint
after trigger firing.

We have always checked ordinary CHECK constraints after BEFORE
triggers fire, precisely because a trigger might change the data
to make it fail (or pass!) a constraint.  I take it somebody
decided that wasn't necessary for partition constraints.  Penny
wise and pound foolish?

regards, tom lane




Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-06-01 Thread Andres Freund
On 2017-05-05 10:50:11 -0400, Peter Eisentraut wrote:
> On 5/5/17 01:26, Michael Paquier wrote:
> > The only code path doing HOT-pruning and generating WAL is
> > heap_page_prune(). Do you think that we need to worry about FPWs as
> > well?
> > 
> > Attached is an updated patch, which also forbids the run of any
> > replication commands when the stopping state is reached.
> 
> I have committed this without the HOT pruning change.  That can be
> considered separately, and I think it could use another round of
> thinking about it.

I'm unhappy that this is reusing SIGINT for WalSndLastCycleHandler.
Normally INT is used for cancel interrupts, and since the walsender is
now also working as a normal backend, this overlap is bad.  Even for a
plain walsender backend this seems bad, because now non-superuser
replication users will terminate replication connections when they do
pg_cancel_backend().  For replication=dbname users it's especially bad
because there can be completely legitimate uses of pg_cancel_backend().

- Andres




Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-01 Thread Thomas Munro
On Fri, Jun 2, 2017 at 9:27 AM, Peter Geoghegan  wrote:
> On Thu, Jun 1, 2017 at 2:24 PM, Thomas Munro
>  wrote:
>> Why should ICU be any different than the system provider in this
>> respect?  In both cases, we have a two-level comparison: first we use
>> the collation-aware comparison, and then as a tie breaker, we use a
>> binary comparison.  If we didn't do a binary comparison as a
>> tie-breaker, wouldn't the result be logically incompatible with the =
>> operator, which does a binary comparison?
>
> I agree with that assessment.

I think you *could* make a logically consistent set of operations with
no binary tie-breaker.  = could be defined in terms of strcoll and
hash could hash the output of strxfrm, but it'd be impractical and
slow.  In order to take advantage of simple and fast = and hash, we go
the other way and teach < and > about binary order.
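
As a toy illustration of the scheme we do use (illustrative C, not the
actual varstr_cmp() code):

#include <string.h>

/* collation-aware compare with a binary tie-breaker: because ties fall
 * back to byte order, equality can stay a plain byte comparison, and
 * hashing the raw bytes remains consistent with equality */
static int
two_level_cmp(const char *a, const char *b)
{
    int r = strcoll(a, b);      /* level 1: collation-aware */

    if (r == 0)
        r = strcmp(a, b);       /* level 2: binary tie-breaker */
    return r;
}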

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-01 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, Jun 1, 2017 at 2:24 PM, Thomas Munro
>  wrote:
>> Why should ICU be any different than the system provider in this
>> respect?  In both cases, we have a two-level comparison: first we use
>> the collation-aware comparison, and then as a tie breaker, we use a
>> binary comparison.  If we didn't do a binary comparison as a
>> tie-breaker, wouldn't the result be logically incompatible with the =
>> operator, which does a binary comparison?

> I agree with that assessment.

The critical reason why this is not optional is that if texteq were to
return true for strings that aren't bitwise identical, that breaks hashing
--- unless you can guarantee that the hash values for such strings will be
equal anyway.  That's hardly possible when we don't even know what the
collation's comparison rule is, and would likely be difficult even if
we had complete knowledge.

So no, we're not going there for ICU any more than we did for libc.

regards, tom lane




Re: [HACKERS] Hash Functions

2017-06-01 Thread Joe Conway
On 06/01/2017 11:25 AM, Andres Freund wrote:
> On 2017-06-01 13:59:42 -0400, Robert Haas wrote:
>> My personal guess is that most people will prefer the fast
>> hash functions over the ones that solve their potential future
>> migration problems, but, hey, options are good.
> 
> I'm pretty sure that will be the case.  I'm not sure that adding
> infrastructure to allow for something that nobody will use in practice
> is a good idea.  If there ends up being demand for it, we can still go there.
> 
> I think that the number of people that migrate between architectures is
> low enough that this isn't going to be a very common issue.  Having some
> feasible way around this is important, but I don't think we should
> optimize heavily for it by developing new infrastructure / complicating
> experience for the 'normal' uses.

+1

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] [PATCH] quiet conversion warning in DatumGetFloat4

2017-06-01 Thread Tom Lane
Chapman Flack  writes:
> On 05/31/2017 11:36 AM, Tom Lane wrote:
>> However, I grant your point that some extensions may have outside
>> constraints that mandate using -Wconversion, so to the extent that
>> we can keep key headers like postgres.h from triggering those warnings,
>> it's probably worth doing.  I suspect you're still seeing a lot of them
>> though --- experiments with some contrib modules suggest that a lot of
>> our other headers also contain code that would trigger them.  I do not
>> think I'd be on board with trying to silence them generally.

> That was actually the only one PL/Java gets, outside of /sign/
> conversions, a special subset of conversion warnings that can be
> separately turned off with -Wno-sign-conversion.

Just for the archives' sake: I experimented with this, using Fedora 25's
compiler (gcc version 6.3.1) against current HEAD (including your patch).
For the core build only, no contrib, I see:

12169 warnings generated by -Wconversion

4106 warnings generated by -Wconversion -Wno-sign-conversion

It's not just the core code that has issues either: contrib has 2202
warnings for the first case, 683 for the second.

So it's better with -Wno-sign-conversion, but I'd say we're still not
going there anytime soon.

regards, tom lane




Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-01 Thread Peter Geoghegan
On Thu, Jun 1, 2017 at 2:24 PM, Thomas Munro
 wrote:
> Why should ICU be any different than the system provider in this
> respect?  In both cases, we have a two-level comparison: first we use
> the collation-aware comparison, and then as a tie breaker, we use a
> binary comparison.  If we didn't do a binary comparison as a
> tie-breaker, wouldn't the result be logically incompatible with the =
> operator, which does a binary comparison?

I agree with that assessment.


-- 
Peter Geoghegan




Re: [HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-01 Thread Thomas Munro
On Fri, Jun 2, 2017 at 6:58 AM, Amit Khandekar  wrote:
> While comparing two text strings using varstr_cmp(), if *strcoll*()
> call returns 0, we do strcmp() tie-breaker to do binary comparison,
> because strcoll() can return 0 for non-identical strings :
>
> varstr_cmp()
> {
> ...
> /*
> * In some locales strcoll() can claim that nonidentical strings are
> * equal.  Believing that would be bad news for a number of reasons,
> * so we follow Perl's lead and sort "equal" strings according to
> * strcmp().
> */
> if (result == 0)
> result = strcmp(a1p, a2p);
> ...
> }
>
> But is this supposed to apply for ICU collations as well ? If
> collation provider is icu, the comparison is done using
> ucol_strcoll*(). I suspect that ucol_strcoll*() intentionally returns
> some characters as being identical, so doing strcmp() may not make
> sense.
>
> For e.g. , if the below two characters are compared using
> ucol_strcollUTF8(), it returns 0, meaning the strings are identical :
> Greek Oxia : UTF-16 encoding : 0x1FFD
> (http://www.fileformat.info/info/unicode/char/1ffd/index.htm)
> Greek Tonos : UTF-16 encoding : 0x0384
> (http://www.fileformat.info/info/unicode/char/0384/index.htm)
>
> The characters are displayed like this :
> postgres=# select (U&'\+001FFD') , (U&'\+000384') collate ucatest;
>  ?column? | ?column?
> ----------+----------
>  ´        | ΄
> (Although this example has similar looking characters, this might not
> be a factor behind treating them equal)
>
> Now since ucol_strcoll*() returns 0, these strings are always compared
> using strcmp(), so 1FFD > 0384 returns true :
>
> create collation ucatest (locale = 'en_US.UTF8', provider = 'icu');
>
> postgres=# select (U&'\+001FFD') > (U&'\+000384') collate ucatest;
>  ?column?
> --
>  t
>
> Whereas, if strcmp() is skipped for ICU collations :
> if (result == 0 && !(mylocale && mylocale->provider == COLLPROVIDER_ICU))
>result = strcmp(a1p, a2p);
>
> ... then the comparison using ICU collation tells they are identical strings :
>
> postgres=# select (U&'\+001FFD') > (U&'\+000384') collate ucatest;
>  ?column?
> --
>  f
> (1 row)
>
> postgres=# select (U&'\+001FFD') < (U&'\+000384') collate ucatest;
>  ?column?
> --
>  f
> (1 row)
>
> postgres=# select (U&'\+001FFD') <= (U&'\+000384') collate ucatest;
>  ?column?
> --
>  t
>
>
> Now I have verified that strcoll() does return a positive result for
> 1FFD vs 0384.  So it looks like the ICU API function ucol_strcoll()
> returns 0 (equal) intentionally.  That's the reason I feel the
> strcmp-if-strcoll-returns-0 thing might not be applicable for ICU.  But
> I may be wrong, please correct me if I'm missing something.

I may not have had enough coffee yet, but...

Why should ICU be any different than the system provider in this
respect?  In both cases, we have a two-level comparison: first we use
the collation-aware comparison, and then as a tie breaker, we use a
binary comparison.  If we didn't do a binary comparison as a
tie-breaker, wouldn't the result be logically incompatible with the =
operator, which does a binary comparison?

Put another way, if we didn't use binary order tie-breaking, we'd have
to teach texteq to understand collations (ie be defined as not (a < b)
and not (a > b)), otherwise we'd permit contradictions like a != b and
not (a < b) and not (a > b).

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-06-01 Thread Andres Freund
On 2017-06-01 19:08:33 +0200, Petr Jelinek wrote:
> On 01/06/17 16:51, Robert Haas wrote:
> > On Wed, May 31, 2017 at 8:07 PM, Andres Freund  wrote:
> >> Here's a patch doing what I suggested above.  The second patch addresses
> >> an independent oversight where the post alter hook was invoked before
> >> doing the catalog update.
> > 
> > 0002 is a slam-dunk.  I don't object to 0001 but haven't reviewed it 
> > carefully.
> > 
> 
> I did go over the code (ie didn't do actual testing), and I like it much
> better than the current state. Both in terms of making the behavior more
> consistent and the fact that the code is simpler.
> 
> So +1 to go ahead with both patches.

Thanks for the look!  I unfortunately forgot to credit you as a
reviewer, sorry for that.

Pushed both.

- Andres




Re: [HACKERS] BEFORE trigger can cause undetected partition constraint violation

2017-06-01 Thread Jeevan Ladhe
I tried to debug this, and I see that while the target partition index
is correctly found in ExecInsert(), somehow resultRelInfo->ri_PartitionCheck
is NIL; this resultRelInfo is extracted from the array mstate->mt_partitions.

This prevents execution of the constraint checks in the following code
snippet in ExecInsert():

/*
* Check the constraints of the tuple
*/
if (resultRelationDesc->rd_att->constr || resultRelInfo->ri_PartitionCheck)
ExecConstraints(resultRelInfo, slot, estate);

I couldn't debug it further today.

Regards,
Jeevan Ladhe

On Fri, Jun 2, 2017 at 1:21 AM, Robert Haas  wrote:

> I just discovered that a BEFORE trigger can allow data into a
> partition that violates the relevant partition constraint.  This is
> bad.
>
> Here is an example:
>
> rhaas=# create or replace function t() returns trigger as $$begin
> new.a := 2; return new; end$$ language plpgsql;
> CREATE FUNCTION
> rhaas=# create table foo (a int, b text) partition by list (a);
> CREATE TABLE
> rhaas=# create table foo1 partition of foo for values in (1);
> CREATE TABLE
> rhaas=# create trigger x before insert on foo1 for each row execute
> procedure t();
> CREATE TRIGGER
> rhaas=# insert into foo values (1, 'hi there');
> INSERT 0 1
> rhaas=# select tableoid::regclass, * from foo;
>  tableoid | a |    b
> ----------+---+----------
>  foo1     | 2 | hi there
> (1 row)
>
> That row violates the partition constraint, which requires that a = 1.
> You can see that by trying to insert the same row into the partition
> directly:
>
> rhaas=# insert into foo1 values (2, 'hi there');
> ERROR:  new row for relation "foo1" violates partition constraint
> DETAIL:  Failing row contains (2, hi there).
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>


Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-01 Thread Andres Freund
On 2017-06-01 15:56:35 -0400, Robert Haas wrote:
> > I personally think we should fix this in 9.6 and 10, but I've to admit
> > I'm not entirely impartial, because Citus hit this...
> 
> I guess it's a matter of judgement whether you want to call this a bug
> or a missing feature.  I wasn't really laboring under any illusion
> that I'd found every place that could benefit from a
> CURSOR_OPT_PARALLEL_OK marking, so it may be that in the future we'll
> find more such places and, well, where do you draw the line?

I agree that there are degrees here.  The reason I think this is on the
"backpatch" side of the fence is that IME COPY (query) is one of the
most common ways to start a long-running query, which isn't the case
for some of the other ways to trigger them.


> That having been said, I don't know of any particular reason why this
> particular change would carry much risk.  My tolerance for optional
> changes in back branches is lower than many people here, so if it were
> me, I'd probably fix it only in master.  However, I can't get too
> excited about it either way.

So far I plan to push a fix to both branches, unless other people raise
stronger objections.  I have some things I want to push first (the
sequence stuff), so there's definitely some more time for protest.

- Andres




Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-01 Thread Robert Haas
On Wed, May 31, 2017 at 7:19 PM, Andres Freund  wrote:
> To me that appears to be an oversight rather than intentional.  A
> somewhat annoying one at that, because it's not uncommong to use COPY to
> execute reports to a CSV file and such.
>
> Robert, am I missing a complication?

No, I think that would work.

> I personally think we should fix this in 9.6 and 10, but I've to admit
> I'm not entirely impartial, because Citus hit this...

I guess it's a matter of judgement whether you want to call this a bug
or a missing feature.  I wasn't really laboring under any illusion
that I'd found every place that could benefit from a
CURSOR_OPT_PARALLEL_OK marking, so it may be that in the future we'll
find more such places and, well, where do you draw the line?  That
having been said, I don't know of any particular reason why this
particular change would carry much risk.  My tolerance for optional
changes in back branches is lower than many people here, so if it were
me, I'd probably fix it only in master.  However, I can't get too
excited about it either way.
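
For reference, the change being weighed here boils down to one flag at
plan time in copy.c - roughly the following sketch, with pg_plan_query()
assumed here as the planner entry point copy.c goes through:

- plan = pg_plan_query(query, 0, NULL);
+ plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, NULL);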

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] BEFORE trigger can cause undetected partition constraint violation

2017-06-01 Thread Robert Haas
I just discovered that a BEFORE trigger can allow data into a
partition that violates the relevant partition constraint.  This is
bad.

Here is an example:

rhaas=# create or replace function t() returns trigger as $$begin
new.a := 2; return new; end$$ language plpgsql;
CREATE FUNCTION
rhaas=# create table foo (a int, b text) partition by list (a);
CREATE TABLE
rhaas=# create table foo1 partition of foo for values in (1);
CREATE TABLE
rhaas=# create trigger x before insert on foo1 for each row execute
procedure t();
CREATE TRIGGER
rhaas=# insert into foo values (1, 'hi there');
INSERT 0 1
rhaas=# select tableoid::regclass, * from foo;
 tableoid | a |    b
----------+---+----------
 foo1     | 2 | hi there
(1 row)

That row violates the partition constraint, which requires that a = 1.
You can see that by trying to insert the same row into the partition
directly:

rhaas=# insert into foo1 values (2, 'hi there');
ERROR:  new row for relation "foo1" violates partition constraint
DETAIL:  Failing row contains (2, hi there).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UPDATE of partition key

2017-06-01 Thread Robert Haas
On Thu, Jun 1, 2017 at 7:41 AM, Amit Khandekar  wrote:
>> Regarding the trigger issue, I can't claim to have a terribly strong
>> opinion on this.  I think that practically anything we do here might
>> upset somebody, but probably any halfway-reasonable thing we choose to
>> do will be OK for most people.  However, there seems to be a
>> discrepancy between the approach that got the most votes and the one
>> that is implemented by the v8 patch, so that seems like something to
>> fix.
>
> Yes, I have started working on updating the patch to use that approach
> (BR and AR update triggers on the source and destination partitions
> respectively, instead of delete+insert).  The approach taken by the
> patch (BR update + delete+insert triggers) didn't require any changes
> in the way ExecDelete() and ExecInsert() were called.  Now we would
> need to skip the delete/insert triggers, so some flags need to be
> passed to these functions, or else we need stripped-down versions of
> ExecDelete() and ExecInsert() which don't do other things like
> RETURNING handling and firing triggers.

See, that strikes me as a pretty good argument for firing the
DELETE+INSERT triggers...

I'm not wedded to that approach, but "what makes the code simplest?"
is not a bad tiebreak, other things being equal.

>> In terms of the approach taken by the patch itself, it seems
>> surprising to me that the patch only calls
>> ExecSetupPartitionTupleRouting when an update fails the partition
>> constraint.  Note that in the insert case, we call that function at
>> the start of execution;
>
>> calling it in the middle seems to involve additional hazards;
>> for example, is it really safe to add additional
>> ResultRelInfos midway through the operation?
>
> I thought since the additional ResultRelInfos go into
> mtstate->mt_partitions which is independent of
> estate->es_result_relations, that should be safe.

I don't know.  That sounds scary to me, but it might be OK.  Probably
needs more study.

>> Is it safe to take more locks midway through the operation?
>
> I can imagine some rows being already updated when other tasks like
> ALTER TABLE or CREATE INDEX happen on other partitions which are still
> unlocked, and then for row movement we try to lock these other
> partitions and wait for the DDL tasks to complete.  But I didn't see
> any particular issue with that; correct me if you suspect one.  One
> issue could arise if we were able to modify the table attributes, but
> I believe we cannot do that for inherited columns.

It's just that it's very unlike what we do anywhere else.  I don't
have a real specific idea in mind about what might totally break, but
at a minimum it could certainly cause behavior that can't happen
today.  Today, if you run a query on some tables, it will block
waiting for any locks at the beginning of the query, and the query
won't begin executing until it has all of the locks.  With this
approach, you might block midway through; you might even deadlock
midway through.  Maybe that's not overtly broken, but it's at least
got the possibility of being surprising.

Now, I'd actually kind of like to have behavior like this for other
cases, too.  If we're inserting one row, can't we just lock the one
partition into which it needs to get inserted, rather than all of
them?  But I'm wary of introducing such behavior incidentally in a
patch whose main goal is to allow UPDATE row movement.  Figuring out
what could go wrong and fixing it seems like a substantial project all
of its own.

> The reason I thought it cannot be done at the start of execution is
> that even if we know the UPDATE is not modifying the partition key
> column, we are not certain that the final NEW row has its partition
> key column unchanged, because of triggers.  I understand it might be
> weird for a user to want to modify a partition key value, but if a
> user does that, it will result in a crash, because we won't have the
> partition routing set up, thinking that there is no partition key
> column in the UPDATE.

I think we could avoid that issue.  Suppose we select the target
partition based only on the original NEW tuple.  If a trigger on that
partition subsequently modifies the tuple so that it no longer
satisfies the partition constraint for that partition, just let it
ERROR out normally.  Actually, it seems like that's probably the
*easiest* behavior to implement.  Otherwise, you might fire triggers,
discover that you need to re-route the tuple, and then ... fire
triggers again on the new partition, which might reroute it again?
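
In pseudocode, that suggestion amounts to the following (hypothetical
flow with made-up helper names, not existing code):

/* route exactly once, based on the original NEW tuple */
partition = route_tuple(parent, new_tuple);

/* fire that partition's BR triggers; never re-route afterwards */
new_tuple = fire_br_insert_triggers(partition, new_tuple);

/* if the triggers moved the tuple out of range, just error out */
if (!satisfies_partition_constraint(partition, new_tuple))
    elog(ERROR, "new row violates partition constraint");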

>> I'm not sure that it's sensible to allow a trigger on an
>> individual partition to reroute an update to another partition
>> (what if we get an infinite loop?)
>
> You mean, if the other table has another trigger that will again route
> to the original partition?  But couldn't this infinite-loop problem
> occur even for two normal tables?

How?  For a normal trigger, nothing it does can change which table the
row is being inserted into.

Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-01 Thread Jeevan Ladhe
Hi,

I have addressed Ashutosh's and Amit's comments in the attached patch.

Please let me know if I have missed anything and any further comments.

PFA.

Regards,
Jeevan Ladhe

On Wed, May 31, 2017 at 9:50 AM, Beena Emerson 
wrote:

> On Wed, May 31, 2017 at 8:13 AM, Amit Langote
>  wrote:
> > On 2017/05/31 9:33, Amit Langote wrote:
> >
> >
> > In get_rule_expr():
> >
> >  case PARTITION_STRATEGY_LIST:
> >  Assert(spec->listdatums != NIL);
> >
> > +/*
> > + * If the boundspec is of Default partition, it does
> > + * not have list of datums, but has only one node to
> > + * indicate its a default partition.
> > + */
> > +if (isDefaultPartitionBound((Node *) linitial(spec->listdatums)))
> > +{
> > +appendStringInfoString(buf, "DEFAULT");
> > +break;
> > +}
> > +
> >
> > How about adding this part before the switch (key->strategy)?  That way,
> > we won't have to come back and add this again when we add range default
> > partitions.
>
> I think it is best that we add a bool is_default to PartitionBoundSpec
> and then have a general check for both list and range. Though
> listdatums, upperdatums and lowerdatums are set to default for a
> DEFAULT partition, it does not seem proper that we check listdatums
> for range as well.
>
>
>
>
> --
>
> Beena Emerson
>
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>





[HACKERS] strcmp() tie-breaker for identical ICU-collated strings

2017-06-01 Thread Amit Khandekar
While comparing two text strings using varstr_cmp(), if *strcoll*()
call returns 0, we do strcmp() tie-breaker to do binary comparison,
because strcoll() can return 0 for non-identical strings :

varstr_cmp()
{
...
/*
* In some locales strcoll() can claim that nonidentical strings are
* equal.  Believing that would be bad news for a number of reasons,
* so we follow Perl's lead and sort "equal" strings according to
* strcmp().
*/
if (result == 0)
result = strcmp(a1p, a2p);
...
}

But is this supposed to apply for ICU collations as well ? If
collation provider is icu, the comparison is done using
ucol_strcoll*(). I suspect that ucol_strcoll*() intentionally returns
some characters as being identical, so doing strcmp() may not make
sense.

For e.g. , if the below two characters are compared using
ucol_strcollUTF8(), it returns 0, meaning the strings are identical :
Greek Oxia : UTF-16 encoding : 0x1FFD
(http://www.fileformat.info/info/unicode/char/1ffd/index.htm)
Greek Tonos : UTF-16 encoding : 0x0384
(http://www.fileformat.info/info/unicode/char/0384/index.htm)

The characters are displayed like this :
postgres=# select (U&'\+001FFD') , (U&'\+000384') collate ucatest;
 ?column? | ?column?
----------+----------
 ´        | ΄
(Although this example has similar looking characters, this might not
be a factor behind treating them equal)

Now since ucol_strcoll*() returns 0, these strings are always compared
using strcmp(), so 1FFD > 0384 returns true :

create collation ucatest (locale = 'en_US.UTF8', provider = 'icu');

postgres=# select (U&'\+001FFD') > (U&'\+000384') collate ucatest;
 ?column?
--
 t

Whereas, if strcmp() is skipped for ICU collations :
if (result == 0 && !(mylocale && mylocale->provider == COLLPROVIDER_ICU))
   result = strcmp(a1p, a2p);

... then the comparison using ICU collation tells they are identical strings :

postgres=# select (U&'\+001FFD') > (U&'\+000384') collate ucatest;
 ?column?
--
 f
(1 row)

postgres=# select (U&'\+001FFD') < (U&'\+000384') collate ucatest;
 ?column?
--
 f
(1 row)

postgres=# select (U&'\+001FFD') <= (U&'\+000384') collate ucatest;
 ?column?
--
 t


Now I have verified that strcoll() does return a positive result for
1FFD vs 0384.  So it looks like the ICU API function ucol_strcoll()
returns 0 (equal) intentionally.  That's the reason I feel the
strcmp-if-strcoll-returns-0 thing might not be applicable for ICU.  But
I may be wrong, please correct me if I'm missing something.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] logical replication busy-waiting on a lock

2017-06-01 Thread Andres Freund
On 2017-06-01 14:17:44 +0200, Petr Jelinek wrote:
> Thinking more about this, I am not convinced it's a good idea to change
> exports this late in the cycle. I still think it's best to do the xid
> assignment only when the snapshot is actually exported but don't assign
> xid when the export is only used by the local session (the new option in
> PG10). That's one line change which impacts only logical
> replication/decoding as opposed to everything else which uses exported
> snapshots.

I'm not quite convinced by this argument.  Exported snapshot contents
are ephemeral, we can change the format at any time.  The wait is fairly
annoying for every user of logical decoding.  For me the combination of
those two facts implies that we should rather fix this properly.

- Andres




Re: [HACKERS] Hash Functions

2017-06-01 Thread Andres Freund
On 2017-06-01 13:59:42 -0400, Robert Haas wrote:
> I'm not actually aware of an instance where this has bitten anyone,
> even though it seems like it certainly could have and maybe should've
> gotten somebody at some point.  Has anyone else?

Two comments: First, citus has been doing hash partitioning and
append/range partitioning for a while now, and I'm not aware of anyone
being bitten by this (although there've been plenty of other things ;)),
even though there've been cases of upgrading to different collations &
encodings.  Secondly, I think that's to a significant degree caused by
the fact that in practice people far more often partition on types like
int4/int8/date/timestamp/uuid rather than text - there are rarely good
reasons to do the latter.

> Furthermore, neither range nor list partitioning depends on properties
> of the hardware, like how wide integers are, or whether they are
> stored big-endian.  A naive approach to hash partitioning would depend
> on those things.  That's clearly worse.

I don't think our current int4/8 hash functions depend on
FLOAT8PASSBYVAL.


> 3. Implement portable hash functions (Jeff Davis or me, not sure
> which).  Andres scoffed at this idea, but I still think it might have
> legs.  Coming up with a hashing algorithm for integers that produces
> the same results on big-endian and little-endian systems seems pretty
> feasible, even with the additional constraint that it should still be
> fast.

Just to clarify: I don't think it's a problem to do so for integers and
most other simple scalar types. There are plenty of hash algorithms that
are endianness-independent, and the rest is just a bit of care.  Where I
see a lot more issues is doing so for more complex types like arrays,
jsonb, postgis geometry/geography types and the like, where the fast and
simple implementation is to just hash the entire datum - and that'll
very commonly not be portable at all due to padding and type-width
differences.
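
For the scalar case, something like the MurmurHash3 finalizer already
fits the bill - it operates on the integer's value rather than on its
in-memory bytes, so byte order never enters into it (illustrative
sketch, not proposed code):

#include <stdint.h>

static uint32_t
portable_hash_uint32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}

By contrast, hashing the raw bytes, as in
hash_any((unsigned char *) &value, sizeof(value)), gives different
answers on big- and little-endian machines for the same value.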


> My personal guess is that most people will prefer the fast
> hash functions over the ones that solve their potential future
> migration problems, but, hey, options are good.

I'm pretty sure that will be the case.  I'm not sure that adding
infrastructure to allow for something that nobody will use in practice
is a good idea.  If there ends up being demand for it, we can still go there.


I think that the number of people that migrate between architectures is
low enough that this isn't going to be a very common issue.  Having some
feasible way around this is important, but I don't think we should
optimize heavily for it by developing new infrastructure / complicating
experience for the 'normal' uses.


Greetings,

Andres Freund




Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

2017-06-01 Thread Andres Freund
On 2017-06-01 18:41:20 +0530, Rafia Sabih wrote:
> As per my understanding, this increase in tuple queue size helps only
> gather-merge, particularly in cases where the master stalls a lot in
> gather-merge because it is maintaining the sort order.  In q12, for
> example, the index is unclustered and gather-merge sits just above a
> parallel index scan; thus, to maintain the order, the workers likely
> have to wait a long time until the next in-sequence tuple is fetched
> by the master.

I wonder if there's some way we could make this problem a bit less bad.
One underlying problem is that we don't know what the current boundary
on each worker is, unless it returns a tuple. I.e. even if some worker
is guaranteed to not return any further tuples below another worker's
last tuple, gather-merge won't know about that until it finds another
matching tuple.  Perhaps, for some subsets, we could make the workers
update that boundary without producing a tuple that gather will actually
return?  In the, probably reasonably common, case of having merge-joins
below the gather, it shouldn't be very hard to do so.  Imagine e.g. that
every worker gets a "slot" in a dsm where it can point to a tuple
(managed by dsa.c to deal with variable-length keys) that contains the
current boundary.  For a merge-join it'd not be troublesome to
occasionally - although deciding when to do so isn't easy; perhaps the
master signals the worker? - put a new boundary tuple there, even if it
doesn't find a match.  It's probably harder for cases where most of the
filtering happens far below the top-level worker node.
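
To make the shape of that concrete, the per-worker state could look
something like this (purely hypothetical; nothing like it exists today):

/* one slot per worker, living in the Gather Merge node's DSM segment */
typedef struct GMWorkerBoundary
{
    slock_t     mutex;           /* protects the fields below */
    bool        have_boundary;   /* has the worker published anything? */
    dsa_pointer boundary;        /* serialized sort key, dsa-allocated
                                  * to cope with variable-length keys */
} GMWorkerBoundary;

The leader-side rule would be: if a worker's published boundary sorts
after the current merge candidate, that worker cannot affect the next
output row, so the leader need not wait for a tuple from it.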

- Andres




Re: [HACKERS] Hash Functions

2017-06-01 Thread Robert Haas
On Fri, May 12, 2017 at 1:35 PM, Joe Conway  wrote:
>> That's a good point, but the flip side is that, if we don't have
>> such a rule, a pg_dump of a hash-partitioned table on one
>> architecture might fail to restore on another architecture.  Today, I
>> believe that, while the actual database cluster is
>> architecture-dependent, a pg_dump is architecture-independent.  Is it
>> OK to lose that property?
>
> Not from where I sit.

It was pointed out at PGCon that we've actually already crossed this
Rubicon.  If you create a range-partitioned table, put a bunch of data
into it, and then try to reload it on another system with a different
set of encoding definitions, the proper partition for some row might
be different.  That would result in the dump failing to reload with a
complaint about the partition key being violated.  And, in fact, you
could have the exact same issue on earlier releases which don't have
partitioning, because a CHECK constraint of the form (a >= 'something'
AND b < 'something else') could be true under one encoding and false
under another, and you could define such a constraint on any release
(from this millennium, anyway).

I'm not actually aware of an instance where this has bitten anyone,
even though it seems like it certainly could have and maybe should've
gotten somebody at some point.  Has anyone else?  I think it's a
reasonable guess that such problems will become more common with the
advent of partitioning and more common still as we continue to improve
partitioning, because people who otherwise would have given up on
PostgreSQL -- or at least on partitioning -- will actually try to use
it in earnest and then hit this problem.  However, my guess is that it
will still be pretty rare, and that having an optional
--dump-partition-data-with-parent flag that can be used when it crops
up will be an adequate workaround for most people.  Of course, that is
just my opinion.

So now I think that the right way to think about the issues around
hash partitioning is as a new variant of a problem that we already
have rather than an altogether new problem.  IMHO, the right questions
are:

1. Are the new problems worse than the old ones?

2. What could we do about it?

On #1, I'd say tentatively yes.  The problem of portability across
encodings is probably not new.  Suppose you have a table partitioned
by range, either using the new range partitioning or using the old
table inheritance method and CHECK constraints.  If you move that
table to a different encoding, will the collation behavior you get
under the new encoding match the collation behavior you got under the
old encoding?  The documentation says: "Also, a collation is tied to a
character set encoding (see Section 22.3). The same collation name may
exist for different encodings", which makes it sound like it is
possible but not guaranteed.  Even if the same collation name happens
to exist, there's no guarantee it behaves the same way under the new
encoding, and given our experiences with glibc so far, I'd bet against
it.  If glibc doesn't even think strcoll() and strxfrm() need to agree
with each other for the same collation, or that minor releases
shouldn't whack the behavior around, there doesn't seem to be room for
optimism about the possibility that they carefully preserve behavior
across similarly-named collations on different encodings.  On the
other hand, collation rules probably tend to vary only around the
edges, so there's a reasonably good chance that even if the collation
rules change when you switch encodings, every row will still get put
into the same partition as before.  If we implement hashing for hash
partitioning in some trivial way like hashing the raw bytes, that will
most certainly not be true -- *most* rows will move to a different
partition when you switch encodings.

Furthermore, neither range nor list partitioning depends on properties
of the hardware, like how wide integers are, or whether they are
stored big-endian.  A naive approach to hash partitioning would depend
on those things.  That's clearly worse.

On #2, I'll attempt to list the approaches that have been proposed so far:

1. Don't implement hash partitioning (Tom Lane).  I don't think this
proposal will attract many votes.

2. Add an option like --dump-partition-data-with-parent.  I'm not sure
who originally proposed this, but it seems that everybody likes it.
What we disagree about is the degree to which it's sufficient.  Jeff
Davis thinks it doesn't go far enough: what if you have an old
plain-format dump that you don't want to hand-edit, and the source
database is no longer available?  Most people involved in the
unconference discussion of partitioning at PGCon seemed to feel that
wasn't really something we should be worry about too much.  I had been
taking that position also, more or less because I don't see that there
are better alternatives.  For instance, Jeff proposed having the COPY
command specify both the parent and the target partition.

Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-01 Thread Petr Jelinek
On 01/06/17 17:32, Masahiko Sawada wrote:
> On Thu, May 25, 2017 at 5:29 PM, tushar  wrote:
>> On 05/25/2017 12:44 AM, Petr Jelinek wrote:
>>>
>>> There is still outstanding issue that sync worker will keep running
>>> inside the long COPY because the invalidation messages are also not
>>> processed until it finishes but all the original issues reported here
>>> disappear for me with the attached patches applied.
>>
>> After applying all your patches, drop subscription no more hangs while
>> dropping a subscription, but there is an error "ERROR:  tuple concurrently
>> updated" in the log file of the standby.
>>
> 
> I tried to reproduce this issue with the latest four patches I
> submitted, but it didn't happen.  I guess this issue isn't related to
> the issues reported on this thread, and another factor might be the
> cause.  It would be good to test again with the latest four patches,
> or after the current open items are resolved.
> 

That's because your additional patch kills the workers that do the
concurrent update. While that's probably okay, I still plan to look into
making the subscription and state locking more robust.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-06-01 Thread Petr Jelinek
On 01/06/17 16:51, Robert Haas wrote:
> On Wed, May 31, 2017 at 8:07 PM, Andres Freund  wrote:
>> Here's a patch doing what I suggested above.  The second patch addresses
>> an independent oversight where the post alter hook was invoked before
>> doing the catalog update.
> 
> 0002 is a slam-dunk.  I don't object to 0001 but haven't reviewed it 
> carefully.
> 

I did go over the code (ie didn't do actual testing), and I like it much
better than the current state. Both in terms of making the behavior more
consistent and the fact that the code is simpler.

So +1 to go ahead with both patches.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: retry shm attach for windows (WAS: Re: [HACKERS] OK, so culicidae is *still* broken)

2017-06-01 Thread Petr Jelinek
On 01/06/17 15:25, Tom Lane wrote:
> Robert Haas  writes:
>> So, are you going to, perhaps, commit this?  Or who is picking this up?
> 
>> /me knows precious little about Windows.
> 
> I'm not going to be the one to commit this either, but seems like someone
> should.
> 

The new code does not use any Windows-specific APIs or anything; it just
adds retry logic for reattaching when we do EXEC_BACKEND, which seems to
be the agreed way of solving this. I do have a couple of comments about
the code though.

The new parameter retry_count in PGSharedMemoryReAttach() seems to be
used only to decide whether to log reattach failures (so that we don't
spam the log when retrying), but this fact is not mentioned anywhere.

Also, I am not excited about following coding style:
> + if (!pgwin32_ReserveSharedMemoryRegion(pi.hProcess))
> + continue;
> + else
> + {

Amit, if you want to avoid having to add curly braces for a single
line while still having an else, I'd invert the expression in the if ()
statement so that the true case comes first. It's much less ugly to have
the curly-brace part first and the continue statement in the else block,
IMHO.
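
That is, roughly (a sketch of the suggested inversion):

if (pgwin32_ReserveSharedMemoryRegion(pi.hProcess))
{
    /* ... proceed with the child process ... */
}
else
    continue;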

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-01 Thread Álvaro Hernández Tortosa



On 01/06/17 17:50, Stephen Frost wrote:

Robert,

* Robert Haas (robertmh...@gmail.com) wrote:

On Wed, May 31, 2017 at 7:59 PM, Stephen Frost  wrote:

If your comments regarding the requirement that we have interoperability
testing of this feature before accepting it were intended to mean that
we need to make sure that the JDBC driver is able to work with the
chosen solution (or, perhaps, that we support multuple options, one of
which works with the JDBC changes), and that we review the other TLS
libraries with the goal of making sure that what we agree on in core
should work with those also, then that's great and exactly what I'd like
to see also.

OK, great.  That's what I mean (mostly - see below).

Glad to hear it.


If your comments regarding the requirement that we have interoperability
testing of this feature before accepting it were intended to mean that
we must have a complete alternative TLS solution, while that would
actually play a great deal to a goal I've had for a while to have PG
support multiple TLS implementations (LibNSS being top-of-mind, at least
for me, as I've mentioned a couple times), I can't agree that we should
require that before accepting channel binding as a feature.  To do so
would be akin to requiring that we support multiple TLS implementations
before we agreed to support client-side certificates, in my opinion.

I don't think that we need to have a committed patch allowing multiple
SSL implementations before we accept a patch for channel binding, but
I think it might be a good idea for someone to hack either libpq or
the server enough to make it work with some other SSL implementation
(Windows SSL would probably be the best candidate) and test that what
we think will work for channel binding actually does work.  It
wouldn't need to be a committable patch, but it should be enough to
make a successful connection in the laboratory.  Now, there might also
be other ways besides that of testing that interoperability is
possible, so don't take me as being of the view that someone has to
necessarily do exactly that thing that I just said.  But I do think
that we probably should do more than say "hey, we've got this
undocumented SSL API, and this other Windows API, and it looks like
they do about the same thing, so we're good".  There's sometimes a big
difference between what sounds like it should work on paper and what
actually does work.

I certainly wouldn't object to someone working on this, but I feel like
it's a good deal more work than perhaps you're realizing (and I tend to
think trying to use the Windows SSL implementation would increase the
level of effort, not minimize it).  Perhaps it wouldn't be too bad to
write a one-off piece of code which just connects and then returns the
channel binding information on each side and then one could hand-check
that what's returned matches what it's supposed to based on the RFC, but
if it doesn't, then what?  In the specific case we're talking about,
there's two approaches in the RFC and it seems like we should support
both because at least our current JDBC implementation only works with
one, and ideally we can allow for that to be extended to other methods,
but I wouldn't want to implement a method that only works on Windows SSL
because that implementation, for whatever reason, doesn't follow either
of the methods available in the RFC.


To make things even more complicated, I think the RFC is not
helping very much, as the definition itself is not very clear:


"
Description: The first TLS Finished message sent (note: the Finished
   struct, not the TLS record layer message containing it) in the most
   recent TLS handshake of the TLS connection being bound to (note: TLS
   connection, not session, so that the channel binding is specific to
   each connection regardless of whether session resumption is used).
   If TLS renegotiation takes place before the channel binding
   operation, then the first TLS Finished message sent of the latest/
   inner-most TLS connection is used.  Note that for full TLS
   handshakes, the first Finished message is sent by the client, while
   for abbreviated TLS handshakes (session resumption), the first
   Finished message is sent by the server.
"
https://tools.ietf.org/html/rfc5929#section-3.1

If you read further, it becomes even less clear what it is, and if
there are renegotiations, it gets more complicated. tls-server-end-point
is somewhat better specified:


"
The hash of the TLS server's certificate [RFC5280] as it
   appears, octet for octet, in the server's Certificate message.
"


Álvaro



--

Álvaro Hernández Tortosa







Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-01 Thread Andres Freund
On 2017-06-01 21:37:56 +0530, Amit Kapila wrote:
> On Thu, Jun 1, 2017 at 9:34 PM, Andres Freund  wrote:
> > On 2017-06-01 21:23:04 +0530, Amit Kapila wrote:
> >> On a related note, I think it might be better to have an
> >> IsInParallelMode() check in this case as we have at other places.
> >> This is to ensure that if this command is invoked via plpgsql function
> >> and that function runs is the parallel mode, it will act as a
> >> safeguard.
> >
> > Hm? Which other places do it that way?  Isn't standard_planner()
> > centralizing such a check?
> >
> 
> heap_insert->heap_prepare_insert, heap_update, heap_delete, etc.

Those aren't comparable, they're not invoking the planner - and all the
places that set PARALLEL_OK don't check for it.  The relevant check for
planning is in standard_planner().

- Andres




Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-01 Thread Álvaro Hernández Tortosa



On 01/06/17 18:11, Bruce Momjian wrote:

On Wed, May 31, 2017 at 09:37:02AM -0400, Robert Haas wrote:

On Tue, May 30, 2017 at 11:49 PM, Stephen Frost  wrote:

... and I don't believe that we should be asking the
implementors of channel binding to also implement support for multiple
TLS libraries in PostgreSQL in order to test that their RFC-following
(at least, as far as they can tell) implementation actually works.

You're of course free to believe what you wish, but that sounds
short-sighted to me.  If we implement channel binding and it turns out
not to be interoperable with other SSL implementations, then what?  We
can't change it later without breaking compatibility with our own
prior implementation.  Note that Álvaro Hernández Tortosa said about
two hours before you sent this email that it doesn't seem possible to
implement something comparable in Java's standard SSL stack.  If
that's the case, adopting this implementation is dooming everyone who
connects to the database server using JDBC to be unable to use channel
binding.  And that's a large percentage of our user base.

Just to step back, exactly how does channel binding work?  Is each side
of the SSL connection hashing the password hash with the shared SSL
session secret in some way that each side knows the other end knows
the password hash, but not disclosing the secret or password hash?  Is
there some other way JDBC can get that information?



In short, channel binding augments SCRAM (it could also apply to other
authentication methods, I guess) by adding data that comes out-of-band
to the mix of data to be exchanged. A normal SCRAM exchange involves the
user, a unique token, a salt and some other information. If you add into
the mix out-of-band information that is known only to the client and the
server, you can avoid MITM attacks. It's not as simple as "hashing" the
password, though: SCRAM involves 4 steps (2 messages each way) with
somewhat complex hashing of data in the last 2 steps.
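
For reference, the four messages look roughly like this (simplified from
RFC 5802; "biws" is the base64 of the gs2 header "n,,", which is exactly
the spot where channel binding data would otherwise be carried):

C: n,,n=user,r=<client-nonce>                  -- client-first
S: r=<client+server-nonce>,s=<salt>,i=4096     -- server-first
C: c=biws,r=<client+server-nonce>,p=<proof>    -- client-final
S: v=<server-signature>                        -- server-final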


There are basically 2 channel binding options, that is, 2 possible
data pieces that could be taken out-of-band of the SCRAM authentication
and thrown into this mix:


- tls-unique. It's like a unique identifier for each client-server SSL
connection. It has to be obtained from the SSL channel, and there doesn't
seem to be a very consistent way of getting it (in OpenSSL it's an
undocumented API, it's not available in the normal Java stack, etc).
- tls-server-end-point. The channel binding data is just a hash of the
server's SSL certificate, as is. As such, it seems easier to support
across multiple OSs/SSL stacks (see the sketch after this list).
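
For what it's worth, with OpenSSL the two data pieces can apparently be
obtained along the following lines (a sketch assuming an established
connection "ssl"; error handling omitted, and treat the exact calls as
assumptions on my part):

#include <openssl/evp.h>
#include <openssl/ssl.h>
#include <openssl/x509.h>

static void
get_binding_data(SSL *ssl)
{
    /* tls-unique: the first Finished message of the handshake.  The
     * side that sent it uses SSL_get_finished(); the other side uses
     * SSL_get_peer_finished(). */
    unsigned char finished[EVP_MAX_MD_SIZE];
    size_t flen = SSL_get_finished(ssl, finished, sizeof(finished));

    /* tls-server-end-point: hash of the server certificate.  RFC 5929
     * wants the certificate's own signature hash algorithm (SHA-256
     * when in doubt); SHA-256 is hard-coded here for brevity. */
    X509 *cert = SSL_get_peer_certificate(ssl);  /* on the client side */
    unsigned char md[EVP_MAX_MD_SIZE];
    unsigned int mdlen;
    X509_digest(cert, EVP_sha256(), md, &mdlen);

    (void) flen; (void) mdlen;   /* these bytes would feed into SCRAM */
}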


However, SCRAM RFC states that if channel binding is implemented, 
at least tls-unique must be implemented, with others being optional 
(there is as of today a third one, but only applies to telnet). So in my 
opinion, I'd implement both of the above, for maximum flexibility, and 
add a field to the required scram authentication febe message with a 
list (aka CSV) of the channel binding mechanisms supported by this 
server. In this way, I believe covers support for several channel 
binding mechanisms and could be extended in the future should other 
mechanisms appear.



Álvaro

--

Álvaro Hernández Tortosa







Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-01 Thread Robert Haas
On Thu, Jun 1, 2017 at 11:50 AM, Stephen Frost  wrote:
> I certainly wouldn't object to someone working on this, but I feel like
> it's a good deal more work than perhaps you're realizing (and I tend to
> think trying to use the Windows SSL implementation would increase the
> level of effort, not minimize it).

I agree that it's a fair amount of work, but if nobody does it, then I
think it's pretty speculative to suppose that the feature actually
works correctly.

> Perhaps it wouldn't be too bad to
> write a one-off piece of code which just connects and then returns the
> channel binding information on each side and then one could hand-check
> that what's returned matches what it's supposed to based on the RFC, but
> if it doesn't, then what?

Then something's broken and we need to fix it before we start
committing patches.

> In the specific case we're talking about,
> there's two approaches in the RFC and it seems like we should support
> both because at least our current JDBC implementation only works with
> one, and ideally we can allow for that to be extended to other methods,
> but I wouldn't want to implement a method that only works on Windows SSL
> because that implementation, for whatever reason, doesn't follow either
> of the methods available in the RFC.

I am very skeptical that if two people on two different SSL
implementations both do something that appears to comply with the RFC,
we can just assume those two people will get the same answer.  In an
ideal world, that would definitely work, but in the real world, I
think it needs to be tested or you don't really know.  I agree that if
a given SSL implementation is such that it can't support either of the
possible channel binding methods, then that's not our problem; people
using that SSL implementation just can't get channel binding, and if
they don't like that they can switch SSL implementations.  But I also
think that if two SSL implementations both claim to support what's
needed to make channel binding work and we don't do any testing that
they actually interoperate, then I don't think we can really know that
we've got it right.

Another way of validating Michael's implementation which I would find
satisfactory is to go look at some other OpenSSL-based implementations
of channel binding.  If in each case they are using the same functions
that Michael used in the same way, then that's good evidence that his
implementation is doing the right thing, especially if any of those
implementations also support other SSL implementations and have
verified that the OpenSSL mechanism does in fact interoperate.

I don't really know why we're arguing about this.  It seems blindingly
obvious to me that we can't just take it on faith that the functions
Michael identified for this purpose are the right ones and will work
perfectly in complete compliance with the RFC.  We are in general very
reluctant to make such assumptions and if we were going to start,
changes that affect wire protocol compatibility wouldn't be my first
choice.  Is it really all that revolutionary or controversial to
suggest that this patch has not yet had enough validation to really
know that it does what we want?  To me, it seems like verifying that a
patch which supposedly implements a standard interoperates with
something other than itself is so obvious that it should barely need
to be mentioned, let alone become a bone of contention.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] <> join selectivity estimate question

2017-06-01 Thread Tom Lane
Dilip Kumar  writes:
> Actually, I was not proposing this patch; instead I wanted to discuss
> the approach.  I was claiming that for non-equal JOIN_SEMI selectivity
> estimation, instead of calculating selectivity in the existing way,
> i.e. = 1 - (selectivity of equal JOIN_SEMI), the better way would be
> = 1 - (selectivity of equal).  I have only tested a standalone
> scenario where it solves the problem, not the TPC-H cases.  But I was
> more interested in discussing whether the way I think it should
> calculate the non-equal SEMI join selectivity makes any sense.

I don't think it does really.  The thing about a <> semijoin is that it
will succeed unless *every* join key value from the inner query is equal
to the outer key value (or is null).  That's something we should consider
to be of very low probability typically, so that the <> selectivity should
be estimated as nearly 1.0.  If the regular equality selectivity
approaches 1.0, or when there are expected to be very few rows out of the
inner query, then maybe the <> estimate should start to drop off from 1.0,
but it surely doesn't move linearly with the equality selectivity.
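
For illustration, a back-of-the-envelope sketch (mine, not planner
code) under a crude uniform-distribution assumption: with nd2 distinct
non-null inner values and nrows2 inner rows, an outer row fails the <>
semijoin only if every inner row happens to carry its value, so the
selectivity is roughly 1 - (1/nd2)^nrows2, which races to 1.0 as soon
as the inner side is nontrivial:

    #include <math.h>
    #include <stdio.h>

    /* Toy model only: uniform inner distribution, no nulls,
     * independent rows.  A real estimator must also account for
     * nulls (see below). */
    static double
    neq_semijoin_sel(double nd2, double nrows2)
    {
        return 1.0 - pow(1.0 / nd2, nrows2);
    }

    int
    main(void)
    {
        printf("%.6f\n", neq_semijoin_sel(2.0, 1.0));   /* 0.500000 */
        printf("%.6f\n", neq_semijoin_sel(2.0, 10.0));  /* 0.999023 */
        printf("%.6f\n", neq_semijoin_sel(100.0, 5.0)); /* ~1.000000 */
        return 0;
    }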

BTW, I'd momentarily confused this thread with the one about bug #14676,
which points out that neqsel() isn't correctly accounting for nulls.
neqjoinsel() isn't either.  Not sure that we want to solve both things
in one patch though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-01 Thread Bruce Momjian
On Wed, May 31, 2017 at 09:37:02AM -0400, Robert Haas wrote:
> On Tue, May 30, 2017 at 11:49 PM, Stephen Frost  wrote:
> > ... and I don't believe that we should be asking the
> > implementors of channel binding to also implement support for multiple
> > TLS libraries in PostgreSQL in order to test that their RFC-following
> > (at least, as far as they can tell) implementation actually works.
> 
> You're of course free to believe what you wish, but that sounds
> short-sighted to me.  If we implement channel binding and it turns out
> not to be interoperable with other SSL implementations, then what?  We
> can't change it later without breaking compatibility with our own
> prior implementation.  Note that Álvaro Hernández Tortosa said about
> two hours before you sent this email that it doesn't seem possible to
> implement something comparable in Java's standard SSL stack.  If
> that's the case, adopting this implementation is dooming everyone who
> connects to the database server using JDBC to be unable to use channel
> binding.  And that's a large percentage of our user base.

Just to step back, exactly how does channel binding work?  Is each side
of the SSL connection hashing the password hash with the shared SSL
session secret in some way that each side knows the other end knows
the password hash, but not disclosing the secret or password hash?  Is
there some other way JDBC can get that information?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-01 Thread Amit Kapila
On Thu, Jun 1, 2017 at 9:34 PM, Andres Freund  wrote:
> On 2017-06-01 21:23:04 +0530, Amit Kapila wrote:
>> On a related note, I think it might be better to have an
>> IsInParallelMode() check in this case as we have at other places.
>> This is to ensure that if this command is invoked via a plpgsql
>> function and that function runs in parallel mode, it will act as a
>> safeguard.
>
> Hm? Which other places do it that way?  Isn't standard_planner()
> centralizing such a check?
>

heap_insert->heap_prepare_insert, heap_update, heap_delete, etc.
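
For reference, the guard at those call sites looks roughly like this
(paraphrased from memory, so treat the details as approximate):

    /* e.g. in heap_prepare_insert(): parallel operations are required
     * to be strictly read-only, so data modification is rejected even
     * when the planner never had a chance to intervene. */
    if (IsInParallelMode())
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("cannot insert tuples during a parallel operation")));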


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-01 Thread Andres Freund
On 2017-06-01 21:23:04 +0530, Amit Kapila wrote:
> On a related note, I think it might be better to have an
> IsInParallelMode() check in this case as we have at other places.
> This is to ensure that if this command is invoked via a plpgsql
> function and that function runs in parallel mode, it will act as a
> safeguard.

Hm? Which other places do it that way?  Isn't standard_planner()
centralizing such a check?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (query) TO ... doesn't allow parallelism

2017-06-01 Thread Amit Kapila
On Thu, Jun 1, 2017 at 4:49 AM, Andres Freund  wrote:
> Hi,
>
> At the moment $subject doesn't allow parallelism, because copy.c's
> pg_plan_query() invocation doesn't set the CURSOR_OPT_PARALLEL_OK
> flag.
>
> To me that appears to be an oversight rather than intentional.
>

I also don't see any problem in parallelizing it.  On a related note,
I think it might be better to have an IsInParallelMode() check in this
case as we have at other places.  This is to ensure that if this
command is invoked via a plpgsql function and that function runs in
parallel mode, it will act as a safeguard.
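
Concretely, the change might look something like this sketch (the
exact call site and variable names in copy.c are assumptions on my
part):

    /* Where COPY (query) TO plans the query: allow a parallel plan,
     * but not if we are already inside a parallel operation (e.g.
     * COPY reached from a function running in parallel mode). */
    int         cursorOptions = 0;

    if (!IsInParallelMode())
        cursorOptions |= CURSOR_OPT_PARALLEL_OK;

    plan = pg_plan_query(query, cursorOptions, NULL);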

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-01 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, May 31, 2017 at 7:59 PM, Stephen Frost  wrote:
> > If your comments regarding the requirement that we have interoperability
> > testing of this feature before accepting it were intended to mean that
> > we need to make sure that the JDBC driver is able to work with the
> > chosen solution (or, perhaps, that we support multiple options, one of
> > which works with the JDBC changes), and that we review the other TLS
> > libraries with the goal of making sure that what we agree on in core
> > should work with those also, then that's great and exactly what I'd like
> > to see also.
> 
> OK, great.  That's what I mean (mostly - see below).

Glad to hear it.

> > If your comments regarding the requirement that we have interoperability
> > testing of this feature before accepting it were intended to mean that
> > we must have a complete alternative TLS solution, while that would
> > actually play a great deal to a goal I've had for a while to have PG
> > support multiple TLS implementations (LibNSS being top-of-mind, at least
> > for me, as I've mentioned a couple times), I can't agree that we should
> > require that before accepting channel binding as a feature.  To do so
> > would be akin to requiring that we support multiple TLS implementations
> > before we agreed to support client-side certificates, in my opinion.
> 
> I don't think that we need to have a committed patch allowing multiple
> SSL implementations before we accept a patch for channel binding, but
> I think it might be a good idea for someone to hack either libpq or
> the server enough to make it work with some other SSL implementation
> (Windows SSL would probably be the best candidate) and test that what
> we think will work for channel binding actually does work.  It
> wouldn't need to be a committable patch, but it should be enough to
> make a successful connection in the laboratory.  Now, there might also
> be other ways besides that of testing that interoperability is
> possible, so don't take me as being of the view that someone has to
> necessarily do exactly that thing that I just said.  But I do think
> that we probably should do more than say "hey, we've got this
> undocumented SSL API, and this other Windows API, and it looks like
> they do about the same thing, so we're good".  There's sometimes a big
> difference between what sounds like it should work on paper and what
> actually does work.

I certainly wouldn't object to someone working on this, but I feel like
it's a good deal more work than perhaps you're realizing (and I tend to
think trying to use the Windows SSL implementation would increase the
level of effort, not minimize it).  Perhaps it wouldn't be too bad to
write a one-off piece of code which just connects and then returns the
channel binding information on each side and then one could hand-check
that what's returned matches what it's supposed to based on the RFC, but
if it doesn't, then what?  In the specific case we're talking about,
there's two approaches in the RFC and it seems like we should support
both because at least our current JDBC implementation only works with
one, and ideally we can allow for that to be extended to other methods,
but I wouldn't want to implement a method that only works on Windows SSL
because that implementation, for whatever reason, doesn't follow either
of the methods available in the RFC.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Perfomance bug in v10

2017-06-01 Thread Teodor Sigaev

Thank you for the answer!



> This is all caused by get_variable_numdistinct() deciding that all
> values are distinct because ntuples < DEFAULT_NUM_DISTINCT. I see that
> if the example is increased to use 300 tuples instead of 32, then
> that's enough for the planner to estimate 2 rows instead of clamping
> to 1, and the bad plan does not look so good anymore since the planner
> predicts that those nested loops need to be executed more than once.

I don't see why the presence of the index could influence that:
removing the index results in a good plan, although the index isn't
used in either plan.




> I really think the planner is too inclined to take risks by nesting
> Nested Loops like this, but I'm not all that sure of the best solution
> to fix this, and certainly not for beta1.
>
> So, I'm a bit unsure exactly how best to deal with this.  It seems
> like we'd better make some effort, as perhaps this could be a case
> that might occur when temp tables are used and not ANALYZEd, but the
> only way I can think to deal with it is not to favour unique inner
> nested loops in the costing model.  The unfortunate thing about not
> doing this is that the planner will no longer swap the join order of a
> 2-way join to put the unique rel on the inner side. This is evident by
> the regression test failures caused by patching with the attached,
> which changes the cost model for nested loops back to what it was
> before unique joins.

The patch seems to work for this particular case, but losing the swap
isn't a good thing, I suppose.




> My other line of thought is just not to bother doing anything about
> this. There's plenty more queries you could handcraft to trick the
> planner into generating a plan that'll blow up like this. Is this a
> realistic enough one to bother accounting for? Did it come from a real
> world case? If not, how did you stumble upon it?


Unfortunately, it's taken from a real application.

--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] <> join selectivity estimate question

2017-06-01 Thread Dilip Kumar
On Thu, Jun 1, 2017 at 8:24 PM, Robert Haas  wrote:
> On Wed, May 31, 2017 at 1:18 PM, Dilip Kumar  wrote:
>> +   if (jointype = JOIN_SEMI)
>> +   {
>> +   sjinfo->jointype = JOIN_INNER;
>> +   }
>
> That is pretty obviously half-baked and completely untested.

Actually, I was not proposing this patch; instead I wanted to discuss
the approach.  I was claiming that for non-equal JOIN_SEMI selectivity
estimation, instead of calculating selectivity in the existing way,
i.e. = 1 - (selectivity of equal JOIN_SEMI), the better way would be
= 1 - (selectivity of equal).  I have only tested a standalone
scenario where it solves the problem, not the TPC-H cases.  But I was
more interested in discussing whether the way I think it should
calculate the non-equal SEMI join selectivity makes any sense.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TPC-H Q20 from 1 hour to 19 hours!

2017-06-01 Thread Tom Lane
Tomas Vondra  writes:
> Which brings me to the slightly suspicious bit. On 9.5, there's no 
> difference between GROUP and GROUP+LIKE cases - the estimates are 
> exactly the same in both cases. This is true too, but only without the 
> foreign key between "partsupp" and "part", i.e. the two non-grouped 
> relations in the join. And what's more, the difference (1737 vs. 16) is 
> pretty much exactly 100x, which is the estimate for the LIKE condition.
> So it kinda seems 9.5 does not apply this condition for semi-joins, 
> while >=9.6 does that.

I got around to looking into this finally.  I assume that when you say
you added a foreign key from partsupp to part, you also created a unique
index on part.p_partkey --- there is no such index in dbt3's version of
the schema, anyway, so I had to add one to create a foreign key.
The unique index itself, never mind the foreign key, changes things
substantially for this query in HEAD, as a result of commit 92a43e485:
the semijoin to part becomes a plain join, and so we go through entirely
different code paths to estimate selectivity.  But without that, there's
clearly something odd going on, because the LIKE condition ought to have
some effect on the estimate of number of matched rows.

I poked into it and found that the problem stems from the fact that the
initial estimate of the top join relation's size is estimated using
agg_lineitem.agg_partkey = part.p_partkey as the join condition, not
the original partsupp.ps_partkey = part.p_partkey qual.  We can get
the former from the latter via equivalence-clause deductions, and
I think it's just bad luck that the join is formed first in that
direction.  What that means is that eqjoinsel_semi() finds no statistics
for the LHS variable, although it does have stats for the RHS.  There
is logic in eqjoinsel_semi() that attempts to reduce the semijoin
selectivity estimate proportionally to whatever restriction clauses were
applied to the RHS ... but if you look closely, that code has no effect
if we lack stats for the LHS.  We'll fall past both the MCV-dependent
calculation and the nd1-vs-nd2 test and end up taking the selectivity
as just 0.5, independently of anything else.

It seems reasonable to me that we ought to derate that default estimate
by whatever we'd concluded the restriction selectivity on the inner rel
is.  So I experimented with the attached patch and verified that it
results in the LIKE affecting the final rowcount estimate in this
situation.  I've not tested it much further than that, other than checking
that we still pass regression tests.

I'm not sure if we ought to think about applying this now.  It will likely
make things worse not better for the Q20 query, because it will cause the
underestimate for the full query to be even further off.  Still, in
principle it seems like it ought to be an improvement in most cases.

The thing that would actually have a chance of improving matters for Q20
would be if we could see our way to looking through the aggregation
subquery and applying the foreign key constraint for lineitem.  That
seems like a research project though; it's surely not happening for v10.

I'm also wondering idly if we could fix things so that the join size
estimate gets formed from a join condition that we have stats for rather
than one we lack them for.  But I have no clear ideas on ways to go
about that that wouldn't be horrid kluges.

regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 6e491bb..b875d5d 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** eqjoinsel_semi(Oid operator,
*** 2603,2610 ****
  		 *
  		 * Crude as the above is, it's completely useless if we don't have
  		 * reliable ndistinct values for both sides.  Hence, if either nd1 or
! 		 * nd2 is default, punt and assume half of the uncertain rows have
! 		 * join partners.
  		 */
  		if (!isdefault1 && !isdefault2)
  		{
--- 2603,2609 ----
  		 *
  		 * Crude as the above is, it's completely useless if we don't have
  		 * reliable ndistinct values for both sides.  Hence, if either nd1 or
! 		 * nd2 is default, we can't use this.
  		 */
  		if (!isdefault1 && !isdefault2)
  		{
*************** eqjoinsel_semi(Oid operator,
*** 2616,2622 ****
--- 2615,2635 ----
  				uncertainfrac = nd2 / nd1;
  		}
  		else
+ 		{
+ 			/*
+ 			 * In this situation, we basically assume half of the uncertain
+ 			 * rows have join partners.  However, we'd still like to respond
+ 			 * to restriction clauses applied to the inner rel, so what we
+ 			 * really do is assume half of the uncertain rows have partners in
+ 			 * the underlying inner rel, then reduce that fraction by the
+ 			 * previously-determined selectivity of the inner restrictions.
+ 			 */
  			uncertainfrac = 0.5;
+ 			if (vardata2->rel &&
+ 				vardata2->rel->rows > 0 &&
+ 				vardata2->rel->rows < 

Re: [HACKERS] An incomplete comment sentence in subtrans.c

2017-06-01 Thread Masahiko Sawada
On Thu, Jun 1, 2017 at 3:26 AM, Robert Haas  wrote:
> On Tue, May 30, 2017 at 7:43 PM, Masahiko Sawada  
> wrote:
>> There is an incomplete sentence at top of subtrans.c file. I think the
>> commit 88e66d19 removed the whole line mistakenly.
>
> Thanks for the patch.  Sorry for the mistake that made it necessary.  
> Committed.
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-01 Thread Masahiko Sawada
On Thu, May 25, 2017 at 5:29 PM, tushar  wrote:
> On 05/25/2017 12:44 AM, Petr Jelinek wrote:
>>
>> There is still outstanding issue that sync worker will keep running
>> inside the long COPY because the invalidation messages are also not
>> processed until it finishes but all the original issues reported here
>> disappear for me with the attached patches applied.
>
> After applying all your patches, DROP SUBSCRIPTION no longer hangs,
> but there is an error "ERROR:  tuple concurrently updated" in the log
> file of the standby.
>

I tried to reproduce this issue with the latest four patches I
submitted, but it didn't happen.  I guess this issue isn't related to
the issues reported on this thread, and another factor might be the
cause.  It would be good to test the same again with the latest four
patches, or after the current open items are resolved.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256

2017-06-01 Thread Robert Haas
On Wed, May 31, 2017 at 7:59 PM, Stephen Frost  wrote:
> If your comments regarding the requirement that we have interoperability
> testing of this feature before accepting it were intended to mean that
> we need to make sure that the JDBC driver is able to work with the
> chosen solution (or, perhaps, that we support multiple options, one of
> which works with the JDBC changes), and that we review the other TLS
> libraries with the goal of making sure that what we agree on in core
> should work with those also, then that's great and exactly what I'd like
> to see also.

OK, great.  That's what I mean (mostly - see below).

> If your comments regarding the requirement that we have interoperability
> testing of this feature before accepting it were intended to mean that
> we must have a complete alternative TLS solution, while that would
> actually play a great deal to a goal I've had for a while to have PG
> support multiple TLS implementations (LibNSS being top-of-mind, at least
> for me, as I've mentioned a couple times), I can't agree that we should
> require that before accepting channel binding as a feature.  To do so
> would be akin to requiring that we support multiple TLS implementations
> before we agreed to support client-side certificates, in my opinion.

I don't think that we need to have a committed patch allowing multiple
SSL implementations before we accept a patch for channel binding, but
I think it might be a good idea for someone to hack either libpq or
the server enough to make it work with some other SSL implementation
(Windows SSL would probably be the best candidate) and test that what
we think will work for channel binding actually does work.  It
wouldn't need to be a committable patch, but it should be enough to
make a successful connection in the laboratory.  Now, there might also
be other ways besides that of testing that interoperability is
possible, so don't take me as being of the view that someone has to
necessarily do exactly that thing that I just said.  But I do think
that we probably should do more than say "hey, we've got this
undocumented SSL API, and this other Windows API, and it looks like
they do about the same thing, so we're good".  There's sometimes a big
difference between what sounds like it should work on paper and what
actually does work.

> I would be rather surprised if the solution which Michael and Alvaro
> come to results in a solution which requires us to break compatibility
> down the road, though it's of course a risk we need to consider.  The
> ongoing discussion around finding a way to support both methods outlined
> in the RFC sounds exactly correct to me and I encourage further
> discussion there.

Sure, I have no objection to the discussion.  Discussion is always
cool; what I'm worried about is a tls-unique implementation that is
supremely unportable, and there's no current evidence that the
currently-proposed patch is anything else.  There is some optimism,
but optimism <> evidence.

> I'm certainly on-board with the general idea of having a way for the
> client to require both SCRAM and channel binding and I agree that's a
> hole in our current system, but that is really an orthogonal concern.

Orthogonal but *very* closely related.

> entirely technical perspective.  If I were one of the individuals
> working on this feature, I'd be rather put-off and upset at the
> suggestion that the good work they're doing is viewed by the PostgreSQL
> community and one of our major committers as only being done to check a
> box or to be "buzzword compliant" and ultimately without technical
> merit.

I did not say that the feature was "ultimately without technical
merit"; I said that the patch as submitted seemed like it might not
interoperate, and that without a libpq option to force it to be used
it wouldn't add any real security.  I stand by those statements and I
think it is 100% appropriate for me to raise those issues.  Other
people, including you, do the same thing about other patches all the
time.  It is not a broadside against the contributors or the patch; it
is a short, specific list of concerns that are IMHO quite justifiable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] <> join selectivity estimate question

2017-06-01 Thread Robert Haas
On Wed, May 31, 2017 at 1:18 PM, Dilip Kumar  wrote:
> +   if (jointype = JOIN_SEMI)
> +   {
> +   sjinfo->jointype = JOIN_INNER;
> +   }

That is pretty obviously half-baked and completely untested.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-06-01 Thread Robert Haas
On Wed, May 31, 2017 at 8:07 PM, Andres Freund  wrote:
> Here's a patch doing what I suggested above.  The second patch addresses
> an independent oversight where the post alter hook was invoked before
> doing the catalog update.

0002 is a slam-dunk.  I don't object to 0001 but haven't reviewed it carefully.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC 2017 : Proposal for predicate locking in gist index

2017-06-01 Thread Shubham Barai
Hi, Kevin sir!

On 1 June 2017 at 02:20, Kevin Grittner  wrote:

> On Wed, May 31, 2017 at 3:02 PM, Shubham Barai 
> wrote:
>
> > I have been accepted as GSoC student for the project "Explicitly support
> > predicate locks in index access methods besides b-tree". I want to share
> > my approach for implementation of page level predicate locking in gist
> > index.
>
> For the benefit of all following the discussion, implementing
> support in an index AM conceptually consists of two things:
> (1) Any scan with user-visible results must create SIRead predicate
> locks on "gaps" scanned.  (For example, a scan just to find an
> insertion spot for an index entry does not generally count, whereas
> a scan to satisfy a user "EXISTS" test does.)
> (2) Any insert into the index must CheckForSerializableConflictIn()
> at enough points that at least one of them will detect an overlap
> with a predicate lock from a preceding scan if the inserted index
> entry might have changed the results of that preceding scan.
>
> Detecting such a conflict does not automatically result in a
> serialization failure, but is only part of tracking the information
> necessary to make such a determination.  All that is handled by the
> existing machinery if the AM does the above.
>
> With a btree index, locking gaps at the leaf level is normally
> sufficient, because both scan and insertion of an index entry must
> descend that far.
>
> > The main difference between b-tree and gist index while searching for a
> > target tuple is that in gist index, we can determine if there is a match
> > or not at any level of the index.
>
> Agreed.  A gist scan can fail at any level, but for that scan to
> have produced a different result due to insertion of an index entry,
> *some* page that the scan looked at must be modified.
>
> > The simplest way to do that will be by inserting a call for
> > PredicateLockPage() in gistScanPage().
>
> Yup.
>
> > Insertion algorithm also needs to check for conflicting predicate locks
> > at each level of the index.
>
> Yup.
>
> > We can insert a call for CheckForSerializableConflictIn() at two places
> > in gistdoinsert().
> >
> > 1. after acquiring an exclusive lock on internal page (in case we are
> > trying to replace an old search key)
> >
> > 2. after acquiring an exclusive lock on leaf page
> >
> > If there is not enough space for insertion, we have to copy predicate
> > locks from an old page to all new pages generated after a successful
> > split operation. For that, we can insert a call for
> > PredicateLockPageSplit() in gistplacetopage().
>
> That all sounds good.  When you code a patch, don't forget to add
> tests to src/test/isolation.
>

> Do you need any help getting a development/build/test environment
> set up?
>

 Yes, any link to the documentation will be helpful.

>
> --
> Kevin Grittner
> VMware vCenter Server
> https://www.vmware.com/
>
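
To make the three hook points above concrete, here is a minimal sketch
using the existing predicate.c functions; the exact insertion points in
the gist code are untested assumptions, not a reviewed patch:

    /* In gistScanPage(), after the scan reads a page: take an SIRead
     * lock so a later conflicting insert can be detected. */
    PredicateLockPage(scan->indexRelation,
                      BufferGetBlockNumber(buffer),
                      scan->xs_snapshot);

    /* In gistdoinsert(), after acquiring an exclusive lock on a page
     * (internal or leaf) that is about to be modified: */
    CheckForSerializableConflictIn(r, NULL, buffer);

    /* In gistplacetopage(), after a successful split: carry existing
     * predicate locks over to the new page. */
    PredicateLockPageSplit(r,
                           BufferGetBlockNumber(buffer),
                           BufferGetBlockNumber(newbuffer));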


Re: [HACKERS] TAP: allow overriding PostgresNode in get_new_node

2017-06-01 Thread Tels
Moin,

On Wed, May 31, 2017 10:18 pm, Craig Ringer wrote:
> On 31 May 2017 at 08:43, Craig Ringer  wrote:
>> Hi all
>>
>> More and more I'm finding it useful to extend PostgresNode for project
>> specific helper classes. But PostgresNode::get_new_node is a factory
>> that doesn't provide any mechanism for overriding, so you have to
>> create a PostgresNode then re-bless it as your desired subclass. Ugly.
>>
>> The attached patch allows an optional second argument, a class name,
>> to be passed to PostgresNode::get_new_node . It's instantiated instead
>> of PostgresNode if supplied. Its 'new' constructor must take the same
>> arguments.
>
> Note that you can achieve the same effect w/o patching
> PostgresNode.pm, albeit in a somewhat ugly manner, by re-blessing the
> returned object.
>
> sub get_new_mywhatever_node {
>     my ($name) = @_;
>     my $self = PostgresNode::get_new_node($name);
>     $self = bless $self, 'MyWhateverNode';
>     return $self;
> }
>
> so this would be cosmetically nice, but far from functionally vital.

It's good style in Perl to have constructors bless new objects with the
class that is passed in, tho.

(I'd even go so far as to say that any Perl OO code that uses fixed class
names is broken).

While technically you can rebless a returned object, that breaks the
subclassing, sometimes subtly, and sometimes really badly.

For instance, any method calls the constructor makes happen in the
"hardcoded" package, not in the subclass you are using, because the
reblessing happens later.

Consider for instance:

  package MyBase;

  sub new
  {
      my $self = bless {}, 'MyBase';
      # it should be instead:
      # my $self = bless {}, shift;
      $self->_init();
  }

  sub _init
  {
      my ($self) = @_;

      $self->{foo} = 'bar';

      # return the initialized object
      $self;
  }

If you then do this:

  package MySubclass;

  use MyBase;
  use vars qw/@ISA/;

  @ISA = qw/MyBase/;

  sub _init
  {
      my ($self) = @_;

      # call the base's _init
      $self->SUPER::_init();

      # initialize our own stuff and override some
      $self->{foo} = 'subclass';
      $self->{baz} = 1;

      # return the initialized object
      $self;
  }

and try to use it like this:

  package main;

  use MySubclass;

  my $thingy = MySubclass->new();
  print $thingy->{foo},"\n";

you get "bar", not "subclass" - even if you rebless $thingy into the
correct class.


And as someone who subclasses MyBase, you have no idea why or how, and
it will break with the next update to MyBase's code. While technically
you can work around that by "peeking" into MyBase's code and maybe some
reblessing, the point is that you shouldn't have to do this, nor need
to.

Please see: http://perldoc.perl.org/perlobj.html

Regards,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] make check false success

2017-06-01 Thread Sandro Santilli
I noticed that the `check` Makefile rule imported by PGXS is giving
a success exit code even when it is unsupported.

The attached patch fixes that.

--strk;

  ()   Free GIS & Flash consultant/developer
  /\   https://strk.kbt.io/services.html
>From 43fa28f141871a6efdd3e5d0c9ec8cc537585ff5 Mon Sep 17 00:00:00 2001
From: Sandro Santilli 
Date: Thu, 1 Jun 2017 16:14:58 +0200
Subject: [PATCH] Make sure `make check` fails when it cannot be run

---
 src/makefiles/pgxs.mk | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index c27004ecfb..5274499116 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -279,6 +279,7 @@ ifdef PGXS
 check:
 	@echo '"$(MAKE) check" is not supported.'
 	@echo 'Do "$(MAKE) install", then "$(MAKE) installcheck" instead.'
+	@false
 else
 check: submake $(REGRESS_PREP)
 	$(pg_regress_check) $(REGRESS_OPTS) $(REGRESS)
-- 
2.11.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-06-01 Thread Mark Dilger

> On Jun 1, 2017, at 6:31 AM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>> When you guys commit changes that impact partitioning, I notice, and change
>> my code to match.  But in this case, it seemed to me the change that got
>> committed was not thought through, and it might benefit the community for
>> me to point it out, rather than quietly make my code behave the same as
>> what got committed.
> 
> Let me explain the project standards in words of one syllable: user code
> should not examine the contents of node trees.  That's what pg_get_expr
> is for.  There is not, never has been, and never will be any guarantee
> that we won't whack those structures around in completely arbitrary ways,
> as long as we do a catversion bump along with it.

I have already dealt with this peculiarity and don't care much at this
point, but I think your characterization of my position is inaccurate:

I'm really not talking about examining the contents of node trees.  I'm
only talking about comparing two of them for equality, and getting
false as an answer, when they are actually equal.  Asking "Is A == B"
is not "examining the contents"; it is asking the project-supplied
comparator function to examine the contents, which is on par with
asking the project-supplied pg_get_expr function to do so.  There is
currently only one production in gram.y in the public sources that
creates partitions, so you don't notice the relpartbound being
dependent on which grammar production defined the table.  I notice, and
think it is an odd thing to encode in the relpartbound field.  Bumping
the catversion is irrelevant if you have two or more different
productions in gram.y that give rise to these field values.
Apparently, you don't care.  Fine by me.  It just seemed to me an
oversight when I first encountered it, rather than something anybody
would intentionally commit to the sources.

Mark Dilger



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tap tests on older branches fail if concurrency is used

2017-06-01 Thread Tom Lane
Andres Freund  writes:
> when using
> $ cat ~/.proverc
> -j9
> some tests fail for me in 9.4 and 9.5.

Weren't there fixes specifically intended to make that safe, awhile ago?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_class.relpartbound definition overly brittle

2017-06-01 Thread Tom Lane
Mark Dilger  writes:
> When you guys commit changes that impact partitioning, I notice, and change
> my code to match.  But in this case, it seemed to me the change that got
> committed was not thought through, and it might benefit the community for
> me to point it out, rather than quietly make my code behave the same as
> what got committed.

Let me explain the project standards in words of one syllable: user code
should not examine the contents of node trees.  That's what pg_get_expr
is for.  There is not, never has been, and never will be any guarantee
that we won't whack those structures around in completely arbitrary ways,
as long as we do a catversion bump along with it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: retry shm attach for windows (WAS: Re: [HACKERS] OK, so culicidae is *still* broken)

2017-06-01 Thread Tom Lane
Robert Haas  writes:
> So, are you going to, perhaps, commit this?  Or who is picking this up?

> /me knows precious little about Windows.

I'm not going to be the one to commit this either, but seems like someone
should.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

2017-06-01 Thread Rafia Sabih
On Tue, May 30, 2017 at 4:57 PM, Robert Haas  wrote:

> I did a little bit of brief experimentation on this same topic a long
> time ago and didn't see an improvement from boosting the queue size
> beyond 64k but Rafia is testing Gather rather than Gather Merge and,
> as I say, my test was very brief.  I think it would be a good idea to
> try to get a complete picture here.  Does this help on any query that
> returns many tuples through the Gather?  Only the ones that use Gather
> Merge?  Some queries but not others with no obvious pattern?  Only
> this query?
>
I did further exploration, trying other values of
PARALLEL_TUPLE_QUEUE_SIZE and different queries, and here are my
findings:
- even when setting PARALLEL_TUPLE_QUEUE_SIZE to 655360, there isn't
much improvement in q12 itself.
- no other TPC-H query shows significant improvement at 6553600 either.
There is a small improvement in q3, which also uses Gather Merge.
- as per perf analysis of q12 on head and patch, the percentage of time
in ExecGatherMerge is 18% with the patch and 98% on head, and similarly
for gather_merge_readnext and gather_merge_writenext.

As per my understanding, this increase in tuple queue size is helping
only Gather Merge, particularly in cases where the master stalls
workers because it is maintaining the sort order. In q12 the index is
unclustered and Gather Merge sits just above a parallel index scan, so
to maintain the order the workers are likely to wait a long time before
the next in-sequence tuple is consumed by the master. Something like
this might be happening: the master takes one tuple from worker 1, then
say the next 10 tuples from worker 2, and so on, finally returning to
worker 1; so once worker 1 has produced enough tuples to fill its
queue, it sits idle. Hence, increasing the tuple queue size lets the
workers keep working longer, and this improves performance.
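
For context, the queue in question is a fixed-size shm_mq per worker;
it is set up roughly like this in execParallel.c (paraphrased, so treat
the details as approximate):

    /* Each worker gets one tuple queue of this fixed size; when it
     * fills up, the worker blocks until the leader drains it. */
    #define PARALLEL_TUPLE_QUEUE_SIZE   65536

    /* one queue per planned worker, carved out of dynamic shared memory */
    tqueuespace = shm_toc_allocate(pcxt->toc,
                                   mul_size(PARALLEL_TUPLE_QUEUE_SIZE,
                                            pcxt->nworkers));
    for (i = 0; i < pcxt->nworkers; i++)
    {
        shm_mq     *mq = shm_mq_create(tqueuespace +
                                       i * PARALLEL_TUPLE_QUEUE_SIZE,
                                       (Size) PARALLEL_TUPLE_QUEUE_SIZE);

        shm_mq_set_receiver(mq, MyProc);    /* the leader reads */
    }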

In other cases like q3, q18, etc., Gather Merge is above a sort,
partial group aggregate, etc. Here the chance of stalls is
comparatively lower, since the scan of the relation uses the primary
key, so the tuples in the blocks are likely to be in order. The same
held for many other TPC-H queries. Another thing is that in the TPC-H
benchmark queries the number of tuples at Gather Merge is mostly fairly
low, so I'll try to test this on some custom queries which exhibit the
aforementioned case.

> Blindly adding a GUC because we found one query that would be faster
> with a different value is not the right solution.   If we don't even
> know why a larger value is needed here and (maybe) not elsewhere, then
> how will any user possibly know how to tune the GUC?  And do we really
> want the user to have to keep adjusting a GUC before each query to get
> maximum performance?  I think we need to understand the whole picture
> here, and then decide what to do.  Ideally this would auto-tune, but
> we can't write code for that without a more complete picture of the
> behavior.
>
Yeah, maybe for the scenario discussed above a GUC is not the best
idea. But it might help to have the planner decide the value of
PARALLEL_TUPLE_QUEUE_SIZE from something that captures the relation
between the index ordering and the physical ordering of the tuples,
along with the number of tuples, etc. E.g. if the index is the primary
key, then the physical order is the same as the index order, and if
this is the sort key, then stalls at Gather Merge would be rare; but if
it is an unclustered index, then the physical order differs greatly
from the index order, workers are likely to stall more, and a higher
value of PARALLEL_TUPLE_QUEUE_SIZE based on the number of tuples would
be appropriate.

Again, I am not concluding anything yet, as this is too little
experimentation to ascertain anything. I'll continue the experiments
and would be grateful for more suggestions.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour

2017-06-01 Thread Alexander Korotkov
On Wed, May 31, 2017 at 7:18 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Wed, May 31, 2017 at 6:53 PM, Tom Lane  wrote:
>
>> Alexander Korotkov  writes:
>> > I've discovered that PostgreSQL is able to run following kind of
>> queries in
>> > order to change statistic-gathering target for an indexed expression.
>>
>> > ALTER INDEX index_name ALTER COLUMN expr SET STATISTICS stat_target;
>>
>> > It's been previously discussed in [1].
>>
>> > I think this should be fixed not just in docs.  This is why I've started
>> > thread in pgsql-hackers.  For me usage of internal column names "expr",
>> > "expr1", "expr2" etc. looks weird.  And I think we should replace it
>> with a
>> > better syntax.  What do you think about these options?
>>
>> > ALTER INDEX index_name ALTER EXPRESSION 0 SET STATISTICS stat_target; --
>> > Refer expression by its number
>> > ALTER INDEX index_name ALTER EXPRESSION (x + y) SET STATISTICS
>> stat_target;
>> > -- Refer expression by its definition
>>
>> Don't like either of those particularly, but what about just targeting
>> a column by column number, independently of whether it's an expression?
>>
>> ALTER INDEX myindex ALTER COLUMN 4 SET STATISTICS 1000;
>>
>
> I completely agree with that.
> If no objections, I will write a patch for that.
>

Please, find it in attached patch.

I decided to forbid referencing columns by numbers for tables, because
those numbers could contain gaps.  Also, I forbid altering statistics
target for non-expression index columns, because it has no effect.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


alter-index-statistics-1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Create subscription with `create_slot=false` and incorrect slot name

2017-06-01 Thread Petr Jelinek
On 01/06/17 04:44, Peter Eisentraut wrote:
> On 5/31/17 09:40, Robert Haas wrote:
>> On Tue, May 30, 2017 at 3:01 PM, Peter Eisentraut
>>  wrote:
>>> On 5/25/17 17:26, Peter Eisentraut wrote:
 Another way to fix this particular issue is to not verify the
 replication slot name before doing the drop.  After all, if the name is
 not valid, then you can also just report that it doesn't exist.
>>>
>>> Here is a possible patch along these lines.
>>
>> I don't see how this solves the problem.  Don't you still end up with
>> an error message telling you that you can't drop the subscription, and
>> no guidance as to how to fix it?
> 
> Well, the idea was to make the error message less cryptic.
> 
> But I notice that there is really little documentation about this.  So
> how about the attached documentation patch as well?
> 
> As mentioned earlier, if we want to do HINT messages, that will be a bit
> more involved and probably PG11 material.
> 

I think the combination of those patches is probably good enough
solution for PG10 (I never understood the need for name validation in
ReplicationSlotAcquire() anyway).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication busy-waiting on a lock

2017-06-01 Thread Petr Jelinek
On 31/05/17 11:21, Petr Jelinek wrote:
> On 31/05/17 09:24, Andres Freund wrote:
>> On 2017-05-29 23:49:33 +0200, Petr Jelinek wrote:
>>> I am not quite sure I understand (both the vxid suggestion and for the
>>> session dying badly). Maybe we can discuss bit more when you get to
>>> computer so it's easier for you to expand on what you mean.
>>
>> The xid interlock when exporting a snapshot is required because
>> otherwise it's not generally guaranteed that all resources required for
>> the snapshot are reserved.  In the logical replication case we could
>> guarantee that otherwise, but there'd be weird-ish edge cases when
>> erroring out just after exporting a snapshot.
>>
>> The problem with using the xid as that interlock is that it requires
>> acquiring an xid - which is something we're going to block against when
>> building a new catalog snapshot.  Afaict that's not entirely required -
>> all that we need to verify is that the snapshot in the source
>> transaction is still running.  The easiest way for the importer to check
>> that the source is still alive seems to be to export the virtual
>> transaction id instead of the xid.  Normally we can't store things like
>> virtual xids on disk, but that concern isn't valid here because exported
>> snapshots are ephemeral, there's also already a precedent in
>> predicate.c.
>>
>> It seems like it'd be fairly easy to change things around that way, but
>> maybe I'm missing something.
>>
> 
> Okay, thanks for explanation. Code-wise it does seem simple enough
> indeed. I admit I don't know enough about the exported snapshots and
> snapshot management in general to be able to answer the question of
> safety here. That said, it does seem to me like it should work as the
> exported snapshots are just on disk representation of in-memory state
> that becomes invalid once the in-memory state does.
> 

Thinking more about this, I am not convinced it's a good idea to change
exports this late in the cycle. I still think it's best to do the xid
assignment only when the snapshot is actually exported but don't assign
xid when the export is only used by the local session (the new option in
PG10). That's one line change which impacts only logical
replication/decoding as opposed to everything else which uses exported
snapshots.
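
In code terms, the sketch I have in mind is roughly this (surrounding
names are from memory, and the exact spot is an assumption):

    /* Assign a real xid, i.e. the interlock discussed above, only
     * when the snapshot actually leaves the session.  CRS_USE_SNAPSHOT,
     * the new PG10 option for a local-only snapshot, would skip it. */
    if (snapshot_action == CRS_EXPORT_SNAPSHOT)
        (void) GetTopTransactionId();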

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] walsender & parallelism

2017-06-01 Thread Petr Jelinek
On 01/06/17 06:06, Andres Freund wrote:
> On 2017-05-31 23:51:08 -0400, Peter Eisentraut wrote:
>> I think the easiest and safest thing to do now is to just prevent
>> parallel plans in the walsender.  See attached patch.  This prevents the
>> hang in the select_parallel tests run under your new test setup.
> 
> I'm not quite sure I can buy this.  The lack of wired up signals has
> more problems than just hurting parallelism.  In fact, the USR1 thing
> seems like something that we actually should backpatch, rather than
> defer to v11.  I think there's some fair arguments to be made that we
> shouldn't do the refactoring right now - although I'm not sure about it
> - but just not fixing the bugs seems like a bad plan.
> 

I think the signal handling needs to be fixed. It does not have to be
done via large refactoring, but signals should be handled properly (= we
need to share SIGHUP/SIGUSR1 handling between postgres.c and walsender.c).

The rest can wait for PG11.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server ignores contents of SASLInitialResponse

2017-06-01 Thread Heikki Linnakangas

On 05/25/2017 06:36 PM, Michael Paquier wrote:

> On Thu, May 25, 2017 at 10:52 AM, Michael Paquier
>  wrote:
>
>> Actually, I don't think that we are completely done here. Using the
>> patch of upthread to enforce a failure on SASLInitialResponse, I see
>> that connecting without SSL causes the following error:
>> psql: FATAL:  password authentication failed for user "mpaquier"
>> But connecting with SSL returns that:
>> psql: duplicate SASL authentication request
>>
>> I have not looked at that in details yet, but it seems to me that we
>> should not take pg_SASL_init() twice in the scram authentication code
>> path in libpq for a single attempt.
>
> Gotcha. This happens because of sslmode=prefer, on which
> pqDropConnection is used to clean up the connection state. So it seems
> to me that the correct fix is to move the cleanup of sasl_state to
> pqDropConnection() instead of closePGconn(). Once I do so the failures
> are correct, showing to the user two FATAL errors because of the two
> attempts:
> psql: FATAL:  password authentication failed for user "mpaquier"
> FATAL:  password authentication failed for user "mpaquier"


Hmm, the SASL state cleanup is done pretty much the same way that 
GSS/SSPI cleanup is. Don't we have a similar problem with GSS?


I tested that, and I got an error from glibc, complaining about a 
double-free:



*** Error in `/home/heikki/pgsql.master/bin/psql': double free or corruption 
(out): 0x56157d13be00 ***
=== Backtrace: =
/lib/x86_64-linux-gnu/libc.so.6(+0x70bcb)[0x7f6a460b7bcb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76f96)[0x7f6a460bdf96]
/lib/x86_64-linux-gnu/libc.so.6(+0x7778e)[0x7f6a460be78e]
/home/heikki/pgsql.master/lib/libpq.so.5(+0xa1dc)[0x7f6a46d651dc]
/home/heikki/pgsql.master/lib/libpq.so.5(+0xa4b3)[0x7f6a46d654b3]
/home/heikki/pgsql.master/lib/libpq.so.5(+0xacb9)[0x7f6a46d65cb9]
/home/heikki/pgsql.master/lib/libpq.so.5(PQconnectPoll+0x14e7)[0x7f6a46d6ae81]
/home/heikki/pgsql.master/lib/libpq.so.5(+0xe895)[0x7f6a46d69895]
/home/heikki/pgsql.master/lib/libpq.so.5(PQconnectdbParams+0x4f)[0x7f6a46d675b9]
/home/heikki/pgsql.master/bin/psql(+0x3daa9)[0x56157ccafaa9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f6a460672b1]
/home/heikki/pgsql.master/bin/psql(+0x1163a)[0x56157cc8363a]


I bisected that; the culprit was commit 61bf96cab0, where I refactored 
the libpq authentication code in preparation for SCRAM. The logic around 
that free() was always a bit wonky, but the refactoring made it outright 
broken. Attached is a patch for that, see commit message for details. 
(Review of that would be welcome.)


So, after fixing that, back to the original question; don't we have a 
similar "duplicate authentication request" problem with GSS? Yes, turns 
out that we do, even on stable branches:


psql "sslmode=prefer dbname=postgres hostaddr=127.0.0.1 
krbsrvname=postgres  host=localhost user=krbtestuser"

psql: duplicate GSS authentication request

To fix, I suppose we can do what you did for SASL in your patch, and 
move the cleanup of conn->gctx from closePGconn to pgDropConnection. And 
I presume we need to do the same for the SSPI state too, but I don't 
have a Windows set up to test that at the moment.
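
In other words, something like this in pqDropConnection() (a sketch of
the idea, not a tested patch):

    /* Free the GSS security context when the physical connection is
     * dropped (e.g. on the sslmode=prefer retry), not only when the
     * PGconn is destroyed, so a second authentication attempt starts
     * from a clean slate. */
    #ifdef ENABLE_GSS
    if (conn->gctx)
    {
        OM_uint32   min_s;

        gss_delete_sec_context(&min_s, &conn->gctx, GSS_C_NO_BUFFER);
        conn->gctx = GSS_C_NO_CONTEXT;
    }
    #endif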


- Heikki



0001-Fix-double-free-bug-in-GSS-authentication.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-06-01 Thread Amit Khandekar
On 1 June 2017 at 03:25, Robert Haas  wrote:
> Greg/Amit's idea of using the CTID field rather than an infomask bit
> seems like a possibly promising approach.  Not everything that needs
> bit-space can use the CTID field, so using it is a little less likely
> to conflict with something else we want to do in the future than using
> a precious infomask bit.  However, I'm worried about this:
>
> /* Make sure there is no forward chain link in t_ctid */
> tp.t_data->t_ctid = tp.t_self;
>
> The comment does not say *why* we need to make sure that there is no
> forward chain link, but it implies that some code somewhere in the
> system does or at one time did depend on no forward link existing.
> Any such code that still exists will need to be updated.  Anybody know
> what code that might be, exactly?

I am going to have an overall look at this approach, and at code
elsewhere which might be assuming that t_ctid cannot be invalid.

> Regarding the trigger issue, I can't claim to have a terribly strong
> opinion on this.  I think that practically anything we do here might
> upset somebody, but probably any halfway-reasonable thing we choose to
> do will be OK for most people.  However, there seems to be a
> discrepancy between the approach that got the most votes and the one
> that is implemented by the v8 patch, so that seems like something to
> fix.

Yes, I have started working on updating the patch to use that approach
(BR and AR update triggers on source and destination partition
respectively, instead of delete+insert).  The approach taken by the
patch (BR update + delete+insert triggers) didn't require any changes
in the way ExecDelete() and ExecInsert() were called. Now we would
require to skip the delete/insert triggers, so some flags need to be
passed to these functions, or else have stripped down versions of
ExecDelete() and ExecInsert() which don't do other things like
RETURNING handling and firing triggers.

>
> For what it's worth, in the future, I imagine that we might allow
> adding a trigger to a partitioned table and having that cascade down
> to all descendant tables.  In that world, firing the BR UPDATE trigger
> for the old partition and the AR UPDATE trigger for the new partition
> will look a lot like the behavior the user would expect on an
> unpartitioned table, which could be viewed as a good thing.  On the
> other hand, it's still going to be a DELETE+INSERT under the hood for
> the foreseeable future, so firing the delete triggers and then the
> insert triggers is also defensible.

Ok, I was assuming that there wouldn't be any plans to support triggers
on a partitioned table, but yes, I had imagined how the behaviour
would be in that world. Currently, users who want triggers on a table
that happens to be partitioned have to install the same trigger on
each of the leaf partitions, since there is no other choice. But then
we can never know whether a trigger on a leaf partition was meant
specifically for that individual partition or was really meant to be a
trigger on the root partitioned table. Hence the difficulty of
deciding the right behaviour for triggers with row movement.

If we have an AR UPDATE trigger on the root table, then during row
movement it does not matter whether we fire the trigger on the source
or the destination, because it is the same single trigger cascaded to
both partitions. If a trigger is installed specifically on a leaf
partition, then we know it should not be fired on other partitions,
since it was made specifically for that one. The same applies to
delete and insert triggers: if installed on the parent, don't involve
them in row movement; fire them only if installed on leaf partitions,
regardless of whether the delete+insert was generated internally due
to row movement. Similarly we can reason about BR triggers.

Of course, DBAs should be aware of triggers already installed on a
table's ancestors before installing a new one on a child table.

Overall, it becomes much clearer what to do if we decide to allow
triggers on partitioned tables.

> Is there any big difference between these approaches in terms
> of how much code is required to make this work?

You mean if we allow triggers on partitioned tables? I think we would
have to keep some flag in the trigger data (or somewhere else)
indicating that the trigger actually belongs to an upper partitioned
table, and skip such triggers during the internal delete+insert.
Other than that, we don't have to decide in any special way which
trigger fires on which table.
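
A fragment to show the shape of that check ("tginherited" is a made-up
field name, standing in for whatever marker records that a trigger was
cloned from an ancestor):

    /* While queuing row triggers for the internal delete+insert of a
     * row movement, skip triggers inherited from an ancestor table;
     * those are represented by the UPDATE triggers instead. */
    if (is_row_movement && trigdesc->triggers[i].tginherited)
        continue;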

>
> In terms of the approach taken by the patch itself, it seems
> surprising to me that the patch only calls
> ExecSetupPartitionTupleRouting when an update fails the partition
> constraint.  Note that in the insert case, we call that function at
> the start of execution;
> calling it in the middle seems to involve additional hazards;
> for example, is it really safe to add additional

Re: [HACKERS] Error while creating subscription when server is running in single user mode

2017-06-01 Thread Dilip Kumar
On Thu, Jun 1, 2017 at 1:02 PM, Michael Paquier
 wrote:
> Thanks, this looks correct to me at quick glance.
>
> +    if (!IsUnderPostmaster)
> +        ereport(FATAL,
> +                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                 errmsg("subscription commands are not supported by single-user servers")));
> The messages could be more detailed, like directly naming the
> operation of CREATE/ALTER/DROP SUBSCRIPTION in each error message.
> But that's a nit.

Thanks for looking into it.  Yeah, I think it's better to give a
specific message instead of a generic one, because we still support
some of the subscription commands even in single-user mode, i.e.
ALTER SUBSCRIPTION OWNER.  My patch doesn't block this command, and
there is no harm in supporting it in single-user mode, but does this
make any sense?  One might create a subscription in normal mode and
then run ALTER OWNER in single-user mode, but that makes little sense
to me.
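
For what it's worth, the more specific wording could look like this in
each command function (a sketch of Michael's suggestion, not committed
text):

    if (!IsUnderPostmaster)
        ereport(FATAL,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("CREATE SUBSCRIPTION is not supported by single-user servers")));

with corresponding ALTER SUBSCRIPTION and DROP SUBSCRIPTION variants.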

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


subscription_error_v1.patch
Description: Binary data



Re: [HACKERS] "create publication..all tables" ignore 'partition not supported' error

2017-06-01 Thread Kuntal Ghosh
On Thu, Jun 1, 2017 at 6:56 AM, Peter Eisentraut
 wrote:
> On 5/31/17 02:17, Kuntal Ghosh wrote:
>> On Wed, May 31, 2017 at 12:58 AM, Masahiko Sawada  
>> wrote:
>>>
>>> I'd say we can fix this issue by just changing the query. The
>>> attached patch changes the query so that it can handle the
>>> publication name correctly; the query gets complex, though.
>>>
>> In is_publishable_class function, there are four conditions to decide
>> whether this is a publishable class or not.
>>
>> 1. relkind == RELKIND_RELATION
>> 2. IsCatalogClass()
>> 3. relpersistence == 'p'
>> 4. relid >= FirstNormalObjectId /* Skip tables created during initdb */
>>
>> I think the modified query should have a check for the fourth condition as 
>> well.
>
> The query should be fixed like in the attached patch.
> pg_get_publication_tables() ends up calling is_publishable_class()
> internally.
>
Works fine.
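
For reference, the four conditions correspond to a check of roughly
this shape (paraphrased; see src/backend/catalog/pg_publication.c for
the authoritative version):

    static bool
    is_publishable_class(Oid relid, Form_pg_class reltuple)
    {
        return reltuple->relkind == RELKIND_RELATION &&
            !IsCatalogClass(relid, reltuple) &&
            reltuple->relpersistence == RELPERSISTENCE_PERMANENT &&
            /* skip tables created during initdb */
            relid >= FirstNormalObjectId;
    }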



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] TAP: allow overriding PostgresNode in get_new_node

2017-06-01 Thread Michael Paquier
On Wed, May 31, 2017 at 7:18 PM, Craig Ringer  wrote:
> Note that you can achieve the same effect w/o patching
> PostgresNode.pm, albeit in a somewhat ugly manner, by re-blessing the
> returned object.
>
> sub get_new_mywhatever_node {
>     my ($name) = @_;    # the node name must come in as an argument
>     my $self = PostgresNode::get_new_node($name);
>     $self = bless $self, 'MyWhateverNode';
>     return $self;
> }
>
> so this would be cosmetically nice, but far from functionally vital.

+$pgnclass = 'PostgresNode' unless defined $pgnclass;
I'd rather leave any code of this kind to the module maintainers;
there is no actual reason to complicate PostgresNode.pm with code
that's not directly useful for what is in-core.
-- 
Michael




Re: [HACKERS] Error while creating subscription when server is running in single user mode

2017-06-01 Thread Michael Paquier
On Wed, May 31, 2017 at 10:49 PM, Dilip Kumar  wrote:
> On Wed, May 31, 2017 at 2:20 PM, Michael Paquier
>  wrote:
>> Yeah, see 0e0f43d6 for example. A simple fix is to look at
>> IsUnderPostmaster when creating, altering or dropping a subscription
>> in subscriptioncmds.c.
>
> Yeah, the patch below fixes that.

Thanks, this looks correct to me at quick glance.

+    if (!IsUnderPostmaster)
+        ereport(FATAL,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                 errmsg("subscription commands are not supported by single-user servers")));
The messages could be more detailed, like directly naming the
operation of CREATE/ALTER/DROP SUBSCRIPTION in each error message.
But that's a nit.
-- 
Michael




[HACKERS] Is ECPG's SET CONNECTION really not thread-aware?

2017-06-01 Thread Tsunakawa, Takayuki
Hello,

The following page says:

https://www.postgresql.org/docs/devel/static/ecpg-connect.html#ecpg-set-connection

--
EXEC SQL AT connection-name SELECT ...;

If your application uses multiple threads of execution, they cannot share a 
connection concurrently. You must either explicitly control access to the 
connection (using mutexes) or use a connection for each thread. If each thread 
uses its own connection, you will need to use the AT clause to specify which 
connection the thread will use.

EXEC SQL SET CONNECTION connection-name;

...It is not thread-aware.
--


What does this mean by "not thread-aware"?  Does SET CONNECTION in one thread 
change the current connection in another thread?
It does not appear so, because connection management and SQL execution in 
ECPG use thread-specific data (TSD) to get and set the current connection, 
like this:


bool
ECPGsetconn(int lineno, const char *connection_name)
{
struct connection *con = ecpg_get_connection(connection_name);

if (!ecpg_init(con, connection_name, lineno))
return (false);

#ifdef ENABLE_THREAD_SAFETY
pthread_setspecific(actual_connection_key, con);
#else
actual_connection = con;
#endif
return true;
}
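
So a reasonable reading is that SET CONNECTION only updates the calling
thread's TSD slot. A minimal illustration (hypothetical connection
names, assuming a thread-safe ECPG build):

    EXEC SQL BEGIN DECLARE SECTION;
    int one;
    EXEC SQL END DECLARE SECTION;

    EXEC SQL CONNECT TO testdb AS conn1;
    EXEC SQL CONNECT TO testdb AS conn2;

    /* updates only this thread's current connection */
    EXEC SQL SET CONNECTION conn1;
    EXEC SQL SELECT 1 INTO :one;          /* runs on conn1 */

    /* other threads are unaffected; each can pin a connection with AT */
    EXEC SQL AT conn2 SELECT 1 INTO :one;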

Regards
Takayuki Tsunakawa


