Re: [HACKERS] INSERT ON CONFLICT and partitioned tables
On 2017/12/01 11:01, Amit Langote wrote:
> On 2017/12/01 1:02, Robert Haas wrote:
>> Second, this would be the first place where the second argument to
>> ExecOpenIndices() is passed simply as true.  The only other caller
>> that doesn't pass constant false is in nodeModifyTable.c and looks
>> like this:
>>
>> ExecOpenIndices(resultRelInfo, mtstate->mt_onconflict != ONCONFLICT_NONE);
>>
>> The intention here appears to be to avoid the overhead of doing
>> BuildSpeculativeIndexInfo() when it isn't needed.  I think we should
>> try to do the same thing here.  As written, the patch will do that
>> work even when the query has no ON CONFLICT clause, which doesn't seem
>> like a good idea.
>
> Agreed.  One thing though is that ExecSetupPartitionTupleRouting doesn't
> receive the mtstate today.  I've a patch pending that will re-design that
> interface (for another project) that I will post in the near future, but
> meanwhile let's just add a new parameter mtstate so that this patch can
> move forward with its business.  The future patch I'm talking about will
> keep the parameter intact, so it should be OK now to add the same.
>
> Attached two patches:
>
> 0001: refactoring to add the mtstate parameter
> 0002: the original patch updated to address the above comment

I forgot to consider the fact that mtstate could be NULL in
ExecSetupPartitionTupleRouting(), which would result in dereferencing a
NULL pointer when called from CopyFrom(); that is fixed in the attached
updated patch.

Thanks,
Amit

From 986676c8614d2d5954ea5d505cf729dc6dc487bd Mon Sep 17 00:00:00 2001
From: amit
Date: Fri, 1 Dec 2017 10:44:37 +0900
Subject: [PATCH 1/2] Pass mtstate to ExecSetupPartitionTupleRouting

---
 src/backend/commands/copy.c            | 3 ++-
 src/backend/executor/execPartition.c   | 3 ++-
 src/backend/executor/nodeModifyTable.c | 3 ++-
 src/include/executor/execPartition.h   | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 13eb9e34ba..bace390470 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2478,7 +2478,8 @@ CopyFrom(CopyState cstate)
 		int			num_parted,
 					num_partitions;
 
-		ExecSetupPartitionTupleRouting(cstate->rel,
+		ExecSetupPartitionTupleRouting(NULL,
+									   cstate->rel,
 									   1,
 									   estate,
 									   _dispatch_info,
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 59a0ca4597..f6108a156b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -63,7 +63,8 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  * RowExclusiveLock mode upon return from this function.
  */
 void
-ExecSetupPartitionTupleRouting(Relation rel,
+ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
+							   Relation rel,
 							   Index resultRTindex,
 							   EState *estate,
 							   PartitionDispatch **pd,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1e3ece9b34..afb83ed3ae 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1953,7 +1953,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		int			num_parted,
 					num_partitions;
 
-		ExecSetupPartitionTupleRouting(rel,
+		ExecSetupPartitionTupleRouting(mtstate,
+									   rel,
 									   node->nominalRelation,
 									   estate,
 									   _dispatch_info,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 43ca9908aa..86a199d169 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -49,7 +49,8 @@ typedef struct PartitionDispatchData
 typedef struct PartitionDispatchData *PartitionDispatch;
 
-extern void ExecSetupPartitionTupleRouting(Relation rel,
+extern void ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
+										   Relation rel,
Re: Commit fest 2017-11
Hello Michaël,

> And the last 21 patches have been classified as well. Here is the
> final score for this time:
> Committed: 55.
> Moved to next CF: 103.
> Rejected: 1.
> Returned with Feedback: 47.
> Total: 206.
> Thanks to all the contributors for this session! The CF is now closed.

Thanks for the CF management.

Attached is a small graph of the status of patches at the end of each CF.

There is a significant "moved to next CF" set of patches, which includes:
 - patches that were ready but did not get a committer (about one per
   committer...),
 - patches that were still waiting for a review (a lot...),
 - possibly a few that were in "waiting on author" state, although they
   tend to be switched to "returned with feedback".

There is a rolling chunk of 50% of patches which moves from CF to CF.
Not enough review(er)s, obviously. Also, not enough committers: as a
reviewer, if the patches I review do not get committed once ready,
spending time on reviews becomes quite unattractive.

-- 
Fabien.
Re: [HACKERS] pgbench - allow to store select results into variables
Hello Michaël,

>> Here is a v14, after yet another rebase, and some comments added to
>> answer your new comments.
>
> The last patch sent still applies, but its status of "ready for
> committer" does not look adapted as this version got no reviews. So I
> am switching back the patch as "needs review". Feel free to change if
> that's not adapted.

Indeed, I think that is not needed, as the edit was beyond mundane
(comments lightly edited, one const added after a wind of const got
committed, patch context diff changes probably after a pgindent run that
triggered the need for a rebase), which is why I did not update the
status for this version. So I put it back to "ready".

> The patch is moved as well to next CF.

-- 
Fabien.
Re: Commit fest 2017-11
On Thu, Nov 30, 2017 at 1:51 PM, Michael Paquier wrote:
> Remains 22 patches as of now, exactly *one* for each committer.

And the last 21 patches have been classified as well. Here is the final
score for this time:
Committed: 55.
Moved to next CF: 103.
Rejected: 1.
Returned with Feedback: 47.
Total: 206.
Thanks to all the contributors for this session! The CF is now closed.

(I know that I am a couple of hours ahead, doing things now fits better
with my schedule.)
-- 
Michael
Re: [HACKERS] pg_basebackup --progress output for batch execution
On Tue, Nov 21, 2017 at 8:24 PM, Martín Marqués wrote:
> Thank you very much for reviewing the patch and for your valuable
> input (you made me read the Microsoft Visual C specs ;))

+   if (!isatty(fileno(stderr)))
+       fprintf(stderr, "\n");
+   else
+       fprintf(stderr, "\r");

Er, why is that not "\r\n"?

I have moved this patch to next CF. Congrats for being the last one.
-- 
Michael
Re: [HACKERS] [PATCH] Improve geometric types
On Thu, Nov 30, 2017 at 3:28 AM, Emre Hasegeli wrote:
> It would at least be dump-and-restore hazard if we don't let them in.
> The new version allows NaNs.
>
>> This gives a wrong result for NaN-containing objects.
>
> I removed the NaN aware comparisons from FP macros, and carefully
> reviewed the places that needs to be NaN aware.
>
> I am sorry that it took so long for me to post the new versions. The
> more I get into this the more problems I find. The new versions
> include non-trivial changes. I would be glad if you can look into
> them.

I have moved this patch to next CF, with "needs review" as status as a
new version has been posted.
-- 
Michael
Re: [HACKERS] postgres_fdw super user checks
On Thu, Nov 30, 2017 at 12:15 PM, Ashutosh Bapat wrote:
> On Wed, Nov 29, 2017 at 7:42 PM, Stephen Frost wrote:
>> Ashutosh,
>>
>> * Ashutosh Bapat (ashutosh.ba...@enterprisedb.com) wrote:
>>> On Wed, Nov 29, 2017 at 4:56 AM, Stephen Frost wrote:
>>> > The "global rethink" being contemplated seems to be more about
>>> > authentication forwarding than it is about this specific change. If
>>> > there's some 'global rethink' which is actually applicable to this
>>> > specific deviation from the usual "use the view's owner for privilege
>>> > checks", then it's unclear to me what that is.
>>>
>>> Global rethink may constitute other authentication methods like
>>> certificate based authentication. But I am not clear about global
>>> rethink in the context of owner privileges problem being discussed
>>> here.
>>
>> Right, I'm all for an independent discussion about how we can have
>> same-cluster or cross-cluster trust relationships set up to make it
>> easier for users in one cluster/database to access tables in another
>> that they should be allowed to, but that's a different topic from this.
>>
>> In other words, I think we're agreeing here. :)
>
> Yes.

I am moving this patch to next CF 2018-01.
-- 
Michael
Re: Transform for pl/perl
On Tue, Nov 28, 2017 at 5:14 PM, Aleksander Alekseev wrote:
> The new status of this patch is: Ready for Committer

Patch moved to CF 2018-01. Perhaps a committer will look at it at some
point.
-- 
Michael
Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files
On Tue, Oct 3, 2017 at 1:20 AM, chenhj wrote:
> I had filled the authors field of this patch in commitfest, and will rebase
> this patch if needed. Thank you for your help!

The documentation of the patch needs a rebase, so I am moving it to next
CF with "waiting on author" as status.

$ git diff master --check
src/bin/pg_rewind/pg_rewind.c:292: trailing whitespace.
+*
There are whitespace complaints.
-- 
Michael
Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
On Tue, Nov 28, 2017 at 4:10 AM, Jesper Pedersen wrote:
> On 11/27/2017 07:41 AM, Юрий Соколов wrote:
>> Oh... there was a stupid error in the previous file.
>> Attached fixed version.
>
> I can reconfirm my performance findings with this patch; system same as
> up-thread.

I have moved this patch to next CF, with "needs review" as status as a
new patch has been sent after addressing Andres' comments.
-- 
Michael
Re: [HACKERS] GUC for cleanup indexes threshold.
On Thu, Nov 30, 2017 at 12:06 AM, Masahiko Sawada wrote:
> On Wed, Nov 29, 2017 at 11:05 PM, Simon Riggs wrote:
>> Unless there is disagreement on the above, it seems we should apply
>> Yura's patch (an edited version, perhaps).
>
> IIRC the patches that make the cleanup scan skip have a problem
> pointed out by Peter [1], namely that we stash an XID when a btree page
> is deleted, which is used to determine when it's finally safe to recycle
> the page. Yura's patch doesn't have that problem?
>
> [1] https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com

The latest patch present on this thread does not apply
(https://www.postgresql.org/message-id/c4a47caef28024ab7528946bed264...@postgrespro.ru).
Please send a rebase. For now I am moving the patch to next CF, with
"waiting on author".
-- 
Michael
Re: Use of uninitialized variables in ExecFindPartition() for parent partition without leaves (HEAD only)
On 2017/12/01 11:48, Michael Paquier wrote:
> On Thu, Nov 30, 2017 at 10:17 AM, Amit Langote wrote:
>> Oops, I messed up taking the diff and mistakenly added noise to the patch.
>
> Which is that bit:
> - * BuildSlotPartitionKeyDescription
> + * ExecBuildSlotPartitionKeyDescription

Yep.

>> Fixed in the attached.
>
> For information, it is easy enough to run into rather nasty behaviors
> here. Taking for example the test case in the last patch, but without
> the actual fix:
> -- no partitions, so fail
> insert into range_parted values ('a', 11);
> ! ERROR: invalid memory alloc request size 2139062147

Yikes.

> So the test case you are adding is a good thing to have, and I am fine
> with the comment.

Thanks.

> I have added a CF entry for this thread by the way
> (https://commitfest.postgresql.org/16/1392/), and marked the thing as
> ready for committer as we agree about the fix. Let's track properly
> this issue until it gets committed.

Yeah, thanks.

Regards,
Amit
Re: [HACKERS] INSERT ON CONFLICT and partitioned tables
On 2017/12/01 11:27, Simon Riggs wrote:
> On 24 November 2017 at 13:45, Amit Langote wrote:
>
>>> Why? There is no caller that needs information.
>>
>> It is to be used if and when ExecInsert() calls
>> ExecCheckIndexConstraints() in the code path to handle ON CONFLICT DO
>> NOTHING that we're intending to support in some cases. Note that it will
>> only check conflicts for the individual leaf partitions using whatever
>> constraint-enforcing indexes they might have.
>
> So we should have 2 patches. One for now that does DO NOTHING and
> another that adds the change that depends upon Alvaro's work.

Yes, I'd think so.

It won't be until the patch at [1] to add support for defining UNIQUE
indexes on partitioned tables is committed that we could support
specifying conflict_target on partitioned tables. Even then, it would
take at least some executor changes to actually make it work, but at
least the planner won't complain that there are no indexes.

If I try Alvaro's patch today and see what happens when conflict_target
is specified, I still get an error, but now it's from the executor:

create table p (a int, b char) partition by list (a);
create table p1 partition of p (b unique) for values in (1);

-- on HEAD (before applying the patch on this thread)
insert into p values (1) on conflict (a) do nothing;
ERROR:  ON CONFLICT clause is not supported with partitioned tables

-- after applying the patch on this thread (no indexes yet)
insert into p values (1) on conflict (a) do nothing;
ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT specification

-- after applying Alvaro's patch at [1]
create unique index on p (a);

-- but, the executor support is missing, so...
insert into p values (1) on conflict (a) do nothing;
ERROR:  unexpected failure to find arbiter index

I will report this on that thread and we can discuss the executor
changes that would be needed to make it work there.

Thanks,
Amit

[1] https://commitfest.postgresql.org/16/1365/
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 1 December 2017 at 01:02, Alvaro Herrera wrote:
> we currently (I mean after my
> patch) don't mark partitioned-table-level indexes as valid or not valid
> depending on whether all its children exist, so trying to use that in
> the planner without having a flag could cause invalid plans to be
> generated (i.e. ones that would cause nonexistent indexes to be
> referenced).

So, then, this patch is only really intended as a syntax shortcut for
mass adding of indexes to each partition?

I feel like we could do better here with little extra effort. The
DETACH index feature does not really seem required for this patch. I
think just ensuring a matching index exists on each leaf partition, and
creating any which don't exist before creating the index on the target
partitioned table, is the correct solution. That way we can make the
index's indisvalid flag have a valid meaning all the time.

Some later patch can invent some way to replace a bloated index.
Perhaps later we can invent some generic way to replace a physical leaf
index for a given partitioned index, perhaps with the same patch that
might allow us to replace an index which is used by a constraint, which
to me seems like a feature we should have had years ago.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: Doc tweak for huge_pages?
On Fri, Dec 1, 2017 at 5:04 PM, Justin Pryzby wrote:
> On Fri, Dec 01, 2017 at 04:01:24PM +1300, Thomas Munro wrote:
>> Hi hackers,
>>
>> The manual implies that only Linux can use huge pages. That is not
>> true: FreeBSD, Illumos and probably others support larger page sizes
>> using transparent page coalescing algorithms. On my FreeBSD box
>> procstat -v often shows PostgreSQL shared buffers in "S"-flagged
>> memory. I think we should adjust the manual to make clear that it's
>> the *explicit request for huge pages* that is supported only on Linux
>> (and hopefully soon Windows). Am I being too pedantic?
>
> I suggest to remove "other" and include Linux in the enumeration, since it
> also supports "transparent" hugepages.

Hmm. Yeah, it does, but apparently it's not so transparent. So if we
mention that, we'd better indicate in the same paragraph that you
probably don't actually want to use it. How about the attached?

-- 
Thomas Munro
http://www.enterprisedb.com

huge-pages-doc-tweak-v2.patch
Description: Binary data
Re: Protect syscache from bloating with negative cache entries
Robert Haas writes:
> On Wed, Nov 29, 2017 at 11:17 PM, Tom Lane wrote:
>> The thing that makes me uncomfortable about this is that we used to have a
>> catcache size limitation mechanism, and ripped it out because it had too
>> much overhead (see commit 8b9bc234a). I'm not sure how we can avoid that
>> problem within a fresh implementation.

> At the risk of beating a dead horse, I still think that the amount of
> wall clock time that has elapsed since an entry was last accessed is
> very relevant.

While I don't object to that statement, I'm not sure how it helps us
here. If we couldn't afford DLMoveToFront(), doing a gettimeofday()
during each syscache access is surely right out.

			regards, tom lane
Re: [HACKERS] Additional logging for VACUUM and ANALYZE
On Wed, Nov 8, 2017 at 12:54 AM, Fabrízio de Royes Mello wrote:
> Great. Changed status to ready for commiter.

The patch still applies, but no committer seems to be interested in the
topic, so moved to next CF.
-- 
Michael
Re: [HACKERS] Supporting huge pages on Windows
On Fri, Nov 24, 2017 at 9:28 AM, Tsunakawa, Takayuki wrote:
> From: Thomas Munro [mailto:thomas.mu...@enterprisedb.com]
>> I hope Tsunakawa-san doesn't mind me posting another rebased version of
>> his patch. The last version conflicted with the change from SGML to XML
>> that just landed in commit 3c49c6fa.
>
> Thank you very much for keeping it fresh. I hope the committer could have
> some time.

I have moved this patch to next CF. Magnus, you are registered as a
reviewer of this patch. Are you planning to look at it and potentially
commit it?
-- 
Michael
Doc tweak for huge_pages?
Hi hackers,

The manual implies that only Linux can use huge pages. That is not
true: FreeBSD, Illumos and probably others support larger page sizes
using transparent page coalescing algorithms. On my FreeBSD box
procstat -v often shows PostgreSQL shared buffers in "S"-flagged
memory. I think we should adjust the manual to make clear that it's
the *explicit request for huge pages* that is supported only on Linux
(and hopefully soon Windows). Am I being too pedantic?

-- 
Thomas Munro
http://www.enterprisedb.com

huge-pages-doc-tweak.patch
Description: Binary data
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
On Wed, Nov 29, 2017 at 5:50 AM, Robert Haas wrote:
> On Mon, Aug 7, 2017 at 11:14 AM, Fabrízio de Royes Mello wrote:
>> I also test against all current supported versions (9.2 ... 9.6) and didn't
>> find any issue.
>>
>> Changed status to "ready for commiter".
>
> On a very fast read this patch looks OK to me, but I'm a bit concerned
> about whether we have consensus for it. By my count the vote is 6-3
> in favor of proceeding.
>
> +1: Robins Tharakan, Stephen Frost, David Fetter, Fabrizio Mello,
> Michael Paquier, Robert Haas
> -1: David G. Johnston, Tom Lane, Simon Riggs
>
> I guess that's probably sufficient to go forward, but does anyone wish
> to dispute my characterization of their vote or the vote tally in
> general? Would anyone else like to cast a vote?

As this is not settled yet, I have moved the patch to next CF.
-- 
Michael
Re: [HACKERS] pgbench more operators & functions
On Wed, Oct 25, 2017 at 5:57 PM, Pavel Stehule wrote:
> 2017-10-20 18:36 GMT+02:00 Fabien COELHO:
>> Here is a v13. No code changes, but TAP tests added to maintain pgbench
>> coverage to green.
>>
>> Here is a v14, which is just a rebase after the documentation xml-ization.
>
> all tests passed
> no problems with doc building

Moved to next CF. This is still parked as ready for committer.
-- 
Michael
Re: [HACKERS] pgbench - allow to store select results into variables
On Sat, Nov 4, 2017 at 8:05 PM, Fabien COELHO wrote:
>>> Here is a v13, which is just a rebase after the documentation
>>> xml-ization.
>
> Here is a v14, after yet another rebase, and some comments added to answer
> your new comments.

The last patch sent still applies, but its status of "ready for
committer" does not look adapted as this version got no reviews. So I am
switching back the patch as "needs review". Feel free to change if
that's not adapted. The patch is moved as well to next CF.
-- 
Michael
Re: [HACKERS] pow support for pgbench
On Tue, Nov 7, 2017 at 1:34 AM, Fabien COELHO wrote:
> Let's now hope that a committer gets around to considering these patches
> some day.

Which is not the case yet, so moved to CF 2018-01. Please note that the
patch proposed does not apply anymore, so its status is changed to
"waiting on author" for a rebase.
-- 
Michael
Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.
On Wed, Nov 29, 2017 at 7:36 AM, Michael Paquier wrote:
> On Wed, Nov 29, 2017 at 6:44 AM, Robert Haas wrote:
>> On Tue, Nov 21, 2017 at 4:32 AM, Masahiko Sawada wrote:
>>> Thank you for comments. Attached updated patch.
>>
>> I see that Michael has marked this Ready for Committer, but also that
>> he didn't update the thread, so perhaps some interested committer
>> (Fujii Masao?) might have missed the fact that Michael thinks it's
>> ready to go.
>
> Sorry for not mentioning that directly on the thread.
>
>> Anyone interested in taking a look at this one for commit?
>
> I would assume that Fujii-san would be the chosen one here as he has
> worked already on the thread.

No replies yet. So I am moving this patch to next CF.
-- 
Michael
Re: [HACKERS] Commits don't block for synchronous replication
On Fri, Nov 24, 2017 at 11:38 PM, Simon Riggs wrote:
> On 23 November 2017 at 11:11, Michael Paquier wrote:
>
>> This is older than the bug report of this thread. All those
>> indications point out that the patch has *not* been committed. So it
>> seems to me that you perhaps committed it to your local repository,
>> but forgot to push it to the remote. I am switching back the patch
>> status to what looks correct to me "Ready for committer". Thanks.
>
> Yes, that looks like it's my mistake. Thanks for rechecking.
>
> Will commit and backpatch when I get home.

Ping. Simon, are you planning to look at this patch, and potentially
commit it? I am moving this item to next CF for now.
-- 
Michael
Re: Use of uninitialized variables in ExecFindPartition() for parent partition without leaves (HEAD only)
On Thu, Nov 30, 2017 at 10:17 AM, Amit Langote wrote:
> Oops, I messed up taking the diff and mistakenly added noise to the patch.

Which is that bit:
- * BuildSlotPartitionKeyDescription
+ * ExecBuildSlotPartitionKeyDescription

> Fixed in the attached.

For information, it is easy enough to run into rather nasty behaviors
here. Taking for example the test case in the last patch, but without
the actual fix:
-- no partitions, so fail
insert into range_parted values ('a', 11);
! ERROR: invalid memory alloc request size 2139062147

So the test case you are adding is a good thing to have, and I am fine
with the comment. I have added a CF entry for this thread by the way
(https://commitfest.postgresql.org/16/1392/), and marked the thing as
ready for committer as we agree about the fix. Let's track properly
this issue until it gets committed.
-- 
Michael
Re: [HACKERS] INSERT ON CONFLICT and partitioned tables
On 24 November 2017 at 13:45, Amit Langote wrote:
>> Why? There is no caller that needs information.
>
> It is to be used if and when ExecInsert() calls
> ExecCheckIndexConstraints() in the code path to handle ON CONFLICT DO
> NOTHING that we're intending to support in some cases. Note that it will
> only check conflicts for the individual leaf partitions using whatever
> constraint-enforcing indexes they might have.

So we should have 2 patches. One for now that does DO NOTHING and
another that adds the change that depends upon Alvaro's work.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256
On Wed, Nov 29, 2017 at 7:42 AM, Peter Eisentraut wrote:
> On 11/28/17 17:33, Michael Paquier wrote:
>> 1) Have a special value in the parameter saslchannelbinding proposed
>> in patch 0001. For example by specifying "none" then no channel
>> binding is used.
>
> I was thinking if it's empty then don't use channel binding. Right now,
> empty means the same thing as tls-unique. In any case, some variant of
> that should be fine. I don't think we need a separate server option
> at this point.

OK, here is a reworked version with the following changes:
- renamed saslchannelbinding to scramchannelbinding, with a default set
  to tls-unique.
- An empty value of scramchannelbinding allows the client to not use
  channel binding, or in short to use SCRAM-SHA-256 with cbind-flag set
  to 'n'.

While reviewing the code, I have found something a bit disturbing with
the header definitions: the libpq frontend code includes scram.h, which
references backend-side routines. So I think that the definition of the
SCRAM mechanisms as well as the channel binding types should be moved to
scram-common.h. This cleanup is included in 0001.
-- 
Michael

0001-Move-SCRAM-related-name-definitions-to-scram-common..patch
Description: Binary data

0002-Add-connection-parameter-scramchannelbinding.patch
Description: Binary data

0003-Implement-channel-binding-tls-server-end-point-for-S.patch
Description: Binary data
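For illustration, here is a minimal sketch of how the proposed connection
parameter would be used from libpq (scramchannelbinding is the parameter
name proposed by the patch above, not a released libpq option; the host
and other connection details are made up):

/* hypothetical usage of the patch's scramchannelbinding parameter */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* default is tls-unique; an empty value would disable channel binding */
	PGconn *conn = PQconnectdb("host=example.org dbname=postgres "
							   "sslmode=require scramchannelbinding=tls-unique");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

	PQfinish(conn);
	return 0;
}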
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Fri, Dec 1, 2017 at 3:04 AM, Robert Haas wrote:
> On Thu, Nov 30, 2017 at 6:20 AM, Masahiko Sawada wrote:
>>> This code ignores the existence of multiple databases; RELEXTLOCK
>>> contains a relid, but no database OID. That's easy enough to fix, but
>>> it actually causes no problem unless, by bad luck, you have two
>>> relations with the same OID in different databases that are both being
>>> rapidly extended at the same time -- and even then, it's only a
>>> performance problem, not a correctness problem. In fact, I wonder if
>>> we shouldn't go further: instead of creating these RELEXTLOCK
>>> structures dynamically, let's just have a fixed number of them, say
>>> 1024. When we get a request to take a lock, hash the (dboid, reloid)
>>> pair and take the result modulo 1024; lock the RELEXTLOCK at that
>>> offset in the array.
>>
>> Attached the latest patch incorporated comments except for the fix of
>> the memory size for relext lock.
>
> It doesn't do anything about the comment of mine quoted above.

Sorry, I'd missed the comment.

> Since it's only possible to hold one relation extension lock at a time, we
> don't really need the hash table here at all. We can just have an
> array of 1024 or so locks and map every (database, relation) pair onto one
> of them by hashing. The worst thing we'll get is some false contention,
> but that doesn't seem awful, and it would permit considerable further
> simplification of this code -- and maybe make it faster in the
> process, because we'd no longer need the hash table, or the pin count,
> or the extra LWLocks that protect the hash table. All we would have
> is atomic operations manipulating the lock state, which seems like it
> would be quite a lot faster and simpler.

Agreed. With this change, we will have an array of structs that each
hold the lock state and a condition variable. The lock state includes
the wait count as well as the status of the lock.

> BTW, I think RelExtLockReleaseAll is broken because it shouldn't
> HOLD_INTERRUPTS(); I also think it's kind of silly to loop here when
> we know we can only hold one lock. Maybe RelExtLockRelease can take
> bool force and do if (force) held_relextlock.nLocks = 0; else
> held_relextlock.nLocks--. Or, better yet, have the caller adjust that
> value and then only call RelExtLockRelease() if we needed to release
> the lock in shared memory. That avoids needless branching.

Agreed. I'd vote for the latter.

> On a
> related note, is there any point in having both held_relextlock.nLocks
> and num_held_relextlocks?

num_held_relextlocks is actually unnecessary; it will be removed.

> I think RelationExtensionLock should be a new type of IPC wait event,
> rather than a whole new category.

Hmm, I thought the IPC wait event types are for events where a process
communicates with other processes for the same purpose, for example
parallel query, synchronous replication etc. On the other hand, the
relation extension locks are one kind of lock mechanism. That's why I
added a new category. But maybe it could fit into the IPC wait event.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
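For illustration, here is a minimal standalone sketch (not PostgreSQL
code; all names are hypothetical) of the fixed-array idea described
above: hash the (database OID, relation OID) pair onto one of a fixed
number of extension-lock slots, accepting occasional false sharing
instead of managing dynamic hash-table entries.

#include <stdint.h>

#define N_RELEXTLOCK_ENTS 1024          /* fixed number of lock slots */

typedef struct RelExtLockTag
{
	uint32_t	dbOid;                  /* database OID */
	uint32_t	relOid;                 /* relation OID */
} RelExtLockTag;

/* Map a lock tag onto a slot index; collisions only cause false contention. */
static uint32_t
relextlock_slot(RelExtLockTag tag)
{
	uint64_t	h = ((uint64_t) tag.dbOid << 32) | tag.relOid;

	/* simple 64-bit mixing hash; any decent hash of the pair works */
	h ^= h >> 33;
	h *= UINT64_C(0xff51afd7ed558ccd);
	h ^= h >> 33;

	return (uint32_t) (h % N_RELEXTLOCK_ENTS);
}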
Re: [HACKERS] Refactoring identifier checks to consistently use strcmp
On Fri, Dec 1, 2017 at 4:14 AM, Robert Haas wrote:
> I think the changes in DefineView and ATExecSetRelOptions are wrong,
> because transformRelOptions() is still using pg_strcasecmp. With the
> patch:
>
> rhaas=# create view v(x) with ("Check_option"="local") as select 1;
> CREATE VIEW
> rhaas=# create view v(x) with ("check_option"="local") as select 1;
> ERROR: WITH CHECK OPTION is supported only on automatically updatable views
> HINT: Views that do not select from a single table or view are not
> automatically updatable.
>
> Oops.

I am getting the feeling that there could be other issues than this
one... If this patch ever gets integrated, I think that it should
actually be shaped as two patches:
- One patch with the set of DDL queries taking advantage of the current
  grammar with pg_strcasecmp.
- A second patch that does the switch from pg_strcasecmp to strcmp, and
  allows checking which paths of patch 1 are getting changed.
Patch 1 is the hard part: figuring out all the possible patterns that
could be changed.

> I am actually pretty dubious that we want to do this. I found one bug
> already (see above), and I don't think there's any guarantee that it's
> the only one, because it's pretty hard to be sure that none of the
> remaining calls to pg_strcasecmp are being applied to any of these
> values in some other part of the code. I'm not sure that the backward
> compatibility issue is a huge deal, but from my point of view this
> carries a significant risk of introducing new bugs, might annoy users
> who spell any of these keywords in all caps with surrounding quotation
> marks, and really has no clear benefit that I can see. The
> originally-offered justification is that making this consistent would
> help us avoid subtle bugs, but it seems at least as likely to CREATE
> subtle bugs, and the status quo is AFAICT harming nobody.

Changing opinion here ;) Yes, I agree that the risk of bugs may not be
worth the compatibility effort at the end. I still see value in this
patch for long-term purposes by making the code more consistent though.
-- 
Michael
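For illustration, a small standalone C sketch of the behavior difference
behind the bug above (this is not PostgreSQL code; it uses POSIX
strcasecmp, where the backend's portable equivalent is pg_strcasecmp): a
quoted, mixed-case option name matches under case-insensitive comparison
but not under strcmp.

#include <stdio.h>
#include <string.h>
#include <strings.h>        /* strcasecmp() */

int
main(void)
{
	const char *given = "Check_option";     /* as written: WITH ("Check_option"=...) */
	const char *option = "check_option";    /* option name the code looks for */

	printf("strcasecmp: %s\n", strcasecmp(given, option) == 0 ? "match" : "no match");
	printf("strcmp:     %s\n", strcmp(given, option) == 0 ? "match" : "no match");
	return 0;
}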
Re: Combine function returning NULL unhandled?
On 24 November 2017 at 14:20, Andres Freund wrote:
> Pushed a fix to the relevant branches, including tests of the
> trans/combine functions returns NULL cases.

Apologies for my silence here. I've been on leave and out of internet
range for two weeks.

Thank you for making the fix.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
Alexey Kondratov wrote:

> I have rebased my branch and squashed all commits into one, since the
> patch is quite small. Everything seems to be working fine now, patch
> passes all regression tests.

On a *very* quick look, please use an enum to return from NextCopyFrom
rather than 'int'. The chunks that change bool to int are very
odd-looking. This would move the comment that explains the value from
copy.c to copy.h, obviously. Also, you seem to be using non-ASCII dashes
in the descriptions of those values; please don't. But those are really
just minor nitpicks while paging through.

After looking at the way you're handling errors, it seems that you've
chosen to go precisely the way people were saying was not admissible:
use a PG_TRY block, and when things go south, simply log a WARNING and
move on. This is unsafe. For example: what happens if the error being
raised is out-of-memory? Failing to re-throw means you don't release the
current transaction memory context. What if the error is that WAL cannot
be written because the WAL disk ran out of space? This would just be
reported as a warning over and over until the server log disk is also
full. What if your insertion fired a trigger that caused an update which
got a serializability exception? You cannot just move on, because at
that point session state (serializability session state) might be busted
until you abort the current transaction.

Or maybe I misunderstood the patch completely.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
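For readers following along, here is a rough sketch (an assumed shape,
not the actual patch) of the per-row error-trapping pattern being
criticized above, as it would sit inside the COPY insert loop: trap the
error, turn it into a WARNING, and keep going with the next row. The
comments note why doing this without re-throwing is unsafe.

	/* oldcontext is the per-row context saved before PG_TRY() */
	PG_TRY();
	{
		/* ... form the tuple and insert it ... */
	}
	PG_CATCH();
	{
		ErrorData  *edata;

		/* switch back so CopyErrorData() doesn't allocate in ErrorContext */
		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();
		FlushErrorState();

		ereport(WARNING,
				(errmsg("skipping row: %s", edata->message)));
		FreeErrorData(edata);

		/*
		 * ...and move on.  As explained above, this is unsafe when the
		 * error was out-of-memory, a WAL write failure, a serialization
		 * failure, etc., because transaction/session state is not
		 * cleaned up as it would be by aborting the transaction.
		 */
	}
	PG_END_TRY();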
Re: Commit fest 2017-11
On Fri, Dec 1, 2017 at 12:44 AM, Andreas Karlsson wrote:
> On 11/30/2017 05:51 AM, Michael Paquier wrote:
>> One thing that could be improved in my
>> opinion is that patch authors should try more to move a patch to a
>> following commit fest once the end gets close... This would leverage
>> slightly the load of work for the CFM.
>
> Thanks for the suggestion. I had no idea that I could do that.

As a patch author, you have the right to decide what you are yourself
planning to do with your own baby :) When the commit fest comes close to
the end, I always try to deal with my own patches first. If the
discussion has stalled, or if based on the feedback a given thing is
going to require more effort than I initially planned so that it is not
possible to spend more time on it in the near future, I mark my own
entries as returned with feedback. If I am still planning to work on
something in the near future, then I move them to the next CF. This
means less work for the CFM, who has fewer patches to deal with at the
end.

Mentioning what you do on the patch thread with the so-said CF entry is
also worth doing in my opinion, to keep a track of what you did. When
multiple authors and/or reviewers are involved, it is sometimes good to
post your intention some time before doing it.
-- 
Michael
Re: [HACKERS] Custom compression methods
Tomas Vondra wrote:
> On 11/30/2017 04:20 PM, Ildus Kurbangaliev wrote:

> > CREATE COMPRESSION METHOD ts1 FOR tsvector HANDLER
> > tsvector_compression_handler;

> Understood. Good to know you've considered it, and I agree it doesn't
> need to be there from the start (which makes the patch simpler).

Just passing by, but wouldn't this fit in the ACCESS METHOD group of
commands? So this could be simplified down to

CREATE ACCESS METHOD ts1 TYPE COMPRESSION

We have that for indexes, and there are patches flying around for heap
storage, sequences, etc. I think that's simpler than trying to invent
all-new commands here. Then (in a future patch) you can use ALTER TYPE
to define compression for that type, or even add a column-level option
to reference a specific compression method.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: using index or check in ALTER TABLE SET NOT NULL
Thank you for the review! My apologies for my errors; it seems I did not
read the developer wiki pages carefully enough. I will reread the wiki
and the code style guidelines, and then update the patch with all your
remarks.

> The comment /* T if we added new NOT NULL constraints */ should
> probably be changed to /* T if we should recheck NOT NULL constraints
> */ or similar.

Correct. Hm, I will probably also rename this property to
verify_new_notnull for clarity.

> I'd suggest registering this with the next CommitFest.

Already done in https://commitfest.postgresql.org/16/1389/ ; I noticed
this step in the wiki.
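For context, a minimal SQL sketch of the optimization this patch aims
for (a hypothetical example, assuming the patch is applied; table and
constraint names are made up): a validated CHECK constraint proving the
column contains no NULLs lets SET NOT NULL skip the full-table
verification scan.

create table items (id int);
-- an already-validated constraint proves that no NULLs exist...
alter table items add constraint items_id_not_null check (id is not null);
-- ...so, with the patch, this no longer needs to scan the whole table:
alter table items alter column id set not null;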
Re: IndexTupleDSize macro seems redundant
On Thu, Nov 30, 2017 at 1:48 PM, Robert Haas wrote:
> One difference between those two macros is that IndexTupleSize
> forcibly casts the argument to IndexTuple, which means that you don't
> get any type-checking when you use that one. I suggest that in
> addition to removing IndexTupleDSize as proposed, we also remove that
> cast. It seems that the only place which relies on that cast
> currently is btree_xlog_split.
>
> Therefore I propose the attached patch instead.

+1 to both points. I specifically recall being annoyed by both issues,
more than once.

-- 
Peter Geoghegan
Re: [HACKERS] <> join selectivity estimate question
On Fri, Dec 1, 2017 at 4:05 AM, Robert Haas wrote:
> On Wed, Nov 29, 2017 at 11:55 PM, Thomas Munro wrote:
>> Thank you for the original pointer and the commit. Everything here
>> seems to make intuitive sense and the accompanying throw-away tests
>> that I posted above seem to produce sensible results except in some
>> cases that we discussed, so I think this is progress. There is still
>> something pretty funny about the cardinality estimates for TPCH Q21
>> which I haven't grokked though. I suspect it is crafted to look for a
>> technique we don't know (an ancient challenge set by some long retired
>> database gurus back in 1992 that their RDBMSs know how to solve,
>> hopefully not in the manner of a certain car manufacturer's air
>> pollution tests), but I haven't yet obtained enough round tuits to dig
>> further. I will, though.
>
> Hmm, do you have an example of the better but still-funky estimates
> handy? Like an EXPLAIN plan?

Sure. Here's some EXPLAIN ANALYZE output from scale 3 TPCH + a few
indexes[1]. There's a version from HEAD with and without commit
7ca25b7d.

[1] https://github.com/macdice/pg_sisyphus/blob/master/cluster-recipes/make-tpch-cluster.sh

-- 
Thomas Munro
http://www.enterprisedb.com

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=544020.68..544020.69 rows=1 width=34) (actual time=5070.235..5070.253 rows=100 loops=1)
   ->  Sort  (cost=544020.68..544020.69 rows=1 width=34) (actual time=5070.233..5070.241 rows=100 loops=1)
         Sort Key: (count(*)) DESC, supplier.s_name
         Sort Method: top-N heapsort  Memory: 39kB
         ->  GroupAggregate  (cost=544020.65..544020.67 rows=1 width=34) (actual time=5061.289..5068.050 rows=1194 loops=1)
               Group Key: supplier.s_name
               ->  Sort  (cost=544020.65..544020.66 rows=1 width=26) (actual time=5061.275..5063.360 rows=11887 loops=1)
                     Sort Key: supplier.s_name
                     Sort Method: quicksort  Memory: 1406kB
                     ->  Nested Loop  (cost=1752.62..544020.64 rows=1 width=26) (actual time=3.134..4926.365 rows=11887 loops=1)
                           ->  Nested Loop Semi Join  (cost=1752.19..544015.98 rows=1 width=34) (actual time=3.122..4598.302 rows=24138 loops=1)
                                 ->  Gather  (cost=1751.75..544010.27 rows=1 width=34) (actual time=2.909..4354.282 rows=40387 loops=1)
                                       Workers Planned: 2
                                       Workers Launched: 2
                                       ->  Nested Loop Anti Join  (cost=751.75..543010.17 rows=1 width=34) (actual time=4.282..4720.283 rows=13462 loops=3)
                                             ->  Hash Join  (cost=751.32..442412.33 rows=99981 width=34) (actual time=3.911..3554.800 rows=151103 loops=3)
                                                   Hash Cond: (l1.l_suppkey = supplier.s_suppkey)
                                                   ->  Parallel Seq Scan on lineitem l1  (cost=0.00..431288.00 rows=2499520 width=8) (actual time=0.046..2742.912 rows=3792542 loops=3)
                                                         Filter: (l_receiptdate > l_commitdate)
                                                         Rows Removed by Filter: 2206328
                                                   ->  Hash  (cost=736.32..736.32 rows=1200 width=30) (actual time=3.690..3.690 rows=1194 loops=3)
                                                         Buckets: 2048  Batches: 1  Memory Usage: 91kB
                                                         ->  Nested Loop  (cost=25.59..736.32 rows=1200 width=30) (actual time=0.486..3.206 rows=1194 loops=3)
                                                               ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=4) (actual time=0.018..0.038 rows=1 loops=3)
                                                                     Filter: (n_name = 'ALGERIA'::bpchar)
                                                                     Rows Removed by Filter: 24
                                                               ->  Bitmap Heap Scan on supplier  (cost=25.59..723.00 rows=1200 width=34) (actual time=0.460..2.809 rows=1194 loops=3)
                                                                     Recheck Cond: (s_nationkey = nation.n_nationkey)
                                                                     Heap Blocks: exact=564
                                                                     ->  Bitmap Index Scan
Re: [HACKERS] Bug in to_timestamp().
On Thu, Nov 30, 2017 at 10:39:56AM -0500, Robert Haas wrote:
> So is it now the case that all of the regression test cases in this
> patch produce the same results as they would on Oracle?

Yes, exactly. All new cases give the same result as in Oracle, except
for the text of error messages.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [HACKERS] postgres_fdw bug in 9.6
On Wed, Nov 29, 2017 at 5:32 PM, Tom Lane wrote:
> AFAICT, [1] just demonstrates that nobody had thought about the scenario
> up to that point, not that the subsequently chosen solution was a good
> one.

But have a look at this:

http://postgr.es/m/561e12d4.7040...@lab.ntt.co.jp

That shows a case where, before 5fc4c26db5120bd90348b6ee3101fcddfdf54800,
a query that required the foreign table to do an EPQ recheck produced an
unambiguously wrong answer; the query stipulates that inttab.a = ft1.a
but the row returned lacks that property. I think this clearly shows
that EPQ rechecks are needed at least for FDW paths that are
parameterized.

However, I guess that doesn't actually refute your point, which IIUC is
that maybe we don't need these checks for FDW join paths, because we
don't build parameterized paths for those yet. Now I think it would
still be a good idea to have a way of handling that case, because
somebody's likely going to want to fix that /* XXX Consider parameterized
paths for the join relation */ eventually, but I guess for the
back-branches just setting epq_path = NULL instead of calling
GetExistingLocalJoinPath() might be the way to go... assuming that your
contention that this won't break anything is indeed correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Custom compression methods
On 11/30/2017 04:20 PM, Ildus Kurbangaliev wrote:
> On Thu, 30 Nov 2017 00:30:37 +0100
> Tomas Vondra wrote:
>
> ...
>
>> I can imagine other interesting use cases - for example values in
>> JSONB columns often use the same "schema" (keys, nesting, ...), so
>> can I imagine building a "dictionary" of JSON keys for the whole
>> column ...
>>
>> Ildus, is this a use case you've been aiming for, or were you aiming
>> to use the new API in a different way?
>
> Thank you for such good overview. I agree that pglz is pretty good as
> general compression method, and there's no point to change it, at
> least now.
>
> I see few useful use cases for compression methods, it's special
> compression methods for int[], timestamp[] for time series and yes,
> dictionaries for jsonb, for which I have even already created an
> extension (https://github.com/postgrespro/jsonbd). It's working and
> giving promising results.

I understand the reluctance to put everything into core, particularly
for complex patches that evolve quickly. Also, not having to put
everything into core is kinda why we have extensions. But perhaps some
of the simpler cases would be good candidates for core, making it
possible to test the feature?

>>
>> I wonder if the patch can be improved to handle this use case better.
>> For example, it requires knowledge of the actual data type, instead of
>> treating it as opaque varlena / byte array. I see tsvector compression
>> does that by checking typeid in the handler.
>>
>> But that fails for example with this example
>>
>> db=# create domain x as tsvector;
>> CREATE DOMAIN
>> db=# create table t (a x compressed ts1);
>> ERROR: unexpected type 28198672 for tsvector compression handler
>>
>> which means it's a few bricks shy of properly supporting domains. But I
>> wonder if this should be instead specified in CREATE COMPRESSION
>> METHOD instead. I mean, something like
>>
>> CREATE COMPRESSION METHOD ts1 HANDLER tsvector_compression_handler
>> TYPE tsvector;
>>
>> When no type is specified, it applies to all varlena values. Otherwise
>> only to that type. Also, why not to allow setting the compression as
>> the default method for a data type, e.g.
>>
>> CREATE COMPRESSION METHOD ts1 HANDLER tsvector_compression_handler
>> TYPE tsvector DEFAULT;
>>
>> would automatically add 'COMPRESSED ts1' to all tsvector columns in
>> new CREATE TABLE commands.
>
> Initial version of the patch contains ALTER syntax that change
> compression method for whole types, but I have decided to remove
> that functionality for now because the patch is already quite complex
> and it could be added later as separate patch.
>
> Syntax was:
> ALTER TYPE SET COMPRESSION ;
>
> Specifying the supported type for the compression method is a good idea.
> Maybe the following syntax would be better?
>
> CREATE COMPRESSION METHOD ts1 FOR tsvector HANDLER
> tsvector_compression_handler;

Understood. Good to know you've considered it, and I agree it doesn't
need to be there from the start (which makes the patch simpler).

>>
>> BTW do you expect the tsvector compression to be generally useful, or
>> is it meant to be used only by the tests? If generally useful,
>> perhaps it should be created in pg_compression by default. If only
>> for tests, maybe it should be implemented in an extension in contrib
>> (thus also serving as example how to implement new methods).
>>
>> I haven't thought about the JSONB use case very much, but I suppose
>> that could be done using the configure/drop methods. I mean,
>> allocating the dictionary somewhere (e.g. in a table created by an
>> extension?). The configure method gets the Form_pg_attribute record,
>> so that should be enough I guess.
>>
>> But the patch is not testing those two methods at all, which seems
>> like something that needs to be addressed before commit. I don't
>> expect a full-fledged JSONB compression extension, but something
>> simple that actually exercises those methods in a meaningful way.
>
> I will move tsvector_compression_handler to a separate extension in
> the next version. I added it more as an example, but also it could be
> used to achieve a better compression for tsvectors. Tests on maillists
> database ('archie' tables):
>
> usual compression:
>
> maillists=# select body_tsvector, subject_tsvector into t1 from messages;
> SELECT 1114213
> maillists=# select pg_size_pretty(pg_total_relation_size('t1'));
>  pg_size_pretty
> ----------------
>  1637 MB
> (1 row)
>
> tsvector_compression_handler:
> maillists=# select pg_size_pretty(pg_total_relation_size('t2'));
>  pg_size_pretty
> ----------------
>  1521 MB
> (1 row)
>
> lz4:
> maillists=# select pg_size_pretty(pg_total_relation_size('t3'));
>  pg_size_pretty
> ----------------
>  1487 MB
> (1 row)
>
> I don't stick to tsvector_compression_handler, I think if there
> will some example that can use all the features then
>
Re: [HACKERS] SQL procedures
On 11/29/17 14:17, Andrew Dunstan wrote:
> On 11/28/2017 10:03 AM, Peter Eisentraut wrote:
>> Here is a new patch that addresses the previous review comments.
>>
>> If there are no new comments, I think this might be ready to go.
>
> Looks good to me. Marking ready for committer.

committed

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WIP: Covering + unique indexes.
Hi, Peter!

> On 29 Nov 2017, at 8:45, Peter Geoghegan wrote:
>
> It looks like amcheck needs to be patched -- a simple oversight.
> amcheck is probably calling _bt_compare() without realizing that
> internal pages don't have the extra attributes (just leaf pages,
> although they should also not participate in comparisons in respect of
> included/extra columns). There were changes to amcheck at one point in
> the past. That must have slipped through again. I don't think it's
> that complicated.
>
> BTW, it would probably be a good idea to use the new Github version's
> "heapallindexed" verification [1] for testing this patch. Anastasia
> will need to patch the externally maintained amcheck to do this, but
> it's probably no extra work, because this is already needed for
> contrib/amcheck, and because the heapallindexed check doesn't actually
> care about index structure at all.

It was not a big deal to patch, so I've fixed those bits (see
attachment). I've done only simple tests for now, but I'm planning to do
better testing before the next CF. Thanks for mentioning
"heapallindexed", I'll use it too.

Best regards, Andrey Borodin.

amcheck_include.diff
Description: Binary data
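For readers unfamiliar with it, here is roughly how the "heapallindexed"
verification mentioned above is invoked (per the externally maintained
amcheck version referenced in the quote; the table and index names below
are made up):

create extension amcheck;

-- verify B-Tree invariants and also check that every heap tuple has a
-- corresponding index tuple
select bt_index_check('my_covering_index'::regclass, heapallindexed => true);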
Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE
On Wed, Nov 29, 2017 at 11:35 PM, Michael Paquier wrote:
> On Mon, Nov 27, 2017 at 11:41 AM, Jing Wang wrote:
>> This is a patch for current_database working on ALTER ROLE/GRANT/REVOKE
>> statements which should be applied after the previous patch
>> "comment_on_current_database_no_pgdump_v4.4.patch".
>>
>> By using the patch, CURRENT_DATABASE can work in the following SQL
>> statements:
>>
>> ALTER ROLE ... IN DATABASE CURRENT_DATABASE SET/RESET configuration_parameter
>> GRANT ... ON DATABASE CURRENT_DATABASE TO role_specification ...
>> REVOKE ... ON DATABASE CURRENT_DATABASE FROM ...
>
> Moved to next CF with same status, "needs review".

Patch no longer applies.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
On Tue, Nov 21, 2017 at 4:36 AM, Peter Moser wrote:
> Hi hackers,
> we would like to rethink our approach...
>
> For simplicity I'll drop ALIGN for the moment and focus solely on NORMALIZE:
>
> SELECT * FROM (R NORMALIZE S ON R.x = S.y WITH (R.time, S.time)) c;
>
> Our normalization executor node needs the following input (for now
> expressed in plain SQL):
>
> SELECT R.*, p1
> FROM (SELECT *, row_id() OVER () rn FROM R) R
> LEFT OUTER JOIN (
>   SELECT y, LOWER(time) p1 FROM S
>   UNION
>   SELECT y, UPPER(time) p1 FROM S
> ) S
> ON R.x = S.y AND p1 <@ R.time
> ORDER BY rn, p1;
>
> In other words:
> 1) The left subquery adds a unique ID to each tuple (i.e., rn).
> 2) The right subquery creates two results for each input tuple: one for
>    the upper and one for the lower bound of each input tuple's valid time
>    column. The boundaries get put into a single (scalar) column, namely p1.
> 3) We join both subqueries if the normalization predicates hold (R.x = S.y)
>    and p1 is inside the time of the current outer tuple.
> 4) Finally, we sort the result by the unique ID (rn) and p1, and give all
>    columns of the outer relation, rn and p1 back.

So, if I understand correctly, there are three possible outcomes. If
S.time and R.time are non-overlapping, then a null-extended row is
produced with the original data from R. If S.time overlaps R.time but is
not contained within it, then this produces one result row, not
null-extended -- either the upper or lower bound will be found to be
contained within R.time, but the other will not. If S.time is contained
within R.time, then this produces 2 result rows, one with p1 containing
S.time's lower bound and one with p1 containing S.time's upper bound. Is
that right? Then, after all this happens, the temporal normalizer code
runs over the results and uses the p1 values as split points for the
ranges from R.

I wonder if we could instead think about R NORMALIZE S ON R.x = S.y WITH
(R.time, S.time) as an ordinary join on R.x = S.y with some extra
processing afterwards. After finding all the join partners for a row in
R, extract all the lower and upper bounds from the S.time fields of the
join partners and use that to emit as many rows from R as may be
necessary.

The main problem that I see with that approach is that it seems
desirable to separate the extra-processing-afterwards step into a
separate executor node, and there's no way for a separate type of plan
node to know when the join advances the outer side of the join. That's
the purpose that row_id() is serving as you have it; we'd have to invent
some other kind of signalling mechanism, which might not be entirely
trivial. :-(

If we could do it, though, it might be quite a bit more efficient,
because it would avoid scanning S twice and performing a UNION on the
results of those scans. Also, we wouldn't necessarily need to sort the
whole set of rows from R, which I suspect is unavoidable in your
implementation. We'd just need to sort the individual groups of rows
from S, and my guess is that in many practical cases those groups are
fairly small.

I wonder what identities hold for NORMALIZE. It does not seem to be
commutative, but it looks like it might be associative... i.e. (R
NORMALIZE S) NORMALIZE T produces the same output as R NORMALIZE (S
NORMALIZE T), perhaps?

> Our first attempt to understand the new approach would be as follows: The
> left base rel of the inner left-outer-join can be expressed as a WindowAgg
> node. However, the right query of the join is much more difficult to build
> (maybe through hash aggregates). Both queries could be put together with a
> MergeJoin for instance. However, if we create the plan tree by hand and
> choose algorithms for it manually, how is it possible to have it optimized
> later? Or, if that is not possible, how do we choose the best algorithms
> for it?

You don't want to generate plan nodes directly like this, I think.
Generally, you create paths, and the cheapest path wins at the end of
planning. The thing that makes this tricky is that the temporal
normalization phase can be separated from the inner left-outer-join by
other joins, and the planner doesn't really have a good way of
representing that concept.

More broadly, the very idea of creating plan nodes suggests that you are
approaching this from the point of view of sticking the logic in near
the end of the planning process, which I think is not going to work.
Whatever is going to happen here needs to happen at path generation time
at the latest, or maybe even earlier, before the optimizer even starts
processing the join tree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] create_unique_path and GEQO
On Fri, Mar 24, 2017 at 9:50 AM, Tom Lane wrote:
> Yeah. I think the code in mark_dummy_rel is newer and better-thought-out
> than what's in create_unique_path. It probably makes sense to change over.

I did a bit of archaeology here. create_unique_path() first appears in
commit bdfbfde1b168b3332c4cdac34ac86a80aaf4d442 (vintage 2003), where it
used GetMemoryChunkContext(rel). Commit
f41803bb39bc2949db200116a609fd242d0ec221 (vintage 2007) changed it to
use root->planner_cxt, but neither the comment changes in that patch nor
the commit message give any hint as to the motivation for the change.
The comments do mention that it should be kept in sync with
best_inner_indexscan(), which was also switched to use root->planner_cxt
by that commit, again without any real explanation as to why, and
best_inner_indexscan() continued to use root->planner_cxt until its
demise in commit e2fa76d80ba571d4de8992de6386536867250474 (vintage
2012). Meanwhile, mark_dummy_rel() didn't switch contexts at all until
commit eca75a12a27d28b972fc269c1c8813cd8eb15441 (vintage 2011), at which
point it began using GetMemoryChunkContext(rel).

So, the coding that Ashutosh is proposing here is just changing things
back to the way they were before 2007, but with the better comment that
Tom wrote in 2011 for mark_dummy_rel(). Since the 2007 change lacks any
justification and the new code has one, I'm inclined to agree with Tom
that changing this makes sense. If it turns out to be wrong, then
mark_dummy_rel() also needs fixing, and the comments need to explain
things better. My guess, though, is that it's right, because the
rationale offered in mark_dummy_rel() seems to make a lot of sense.

One reason why we might want to care about this, other than a laudable
consistency, is that at one point Ashutosh had some patches to reduce
the memory usage of partition-wise join that involved doing more
incremental discarding of RelOptInfos akin to what GEQO does today. If
we ever do anything like that, it will be good if we're consistent (and
correct) about which contexts end up containing which data structures.

All of which, I think, is a long-winded way of saying that I'm going to
go commit this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
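For reference, an abridged sketch of the mark_dummy_rel()-style coding
being adopted in create_unique_path() (an illustration of the pattern,
not the exact committed hunk): allocate the path in the same memory
context the RelOptInfo lives in, so it survives a GEQO cycle when the
rel does, and is discarded with the rel when it doesn't.

	MemoryContext oldcontext;
	UniquePath *pathnode;

	/* allocate in the context the given RelOptInfo is in */
	oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));

	pathnode = makeNode(UniquePath);
	/* ... fill in the path fields ... */

	MemoryContextSwitchTo(oldcontext);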
Re: ASCII Null control character validation
Alexey Chernyshov writes:
> I found in src/backend/utils/mb/wchar.c: pg_verify_mbstr_len() that it
> reports ASCII Null character (\000) as invalid. As for me, it should
> pass validation.

This is intentional and we're not going to change it. There is too much
code in the backend that relies on NUL-terminated strings ...

> However, ASCII Null character breaks a line and the
> end of the line is missed, try:
> INSERT INTO mytable VALUES (E'a\001b\000c and rest of line MIA');

... like that for instance. See (many) past discussions of this issue.

			regards, tom lane
Re: [HACKERS] postgres_fdw bug in 9.6
Etsuro Fujita writes:
> (2017/11/30 7:32), Tom Lane wrote:
>> the output of the foreign join cannot change during EPQ, since the remote
>> server already locked the rows before returning them. The only thing that
>> can change is the output of the local scan on public.tab. Yes, we then
>> need to re-verify that foo.a = tab.a ... but in this example, that's the
>> responsibility of the NestLoop plan node, not the foreign join.

> That's right, but is that true when the FDW uses late row locking?

An FDW using late row locking would need to work harder, yes. But that's
true at the scan level as well as the join level. We have already
committed to using early locking in postgres_fdw, for the
network-round-trip-cost reasons I mentioned before, and I can't see why
we'd change that decision at the join level.

Right now we've got the worst of both worlds, in that we're actually
doing early row locking on the remote, but we're paying (some of) the
code complexity costs required for late locking.

			regards, tom lane
Re: [HACKERS] UPDATE of partition key
While addressing Thomas's point about test scenarios not yet covered, I observed the following ... Suppose an UPDATE RLS policy with a WITH CHECK clause is defined on the target table. Now in ExecUpdate(), the corresponding WCO qual gets executed *before* the partition constraint check, as per existing behaviour, and the qual succeeds. Then, because the partition key was updated, the row is moved to another partition. Suppose there is a BR INSERT trigger on that partition which modifies the row, and the resultant row actually would *not* pass the UPDATE RLS policy. But for this partition, since it is an INSERT, only INSERT RLS WCO quals are executed. So effectively, from the user's perspective, an RLS WITH CHECK policy that was defined to reject an updated row lets the update succeed. This is because the policy is not checked *after* a row trigger in the new partition is executed. Attached is a test case that reproduces this issue. I think, in case of row movement, we should defer calling ExecWithCheckOptions() until the row is being inserted using ExecInsert(). And then in ExecInsert(), ExecWithCheckOptions() should be called using WCO_RLS_UPDATE_CHECK rather than WCO_RLS_INSERT_CHECK (I recall Amit Langote was of this opinion), as below:

--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -510,7 +510,9 @@ ExecInsert(ModifyTableState *mtstate,
          * we are looking for at this point.
          */
         if (resultRelInfo->ri_WithCheckOptions != NIL)
-            ExecWithCheckOptions(WCO_RLS_INSERT_CHECK,
+            ExecWithCheckOptions((mtstate->operation == CMD_UPDATE ?
+                                  WCO_RLS_UPDATE_CHECK : WCO_RLS_INSERT_CHECK),
                                  resultRelInfo, slot, estate);

It can be argued that since in the case of triggers we always execute INSERT row triggers for rows inserted as part of update row movement, we should be consistent and execute INSERT WCOs rather than UPDATE WCOs for such rows. But note that the row triggers we execute are defined on the leaf partitions, whereas the RLS policies being executed are defined on the target partitioned table, not the leaf partition. Hence it makes sense to execute them as per the original operation on the target table. This is similar to why we execute UPDATE statement triggers even when the row is eventually inserted into another partition: the UPDATE statement trigger was defined on the target table, not the leaf partition. Barring any objections, I am going to send a revised patch that fixes the above issue as described. Thanks -Amit Khandekar

wco_rls_issue.sql
Re: [HACKERS] postgres_fdw bug in 9.6
(2017/11/30 7:32), Tom Lane wrote: AFAICT, [1] just demonstrates that nobody had thought about the scenario up to that point, not that the subsequently chosen solution was a good one. In that example,

 LockRows  (cost=100.00..101.18 rows=4 width=70)
   Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
   ->  Nested Loop  (cost=100.00..101.14 rows=4 width=70)
         Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
         Join Filter: (foo.a = tab.a)
         ->  Seq Scan on public.tab  (cost=0.00..1.01 rows=1 width=14)
               Output: tab.a, tab.b, tab.ctid
         ->  Foreign Scan  (cost=100.00..100.08 rows=4 width=64)
               Output: foo.*, foo.a, bar.*, bar.a
               Relations: (public.foo) INNER JOIN (public.bar)
               Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT ROW(l.a9), l.a9 FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1, a2) INNER JOIN (SELECT ROW(r.a9), r.a9 FROM (SELECT a a9 FROM public.bar FOR UPDATE) r) r (a1, a2) ON ((l.a2 = r.a2))

the output of the foreign join cannot change during EPQ, since the remote server already locked the rows before returning them. The only thing that can change is the output of the local scan on public.tab. Yes, we then need to re-verify that foo.a = tab.a ... but in this example, that's the responsibility of the NestLoop plan node, not the foreign join. That's right, but is that true when the FDW uses late row locking? Best regards, Etsuro Fujita
Re: [HACKERS] INSERT .. ON CONFLICT DO SELECT [FOR ..]
On Thu, Nov 30, 2017 at 6:39 AM, Peter Geoghegan wrote: > On Wed, Nov 29, 2017 at 8:34 PM, Michael Paquier wrote: > > The patch does not currently apply. I am noticing as well that Peter > > Geoghegan has registered himself as a reviewer a couple of hours back, > > so moved to next CF with waiting on author as status. > Marko unregistered me, so I promptly reregistered. You'll have to ask him > why. Oops. I didn't mean to do what I did, and I apologize. I had my months mixed up and I thought this commit fest had ended without you having looked at the patch, so I assumed you would register in the next CF if you were still interested, or someone else could pick it up. In any case, *I* don't have any interest in pursuing this patch at this point. I think this is a really good feature, and as far as I know the patch is technically solid, or at least was when it still applied. Can someone else take it from here? .m
Re: [HACKERS] Proposal: Local indexes for partitioned table
Michael Paquier wrote: > On Wed, Nov 15, 2017 at 2:49 AM, Alvaro Herrera wrote: > > Here's the remaining bits, rebased. > At least patch 3 has conflicts with HEAD. I am moving this patch to > next CF per a lack of reviews, switching status to waiting on author. Thanks, will submit a new version before the start of the next commitfest. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Proposal: Local indexes for partitioned table
David Rowley wrote: > I wonder if it should be this patch's job to alter the code in > get_relation_info() which causes the indexes not to be loaded for > partitioned tables:
>
> /*
>  * Make list of indexes. Ignore indexes on system catalogs if told to.
>  * Don't bother with indexes for an inheritance parent, either.
>  */
> if (inhparent ||
>     (IgnoreSystemIndexes && IsSystemRelation(relation)))
>     hasindex = false;
> else
>     hasindex = relation->rd_rel->relhasindex;
>
> A partitioned table will always go into the hasindex = false code path. > I'm kind of thinking this patch should change that; even if the patch is > not making use of the indexes, you could argue that something > using set_rel_pathlist_hook might want to do something there, although > there's likely a bunch of counter-arguments too. > What do you think? Great question. So you're thinking that the planner might have an interest in knowing what indexes are defined at the parent table level for planning purposes; but for that to actually have any effect we would need to change the planner and executor also. And one more point, also related to something you said before: we currently (I mean after my patch) don't mark a partitioned-table-level index as valid or not valid depending on whether all of its child indexes exist, so trying to use it in the planner without such a flag could cause invalid plans to be generated (i.e. ones that reference nonexistent indexes). -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Issues with logical replication
On 30 November 2017 at 11:30, Petr Jelinek wrote: > On 30/11/17 00:47, Andres Freund wrote: >> On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote: >>> I don't understand. I mean sure the SnapBuildWaitSnapshot() can live >>> with it, but the problematic logic happens inside the >>> XactLockTableInsert() and SnapBuildWaitSnapshot() has no way of >>> detecting the situation short of reimplementing the >>> XactLockTableInsert() instead of calling it. >> >> Right. But we fairly trivially can change that. I'm remarking on it >> because other people's, not yours, suggestions aimed at making this >> bulletproof. I just wanted to make clear that I don't think that's >> necessary at all. >> > > Okay, then I guess we are in agreement. I can confirm that the attached > fixes the issue in my tests. Using SubTransGetTopmostTransaction() > instead of SubTransGetParent() means 3 more ifs in terms of extra CPU > cost for other callers. I don't think it's worth worrying about given we > are waiting for heavyweight lock, but if we did we can just inline the > code directly into SnapBuildWaitSnapshot(). This will still fail an Assert in TransactionIdIsInProgress() when snapshots are overflowed. -- Simon Riggs  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
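For readers not deep in this code, the functional difference being discussed can be sketched like this (an illustration only, not the attached patch; the wrapper name is invented):

#include "postgres.h"
#include "access/subtrans.h"
#include "storage/lmgr.h"

/*
 * SubTransGetParent() climbs a single level of the subtransaction tree,
 * while SubTransGetTopmostTransaction() resolves all the way to the
 * top-level XID -- the transaction whose commit or abort the waiter
 * actually cares about.
 */
static void
wait_for_toplevel_xact(TransactionId xid)
{
    TransactionId topxid = SubTransGetTopmostTransaction(xid);

    XactLockTableWait(topxid, NULL, NULL, XLTW_None);
}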
Re: [HACKERS] logical decoding of two-phase transactions
Hi, >> So perhaps better approach would be to not return >> HEAPTUPLE_DEAD if the transaction id is newer than the OldestXmin (same >> logic we use for deleted tuples of committed transactions) in the >> HeapTupleSatisfiesVacuum() even for aborted transactions. I also briefly >> checked HOT pruning and AFAICS the normal HOT pruning (the one not >> called by vacuum) also uses the xmin as authoritative even for aborted >> txes so nothing needs to be done there probably. >> >> In case we are worried that this affects cleanups of for example large >> aborted COPY transactions and we think it's worth worrying about then we >> could limit the new OldestXmin based logic only to catalog tuples as >> those are the only ones we need available in decoding. > > Yeah, if it's limited to catalog tuples only then that sounds good. I was > quite concerned about how it'd impact vacuuming otherwise, but if limited to > catalogs about the only impact should be on workloads that create lots of > TEMPORARY tables then ROLLBACK - and not much on those. Based on these discussions, I think there are two separate issues here: 1) Make HeapTupleSatisfiesVacuum() behave differently for recently aborted catalog tuples. 2) Invent a mechanism to stop a specific logical decoding activity in the middle. The reason to stop could be a concurrent abort, a global transaction manager deciding to roll back, or anything else. ISTM that for (2), if (1) is able to leave the recently aborted tuples around for a little while (we only really need them while the decode of the current change record is ongoing), then we could accomplish it via a callback. This callback should be called before commencing decode and network send of each change record. In the case of in-core logical decoding, the callback for pgoutput could check whether the transaction has aborted (a call to TransactionIdDidAbort() or a similar function); additional logic can be added as needed for various scenarios. If it has aborted, we abandon decoding and send an ABORT to the subscribers before returning. Regards, Nikhils
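A minimal sketch of such a callback, with the function name invented for illustration and relying only on the TransactionIdDidAbort() check described above:

#include "postgres.h"
#include "access/transam.h"

/*
 * Hypothetical per-change-record hook: return false if the transaction
 * being decoded has meanwhile rolled back, so the caller can abandon
 * decoding and emit an ABORT downstream instead of the remaining changes.
 */
static bool
continue_decoding_xact(TransactionId xid)
{
    if (TransactionIdDidAbort(xid))
        return false;

    return true;
}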
Re: es_query_dsa is broken
On Thu, Nov 30, 2017 at 4:01 AM, Robert Haas wrote: >> Better ideas? > > How about this: > > 1. Remove es_query_dsa altogether. > 2. Add a dsa_area * to ExecParallelInitializeDSMContext. > 3. In ExecParallelInitializeDSM, pass the dsa_area * as a separate argument to > the per-node-type function. > 4. If the per-node-type function cares about the dsa_area *, it is > responsible for saving a pointer to it in the PlanState node. > 5. Also pass it down via the ParallelWorkerContext. > > In v10 we might need to go with a solution like what you've sketched > here, because Tom will complain about breaking binary compatibility > with EState (and maybe other PlanState nodes) in a released branch. I will post both versions. I've been stuck for a while now trying to come up with a query that actually breaks, so I can show that it's fixed... -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] create_unique_path and GEQO
On Thu, Nov 30, 2017 at 9:00 AM, Ashutosh Bapat wrote: > On Thu, Nov 30, 2017 at 8:50 AM, Michael Paquier wrote: >> On Fri, Mar 24, 2017 at 10:50 PM, Tom Lane wrote: >>> Ashutosh Bapat writes: > Do you have test case, which can reproduce the issue you > explained above? >>> No. It would require some surgery in standard_planner() to measure the memory consumed in the planner context, OR build the code with SHOW_MEMORY_STATS defined, dump memory context statistics, and check planner context memory usage. I don't think I can produce a testcase quickly right now. But then, I think the problem is quite apparent from code inspection alone, esp. comparing what mark_dummy_rel() does with what create_unique_path() is doing. >>> >>> Yeah. I think the code in mark_dummy_rel is newer and better-thought-out >>> than what's in create_unique_path. It probably makes sense to change over. >> >> This has remained unanswered for more than 8 months, so I am marking >> this patch as returned with feedback. > > I am not sure what's there to answer in Tom's reply. He seems to be > agreeing with my analysis. Correct me if I am wrong. If that's the > case, I am expecting somebody to review the patch. If there are no > review comments, some committer may commit it. > For now I have marked it as needing a reviewer and moved it to the next commitfest. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
ASCII Null control character validation
Hello, hackers! I found in src/backend/utils/mb/wchar.c: pg_verify_mbstr_len() that it reports the ASCII Null character (\000) as invalid. In my opinion, it should pass validation. However, an ASCII Null character breaks the line and the rest of the line goes missing; try: INSERT INTO mytable VALUES (E'a\001b\000c and rest of line MIA'); Please find the patch attached. Am I wrong? -- Alexey Chernyshov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

From ee7b505792a4d7630dd5e20aca475825d11301a7 Mon Sep 17 00:00:00 2001
From: Alexey Chernyshov
Date: Wed, 29 Nov 2017 15:35:10 +0300
Subject: [PATCH] Fix 0x00 symbol validation. Earlier it was considered as
 illegal by fast path for ASCII.

---
 src/backend/utils/mb/wchar.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/src/backend/utils/mb/wchar.c b/src/backend/utils/mb/wchar.c
index a5fdda4..62b1a7d 100644
--- a/src/backend/utils/mb/wchar.c
+++ b/src/backend/utils/mb/wchar.c
@@ -1922,18 +1922,12 @@ pg_verify_mbstr_len(int encoding, const char *mbstr, int len, bool noError)
 		int			l;
 
 		/* fast path for ASCII-subset characters */
-		if (!IS_HIGHBIT_SET(*mbstr))
+		if ((*mbstr >= 0x00) && (*mbstr <= 0x7F))
 		{
-			if (*mbstr != '\0')
-			{
-				mb_len++;
-				mbstr++;
-				len--;
-				continue;
-			}
-			if (noError)
-				return -1;
-			report_invalid_encoding(encoding, mbstr, len);
+			mb_len++;
+			mbstr++;
+			len--;
+			continue;
 		}
 
 		l = (*mbverify) ((const unsigned char *) mbstr, len);
-- 
2.7.4