Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2018-11-22 Thread Thomas Munro
On Sat, Nov 17, 2018 at 2:27 PM Thomas Munro
 wrote:
> Thanks for the review.

Pushed.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

2018-11-22 Thread Thomas Munro
On Mon, Nov 19, 2018 at 5:24 PM Kyotaro HORIGUCHI
 wrote:
> At Sat, 17 Nov 2018 11:15:54 -0300, Alvaro Herrera  
> wrote in <20181117141554.4dkx2u4j6md3bqdh@alvherre.pgsql>
> > Is this patch committable now?
>
> I don't think so. We should make a decision on one point.
>
> I was a bit confused (sorry), but IIUIC Haribabu suggested that
> the RBM_ZERO_ON_ERROR case should be counted as a read (not just
> ignored). I'm on board with that.  I think that RBM_ZERO_* cases other
> than ON_ERROR could be counted as a kind of hit, but I don't insist on it.
>
> So, I think we should decide on at least the ON_ERROR case before
> this becomes committable.

Agreed, RBM_ZERO_ON_ERROR needs to be counted as a read.  Here's a new
version like that.
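
For anyone skimming the thread, the counters in question are the ones shown
by EXPLAIN (BUFFERS); the question is whether buffers that are zero-filled
rather than actually read from disk should be counted under "read".  A rough
illustration (table name and numbers are only illustrative):

explain (analyze, buffers) select count(*) from pgbench_accounts;
  ...
  Buffers: shared hit=32 read=1606
  ...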

> The additional counter could be a separate issue. I don't object to adding
> the counter, but I'm not sure what the suitable name would be.

I think that might be interesting information in the future, but let's
consider that for a later patch.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Don-t-count-zero-filled-buffers-as-read-in-EXPLAI-v2.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2018-11-22 Thread John Naylor
On 11/16/18, Amit Kapila  wrote:

I've attached v8, which includes the 2-state map and addresses the points below:

> Some assorted comments:
> 1.
>  
> -Each heap and index relation, except for hash indexes, has a Free Space
> Map
> +Each heap relation, unless it is very small, and each index relation,
> +except for hash indexes, has a Free Space Map
>  (FSM) to keep track of available space in the relation. It's stored
>
> It appears that line has ended abruptly.

Revised.

> 2.
> page = BufferGetPage(buffer);
> + targetBlock = BufferGetBlockNumber(buffer);
>
>   if (!PageIsNew(page))
>   elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
> - BufferGetBlockNumber(buffer),
> + targetBlock,
>   RelationGetRelationName(relation));
>
>   PageInit(page, BufferGetPageSize(buffer), 0);
> @@ -623,7 +641,18 @@ loop:
>   * current backend to make more insertions or not, which is probably a
>   * good bet most of the time.  So for now, don't add it to FSM yet.
>   */
> - RelationSetTargetBlock(relation, BufferGetBlockNumber(buffer));
> + RelationSetTargetBlock(relation, targetBlock);
>
> Is this related to this patch? If not, I suggest let's do it
> separately if required.
>
> 3.
>  static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
> -uint8 newValue, uint8 minValue);
> + uint8 newValue, uint8 minValue);
>
> This appears to be a spurious change.

2 and 3 are separated into 0001.

> 4.
> @@ -378,24 +386,15 @@ RelationGetBufferForTuple(Relation relation, Size
> len,
>   * target.
>   */
>   targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
> +
> + /*
> + * In case we used an in-memory map of available blocks, reset
> + * it for next use.
> + */
> + if (targetBlock < HEAP_FSM_CREATION_THRESHOLD)
> + ClearLocalMap();
>
> How will you clear the local map during error?  I think you need to
> clear it in abort path and you can name the function as
> FSMClearLocalMap or something like that.

Done. I've put this call last before abort processing.

> 5.
> +/*#define TRACE_TARGETBLOCK */
>
> Debugging leftover, do you want to retain this and related stuff
> during the development of patch?

I have kept it aside as a separate patch but not attached it for now.

Also, we don't quite have a consensus on the threshold value, but I
have set it to 4 pages for v8. If this is still considered too
expensive (and basic tests show it shouldn't be), I suspect it'd be
better to interleave the available block numbers as described a couple
days ago than lower the threshold further.
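
For anyone wanting to see the intended effect, a rough sketch (assuming the
4-page threshold from v8; the table name is made up, and pg_relation_size()
is only used here to inspect the FSM fork):

create table tiny (a int);
insert into tiny select generate_series(1, 100);  -- well below the threshold
vacuum tiny;
select pg_relation_size('tiny', 'fsm');
-- on current master this typically reports 24576 (three FSM pages) once
-- vacuum has created the FSM; with the patch it should report 0, as no FSM
-- fork is created below the threshold.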

I have looked at zhio.c, and it seems trivial to adapt zheap to this patchset.

-John Naylor
From d1752858aaaf048fd94461dbd79fe0bb0f262b98 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Fri, 23 Nov 2018 12:55:16 +0700
Subject: [PATCH v8 1/3] Minor cosmetic adjustments for consistency.

---
 src/backend/access/heap/hio.c | 5 +++--
 src/backend/storage/freespace/freespace.c | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index b8b5871559..7f7c9db635 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -600,10 +600,11 @@ loop:
 	 * risk wiping out valid data).
 	 */
 	page = BufferGetPage(buffer);
+	targetBlock = BufferGetBlockNumber(buffer);
 
 	if (!PageIsNew(page))
 		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
-			 BufferGetBlockNumber(buffer),
+			 targetBlock,
 			 RelationGetRelationName(relation));
 
 	PageInit(page, BufferGetPageSize(buffer), 0);
@@ -623,7 +624,7 @@ loop:
 	 * current backend to make more insertions or not, which is probably a
 	 * good bet most of the time.  So for now, don't add it to FSM yet.
 	 */
-	RelationSetTargetBlock(relation, BufferGetBlockNumber(buffer));
+	RelationSetTargetBlock(relation, targetBlock);
 
 	return buffer;
 }
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 7c4ad1c449..47a991e21d 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -106,7 +106,7 @@ static Size fsm_space_cat_to_avail(uint8 cat);
 
 /* workhorse functions for various operations */
 static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
-   uint8 newValue, uint8 minValue);
+uint8 newValue, uint8 minValue);
 static BlockNumber fsm_search(Relation rel, uint8 min_cat);
 static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr,
 BlockNumber start, BlockNumber end,
-- 
2.17.1

From e6226e8ee3d51e651b3a0b7035ad6676b4cf4bc6 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Fri, 23 Nov 2018 12:58:07 +0700
Subject: [PATCH v8 2/3] Avoid creation of the free space map for small tables.

The FSM isn't created if the heap has 4 blocks or fewer. If the last
known target block has insufficient space, try every block before extending
the heap.

If a heap with a FSM is truncated back to below the threshold, the 

Re: Inadequate executor locking of indexes

2018-11-22 Thread David Rowley
On Thu, 8 Nov 2018 at 13:14, Tom Lane  wrote:
>
> I discovered that it's possible to trigger relation_open's new assertion
> about having a lock on the relation by the simple expedient of running
> the core regression tests with plan_cache_mode = force_generic_plan.


> There are several things we might do to fix this:
>
> 1. Drop the "operation != CMD_DELETE" condition from the above-quoted bit
> in ExecInitModifyTable.  We might be forced into that someday anyway if
> we want to support non-heap-style tables, since most other peoples'
> versions of indexes do want to know about deletions.
>
> 2. Drop the target-table optimization from nodeIndexscan.c and friends,
> and just always open the scan index with AccessShareLock.  (BTW, it's
> a bit inconsistent that these nodes don't release their index locks
> at the end; ExecCloseIndices does.)
>
> 3. Teach nodeIndexscan.c and friends about the operation != CMD_DELETE
> exception, so that they get the lock for themselves in that case.  This
> seems pretty ugly/fragile, but it's about the only option that won't end
> in adding index lock-acquisition overhead in some cases.  (But given the
> planner's behavior, it's not clear to me how often that really matters.)

Since I missed this and only discovered it was a problem when
someone else reported it today, and since I already did my own
analysis separately in [1], my vote is for #2.  For partitioned
table updates, there may be many result relations, which can cause
ExecRelationIsTargetRelation() to become very slow; in such a case any
additional redundant lock would be cheap by comparison.

Ideally, the locking code would realise we already hold a stronger
lock and skip the lock, but I don't see how that's realistically
possible without probing the hash table for all stronger lock types
first, which would likely damage the performance of locking.

[1] 
https://www.postgresql.org/message-id/CAKJS1f-DyKTYyMf9oxn1PQ%3DWyEOOjfVcV-dCc6eB9eat_MYPeA%40mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

2018-11-22 Thread Tom Lane
David Rowley  writes:
> On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia  wrote:
>> Now, to fix this issue either we need to hold proper lock before reaching
>> to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
>> on the scan coming from CMD_DELETE.

> I'd say the following comment and code in nodeIndexscan.c is outdated:

I complained about this previously in

https://www.postgresql.org/message-id/19465.1541636...@sss.pgh.pa.us

but got nothing but crickets :-(

regards, tom lane



Re: Should new partitions inherit their tablespace from their parent?

2018-11-22 Thread David Rowley
On Fri, 9 Nov 2018 at 06:58, Robert Haas  wrote:
> On Wed, Nov 7, 2018 at 9:43 PM Michael Paquier  wrote:
> > On Thu, Nov 08, 2018 at 12:50:40PM +1300, David Rowley wrote:
> > > How about we record the tablespace option for the partitioned table in
> > > reltablespace instead of saving it as 0.  Newly created partitions
> > > which don't have a TABLESPACE mentioned in the CREATE TABLE command
> > > should be created in their direct parent partitioned tables
> > > tablespace.
> >
> > I have seen enough complaints on the mailing lists regarding the way
> > tablespaces are handled for partitioned tables and their partitions that
> > it looks like a very good idea to make the tablespace be inherited
> > automatically, by setting up reltablespace to a non-zero value even if
> > a partitioned table has no physical presence.  Of course not on v11 or
> > older releases, just on HEAD.  It is no good to have partitioned indexes
> > and partitioned tables being handled inconsistently for such things.
>
> +1.

Here's a patch for that.  Parking here until January's commitfest.
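
To make the proposed behaviour concrete, a rough sketch (object names and the
tablespace path are made up; this shows the behaviour with the patch, not
current HEAD):

create tablespace ts1 location '/path/to/ts1';
create table measurement (logdate date)
  partition by range (logdate) tablespace ts1;
-- With the patch, "measurement" records ts1 in reltablespace, and a partition
-- created without an explicit TABLESPACE lands in ts1 instead of the default:
create table measurement_y2018 partition of measurement
  for values from ('2018-01-01') to ('2019-01-01');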

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v1-0001-Allow-newly-created-partitions-to-inherit-their-p.patch
Description: Binary data


Re: Control your disk usage in PG: Introduction to Disk Quota Extension

2018-11-22 Thread Hubert Zhang
>
> For this particular purpose, I don't immediately see why you need a
> hook in both places.  If ReadBuffer is called with P_NEW, aren't we
> guaranteed to end up in smgrextend()?

For the use case in diskquota:
BufferExtendCheckPerms_hook is used to do dynamic query enforcement, while the
smgr-related hooks are used to detect the relfilenode OIDs of active tables and
write them into shared memory (which is used to accelerate refreshing of the
diskquota model).
The reason we don't use the smgr_extend hook instead of the ReadBuffer hook for
enforcement is twofold:
1. For enforcement, we don't want to affect the performance of insert
queries, but a hook in smgr_extend would first need to convert the relfilenode
to a relation OID, which requires an index scan.
2. Using a hook in ReadBuffer instead of smgr_extend avoids enforcing the quota
on the 'cluster relation' operation. For example, 'vacuum full table' will
first cluster and create a new table, and then delete the old table. Because
the disk usage first grows and then shrinks, VACUUM FULL would fail if the
quota limit were reached (but in fact we want VACUUM FULL to reduce disk
usage). Using a hook in ReadBuffer is one solution to this problem. Of course,
there are other solutions, but this is one of the reasons we use
BufferExtendCheckPerms_hook to do enforcement at the current stage.
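
To illustrate the lookup cost mentioned in point 1: at the smgr level only the
relfilenode is known, and mapping that back to a relation OID is the expensive
part. The SQL-level equivalent of that mapping (the filenode value below is
made up) is roughly:

select pg_relation_filenode('pg_class');  -- relation -> relfilenode, cheap
select pg_filenode_relation(0, 16384);    -- relfilenode -> relation, needs a lookup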

On Thu, Nov 22, 2018 at 7:26 PM Haozhou Wang  wrote:

> Thank you very much for your review.
> We refactored our patch with new names and comments.
>
> For ReadBufferExtended hook, yes, Readbuffer with P_NEW will then call
> smgrextend.
>
> But in smgrextend, we cannot get the oid of a relation, and it will take
> some time to get the oid via smgrrelation.
> We would like to add a hook just before the smgrextend to get the oid and
> avoid use RelidByRelfilenode().
>
> New patch is attached in the attachment.
> Thank a lot!
>
> Regards,
> Haozhou
>
>
> On Wed, Nov 21, 2018 at 10:48 PM Robert Haas 
> wrote:
>
>> On Tue, Nov 20, 2018 at 2:20 AM Haozhou Wang  wrote:
>> > We prepared a patch that includes the hook points. And such hook points
>> are needed for disk quota extension.
>> > There are two hooks.
>> > One is SmgrStat_hook. It's used to perform ad-hoc logic in storage when
>> doing smgr create/extend/truncate in general. As for disk quota extension,
>> this hook is used to detect active tables(new created tables, tables
>> extending new blocks, or tables being truncated)
>> > The other is BufferExtendCheckPerms_hook. It's used to perform ad-hoc
>> logic when buffer extend a new block. Since ReadBufferExtended is a hot
>> function, we call this hook only when blockNum == P_NEW. As  for disk quota
>> extension, this hook is used to do query enforcement during the query is
>> loading data.
>> >
>> > Any comments are appreciated.
>>
>> +1 for adding some hooks to support this kind of thing, but I think
>> the names you've chosen are not very good.  The hook name should
>> describe the place from which it is called, not the purpose for which
>> one imagines that it will be used, because somebody else might imagine
>> another use.  Both BufferExtendCheckPerms_hook_type and
>> SmgrStat_hook_type are imagining that they know what the hook does -
>> CheckPerms in the first case and Stat in the second case.
>>
>> For this particular purpose, I don't immediately see why you need a
>> hook in both places.  If ReadBuffer is called with P_NEW, aren't we
>> guaranteed to end up in smgrextend()?
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
>
>

-- 
Thanks

Hubert Zhang


Re: row filtering for logical replication

2018-11-22 Thread Stephen Frost
Greetings,

* Euler Taveira (eu...@timbira.com.br) wrote:
> 2018-02-28 21:54 GMT-03:00 Craig Ringer :
> > Good idea. I haven't read this yet, but one thing to make sure you've
> > handled is limiting the clause to referencing only the current tuple and the
> > catalogs. user-catalog tables are OK, too, anything that is
> > RelationIsAccessibleInLogicalDecoding().
> >
> > This means only immutable functions may be invoked, since a stable or
> > volatile function might attempt to access a table. And views must be
> > prohibited or recursively checked. (We have tree walkers that would help
> > with this).
> >
> > It might be worth looking at the current logic for CHECK expressions, since
> > the requirements are similar. In my opinion you could safely not bother with
> > allowing access to user catalog tables in the filter expressions and limit
> > them strictly to immutable functions and the tuple its self.
>
> IIRC implementation is similar to RLS expressions. I'll check all of
> these rules.

Given the similarity to RLS and the nearby discussion about allowing
non-superusers to create subscriptions, and probably publications later,
I wonder if we shouldn't somehow be associating this with RLS policies
instead of having the publication filtering be entirely independent.

Thanks!

Stephen




Re: Add extension options to control TAP and isolation tests

2018-11-22 Thread Michael Paquier
On Thu, Nov 22, 2018 at 10:01:26PM +0300, Nikolay Shaplov wrote:
> In a message dated 21 November 2018 09:39:53, Michael Paquier wrote:
>> Those are mentioned here as part of the additional test suites:
>> https://www.postgresql.org/docs/11/regress-run.html#id-1.6.20.5.5
> 
> Oh thanks (I feel the urge to start editing the documentation right now. I will
> suppress it and do it, I hope, later.)

If you have a patch to improve the existing docs, please feel free to
submit those on a different thread.  If you have suggestions about
improving this patch's docs, of course please feel free to suggest them
here!

> May I also ask a question I came across? It is not part of the patch, but
> nevertheless...
> If I look in the code, regression tests are SQL files with expected output
> that are in src/test/regress. If I look in the documentation, all the tests
> are regression tests, including TAP tests:
> https://www.postgresql.org/docs/11/regress.html
>
> What is the right way to look at it?

That's for the main regression test suite within src/test/regress.  TAP
is also a regression test suite; the linked page reflects the general set
of test suites available.

> So, I continued exploring your patch. First I carefully read changes in 
> pgxs.mk. The only part of it I did not understand is
> 
>  .PHONY: submake
> -submake:
>  ifndef PGXS
> +submake:
> +ifdef REGRESS
> $(MAKE) -C $(top_builddir)/src/test/regress pg_regress$(X)
>  endif
> +ifdef ISOLATION
> +   $(MAKE) -C $(top_builddir)/src/test/isolation all
> +endif
> +endif # PGXS

This is to make sure that the necessary tools are compiled before
running the related tests.  "submake:" needs to be moved out actually.
Thinking about it a bit more, we should also remove the "ifdef REGRESS"
and "ifdef ISOLATION" because there are some dependencies here.  For
example TAP tests call pg_regress to initialize connection policy.  TAP
tests may also use isolation_tester or such.

> I suppose it is because the purpose of PGXS is never explained, neither in the
> documentation nor in the code comments, and PGXS := $(shell $(PG_CONFIG) --pgxs)
> sounds like some strange magic spell that explains nothing. In what cases is it
> defined? In what cases is it not defined? What exactly does it store? Can we
> make things easier to understand here?

That's part of a larger scope which is here:
https://www.postgresql.org/docs/11/extend-pgxs.html

> 2. Since you said that the TAP tests for bloom were never executed,
> I guess I should just ignore
> 
> -
> -wal-check: temp-install
> -   (prove_check)
> 
> as part of a wrong way to execute TAP tests. (I found no traces of the "wal-check"
> string in the code, so I guess this build target is an invention of the bloom
> authors.)

Yes.  It seems that it was useful for debugging at the time, but this
will rot if left as-is...  I am pretty sure that most people don't use
that wrapper.

> 3. The rest are trivial, except the changes for contrib/test_decoding/ and
> src/test/modules/brin. I will explore them in my next postgres-dev time slot,
> and then my review work will be done :-)

Thanks for the input, Nikolay!
--
Michael




Re: Speeding up INSERTs and UPDATEs to partitioned tables

2018-11-22 Thread David Rowley
On Sat, 17 Nov 2018 at 07:28, Alvaro Herrera  wrote:
> > The 0002 patch is included again, this time with a new proposed commit
> > message.  There was some discussion over on [1] where nobody seemed to
> > have any concerns about delaying the locking until we route the first
> > tuple to the partition.
>
> Please get a new commitfest entry for this patch.

Added to Jan-fest in: https://commitfest.postgresql.org/21/1887/

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Delay locking partitions during INSERT and UPDATE

2018-11-22 Thread David Rowley
As a follow-on from [1] and also discussed in [2], I'd like to propose
that we don't obtain locks on all partitions during INSERT into a
partitioned table and UPDATE of a partitioned key and instead, only
lock the partition when we first route a tuple to it. This means that
the order that the locks are obtained is no longer well defined and is
at the mercy of the order that tuples are INSERTed or UPDATEd.  It
seems worth relaxing this a bit for gains in performance, as when a
partitioned table contains many partitions, the overhead of locking
all partitions when inserting a single row, or just a few rows, is
often significantly higher than the cost of doing the actual insert.

The current behaviour was added in 54cde0c4c058073 in order to
minimise deadlock risk.  It seems that the risk there only comes from
AELs that could be taken when a partition directly receives a TRUNCATE
/ CREATE INDEX / VACUUM FULL / CLUSTER. There's obviously no conflict
with other DML operations since two RowExclusiveLocks don't conflict
with each other.  I think all other AEL obtaining DDL must be
performed on the top level partitioned table, for example, ADD COLUMN
can't be done directly on a partition, so there's no added deadlock
risk from those. For a deadlock to occur one of the above DDL commands
would have to be executed inside a transaction in an order opposite to
the order rows are being INSERTed or UPDATEd in the partitioned table.
If required, such operations could LOCK TABLE the top partitioned
table to block the DML operation. There's already a risk of similar
deadlocks from such operations done on multiple separate tables when
the order they're done is not the same as the order the tables are
written in a query, although, in that case, the window for the
deadlock is likely to be much smaller.
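
As a concrete sketch of that mitigation (using the hashp table from the
benchmark below; the partition name is hypothetical), DDL that needs an AEL
directly on a partition can lock the parent first, which still conflicts with
the RowExclusiveLock the INSERT or UPDATE takes on the partitioned table itself:

begin;
lock table hashp;   -- ACCESS EXCLUSIVE by default; blocks DML that targets hashp
truncate hashp1;
commit;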

With this done, the performance of an INSERT into a 10k partition
partitioned table looks like:

Setup:
create table hashp (a int) partition by hash(a);
select 'create table hashp'||x::Text || ' partition of hashp for
values with (modulus 10000, remainder '||x::text||');' from
generate_Series(0,9999) x;
\gexec

hashp_insert.sql:
\set p_a random(1,1000)
insert into hashp values(:p_a);

Results:
$ psql -c "truncate hashp;" postgres && pgbench -n -f hashp_insert.sql
-M prepared -c 4 -j 4 -T 60 postgres

Patched:
tps = 27811.427620 (excluding connections establishing)
tps = 28617.417308 (excluding connections establishing)

Unpatched:
tps = 130.446706 (excluding connections establishing)
tps = 119.726641 (excluding connections establishing)

The patch is attached.
I'll park this here until the January commitfest.

[1] 
https://www.postgresql.org/message-id/flat/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=gvbwvgh4a...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/25C1C6B2E7BE044889E4FE8643A58BA963B5796B%40G01JPEXMBKW03#3783596a794c6789a54a95a20971b6aa

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v1-0001-Delay-locking-of-partitions-during-INSERT-and-UPD.patch
Description: Binary data


Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-22 Thread Haribabu Kommi
On Fri, Nov 23, 2018 at 1:54 AM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:

> On 19/11/2018 06:18, Haribabu Kommi wrote:
> > Amit suggested another option in another mail, so total viable
> > solutions that are discussed as of now are,
> >
> > 1. Single API with NULL input treated as an invalid value
> > 2. Multiple APIs to deal with NULL input of other values
> > 3. Single API with NULL values treated as the current user, current
> > database and a NULL queryid.
> > 4. Single API with -1 as the invalid value, and treat NULL as matching
> > nothing. (The only problem with this approach is that until now -1 is
> > also a valid queryid, but setting -1 as a queryid needs to be avoided.)
>
> Can you show examples of what these would look like?
>

The following are some examples of how the various options work.

Option 1:

select pg_stat_statements_reset(NULL,NULL,NULL);
-- No change, just return. Because the function is strict.

select pg_stat_statements_reset(10,10,NULL);
-- Resets all the statements that are executed by userid 10 on database id
10.

select pg_stat_statements_reset(NULL,10,NULL);
-- Resets all the statements executed on database id 10


Option 2:

select pg_stat_statements_reset(NULL,NULL,NULL);
-- No change, just return. Because the function is strict.

select pg_stat_statements_reset(10,10,1234);
-- Resets the query statement 1234 executed by userid 10 on database id 10

select pg_stat_statements_reset_by_userid(10)
-- Resets all the statements executed by userid 10

select pg_stat_statements_reset_by_dbid(10);
-- Resets all the statements executed on database id 10.

select pg_stat_statements_reset_by_userid_and_dbid(10, 10);
-- Resets all the statements executed by userid 10 on database id 10

Likewise, there are 7 APIs in total.


Option 3:

select pg_stat_statements_reset(NULL,NULL,NULL);
-- Similar to option 1, but when the userid or databaseid is NULL, it uses
current_role and current_database instead. For the queryid, when the
option is NULL, it deletes all the queries.

The rest of the details are the same as option 1.

Option 4:

select pg_stat_statements_reset(NULL,NULL,NULL);
-- No change, just return. Because the function is strict.

select pg_stat_statements_reset(0,0,0);
-- Resets all the statements

select pg_stat_statements_reset(0,10,0);
-- Resets all the statements executed on database id 10

select pg_stat_statements_reset(10,0,0);
-- Resets all the statements executed on userid 10.


Regards,
Haribabu Kommi
Fujitsu Australia


Re: row filtering for logical replication

2018-11-22 Thread Petr Jelinek
On 01/11/2018 01:29, Euler Taveira wrote:
> On Wed, 28 Feb 2018 at 20:03, Euler Taveira wrote:
>> The attached patches add support for filtering rows in the publisher.
>>
> I rebased the patch. I added row filtering for initial
> synchronization, pg_dump support and psql support. 0001 removes unused
> code. 0002 reduces memory use. 0003 passes only structure member that
> is used in create_estate_for_relation. 0004 reuses a parser node for
> row filtering. 0005 is the feature. 0006 prints WHERE expression in
> psql. 0007 adds pg_dump support. 0008 is only for debug purposes (I'm
> not sure some of these messages will be part of the final patch).
> 0001, 0002, 0003 and 0008 are not mandatory for this feature.
> 
> Comments?
> 

Hi,

I think there are two main topics that still need to be discussed about
this patch.

Firstly, I am not sure if it's wise to allow UDFs in the filter clause
for the table. The reason is that we can't record all the necessary
dependencies there, because the functions are a black box for the parser. That
means that if somebody drops an object that a UDF used in a replication filter
depends on, that function will start failing. But unlike for user
sessions, it will start failing during decoding (well, processing in the
output plugin). And that's not recoverable by re-adding the missing
object; the only way to get out of that is either to move the slot forward,
which means losing part of the replication stream and needing a manual resync,
or a full rebuild of replication. Neither of which is good IMHO.

Secondly, do we want to at least notify the user about filters (or maybe even
disallow them) where the combination of action + column means the column value
will not be logged? I mean, for example, we do this when processing the
filter against a row:

> + ExecStoreHeapTuple(new_tuple ? new_tuple : old_tuple, 
> ecxt->ecxt_scantuple, false);

But if the user has an expression on a column which is not part of the replica
identity, that expression will always return NULL for DELETEs, because
only the replica identity is logged with actual values and everything else
is NULL in old_tuple. So if a publication replicates deletes, we should
check for this somehow.
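
A sketch of that situation, using the WHERE syntax proposed in this patch set
and the default replica identity (the primary key):

create table t (id int primary key, payload text);
create publication pub for table t where (payload <> 'skip');
-- INSERT/UPDATE log full new tuples, so the filter sees real values; for
-- DELETE only the replica identity (id) is logged, so "payload" is NULL in
-- old_tuple and the filter cannot do what the user probably expects.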

Btw about code (you already fixed the wrong reloid in sync so skipping
that).

0002:
> + for (tupn = 0; tupn < walres->ntuples; tupn++)
>   {
> - char   *cstrs[MaxTupleAttributeNumber];
> + char**cstrs;
>  
>   CHECK_FOR_INTERRUPTS();
>  
>   /* Do the allocations in temporary context. */
>   oldcontext = MemoryContextSwitchTo(rowcontext);
>  
> + cstrs = palloc(nfields * sizeof(char *));

Not really sure that this is actually worth it given that we have to
allocate and free this in a loop now while before it was just sitting on
a stack.

0005:
> @@ -654,5 +740,10 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, 
> uint32 hashvalue)
>*/
>   hash_seq_init(, RelationSyncCache);
>   while ((entry = (RelationSyncEntry *) hash_seq_search()) != NULL)
> + {
>   entry->replicate_valid = false;
> + if (list_length(entry->row_filter) > 0)
> + list_free(entry->row_filter);
> + entry->row_filter = NIL;
> + }

Won't this leak memory? The list_free only frees the list cells, but not
the nodes you stored there before.

Also I think we should document here that the expression is run with the
session environment of the replication connection (so that it's more
obvious that things like CURRENT_USER will not return the user which changed
the tuple, but the replication user).

It would be nice if 0006 had a regression test and 0007 a TAP test.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [RFC] Removing "magic" oids

2018-11-22 Thread Andrew Dunstan



On 11/22/18 4:14 PM, Andres Freund wrote:

Hi,

On 2018-11-21 23:32:07 -0500, Andrew Dunstan wrote:

On 11/21/18 7:14 PM, Andres Freund wrote:

   Could you check whether you
still encounter the issue after applying the attached fix?



This has largely fixed the problem, so I think this should be applied.

Cool, will do so tomorrow or such. Thanks for testing.



With some adjustments to the tests to remove problematic cases (e.g.
postgres_fdw's ft_pg_type) the tests pass. The exception is
HEAD->HEAD. The change is that the LOs are not dumped in the same
order pre and post upgrade. I can change the tests to allow for a
greater fuzz factor - generally when the source and target are the
same we don't allow any fuzz.  Or if we care we could do a better job
of dumping LOs in a consistent order.

So you'd want to dump large objects in oid order or such? Probably
comparatively not a huge overhead, but also not nothing? We don't really
force ordering in other places in pg_dump afaik.



Well, all other data is dumped in a consistent order, and the tests rely 
on this. If we don't care about that for LOs I can accommodate it. I 
don't have a terribly strong opinion about the desirability of making 
LOs keep the same behaviour.



cheers


andrew

--
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

2018-11-22 Thread David Rowley
On Thu, 22 Nov 2018 at 22:33, Rushabh Lathia  wrote:
> CREATE TABLE foo (x int primary key);
> INSERT INTO foo VALUES (1), (2), (3), (4), (5);
>
> CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
> AS $$
> BEGIN
>  DELETE FROM foo where x = a;
>  return 0;
> END;
> $$ LANGUAGE plpgsql;
>
> postgres@100858=#set plan_cache_mode = force_generic_plan;
> SET
> postgres@100858=#select f1(4);
 f1
----
  0
(1 row)
>
> postgres@100858=#select f1(4);
> server closed the connection unexpectedly


> Now, to fix this issue either we need to hold proper lock before reaching
> to ExecInitIndexScan() or teach ExecInitIndexScan() to take AccessShareLock
> on the scan coming from CMD_DELETE.

I'd say the following comment and code in nodeIndexscan.c is outdated:

/*
* Open the index relation.
*
* If the parent table is one of the target relations of the query, then
* InitPlan already opened and write-locked the index, so we can avoid
* taking another lock here.  Otherwise we need a normal reader's lock.
*/
relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
indexstate->iss_RelationDesc = index_open(node->indexid,
  relistarget ? NoLock : AccessShareLock);

Despite what the above comment claims, these indexes have not been
opened in InitPlan since 389af951552ff2. As you mentioned, they're
opened in nodeModifyTable.c for non-delete statements.

I think we either need to update the above code to align it to what's
now in nodeModifyTable.c, or just obtain an AccessShareLock
regardless. I kinda think we should just take the lock regardless as
determining if the relation is a result relation may be much more
expensive than just taking another lower-level lock on the index.

Anyway. I've attached a small patch to update the above fragment.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


fix_index_locking_for_deletes.patch
Description: Binary data


Re: [RFC] Removing "magic" oids

2018-11-22 Thread Andres Freund
Hi,

On 2018-11-21 23:32:07 -0500, Andrew Dunstan wrote:
> On 11/21/18 7:14 PM, Andres Freund wrote:
> >   Could you check whether you
> > still encounter the issue after applying the attached fix?
> > 
> 
> 
> This has largely fixed the problem, so I think this should be applied.

Cool, will do so tomorrow or such. Thanks for testing.


> With some adjustments to the tests to remove problematic cases (e.g.
> postgres_fdw's ft_pg_type) the tests pass. The exception is
> HEAD->HEAD. The change is that the LOs are not dumped in the same
> order pre and post upgrade. I can change the tests to allow for a
> greater fuzz factor - generally when the source and target are the
> same we don't allow any fuzz.  Or if we care we could do a better job
> of dumping LOs in a consistent order.

So you'd want to dump large objects in oid order or such? Probably
comparatively not a huge overhead, but also not nothing? We don't really
force ordering in other places in pg_dump afaik.

Greetings,

Andres Freund



Re: pg_upgrade supported versions policy

2018-11-22 Thread Andrew Dunstan



On 11/22/18 7:57 AM, Magnus Hagander wrote:
On Thu, Nov 22, 2018 at 12:48 AM Andres Freund wrote:


Hi,

I feel like we ought to trim the support for a few old versions from
pg_upgrade.  In my particular case I don't really think it's reasonable
to test < 9.0 versions for pg_largeobject_metadata migrations. But I
think we should create a policy that's the default, leaving individual
cases aside.

I can see a few possible policies:

1) Support upgrading from the set of releases that were supported when
   the pg_upgrade target version was released. While some will argue that
   this is fairly short, people above it can still upgrade ~10 years
   worth of versions with a single intermediary upgrade.
2) Same, but +2 releases or such.
3) Support until it gets too uncomfortable.
4) Support all versions remotely feasible (i.e. don't drop versions that
   still can be compiled)

I personally think 1 is preferable and 2 is acceptable.


As a developer, I'd like 1. As someone who repeatedly runs into 
customer cases, I'd say 2. The reason for that is that far too many 
don't realize they should upgrade on time, but at least a fair number 
of them notice within one cycle from it going EOL -- perhaps just by 
reading the announcement saying "hey version x is now EOL". And 
supporting +1 or +2 versions makes it possible for them to go directly 
to latest.




2 seems reasonable. It's perfectly possible for the buildfarm module
that does tests against old versions to go back as far as we like.



cheers


andrew


--

Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: 64-bit hash function for hstore and citext data type

2018-11-22 Thread Andrew Gierth
> "Tomas" == Tomas Vondra  writes:

 Tomas> I wonder if the hstore hash function is actually correct. I see
 Tomas> it pretty much just computes hash on the varlena representation.
 Tomas> The important question is - can there be two different encodings
 Tomas> for the same hstore value?

I was going to say "no", but in fact on closer examination there is an
edge case caused by the fact that hstoreUpgrade allows an _empty_ hstore
from pg_upgraded 8.4 data through without modifying it. (There's also a
vanishingly unlikely case involving the pgfoundry release of hstore-new.)

I'm inclined to fix this in hstoreUpgrade rather than complicate
hstore_hash with historical trivia. Also there have been no field
complaints - I guess it's unlikely that there is much pg 8.4 hstore data
in the wild that anyone wants to hash.

-- 
Andrew (irc:RhodiumToad)



Re: Add extension options to control TAP and isolation tests

2018-11-22 Thread Nikolay Shaplov
In a message dated 21 November 2018 09:39:53, Michael Paquier wrote:

> > For me the name "output_iso" means nothing. iso is something about CDs/DVDs or
> > about standards. I would not guess that iso stands for isolation if I did
> > not know it already. isolation_output is more sensible: I have heard that
> > there are some isolation tests, so this must be something about them. Maybe
> > it would be better to change it to isolation_output everywhere instead of
> > changing to output_iso.
> That's just a default configuration used by the isolation tester.
> That's not worth bothering with, in my opinion, for this patch, as the
> patch is here to make sure that the default configuration gets used
> where it had better be used.  Changing this default value would of
> course be doable technically, but it has been around for years, so
> changing it does not seem like a good idea.

OK. If it is a long-time tradition, let it be :-)
 
> > I tried to find a definition in the documentation of what "isolation test"
> > exactly means, but did not find one. There are some general words about TAP
> > tests in the main postgres documentation,
> > https://www.postgresql.org/docs/11/regress-tap.html,
> > but I would not understand anything from it if I did not already know how
> > it works.
> 
> Those are mentioned here as part of the additional test suites:
> https://www.postgresql.org/docs/11/regress-run.html#id-1.6.20.5.5

Oh thanks (I feel the urge to start editing the documentation right now. I will
suppress it and do it, I hope, later.)

May I also ask a question I came across? It is not part of the patch, but
nevertheless...
If I look in the code, regression tests are SQL files with expected output
that are in src/test/regress. If I look in the documentation, all the tests
are regression tests, including TAP tests:
https://www.postgresql.org/docs/11/regress.html

What is the right way to look at it?

> > In the current extend-pgxs documentation there is some explanation about
> > regression tests, and it is sensible enough. Since TAP and isolation tests
> > are introduced now, there should be a similar short explanation for both of
> > them.
> 
> I see your point here, and it is true that documentation ought to be
> better.  So I have written out a new set of paragraphs which explain the
> whereabouts of the new variables a bit more in depth in the section of 
> extend-pgxs.

Oh, thanks! Now it is much clearer.

So, I continued exploring your patch. First I carefully read changes in 
pgxs.mk. The only part of it I did not understand is

 .PHONY: submake
-submake:
 ifndef PGXS
+submake:
+ifdef REGRESS
$(MAKE) -C $(top_builddir)/src/test/regress pg_regress$(X)
 endif
+ifdef ISOLATION
+   $(MAKE) -C $(top_builddir)/src/test/isolation all
+endif
+endif # PGXS

I suppose it is because the purpose of PGXS is never explained, neither in the
documentation nor in the code comments, and PGXS := $(shell $(PG_CONFIG) --pgxs)
sounds like some strange magic spell that explains nothing. In what cases is it
defined? In what cases is it not defined? What exactly does it store? Can we
make things easier to understand here?

2. Since you said that the TAP tests for bloom were never executed,
I guess I should just ignore

-
-wal-check: temp-install
-   (prove_check)

as part of a wrong way to execute TAP tests. (I found no traces of the "wal-check"
string in the code, so I guess this build target is an invention of the bloom
authors.)

3. The rest are trivial, except the changes for contrib/test_decoding/ and
src/test/modules/brin. I will explore them in my next postgres-dev time slot,
and then my review work will be done :-)

-- 
Do code for fun.



Re: 64-bit hash function for hstore and citext data type

2018-11-22 Thread Tomas Vondra

On 9/26/18 12:20 PM, amul sul wrote:

Hi all,

Commit[1] has added 64-bit hash functions for core data types and in the same
discussion thread[2] Robert Haas suggested to have the similar extended hash
function for hstore and citext data type. Attaching patch proposes the same.



I wonder if the hstore hash function is actually correct. I see it 
pretty much just computes hash on the varlena representation. The 
important question is - can there be two different encodings for the 
same hstore value? If that's possible, those two versions would end up 
with a different hash value, breaking the hashing scheme.


I'm not very familiar with hstore internals so I don't know if that's 
actually possible, but if you look at hstore_cmp, that seems to be far 
more complex than just comparing the varlena values directly.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Tid scan improvements

2018-11-22 Thread Tomas Vondra

On 11/22/18 8:41 AM, David Rowley wrote:
>

...

3. I'd rather see EnsureTidRangeSpace() keep doubling the size of the
allocation until it reaches the required size. See how
MakeSharedInvalidMessagesArray() does it.  Doing it this way ensures
we always have a power of two sized array which is much nicer if we
ever reach the palloc() limit as if the array is sized at the palloc()
limit / 2 + 1, then if we try to double it'll fail.  Of course, it's
unlikely to be a problem here, but... the question would be how to
decide on the initial size.



I think it kinda tries to do that in some cases, by doing this:

*numAllocRanges *= 2;

...

tidRanges = (TidRange *)
repalloc(tidRanges,
 *numAllocRanges * sizeof(TidRange));

The problem here is that what matters is not numAllocRanges being 2^N, 
but the number of bytes allocated being 2^N. Because that's what ends up 
in AllocSet, which keeps lists of 2^N chunks.


And as TidRange is 12B, this is guaranteed to waste memory, because 
no matter what the first factor is, the result will never be 2^N.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online verification of checksums

2018-11-22 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On 11/22/18 2:12 AM, Stephen Frost wrote:
> >* Michael Banck (michael.ba...@credativ.de) wrote:
> >>On Tue, Oct 30, 2018 at 06:22:52PM +0100, Fabien COELHO wrote:
> >>>The "check if page was modified since checkpoint" does not look useful when
> >>>offline. Maybe it lacks a comment to say that this cannot (should not ?)
> >>>happen when offline, but even then I would not like it to be true: ISTM 
> >>>that
> >>>no page should be allowed to be skipped on the checkpoint condition when
> >>>offline, but it is probably ok to skip with the new page test, which make 
> >>>me
> >>>still think that they should be counted and reported separately, or at 
> >>>least
> >>>the checkpoint skip test should not be run when offline.
> >>
> >>What is the rationale to not skip on the checkpoint condition when the
> >>instance is offline?  If it was shutdown cleanly, this should not
> >>happen, if the instance crashed, those would be spurious errors that
> >>would get repaired on recovery.
> >>
> >>I have not changed that for now.
> >
> >Agreed- this is an important check even in offline mode.
> 
> Yeah. I suppose we could detect if the shutdown was clean (like pg_rewind
> does), and then skip the check. Or perhaps we should still do the check
> (without a retry), and report it as issue when we find a page with LSN newer
> than the last checkpoint.

I agree that it'd be nice to report an issue if it's a clean shutdown
but there's an LSN newer than the last checkpoint, though I suspect that
would be more useful in debugging and such and not so useful for users.

> In any case, the check is pretty cheap (comparing two 64-bit values), and I
> don't see how skipping it would optimize anything. It would make the code a
> tad simpler, but we still need the check for the online mode.

Yeah, I'd just keep the check.

> A minor detail is that the reads/writes should be atomic at the sector
> level, which used to be 512B, so it's not just about pages torn in 4kB/4kB
> manner, but possibly an arbitrary mix of 512B chunks from old and new
> version.

Sure.

> This also explains why we don't need any delay - the reread happens after
> the write must have already written the page header, so the new LSN must be
> already visible.

Agreed.

Thanks!

Stephen




Re: Online verification of checksums

2018-11-22 Thread Tomas Vondra




On 11/22/18 2:12 AM, Stephen Frost wrote:

Greetings,

* Michael Banck (michael.ba...@credativ.de) wrote:

On Tue, Oct 30, 2018 at 06:22:52PM +0100, Fabien COELHO wrote:

The "check if page was modified since checkpoint" does not look useful when
offline. Maybe it lacks a comment to say that this cannot (should not ?)
happen when offline, but even then I would not like it to be true: ISTM that
no page should be allowed to be skipped on the checkpoint condition when
offline, but it is probably ok to skip with the new page test, which make me
still think that they should be counted and reported separately, or at least
the checkpoint skip test should not be run when offline.


What is the rationale to not skip on the checkpoint condition when the
instance is offline?  If it was shutdown cleanly, this should not
happen, if the instance crashed, those would be spurious errors that
would get repaired on recovery.

I have not changed that for now.


Agreed- this is an important check even in offline mode.



Yeah. I suppose we could detect if the shutdown was clean (like 
pg_rewind does), and then skip the check. Or perhaps we should still do 
the check (without a retry), and report it as issue when we find a page 
with LSN newer than the last checkpoint.


In any case, the check is pretty cheap (comparing two 64-bit values), 
and I don't see how skipping it would optimize anything. It would make 
the code a tad simpler, but we still need the check for the online mode.



When offline, the retry logic does not make much sense, it should complain
directly on the first error? Also, I'm unsure of the read & checksum retry
logic *without any delay*.


The race condition being considered here is where an 8k read somehow
gets the first 4k, then is scheduled off-cpu, and the full 8k page is
then written by some other process, and then this process is woken up
to read the second 4k.  I agree that this is unnecessary when the
database is offline, but it's also pretty cheap.  When the database is
online, it's an extremely unlikely case to hit (just try to reproduce
it...) but if it does get hit then it's easy enough to recheck by doing
a reread, which should show that the LSN has been updated in the first
4k and we can then know that this page is in the WAL.  We have not yet
seen a case where such a re-read returns an old LSN and an invalid
checksum; based on discussion with other hackers, that shouldn't be
possible as every kernel seems to consistently write in-order, meaning
that the first 4k will be updated before the second, so a single re-read
should be sufficient.



Right.

A minor detail is that the reads/writes should be atomic at the sector 
level, which used to be 512B, so it's not just about pages torn in 
4kB/4kB manner, but possibly an arbitrary mix of 512B chunks from old 
and new version.


This also explains why we don't need any delay - the reread happens 
after the write must have already written the page header, so the new 
LSN must be already visible.


So no delay is necessary. And if it was, how long should the delay be? 
The processes might end up off-CPU for arbitrary amount of time, so 
picking a good value would be pretty tricky.



Remember- this is all in-memory activity also, we aren't talking about
what might happen on disk here.


I think the small overhead of retrying in offline mode even if useless
is worth avoiding making the code more complicated in order to cater for
both modes.


Agreed.


Initially there was a delay, but this was removed after analysis and
requests by several other reviewers.


Agreed, there's no need for or point to having such a delay.



Yep.


This might suggest some option to tell the command that it should work in
online or offline mode, so that it may be stricter in some cases. The
default may be one of the options, eg the stricter offline mode, or maybe
guessed at startup.


If we believe the operation should be different, the patch removes the
"is cluster online?" check (as it is no longer necessary), so we could
just replace the current error message with a global variable with the
result of that check and use it where needed (if any).


That could leave open the issue of someone starting the check offline, and
then starting the database while it is not finished. Maybe it is not worth
sweating about such a narrow use case.


I don't think we need to cater for that, yeah.


Agreed.



Yep. I don't think other tools protect against that either. And 
pg_rewind does actually modify the cluster state, unlike checksum 
verification.



It would also be nice if the test could apply on an active database,
eg with a low-rate pgbench running in parallel to the verification,
but I'm not sure how easy it is to add such a thing.


That sounds much more complicated so I have not tackled that yet.


I agree that this would be nice, but I don't want the regression tests
to become much longer...



I have to admit I find this thread rather confusing, because the subject 

Re: New GUC to sample log queries

2018-11-22 Thread Vik Fearing
On 21/11/2018 09:06, Adrien Nayrat wrote:
> On 11/19/18 2:52 PM, Dmitry Dolgov wrote:
>>> On Mon, Nov 19, 2018 at 2:40 PM Tomas Vondra  
>>> wrote:
>>>
>>> On 11/19/18 2:57 AM, Michael Paquier wrote:
 On Sun, Nov 18, 2018 at 12:18:33PM +0100, Dmitry Dolgov wrote:
> Since it's hard to come up with a concise name that will mention sampling 
> rate
> in the context of min_duration_statement, I think it's fine to name this
> configuration "log_sample_rate", as long as it's dependency from
> log_min_duration_statements is clearly explained in the documentation.

 log_sample_rate looks fine to me as a name.
>>>
>>> That seems far too short to me - the name should indicate it applies to
>>> statement logging. I'd say log_statement_sample_rate is better.
>>
>> I agree, sounds reasonable.
>>
> 
> Thanks for your comments. Here is the updated patch. I fixed a warning for
> missing parentheses in this expression:
> if ((exceeded && in_sample) || log_duration)
> 
> It passed make check_world and make docs

I am happy with this version.  I've marked it ready for committer.

Thanks!
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: using index or check in ALTER TABLE SET NOT NULL

2018-11-22 Thread Dmitry Dolgov
> On Tue, Nov 13, 2018 at 1:59 PM Alvaro Herrera  
> wrote:
>
> On 2018-Nov-13, Dmitry Dolgov wrote:
>
> > > On Sun, 4 Nov 2018 at 19:03, Sergei Kornilov  wrote:
> > >
> > > > If not properly cataloguing NOT NULL constraints would be fixed, can it
> > > > potentially conflict with the current patch or not?
> > > We already doing same stuff for "alter table attach partition" and in 
> > > this patch i use exactly this routine. If proper cataloguing would 
> > > conflict with my patch - it would conflict with "attach partition" 
> > > validation too.
> > > I think proper cataloguing can be implemented without conflict with 
> > > proposed feature.
> >
> > Yes, indeed, this patch relies on the PartConstraintImpliedByRelConstraint.
> > Then maybe it makes sense to go with the solution, proposed in this thread,
> > while leaving open the possibility of having "SET NOT NULL NOT VALID"? From
> > the functionality point of view it definitely would be beneficial. Any other
> > opinions?
>
> I think we should ignore any SET NOT NULL NOT VALID patches, because
> in practice they don't exist.  Let's move forward with this, because the
> optimization being proposed is clearly very useful.

Absolutely agree. Looking at the history of the patch, I see that it went
through some extensive review and was even marked as Ready for Committer before
the commentary from Peter, but since then some changes were made (rebase and
code reorganization). Can someone from the reviewers confirm (or not) whether
the patch is in good shape? If not, I'll try to allocate some time for that,
but can't promise any exact date.
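
For reference, a rough sketch of the optimization being discussed (assuming the
patch's behaviour; table and constraint names are made up):

create table big (id int);
-- ... populate ...
alter table big add constraint big_id_not_null check (id is not null) not valid;
alter table big validate constraint big_id_not_null;  -- scans, but doesn't block writes
alter table big alter column id set not null;
-- With the patch, the last step can prove NOT NULL from the existing validated
-- check constraint and skip the full-table scan it performs today.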



Re: MERGE SQL statement for PG12

2018-11-22 Thread Tomas Vondra

On 11/22/18 7:44 AM, Pavan Deolasee wrote:


Hi Tomas,

Sorry for a delayed response.

On Mon, Oct 29, 2018 at 4:59 PM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:


Hi Pavan,

On 10/29/2018 10:23 AM, Pavan Deolasee wrote:
 >
 > ...
 >
 > Thanks for keeping an eye on the patch. I've rebased the patch
 > against the current master. A new version is attached.
 >
 > Thanks,
 > Pavan
 >

It's good to see the patch moving forward. What are your thoughts
regarding things that need to be improved/fixed to make it committable?

I briefly discussed the patch with a couple of people at pgconf.eu last
week, and IIRC the main concern was how the merge is represented in
parser/executor.


Yes, Andres and to some extent Peter G had expressed concerns about 
that. Andres suggested that we should rework the parser and executor 
side. Here are some excerpts from his review comments.



"I don't think the parser / executor implementation
ofMERGEis architecturally sound.  I think creating hidden joins during
parse-analysis to implementMERGEis a seriously bad idea and it needs
to be replaced by a different executor structure."

+    * As top-level statements INSERT, UPDATE and DELETE have a Query, 
whereas

+    * with MERGE the individual actions do not require separate planning,
+    * only different handling in the executor. See nodeModifyTable handling
+    * of commandType CMD_MERGE.
+    *
+    * A sub-query can include the Target, but otherwise the sub-query 
cannot

+    * reference the outermost Target table at all.
+    */
+   qry->querySource = QSRC_PARSER;

Why is this, and not building a proper executor node for merge that
knows how to get the tuples, the right approach?  We did a rough
equivalent for matview updates, and I think it turned out to be a pretty
bad plan.



On Fri, Apr 6, 2018 at 1:30 AM, Andres Freund wrote:




My impression is that this simply shouldn't be going through
nodeModifyTuple, but be its own nodeMerge node. The trigger handling
would need to be abstracted into execTrigger.c or such to avoid
duplication.  We're now going from nodeModifyTable.c:ExecModifyTable()
into execMerge.c:ExecMerge(), back to
nodeModifyTable.c:ExecUpdate/Insert(). To avoid ExecInsert() doing
things that aren't appropriate for MERGE, we then pass an actionState,
which neuters part of ExecUpdate/Insert(). This is just bad.




To be honest, I need more direction on how to address these major
architectural concerns. I had tried to rework the executor part and had
commented on that, but I know that was probably done in a hurry since we
were trying to save a revert. Having said that, I am still not very sure
what exactly the executor side should look like without causing too much
duplication between the handling in nodeModifyTable() and the proposed
nodeMerge(). If Andres can give me further inputs, I will be more than happy
to figure out the details and improve the patch.




I think not going through nodeModifyTable and having a separate 
nodeMerge.c makes sense. It might result in some code being duplicated, 
but I suppose that code can be shared (wrapped as a function, moved to 
some file shared by the two nodes). I wouldn't expect the result to be 
particularly ugly, at least not compared to how nodeModifyTable is 
twisted with the current patch.


As far as the parser side goes, do I understand correctly that instead 
of building a special join in transformMergeStmt() as the patch does 
today, we should follow what transformUpdateStmt() does for a potential 
join? i.e. just have a single RTE for the source relation and present it 
as a left hand side for the implicit join? If I get a little more 
direction on this, I can try to address the issues.




Not sure.

It looks very likely that the patch won't get much review in its current 
state. But if I get inputs, I can work out the details and try to get 
the patch into a committable state.


BTW I am aware that the patch is bit-rotten because of the partitioning 
related changes. I will address those and post a revised patch soon.




thanks

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_upgrade supported versions policy

2018-11-22 Thread Robert Eckhardt
On Thu, Nov 22, 2018 at 7:57 AM Magnus Hagander  wrote:
>
> On Thu, Nov 22, 2018 at 12:48 AM Andres Freund  wrote:
>>
>> Hi,
>>
>> I feel like we ought to trim the support for a few old versions from
>> pg_upgrade.  In my particular case I don't really think it's reasonable
>> to test < 9.0 versions for pg_largeobject_metadata migrations. But I
>> think we should create a policy that's the default, leaving individual
>> cases aside.
>>
>> I can see a few possible policies:
>>
>> 1) Support upgrading from the set of releases that were supported when
>>the pg_upgrade target version was released. While some will argue that
>>this is fairly short, people above it can still upgrade ~10 years
>>worth of versions with a single intermediary upgrade.
>> 2) Same, but +2 releases or such.
>> 3) Support until it gets too uncomfortable.
>> 4) Support all versions remotely feasible (i.e. don't drop versions that
>>still can be compiled)
>>
>> I personally think 1 is preferrable and 2 is acceptable.
>
>
> As a developer, I'd like 1. As someone who repeatedly runs into customer 
> cases, I'd say 2. The reason for that is that far too many don't realize they 
> should upgrade on time, but at least a fair number of them notice within one 
> cycle from it going EOL -- perhaps just by reading the announcement saying 
> "hey version x is now EOL". And supporting +1 or +2 versions makes it 
> possible for them to go directly to latest.

+1. Just looking at the Postgres versions we are supporting inside our
company, we have many/most products leveraging 9.4 currently and they
are just starting to think about how to upgrade. My assumption is many
teams will get it together enough to upgrade by the time 9.4 is no
longer supported; however, many will not, and getting them to the
latest stable version in one jump would be really nice.

-- Rob
>
> 3 and 4 both cause a lot of work on the dev side, but also make it a lot 
> less predictable for the end users. I'm willing to bet many of them prefer a 
> defined policy even if it's a bit shorter than the limits we have now, to a 
> "sorry we don't know" kind of policy. Thus, -1 for both 3 and 4.
>
> --
>  Magnus Hagander
>  Me: https://www.hagander.net/
>  Work: https://www.redpill-linpro.com/



Re: Implement predicate propagation for non-equivalence clauses

2018-11-22 Thread Alexander Kuzmenkov

Hi Richard,

I took a look at the v2, here are some comments:

* This feature needs tests, especially for the cases where opfamilies or 
data types or collations don't match, and other non-obvious cases where 
it shouldn't work.



* Deducing an inequality to a constant is not always helpful. If we know 
that a = b and a = const1 and b < const2, we will deduce b = const1 from 
the EC, and adding b < const2 doesn't improve selectivity and only makes 
the cost estimate worse. One situation where it does make sense is when 
we can detect contradictions in these clauses, e.g. if we know that 
const1 > const2 and therefore know that the above selection clause is 
always false. Looking at the regression test changes, I see that v2 
doesn't do that. I think the handling of deduced inequalities should be 
modeled on the flow of generate_base_implied_equalities -> 
process_implied_equality -> distribute_qual_to_rels. This would allow us 
to correctly handle a deduced const1 < const2 clause and turn it into a 
gating Result qual.
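For example, with something like this (hypothetical tables), the deduced
comparison of the two constants is contradictory, so the whole thing could
become a constant-false gating qual on a Result node:

    -- a = b and a = 10 imply b = 10, which contradicts b < 5
    SELECT *
    FROM t1 JOIN t2 ON t1.a = t2.b
    WHERE t1.a = 10 AND t2.b < 5;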



* There are functions named generate_implied_quals, gen_implied_quals 
and gen_implied_qual. The names are confusingly similar; we could use 
something like generate_implied_quals_for_clause and 
generate_one_implied_qual for the latter two.



@@ gen_implied_quals
    else if (clause && IsA(clause, ScalarArrayOpExpr))
    {
* When can the clause be NULL?


@@ gen_implied_quals
    item1 = canonicalize_ec_expression(item1,
                                       exprType((Node *) item1),
                                       collation);
    item2 = canonicalize_ec_expression(item2,
                                       exprType((Node *) item2),
                                       collation);
* Why do we do this? If the collation or type of the original expression 
isn't right and we have to add RelabelType, the resulting expression 
won't match the original clause and we won't be able to substitute it 
with other equivalence members. So we can just check that the type and 
collation match.



* In gen_implied_quals, I'd rename item1 => left, item2 => right, em1 => 
orig_em, em2 => other_em, and same for list cells and types. As it is 
now, em1 can actually match both item1 and item2, and em2 is not related 
to either of them, so it took me some effort to understand what's going on.



* In gen_implied_qual, why do we search the clause for a matching 
subexpression? Reading the thread, I thought that we can only do the 
substitution for OpExprs of the same opfamilies as the generating EC. 
This code looks like it can do the substitution at an arbitrary depth, 
so we might change an argument of some unsuitable function, and the 
result will not be correct. What we should probably do is that after we 
have matched one side of an OpExpr to one EC member, we just replace it 
with another suitable member and add the resulting clause.



@@ gen_implied_qual
    check_mergejoinable(new_rinfo);
    check_hashjoinable(new_rinfo);
...
    /*
     * If the clause has a mergejoinable operator, set the EquivalenceClass
     * links. Otherwise, a mergejoinable operator with NULL left_ec/right_ec
     * will cause update_mergeclause_eclasses fails at assertion.
     */
    if (new_rinfo->mergeopfamilies)
        initialize_mergeclause_eclasses(root, new_rinfo);
* It's not an equality clause, so it is not going to be mergejoinable 
nor hashjoinable and we can skip these checks.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] generated columns

2018-11-22 Thread Peter Eisentraut
On 19/11/2018 19:54, Alvaro Herrera wrote:
> It's unclear why you made generated columns on partitions unsupported.
> I'd fix the limitation if possible, but if not, at least document it.

This is explained here:

+   /*
+* Generated columns in partition key expressions:
+*
+* - Stored generated columns cannot work: They are computed
+*   after BEFORE triggers, but partition routing is done
+*   before all triggers.
+*
+* - Virtual generated columns could work.  But there is a
+*   problem when dropping such a table: Dropping a table
+*   calls relation_open(), which causes partition keys to be
+*   constructed for the partcache, but at that point the
+*   generation expression is already deleted (through
+*   dependencies), so this will fail.  So if you remove the
+*   restriction below, things will appear to work, but you
+*   can't drop the table. :-(
+*/

> (I particularly notice that partition key is marked as unsupported in
> the regression test.  Consider partitioning on a SERIAL column, this is
> clearly something that users will expect to work.)

A serial column is not a generated column.
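For example, with the syntax this patch adds:

    -- serial: the default comes from a sequence, independent of other columns
    CREATE TABLE t1 (id serial, payload text);

    -- generated column: computed from other columns of the same row
    CREATE TABLE t2 (x int, x2 int GENERATED ALWAYS AS (x * 2) STORED);

Partitioning on t1.id is partitioning on ordinary stored data; partitioning on
t2.x2 would require computing the expression before routing, which is the
restriction explained above.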

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] generated columns

2018-11-22 Thread Peter Eisentraut
On 15/11/2018 15:10, Robert Haas wrote:
> I don't have a strong position on 1 vs. 2 vs. 3, but I do think it
> would be nicer not to use '\0' as a column value.  I'd suggest you use
> 'n' or '0' or '-' or some other printable character instead.

I had carefully considered this when attidentity was added.  Using '\0'
allows you to use this column as a boolean in C code, which is often
convenient.  Also, there are numerous places where a pg_attribute form
or a tuple descriptor is initialized to all zeroes, which works well for
most fields, and adding one exception like this would create a lot of
extra work and bloat the patch and create potential for future
instability.  Also note that a C char '\0' is represented as '' (empty
string) in SQL, so this also creates a natural representation in SQL.
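For illustration, this is the kind of C-level convenience I mean, using the
existing attidentity field (not code from this patch):

    /* given some TupleDesc tupdesc and attribute number i */
    Form_pg_attribute attr = TupleDescAttr(tupdesc, i);

    /* '\0' means "not an identity column", so the char tests as a boolean */
    if (attr->attidentity)
    {
        /* identity column: attidentity is 'a' (ALWAYS) or 'd' (BY DEFAULT) */
    }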

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-22 Thread Peter Eisentraut
On 19/11/2018 06:18, Haribabu Kommi wrote:
> Amit suggested another option in another mail, so total viable 
> solutions that are discussed as of now are,
> 
> 1. Single API with NULL input treat as invalid value
> 2. Multiple API to deal with NULL input of other values
> 3. Single API with NULL value to treat them as current user, current
> database
>  and NULL queryid.
> 4. Single API with -1 as invalid value, treat NULL as no matching. (The only
> problem with this approach is that until now -1 is also a valid queryid, so
> setting -1 as queryid needs to be avoided.)

Can you show examples of what these would look like?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_upgrade supported versions policy

2018-11-22 Thread Peter Eisentraut
On 22/11/2018 00:47, Andres Freund wrote:
> I can see a few possible policies:
> 
> 1) Support upgrading from the set of releases that were supported when
>the pg_upgrade target version was released. While some will argue that
>this is fairly short, people above it can still upgrade ~10 years
>worth of versions with a single intermediary upgrade.
> 2) Same, but +2 releases or such.
> 3) Support until it gets too uncomfortable.
> 4) Support all versions remotely feasible (i.e. don't drop versions that
>still can be compiled)

I would prefer #3.  That's basically how we handle pg_dump, psql, etc.
normally.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ToDo: show size of partitioned table

2018-11-22 Thread Pavel Stehule
čt 22. 11. 2018 v 15:29 odesílatel Michael Paquier 
napsal:

> On Thu, Nov 22, 2018 at 12:42:14PM +0100, Pavel Stehule wrote:
> > Here my position is strong. \dP for me doesn't mean "tables or
> > indexes" - it means "partition tables with total relation size". I
> > don't see any sense to show tables and indexes in one report.
>
> Please let me disagree on that point.  \dP, \dPt and \dPi are commands
> able to show information about respectively partitioned relations,
> partitioned tables and partitioned indexes, which is not something only
> related to the size of those partitions.  Showing only the level of a
> relation in its hierarchy may be useful, but that's confusing for the
> user without knowing its direct parent or its top-most parent.  For
> multiple levels, the direct parent without the number in the hierarchy
> seems enough to me.  I may be of course wrong in designing those
> concepts.
>

There are two open points:

1. displaying the hierarchy of partitioned structures.
2. what should be displayed by the \dP command.

@1 I agree that this information can be interesting and useful. But I have a
problem with the consistency of this report. When the result is a table, then
I think we can introduce, and should introduce, some new special report for
this - maybe \dPh

that can show the hierarchy of one partitioned table (the table name should be
required)

I think it can be much more readable to have a special report like

\dPh parent_tab
parent_tab
  -> direct partitions    24 kB
  -> child_30_40
  -> direct partitions    16 kB

This is something I can read, and I see (very naturally) the hierarchy of
partitions and the relations between them.

I don't feel well when I see the numbers 40 and 16 in one report; I am much
more comfortable when I see 24 and 16, but for this I need a different
perspective.




> --
> Michael
>


Re: ToDo: show size of partitioned table

2018-11-22 Thread Michael Paquier
On Thu, Nov 22, 2018 at 12:42:14PM +0100, Pavel Stehule wrote:
> Here my position is strong. \dP for me doesn't mean "tables or
> indexes" - it means "partition tables with total relation size". I
> don't see any sense to show tables and indexes in one report.

Please let me disagree on that point.  \dP, \dPt and \dPi are commands
able to show information about respectively partitioned relations,
partitioned tables and partitioned indexes, which is not something only
related to the size of those partitions.  Showing only the level of a
relation in its hierarchy may be useful, but that's confusing for the
user without knowing its direct parent or its top-most parent.  For
multiple levels, the direct parent without the number in the hierarchy
seems enough to me.  I may be of course wrong in designing those
concepts.
--
Michael


signature.asc
Description: PGP signature


Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-22 Thread Amit Kapila
On Thu, Nov 22, 2018 at 7:48 PM Amit Kapila  wrote:
>
> On Thu, Nov 22, 2018 at 7:02 PM Alvaro Herrera  
> wrote:
> >
> > On 2018-Nov-20, Haribabu Kommi wrote:
> >
> > > > > 4. Single API with -1 as invalid value, treat NULL as no matching. (The only
> > > > > problem with this approach is that until now -1 is also a valid queryid, so
> > > > > setting -1 as queryid needs to be avoided.)
> > > >
> > > > Hmm, can we use 0 as default value without any such caveat?
> > >
> > > Yes, with strict and 0 as default value can work.
> > > If there is no problem, I can go ahead with the above changes?
> >
> > I'm not sure I understand this proposal.  Does this mean that passing -1
> > as databaseid / userid would match all databases/users, and passing 0 as
> > queryid would match all queries?
> >
>
> No, for userid/databaseid also it will be 0 (aka InvalidOid).
>

Not sure if the above statement is clear,  but what I wanted to say
was "for all the three userid/databaseid/queryid, default will be 0
(aka InvalidOid)."
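In other words, the signature would be roughly this (just a sketch, not the
final patch):

    CREATE FUNCTION pg_stat_statements_reset(userid oid DEFAULT 0,
                                             dbid oid DEFAULT 0,
                                             queryid bigint DEFAULT 0)
    RETURNS void
    AS 'MODULE_PATHNAME'
    LANGUAGE C STRICT;

    -- reset everything
    SELECT pg_stat_statements_reset();
    -- reset only the entries for one query, across all users and databases
    SELECT pg_stat_statements_reset(queryid := 1234567890);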



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-22 Thread Amit Kapila
On Thu, Nov 22, 2018 at 7:02 PM Alvaro Herrera  wrote:
>
> On 2018-Nov-20, Haribabu Kommi wrote:
>
> > > > 4. Single API with -1 as invalid value, treat NULL as no matching. (The only
> > > > problem with this approach is that until now -1 is also a valid queryid, so
> > > > setting -1 as queryid needs to be avoided.)
> > >
> > > Hmm, can we use 0 as default value without any such caveat?
> >
> > Yes, with strict and 0 as default value can work.
> > If there is no problem, I can go ahead with the above changes?
>
> I'm not sure I understand this proposal.  Does this mean that passing -1
> as databaseid / userid would match all databases/users, and passing 0 as
> queryid would match all queries?
>

No, for userid/databaseid also it will be 0 (aka InvalidOid).


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Constraint documentation

2018-11-22 Thread Patrick Francelle
On 11/15/18 00:02, Tom Lane wrote:
> I think this could be improved some more.  Perhaps something like this
> (I've not bothered with markup...)
> 

> 
> This is a little verbose maybe, but as the text stands, it sounds like
> using a trigger is enough to solve all the consistency problems that
> a cross-row CHECK has.  Which it's not of course.

Thank you for the rewrite; this is much clearer and more explicit that way.

> I'm also wondering whether it's better to put this in the CREATE TABLE
> reference page instead of here.  While there are certainly benefits in
> having the caveat here, I'm a bit troubled by the number of forward
> references to concepts that are described later.  OTOH, a lot of people
> who need the warning might never see it if it's buried in the reference
> material.

To address your remark, I added a small message to the CREATE TABLE
reference page to be more explicit about the topic, so that it serves as
a warning for users reading that section, plus a reference to the
CHECK constraint page where the full explanation is located.

That way, the caveat is mentioned in both pages, but the full
explanation is located only on a single page.
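For context, the unsupported pattern the caveat is about is something like
this made-up example, where the CHECK expression reaches other rows through a
function:

    CREATE TABLE rooms (room_no int PRIMARY KEY);

    CREATE FUNCTION room_exists(r int) RETURNS boolean
        AS 'SELECT EXISTS (SELECT 1 FROM rooms WHERE room_no = r)'
        LANGUAGE sql;

    CREATE TABLE bookings (
        room_no int CHECK (room_exists(room_no))  -- cross-table check
    );

Nothing records the dependency on rooms, so a dump can try to restore the
bookings data before the rooms data and the reload can fail, which is what the
added paragraphs warn about.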


Please, let me know if this is good enough or maybe if I missed
something.

Patrick Francelle

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 61c4a25460..bfe89ef8ae 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -403,6 +403,33 @@ CREATE TABLE products (
 ensure that a column does not contain null values, the not-null
 constraint described in the next section can be used.

+
+   
+
+ PostgreSQL does not support
+ CHECK constraints that reference table data other than
+ the new or updated row being checked.  While a CHECK
+ constraint that violates this rule may appear to work in simple
+ tests, it cannot guarantee that the database will not reach a state
+ in which the constraint condition is false (due to subsequent changes
+ of the other row(s) involved).  This would cause a database dump and
+ reload to fail.  The reload could fail even when the complete
+ database state is consistent with the constraint, due to rows not
+ being loaded in an order that will satisfy the constraint.  If
+ possible, use UNIQUE, EXCLUDE,
+ or FOREIGN KEY constraints to express
+ cross-row and cross-table restrictions.
+
+
+
+ If what you desire is a one-time check against other rows at row
+ insertion, rather than a continuously-maintained consistency
+ guarantee, a custom trigger can be used
+ to implement that.  (This approach avoids the dump/reload problem because
+ pg_dump does not reinstall triggers until after
+ reloading data, so that the check will not be enforced during a dump/reload.)
+
+   
   
 
   
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 50d5597002..d6d1191ddd 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -756,6 +756,8 @@ WITH ( MODULUS numeric_literal, REM
   subqueries nor refer to variables other than columns of the
   current row.  The system column tableoid
   may be referenced, but not any other system column.
+  Also, CHECK constraints that reference other tables
+  are not supported (see ).
  
 
  



Re: reg* checks in pg_upgrade are out of date

2018-11-22 Thread Andrew Dunstan



On 11/21/18 7:12 PM, Andres Freund wrote:

Hi,

It seems the list of reg* types and the check for them in pg_upgrade
have gone out of sync. We have the following reg* types:

  SELECT typname FROM pg_type WHERE typname LIKE 'reg%' order by typname;
┌───┐
│typname│
├───┤
│ regclass  │
│ regconfig │
│ regdictionary │
│ regnamespace  │
│ regoper   │
│ regoperator   │
│ regproc   │
│ regprocedure  │
│ regrole   │
│ regtype   │
└───┘
(10 rows)

but pg_upgrade doesn't consider all of them:

 /*
  * While several relkinds don't store any data, e.g. views, they can
  * be used to define data types of other columns, so we check all
  * relkinds.
  */
 res = executeQueryOrDie(conn,
 "SELECT n.nspname, c.relname, a.attname "
 "FROM   pg_catalog.pg_class c, "
 "   pg_catalog.pg_namespace n, "
 "   pg_catalog.pg_attribute a "
 "WHERE  c.oid = a.attrelid AND "
 "   NOT a.attisdropped AND "
 "   a.atttypid IN ( "
 "   'pg_catalog.regproc'::pg_catalog.regtype, "
 "   'pg_catalog.regprocedure'::pg_catalog.regtype, "
 "   'pg_catalog.regoper'::pg_catalog.regtype, "
 "   'pg_catalog.regoperator'::pg_catalog.regtype, "
 /* regclass.oid is preserved, so 'regclass' is OK */
 /* regtype.oid is preserved, so 'regtype' is OK */
 "   'pg_catalog.regconfig'::pg_catalog.regtype, "
 "   'pg_catalog.regdictionary'::pg_catalog.regtype) AND "
 "   c.relnamespace = n.oid AND "
 "   n.nspname NOT IN ('pg_catalog', 'information_schema')");

(I don't get the order here btw)

ISTM when regrole and regnamespace were added, pg_upgrade wasn't
considered. It turns out that regrole is safe, because we preserve user
oids, but regnamespace isn't afaict.  I don't think it's extremely
likely that users store such reg* columns in tables, but we probably
still should fix this.



yeah, I think you're right, both about the need to fix it and the 
unlikelihood of it occurring in the wild.
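For example, something like this contrived case would silently break across
an upgrade, because schema OIDs are not preserved:

    CREATE SCHEMA myschema;
    CREATE TABLE t (ns regnamespace);
    INSERT INTO t VALUES ('myschema');  -- stored on disk as the schema's OID

    -- after pg_upgrade the stored OID can point at a different schema, or none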



cheers


andrew



--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-22 Thread Alvaro Herrera
On 2018-Nov-20, Haribabu Kommi wrote:

> > > 4. Single API with -1 as invalid value, treat NULL as no matching. (The only
> > > problem with this approach is that until now -1 is also a valid queryid, so
> > > setting -1 as queryid needs to be avoided.)
> >
> > Hmm, can we use 0 as default value without any such caveat?
> 
> Yes, with strict and 0 as default value can work.
> If there is no problem, I can go ahead with the above changes?

I'm not sure I understand this proposal.  Does this mean that passing -1
as databaseid / userid would match all databases/users, and passing 0 as
queryid would match all queries?  This would be surprising, since OID is
unsigned so -1 is a possibly valid value ...

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] btree_gist: fix union implementation for variable length columns

2018-11-22 Thread Pavel Raiskup
On Thursday, July 12, 2018 5:53:18 PM CET Teodor Sigaev wrote:
> > It would be easier to figure this out if the btree_gist code weren't
> > so desperately undocumented.  Teodor, do you remember why it's like
> > this?
> 
> Will look.

Ping on this issue.  I guess the patch I proposed isn't wrong in this
case, I'm just not sure about the commit message.

Pavel






Re: [WIP] CREATE SUBSCRIPTION with FOR TABLES clause (table filter)

2018-11-22 Thread Evgeniy Efimkin

Hello!
A new draft is attached, with table filtering in subscriptions (ADD/DROP) and
allowing non-superusers to use `CREATE SUBSCRIPTION` for their own tables.

14.11.2018, 18:10, "Evgeniy Efimkin" :
> Hello!
> I started work on a patch (draft attached). The draft has changes related only to 
> `CREATE SUBSCRIPTION`.
> I also introduce a new status (DEFFERED) for tables in the `FOR TABLE` clause 
> (but not in the publication).
> A new column in pg_subscription (suballtables) will be used in the `REFRESH` clause.
>
> 09.11.2018, 15:24, "Evgeniy Efimkin" :
>>  Hi!
>>  In order to support create subscription from non-superuser, we need to make 
>> it possible to choose tables on the subscriber side:
>>  1. add `FOR TABLE` clause in `CREATE SUBSCRIPTION`:
>> ```
>>  CREATE SUBSCRIPTION subscription_name
>>  CONNECTION 'conninfo'
>>  PUBLICATION publication_name [, ...]
>>  [ FOR TABLE [ ONLY ] table_name [ * ] [, ...]| FOR ALL TABLES ]
>>  [ WITH ( subscription_parameter [= value] [, ... ] ) ]
>> ```
>> ... where `FOR ALL TABLES` is only allowed for superuser.
>> and table list in `FOR TABLES` clause will be stored in 
>> pg_subscription_rel table (maybe another place?)
>>
>>  2. Each subscription should have "all tables" attribute.
>> For example via a new column in pg_subscription "suballtables".
>>
>>  3. Add `ALTER SUBSCRIPTION (ADD TABLE | DROP TABLE)`:
>> ```
>>  ALTER SUBSCRIPTION subscription_name ADD TABLE [ ONLY ] table_name 
>> [WITH copy_data];
>>  ALTER SUBSCRIPTION subscription_name DROP TABLE [ ONLY ] table_name;
>> ```
>>  4. On `ALTER SUBSCRIPTION  REFRESH PUBLICATION` should check if 
>> table owner equals subscription owner. The check is ommited if subscription 
>> owner is superuser.
>>  5. If superuser calls `ALTER SUBSCRIPTION REFRESH PUBLICATION` on 
>> subscription with table list and non-superuser owner, we should filter 
>> tables which owner is not subscription's owner or maybe we need to raise 
>> error?
>>
>>  What do you think about it? Any objections?
>>
>>  07.11.2018, 00:52, "Stephen Frost" :
>>>   Greetings,
>>>
>>>   * Evgeniy Efimkin (efim...@yandex-team.ru) wrote:
    As a first step I suggest we allow CREATE SUBSCRIPTION for table owner 
 only.
>>>
>>>   That's a nice idea but seems like we would want to have a way to filter
>>>   what tables a subscription follows then..? Just failing if the
>>>   publication publishes tables that we don't have access to or are not the
>>>   owner of seems like a poor solution..
>>>
>>>   Thanks!
>>>
>>>   Stephen
>>
>>  
>>  Ефимкин Евгений
>
> 
> Ефимкин Евгений

 
Ефимкин Евгений
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index e136aa6a0b..5d7841f296 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -70,6 +70,7 @@ GetSubscription(Oid subid, bool missing_ok)
 	sub->name = pstrdup(NameStr(subform->subname));
 	sub->owner = subform->subowner;
 	sub->enabled = subform->subenabled;
+	sub->alltables = subform->suballtables;
 
 	/* Get conninfo */
 	datum = SysCacheGetAttr(SUBSCRIPTIONOID,
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9021463a4c..4fe643c3bc 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -30,6 +30,7 @@
 #include "catalog/pg_subscription.h"
 #include "catalog/pg_subscription_rel.h"
 
+#include "commands/dbcommands.h"
 #include "commands/defrem.h"
 #include "commands/event_trigger.h"
 #include "commands/subscriptioncmds.h"
@@ -322,6 +323,14 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 	char		originname[NAMEDATALEN];
 	bool		create_slot;
 	List	   *publications;
+	AclResult	aclresult;
+	bool 		alltables;
+
+	/* must have CREATE privilege on database */
+	aclresult = pg_database_aclcheck(MyDatabaseId, GetUserId(), ACL_CREATE);
+	if (aclresult != ACLCHECK_OK)
+		aclcheck_error(aclresult, OBJECT_DATABASE,
+	   get_database_name(MyDatabaseId));
 
 	/*
 	 * Parse and check options.
@@ -341,11 +350,13 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 	 */
 	if (create_slot)
 		PreventInTransactionBlock(isTopLevel, "CREATE SUBSCRIPTION ... WITH (create_slot = true)");
-
-	if (!superuser())
+	alltables = !stmt->tables;
+	/* FOR ALL TABLES requires superuser */
+	if (alltables && !superuser())
 		ereport(ERROR,
 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-				 (errmsg("must be superuser to create subscriptions"))));
+				 (errmsg("must be superuser to create FOR ALL TABLES publication"))));
+
 
 	rel = heap_open(SubscriptionRelationId, RowExclusiveLock);
 
@@ -375,6 +386,7 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 
 	/* Check the connection info string. */
 	walrcv_check_conninfo(conninfo);
+	

Re: [HACKERS] Time to change pg_regress diffs to unified by default?

2018-11-22 Thread Christoph Berg
Re: Noah Misch 2017-04-07 <20170407021431.gb2658...@tornado.leadboat.com>
> > > I personally, and I know of a bunch of other regular contributors, find
> > > context diffs very hard to read.  Besides general dislike, for things
> > > like regression test output context diffs are just not well suited.
> > 
> > Personally, I disagree completely.  Unified diffs are utterly unreadable
> > for anything beyond trivial cases of small well-separated changes.
> > 
> > It's possible that regression failure diffs will usually fall into that
> > category, but I'm not convinced.
> 
> For reading patches, I frequently use both formats.  Overall, I perhaps read
> unified 3/4 of the time and context 1/4 of the time.
> 
> For regression diffs, I use PG_REGRESS_DIFF_OPTS=-u and have never converted a
> regression diff to context form.  Hence, +1 for the proposed change.

I've just had another case of horrible context diff from pg_regress.
I'd claim that regression diffs are particularly bad for context diffs
because one error will often trigger a whole chain of failures, which
will essentially make the whole file appear twice in the output,
whereas unified diffs would just put the original output and the error
right next to each other at the top. Having to scroll through a whole
.out file just to find "create extension; file not found" is very
inefficient.
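For interactive runs there is of course already an escape hatch, something
like:

    PG_REGRESS_DIFF_OPTS=-U3 make check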

It's nice that PG_REGRESS_DIFF_OPTS exists, but the diffs are often
coming from automated build logs where setting the variable after the
fact doesn't help.

Please consider the attached patch; extension packagers will thank
you.

Christoph
From a9b1ef089bbcc36dc65e4acff60ae3b83ecd06e3 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Thu, 22 Nov 2018 13:33:42 +0100
Subject: [PATCH] Switch pg_regress to output unified diffs by default

---
 doc/src/sgml/regress.sgml | 2 +-
 src/test/regress/pg_regress.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 673a8c2164..a33e2482f2 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -410,7 +410,7 @@ make standbycheck
 If you don't
 like the diff options that are used by default, set the
 environment variable PG_REGRESS_DIFF_OPTS, for
-instance PG_REGRESS_DIFF_OPTS='-u'.  (Or you
+instance PG_REGRESS_DIFF_OPTS='-C'.  (Or you
 can run diff yourself, if you prefer.)

 
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 3248603da1..bb06624936 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -62,10 +62,10 @@ static char *shellprog = SHELLPROG;
  */
 #ifndef WIN32
 const char *basic_diff_opts = "";
-const char *pretty_diff_opts = "-C3";
+const char *pretty_diff_opts = "-U3";
 #else
 const char *basic_diff_opts = "-w";
-const char *pretty_diff_opts = "-w -C3";
+const char *pretty_diff_opts = "-w -U3";
 #endif
 
 /* options settable from command line */
-- 
2.19.1



Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" test pending solution of its timing is (fwd)

2018-11-22 Thread Dmitry Dolgov
> On Thu, Jul 19, 2018 at 2:24 AM Fabien COELHO  wrote:
>
>
> Hello Heikki,
>
> >> [...]
> >> So threadRun() would not have the opportunity to stop the scheduled
> >> transaction, even if beyond the end of run, because it would not have got
> >> out of doCustom, in the case I outlined above.
> >
> > I see. Instead of moving to FINISHED state, then, we could stay in THROTTLE
> > state, and 'return' out of doCustom(), to give the code in threadRun() a
> > chance to detect that the timer is up. Something like the attached. (I moved
> > the check after the check for latency_limit, because that code updates
> > txn_scheduled. Seems more like a more correct place, although that's as a
> > separate issue.)
>
> Although I think it would work, I do not find it better than the previous
> situation: before the change, throttling simply jumps to finished if time
> is up, while with that option the jump cannot be seen from within doCustom
> and relies on threadRun to do so, which is somehow much harder to see from
> the code because things happen in two functions.
>
> I would rather move state changes from threadRun to doCustom only, so that
> they are all in one place where it is easier to check and understand.
>
> As a PoC, see attached which does that and also records all stats instead
> of trying to be clever, and tries to homogeneise comments somehow. As I
> find it strange that a script can be interrupted in sleep and after a
> shell command, but not in other states, rather documents that once it
> started it will run to its end, and only cut it out before or after, but
> not within. Also, there are no state changes outside doCustom, and
> threadRun only looks at the states for decisions.

Unfortunately, there was no activity over the last few commitfests and the
proposed patch pgbench-tap-progress-6 can't be applied anymore without
conflicts. Fabien, what are your plans about it, could you please post a
rebased version?



Re: pg_upgrade supported versions policy

2018-11-22 Thread Magnus Hagander
On Thu, Nov 22, 2018 at 12:48 AM Andres Freund  wrote:

> Hi,
>
> I feel like we ought to trim the support for a few old versions from
> pg_upgrade.  In my particular case I don't really think it's reasonable
> to test < 9.0 versions for pg_largeobject_metadata migrations. But I
> think we should create a policy that's the default, leaving individual
> cases aside.
>
> I can see a few possible policies:
>
> 1) Support upgrading from the set of releases that were supported when
>the pg_upgrade target version was released. While some will argue that
>this is fairly short, people above it can still upgrade ~10 years
>worth of versions with a single intermediary upgrade.
> 2) Same, but +2 releases or such.
> 3) Support until it gets too uncomfortable.
> 4) Support all versions remotely feasible (i.e. don't drop versions that
>still can be compiled)
>
> I personally think 1 is preferrable and 2 is acceptable.
>

As a developer, I'd like 1. As someone who repeatedly runs into customer
cases, I'd say 2. The reason for that is that far too many don't realize
they should upgrade on time, but at least a fair number of them notice
within one cycle from it going EOL -- perhaps just by reading the
announcement saying "hey version x is now EOL". And supporting +1 or +2
versions makes it possible for them to go directly to latest.

3 and 4 both cause a lot of work on the dev side, but also make it a lot
less predictable for the end users. I'm willing to bet many of them prefer a
defined policy even if it's a bit shorter than the limits we have now, to a
"sorry we don't know" kind of policy. Thus, -1 for both 3 and 4.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-22 Thread Magnus Hagander
On Thu, Nov 22, 2018 at 4:10 AM Amit Kapila  wrote:

> On Mon, Nov 19, 2018 at 10:48 AM Haribabu Kommi
>  wrote:
> >
> > On Mon, Nov 19, 2018 at 1:37 PM Alvaro Herrera 
> wrote:
> >>
> >> On 2018-Nov-19, Michael Paquier wrote:
> >>
> >> > On Mon, Nov 19, 2018 at 10:41:22AM +1100, Haribabu Kommi wrote:
> >> > > So 6 new functions needs to be added to cover all the above cases,
> >> > > IMO, we may need functions for all combinations, because I feel some
> >> > > user may have the requirement of left out combination, in case if
> we choose
> >> > > only some combinations.
> >> >
> >> > That's bloating the interface in my opinion.
> >>
> >> I understand.
> >>
> >> Let's call for a vote from a larger audience.  It's important to get
> >> this interface right, ISTM.
> >
> >
> > Amit suggested another option in another mail, so total viable
> > solutions that are discussed as of now are,
> >
> > 1. Single API with NULL input treat as invalid value
> > 2. Multiple API to deal with NULL input of other values
> > 3. Single API with NULL value to treat them as current user, current
> database
> >  and NULL queryid.
> > 4. Single API with -1 as invalid value, treat NULL as no matching. (The only
> > problem with this approach is that until now -1 is also a valid queryid, so
> > setting -1 as queryid needs to be avoided.)
> >
>
> As discussed above the problem mentioned by Hari in point-4 won't be
> there if we use a default value as 0.
>
> > I prefer single API. I can go with either 3 or 4.
> >
> > opinion from others?
>
> We don't see many opinions from others, but what I can read here is the
> count:
> Option-3: Michael, Hari
> Option-4: Amit, Hari
> Option-2: Alvaro
>
> As Hari seems to be a bit more inclined towards option-4, I think we
> can proceed with it.
>

If you want more input on it, I'd vote for either 2 or 4. Between those
two I have a slight preference for 2, but really not
enough to voice strongly. I do feel more strongly that 1 and 3 are not good
options.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: ToDo: show size of partitioned table

2018-11-22 Thread Pavel Stehule
čt 22. 11. 2018 v 1:51 odesílatel Michael Paquier 
napsal:

> On Wed, Nov 21, 2018 at 05:37:33PM +0100, Pavel Stehule wrote:
> > st 21. 11. 2018 v 17:21 odesílatel Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > napsal:
> >> Hmm, these tests are not going to work, because they have "pavel" in the
> >> expected output.
> >
> > I was blind, thank you for check
>
> +create table testtable_apple(logdate date);
> +create table testtable_orange(logdate date);
> +create index testtable_apple_index on testtable_apple(logdate);
> +create index testtable_orange_index on testtable_orange(logdate);
> There are already a bunch of partition relations with multiple levels
> created as part of the regression tests, so instead of creating more of
> those, I would suggest to test \dP and \dPt in create_table.sql, and
> \dPi in indexing.sql (please make sure to add tests for \dP with
> partitioned indexes as well).
>
> I think that you should really add the direct parent of a partition in
> at least the verbose output, now for multiple partition levels things
> are confusing in my opinion.  For example with such a schema:
> CREATE TABLE parent_tab (id int) PARTITION BY RANGE (id);
> CREATE INDEX parent_index ON parent_tab (id);
> CREATE TABLE child_0_10 PARTITION OF parent_tab
>   FOR VALUES FROM (0) TO (10);
> CREATE TABLE child_10_20 PARTITION OF parent_tab
>   FOR VALUES FROM (10) TO (20);
> CREATE TABLE child_20_30 PARTITION OF parent_tab
>   FOR VALUES FROM (20) TO (30);
> INSERT INTO parent_tab VALUES (generate_series(0,29));
> CREATE TABLE child_30_40 PARTITION OF parent_tab
> FOR VALUES FROM (30) TO (40)
>   PARTITION BY RANGE(id);
> CREATE TABLE child_30_35 PARTITION OF child_30_40
>   FOR VALUES FROM (30) TO (35);
> CREATE TABLE child_35_40 PARTITION OF child_30_40
>FOR VALUES FROM (35) TO (40);
> INSERT INTO parent_tab VALUES (generate_series(30,39));
>
> Then with \dP+ I got that:
> =# \dP+
> List of partitioned relations
>  Schema |Name | Owner  |  Size  | Description
> +-+++-
>  public | child_30_40 | ioltas | 48 kB  |
>  public | parent_tab  | ioltas | 120 kB |
> (2 rows)
> Showing the parent partition looks pretty important to me as I
> would expect multi-level partitions to be a frequent case (perhaps it
> should show up as well in the non-verbose output?).  The field should be
> NULL if the relation is the top of the tree.
>
>
It looks much more like a bug to me.

Your example, on my machine:

                              List of relations
+--------+-------------+-------------------+-------+------------+-------------+
| Schema |    Name     |       Type        | Owner |    Size    | Description |
+--------+-------------+-------------------+-------+------------+-------------+
| public | child_0_10  | table             | pavel | 8192 bytes |             |
| public | child_10_20 | table             | pavel | 8192 bytes |             |
| public | child_20_30 | table             | pavel | 8192 bytes |             |
| public | child_30_35 | table             | pavel | 8192 bytes |             |
| public | child_30_40 | partitioned table | pavel | 0 bytes    |             |
| public | child_35_40 | table             | pavel | 8192 bytes |             |
| public | parent_tab  | partitioned table | pavel | 0 bytes    |             |
+--------+-------------+-------------------+-------+------------+-------------+
(7 rows)

There are 5x 8 kB of data there, about 40 kB in total.

But from the new view I got

  List of partitioned tables
++-+---+---+-+
| Schema |Name | Owner | Size  | Description |
++-+---+---+-+
| public | child_30_40 | pavel | 16 kB | |
| public | parent_tab  | pavel | 40 kB | |
++-+---+---+-+
(2 rows)

There is 16 kB more, which is really messy.

I think the most correct thing is to remove child_30_40 from the report.

test=# SELECT n.nspname as "Schema",
  c.relname as "Name",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
  (SELECT pg_catalog.pg_size_pretty(sum(pg_catalog.pg_table_size(relid)))
 FROM pg_catalog.pg_partition_tree(c.oid)) AS "Size",
  pg_catalog.obj_description(c.oid, 'pg_class') as "Description"
FROM pg_catalog.pg_class c
 LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('p') and not c.relispartition
  AND n.nspname <> 'pg_catalog'
  AND n.nspname <> 'information_schema'
  AND n.nspname !~ '^pg_toast'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
+++---+---+-+
| Schema |Name| Owner | Size  | Description |
+++---+---+-+
| public | parent_tab | pavel | 40 kB | |
+++---+---+-+
(1 row)

I am afraid of an unreadable result if we allow overlap in the report. I think
it can be strange if some disk 

Re: Control your disk usage in PG: Introduction to Disk Quota Extension

2018-11-22 Thread Haozhou Wang
Thank you very much for your review.
We refactored our patch with new names and comments.

For the ReadBufferExtended hook: yes, ReadBuffer with P_NEW will then call
smgrextend.

But in smgrextend, we cannot get the OID of the relation, and it would take
some time to get the OID back from the SMgrRelation.
We would like to add a hook just before the smgrextend to get the OID and
avoid using RelidByRelfilenode().
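Roughly, what we have in mind is something like this (names and exact
placement are only a sketch, not the final patch):

    /* sketch: hook type named after the call site, per your suggestion */
    typedef void (*smgrextend_hook_type) (Oid reloid, ForkNumber forknum,
                                          BlockNumber blocknum);
    extern smgrextend_hook_type smgrextend_hook;

    /* called from a place where the relation's OID is still at hand,
     * e.g. around ReadBufferExtended() when blockNum == P_NEW */
    if (smgrextend_hook)
        smgrextend_hook(RelationGetRelid(reln), forkNum, blockNum);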

The new patch is attached.
Thanks a lot!

Regards,
Haozhou


On Wed, Nov 21, 2018 at 10:48 PM Robert Haas  wrote:

> On Tue, Nov 20, 2018 at 2:20 AM Haozhou Wang  wrote:
> > We prepared a patch that includes the hook points. And such hook points
> are needed for disk quota extension.
> > There are two hooks.
> > One is SmgrStat_hook. It's used to perform ad-hoc logic in storage when
> doing smgr create/extend/truncate in general. As for disk quota extension,
> this hook is used to detect active tables(new created tables, tables
> extending new blocks, or tables being truncated)
> > The other is BufferExtendCheckPerms_hook. It's used to perform ad-hoc
> logic when buffer extend a new block. Since ReadBufferExtended is a hot
> function, we call this hook only when blockNum == P_NEW. As  for disk quota
> extension, this hook is used to do query enforcement during the query is
> loading data.
> >
> > Any comments are appreciated.
>
> +1 for adding some hooks to support this kind of thing, but I think
> the names you've chosen are not very good.  The hook name should
> describe the place from which it is called, not the purpose for which
> one imagines that it will be used, because somebody else might imagine
> another use.  Both BufferExtendCheckPerms_hook_type and
> SmgrStat_hook_type are imagining that they know what the hook does -
> CheckPerms in the first case and Stat in the second case.
>
> For this particular purpose, I don't immediately see why you need a
> hook in both places.  If ReadBuffer is called with P_NEW, aren't we
> guaranteed to end up in smgrextend()?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


disk_quota_hooks_v2.patch
Description: Binary data


Re: A WalSnd issue related to state WALSNDSTATE_STOPPING

2018-11-22 Thread Paul Guo
On Thu, Nov 22, 2018 at 1:29 PM Michael Paquier  wrote:

> On Wed, Nov 21, 2018 at 04:09:41PM +0900, Michael Paquier wrote:
> > The checkpointer initializes a shutdown checkpoint where it tells to all
> > the WAL senders to stop once all the children processes are gone, so it
> > seems to me that there is little point in processing
> > SyncRepReleaseWaiters() when a WAL sender is in WALSNDSTATE_STOPPING
> > state as at this stage syncrep makes little sense.  It is still
> > necessary to process standby messages at this stage so as the primary
> > can shut down when it is sure that all the standbys have flushed the
> > shutdown checkpoint record of the primary.
>
> Just refreshed my memory with c6c33343, which is actually at the origin
> of the issue, and my previous argument is flawed.  If a WAL sender has
> reached WALSNDSTATE_STOPPING no regular backends are around but a WAL
> sender could always commit a transaction in parallel which may need to
> make sure that its record is flushed and sync'ed, and this needs to make
> sure that waiters are correctly released.  So it is necessary to patch
> up SyncRepGetSyncStandbysPriority and SyncRepGetSyncStandbysQuorum as
> mentioned upthread, perhaps adding a comment when looking at
> MyWalSnd->state looks adapted. Paul, would you like to write a patch?
> --
> Michael
>

Yes, please see the attached patch.


0001-Allow-stopping-wal-senders-to-be-invovled-in-SyncRep.patch
Description: Binary data


Re: Ordered Partitioned Table Scans

2018-11-22 Thread David Rowley
On Mon, 5 Nov 2018 at 10:46, David Rowley  wrote:
> On 1 November 2018 at 22:05, Antonin Houska  wrote:
> > I think these conditions are too restrictive:
> >
> > /*
> >  * Determine if these pathkeys match the partition order, or reverse
> >  * partition order.  It can't match both, so only go to the trouble 
> > of
> >  * checking the reverse order when it's not in ascending partition
> >  * order.
> >  */
> > partition_order = pathkeys_contained_in(pathkeys,
> > partition_pathkeys);
> > partition_order_desc = !partition_order &&
> > pathkeys_contained_in(pathkeys,
> > 
> > partition_pathkeys_desc);
> >

> The problem with doing that is that if the partition keys are better
> than the pathkeys then we'll most likely fail to generate any
> partition keys at all due to lack of any existing eclass to use for
> the pathkeys. It's unsafe to use just the prefix in this case as the
> eclass may not have been found due to, for example one of the
> partition keys having a different collation than the required sort
> order of the query. In other words, we can't rely on a failure to
> create the pathkey meaning that a more strict sort order is not
> required.

I had another look at this patch and it seems okay just to add a new
flag to build_partition_pathkeys() to indicate if the pathkey List was
truncated or not.  In generate_mergeappend_paths() we can then just
check that flag before checking if the partition pathkeys are
contained in pathkeys. It's fine if the partition keys were truncated
for the reverse of that check.

I've done this in the attached and added additional regression tests
for this case.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v5-0001-Allow-Append-to-be-used-in-place-of-MergeAppend-f.patch
Description: Binary data


tab-completion debug print

2018-11-22 Thread Kyotaro HORIGUCHI
Hello.

Hearing the same complaint from a colleague of mine just now reminded me
that I have often been annoyed by trying to identify the code that made
a particular word-completion.

Something like the attached, which tweaks the completion_matches calls,
lets psql emit the line number where a word-completion
happens. The output can be split off using redirection so that it
doesn't break into the conversation on the console.

(make -s COPT=-DTABCOMPLETION_DEBUG install)

$ psql postgres 2>~/debug.out
=# alt[tab]er [tab]t[tab]ab[tab] [tab]

You can see the following output in another bash session.

$ tail -f ~/debug.out
[1414][1435][1435][1435][1431]

Every number enclosed by brackets is the line number in
tab-complete.c, where completion happens.

Is this useful? Any suggestions, thoughts?


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 9dbd555166..2deb892de6 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -60,6 +60,23 @@ extern char *filename_completion_function();
 #define completion_matches rl_completion_matches
 #endif
 
+/*
+ * By enabling the following definition the source line number is emitted to
+ * stderr for every completion attempt. You can isolate them from console
+ * interaction by redirecting stderr into a file.
+ */
+#ifdef TABCOMPLETION_DEBUG
+#ifdef HAVE_RL_COMPLETION_MATCHES
+#define org_completion_matches rl_completion_matches
+#else
+#define org_completion_matches completion_matches
+#endif
+
+#undef completion_matches
+#define completion_matches(text, list) \
+	(fprintf(stderr, "[%d]", __LINE__), org_completion_matches(text, list))
+#endif
+
 /* word break characters */
 #define WORD_BREAKS		"\t\n@$><=;|&{() "
 


Re: cursors with prepared statements

2018-11-22 Thread Dmitry Dolgov
> On Wed, Jul 18, 2018 at 10:27 AM Heikki Linnakangas  wrote:
>
> On 16/07/18 15:56, Peter Eisentraut wrote:
> > On 11.07.18 19:07, Heikki Linnakangas wrote:
> >> It's confusing, and risks conflicting with future additions to
> >> the standard. ECPG supports the actual standard syntax, with OPEN,
> >> right? So this wouldn't be consistent with ECPG, either.
> >
> > It would be consistent for the case of no parameters.
>
> True. Except that if I understand correctly, in the standard syntax you
> still need to use OPEN after the DECLARE CURSOR, even when there are no
> parameters.
>
> >>> Curiously, the direct EXECUTE statement uses the non-standard syntax
> >>>
> >>>   EXECUTE prep_stmt (param, param);
> >>>
> >>> instead of the standard
> >>>
> >>>   EXECUTE prep_stmt USING param, param;
> >>>
> >>> I tried to consolidate this.  But using
> >>>
> >>>   DECLARE c CURSOR FOR p (foo, bar)
> >>>
> >>> leads to parsing conflicts (and looks confusing?),
> >>
> >> How about
> >>
> >> DECLARE c CURSOR FOR EXECUTE p (foo, bar)
> >
> > That's not the standard syntax for the case of no parameters.
>
> My thinking here is that "DECLARE c CURSOR FOR " is standard
> syntax. And we already have "EXECUTE p (foo, bar)" as a form of
> statement, along with "SELECT ...", "EXPLAIN ..." and so forth. Allowing
> "DECLARE c CURSOR FOR EXECUTE p (foo, bar)" would not introduce a new
> syntax, it would just allow the existing two commands, DECLARE CURSOR,
> and EXECUTE, to be used together.

This patch went through the last few commitfests without any noticeable
activity. Both suggested patches are still good (they can be applied and pass all
the tests, except for a minor text mismatch in the original one), but it looks like
the discussion stopped right in the middle. Are there any more opinions about
OPEN vs DECLARE .. CURSOR FOR here, or any other plans for the patch?



Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan

2018-11-22 Thread Rushabh Lathia
Hi,

Commit b04aeb0a053e7cf7faad89f7d47844d8ba0dc839 added assertions that we hold
some relevant lock during relation open, as opening a relation with no lock
at all is unsafe.

With the above commit, the below test case fails, hitting the newly added
Assert().

Test case:
===

CREATE TABLE foo (x int primary key);
INSERT INTO foo VALUES (1), (2), (3), (4), (5);

CREATE OR REPLACE FUNCTION f1(a int) RETURNS int
AS $$
BEGIN
 DELETE FROM foo where x = a;
 return 0;
END;
$$ LANGUAGE plpgsql;

postgres@100858=#set plan_cache_mode = force_generic_plan;
SET
postgres@100858=#select f1(4);
 f1

  0
(1 row)

postgres@100858=#select f1(4);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


Call stack:
=

#0  0x7f230b2d61d7 in raise () from /lib64/libc.so.6
#1  0x7f230b2d78c8 in abort () from /lib64/libc.so.6
#2  0x00a26ce5 in ExceptionalCondition (
conditionName=0xabdca8 "!(lockmode != 0 || (Mode ==
BootstrapProcessing) || CheckRelationLockedByMe(r, 1, 1))",
errorType=0xabc529 "FailedAssertion",
fileName=0xabc668 "heapam.c", lineNumber=1146) at assert.c:54
#3  0x004dacd8 in relation_open (relationId=16387, lockmode=0) at
heapam.c:1144
#4  0x004f94bf in index_open (relationId=16387, lockmode=0) at
indexam.c:155
#5  0x00701403 in ExecInitIndexScan (node=0x1cf8fd8,
estate=0x1cec418, eflags=0) at nodeIndexscan.c:994
#6  0x006dde31 in ExecInitNode (node=0x1cf8fd8, estate=0x1cec418,
eflags=0) at execProcnode.c:217
#7  0x0070d7b8 in ExecInitModifyTable (node=0x1cf8da8,
estate=0x1cec418, eflags=0) at nodeModifyTable.c:2190
#8  0x006ddd39 in ExecInitNode (node=0x1cf8da8, estate=0x1cec418,
eflags=0) at execProcnode.c:174
#9  0x006d48a5 in InitPlan (queryDesc=0x1cdc378, eflags=0) at
execMain.c:1024
#10 0x006d36e9 in standard_ExecutorStart (queryDesc=0x1cdc378,
eflags=0) at execMain.c:265
#11 0x006d3485 in ExecutorStart (queryDesc=0x1cdc378, eflags=0) at
execMain.c:147
#12 0x00725880 in _SPI_pquery (queryDesc=0x1cdc378,
fire_triggers=true, tcount=0) at spi.c:2469
#13 0x007252d4 in _SPI_execute_plan (plan=0x1cc4748,
paramLI=0x1ce23c8, snapshot=0x0, crosscheck_snapshot=0x0, read_only=false,
fire_triggers=true,
tcount=0) at spi.c:2235
#14 0x007222a1 in SPI_execute_plan_with_paramlist (plan=0x1cc4748,
params=0x1ce23c8, read_only=false, tcount=0) at spi.c:516
#15 0x7f23004a5f9a in exec_stmt_execsql (estate=0x7fff0bcaa4d0,
stmt=0x1cc78f0) at pl_exec.c:4114

Here the DELETE statement picks an IndexScan plan, and later during execution,
when it tries to open the index relation, ExecInitIndexScan() assumes that the
index relation is already locked.  But in this case it is not, which got
exposed through the newly added Assert() for NoLock.

When the query gets executed the first time, build_simple_rel() ->
get_relation_info() takes a lock on the indexes of the relation (this happens
from BuildCachedPlan()).  When we execute the same query a second time, the
plan is cached and CheckCachedPlan() calls AcquireExecutorLocks() to take the
locks needed for execution of the cached plan.  Here it takes locks on the
rtable list, but doesn't take any lock on the index relations.

I observed that ExecInitModifyTable() opens the index relations if there are
indexes on the result relation. But here also, it doesn't do anything for
CMD_DELETE:

/*
 * If there are indices on the result relation, open them and save
 * descriptors in the result relation info, so that we can add new
 * index entries for the tuples we add/update.  We need not do this
 * for a DELETE, however, since deletion doesn't affect indexes. Also,
 * inside an EvalPlanQual operation, the indexes might be open
 * already, since we share the resultrel state with the original
 * query.
 */
if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
operation != CMD_DELETE &&
resultRelInfo->ri_IndexRelationDescs == NULL)
ExecOpenIndices(resultRelInfo,
node->onConflictAction != ONCONFLICT_NONE);

Now, to fix this issue, either we need to hold a proper lock before reaching
ExecInitIndexScan(), or teach ExecInitIndexScan() to take an AccessShareLock
when the scan is coming from CMD_DELETE.
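For reference, the index-opening logic in ExecInitIndexScan() is roughly the
following (paraphrased, not an exact quote), so the second option essentially
means not trusting relistarget when ExecInitModifyTable() never opened the
index:

    /* paraphrased from ExecInitIndexScan() */
    relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
    indexstate->iss_RelationDesc = index_open(node->indexid,
                                              relistarget ? NoLock
                                                          : AccessShareLock);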

Thoughts/Comments?

Thanks,
Rushabh Lathia
www.EnterpriseDB.com


[Logical replication] Does the initial table data copy break the transactional replication?

2018-11-22 Thread Jinhua Luo
Hi All,

When a new table is added to the publication, the main apply worker skips
changes for that table and launches a new sync worker to copy the
existing data of that table, and if needed, the sync worker then
applies the changes made since the copy start LSN.

Then here comes the question: the above logic splits the changes of the
newly seen table into separate transactions applied by the sync worker, so
doesn't it break the original transaction stream?

Regards,
Jinhua Luo



Re: performance statistics monitoring without spamming logs

2018-11-22 Thread Adrien NAYRAT

On 11/22/18 6:41 AM, Justin Pryzby wrote:

  and then probably look at the ratio of user CPU/clock time.


Maybe pg_stat_kcache could help you:

https://github.com/powa-team/pg_stat_kcache
https://rjuju.github.io/postgresql/2018/07/17/pg_stat_kcache-2-1-is-out.html