Re: backup manifests

2020-03-04 Thread Rajkumar Raghuwanshi
Hi,

In a negative test scenario, if I change a file size to -1 in backup_manifest,
pg_validatebackup gives an error with a seemingly random size number.

[edb@localhost bin]$ ./pg_basebackup -p 5551 -D /tmp/bold
--manifest-checksum 'SHA256'
[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: backup successfully verified

--change a file size to -1 and generate new checksum.
[edb@localhost bin]$ vi /tmp/bold/backup_manifest
[edb@localhost bin]$ shasum -a256 /tmp/bold/backup_manifest
c3d7838cbbf991c6108f9c1ab78f673c20d8073114500f14da6ed07ede2dc44a
 /tmp/bold/backup_manifest
[edb@localhost bin]$ vi /tmp/bold/backup_manifest

[edb@localhost bin]$ ./pg_validatebackup /tmp/bold
pg_validatebackup: error: "global/4183" has size 0 on disk but size
*18446744073709551615* in the manifest
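(For reference, 18446744073709551615 is 2^64 - 1, i.e. the -1 written into
the manifest read back as an unsigned 64-bit integer, so the number is not
actually random. A minimal C illustration of just that arithmetic, unrelated
to the pg_validatebackup sources:)

#include <inttypes.h>
#include <stdio.h>

int
main(void)
{
	int64_t		size = -1;	/* the value edited into the manifest */

	/* prints 18446744073709551615, the "random" size in the error above */
	printf("%" PRIu64 "\n", (uint64_t) size);
	return 0;
}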

Thanks & Regards,
Rajkumar Raghuwanshi


On Thu, Mar 5, 2020 at 9:37 AM Suraj Kharage 
wrote:

>
> On Wed, Mar 4, 2020 at 7:21 PM tushar 
> wrote:
>
>> Hi,
>>
>> There is a scenario in which I add something inside the pg_tblspc
>> directory, and I get an error like-
>>
>> pg_validatebackup: * manifest_checksum =
>> 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
>> pg_validatebackup: error: "pg_tblspc/16385/*PG_13_202002271*/test" is
>> present on disk but not in the manifest
>>
>> but if I remove the 'PG_13_202002271' directory then there is no error
>>
>> [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data
>> pg_validatebackup: * manifest_checksum =
>> 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
>> pg_validatebackup: backup successfully verified
>>
>>
> This seems expected given the current design, as we don't log
> directory entries in backup_manifest. In your case, the tablespace has
> no objects (an empty tablespace), so backup_manifest has no entry for
> it; hence when you remove the tablespace directory, the validator
> cannot detect that.
>
> We can either document it or add the entry for directories in the
> manifest. Robert may have a better idea on this.
>
> --
> --
>
> Thanks & Regards,
> Suraj kharage,
> EnterpriseDB Corporation,
> The Postgres Database Company.
>


Re: Identifying user-created objects

2020-03-04 Thread Kyotaro Horiguchi
At Thu, 5 Mar 2020 15:21:49 +0900, Masahiko Sawada 
 wrote in 
> > > I can't come up with other use cases but, anyway, I think we need to
> > > clarify the scope of the feature.
> >
> > Agreed. Also we would need to consider whether the existing approach
> > (e.g., checking whether the object is defined under pg_catalog or not,
> > or looking at pg_stat_user_functions, _indexes, and _tables) is enough
> > for the use cases. If it is, a new function might not be necessary.
> > If not, we might also need to reconsider the definitions of
> > pg_stat_user_xxx after considering the function.
> >
> 
> Originally the motivation of this feature is that while studying PCI
> DSS 2.2.5 I thought that a running PostgreSQL server is not able to
> prove that there is no malicious function in the database. PCI DSS 2.2.5
> states "Remove all unnecessary functionality, such as scripts,
> drivers, features, subsystems, file systems, and unnecessary web
> servers." If we want to identify unnecessary or malicious functions we
> can check the public schema and user schemas, but once a function is
> created in pg_proc we cannot distinguish whether it's a built-in (i.e.
> safe) function or not. I totally agree that if someone malicious logs in
> as a superuser he/she can do anything more serious than changing catalog
> contents, but I wanted to have a way to prove the soundness of a running
> database.

Thanks for the elaboration.  It doesn't seem to me to be the
responsibility of the PostgreSQL program.  The same can be said of OSes.

I think the section is not saying "keep your system with only the
components installed by default", but "remove all features unnecessary
to your system, even ones installed by default, as far as you can".
And whether a system needs a feature or not cannot be a matter for
PostgreSQL or the OS to decide.

So you need to remove some system-administrative functions if you know
they are not required by your system, in order to conform to the
requirement. But those would be "non-user-defined" objects.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: PG_COLOR not mentioned in docs of vacuumlo, oid2name and pgbench

2020-03-04 Thread Michael Paquier
On Wed, Mar 04, 2020 at 10:05:30PM +0900, Michael Paquier wrote:
> On Wed, Mar 04, 2020 at 11:31:27AM +0100, Fabien COELHO wrote:
>> For "pgbench", you could also add the standard sentence that it uses libpq
>> environment variables, as it is also missing?
> 
> Yeah, that's true.  Let's fix this part while on it.

So, combining the feedback from Fabien, Juan and Daniel I am finishing
with the attached.  Any thoughts?
--
Michael
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index f41084d2db..88da25167a 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -2279,7 +2279,6 @@ main(int argc, char **argv)
 	int			c;
 	pgpid_t		killproc = 0;
 
-	pg_logging_init(argv[0]);
 	progname = get_progname(argv[0]);
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_ctl"));
 	start_time = time(NULL);
diff --git a/doc/src/sgml/oid2name.sgml b/doc/src/sgml/oid2name.sgml
index c7ebd61c6b..acb7ee757f 100644
--- a/doc/src/sgml/oid2name.sgml
+++ b/doc/src/sgml/oid2name.sgml
@@ -223,6 +223,13 @@
utilities, also uses the environment variables supported by
libpq (see ).
   
+
+  
+   The environment variable PG_COLOR specifies whether to use
+   color in diagnostics messages. Possible values are
+   always, auto and
+   never.
+  
  
 
  
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index b25845ffc6..0633a42981 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -279,8 +279,8 @@ PostgreSQL documentation
 PG_COLOR
 
  
-  Specifies whether to use color in diagnostics messages.  Possible values
-  are always, auto,
+  Specifies whether to use color in diagnostics messages. Possible values
+  are always, auto and
   never.
  
 
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index 8fc8128bf9..b10216d154 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -327,8 +327,8 @@ PostgreSQL documentation
 PG_COLOR
 
  
-  Specifies whether to use color in diagnostics messages.  Possible values
-  are always, auto,
+  Specifies whether to use color in diagnostics messages. Possible values
+  are always, auto and
   never.
  
 
diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index abe25f17d0..d2349d3aca 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -405,8 +405,8 @@ PostgreSQL documentation
 PG_COLOR
 
  
-  Specifies whether to use color in diagnostics messages.  Possible values
-  are always, auto,
+  Specifies whether to use color in diagnostics messages. Possible values
+  are always, auto and
   never.
  
 
diff --git a/doc/src/sgml/ref/dropdb.sgml b/doc/src/sgml/ref/dropdb.sgml
index f79bbb9a85..e4daa4f37d 100644
--- a/doc/src/sgml/ref/dropdb.sgml
+++ b/doc/src/sgml/ref/dropdb.sgml
@@ -245,8 +245,8 @@ PostgreSQL documentation
 PG_COLOR
 
  
-  Specifies whether to use color in diagnostics messages.  Possible values
-  are always, auto,
+  Specifies whether to use color in diagnostics messages. Possible values
+  are always, auto and
   never.
  
 
diff --git a/doc/src/sgml/ref/dropuser.sgml b/doc/src/sgml/ref/dropuser.sgml
index 72bb7e8535..c1c9a68264 100644
--- a/doc/src/sgml/ref/dropuser.sgml
+++ b/doc/src/sgml/ref/dropuser.sgml
@@ -225,8 +225,8 @@ PostgreSQL documentation
 PG_COLOR
 
  
-  Specifies whether to use color in diagnostics messages.  Possible values
-  are always, auto,
+  Specifies whether to use color in diagnostics messages. Possible values
+  are always, auto and
   never.
  
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index da5c8f5307..16941df3c8 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -466,8 +466,8 @@ PostgreSQL documentation
 PG_COLOR
 
  
-  Specifies whether to use color in diagnostics messages.  Possible values
-  are always, auto,
+  Specifies whether to use color in diagnostics messages. Possible values
+  are always, auto and
   never.
  
 
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index fc9ba2e8b0..ed1706208d 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -706,8 +706,8 @@ PostgreSQL documentation
 
   
The environment variable PG_COLOR specifies whether to use
-   color in diagnostics messages.  Possible values are
-   always, auto,
+   color in diagnostics messages. Possible values are
+   always, auto and
never.
   
  
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 162bafdb33..9127c2251f 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -191,8 +191,8 @@ PostgreSQL documentation
  

Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-04 Thread Peter Eisentraut

On 2020-03-05 05:53, Fujii Masao wrote:

Or, as another approach, it might be worth considering making
the server always estimate the total backup size whether --progress is
specified or not, as Amit argued upthread. If the time required to
estimate the backup size is negligible compared to the total backup time,
IMO this approach seems better. If we adopt this, we can also get
rid of the PROGRESS option from the BASE_BACKUP replication command.


I think that would be preferable.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Make mesage at end-of-recovery less scary.

2020-03-04 Thread Kyotaro Horiguchi
Hello.

I changed the condition from randAccess to fetching_ckpt considering
the discussion in another thread [1]. Then I moved the block that
shows the new messages to a more appropriate place.

At Fri, 28 Feb 2020 17:28:06 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> > 
> > Have you considered an error context here?  Your patch leads to a bit
> > of duplication with the message a bit down of what you are changing
> > where the end of local pg_wal is reached.
> 
> It is a DEBUG message and it is for the transition from crash
> recovery to archive recovery. I could remove it but decided to leave
> it for traceability.

I modified the message so that it has the same look as the new
messages, but I left it at DEBUG1, since it is just an intermediate
state. We should finally see one of the three new messages.

After the messages were changed, another message from the WAL sender
came to look redundant.

| [20866] LOG:  replication terminated by primary server
| [20866] DETAIL:  End of WAL reached on timeline 1 at 0/30001C8.
| [20866] FATAL:  could not send end-of-streaming message to primary: no COPY 
in progress
| [20851] LOG:  reached end of WAL at 0/30001C8 on timeline 1 in archive during 
standby mode
| [20851] DETAIL:  invalid record length at 0/30001C8: wanted 24, got 0

I changed the above to the below, which looks more appropriate.

| [24271]  LOG:  replication terminated by primary server on timeline 1 at 
0/3000240.
| [24271]  FATAL:  could not send end-of-streaming message to primary: no COPY 
in progress
| [24267]  LOG:  reached end of WAL at 0/3000240 on timeline 1 in archive 
during standby mode
| [24267]  DETAIL:  invalid record length at 0/3000240: wanted 24, got 0

> > > + * reached the end of WAL.  Otherwise something's really wrong and
> > > + * we report just only the errormsg if any. If we don't receive
> > 
> > This sentence sounds strange to me.  Or you meant "Something is wrong,
> > so use errormsg as report if it is set"?

The message no longer exists.

> > > + (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during crash 
> > > recovery",
> > 
> > FWIW, you are introducing three times the same typo, in the same
> > word, in three different messages.
> 
> They're copy-pastos.  I refrained from constructing an error message
> from multiple non-independent parts.  Are you suggesting doing so?

Repeating almost the same phrase three times is very unreadable. I
rewrote it in a simpler shape.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 17ee82e5d44dd5a932ed69b8a1ea91a23d170952 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 28 Feb 2020 15:52:58 +0900
Subject: [PATCH v2] Make End-Of-Recovery error less scary

When recovery of any type ends, we see a somewhat scary error message
like "invalid record length" that suggests something serious is
happening.  Make this message less scary, as "reached end of WAL".
---
 src/backend/access/transam/xlog.c | 72 ---
 src/backend/replication/walreceiver.c |  3 +-
 2 files changed, 55 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d19408b3be..849cf6fe6b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4282,12 +4282,15 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 	for (;;)
 	{
 		char	   *errormsg;
+		XLogRecPtr	ErrRecPtr = InvalidXLogRecPtr;
 
 		record = XLogReadRecord(xlogreader, &errormsg);
 		ReadRecPtr = xlogreader->ReadRecPtr;
 		EndRecPtr = xlogreader->EndRecPtr;
 		if (record == NULL)
 		{
+			ErrRecPtr = ReadRecPtr ? ReadRecPtr : EndRecPtr;
+
 			if (readFile >= 0)
 			{
 close(readFile);
@@ -4295,14 +4298,16 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 			}
 
 			/*
-			 * We only end up here without a message when XLogPageRead()
-			 * failed - in that case we already logged something. In
-			 * StandbyMode that only happens if we have been triggered, so we
-			 * shouldn't loop anymore in that case.
+			 * If we are fetching checkpoint, we emit the error message right
+			 * now. Otherwise the error is regarded as "end of WAL" and the
+			 * message if any is shown as a part of the end-of-WAL message
+			 * below.
 			 */
-			if (errormsg)
-ereport(emode_for_corrupt_record(emode, EndRecPtr),
+			if (fetching_ckpt && errormsg)
+			{
+ereport(emode_for_corrupt_record(emode, ErrRecPtr),
 		(errmsg_internal("%s", errormsg) /* already translated */ ));
+			}
 		}
 
 		/*
@@ -4332,11 +4337,12 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 			/* Great, got a record */
 			return record;
 		}
-		else
-		{
-			/* No valid record available from this source */
-			lastSourceFailed = true;
 
+		/* No valid record available from this source */
+		lastSourceFailed = true;
+
+		if (!fetching_ckpt)
+		{
 			/*
 			 * If archive recovery was requested, but we were still doing
 			 * crash recovery, switch to archive recovery and retry using t

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-04 Thread Amit Kapila
On Wed, Mar 4, 2020 at 11:45 AM Mahendra Singh Thalor
 wrote:
>
> On Mon, 24 Feb 2020 at 15:39, Amit Kapila  wrote:
> >
> > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund  wrote:
> > >
> > > What I'm advocating is that extension locks should continue to go
> > > through lock.c. And yes, that requires some changes to group locking,
> > > but I still don't see why they'd be complicated.
> > >
> >
> > Fair position. As per my initial analysis, I think if we do the below
> > three things, it should work out without changing to a new way of locking
> > for relation extension or page type locks.
> > a. As per the discussion above, ensure in code we will never try to
> > acquire another heavy-weight lock after acquiring relation extension
> > or page type locks (probably by having Asserts in code or maybe some
> > other way).
> > b. Change lock.c so that group locking is not considered for these two
> > lock types. For ex. in LockCheckConflicts, along with the check (if
> > (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)),
> > we also check lock->tag and call it a conflict for these two locks.
> > c. The deadlock detector can ignore checking these two types of locks
> > because point (a) ensures that those won't lead to deadlock.  One idea
> > could be that FindLockCycleRecurseMember just ignores these two types
> > of locks by checking the lock tag.
>
> Thanks Amit for summary.
>
> Based on above 3 points, here attaching 2 patches for review.
>
> 1. v01_0001-Conflict-EXTENTION-lock-in-group-member.patch (Patch by Dilip 
> Kumar)
> Basically this patch is for point b and c.
>
> 2. v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any.patch
> (Patch by me)
> This patch is for point a.
>
> After applying both the patches, make check-world is passing.
>
> We are testing both the patches and will post results.
>

I think we need to do a detailed code review of the places where we are
taking the Relation Extension Lock and see whether we are acquiring
another heavy-weight lock after that. It seems to me that in
brin_getinsertbuffer, after acquiring the Relation Extension Lock, we
might again try to acquire the same lock.  See
brin_initialize_empty_new_buffer, which is called after acquiring the
Relation Extension Lock; in that function, we call
RecordPageWithFreeSpace, and that can again try to acquire the same
lock if it needs to perform fsm_extend.  I think there will be similar
instances in the code.  It is probably fine if we try to acquire it
again, but the current assertion in your patch needs to be adjusted for
that.

Few other minor comments on
v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any:
1. Ideally, this should be the first patch as we first need to ensure
that we don't take any heavy-weight locks after acquiring a relation
extension lock.

2. I think it is better to add an Assert after initial error checks
(after RecoveryInProgress().. check)

3.
+ Assert (locallock->tag.lock.locktag_type != LOCKTAG_RELATION_EXTEND ||
+ locallock->nLocks == 0);

I think it is not possible that we have an entry in
LockMethodLocalHash and its value is zero.  Do you see any such
possibility, if not, then we might want to remove it?

4. We already have a macro for LOCALLOCK_LOCKMETHOD; can we write
another one for the tag type?  This will make the check look a bit cleaner,
and if we need to extend it in future for Page type locks, it will also
be good (one possible shape is sketched after this list).

5. I have also tried to think of another way to check if we already
hold lock type LOCKTAG_RELATION_EXTEND, but couldn't come up with a
cheaper way than this.  Basically, I think if we traverse the
MyProc->myProcLocks queue, we will get this information, but that
doesn't seem much cheaper than this.

6. Another possibility is to make this a test-and-elog so that it can
be hit in production scenarios, but I think the cost of that will be
high unless we have a very simple way to write this test condition.
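One possible shape for the macro from point 4, mirroring the existing
LOCALLOCK_LOCKMETHOD in src/include/storage/lock.h (a sketch only, not
part of the posted patches):

#define LOCALLOCK_LOCKTAG(llock) \
	((LockTagType) (llock).tag.lock.locktag_type)

/* which would let the assertion read, for example: */
Assert(LOCALLOCK_LOCKTAG(*locallock) != LOCKTAG_RELATION_EXTEND);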

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-04 Thread David Rowley
On Thu, 5 Mar 2020 at 04:15, Laurenz Albe  wrote:
> I just realized that the exercise is pointless unless that
> autovacuum also runs with FREEZE on.

I think we need to move forward with doing something to cope with
INSERT-only tables not being auto-vacuumed.

I think the patch you have is something along the lines of what I'd
have imagined we should do. However, there are a few things that I'd
do a different way.

1. I'd go for 2 new GUCs and reloptions:
autovacuum_vacuum_insert_threshold (you're currently calling this
autovacuum_vacuum_insert_limit; I don't see why the word "limit" is
relevant here). The other GUC I think should be named
autovacuum_vacuum_insert_scale_factor, and these should work exactly
the same way as autovacuum_vacuum_threshold and
autovacuum_vacuum_scale_factor, but only be applied after we've checked
to ensure the table is not otherwise eligible to be vacuumed.

2. I believe you're right in setting the freeze_min_age to 0 rather
than the default vacuum_freeze_min_age.  My understanding of freeze_min_age
is that we don't freeze tuples too proactively because, in many workloads, a
freshly INSERTed tuple is more likely to receive an UPDATE than an old tuple
is, e.g. something like new orders having their status updated various
times until the order is complete, after which it's never updated again.
In the INSERT-only case, there seems to be little reason not to just
freeze right away.

3. The name "insert_only" does not seem the best for the new boolean
variable that you're using in various places.  That name seems to be
too closely related to our current intended use case. Maybe
skip_index_cleanup is more to the point.

4. Are you sure you mean "Maximum" here?  Isn't it the minimum? At
least it will be once you add both options. Otherwise, I think Maximum
is not the correct word. Perhaps "The threshold"

+ {"autovacuum_vacuum_insert_limit", PGC_SIGHUP, AUTOVACUUM,
+ gettext_noop("Maximum number of tuple inserts prior to vacuum."),
+ NULL
+ },


5. I think the new field in this struct should be named vacuum_insert_threshold

@@ -252,6 +252,7 @@ typedef struct AutoVacOpts
 {
  bool enabled;
  int vacuum_threshold;
+ int vacuum_ins_limit;

6. Are you sure you meant to default this to 50?

index e58e4788a8..9d96d58ed2 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -598,6 +598,8 @@
 #autovacuum_naptime = 1min # time between autovacuum runs
 #autovacuum_vacuum_threshold = 50 # min number of row updates before
  # vacuum
+#autovacuum_vacuum_insert_limit = 50 # max number of row inserts before
+ # vacuum

Seems excessive given there's no scale factor in the current patch.

7. I know you know missing docs... would be good to get those.

8. Should we care when setting the insert counter back to 0 if
auto-vacuum has skipped pages?

9. You should add a new column to the pg_stat_all_tables view to allow
visibility of the inserts since the last vacuum.  The column should be
named n_ins_since_vacuum. This seems like the best combination of
n_mod_since_analyze and n_tup_ins.

10. I'm slightly worried about the case where we don't quite trigger a
normal vacuum but do trigger a vacuum due to INSERTs, skip cleaning
up the indexes, and proceed to leave dead index entries, causing indexes
to become bloated.  It does not seem impossible that, given the right
balance of INSERTs and UPDATEs/DELETEs, this could happen every
time, and the indexes would just become larger and larger.

It's pretty easy to see this in action with:

create extension if not exists pgstattuple;
create table t0 (a int primary key);
alter table t0 set (autovacuum_enabled=off);
insert into t0 select generate_Series(1,100);

delete from t0 where a&1=0;
vacuum (index_cleanup off) t0;
insert into t0 select generate_series(2,100,2);
select * from pgstattuple('t0'), pg_relation_size('t0') as t0_size;
select n_dead_tup from pg_stat_all_tables where relid = 't0'::regclass;
-- repeat this a few times and watch the indexes bloat

11. We probably do also need to debate if we want this on or off by
default.   I'd have leaned towards enabling it by default if I'd not
personally witnessed the fact that people rarely increase auto-vacuum
to run faster than the standard cost settings. I've seen hundreds of
servers over the years with all workers busy for days on something
they'll never finish quickly enough.  We increased those settings 10x
in PG12, so there will be fewer people around suffering from that now,
but even after having reduced vacuum_cost_delay 10x from the PG11
settings, it's by no means fast enough for everyone.  I have mixed
feelings about giving auto-vacuum more work to do for those people, so
perhaps the best option is to keep this off by default so as not to
affect the people who don't tune auto-vacuum.  They'll just suffer the
pain all at once when they hit max freeze age instead of more
gradually with the additional load o

Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-03-04 Thread Michael Paquier
On Wed, Mar 04, 2020 at 07:17:31AM -0500, David Steele wrote:
> On 3/4/20 5:36 AM, Fujii Masao wrote:
>> There are no updates from about two years before, but this patch
>> has been registered in CF 2020-03. Not sure why. It should be marked
>> as Returned with Feedback again?
> 
> Worse, it was marked Needs Review even though no new patch was provided.
> 
> I'm going to set this back to Returned with Feedback.  If anyone has a good
> reason that it should be in the CF we can always revive it.

+1.
--
Michael


signature.asc
Description: PGP signature


Re: WAL usage calculation patch

2020-03-04 Thread Michael Paquier
On Wed, Mar 04, 2020 at 05:02:25PM +0100, Julien Rouhaud wrote:
> I'm quite worried about the stability of those counters for regression tests.
> Wouldn't a checkpoint happening during the test change them?

Yep.  One way around that would be to test whether this output is
non-zero, though at a quick glance I suspect that this won't be entirely
reliable either.

> While at it, did you consider adding a full-page image counter in the 
> WalUsage?
> That's something I'd really like to have and it doesn't seem hard to 
> integrate.

FWIW, one reason here is that we recently had some benchmark work done
internally where this would have been helpful in studying some spiky
WAL load patterns.
--
Michael


signature.asc
Description: PGP signature


Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-04 Thread Michael Paquier
On Wed, Mar 04, 2020 at 08:14:20PM +0300, Alexey Kondratov wrote:
> On 04.03.2020 10:45, Michael Paquier wrote:
> - *        Functions for finding and validating executable files
> + *        Functions for finding and validating from executables files
> 
> There is probably something missing here. Finding and validating what? And
> 'executables files' does not seem correct either.

Oops.  I was tweaking this part at some point of my review, but
discarded the change afterwards.

> +        # First, remove all the content in the archive directory,
> +        # as RecursiveCopy::copypath does not support copying to
> +        # existing directories.
> 
> I think that 'remove all the content' is not completely correct in this
> case. We are simply removing the archive directory. There is no content
> there yet, so 'First, remove the archive directory...' should be fine.

Indeed, this can be improved.  I still need to do an extra pass on
this patch set.

>> - I did not actually get why you don't check for a missing command
>> when using wait_result_is_any_signal.  In this case I'd think that it
>> is better to exit immediately as follow-up calls would just fail.
> 
> Believe it or not, I put 'false' there intentionally. The idea was that
> if the reason is a signal, then maybe the user got tired of waiting and
> killed the restore_command process themselves, or something like that, so
> it is better to exit immediately. If it was a missing command, then there
> is no hurry, so we can go further and complain that the attempt to recover
> the WAL segment has failed.
> 
> Actually, I guess there is no big difference whether we include a missing
> command here or not. There is no complicated logic further on, compared to
> the real recovery process in Postgres, where we cannot simply return false
> in that case.

On the contrary, it seems to me that the difference is very important.
Imagine for example a frontend tool which calls RestoreArchivedWALFile
in a loop, and that it fails because the command called is
missing.  This tool would keep looping for nothing.  So checking for a
missing command and leaving immediately would be more helpful for the
user.  Can you think of scenarios where it would make sense to be
able to loop in this case instead of failing?
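To make that concrete, a sketch of the failure handling being suggested,
using the existing helpers from src/common/wait_error.c (variable names like
restore_command and xlogfname are illustrative, not the patch's code):

rc = system(restore_command);
if (rc != 0)
{
	/* second argument "true": treat a missing command like a signal */
	if (wait_result_is_any_signal(rc, true))
	{
		/* killed by signal, or the command was not found: give up */
		pg_log_error("restore_command failed: %s", wait_result_to_str(rc));
		exit(1);
	}

	/* an ordinary nonzero exit: this segment was simply not restored */
	pg_log_error("could not restore file \"%s\" from archive", xlogfname);
	return false;
}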
--
Michael


signature.asc
Description: PGP signature


Re: Identifying user-created objects

2020-03-04 Thread Masahiko Sawada
On Thu, 5 Mar 2020 at 13:23, Fujii Masao  wrote:
>
>
>
> On 2020/03/05 12:32, Kyotaro Horiguchi wrote:
> > At Wed, 4 Mar 2020 21:07:05 +0900, Fujii Masao 
> >  wrote in
>  Is the function that you are proposing really enough for this use
>  case?  What if malicious users directly change the oid of a function
>  to < FirstNormalObjectId? Or are you assuming that malicious users
>  will never log in as superuser and so will not be able to change the oid?
> >>>
> >>> That's a good point! I'm surprised that a user is allowed to update the
> >>> oid of a database object. In addition, surprisingly we can update it to
> >>> 0, which in turn leads to an assertion failure:
> >>
> >> Since non-superusers are not allowed to do that by default,
> >> that's not so bad? That is, to avoid such an unexpected change of oid,
> >> the admin should just prevent malicious users from logging in as
> >> superusers and not give them permission on the system catalogs.
> >>
> >
> > I think there are still insider threats. As long as we depend on
> > superuser privilege to do some DBA work, a malicious DBA might be able
> > to log in as superuser and modify the oid.
> 
>  Yes. But I'm sure that the DBA has already considered the measures
>  against such threats. Otherwise malicious users can do anything
>  more malicious than changing the oid.
> >>> Agreed. So that's not a serious problem in practice, but we cannot say
> >>> the check by pg_is_user_object() is totally enough for determining
> >>> whether a malicious object exists or not. Is that right?
> >>
> >> Yes.
> >>
> >> My opinion is that, if malicious users are not allowed to log in
> >> as superusers and the admin gives them no permission on the system
> >> schemas/catalogs, checking whether the object is defined
> >> under the pg_catalog schema or not is enough for your purpose,
> >> because they are also not allowed to create objects under
> >> pg_catalog. pg_is_user_object() seems not necessary.
> >>
> >> OTOH, if you address the case where malicious users can create
> >> objects under pg_catalog, of course, checking whether
> >> the object is defined under the pg_catalog schema or not is not enough
> >> for the purpose. But pg_is_user_object() is also not enough,
> >> because such users can change the oid.
> >
> > The discussion seems to assume the feature is related to some security
> > measure.  But I don't think I have seen the objective or use case for the
> > feature. I don't see how we should treat objects according to the result
> > of the "user-defined objects detection" feature.
> >
> > For example, we could decide whether a function is to be pushed out
> > to the remote server in postgres_fdw. In this case, we need to ask "is
> > the behavior of this function known to us?", in short, "is this
> > function predefined?".  In this use case, we have no concern if the DBA
> > has added some functions as "not user-defined", since it's their own
> > risk.
> >
> > I can't come up with other use cases but, anyway, I think we need to
> > clarify the scope of the feature.
>
> Agreed. Also we would need to consider whether the existing approach
> (e.g., checking whether the object is defined under pg_catalog or not,
> or looking at pg_stat_user_functions, _indexes, and _tables) is enough
> for the use cases. If it is, a new function might not be necessary.
> If not, we might also need to reconsider the definitions of
> pg_stat_user_xxx after considering the function.
>

Originally the motivation of this feature is that while studying PCI
DSS 2.2.5 I thought that a running PostgreSQL server is not able to
prove that there is no malicious function in the database. PCI DSS 2.2.5
states "Remove all unnecessary functionality, such as scripts,
drivers, features, subsystems, file systems, and unnecessary web
servers." If we want to identify unnecessary or malicious functions we
can check the public schema and user schemas, but once a function is
created in pg_proc we cannot distinguish whether it's a built-in (i.e.
safe) function or not. I totally agree that if someone malicious logs in
as a superuser he/she can do anything more serious than changing catalog
contents, but I wanted to have a way to prove the soundness of a running
database.
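For reference, the check being proposed boils down to an OID comparison
along these lines (a sketch assuming the patch's approach; FirstNormalObjectId
is 16384, defined in src/include/access/transam.h, the first OID assigned at
runtime rather than during initdb. As discussed above, it is not tamper-proof,
since a superuser can rewrite a stored oid):

static bool
oid_is_user_object(Oid oid)		/* hypothetical helper name */
{
	/* OIDs below FirstNormalObjectId are assigned during initdb */
	return oid >= FirstNormalObjectId;
}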

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Identifying user-created objects

2020-03-04 Thread Michael Paquier
On Wed, Mar 04, 2020 at 06:57:00PM +0900, Fujii Masao wrote:
> Yes. But I'm sure that the DBA has already considered the measures
> against such threats. Otherwise malicious users can do anything
> more malicious than changing the oid.

A superuser is by definition able to do anything on the system using
the rights of the OS user running the Postgres backend.  One thing for
example is to take a base backup of the full instance, but you can do
much more interesting things once you have such rights.  So I don't
quite get the line of argument used on this thread regarding
somebody being malicious with superuser rights, or the
arguments about a superuser being able to freely manipulate the catalog's
contents.
--
Michael


signature.asc
Description: PGP signature


Re: Cast to uint16 in pg_checksum_page()

2020-03-04 Thread Michael Paquier
On Wed, Mar 04, 2020 at 09:52:08PM +0900, Michael Paquier wrote:
> This sounds like a sensible argument, similar to the ones raised on
> the other thread, so no objections from me to improve things here.  I
> can look at that tomorrow, except if somebody else beats me to it.

And done.
--
Michael


signature.asc
Description: PGP signature


Re: Change atoi to strtol in same place

2020-03-04 Thread Joe Nelson
Daniel Gustafsson wrote:

> > On 11 Feb 2020, at 17:54, Alvaro Herrera  wrote:
> >
> > This patch doesn't currently apply; it has conflicts with at least
> > 01368e5d9da7 and 7e735035f208; even in 7e735035f208^ it applies with
> > fuzz.  Please post an updated version so that it can move forward.
> 
> Ping.  With the 2020-03 CommitFest now under way, are you able to supply a
> rebased patch for consideration?

My patch relies on another that was previously returned with feedback in
the 2019-11 CF: https://commitfest.postgresql.org/25/2272/

I've lost interest in continuing to rebase this. Someone else can take over the
work if they are interested in it. Otherwise just close the CF entry.




Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-04 Thread Fujii Masao




On 2020/03/05 9:31, Magnus Hagander wrote:

On Mon, Mar 2, 2020 at 10:03 PM Fujii Masao  wrote:




On 2020/03/03 14:37, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote:

Hi,

Thank you for developing good features.
The attached patch is a small fix to the committed documentation. This patch 
fixes the description literal for the backup_streamed column.


Thanks for the report and patch! Pushed.


This patch requires, AIUI, that you add -P to the pg_basebackup
command line in order to get the progress tracked in detail
server-side.


Whether --progress is enabled or not, the pg_stat_progress_basebackup
view reports the progress of the backup on the server side. But the total
amount of data that will be streamed is estimated and reported only when
this option is enabled.


But this also generates output in the client that one
might not want.

Should we perhaps have a switch in pg_basebackup that enables the
server side tracking only, without generating output in the client?


Yes, this sounds reasonable.

I have two ideas.

(1) Extend the --progress option so that it accepts values like
none, server and both (better names?). If server is specified, the PROGRESS
option is included in the BASE_BACKUP replication command and the total
backup size is estimated on the server side, but the progress is not
reported on the client side; with both, the progress is also reported on
the client side. If none, the PROGRESS option is not included in
BASE_BACKUP. The downside of this idea is that --progress without an
argument would no longer be supported, so existing applications that run
pg_basebackup with --progress would need to be updated when upgrading to
v13.

(2) Add something like an --estimate-backup-size (better name?) option
to pg_basebackup. If it's specified, the PROGRESS option is sent but
the progress is not reported on the client side.

Thoughts?

Or, as another approach, it might be worth considering making
the server always estimate the total backup size whether --progress is
specified or not, as Amit argued upthread. If the time required to
estimate the backup size is negligible compared to the total backup time,
IMO this approach seems better. If we adopt this, we can also get
rid of the PROGRESS option from the BASE_BACKUP replication command.

Regards,


--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Identifying user-created objects

2020-03-04 Thread Fujii Masao




On 2020/03/05 12:32, Kyotaro Horiguchi wrote:

At Wed, 4 Mar 2020 21:07:05 +0900, Fujii Masao  
wrote in

Is the function that you are proposing really enough for this use
case?  What if malicious users directly change the oid of a function
to < FirstNormalObjectId? Or are you assuming that malicious users will
never log in as superuser and so will not be able to change the oid?


That's a good point! I'm surprised that a user is allowed to update the
oid of a database object. In addition, surprisingly we can update it to
0, which in turn leads to an assertion failure:


Since non-superusers are not allowed to do that by default,
that's not so bad? That is, to avoid such an unexpected change of oid,
the admin should just prevent malicious users from logging in as
superusers and not give them permission on the system catalogs.



I think there are still insider threats. As long as we depend on
superuser privilege to do some DBA work, a malicious DBA might be able
to log in as superuser and modify the oid.


Yes. But I'm sure that the DBA has already considered the measures
against such threats. Otherwise malicious users can do anything
more malicious than changing the oid.

Agreed. So that's not a serious problem in practice, but we cannot say
the check by pg_is_user_object() is totally enough for determining
whether a malicious object exists or not. Is that right?


Yes.

My opinion is that, if malicious users are not allowed to log in
as superusers and the admin gives them no permission on the system
schemas/catalogs, checking whether the object is defined
under the pg_catalog schema or not is enough for your purpose,
because they are also not allowed to create objects under
pg_catalog. pg_is_user_object() seems not necessary.

OTOH, if you address the case where malicious users can create
objects under pg_catalog, of course, checking whether
the object is defined under the pg_catalog schema or not is not enough
for the purpose. But pg_is_user_object() is also not enough,
because such users can change the oid.


The discussion seems to assume the feature is related to some security
measure.  But I don't think I have seen the objective or use case for the
feature. I don't see how we should treat objects according to the result
of the "user-defined objects detection" feature.

For example, we could decide whether a function is to be pushed out
to the remote server in postgres_fdw. In this case, we need to ask "is
the behavior of this function known to us?", in short, "is this
function predefined?".  In this use case, we have no concern if the DBA
has added some functions as "not user-defined", since it's their own
risk.

I can't come up with other use cases but, anyway, I think we need to
clarify the scope of the feature.


Agreed. Also we would need to consider whether the existing approach
(e.g., checking whether the object is defined under pg_catalog or not,
or looking at pg_stat_user_functions, _indexes, and _tables) is enough
for the use cases. If it is, a new function might not be necessary.
If not, we might also need to reconsider the definitions of
pg_stat_user_xxx after considering the function.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Minor issues in .pgpass

2020-03-04 Thread Fujii Masao




On 2020/03/04 23:01, Hamid Akhtar wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Tested and looks fine to me.

The new status of this patch is: Ready for Committer


Many thanks for testing and reviewing the patch!
I pushed it.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: backup manifests

2020-03-04 Thread Suraj Kharage
On Wed, Mar 4, 2020 at 7:21 PM tushar  wrote:

> Hi,
>
> There is a scenario in which I add something inside the pg_tblspc
> directory, and I get an error like-
>
> pg_validatebackup: * manifest_checksum =
> 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
> pg_validatebackup: error: "pg_tblspc/16385/*PG_13_202002271*/test" is
> present on disk but not in the manifest
>
> but if I remove the 'PG_13_202002271' directory then there is no error
>
> [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data
> pg_validatebackup: * manifest_checksum =
> 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
> pg_validatebackup: backup successfully verified
>
>
This seems expected given the current design, as we don't log
directory entries in backup_manifest. In your case, the tablespace has
no objects (an empty tablespace), so backup_manifest has no entry for
it; hence when you remove the tablespace directory, the validator
cannot detect that.

We can either document it or add the entry for directories in the manifest.
Robert may have a better idea on this.

-- 
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.


Re: reindex concurrently and two toast indexes

2020-03-04 Thread Michael Paquier
On Wed, Mar 04, 2020 at 09:21:45AM +0100, Julien Rouhaud wrote:
> Thanks for the patch!  I started to look at it during the weekend, but
> I got interrupted and unfortunately haven't had time to look at it
> since.

No problem, thanks for looking at it.  I have looked at it again this
morning, and applied it.

> The fix looks good to me.  I also tried multiple failure scenarios, and
> it's unsurprisingly working just fine.  Should we add some regression
> tests for that?  I guess most of it could be borrowed from the patch
> to fix the toast index issue I sent last week.

I have doubts when it comes to using a strategy based on
pg_cancel_backend() and a match on application_name (see for example
5ad72ce, but I cannot find the associated thread).  I think that we
could design something more robust here, usable by all tests, with
two things coming to mind:
- A new meta-command for isolation tests to be able to cancel a
session with PQcancel().
- Fault injection in the backend.
For the case of this thread, the cancellation command would be a better
match.
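For the first idea, such a meta-command would essentially wrap libpq's
existing cancel API, along these lines (a sketch, not actual isolationtester
code):

PGcancel   *cancel = PQgetCancel(conn);
char		errbuf[256];

if (cancel == NULL || !PQcancel(cancel, errbuf, sizeof(errbuf)))
	fprintf(stderr, "could not send cancel request: %s\n", errbuf);
PQfreeCancel(cancel);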
--
Michael


signature.asc
Description: PGP signature


Re: backup manifests

2020-03-04 Thread Suraj Kharage
On Wed, Mar 4, 2020 at 3:51 PM tushar  wrote:

> Another scenario in which, if we modify the "Manifest-Checksum" value in
> the backup_manifest file, we do not get an error
>
> [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
> pg_validatebackup: * manifest_checksum =
> 28d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d
> pg_validatebackup: backup successfully verified
>
> open backup_manifest file and replace
>
> "Manifest-Checksum":
> "8d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d"}
> with
> "Manifest-Checksum": "Hello World"}
>
> rerun the pg_validatebackup
>
> [centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
> pg_validatebackup: * manifest_checksum = Hello World
> pg_validatebackup: backup successfully verified
>
> regards,
>

Yeah, this handling is missing in the provided WIP patch. I believe Robert
will consider fixing this in an upcoming version of the validator patch.

-- 
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.


Command statistics system (cmdstats)

2020-03-04 Thread Mark Dilger
Hackers,

as mentioned in [1], I have created an implementation of command counter 
statistics very similar in purpose to the one already pending in the commitfest 
going by the name "pg_stat_sql".  I don't really care if this implementation is 
seen as building on that one or as separate, but I was worried about hijacking 
that thread, so I'm starting this thread.  There are elements of this patch that 
borrowed from that one, so if this is accepted, authorship should reflect that.

See the work by Haribabu Kommi (haribabu) at 
https://commitfest.postgresql.org/27/790/

The two main differences are that

(1) This implementation is based on commandtags as enums, not strings and
(2) This implementation uses techniques to reduce lock contention

I think (2) is the more important part.
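To illustrate how (1) supports (2): with command tags as a dense enum, each
backend can bump a plain array slot on the hot path, with no string hashing
and no shared lock, merging into shared memory only periodically. A sketch
with invented names, not the patch itself:

static uint64 local_cmdstats[NUM_COMMANDTAGS];	/* per-backend, lock-free */

static inline void
count_command(CommandTag tag)
{
	local_cmdstats[tag]++;		/* merged into shared counters later */
}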

[1] 
https://www.postgresql.org/message-id/28F46159-3764-421B-904B-004DEA339310%40enterprisedb.com



v1-0001-Implementing-the-cmdstats-subsystem.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: Identifying user-created objects

2020-03-04 Thread Kyotaro Horiguchi
At Wed, 4 Mar 2020 21:07:05 +0900, Fujii Masao  
wrote in 
> >> Is the function that you are proposing really enough for this use
> >> case?  What if malicious users directly change the oid of a function
> >> to < FirstNormalObjectId? Or are you assuming that malicious users will
> >> never log in as superuser and so will not be able to change the oid?
> >
> > That's a good point! I'm surprised that a user is allowed to update the
> > oid of a database object. In addition, surprisingly we can update it to
> > 0, which in turn leads to an assertion failure:
> 
>  Since non-superusers are not allowed to do that by default,
>  that's not so bad? That is, to avoid such an unexpected change of oid,
>  the admin should just prevent malicious users from logging in as
>  superusers and not give them permission on the system catalogs.
> 
> >>>
> >>> I think there are still insider threats. As long as we depend on
> >>> superuser privilege to do some DBA work, a malicious DBA might be able
> >>> to log in as superuser and modify the oid.
> >>
> >> Yes. But I'm sure that the DBA has already considered the measures
> >> against such threats. Otherwise malicious users can do anything
> >> more malicious than changing the oid.
> > Agreed. So that's not a serious problem in practice, but we cannot say
> > the check by pg_is_user_object() is totally enough for determining
> > whether a malicious object exists or not. Is that right?
> 
> Yes.
> 
> My opinion is that, if malicious users are not allowed to log in
> as superusers and the admin gives them no permission on the system
> schemas/catalogs, checking whether the object is defined
> under the pg_catalog schema or not is enough for your purpose,
> because they are also not allowed to create objects under
> pg_catalog. pg_is_user_object() seems not necessary.
> 
> OTOH, if you address the case where malicious users can create
> objects under pg_catalog, of course, checking whether
> the object is defined under the pg_catalog schema or not is not enough
> for the purpose. But pg_is_user_object() is also not enough,
> because such users can change the oid.

The discussion seems to assume the feature is related to some security
measure.  But I don't think I have seen the objective or use case for the
feature. I don't see how we should treat objects according to the result
of the "user-defined objects detection" feature.

For example, we could decide whether a function is to be pushed out
to the remote server in postgres_fdw. In this case, we need to ask "is
the behavior of this function known to us?", in short, "is this
function predefined?".  In this use case, we have no concern if the DBA
has added some functions as "not user-defined", since it's their own
risk.

I can't come up with other use cases but, anyway, I think we need to
clarify the scope of the feature.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Asynchronous Append on postgres_fdw nodes.

2020-03-04 Thread Thomas Munro
On Fri, Feb 28, 2020 at 9:08 PM Kyotaro Horiguchi
 wrote:
> - v2-0001-Allow-wait-event-set-to-be-regsitered-to-resoure.patch
>   The async feature uses WaitEvent, and it needs to be released on
>   error.  This patch makes it possible to register a WaitEvent with a
>   resowner to handle that case.

+1

> - v2-0002-infrastructure-for-asynchronous-execution.patch
>   It provides an abstraction layer for asynchronous behavior
>   (execAsync). It then adds ExecAppendAsync, another version of ExecAppend
>   that handles "async-capable" subnodes asynchronously. It also
>   contains the planner part that makes the planner aware of "async-capable"
>   and "async-aware" path nodes.

> This patch adds an infrastructure for asynchronous execution. As a PoC,
> this makes only Append capable of handling asynchronously executable
> subnodes.

What other nodes do you think could be async aware?  I suppose you
would teach joins to pass through the async support of their children,
and then you could make partition-wise join work like that.

+/* choose appropriate version of Exec function */
+if (appendstate->as_nasyncplans == 0)
+appendstate->ps.ExecProcNode = ExecAppend;
+else
+appendstate->ps.ExecProcNode = ExecAppendAsync;

Cool.  No extra cost for people not using the new feature.

+slot = ExecProcNode(subnode);
+if (subnode->asyncstate == AS_AVAILABLE)

So, now when you execute a node, you get a result AND you get some
information that you access by reaching into the child node's
PlanState.  The ExecProcNode() interface is extremely limiting, but
I'm not sure if this is the right way to extend it.  Maybe
ExecAsyncProcNode() with a wide enough interface to do the job?
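One purely hypothetical shape for such an interface, just to make the
suggestion concrete (names invented, not taken from the patch):

typedef enum AsyncStatus
{
	ASYNC_TUPLE_READY,			/* the returned slot is valid */
	ASYNC_WAITING,				/* caller should wait on the node's fd */
	ASYNC_DONE					/* the subnode is exhausted */
} AsyncStatus;

extern TupleTableSlot *ExecAsyncProcNode(PlanState *node, AsyncStatus *status);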

+Bitmapset *
+ExecAsyncEventWait(PlanState **nodes, Bitmapset *waitnodes, long timeout)
+{
+static int *refind = NULL;
+static int refindsize = 0;
...
+if (refindsize < n)
...
+static ExecAsync_mcbarg mcb_arg =
+{ &refind, &refindsize };
+static MemoryContextCallback mcb =
+{ ExecAsyncMemoryContextCallback, (void *)&mcb_arg, NULL };
...
+MemoryContextRegisterResetCallback(TopTransactionContext, &mcb);

This seems a bit strange.  Why not just put the pointer in the plan
state?  I suppose you want to avoid allocating a new buffer for every
query.  Perhaps you could fix that by having a small fixed-size buffer
in the PlanState to cover common cases and allocating a larger one in
a per-query memory context if that one is too small, or just not
worrying about it and allocating every time since you know the desired
size.
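A sketch of that alternative (field names such as refind_fixed are invented):
keep a small array embedded in the executor state for the common case and
fall back to a per-query allocation only when it is too small:

if (n <= lengthof(astate->refind_fixed))
	refind = astate->refind_fixed;		/* common case: no allocation */
else
{
	if (astate->refind_size < n)
	{
		if (astate->refind_large != NULL)
			pfree(astate->refind_large);
		/* grow (or first allocate) in the per-query memory context */
		astate->refind_large = MemoryContextAlloc(estate->es_query_cxt,
												  n * sizeof(int));
		astate->refind_size = n;
	}
	refind = astate->refind_large;
}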

+wes = CreateWaitEventSet(TopTransactionContext,
TopTransactionResourceOwner, n);
...
+FreeWaitEventSet(wes);

BTW, just as an FYI, I am proposing[1] to add support for
RemoveWaitEvent(), so that you could have a single WaitEventSet for
the lifetime of the executor node, and then add and remove sockets
only as needed.  I'm hoping to commit that for PG13, if there are no
objections or better ideas soon, because it's useful for some other
places where we currently create and destroy WaitEventSets frequently.
One complication when working with long-lived WaitEventSet objects is
that libpq (or some other thing used by some other hypothetical
async-capable FDW) is free to close and reopen its socket whenever it
wants, so you need a way to know when it has done that.  In that patch
set I added pqSocketChangeCount() so that you can see when pgSocket()
refers to a new socket (even if the file descriptor number is the same
by coincidence), but that imposes some book-keeping duties on the
caller, and now I'm wondering how that would look in your patch set.
My goal is to generate the minimum number of systems calls.  I think
it would be nice if a 1000-shard query only calls epoll_ctl() when a
child node needs to be added or removed from the set, not
epoll_create(), 1000 * epoll_ctl(), epoll_wait(), close()  for every
wait.  But I suppose there is an argument that it's more complication
than it's worth.

[1] https://commitfest.postgresql.org/27/2452/
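As a sketch of the pattern that [1] would enable (RemoveWaitEvent() exists
only in that proposed patch, not yet in core; the in-core CreateWaitEventSet()
takes a memory context and a maximum number of events):

WaitEventSet *wes = CreateWaitEventSet(TopMemoryContext, max_children);
WaitEvent	ev;
int			pos;

/* register a child's socket once, when it first becomes waitable */
pos = AddWaitEventToSet(wes, WL_SOCKET_READABLE, PQsocket(conn),
						NULL, (void *) node);

/* each wait is then a single epoll_wait(), not create/ctl/wait/close */
WaitEventSetWait(wes, timeout, &ev, 1, PG_WAIT_EXTENSION);

/* and a finished child is dropped with a single epoll_ctl() */
RemoveWaitEvent(wes, pos);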




Re: Crash by targetted recovery

2020-03-04 Thread Kyotaro Horiguchi
At Mon, 2 Mar 2020 20:54:04 +0900, Fujii Masao  
wrote in 
> > And random access during StandbyMode usually (always?) lets RecPtr go
> > back.  I'm not sure WaitForWALToBecomeAvailable works correctly if we
> > don't have a file in pg_wal and the REDO point is farther back by more
> > than a segment from the initial checkpoint record.
> 
> It works correctly. This is why WaitForWALToBecomeAvailable() uses
> fetching_ckpt argument.

Hmm. You're right. We start streaming from RedoStartLSN when
fetching_ckpt.  So that doesn't happen.

> > If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
> > re-connects to the primary for the past segment.
> 
> But this can lead to unnecessary restart of walreceiver. Since
> fetching_ckpt ensures that the WAL file containing the REDO
> starting record exists in pg_wal, there is no need to reconnect
> to the primary when reading the REDO starting record.
> 
> Is there other case where we need to go back to XLOG_FROM_ARCHIVE
> by random access?

I understand that the reconnection for the REDO record is useless. OK, I'll
take the !StandbyMode way.

Attached are the test script, changed to count the added test, and the
slightly revised main patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 5165a55c11aff5f18a8d39029b9291070a908744 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 26 Feb 2020 20:41:11 +0900
Subject: [PATCH v3 1/2] TAP test for a crash bug

---
 src/test/recovery/t/003_recovery_targets.pl | 34 -
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..5b23ceb3c4 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 9;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
 # Create and test a standby from given backup, with a certain recovery target.
@@ -167,3 +167,35 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
 	'recovery end before target reached is a fatal error');
+
+# Corner case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## read wal_segment_size
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+## stop just before the next segment boundary
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);
+
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+my $caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
+pass("targetted promotion on segment bounary");
-- 
2.18.2

>From 246a7894684434dbfcba8c11545458f1633e76b5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 5 Mar 2020 11:54:07 +0900
Subject: [PATCH v3 2/2] Fix a crash bug of targetted promotion.

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous segment at the time, WaitForWALToBecomeAvailable crashes with
assertion failure.  WaitForWALToBecomeAvailable should move back to
XLOG_FROM_ARCHIVE if standby mode is turned off while
XLOG_FROM_STREAM.
---
 src/backend/access/transam/xlog.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d19408b3be..50ad5079b6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11831,7 +11831,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 * 1. Read from either archive or pg_wal (XLOG_FROM_ARCHIVE), or just
 	 *	  pg_wal (XLOG_FROM_PG_WAL)
 	 * 2. Check trigger file
-	 * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
+	 * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM).
+	 *If StandbyMode is turned off, the state machine goes back to 1.
 	 * 4. Rescan timelines
	 * 5. Sleep wal_retrieve_retry_interval milliseconds

Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-04 Thread Tomas Vondra

Hi Pierre,

I've looked at this patch series, hoping to get it close to committable.
Here is a somewhat improved version of the patch series, split into 5
pieces. The first 4 parts are about applying functional dependencies to
ScalarArrayOpExpr clauses. The last part is about doing the same thing
for MCV lists, so it seemed fine to post it here.

0001 is the patch you posted back in October

0002 simplifies the handling logic a bit, because ScalarArrayOpExpr can
only have the form (Var op Const), not (Const op Var).

0003 fixes what I think is a bug - ScalarArrayOpExpr can represent three
different cases (see the SQL sketch below):

 * Var op ANY (array)
 * Var IN (list) -- special case of ANY
 * Var op ALL (array)

I don't think functional dependencies can handle the ALL case, so we need
to reject it by checking the useOr flag.

0004 adds queries to the stats_ext test suite, to test all of this (some
of the cases illustrate the need for 0003, I think)

0005 allows estimation of ScalarArrayOpExpr by MCV lists, including
regression tests etc.

Will you have time to look at this, particularly 0001-0004, but maybe
even the 0005 part?

As for the second part of your patch (the one allowing estimation of
array containment queries), I still think that's not something we can
reasonably do without also building statistics on elements (which is
what we have in pg_stats but not pg_stats_ext).
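
For concreteness, here are the clause shapes discussed above, written
against a hypothetical table t (integer column a, integer-array column
arr); this is an illustrative sketch, not part of the patches:

    -- handled by 0001: ANY with an equality operator, translatable to "a = x"
    SELECT * FROM t WHERE a = ANY (ARRAY[1, 2, 3]);

    -- handled: IN is parsed into the same ScalarArrayOpExpr ANY form
    SELECT * FROM t WHERE a IN (1, 2, 3);

    -- rejected by 0003: ALL (useOr = false) is not a single equality match
    SELECT * FROM t WHERE a <> ALL (ARRAY[1, 2, 3]);

    -- out of scope here: array containment, which would need per-element
    -- statistics (available in pg_stats but not in pg_stats_ext)
    SELECT * FROM t WHERE arr @> ARRAY[1, 2];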

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
>From ca77b087725d1c670250538c017f88150b3df868 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Wed, 4 Mar 2020 15:55:46 +0100
Subject: [PATCH 01/11] Apply functional dependencies to ScalarArrayOpExpr

Until now functional dependencies handled only plain equality clauses.
We can also apply them to IN clauses and to some cases of ANY, which can
be translated to an equality.
---
 src/backend/statistics/dependencies.c | 28 +++
 1 file changed, 28 insertions(+)

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index e2f6c5bb97..36941b8535 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -801,6 +801,34 @@ dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
 
 		/* OK to proceed with checking "var" */
 	}
+	else if (IsA(rinfo->clause, ScalarArrayOpExpr))
+	{
+		/* If it's an opclause, check for Var IN Const. */
+		ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) rinfo->clause;
+
+		/* Only expressions with two arguments are candidates. */
+		if (list_length(expr->args) != 2)
+			return false;
+
+		/* Make sure non-selected argument is a pseudoconstant. */
+		if (is_pseudo_constant_clause(lsecond(expr->args)))
+			var = linitial(expr->args);
+		else if (is_pseudo_constant_clause(linitial(expr->args)))
+			var = lsecond(expr->args);
+		else
+			return false;
+
+		/*
+		 * If it's not an "=" operator, just ignore the clause, as it's not
+		 * compatible with functional dependencies. The operator is identified
+		 * simply by looking at which function it uses to estimate selectivity.
+		 * That's a bit strange, but it's what other similar places do.
+		 */
+		if (get_oprrest(expr->opno) != F_EQSEL)
+			return false;
+
+		/* OK to proceed with checking "var" */
+	}
else if (is_notclause(rinfo->clause))
{
/*
-- 
2.21.1

>From 9d1bba7dc28a219d9de47e14bad5888e9a495877 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Wed, 4 Mar 2020 22:25:54 +0100
Subject: [PATCH 02/11] Simplify parsing of ScalarArrayOpExpr

We know ScalarArrayOpExpr is always of the form

Var op Const

in this exact order, so we don't need to check if var is on the left.
---
 src/backend/statistics/dependencies.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index 36941b8535..a75b9d73db 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -803,21 +803,22 @@ dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
 	}
 	else if (IsA(rinfo->clause, ScalarArrayOpExpr))
 	{
-		/* If it's an opclause, check for Var IN Const. */
+		/* If it's a scalar array operator, check for Var IN Const. */
 		ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) rinfo->clause;
 
 		/* Only expressions with two arguments are candidates. */
 		if (list_length(expr->args) != 2)
 			return false;
 
-

Re: [PATCH] distinct aggregates within a window function WIP

2020-03-04 Thread Krasiyan Andreev
I have currently suspended development of this patch based on its review,
but I will continue developing Oliver Ford's other work, which adds support
for RESPECT/IGNORE NULLS to lag(), lead(), first_value(), last_value() and
nth_value(), and FROM FIRST/LAST to nth_value(). I am not sure how to
proceed with its implementation, so any feedback will be very helpful.

I have dropped support for FROM FIRST/LAST for nth_value(), but
reimplemented it in a different way: a negative number for the position
argument addresses the same frame in exact reverse order. With that, the
patch becomes much simpler and the major concerns about the precedence hack
are gone, though maybe it can be simplified further.
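
As an illustration of the syntax (a sketch assuming the patch's proposed
null_treatment clause, which follows the SQL standard):

    -- sample data with NULL gaps (hypothetical table)
    CREATE TABLE readings (ts int, val int);
    INSERT INTO readings VALUES (1, 10), (2, NULL), (3, NULL), (4, 40);

    SELECT ts,
           lag(val) RESPECT NULLS OVER (ORDER BY ts) AS prev_any,
           lag(val) IGNORE NULLS  OVER (ORDER BY ts) AS prev_nonnull
    FROM readings;

    -- at ts = 4, prev_any is NULL (the row at ts = 3), while prev_nonnull
    -- is 10, the most recent non-null value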

I have not renamed the special bool type "ignorenulls", because it is
probably not an acceptable way to call the extra version of a window
function (but it makes things very easy and it can reuse frames); I did
remove the other special bool type, "fromlast".

The attached file is for PostgreSQL 13 (master git branch), and I will add
it to the March commitfest now, to be able to track changes. Everything
works and the patch is in very good shape: all tests pass, and I have been
using it for some time for SQL analysis purposes (IGNORE NULLS is one of
the most needed features in the OLAP/BI area, and Oracle, Amazon Redshift,
even Informix have it).

After patch review and suggestions about what to do with the special bool
type and unreserved keywords, I will reimplement it if needed.

On Mon, 13 Jan 2020 at 18:19, Vik Fearing 
wrote:

> On 13/01/2020 15:19, Tom Lane wrote:
> > Krasiyan Andreev  writes:
> >> I want to propose to you an old patch for Postgres 11, off-site
> developed
> >> by Oliver Ford,
> >> but I have permission from him to publish it and to continue it's
> >> development,
> >> that allow distinct aggregates, like select sum(distinct nums) within a
> >> window function.
> > I started to respond by asking whether that's well-defined, but
> > reading down further I see that that's not actually what the feature
> > is: what it is is attaching DISTINCT to a window function itself.
> > I'd still ask whether it's well-defined though, or even minimally
> > sensible.  Window functions are generally supposed to produce one
> > row per input row --- how does that square with the implicit row
> > merging of DISTINCT?  They're also typically row-order-sensitive
> > --- how does that work with DISTINCT?
>
>
> It's a little strange because the spec says:
>
>
> 
> If the window ordering clause or the window framing clause of the window
> structure descriptor that describes the <window name or specification>
> is present, then no <set function specification> simply contained in
> <window function> shall specify DISTINCT or <within group specification>.
> 
>
>
> So it seems to be well defined if all you have is a partition.
>
>
> But then it also says:
>
>
> 
> DENSE_RANK() OVER WNS is equivalent to the <window function>:
> COUNT (DISTINCT ROW ( VE1, ..., VEN ) )
> OVER (WNS1 RANGE UNBOUNDED PRECEDING)
> 
>
>
> And that kind of looks like a framing clause there.
>
>
> > Also, to the extent that
> > this is sensible, can't you get the same results already today
> > with appropriate use of window framing options?
>
>
> I don't see how.
>
>
> I have sometimes wanted this feature so I am +1 on us getting at least a
> minimal form of it.
>
> --
>
> Vik
>
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 28035f1635..3d73c96891 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15702,7 +15702,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

  lag(value anyelement
  [, offset integer
- [, default anyelement ]])
+ [, default anyelement ]]) [null_treatment]

   
   
@@ -15731,7 +15731,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

  lead(value anyelement
   [, offset integer
-  [, default anyelement ]])
+  [, default anyelement ]]) [null_treatment]

   
   
@@ -15757,7 +15757,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

 first_value

-   first_value(value any)
+   first_value(value any) [null_treatment]
   
   
same type as value
@@ -15773,7 +15773,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

 last_value

-   last_value(value any)
+   last_value(value any) [null_treatment]
   
   
same type as value
@@ -15790,7 +15790,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
 nth_value


- nth_value(value any, nth integer)
+ nth_value(value any, nth integer) [null_treatment]

   
   
@@ -15806,6 +15806,16 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

   
 
+  
+   In , null_treatment is one of:
+   
+ RESPECT NULLS
+ IGNORE NULLS
+   
+ 

[PATCH] respect/ignore nulls for lag,lead,first_value,last_value and nth_value and from first/last for nth_value

2020-03-04 Thread Krasiyan Andreev
Hi hackers,

I want to continue development of Oliver Ford's patch adding RESPECT/IGNORE
NULLS for lag, lead, first_value, last_value and nth_value, and FROM
FIRST/LAST for nth_value, but I am not sure how to proceed with it, so any
feedback will be very useful.

I have dropped support for FROM FIRST/LAST for nth_value(), but
reimplemented it in a different way: a negative number for the position
argument addresses the same frame in exact reverse order. With that, the
patch becomes much simpler and the major concerns about the precedence hack
are gone, though maybe it can be simplified further.
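
To make the proposed semantics concrete (a sketch assuming the patch's
negative-position extension described above):

    -- five ordered values (hypothetical table)
    CREATE TABLE vals (x int);
    INSERT INTO vals VALUES (1), (2), (3), (4), (5);

    -- standard behavior: second value counting from the frame start
    SELECT nth_value(x, 2) OVER (ORDER BY x
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM vals;   -- returns 2 for every row

    -- with the patch: a negative position counts from the frame end,
    -- the effect the standard spells FROM LAST
    SELECT nth_value(x, -2) OVER (ORDER BY x
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM vals;   -- returns 4 for every row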

I have not renamed the special bool type "ignorenulls", because it is
probably not an acceptable way to call the extra version of a window
function (though it makes things very easy, and it can reuse frames); I did
remove the other special bool type, "fromlast".

So that is the major question: can someone give me a better idea, or an
example of something that would be more acceptable as an implementation? I
will try to do it that way.

The attached file is for PostgreSQL 13 (master git branch), and I will add
it to the March commitfest now, to be able to track changes. Everything
works and the patch is in very good shape: make check passes, and I have
been using it for some time for SQL analysis purposes (IGNORE NULLS is one
of the most needed features in the OLAP/BI area, and Oracle, Amazon
Redshift and Informix have it).

After patch review and suggestions about what to do with the special bool
type and unreserved keywords, I will reimplement it if needed.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 28035f1635..3d73c96891 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15702,7 +15702,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

  lag(value anyelement
  [, offset integer
- [, default anyelement ]])
+ [, default anyelement ]]) [null_treatment]

   
   
@@ -15731,7 +15731,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

  lead(value anyelement
   [, offset integer
-  [, default anyelement ]])
+  [, default anyelement ]]) [null_treatment]

   
   
@@ -15757,7 +15757,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

 first_value

-   first_value(value any)
+   first_value(value any) [null_treatment]
   
   
same type as value
@@ -15773,7 +15773,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

 last_value

-   last_value(value any)
+   last_value(value any) [null_treatment]
   
   
same type as value
@@ -15790,7 +15790,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
 nth_value


- nth_value(value any, nth integer)
+ nth_value(value any, nth integer) [null_treatment]

   
   
@@ -15806,6 +15806,16 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;

   
 
+  
+   In , null_treatment is one of:
+   
+ RESPECT NULLS
+ IGNORE NULLS
+   
+   RESPECT NULLS specifies the default behavior to include nulls in the result.
+   IGNORE NULLS ignores any null values when determining a result.
+  
+
   
All of the functions listed in
 depend on the sort ordering
@@ -15843,17 +15853,11 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
 
   

-The SQL standard defines a RESPECT NULLS or
-IGNORE NULLS option for lead, lag,
-first_value, last_value, and
-nth_value.  This is not implemented in
-PostgreSQL: the behavior is always the
-same as the standard's default, namely RESPECT NULLS.
-Likewise, the standard's FROM FIRST or FROM LAST
-option for nth_value is not implemented: only the
-default FROM FIRST behavior is supported.  (You can achieve
-the result of FROM LAST by reversing the ORDER BY
-ordering.)
+The SQL standard defines a FROM FIRST or FROM LAST
+option for nth_value. This is not implemented in PostgreSQL:
+only the default FROM FIRST behavior is supported.
+(You can achieve the result of FROM LAST by using a negative number for the position argument,
+as is done in many languages to indicate a FROM END index.)

   
 
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 9f840ddfd2..a355e379e7 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -508,9 +508,9 @@ T612	Advanced OLAP operations			YES
 T613	Sampling			YES	
 T614	NTILE function			YES	
 T615	LEAD and LAG functions			YES	
-T616	Null treatment option for LEAD and LAG functions			NO	
+T616	Null treatment option for LEAD and LAG functions			YES	
 T617	FIRST_VALUE and LAST_VALUE function			YES	
-T618	NTH_VALUE fu

kill_prior_tuple and index scan costing

2020-03-04 Thread Peter Geoghegan
If I run the regression tests so that the "tenk1" table is available,
and then create an index on tenk1.twothousand, I notice that simple
"where twothousand = ?" queries have query plans that look like the
following sample plan:

pg@regression:5432 [17755]=# explain (analyze, buffers, costs off)
select * from tenk1 where twothousand = 42;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1 (actual time=0.023..0.032 rows=5 loops=1)
   Recheck Cond: (twothousand = 42)
   Heap Blocks: exact=5
   Buffers: shared hit=7
   ->  Bitmap Index Scan on tenk1_twothousand_idx1 (actual time=0.015..0.015 rows=5 loops=1)
         Index Cond: (twothousand = 42)
         Buffers: shared hit=2
 Planning Time: 0.146 ms
 Execution Time: 0.065 ms
(9 rows)

Seems reasonable. There is a bitmap index scan, more or less due to
the uncorrelated table access that would be required by an index scan
on tenk1_twothousand_idx1. We return 5 rows, and must access one heap
block for each of those 5 rows. I wonder why we don't get the
following alternative plan instead, which is slightly faster even with
the weak correlation:

pg@regression:5432 [17755]=# explain (analyze, buffers, costs off)
select * from tenk1 where twothousand = 42;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Index Scan using tenk1_twothousand_idx1 on tenk1 (actual time=0.020..0.030 rows=5 loops=1)
   Index Cond: (twothousand = 42)
   Buffers: shared hit=7
 Planning Time: 0.134 ms
 Execution Time: 0.058 ms
(5 rows)

Both plans are very similar, really. The number of heap accesses and
B-Tree index page accesses is exactly the same in each case. But the
index scan plan has one non-obvious advantage, that might matter a lot
in the real world: it can apply the kill_prior_tuple optimization. (It
is never possible to use the kill_prior_tuple optimization during a
bitmap index scan.)

It makes sense that the planner determines that a bitmap index scan is
faster -- or it used to make sense. Before commit dd299df8, which made
heap TID a tiebreaker nbtree index column, we might find ourselves
accessing the same heap page multiple times, pinning it a second or a
third time within the executor (it depended on very unstable
implementation details in the nbtree code). These days we should
*reliably* access the same number of heap pages (and index pages) with
either plan. (There are a couple of caveats that I'm glossing over for
now, like pg_upgrade'd indexes.)

Is it worth considering the influence of the tiebreaker heap TID
column work in the planner, so that we get to use the kill_prior_tuple
optimization more often? I'm not planning to work on it myself, but it
seems worth considering.

FWIW, the planner requires lots of convincing before it will use the
index scan plan right now. I find that I need to set random_page_cost
to 1.6 before the planner chooses the latter plan (a CLUSTER that uses
the twothousand index works too, of course). If I come up with a
similar example that returns 10 rows (i.e. one that indexes the "thousand"
column instead), random_page_cost needs to be reduced to 1.1 to produce
an equivalent query plan crossover.
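
For anyone who wants to reproduce this, the setup described above amounts
to the following (after a regression-test run has created tenk1; the index
name matches the plans shown earlier):

    CREATE INDEX tenk1_twothousand_idx1 ON tenk1 (twothousand);
    SET random_page_cost = 1.6;  -- at the default 4.0 the bitmap plan wins
    EXPLAIN (ANALYZE, BUFFERS, COSTS OFF)
    SELECT * FROM tenk1 WHERE twothousand = 42;
    RESET random_page_cost;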

-- 
Peter Geoghegan


Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-04 Thread Magnus Hagander
On Mon, Mar 2, 2020 at 10:03 PM Fujii Masao  wrote:
>
>
>
> On 2020/03/03 14:37, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote:
> > Hi,
> >
> > Thank you for developing good features.
> > The attached patch is a small fix to the committed documentation. This 
> > patch fixes the description literal for the backup_streamed column.
>
> Thanks for the report and patch! Pushed.

This patch requires, AIUI, that you add -P to the pg_basebackup
command line in order to get the progress tracked in detail
server-side. But this also generates output in the client that one
might not want.

Should we perhaps have a switch in pg_basebackup that enables the
server side tracking only, without generating output in the client?
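
For reference, the server-side progress this thread added can be watched
from any other session while a backup runs; a minimal monitoring query,
using column names from the committed view:

    SELECT pid, phase, backup_streamed, backup_total
    FROM pg_stat_progress_basebackup;

The open question above is only about enabling that server-side tracking
without the client also printing progress locally.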

//Magnus




Re: [PATCH] Incremental sort

2020-03-04 Thread James Coleman
On Tue, Mar 3, 2020 at 1:43 PM Tomas Vondra 
wrote:

> On Tue, Mar 03, 2020 at 12:17:22PM -0500, David Steele wrote:
> >James and Tomas,
> >
> >On 1/21/20 10:03 AM, James Coleman wrote:
> >>On Tue, Jan 21, 2020 at 9:58 AM Tomas Vondra
> >> wrote:
> >>>
> >>>On Tue, Jan 21, 2020 at 09:37:01AM -0500, James Coleman wrote:
> On Tue, Jan 21, 2020 at 9:25 AM Tomas Vondra
>  wrote:
> >
> >Hi,
> >
> >This patch has been marked as WoA since end of November, and there has
> >been no discussion/reviews since then :-( Based on off-list discussion
> >with James I don't think that's going to change in this CF, so I'll
> move
> >it to the next CF.
> >
> >I plan to work on the planner part of this patch before 2020-03, with
> >the hope it can still make it into 13.
> 
> In that off-list discussion I'd mentioned to Tomas that I would still
> like to work on this, just my other responsibilities at work have left
> me little time to work on the most important remaining part of this
> (the planner parts) since that requires a fair amount of focus and
> time.
> 
> That being said, the patch also needs some more work on improving
> EXPLAIN ANALYZE output (perhaps min/max/mean or median of
> memory usage and number of groups in each sort mode), and I think it's far
> more feasible that I can tackle that piecemeal before the next CF.
> 
> >>>
> >>>Sure, sorry if that was not clear from my message - you're of course
> >>>more than welcome to continue working on this. My understanding is that
> >>>won't happen by the end of this CF, hence the move to 2020-03.
> >>
> >>Oh, yeah, I probably didn't word that reply well -- I just wanted to
> >>add some additional detail to what you had already said.
> >>
> >>Thanks for your work on this!
> >
> >It doesn't look there has been much movement on this patch for the
> >last few CFs.  Are one or both of you planning to work on this for
> >v13? Or should we mark it for v14 and/or move it to the next CF.
> >
>
> I'm currently working on it, I plan to submit a new patch version
> shortly - hopefully by the end of this week.
>

Tomas, thanks much for working on this.

I haven't had a lot of time to dedicate to this, but I do hope to soon
(late this week or next), in addition to the planner stuff I believe Tomas
is working on, push improvements to the EXPLAIN ANALYZE output.

James


Re: Asynchronous Append on postgres_fdw nodes.

2020-03-04 Thread Kyotaro Horiguchi
At Wed, 4 Mar 2020 09:56:55 -0500, David Steele  wrote in 
> On 2/28/20 3:06 AM, Kyotaro Horiguchi wrote:
> > Hello, this is a follow-on of [1] and [2].
> > Currently the executor visits execution nodes one-by-one.  Considering
> > sharding, Append on multiple postgres_fdw nodes can work
> > simultaneously and that can largely shorten the respons of the whole
> > query.  For example, aggregations that can be pushed-down to remote
> > would be accelerated by the number of remote servers. Even other than
> > such an extreme case, collecting tuples from multiple servers also can
> > be accelerated by tens of percent [2].
> > I have suspended the work waiting asyncrohous or push-up executor to
> > come but the mood seems inclining toward doing that before that to
> > come [3].
> > The patchset consists of three parts.
> 
> Are these improvements targeted at PG13 or PG14?  This seems to be a
> pretty big change for the last CF of PG13.

It is targeted at PG14.  As we have the target version in CF-app now,
I marked it as targeting PG14.

Thank you for the suggestion.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Allowing ALTER TYPE to change storage strategy

2020-03-04 Thread David G. Johnston
On Wed, Mar 4, 2020 at 4:15 PM Tom Lane  wrote:

> I wrote:
> > I think this is committable --- how about you?
>
> ... or not.  I just noticed that the typcache tracks each type's
> typstorage setting, and there's no provision for flushing/reloading
> that.
>
> As far as I can find, there is only one place where the cached
> value is used, and that's in rangetypes.c which needs to know
> whether the range element type is toastable.  (It doesn't actually
> need to know the exact value of typstorage, only whether it is or
> isn't PLAIN.)
>
> [...]



>
> 3. Drop the ability for ALTER TYPE to promote from PLAIN to not-PLAIN
> typstorage, and adjust the typcache so that it only remembers boolean
> toastability not the specific toasting strategy.  Then the cache is
> still immutable so no need for update logic.
>
> I'm kind of liking #3, ugly as it sounds, because I'm not sure how
> much of a use-case there is for the upgrade-from-PLAIN case.
> Particularly now that TOAST is so ingrained in the system, it seems
> rather unlikely that a production-grade data type wouldn't have
> been designed to be toastable from the beginning, if there could be
> any advantage to that.  Either #1 or #2 seem like mighty high prices
> to pay for offering an option that might have no real-world uses.
>

Tomas' opening paragraph for this thread indicated this was motivated by
the plain-to-toast change but I'm not in a position to provide independent
insight.

Without that piece this is mainly about being able to specify a type's
preference for when and how it can be toasted.  That seems like sufficient
motivation, though that functionality seems basic enough that I'm wondering
why it hasn't come up before now (this seems like a different topic of
wonder than what Tomas mentioned in the OP).

Is there also an issue with whether the type has implemented compression or
not - i.e., should the x->e and m->e paths be forbidden too?  Or is it
always the case that a non-plain type is compressible and the other non-plain
options just switch between preferences (so External just says "while I can
be compressed, please don't")?
Separately...

Can you please include an edit to [1] indicating that "e" is the
abbreviation for External and "x" is Extended (spelling out the other two
as well)?  It might be worth a comment at [2] as well.

[1] https://www.postgresql.org/docs/12/catalog-pg-type.html
[2] https://www.postgresql.org/docs/12/storage-toast.html

Thanks!

David J.


Re: Allowing ALTER TYPE to change storage strategy

2020-03-04 Thread Tom Lane
I wrote:
> 3. Drop the ability for ALTER TYPE to promote from PLAIN to not-PLAIN
> typstorage, and adjust the typcache so that it only remembers boolean
> toastability not the specific toasting strategy.  Then the cache is
> still immutable so no need for update logic.
>
> I'm kind of liking #3, ugly as it sounds, because I'm not sure how
> much of a use-case there is for the upgrade-from-PLAIN case.
> Particularly now that TOAST is so ingrained in the system, it seems
> rather unlikely that a production-grade data type wouldn't have
> been designed to be toastable from the beginning, if there could be
> any advantage to that.  Either #1 or #2 seem like mighty high prices
> to pay for offering an option that might have no real-world uses.

Here's a v5 based on that approach.  I also added some comments about
the potential race conditions involved in recursing to domains.

regards, tom lane

diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml
index 175315f..111f8e6 100644
--- a/doc/src/sgml/ref/create_type.sgml
+++ b/doc/src/sgml/ref/create_type.sgml
@@ -823,18 +823,6 @@ CREATE TYPE name
function is written in C.
   
 
-  
-   In PostgreSQL versions before 7.3, it
-   was customary to avoid creating a shell type at all, by replacing the
-   functions' forward references to the type name with the placeholder
-   pseudo-type opaque.  The cstring arguments and
-   results also had to be declared as opaque before 7.3.  To
-   support loading of old dump files, CREATE TYPE will
-   accept I/O functions declared using opaque, but it will issue
-   a notice and change the function declarations to use the correct
-   types.
-  
-
  
 
  
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 540044b..9279c05 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -1444,50 +1444,6 @@ SetFunctionReturnType(Oid funcOid, Oid newRetType)
 
 
 /*
- * SetFunctionArgType - change declared argument type of a function
- *
- * As above, but change an argument's type.
- */
-void
-SetFunctionArgType(Oid funcOid, int argIndex, Oid newArgType)
-{
-	Relation	pg_proc_rel;
-	HeapTuple	tup;
-	Form_pg_proc procForm;
-	ObjectAddress func_address;
-	ObjectAddress type_address;
-
-	pg_proc_rel = table_open(ProcedureRelationId, RowExclusiveLock);
-
-	tup = SearchSysCacheCopy1(PROCOID, ObjectIdGetDatum(funcOid));
-	if (!HeapTupleIsValid(tup)) /* should not happen */
-		elog(ERROR, "cache lookup failed for function %u", funcOid);
-	procForm = (Form_pg_proc) GETSTRUCT(tup);
-
-	if (argIndex < 0 || argIndex >= procForm->pronargs ||
-		procForm->proargtypes.values[argIndex] != OPAQUEOID)
-		elog(ERROR, "function %u doesn't take OPAQUE", funcOid);
-
-	/* okay to overwrite copied tuple */
-	procForm->proargtypes.values[argIndex] = newArgType;
-
-	/* update the catalog and its indexes */
-	CatalogTupleUpdate(pg_proc_rel, &tup->t_self, tup);
-
-	table_close(pg_proc_rel, RowExclusiveLock);
-
-	/*
-	 * Also update the dependency to the new type. Opaque is a pinned type, so
-	 * there is no old dependency record for it that we would need to remove.
-	 */
-	ObjectAddressSet(type_address, TypeRelationId, newArgType);
-	ObjectAddressSet(func_address, ProcedureRelationId, funcOid);
-	recordDependencyOn(&func_address, &type_address, DEPENDENCY_NORMAL);
-}
-
-
-
-/*
  * CREATE CAST
  */
 ObjectAddress
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 5209736..d6e694e 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -163,7 +163,6 @@ DefineType(ParseState *pstate, List *names, List *parameters)
 	char	   *array_type;
 	Oid			array_oid;
 	Oid			typoid;
-	Oid			resulttype;
 	ListCell   *pl;
 	ObjectAddress address;
 
@@ -196,8 +195,7 @@ DefineType(ParseState *pstate, List *names, List *parameters)
 #endif
 
 	/*
-	 * Look to see if type already exists (presumably as a shell; if not,
-	 * TypeCreate will complain).
+	 * Look to see if type already exists.
 	 */
 	typoid = GetSysCacheOid2(TYPENAMENSP, Anum_pg_type_oid,
 			 CStringGetDatum(typeName),
@@ -211,35 +209,37 @@ DefineType(ParseState *pstate, List *names, List *parameters)
 	{
 		if (moveArrayTypeName(typoid, typeName, typeNamespace))
 			typoid = InvalidOid;
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_DUPLICATE_OBJECT),
+	 errmsg("type \"%s\" already exists", typeName)));
 	}
 
 	/*
-	 * If it doesn't exist, create it as a shell, so that the OID is known for
-	 * use in the I/O function definitions.
+	 * If this command is a parameterless CREATE TYPE, then we're just here to
+	 * make a shell type, so do that (or fail if there already is a shell).
 	 */
-	if (!OidIsValid(typoid))
+	if (parameters == NIL)
 	{
-		address = TypeShellMake(typeName, typeNamespace, GetUserId());
-		typoid = address.objectId;
-		/* Make new shell type visible for modification below */
-		CommandCounterIncrem

Re: Allowing ALTER TYPE to change storage strategy

2020-03-04 Thread Tom Lane
I wrote:
> I think this is committable --- how about you?

... or not.  I just noticed that the typcache tracks each type's
typstorage setting, and there's no provision for flushing/reloading
that.

As far as I can find, there is only one place where the cached
value is used, and that's in rangetypes.c which needs to know
whether the range element type is toastable.  (It doesn't actually
need to know the exact value of typstorage, only whether it is or
isn't PLAIN.)

We have a number of possible fixes for that:

1. Upgrade typcache.c to support flushing and rebuilding this data.
That seems fairly expensive; while we may be forced into that someday,
I'm hesitant to do it for a fairly marginal feature like this one.

2. Stop using the typcache for this particular purpose in rangetypes.c.
That seems rather undesirable from a performance standpoint, too.

3. Drop the ability for ALTER TYPE to promote from PLAIN to not-PLAIN
typstorage, and adjust the typcache so that it only remembers boolean
toastability not the specific toasting strategy.  Then the cache is
still immutable so no need for update logic.

I'm kind of liking #3, ugly as it sounds, because I'm not sure how
much of a use-case there is for the upgrade-from-PLAIN case.
Particularly now that TOAST is so ingrained in the system, it seems
rather unlikely that a production-grade data type wouldn't have
been designed to be toastable from the beginning, if there could be
any advantage to that.  Either #1 or #2 seem like mighty high prices
to pay for offering an option that might have no real-world uses.

regards, tom lane




Re: useless RangeIOData->typiofunc

2020-03-04 Thread Tom Lane
Alvaro Herrera  writes:
> I noticed while going over the multirange types patch that it adds a
> pointless typiofunc cached OID to a struct used for I/O functions'
> fn_extra.  It seems to go completely unused, so I checked range types
> (which this was cribbed from) and indeed, it is completely unused there
> either.  My guess is that it was in turn cribbed from array's
> ArrayMetaState, which is considerably more sophisticated; I suspect
> nobody noticed that caching it was pointless.

> Here's a patch to remove it from rangetypes.c.  It doesn't really waste
> much memory anyway, but removing it lessens the cognitive load by one or
> two bits.

Hm, I'm not sure that really lessens the cognitive load any, but
if you do commit this please fix the dangling reference you left
in the nearby comment:

 {
TypeCacheEntry *typcache;   /* range type's typcache entry */
-   Oid typiofunc;  /* element type's I/O function */
Oid typioparam; /* element type's I/O parameter */
FmgrInfoproc;   /* lookup result for typiofunc */
   ^^^
 } RangeIOData;


regards, tom lane




Re: useless RangeIOData->typiofunc

2020-03-04 Thread Paul Jungwirth

On 3/4/20 1:57 PM, Alvaro Herrera wrote:

I noticed while going over the multirange types patch that it adds a
pointless typiofunc cached OID to a struct used for I/O functions'
fn_extra.  It seems to go completely unused, so I checked range types
(which this was cribbed from) and indeed, it is completely unused there
either.  My guess is that it was in turn cribbed from array's
ArrayMetaState, which is considerably more sophisticated; I suspect
nobody noticed that caching it was pointless.


I didn't believe it at first but I think you're right. :-)


Here's a patch to remove it from rangetypes.c.  It doesn't really waste
much memory anyway, but removing it lessens the cognitive load by one or
two bits.


Looks good to me, and it seems okay to make the same edits to 
multirangetypes.c


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: range_agg

2020-03-04 Thread Paul Jungwirth

Thanks for looking at this again!

On 3/4/20 1:33 PM, Alvaro Herrera wrote:

I came across an interesting thing, namely multirange_canonicalize()'s
use of qsort_arg with a callback of range_compare().  range_compare()
calls range_deserialize() (non-trivial parsing) for each input range;
multirange_canonicalize() later does a few extra deserialize calls of
its own.  Call me a premature optimization guy if you will, but I think
it makes sense to have a different struct (let's call it
"InMemoryRange") which stores the parsed representation of each range;
then we can deserialize all ranges up front, and use that as many times
as needed, without having to deserialize each range every time.


I don't know, this sounds like a drastic change. I agree that 
multirange_deserialize and range_deserialize do a lot of copying (not 
really any parsing though, and they both assume their inputs are already 
de-TOASTED). But they are used very extensively, so if you wanted to 
remove them you'd have to rewrite a lot.


I interpreted the intention of range_deserialize to be a way to keep the 
range struct fairly "private" and give a standard interface to 
extracting its attributes. Its motive seems akin to deconstruct_array. 
So I wrote multirange_deserialize to follow that principle. Both 
functions also handle memory alignment issues for you. With 
multirange_deserialize, there isn't actually much structure (just the 
list of ranges), so perhaps you could more easily omit it and give 
callers direct access into the multirange contents. That still seems 
risky though, and less well encapsulated.


My preference would be to see if these functions are really a 
performance problem first, and only redo the in-memory structures if 
they are. Also that seems like something you could do as a separate 
project. (I wouldn't mind working on it myself, although I'd prefer to 
do actual temporal database features first.) There are no 
backwards-compatibility concerns to changing the in-memory structure, 
right? (Even if there are, it's too late to avoid them for ranges.)



While I'm at this, why not name the new file simply multiranges.c
instead of multirangetypes.c?


As someone who doesn't do a lot of Postgres hacking, I tried to follow 
the approach in rangetypes.c as closely as I could, especially for 
naming things. So I named the file multirangetypes.c because there was 
already rangetypes.c. But also I can see how the "types" emphasizes that 
ranges and multiranges are not concrete types themselves, but more like 
abstract data types or generics (like arrays).


Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

2020-03-04 Thread Daniel Gustafsson
> On 2 Mar 2020, at 03:06, Michael Paquier  wrote:

Thanks a lot for another round of review, much appreciated!

> On Fri, Feb 28, 2020 at 05:24:29PM +0900, Michael Paquier wrote:
>> +   /*
>> +* CONTAINS_NEW_TUPLE will always be set unless the multi_insert was
>> +* performed for a catalog.  If it is a catalog, return immediately as
>> +* there is nothing to logically decode.
>> +*/
>> +   if (!(xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE))
>> +   return;
>> Hmm, OK.  Consistency with DecodeInsert() is a good idea here, so
>> count me in regarding the way your patch handles the problem.  I would
>> be fine to apply that part but, Andres, perhaps you would prefer
>> taking care of it yourself?
> 
> Okay, I have applied this part.

Thanks!

> +#define MAX_BUFFERED_BYTES 65535
> The use-case is different than copy.c, so aren't you looking at a
> separate variable to handle that?

Renamed to indicate current usecase.

> +reset_object_addresses(ObjectAddresses *addrs)
> +{
> +   if (addrs->numrefs == 0)
> +   return;
> Or just use an assert?

I don't see why we should prohibit calling reset_object_addresses so strongly;
it's nonsensical, but is it wrong enough to Assert on?  I can change it if you
feel it should be an assertion, but have left it for now.

> src/backend/commands/tablecmds.c:   /* attribute.attacl is handled by
> InsertPgAttributeTuple */
> src/backend/catalog/heap.c: * This is a variant of
> InsertPgAttributeTuple() which dynamically allocates
> Those two references are not true anymore as your patch removes
> InsertPgAttributeTuple() and replaces it by the plural flavor.

Fixed.

> +/*
> + * InsertPgAttributeTuples
> ...
> And those comments largely need to remain around.

Fixed.

> attstattarget handling is inconsistent here, no?

Indeed it was, fixed.

> Not convinced that we have to publish add_object_address() in the
> headers, because we have already add_exact_object_address which is
> able to do the same job.  All code paths touched by your patch involve
> already ObjectAddress objects, so switching to _exact_ creates less
> code diffs.

Good point, using _exact will make the changes smaller at the cost of additions
being larger.  I do prefer the add_object_address API since it makes code more
readable IMO, but there is clear value in not exposing more functions.  I've
changed to using add_exact_object_address in the attached version.

> +   if (ntuples == DEPEND_TUPLE_BUF)
> +   {
> +   CatalogTuplesMultiInsertWithInfo(sdepRel, slot, ntuples, 
> indstate);
> +   ntuples = 0;
> +   }
> Maybe this is a sufficient argument that we had better consider the
> template copy as part of a separate patch, handled once the basics is
> in place.  It is not really obvious if 32 is a good thing, or not.

Fixed by instead avoiding the copying altogether and creating virtual tuples.
Andres commented on this upthread but I had missed fixing the code to do that,
so better late than never.

>/* construct new attribute's pg_attribute entry */
> +   oldcontext = MemoryContextSwitchTo(CurTransactionContext);
> This needs a comment as this change is not obvious for the reader.
> And actually, why?  

That's a good question; this was a leftover from a different version of the
code which I abandoned, but I missed removing the context handling.  Sorry
about the sloppiness there. Removed.

The v9 patch attached addresses, I believe, all comments made to date.  I tried
to read through earlier reviews too to ensure I hadn't missed anything so I
hope I've covered all findings so far.

cheers ./daniel



catalog_multi_insert-v9.patch
Description: Binary data



useless RangeIOData->typiofunc

2020-03-04 Thread Alvaro Herrera
I noticed while going over the multirange types patch that it adds a
pointless typiofunc cached OID to a struct used for I/O functions'
fn_extra.  It seems to go completely unused, so I checked range types
(which this was cribbed from) and indeed, it is completely unused there
either.  My guess is that it was in turn cribbed from array's
ArrayMetaState, which is considerably more sophisticated; I suspect
nobody noticed that caching it was pointless.

Here's a patch to remove it from rangetypes.c.  It doesn't really waste
much memory anyway, but removing it lessens the cognitive load by one or
two bits.

-- 
Álvaro Herrera
diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c
index 490bc2ae81..412322c6fa 100644
--- a/src/backend/utils/adt/rangetypes.c
+++ b/src/backend/utils/adt/rangetypes.c
@@ -49,7 +49,6 @@
 typedef struct RangeIOData
 {
 	TypeCacheEntry *typcache;	/* range type's typcache entry */
-	Oid			typiofunc;		/* element type's I/O function */
 	Oid			typioparam;		/* element type's I/O parameter */
 	FmgrInfo	proc;			/* lookup result for typiofunc */
 } RangeIOData;
@@ -309,6 +308,7 @@ get_range_io_data(FunctionCallInfo fcinfo, Oid rngtypid, IOFuncSelector func)
 		bool		typbyval;
 		char		typalign;
 		char		typdelim;
+		Oid			typiofunc;
 
 		cache = (RangeIOData *) MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
    sizeof(RangeIOData));
@@ -324,9 +324,9 @@ get_range_io_data(FunctionCallInfo fcinfo, Oid rngtypid, IOFuncSelector func)
 		 &typalign,
 		 &typdelim,
 		 &cache->typioparam,
-		 &cache->typiofunc);
+		 &typiofunc);
 
-		if (!OidIsValid(cache->typiofunc))
+		if (!OidIsValid(typiofunc))
 		{
 			/* this could only happen for receive or send */
 			if (func == IOFunc_receive)
@@ -340,7 +340,7 @@ get_range_io_data(FunctionCallInfo fcinfo, Oid rngtypid, IOFuncSelector func)
 		 errmsg("no binary output function available for type %s",
format_type_be(cache->typcache->rngelemtype->type_id))));
 		}
-		fmgr_info_cxt(cache->typiofunc, &cache->proc,
+		fmgr_info_cxt(typiofunc, &cache->proc,
 	  fcinfo->flinfo->fn_mcxt);
 
 		fcinfo->flinfo->fn_extra = (void *) cache;


Re: error context for vacuum to include block number

2020-03-04 Thread Justin Pryzby
On Tue, Mar 03, 2020 at 04:49:00PM -0300, Alvaro Herrera wrote:
> On 2020-Mar-03, Justin Pryzby wrote:
> > On Thu, Feb 27, 2020 at 09:09:42PM -0300, Alvaro Herrera wrote:
> > > > +   case PROGRESS_VACUUM_PHASE_VACUUM_HEAP:
> > > > +   if (BlockNumberIsValid(cbarg->blkno))
> > > > +   errcontext("while vacuuming block %u of 
> > > > relation \"%s.%s\"",
> > > 
> > > I think you should still call errcontext() when blkno is invalid.
> > 
> > In my experience while testing, the conditional avoids lots of CONTEXT noise
> > from interrupted autovacuum, at least.  I couldn't easily reproduce it with 
> > the
> > current patch, though, maybe due to less pushing and popping.
> 
> I think you're saying that the code had the bug that too many lines were
> reported because of excessive stack pushes, and you worked around it by
> making the errcontext() be conditional; and that now the bug is fixed by
> avoiding the push/pop games -- which explains why you can no longer
> reproduce it.  I don't see why you want to keep the no-longer-needed
> workaround.

No - the issue I observed from autovacuum ("while scanning block number
4294967295") was unrelated to showing multiple context lines (that issue I only
saw in the v22 patch, and was related to vacuum_one_index being used by both
leader and parallel workers).

The locations showing a block number first set vacrelstats->blkno to
InvalidBlockNumber, and then later update the vacrelstats->blkno from a loop
variable.  I think today's v24 patch makes it harder to hit the window where
it's set to InvalidBlockNumber, by moving VACUUM_HEAP context into
vacuum_page(), but I don't think we should rely on an absence of
CHECK_FOR_INTERRUPTS() to avoid misleading context noise.

-- 
Justin




Re: New SQL counter statistics view (pg_stat_sql)

2020-03-04 Thread Mark Dilger



> On Mar 3, 2020, at 6:50 PM, Alvaro Herrera  wrote:
> 
> On 2019-Nov-13, Smith, Peter wrote:
> 
>> From: Thomas Munro  Sent: Monday, 4 November 2019 
>> 1:43 PM
>> 
>>> No comment on the patch but I noticed that the documentation changes don't 
>>> build.  Please make sure you can "make docs" successfully, having installed 
>>> the documentation tools[1].
>> 
>> Thanks for the feedback. An updated patch which fixes the docs issue is 
>> attached.
> 
> So, who is updating this patch for the new cmdtaglist.h stuff?

I have a much altered (competing?) patch inspired by pg_stat_sql.  I am working 
to integrate the changes you made to my commandtag patch into my commandstats 
patch before posting it.

It might be good if somebody else tackled the pg_stat_sql patch in case my 
version is rejected.

I don't want to highjack this thread to talk in detail about the other patch, 
so I'll just give an overview that might be useful for anybody thinking about 
putting effort into committing this patch first. 

The motivation for the commandtag patch that you've committed (thanks again!) 
was to make the commandstats patch possible.  I didn't like the way pg_stat_sql 
stored the tags as strings, but I also still don't like the way it needs to 
take a lock every time a backend increments the counter for a command.  That is 
bad, I think, because the lock overhead for an otherwise fast command (like 
moving a cursor) may be high enough to discourage users from turning this 
feature on in production.

I solve that by trading off speed and memory.  For most commandtags, there is a 
single uint64 counter, and incrementing the counter requires taking a lock.  
This is very similar to how the other patch works.  For the few commandtags 
that we deem worthy of the special treatment, there is an additional uint32 
counter *per backend* reserved in shared memory.  Backends update this counter 
using atomics, but do not require locks.  When the counter is in danger of 
overflow, locks are taken to roll the count into the uint64 counters and zero 
out the backend-specific uint32 counter.  I have roughly 20 command tags 
receiving this special treatment, but the design of the patch doesn't make it 
too hard to change the exact list of tags worthy of it, so I don't plan to 
stress too much about the exact set of tags.  I'm sure you'll all tell me which 
ones you think do or do not deserve the treatment.

If it isn't obvious, the logic of having this treatment for only a subset of 
tags is that 192 tags * 4 bytes per counter * MaxBackends gets expensive for 
folks running a high number of backends.  Taking that from 192 tags down to one 
or two dozen seems worthwhile.

I expect to post on a separate thread before the day is over.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: error context for vacuum to include block number

2020-03-04 Thread Justin Pryzby
On Wed, Mar 04, 2020 at 04:21:06PM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch. But we have two more places where we
> do fsm vacuum.

Oops, thanks.

I realized that vacuum_page is called not only from lazy_vacuum_heap, but also
directly from lazy_scan_heap, which failed to update errcbarg.  So I changed
it to update errcbarg in vacuum_page.

What about these other calls?  I think the granularity of individual function
calls requires a debugger, but is it acceptable if errors here are attributed
to (say) "scanning heap"?

GetRecordedFreeSpace
heap_*_freeze_tuple
heap_page_prune
HeapTupleSatisfiesVacuum
LockBufferForCleanup
MarkBufferDirty
Page*AllVisible
PageGetHeapFreeSpace
RecordPageWithFreeSpace
visibilitymap_*
VM_ALL_FROZEN

> These functions have LVDeadTuples and LVRelStats but LVDeadTuples can
> be referred by LVRelStats. If we want to use LVRelStats as callback
> argument, we can remove function arguments that can be referred by
> LVRelStats.

That doesn't work easily with parallel vacuum, which passes not
vacrelstats->dead_tuples, but a dead_tuples pulled out of shm_toc.

But it was easy enough to remove "reltuples".

-- 
Justin
>From a5d981a5ecd867bb46f51a8fb1b153d203df03ac Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v24 1/4] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 192 +++
 1 file changed, 167 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..084b43c178 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -268,8 +268,19 @@ typedef struct LVParallelState
 	int			nindexes_parallel_condcleanup;
 } LVParallelState;
 
+typedef enum {
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+	VACUUM_ERRCB_PHASE_VACUUM_FSM,
+} errcb_phase;
+
 typedef struct LVRelStats
 {
+	char 		*relnamespace;
+	char		*relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +301,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback: */
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	errcb_phase	phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +329,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +352,13 @@ static void lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult *
 		 int nindexes);
 static void parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
   LVShared *lvshared, LVDeadTuples *dead_tuples,
-  int nindexes);
+  int nindexes, LVRelStats *vacrelstats);
 static void vacuum_indexes_leader(Relation *Irel, IndexBulkDeleteResult **stats,
   LVRelStats *vacrelstats, LVParallelState *lps,
   int nindexes);
 static void vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 			 LVShared *lvshared, LVSharedIndStats *shared_indstats,
-			 LVDeadTuples *dead_tuples);
+			 LVDeadTuples *dead_tuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_all_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
 	 LVRelStats *vacrelstats, LVParallelState *lps,
 	 int nindexes);
@@ -361,6 +376,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void update_vacuum_error_cbarg(LVRelStats *errcbarg, int phase,
+		BlockNumber blkno, Relation rel);
 
 
 /*
@@ -460,6 +478,9 @@ heap_vacuum_rel(Relation onerel, Vacuum

Re: range_agg

2020-03-04 Thread Alvaro Herrera
I came across an interesting thing, namely multirange_canonicalize()'s
use of qsort_arg with a callback of range_compare().  range_compare()
calls range_deserialize() (non-trivial parsing) for each input range;
multirange_canonicalize() later does a few extra deserialize calls of
its own.  Call me a premature optimization guy if you will, but I think
it makes sense to have a different struct (let's call it
"InMemoryRange") which stores the parsed representation of each range;
then we can deserialize all ranges up front, and use that as many times
as needed, without having to deserialize each range every time.
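
For context, canonicalization is what merges overlapping or adjacent input
ranges; a sketch assuming the patch's proposed range_agg aggregate:

    SELECT range_agg(r)
    FROM (VALUES (int4range(1, 4)),
                 (int4range(3, 8)),
                 (int4range(10, 12))) v(r);
    -- expected: {[1,8),[10,12)}

Sorting the inputs for that merge is where the repeated range_compare() and
range_deserialize() calls come from.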

While I'm at this, why not name the new file simply multiranges.c
instead of multirangetypes.c?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Setting ACL

2020-03-04 Thread Stephen Frost
Greetings,

* Vik Fearing (v...@postgresfriends.org) wrote:
> So I have to manually do a diff of the two acls and generate
> GRANT/REVOKE statements?  That's not encouraging. :(

Not sure if it's helpful to you, but pg_dump has code that generates SQL
to do more-or-less exactly this.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: jsonpath syntax extensions

2020-03-04 Thread Nikita Glukhov

On 04.03.2020 19:13, David Steele wrote:


Hi Nikita,

On 2/27/20 10:57 AM, Nikita Glukhov wrote:


Attached patches implement several useful jsonpath syntax extensions.
I already published them two years ago in the original SQL/JSON thread,
but then after creation of separate threads for SQL/JSON functions and
JSON_TABLE I forgot about them.


Are these improvements targeted at PG13 or PG14?  This seems to be a 
pretty big change for the last CF of PG13.  I know these have been 
submitted before but that was a few years ago so I think they count as 
new.


I believe that some of these improvements can get into PG13.  There is no need
to review all of them, we can choose only the simplest ones.

Most of the code changes in #3-#5 consist of straightforward boilerplate
jsonpath I/O code; only the changes in jsonpath_exec.c are interesting.

Only the patch #1 is mandatory, patches #3-#6 depend on it.

Patch #2 is not necessary if jbvArray and jbvObject values are wrapped into
jbvBinary by a JsonbValueToJsonb() call in #4 and #5.

Patch #4 is the simplest one (only 20 new lines of code in jsonpath_exec.c).

Patch #6 is the most complex one, and it affects only jsonpath execution.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Add LogicalTapeSetExtend() to logtape.c

2020-03-04 Thread Jeff Davis
On Wed, 2020-03-04 at 11:57 +0800, Adam Lee wrote:
> Master(e537aed61d): 13342.844 ms 13195.982 ms 13271.023 ms
> With my patch(a pointer): 13020.029 ms 13008.158 ms 13063.658 ms
> With your patch(flexible array): 12870.117 ms 12814.725 ms 13119.255
> ms

I tracked the problem down.

When we change from a flexible array to a pointer and a separately-
allocated chunk, it causes some unrelated code in
LogicalTapeWrite() to be optimized differently in my environment.

Specifically, the memcpy() in LogicalTapeWrite() gets turned into an
inline implementation using the "rep movsq" instruction. For my
particular case, actually calling memcpy (rather than using the inlined
version) is a lot faster.

In my test case, LogicalTapeWrite() was being called with a size of 4 or
10, so perhaps those small values are just handled much more
efficiently in the real memcpy.

To get it to use the real memcpy(), I had to '#undef _FORTIFY_SOURCE'
at the top of the file, and pass -fno-builtin-memcpy. Then the
regression went away.

I don't care for the version I posted where it repalloc()s the entire
struct. That seems needlessly odd and could cause confusion; and it
also changes the API so that the caller needs to update its pointers.

I'm inclined to use a version similar to Adam's, where it has a pointer
and a separate palloc() chunk (attached). That causes a regression in
my setup, but it's not a terrible regression, and it's not really the
"fault" of the change. Trying to twist code around to satisfy a
particular compiler/libc seems like a bad idea. It also might depend on
the exact query, and may even be faster for some.

Any objections?

Regards,
Jeff Davis

diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index 4f78b55fbaf..4274afea32e 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -191,8 +191,8 @@ struct LogicalTapeSet
 	Size		freeBlocksLen;	/* current allocated length of freeBlocks[] */
 
 	/* The array of logical tapes. */
-	int			nTapes;			/* # of logical tapes in set */
-	LogicalTape tapes[FLEXIBLE_ARRAY_MEMBER];	/* has nTapes nentries */
+	int			nTapes;			/* # of logical tapes in set */
+	LogicalTape *tapes;			/* has nTapes nentries */
 };
 
 static void ltsWriteBlock(LogicalTapeSet *lts, long blocknum, void *buffer);
@@ -201,6 +201,7 @@ static long ltsGetFreeBlock(LogicalTapeSet *lts);
 static void ltsReleaseBlock(LogicalTapeSet *lts, long blocknum);
 static void ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
  SharedFileSet *fileset);
+static void ltsInitTape(LogicalTape *lt);
 static void ltsInitReadBuffer(LogicalTapeSet *lts, LogicalTape *lt);
 
 
@@ -536,6 +537,30 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
 	lts->nHoleBlocks = lts->nBlocksAllocated - nphysicalblocks;
 }
 
+/*
+ * Initialize per-tape struct.  Note we allocate the I/O buffer and the first
+ * block for a tape only when it is first actually written to.  This avoids
+ * wasting memory space when tuplesort.c overestimates the number of tapes
+ * needed.
+ */
+static void
+ltsInitTape(LogicalTape *lt)
+{
+	lt->writing   = true;
+	lt->frozen= false;
+	lt->dirty = false;
+	lt->firstBlockNumber  = -1L;
+	lt->curBlockNumber= -1L;
+	lt->nextBlockNumber   = -1L;
+	lt->offsetBlockNumber = 0L;
+	lt->buffer= NULL;
+	lt->buffer_size   = 0;
+	/* palloc() larger than MaxAllocSize would fail */
+	lt->max_size  = MaxAllocSize;
+	lt->pos   = 0;
+	lt->nbytes= 0;
+}
+
 /*
  * Lazily allocate and initialize the read buffer. This avoids waste when many
  * tapes are open at once, but not all are active between rewinding and
@@ -579,15 +604,13 @@ LogicalTapeSetCreate(int ntapes, TapeShare *shared, SharedFileSet *fileset,
 	 int worker)
 {
 	LogicalTapeSet *lts;
-	LogicalTape *lt;
 	int			i;
 
 	/*
 	 * Create top-level struct including per-tape LogicalTape structs.
 	 */
 	Assert(ntapes > 0);
-	lts = (LogicalTapeSet *) palloc(offsetof(LogicalTapeSet, tapes) +
-	ntapes * sizeof(LogicalTape));
+	lts = (LogicalTapeSet *) palloc(sizeof(LogicalTapeSet));
 	lts->nBlocksAllocated = 0L;
 	lts->nBlocksWritten = 0L;
 	lts->nHoleBlocks = 0L;
@@ -596,30 +619,10 @@ LogicalTapeSetCreate(int ntapes, TapeShare *shared, SharedFileSet *fileset,
 	lts->freeBlocks = (long *) palloc(lts->freeBlocksLen * sizeof(long));
 	lts->nFreeBlocks = 0;
 	lts->nTapes = ntapes;
+	lts->tapes = (LogicalTape *) palloc(ntapes * sizeof(LogicalTape));
 
-	/*
-	 * Initialize per-tape structs.  Note we allocate the I/O buffer and the
-	 * first block for a tape only when it is first actually written to.  This
-	 * avoids wasting memory space when tuplesort.c overestimates the number
-	 * of tapes needed.
-	 */
 	for (i = 0; i < ntapes; i++)
-	{
-		lt = &lts->tapes[i];
-		lt->writing = true;
-		lt->frozen = false;
-		lt->dirty = false;
-		lt->firstBlockNumber = -1L;
-		lt->curBlockNumber = -1L;

Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

2020-03-04 Thread Nikolay Shaplov
In a message of Monday, December 9, 2019, 12:11:17 MSK, Michael
Paquier wrote:
> On Fri, Dec 06, 2019 at 06:18:48PM +0300, Nikolay Shaplov wrote:
> > In the thread
> > https://www.postgresql.org/message-id/2620882.s52SJui4ql@x200m
> > I've suggested splitting the one big StdRdOptions that is used for options
> > storage into Options structures individual to each relkind and each
> > relam
> > 
> > And here goes the last part of the StdRdOptions removal patch, where
> > StdRdOptions is replaced with HeapOptions and ToastOptions.
> 
> -typedef struct StdRdOptions
> +/*
> + * HeapOptions
> + * Binary representation of relation options for Heap relations.
> + */
> +typedef struct HeapOptions
> 
> I think that it makes sense to split relation options dedicated to
> heap into their own parsing structure, because those options are
> actually related to the table AM heap.  However, I think that this
> patch is not ambitious enough in the work which is done and that
> things could move into a more generic direction.  At the end of the
> day, I'd like to think that we should have something like:
> - Heap-related reloptions are built as part of its AM handler in
> heapam_handler.c, with reloptions.c holding no more references to
> heap.  At all.
> - The table AM option parsing follows a model close to what is done
> for indexes in terms of option parsing, moving the responsibility to
> define relation options to each table AM.
> - Toast is an interesting case, as table AMs may want to use toast
> tables.  Or not.  Robert may be able to comment more on that as he has
> worked in this area for bd12499.

Oh, yeah, I forgot that relations now also have an AM :-)

But the truth is that my goal is to move all the code that defines option
names, min/max values etc. into the AM code: to move the data in
boolRelOpts, intRelOpts, realRelOpts and enumRelOpts from reloptions.c into
the code that implements the AMs that use these options.

I did that for indexes in a patch I offered several years ago. Now we also
have relation AMs.

But I would prefer to fix index AM reloptions first, and then copy that
solution for relations.

Because if I first copy the AM solution from indexes to relations, I will
then have to fix it in two places.

So I would prefer to keep reloptions for relations in reloptions.c, only
splitting them into HeapOptions and ToastOptions; then change the index AMs,
moving option definitions into the AMs; and then clone that solution for
relations.

This seems to be the simplest and most logical way.

PS. I've checked the patch against current master. No changes were needed, but 
I am attaching a diff made against current master, just in case.
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c3d45c7..73d5e4f 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -46,9 +46,8 @@
  * upper and lower bounds (if applicable); for strings, consider a validation
  * routine.
  * (ii) add a record below (or use add__reloption).
- * (iii) add it to the appropriate options struct (perhaps StdRdOptions)
- * (iv) add it to the appropriate handling routine (perhaps
- * default_reloptions)
+ * (iii) add it to the appropriate options struct
+ * (iv) add it to the appropriate handling routine
  * (v) make sure the lock level is set correctly for that operation
  * (vi) don't forget to document the option
  *
@@ -1116,9 +1115,11 @@ extractRelOptions(HeapTuple tuple, TupleDesc tupdesc,
 	switch (classForm->relkind)
 	{
 		case RELKIND_RELATION:
-		case RELKIND_TOASTVALUE:
 		case RELKIND_MATVIEW:
-			options = heap_reloptions(classForm->relkind, datum, false);
+			options = heap_reloptions(datum, false);
+			break;
+		case RELKIND_TOASTVALUE:
+			options = toast_reloptions(datum, false);
 			break;
 		case RELKIND_PARTITIONED_TABLE:
 			options = partitioned_table_reloptions(datum, false);
@@ -1490,55 +1491,62 @@ fillRelOptions(void *rdopts, Size basesize,
 
 
 /*
- * Option parser for anything that uses StdRdOptions.
+ * Option parsing definition for autovacuum. Used in toast and heap options.
+ */
+
+#define AUTOVACUUM_RELOPTIONS(OFFSET)\
+		{"autovacuum_enabled", RELOPT_TYPE_BOOL, \
+		OFFSET + offsetof(AutoVacOpts, enabled)},\
+		{"autovacuum_vacuum_threshold", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, vacuum_threshold)},   \
+		{"autovacuum_analyze_threshold", RELOPT_TYPE_INT,\
+		OFFSET + offsetof(AutoVacOpts, analyze_threshold)},  \
+		{"autovacuum_vacuum_cost_delay", RELOPT_TYPE_REAL,   \
+		OFFSET + offsetof(AutoVacOpts, vacuum_cost_delay)},  \
+		{"autovacuum_vacuum_cost_limit", RELOPT_TYPE_INT,\
+		OFFSET + offsetof(AutoVacOpts, vacuum_cost_limit)},  \
+		{"autovacuum_freeze_min_age", RELOPT_TYPE_INT,   \
+		OFFSET + offsetof(AutoVacOpts, freeze_

Re: Allowing ALTER TYPE to change storage strategy

2020-03-04 Thread Tom Lane
Tomas Vondra  writes:
> On Mon, Mar 02, 2020 at 02:11:10PM -0500, Tom Lane wrote:
>> In the v3 patch below, I've ripped out the ALTER DOMAIN syntax on
>> the assumption that we'd do the latter; but I've not written the
>> cascade recursion logic, because that seemed like a lot of work
>> to do in advance of having consensus on it being a good idea.

> I do agree we should do the latter, i.e. maintain the assumption that
> domains have the same properties as their base type. I can't think of a
> use case for allowing them to differ, it just didn't occur to me there
> is this implicit assumption when writing the patch.

Here's a v4 that is rebased over HEAD + the OPAQUE-ectomy that I
proposed at <4110.1583255...@sss.pgh.pa.us>, plus it adds recursion
to domains, and I also added the ability to set typmod I/O and
analyze functions, which seems like functionality that somebody
could possibly wish to add to a type after-the-fact much like
binary I/O.
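
For illustration, the kind of after-the-fact change this enables might look
like the following (the type and function names are made up; the property
names follow the patch's ALTER TYPE ... SET syntax):

ALTER TYPE mytype SET (
    SEND     = mytype_send,
    RECEIVE  = mytype_recv,
    ANALYZE  = mytype_analyze
);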

I thought about allowing the basic I/O functions to be replaced as
well, but I couldn't really convince myself that there's a use-case
for that.  In practice you'd probably always just change the
behavior of the existing I/O functions, not want to sub in new ones.

(I kind of wonder, actually, whether there's a use-case for the
NONE options here at all.  When would you remove a support function?)

Of the remaining CREATE TYPE options, "category" and "preferred"
could perhaps be changeable but I couldn't get excited about them.
All the others seem like there are gotchas --- for example,
changing a type's collatable property is much harder than it
looks because it'd affect stored views.  So this seems like a
reasonable stopping point.

I think this is committable --- how about you?

I've included the OPAQUE-ectomy patches below so that the cfbot
can test this, but they're the same as in the other thread.

regards, tom lane

diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml
index 175315f..111f8e6 100644
--- a/doc/src/sgml/ref/create_type.sgml
+++ b/doc/src/sgml/ref/create_type.sgml
@@ -823,18 +823,6 @@ CREATE TYPE name
function is written in C.
   
 
-  
-   In PostgreSQL versions before 7.3, it
-   was customary to avoid creating a shell type at all, by replacing the
-   functions' forward references to the type name with the placeholder
-   pseudo-type opaque.  The cstring arguments and
-   results also had to be declared as opaque before 7.3.  To
-   support loading of old dump files, CREATE TYPE will
-   accept I/O functions declared using opaque, but it will issue
-   a notice and change the function declarations to use the correct
-   types.
-  
-
  
 
  
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 540044b..9279c05 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -1444,50 +1444,6 @@ SetFunctionReturnType(Oid funcOid, Oid newRetType)
 
 
 /*
- * SetFunctionArgType - change declared argument type of a function
- *
- * As above, but change an argument's type.
- */
-void
-SetFunctionArgType(Oid funcOid, int argIndex, Oid newArgType)
-{
-	Relation	pg_proc_rel;
-	HeapTuple	tup;
-	Form_pg_proc procForm;
-	ObjectAddress func_address;
-	ObjectAddress type_address;
-
-	pg_proc_rel = table_open(ProcedureRelationId, RowExclusiveLock);
-
-	tup = SearchSysCacheCopy1(PROCOID, ObjectIdGetDatum(funcOid));
-	if (!HeapTupleIsValid(tup)) /* should not happen */
-		elog(ERROR, "cache lookup failed for function %u", funcOid);
-	procForm = (Form_pg_proc) GETSTRUCT(tup);
-
-	if (argIndex < 0 || argIndex >= procForm->pronargs ||
-		procForm->proargtypes.values[argIndex] != OPAQUEOID)
-		elog(ERROR, "function %u doesn't take OPAQUE", funcOid);
-
-	/* okay to overwrite copied tuple */
-	procForm->proargtypes.values[argIndex] = newArgType;
-
-	/* update the catalog and its indexes */
-	CatalogTupleUpdate(pg_proc_rel, &tup->t_self, tup);
-
-	table_close(pg_proc_rel, RowExclusiveLock);
-
-	/*
-	 * Also update the dependency to the new type. Opaque is a pinned type, so
-	 * there is no old dependency record for it that we would need to remove.
-	 */
-	ObjectAddressSet(type_address, TypeRelationId, newArgType);
-	ObjectAddressSet(func_address, ProcedureRelationId, funcOid);
-	recordDependencyOn(&func_address, &type_address, DEPENDENCY_NORMAL);
-}
-
-
-
-/*
  * CREATE CAST
  */
 ObjectAddress
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 5209736..d6e694e 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -163,7 +163,6 @@ DefineType(ParseState *pstate, List *names, List *parameters)
 	char	   *array_type;
 	Oid			array_oid;
 	Oid			typoid;
-	Oid			resulttype;
 	ListCell   *pl;
 	ObjectAddress address;
 
@@ -196,8 +195,7 @@ DefineType(ParseState *pstate, List *names, List *parameters)
 #endif
 
 	/*
-	 * Look to see if type already exists (presumably as a shell; if not,

Re: pgbench: option delaying queries till connections establishment?

2020-03-04 Thread Andres Freund
Hi,

On 2020-03-01 22:16:06 +0100, Fabien COELHO wrote:
>
> Hello Andres,
>
> > FWIW, leaving windows, error handling, and other annoyances aside, this
> > can be implemented fairly simply. See below.
>
> Attached an attempt at improving things.

Awesome!


> I've put 2 barriers: one so that all threads are up, one when all
> connections are setup and the bench is ready to go.

I'd done similarly locally.

Slight aside: Have you ever looked at moving pgbench to non-blocking
connection establishment? It seems weird to use non-blocking everywhere
but connection establishment.
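
For reference, the documented libpq pattern for that looks roughly like this
hedged sketch (not pgbench code; the select/poll wait on PQsocket() is
elided):

#include <libpq-fe.h>

static PGconn *
connect_nonblocking(const char *conninfo)
{
	PGconn	   *conn = PQconnectStart(conninfo);
	PostgresPollingStatusType st = PGRES_POLLING_WRITING;

	if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
		return NULL;

	while (st != PGRES_POLLING_OK && st != PGRES_POLLING_FAILED)
	{
		/* wait on PQsocket(conn) for read/write readiness here */
		st = PQconnectPoll(conn);
	}
	return (st == PGRES_POLLING_OK) ? conn : NULL;
}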


> I've done a blind attempt at implementing the barrier stuff on windows.

Neat.


> I've changed the performance calculations depending on -C or not. Ramp-up
> effects are smoothed.


> I've merged all time-related stuff (time_t, instr_time, int64) to use a
> unique type (pg_time_usec_t) and set of functions/macros, which simplifies
> the code somehow.

Hm.  I'm not convinced it's a good idea for pgbench to do its own thing
here.


>
>  #ifdef WIN32
> +#define PTHREAD_BARRIER_SERIAL_THREAD (-1)
> +
>  /* Use native win32 threads on Windows */
>  typedef struct win32_pthread *pthread_t;
>  typedef int pthread_attr_t;
> +typedef SYNCHRONIZATION_BARRIER pthread_barrier_t;
>
>  static int   pthread_create(pthread_t *thread, pthread_attr_t *attr, void 
> *(*start_routine) (void *), void *arg);
>  static int   pthread_join(pthread_t th, void **thread_return);
> +
> +static int   pthread_barrier_init(pthread_barrier_t *barrier, void *unused, 
> int nthreads);
> +static int   pthread_barrier_wait(pthread_barrier_t *barrier);
> +static int   pthread_barrier_destroy(pthread_barrier_t *barrier);

How about using 'struct unknown_type *unused' instead of "unused"?
Because the void *unused will accept everything...


> +/* Thread synchronization barriers */
> +static pthread_barrier_t
> + start_barrier,  /* all threads are started */
> + bench_barrier;  /* benchmarking ready to start */
> +

We don't really need two barriers here. The way that pthread barriers
are defined is that they 'reset' after all the threads have arrived. You
can argue we still want that, but ...
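
As a minimal sketch of that pattern (plain POSIX, not the patch itself): all
threads block in pthread_barrier_wait() until the last one arrives, and the
barrier then automatically resets for reuse at a later synchronization point:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_barrier_t start_barrier;

static void *
worker(void *arg)
{
	/* ... per-thread setup, e.g. connection establishment ... */
	pthread_barrier_wait(&start_barrier);	/* everyone resumes together */
	printf("thread %ld: benchmarking\n", (long) arg);
	return NULL;
}

int
main(void)
{
	pthread_t	th[NTHREADS];
	long		i;

	pthread_barrier_init(&start_barrier, NULL, NTHREADS);
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&th[i], NULL, worker, (void *) i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(th[i], NULL);
	pthread_barrier_destroy(&start_barrier);
	return 0;
}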



> @@ -5165,20 +5151,16 @@ printSimpleStats(const char *prefix, SimpleStats *ss)
>
>  /* print out results */
>  static void
> -printResults(StatsData *total, instr_time total_time,
> -  instr_time conn_total_time, int64 latency_late)
> +printResults(StatsData *total,

Given that we're changing the output (for the better) of pgbench again,
I wonder if we should add the pgbench version to the benchmark
output. Otherwise it seems easy to end up e.g. seeing a performance
difference between pg12 and pg14, where all that's actually happening is
a different output because each run used the respective pgbench version.



> +  pg_time_usec_t total_delay,/* benchmarking 
> time */
> +  pg_time_usec_t conn_total_delay,   /* is_connect */
> +  pg_time_usec_t conn_elapsed_delay, /* !is_connect 
> */
> +  int64 latency_late)

I'm not a fan of naming these 'delay'. To me that doesn't sound like
it's about the time the total benchmark has taken.


> @@ -5239,8 +5220,16 @@ printResults(StatsData *total, instr_time total_time,
>  0.001 * total->lag.sum / total->cnt, 0.001 * 
> total->lag.max);
>   }
>
> - printf("tps = %f (including connections establishing)\n", tps_include);
> - printf("tps = %f (excluding connections establishing)\n", tps_exclude);
> + if (is_connect)
> + {
> + printf("average connection time = %.3f ms\n", 0.001 * 
> conn_total_delay / total->cnt);
> + printf("tps = %f (including reconnection times)\n", tps);
> + }
> + else
> + {
> + printf("initial connection time = %.3f ms\n", 0.001 * 
> conn_elapsed_delay);
> + printf("tps = %f (without initial connection establishing)\n", 
> tps);
> + }

Keeping these separate makes sense to me, they're just so wildly
different.


> +/*
> + * Simpler convenient interface
> + *
> + * The instr_time type is expensive when dealing with time arithmetic.
> + * Define a type to hold microseconds on top of this, suitable for
> + * benchmarking performance measures, eg in "pgbench".
> + */
> +typedef int64 pg_time_usec_t;
> +
> +static inline pg_time_usec_t
> +pg_time_get_usec(void)
> +{
> + instr_time now;
> +
> + INSTR_TIME_SET_CURRENT(now);
> + return (pg_time_usec_t) INSTR_TIME_GET_MICROSEC(now);
> +}

For me the function name sounds like you're getting the usec out of a
pg_time. Not that it's getting a new timestamp.


> +#define PG_TIME_SET_CURRENT_LAZY(t)  \
> + if ((t) == 0)   \
> + (t) = pg_time_get_usec()
> +
> +#define PG_TIME_GET_DOUBLE(t) (0.000001 * (t))
>  #endif  

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-03-04 Thread Alexey Kondratov

On 04.03.2020 10:45, Michael Paquier wrote:

On Mon, Mar 02, 2020 at 08:59:49PM +0300, Alexey Kondratov wrote:

All other remarks look clear for me, so I fix them in the next patch
version, thanks.

Already done as per the attached, with a new routine named
getRestoreCommand() and more done.


Many thanks for doing that. I went through the diff between v21 and v20. 
Most of the changes look good to me.


- *        Functions for finding and validating executable files
+ *        Functions for finding and validating from executables files

There is probably something missing here. Finding and validating what?
And 'executables files' does not seem to be correct either.


+        # First, remove all the content in the archive directory,
+        # as RecursiveCopy::copypath does not support copying to
+        # existing directories.

I think that 'remove all the content' is not completely correct in this 
case. We are simply removing the archive directory. There is no content 
there yet, so 'First, remove the archive directory...' should be fine.



- I did not actually get why you don't check for a missing command
when using wait_result_is_any_signal.  In this case I'd think that it
is better to exit immediately as follow-up calls would just fail.


Believe it or not, I put 'false' there intentionally. The idea was 
that if the reason is a signal, then maybe the user got tired of waiting and 
killed the restore_command process themselves, or something like that, so 
it is better to exit immediately. If it was a missing command, then 
there is no hurry, so we can go further and complain that the attempt to 
recover the WAL segment has failed.


Actually, I guess there is no big difference whether we include a missing 
command here or not. There is no complicated logic further on, compared to 
the real recovery process in Postgres, where we cannot simply return false 
in that case.



- The code was rather careless about error handling and
RestoreArchivedWALFile(), and it seemed to me that it is rather
pointless to report an extra message "could not restore file \"%s\"
from archive" on top of the other error.


You probably mean the several pg_log_error calls not followed by 'return 
-1;'. Yes, I did that to fall through to the end of the function and show 
this extra message, but I agree that there is not much sense in doing so.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: Portal->commandTag as an enum

2020-03-04 Thread Mark Dilger



> On Mar 2, 2020, at 1:57 PM, Alvaro Herrera  wrote:
> 
> I pushed it now.

Thanks again!  While rebasing some other work on top, I noticed one of your 
comments is out of date:

--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -23,7 +23,7 @@
  * textual name, so that we can bsearch on it; see GetCommandTagEnum().
  */
 
-/* symbol name, textual name, event_trigger_ok, table_rewrite_ok, rowcount, last_oid */
+/* symbol name, textual name, event_trigger_ok, table_rewrite_ok, rowcount */
 PG_CMDTAG(CMDTAG_UNKNOWN, "???", false, false, false)
PG_CMDTAG(CMDTAG_ALTER_ACCESS_METHOD, "ALTER ACCESS METHOD", true, false, false)
 PG_CMDTAG(CMDTAG_ALTER_AGGREGATE, "ALTER AGGREGATE", true, false, false)


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Option to dump foreign data in pg_dump

2020-03-04 Thread Ashutosh Bapat
I am just responding to the latest mail on this thread, but the question is
about functionality. The proposal is to add a single flag
--include-foreign-data which controls whether or not data is dumped for all
the foreign tables in a database. That may not serve the purpose. A foreign
table may point to a view, materialized view or inheritance tree, and so
on. A database can have foreign tables pointing to all of those kinds.
Restoring data to a view won't be possible and restoring it into an
inheritance tree would insert it into the parent only and not the children.
Further, a user may not want the data to be dumped for all the foreign
tables since their usages are different esp. considering restore. I think a
better option is to extract data in a foreign table using --table if that's
the only usage. Otherwise, we need a foreign table level flag indicating
whether pg_dump should dump the data for that foreign table or not.
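
A hedged sketch of the per-table alternative (the table name is made up, and
this assumes pg_dump were taught to dump a foreign table's data when the
table is selected explicitly; plain --table does not do that today):

# hypothetical invocation, assuming per-table foreign data dumping
pg_dump --data-only --table=public.ft_orders mydb > ft_orders.sql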

On Wed, Mar 4, 2020 at 12:41 AM David Steele  wrote:

> Hi Luis,
>
> On 1/29/20 11:05 AM, Peter Eisentraut wrote:
> > On 2020-01-21 10:36, Luis Carril wrote:
> >>> Yes we can support --include-foreign-data without parallel option and
> >>> later add support for parallel option as a different patch.
> >>
> >> Hi,
> >>
> >>  I've attached a new version of the patch in which an error is
> >> emitted if the parallel backup is used with the --include-foreign-data
> >> option.
> >
> > This seems like an overreaction.  The whole point of
> > lockTableForWorker() is to avoid deadlocks, but foreign tables don't
> > have locks, so it's not a problem.  I think you can just skip foreign
> > tables in lockTableForWorker() using the same logic that getTables()
> uses.
> >
> > I think parallel data dump would be an especially interesting option
> > when using foreign tables, so it's worth figuring this out.
>
> What do you think of Peter's comment?
>
> Regards,
> --
> -David
> da...@pgmasters.net
>
>
>

-- 
Best Wishes,
Ashutosh Bapat


Re: HAVE_WORKING_LINK still needed?

2020-03-04 Thread Peter Eisentraut

On 2020-02-28 19:44, Alvaro Herrera wrote:

On 2020-Feb-28, Tom Lane wrote:


Also +1 for s/durable_link_or_rename/durable_link/.


Actually, it's not *that* either, because what the function does is link
followed by unlink.  So it's more a variation of durable_rename with
slightly different semantics -- the difference is what happens if a file
with the target name already exists.  Maybe call it durable_rename_no_overwrite.


I have committed the first two patches.

Here is the third patch again, we renaming durable_link_or_rename() to 
durable_rename_excl().  This seems to match existing Unix system call 
naming best (see open() flag O_EXCL, and macOS has a renamex_np() flag 
RENAME_EXCL).
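
Conceptually the function is just link() followed by unlink(), roughly as in
the sketch below (the fsync calls that the real durable_* routines issue are
omitted):

#include <unistd.h>

/* sketch only: fails if newfile already exists, like open() with O_EXCL */
static int
rename_excl(const char *oldfile, const char *newfile)
{
	if (link(oldfile, newfile) < 0)
		return -1;
	return unlink(oldfile);
}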


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From e892237439e49770cc5013848f4b1f8b5c6c0ff4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 4 Mar 2020 17:33:10 +0100
Subject: [PATCH v2] Remove HAVE_WORKING_LINK

Previously, hard links were not used on Windows and Cygwin, but they
support them just fine in currently supported OS versions, so we can
use them there as well.

Since all supported platforms now support hard links, we can remove
the alternative code paths.

Rename durable_link_or_rename() to durable_rename_excl() to make the
purpose more clear without referencing the implementation details.

Discussion: 
https://www.postgresql.org/message-id/flat/72fff73f-dc9c-4ef4-83e8-d2e60c98df48%402ndquadrant.com
---
 src/backend/access/transam/timeline.c |  4 ++--
 src/backend/access/transam/xlog.c     |  4 ++--
 src/backend/storage/file/fd.c         | 21 +++++----------------
 src/include/pg_config_manual.h        |  7 -------
 src/include/storage/fd.h              |  2 +-
 5 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index 57ee4e048b..860a996414 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -429,7 +429,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 * Perform the rename using link if available, paranoidly trying to avoid
 * overwriting an existing file (there shouldn't be one).
 */
-   durable_link_or_rename(tmppath, path, ERROR);
+   durable_rename_excl(tmppath, path, ERROR);
 
/* The history file can be archived immediately. */
if (XLogArchivingActive())
@@ -507,7 +507,7 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size)
 * Perform the rename using link if available, paranoidly trying to avoid
 * overwriting an existing file (there shouldn't be one).
 */
-   durable_link_or_rename(tmppath, path, ERROR);
+   durable_rename_excl(tmppath, path, ERROR);
 }
 
 /*
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4361568882..108358df57 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3591,11 +3591,11 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 * Perform the rename using link if available, paranoidly trying to avoid
 * overwriting an existing file (there shouldn't be one).
 */
-   if (durable_link_or_rename(tmppath, path, LOG) != 0)
+   if (durable_rename_excl(tmppath, path, LOG) != 0)
{
if (use_lock)
LWLockRelease(ControlFileLock);
-   /* durable_link_or_rename already emitted log message */
+   /* durable_rename_excl already emitted log message */
return false;
}
 
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 34f7443110..7dc6dd2f15 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -765,10 +765,11 @@ durable_unlink(const char *fname, int elevel)
 }
 
 /*
- * durable_link_or_rename -- rename a file in a durable manner.
+ * durable_rename_excl -- rename a file in a durable manner, without
+ * overwriting an existing target file
  *
- * Similar to durable_rename(), except that this routine tries (but does not
- * guarantee) not to overwrite the target file.
+ * Similar to durable_rename(), except that this routine will fail if the
+ * target file already exists.
  *
  * Note that a crash in an unfortunate moment can leave you with two links to
  * the target file.
@@ -779,7 +780,7 @@ durable_unlink(const char *fname, int elevel)
  * valid upon return.
  */
 int
-durable_link_or_rename(const char *oldfile, const char *newfile, int elevel)
+durable_rename_excl(const char *oldfile, const char *newfile, int elevel)
 {
/*
 * Ensure that, if we crash directly after the rename/link, a file with
@@ -788,7 +789,6 @@ durable_link_or_rename(const char *oldfile, const char *newfile, int elevel)
if (fsync_fname_ext(oldfile, false, false, elevel) != 0)

Re: Some improvements to numeric sqrt() and ln()

2020-03-04 Thread Dean Rasheed
On Wed, 4 Mar 2020 at 14:41, David Steele  wrote:
>
> Are these improvements targeted at PG13 or PG14?  This seems a pretty
> big change for the last CF of PG13.
>

Well of course that's not entirely up to me, but I was hoping to
commit it for PG13.

It's very well covered by a large number of regression tests in both
numeric.sql and numeric_big.sql, since nearly anything that calls
ln(), log() or pow() ends up going through sqrt_var(). Also, the
changes are local to functions in numeric.c, which makes them easy to
revert if something proves to be wrong.

Regards,
Dean




Re: Bug in pg_restore with EventTrigger in parallel mode

2020-03-04 Thread vignesh C
On Fri, Feb 21, 2020 at 12:06 AM Fabrízio de Royes Mello
 wrote:
>
>
>
> On Thu, Feb 20, 2020 at 4:52 AM Michael Paquier  wrote:
> >
> > That sounds right, as event triggers could interact with GRANT and
> > REFRESH of matviews, so they should be logically last.  Looking at the
> > recent commit history, this would be similar to 3eb9a5e as we don't
> > really have a way to treat event triggers as dependency-sortable
> > objects.
> >
>
> Indeed... event triggers should be the last thing to be restored.
>
> >  What kind of errors did you see in this customer
> > environment?  Errors triggered by one or more event triggers blocking
> > some commands based on a tag match?
> >
>
> By error I meant the weird behavior I described before that pg_restore create 
> the event triggers in parallel mode and after that other objects are created 
> then the event trigger is fired during the restore...
>
> Have a look at the new attached patch.
>

The test works fine with the patch.

Few comments:
There is minor code alignment that need to be fixed:
git apply fix_pg_restore_parallel_with_event_trigger_v2.patch
fix_pg_restore_parallel_with_event_trigger_v2.patch:11: trailing whitespace.
 * then ACLs, matview refresh items, then event triggers. We might be
warning: 1 line adds whitespace errors.

I'm not sure whether we can add a test for this; could you give it some
thought and check whether a test can be added?
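
For instance, a TAP test could restore a dump containing an event trigger
like the following (names hypothetical) and assert that it does not fire
while the remaining objects are being restored:

CREATE FUNCTION note_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'ddl_command_end fired for %', tg_tag;
END $$;

CREATE EVENT TRIGGER note_all_ddl
    ON ddl_command_end
    EXECUTE FUNCTION note_ddl();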

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: jsonpath syntax extensions

2020-03-04 Thread David Steele

Hi Nikita,

On 2/27/20 10:57 AM, Nikita Glukhov wrote:


Attached patches implement several useful jsonpath syntax extensions.
I already published them two years ago in the original SQL/JSON thread,
but then after creation of separate threads for SQL/JSON functions and
JSON_TABLE I forgot about them.


Are these improvements targeted at PG13 or PG14?  This seems to be a 
pretty big change for the last CF of PG13.  I know these have been 
submitted before but that was a few years ago so I think they count as new.


Regards,
--
-David
da...@pgmasters.net




Re: WAL usage calculation patch

2020-03-04 Thread Julien Rouhaud
On Thu, Feb 20, 2020 at 06:56:27PM +0300, Kirill Bychik wrote:
> > вт, 18 февр. 2020 г. в 06:23, Thomas Munro :
> > > On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer  
> > > wrote:
> > > > On Wed, 5 Feb 2020 at 21:36, Kirill Bychik  
> > > > wrote:
> > > > > Patch is separated in two parts: core changes and pg_stat_statements
> > > > > additions. Essentially the extension has its schema updated to allow
> > > > > two more fields, docs updated to reflect the change. Patch is prepared
> > > > > against master branch.
> > > > >
> > > > > Please provide your comments and/or code findings.
> > > >
> > > > I like the concept, I'm a big fan of anything that affordably improves
> > > > visibility into Pg's I/O and activity.
> > >
> > > +1

Huge +1 too.

> > Thank you for testing the patch and running extension checks. I assume
> > the patch applies without problems.
> >
> > As for the regr test, it apparently requires some rework. I didn't pay
> > attention enough to make sure the data I check is actually meaningful
> > and isolated enough to be repeatable.
> >
> > Please consider the extension part of the patch as WIP, I'll resubmit
> > the patch once I get a stable and meaningful test up. Thanks for
> > finding it!
> >
>
> I have reworked the extension regression test to be more isolated.
> Apparently, something merged into master branch shifted my numbers.
>
> PFA the new patch. Core part didn't change a bit, the extension part
> has regression test SQL and expected log changed.

I'm quite worried about the stability of those counters for regression tests.
Wouldn't a checkpoint happening during the test change them?

While at it, did you consider adding a full-page image counter in the WalUsage?
That's something I'd really like to have and it doesn't seem hard to integrate.
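Something along these lines, say (a sketch; the field names are illustrative,
not from the posted patch):

/* hypothetical shape of the per-backend WAL usage counters */
typedef struct WalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_fpi;		/* # of full-page images produced */
	uint64		wal_bytes;		/* size of WAL produced, in bytes */
} WalUsage;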

Another point is that this patch won't help to see autovacuum activity.
As an example, I did a quick test to store the informations in pgstat, sending
the data in the PG_FINALLY part of vacuum():

rjuju=# create table t1(id integer, val text);
CREATE TABLE
rjuju=# insert into t1 select i, 'val ' || i from generate_series(1, 10) i;
INSERT 0 10
rjuju=# vacuum t1;
VACUUM
rjuju=# select datname, vac_wal_records, vac_wal_bytes,
               autovac_wal_records, autovac_wal_bytes
        from pg_stat_database where datname = 'rjuju';
 datname | vac_wal_records | vac_wal_bytes | autovac_wal_records | autovac_wal_bytes
---------+-----------------+---------------+---------------------+-------------------
 rjuju   |             547 |         65201 |                   0 |                 0
(1 row)

rjuju=# delete from t1 where id % 2 = 0;
DELETE 5
rjuju=# select pg_sleep(60);
 pg_sleep
--

(1 row)

rjuju=# select datname, vac_wal_records, vac_wal_bytes,
               autovac_wal_records, autovac_wal_bytes
        from pg_stat_database where datname = 'rjuju';
 datname | vac_wal_records | vac_wal_bytes | autovac_wal_records | autovac_wal_bytes
---------+-----------------+---------------+---------------------+-------------------
 rjuju   |             547 |         65201 |                1631 |            323193
(1 row)

That seems like useful data (especially since I recently had to dig into a
problematic WAL consumption issue that was due to some autovacuum activity),
but it may seem strange to account only for (auto)vacuum activity rather
than globally, grouping per RmgrId or CommandTag for instance.  We could then
see the complete WAL usage per database.  What do you think?

Some minor points I noticed:

- the extension patch doesn't apply anymore, I guess since 70a7732007bc4689

 #define PARALLEL_KEY_JIT_INSTRUMENTATION	UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE				UINT64CONST(0xE000000000000010)

Shouldn't it be 0xA rather than 0x10?

- it would be better to add a version number to the patches, so we're sure
  which one we're talking about.




Re: PG14 target version?

2020-03-04 Thread David Steele

On 3/4/20 10:43 AM, Alvaro Herrera wrote:

On 2020-Mar-03, David Steele wrote:


Anybody know how to add 14 to the "Target version" dropdown in the CF app?

I haven't needed it yet but I'd like it to be there when I do.


Done.


I see it, thanks!

--
-David
da...@pgmasters.net




Re: PG14 target version?

2020-03-04 Thread Alvaro Herrera
On 2020-Mar-03, David Steele wrote:

> Anybody know how to add 14 to the "Target version" dropdown in the CF app?
> 
> I haven't needed it yet but I'd like it to be there when I do.

Done.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Global temporary tables

2020-03-04 Thread tushar

On 3/3/20 2:10 PM, 曾文旌(义从) wrote:

I fixed in global_temporary_table_v16-pg13.patch.
Thanks Wenjing. The reported issue is fixed now, but there is another
similar scenario:
if we set 'on_commit_delete_rows' to true using the ALTER command, we
get the same issue, i.e. rows are not removed after commit.


x=# create global  temp table foo123(n int) with 
(on_commit_delete_rows='false');

CREATE TABLE
x=#
x=# alter table foo123 set ( on_commit_delete_rows='true');
ALTER TABLE
x=#
x=# insert into foo123 values (1);
INSERT 0 1
x=# select * from foo123;   <- the row should have been removed
 n
---
 1
(1 row)

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: logical copy_replication_slot issues

2020-03-04 Thread Arseny Sher

Masahiko Sawada  writes:

> I've attached the updated version patch that incorporated your
> comments. I believe we're going in the right direction for fixing this
> bug. I'll register this item to the next commit fest so as not to
> forget.

I've moved the confirmed_flush check to the second lookup out of paranoid
considerations (e.g. the slot could have been recreated and its creation not
finished yet) and made some minor stylistic adjustments. It looks good
to me now.

diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 2c9d5de6d9..4a3c7aa0ce 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -121,7 +121,8 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
  */
 static void
 create_logical_replication_slot(char *name, char *plugin,
-bool temporary, XLogRecPtr restart_lsn)
+bool temporary, XLogRecPtr restart_lsn,
+bool find_startpoint)
 {
 	LogicalDecodingContext *ctx = NULL;
 
@@ -139,16 +140,18 @@ create_logical_replication_slot(char *name, char *plugin,
 		  temporary ? RS_TEMPORARY : RS_EPHEMERAL);
 
 	/*
-	 * Create logical decoding context, to build the initial snapshot.
+	 * Create logical decoding context to find start point or, if we don't
+	 * need it, to 1) bump slot's restart_lsn 2) check plugin sanity.
 	 */
 	ctx = CreateInitDecodingContext(plugin, NIL,
-	false,	/* do not build snapshot */
+	false,	/* do not build data snapshot */
 	restart_lsn,
 	logical_read_local_xlog_page, NULL, NULL,
 	NULL);
 
 	/* build initial snapshot, might take a while */
-	DecodingContextFindStartpoint(ctx);
+	if (find_startpoint)
+		DecodingContextFindStartpoint(ctx);
 
 	/* don't need the decoding context anymore */
 	FreeDecodingContext(ctx);
@@ -179,7 +182,8 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	create_logical_replication_slot(NameStr(*name),
 	NameStr(*plugin),
 	temporary,
-	InvalidXLogRecPtr);
+	InvalidXLogRecPtr,
+	true);
 
 	values[0] = NameGetDatum(&MyReplicationSlot->data.name);
 	values[1] = LSNGetDatum(MyReplicationSlot->data.confirmed_flush);
@@ -683,10 +687,19 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 
 	/* Create new slot and acquire it */
 	if (logical_slot)
+	{
+		/*
+		 * WAL required for building snapshot could be removed as we haven't
+		 * reserved WAL yet. So we create a new logical replication slot
+		 * without building an initial snapshot.  A reasonable start point for
+		 * decoding will be provided by the source slot.
+		 */
 		create_logical_replication_slot(NameStr(*dst_name),
 		plugin,
 		temporary,
-		src_restart_lsn);
+		src_restart_lsn,
+		false);
+	}
 	else
 		create_physical_replication_slot(NameStr(*dst_name),
 		 true,
@@ -703,6 +716,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		TransactionId copy_xmin;
 		TransactionId copy_catalog_xmin;
 		XLogRecPtr	copy_restart_lsn;
+		XLogRecPtr	copy_confirmed_flush;
 		bool		copy_islogical;
 		char	   *copy_name;
 
@@ -714,6 +728,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		copy_xmin = src->data.xmin;
 		copy_catalog_xmin = src->data.catalog_xmin;
 		copy_restart_lsn = src->data.restart_lsn;
+		copy_confirmed_flush = src->data.confirmed_flush;
 
 		/* for existence check */
 		copy_name = pstrdup(NameStr(src->data.name));
@@ -738,6 +753,13 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 			NameStr(*src_name)),
 	 errdetail("The source replication slot was modified incompatibly during the copy operation.")));
 
+		/* The source slot must have a consistent snapshot */
+		if (src_islogical && XLogRecPtrIsInvalid(copy_confirmed_flush))
+			ereport(ERROR,
+	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+	 errmsg("cannot copy a logical replication slot that doesn't have confirmed_flush_lsn"),
+	 errhint("Retry when the source replication slot creation is finished.")));
+
 		/* Install copied values again */
 		SpinLockAcquire(&MyReplicationSlot->mutex);
 		MyReplicationSlot->effective_xmin = copy_effective_xmin;
@@ -746,6 +768,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		MyReplicationSlot->data.xmin = copy_xmin;
 		MyReplicationSlot->data.catalog_xmin = copy_catalog_xmin;
 		MyReplicationSlot->data.restart_lsn = copy_restart_lsn;
+		MyReplicationSlot->data.confirmed_flush = copy_confirmed_flush;
 		SpinLockRelease(&MyReplicationSlot->mutex);
 
 		ReplicationSlotMarkDirty();


-- cheers, arseny


Re: Symbolic names for the values of typalign and typstorage

2020-03-04 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Mar 03, 2020 at 04:45:51PM -0500, Tom Lane wrote:
>> Yeah, that works out very nicely: there's now only one place besides
>> tupmacs.h that needs a new #include.
>> I did a little more polishing, and consider the attached committable,
>> unless anyone has objections.

> Nice.  I have looked at the patch and it seems to me that no spots
> have been missed.

Pushed, thanks for reviewing.

regards, tom lane




Re: ALTER TEXT SEARCH DICTIONARY tab completion

2020-03-04 Thread Georgios Kokolatos
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

It looks good and does what it says on the tin.

One minor nitpick I feel I should add is that for completeness and
balance the equivalent `CREATE TEXT SEARCH DICTIONARY` should 
get the same treatment.

Maybe something along the lines of:
-   else if (Matches("CREATE", "TEXT", "SEARCH", "CONFIGURATION", MatchAny))
+   else if (Matches("CREATE", "TEXT", "SEARCH", "DICTIONARY|CONFIGURATION", MatchAny))

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-04 Thread Laurenz Albe
On Tue, 2020-03-03 at 16:28 +0100, Laurenz Albe wrote:
> As a more substantial base for discussion, here is a patch that:
> 
> - introduces a GUC and reloption "autovacuum_vacuum_insert_limit",
>   default 1000
> 
> - introduces a statistics counter "inserts_since_vacuum" per table
>   that gets reset to 0 after vacuum
> 
> - causes autovacuum to run without cleaning up indexes if
>   inserts_since_vacuum > autovacuum_vacuum_insert_limit
>   and there is no other reason for an autovacuum

I just realized that the exercise is pointless unless that
autovacuum also runs with FREEZE on.

Updated patch attached.
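
For illustration, the proposed option could be used like this (the table
name is made up; the option name is the one from the patch):

-- per-table override of the new insert threshold
ALTER TABLE append_only_log
    SET (autovacuum_vacuum_insert_limit = 100000);

-- or globally, e.g. in postgresql.conf:
-- autovacuum_vacuum_insert_limit = 1000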

Yours,
Laurenz Albe
From 9c9d5c65addd7a8e973898eab053011392b9a18c Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Wed, 4 Mar 2020 16:14:00 +0100
Subject: [PATCH] Autovacuum tables that have received only inserts

Add "autovacuum_vacuum_insert_limit" GUC and reloption
with a default value of 1000.

Any table that has received more inserts since it was
last vacuumed (and that is not vacuumed for another
reason) will be autovacuumed with "index_cleanup off, freeze".

This avoids the known problem that insert-only tables
are never autovacuumed until they need to have their
anti-wraparound autovacuum, which then can be massive
and disruptive.

To track the number of inserts since the last vacuum,
introduce a StatTabEntry "inserts_since_vacuum" that
gets reset to 0 after a vacuum.
---
 src/backend/access/common/reloptions.c| 11 +
 src/backend/postmaster/autovacuum.c   | 41 +++
 src/backend/postmaster/pgstat.c   |  5 +++
 src/backend/utils/misc/guc.c  |  9 
 src/backend/utils/misc/postgresql.conf.sample |  2 +
 src/bin/psql/tab-complete.c   |  2 +
 src/include/pgstat.h  |  1 +
 src/include/postmaster/autovacuum.h   |  1 +
 src/include/utils/rel.h   |  1 +
 9 files changed, 66 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 5325dd3f61..9c3d8b174e 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -233,6 +233,15 @@ static relopt_int intRelOpts[] =
 		},
 		-1, 0, INT_MAX
 	},
+	{
+		{
+			"autovacuum_vacuum_insert_limit",
+			"Maximum number of tuple inserts prior to vacuum",
+			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+			ShareUpdateExclusiveLock
+		},
+		-1, 0, INT_MAX
+	},
 	{
 		{
 			"autovacuum_analyze_threshold",
@@ -1501,6 +1510,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, enabled)},
 		{"autovacuum_vacuum_threshold", RELOPT_TYPE_INT,
 		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_threshold)},
+		{"autovacuum_vacuum_insert_limit", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_ins_limit)},
 		{"autovacuum_analyze_threshold", RELOPT_TYPE_INT,
 		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, analyze_threshold)},
 		{"autovacuum_vacuum_cost_limit", RELOPT_TYPE_INT,
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index e3a43d3296..091a01804f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -116,6 +116,7 @@ int			autovacuum_max_workers;
 int			autovacuum_work_mem = -1;
 int			autovacuum_naptime;
 int			autovacuum_vac_thresh;
+int			autovacuum_vac_ins_limit;
 double		autovacuum_vac_scale;
 int			autovacuum_anl_thresh;
 double		autovacuum_anl_scale;
@@ -330,7 +331,8 @@ static void relation_needs_vacanalyze(Oid relid, AutoVacOpts *relopts,
 	  Form_pg_class classForm,
 	  PgStat_StatTabEntry *tabentry,
 	  int effective_multixact_freeze_max_age,
-	  bool *dovacuum, bool *doanalyze, bool *wraparound);
+	  bool *dovacuum, bool *doanalyze, bool *wraparound,
+	  bool *insert_only);
 
 static void autovacuum_do_vac_analyze(autovac_table *tab,
 	  BufferAccessStrategy bstrategy);
@@ -2056,6 +2058,7 @@ do_autovacuum(void)
 		bool		dovacuum;
 		bool		doanalyze;
 		bool		wraparound;
+		bool		insert_only;
 
 		if (classForm->relkind != RELKIND_RELATION &&
 			classForm->relkind != RELKIND_MATVIEW)
@@ -2096,7 +2099,8 @@ do_autovacuum(void)
 		/* Check if it needs vacuum or analyze */
 		relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
   effective_multixact_freeze_max_age,
-  &dovacuum, &doanalyze, &wraparound);
+  &dovacuum, &doanalyze, &wraparound,
+  &insert_only);
 
 		/* Relations that need work are added to table_oids */
 		if (dovacuum || doanalyze)
@@ -2149,6 +2153,7 @@ do_autovacuum(void)
 		bool		dovacuum;
 		bool		doanalyze;
 		bool		wraparound;
+		bool		insert_only;
 
 		/*
 		 * We cannot safely process other backends' temp tables, so skip 'em.
@@ -2179,7 +2184,8 @@ do_autovacuum(void)
 
 		re

Re: Asynchronous Append on postgres_fdw nodes.

2020-03-04 Thread David Steele

On 2/28/20 3:06 AM, Kyotaro Horiguchi wrote:

Hello, this is a follow-on of [1] and [2].

Currently the executor visits execution nodes one-by-one.  Considering
sharding, Append on multiple postgres_fdw nodes can work
simultaneously and that can largely shorten the respons of the whole
query.  For example, aggregations that can be pushed-down to remote
would be accelerated by the number of remote servers. Even other than
such an extreme case, collecting tuples from multiple servers also can
be accelerated by tens of percent [2].

I have suspended the work waiting asyncrohous or push-up executor to
come but the mood seems inclining toward doing that before that to
come [3].

The patchset consists of three parts.


Are these improvements targeted at PG13 or PG14?  This seems to be a 
pretty big change for the last CF of PG13.


Regards,
--
-David
da...@pgmasters.net




Re: Refactor compile-time assertion checks for C/C++

2020-03-04 Thread Georgios Kokolatos
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

In my humble opinion the patch improves readability, hence gets my +1.

No further suggestions. Passing on to a committer to judge further.

The new status of this patch is: Ready for Committer


Re: Some improvements to numeric sqrt() and ln()

2020-03-04 Thread David Steele

Hi Dean,

On 2/28/20 3:15 AM, Dean Rasheed wrote:

Attached is a WIP patch to improve the performance of numeric sqrt()
and ln(), which also makes a couple of related improvements to
div_var_fast(), all of which have knock-on benefits for other numeric
functions. The actual impact varies greatly depending on the inputs,
but the overall effect is to reduce the run time of the numeric_big
regression test by about 20%.


Are these improvements targeted at PG13 or PG14?  This seems a pretty 
big change for the last CF of PG13.


Regards,
--
-David
da...@pgmasters.net




Re: replay pause vs. standby promotion

2020-03-04 Thread Jehan-Guillaume de Rorthais
On Wed, 04 Mar 2020 15:00:54 +0300
Sergei Kornilov  wrote:

> Hello
> 
> > I want to start this discussion because this is related to the patch
> > (propoesd at the thread [1]) that I'm reviewing. It does that partially,
> > i.e., prefers the promotion only when the pause is requested by
> > recovery_target_action=pause. But I think that it's reasonable and
> > more consistent to do that whether whichever the pause is requested
> > by pg_wal_replay_pause() or recovery_target_action.  
> 
> +1.

+1

And pg_wal_replay_pause() should probably raise an error explaining that the
standby ignores the pause because of the ongoing promotion.




Re: Improving connection scalability: GetSnapshotData()

2020-03-04 Thread David Steele

On 3/1/20 3:36 AM, Andres Freund wrote:


I think these are pretty nice results.


Indeed they are.

Is the target version PG13 or PG14?  It seems like a pretty big patch to 
go in the last commitfest for PG13.


Regards,
--
-David
da...@pgmasters.net




Re: Minor issues in .pgpass

2020-03-04 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Tested and looks fine to me.

The new status of this patch is: Ready for Committer


Re: backup manifests

2020-03-04 Thread tushar

Hi,

There is a scenario in which I add something inside the pg_tblspc 
directory and get an error like:


pg_validatebackup: * manifest_checksum = 
77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: error: "pg_tblspc/16385/*PG_13_202002271*/test" is 
present on disk but not in the manifest


but if I remove the 'PG_13_202002271' directory then there is no error

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data
pg_validatebackup: * manifest_checksum = 
77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04

pg_validatebackup: backup successfully verified

Steps to reproduce:
--connect to the psql terminal, create a tablespace
postgres=# \! mkdir /tmp/my_tblspc
postgres=# create tablespace tbs location '/tmp/my_tblspc';
CREATE TABLESPACE
postgres=# \q

--run pg_basebackup
[centos@tushar-ldap-docker bin]$ ./pg_basebackup -D data_dir   -T 
/tmp/my_tblspc/=/tmp/new_my_tblspc

[centos@tushar-ldap-docker bin]$
[centos@tushar-ldap-docker bin]$ ls /tmp/new_my_tblspc/
PG_13_202002271

--create a new file under PG_13_* folder
[centos@tushar-ldap-docker bin]$ touch 
/tmp/new_my_tblspc/PG_13_202002271/test

[centos@tushar-ldap-docker bin]$

--run pg_validatebackup; getting an error, which looks expected
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
pg_validatebackup: * manifest_checksum = 
3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907
pg_validatebackup: error: "pg_tblspc/16386/PG_13_202002271/test" is 
present on disk but not in the manifest

[centos@tushar-ldap-docker bin]$

--remove the added file
[centos@tushar-ldap-docker bin]$ rm -rf   
/tmp/new_my_tblspc/PG_13_202002271/test


--run pg_validatebackup , working fine
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
pg_validatebackup: * manifest_checksum = 
3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907

pg_validatebackup: backup successfully verified
[centos@tushar-ldap-docker bin]$

--remove the folder PG_13*
[centos@tushar-ldap-docker bin]$ rm -rf   
/tmp/new_my_tblspc/PG_13_202002271/

[centos@tushar-ldap-docker bin]$
[centos@tushar-ldap-docker bin]$ ls /tmp/new_my_tblspc/

--run pg_validatebackup; no error reported?
[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
pg_validatebackup: * manifest_checksum = 
3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907

pg_validatebackup: backup successfully verified
[centos@tushar-ldap-docker bin]$

Start the server -

[centos@tushar-ldap-docker bin]$ ./pg_ctl -D data_dir/ start -o '-p 9033'
waiting for server to start....
2020-03-04 19:18:54.839 IST [13097] LOG:  starting PostgreSQL 13devel on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-03-04 19:18:54.840 IST [13097] LOG:  listening on IPv6 address 
"::1", port 9033
2020-03-04 19:18:54.840 IST [13097] LOG:  listening on IPv4 address 
"127.0.0.1", port 9033
2020-03-04 19:18:54.842 IST [13097] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.9033"
2020-03-04 19:18:54.843 IST [13097] LOG:  could not open directory 
"pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.845 IST [13098] LOG:  database system was 
interrupted; last known up at 2020-03-04 19:14:50 IST
2020-03-04 19:18:54.937 IST [13098] LOG:  could not open directory 
"pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.939 IST [13098] LOG:  could not open directory 
"pg_tblspc/16386/PG_13_202002271": No such file or directory

2020-03-04 19:18:54.939 IST [13098] LOG:  redo starts at 0/1828
2020-03-04 19:18:54.939 IST [13098] LOG:  consistent recovery state 
reached at 0/18000100

2020-03-04 19:18:54.939 IST [13098] LOG:  redo done at 0/18000100
2020-03-04 19:18:54.941 IST [13098] LOG:  could not open directory 
"pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.984 IST [13097] LOG:  database system is ready to 
accept connections

 done
server started
[centos@tushar-ldap-docker bin]$

regards,

On 3/4/20 3:51 PM, tushar wrote:
Another scenario: if we modify the "Manifest-Checksum" value in the 
backup_manifest file, we do not get an error


[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
pg_validatebackup: * manifest_checksum = 
28d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d

pg_validatebackup: backup successfully verified

open backup_manifest file and replace

"Manifest-Checksum": 
"8d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d"}

with
"Manifest-Checksum": "Hello World"}

rerun the pg_validatebackup

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
pg_validatebackup: * manifest_checksum = Hello World
pg_validatebackup: backup successfully verified

regards,

On 3/4/20 3:26 PM, tushar wrote:

Hi,
Another observation: if I change the ownership of a file which is 
under the global/ directory

i.e

[root@tushar-ldap-docker global]# chown enterprisedb 2396

and run

Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-04 Thread Andy Fan
>
>
>> * There are some changes in existing regression cases that aren't
>> visibly related to the stated purpose of the patch, eg it now
>> notices that "select distinct max(unique2) from tenk1" doesn't
>> require an explicit DISTINCT step.  That's not wrong, but I wonder
>> if maybe you should subdivide this patch into more than one patch,
>> because that must be coming from some separate change.  I'm also
>> wondering what caused the plan change in expected/join.out.
>>
>
> For my purposes it should be in the same patch; the logic here is that we
> have DISTINCT in the SQL and the query is already distinct because of the
> max function (the rule is defined in query_is_distinct_agg, which is split
> out of the original query_is_distinct_for function).
>

I thought I was right until I came to
contrib/postgres_fdw/sql/postgres_fdw.sql.
Per my understanding, the result of "select max(a) from t" is unique
since it has an aggregate function and no GROUP BY clause.  But in the
postgres_fdw.sql case, Query->hasAggs is true for "select distinct
(select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where
t1.c1 = 6)
from ft2 t2 where t2.c2 % 6 = 0 order by 1;"  This looks very strange to
me.  Is my understanding wrong, or is there a bug here?

query->hasAggs was set to true in the following call stack.

 pstate->p_hasAggs = true;

..

 qry->hasAggs =  pstate->p_hasAggs;


0  in check_agglevels_and_constraints of parse_agg.c:343
1  in transformAggregateCall of parse_agg.c:236
2  in ParseFuncOrColumn of parse_func.c:805
3  in transformFuncCall of parse_expr.c:1558
4  in transformExprRecurse of parse_expr.c:265
5  in transformExpr of parse_expr.c:155
6  in transformTargetEntry of parse_target.c:105
7  in transformTargetList of parse_target.c:193
8  in transformSelectStmt of analyze.c:1224
9  in transformStmt of analyze.c:301

You can see the new updated patch, which should fix all the issues you
pointed out except the one about supporting GROUP BY.  Another reason this
patch will not be the final one is that the changes to postgres_fdw.out are
too arbitrary; I am uploading it now just for reference.  (The newly
introduced GUC variable can be removed at the end; keeping it for now just
makes testing easier.)


At a high level, I'm a bit disturbed that this focuses only on DISTINCT
>>> and doesn't (appear to) have any equivalent intelligence for GROUP BY,
>>> though surely that offers much the same opportunities for optimization.
>>> It seems like it'd be worthwhile to take a couple steps back and see
>>> if we couldn't recast the logic to work for both.
>>>
>>>
>> OK.  It looks like GROUP BY is a bit harder than DISTINCT because of the
>> aggregate functions.
>> I will go through the code to see where to add this logic.
>>
>>
>
> Can we guarantee any_aggr_func(a) == a if only 1 row is returned?  If so,
> we can do some work on the pathtarget/reltarget by transforming the Aggref
> to a raw expr.  I checked the execution path of the aggregation call, and
> it looks like it depends on the Agg node, which is the very thing we want
> to remove.
>

We can't guarantee any_aggr_func(a) == a when only 1 row is returned, so the
above method doesn't work.  Do you have any suggestion for this?
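
To make rule 1 of the attached patch concrete, here is a minimal
illustration; eliding the Unique node is the patch's intent, and the exact
plan output is an assumption since the patch is still in flux:

CREATE TABLE t (a int PRIMARY KEY, b int);

-- Every output row is already unique because the primary key column a
-- appears in the target list, so the patch can skip the Unique /
-- HashAggregate step that DISTINCT would otherwise add.
EXPLAIN (COSTS OFF) SELECT DISTINCT a, b FROM t;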
From 9449c09688d542c4dc201ee866f67d67304cff98 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= 
Date: Wed, 4 Mar 2020 14:33:56 +0800
Subject: [PATCH v2] [PATCH] Erase the distinct path if the result is unique by
 catalog

For a single relation, the result is unique if any one of the following
is true:
1. The pk is in the target list.
2. The uk is in the target list and its columns are not null.
3. The columns in the group-by clause are also in the target list.

For a relation join, the result is unique if every relation in the
jointree yields a unique result set; then the final result is unique as
well, regardless of the join method.
---
 .../postgres_fdw/expected/postgres_fdw.out|  28 +-
 src/backend/optimizer/path/costsize.c |   1 +
 src/backend/optimizer/plan/analyzejoins.c | 184 +++-
 src/backend/optimizer/plan/planner.c  |  27 ++
 src/backend/optimizer/util/plancat.c  |   9 +
 src/backend/utils/misc/guc.c  |  10 +
 src/include/nodes/pathnodes.h |   1 +
 src/include/optimizer/cost.h  |   1 +
 src/include/optimizer/planmain.h  |   2 +
 src/test/regress/expected/aggregates.out  |  13 +-
 src/test/regress/expected/join.out|  16 +-
 src/test/regress/expected/select_distinct.out | 276 ++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/select_distinct.sql  |  84 ++
 14 files changed, 619 insertions(+), 36 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 84fd3ad2e0..215f10bf7d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2902,22 +2902,20 @@ selec

Re: PG_COLOR not mentioned in docs of vacuumlo, oid2name and pgbench

2020-03-04 Thread Michael Paquier
On Wed, Mar 04, 2020 at 11:31:27AM +0100, Fabien COELHO wrote:
> No objection. I did not know there was such a thing…
> 
> Maybe a more detailed explanation about PG_COLOR could be stored somewhere,
> and all affected tools could link to it? Or not.

One argument against that position is that each tool may just handle a
subset of the full set available, and that some of the subsets may
partially intersect.  Fun.

> For "pgbench", you could also add the standard sentence that it uses libpq
> environment variables, as it is also missing?

Yeah, that's true.  Let's fix this part while on it.
--
Michael


signature.asc
Description: PGP signature


Re: PG_COLOR not mentioned in docs of vacuumlo, oid2name and pgbench

2020-03-04 Thread Michael Paquier
On Wed, Mar 04, 2020 at 10:22:26AM +0100, Juan José Santamaría Flecha wrote:
> I think there are a couple tools missing: pg_archivecleanup, pg_ctl,
> pg_test_fsync and pg_upgrade. pg_regress also, but there is nothing to do
> in the documentation with it.

Indeed, true for pg_archivecleanup and pg_test_fsync, but not for
pg_ctl and pg_upgrade.  The funny part about pg_ctl is that the
initialization is done for nothing, because nothing is actually logged
with the APIs of logging.c.  pg_upgrade uses its own logging APIs,
which have nothing to do with logging.c.
--
Michael


signature.asc
Description: PGP signature


Re: PG_COLOR not mentioned in docs of vacuumlo, oid2name and pgbench

2020-03-04 Thread Michael Paquier
On Wed, Mar 04, 2020 at 10:12:23AM +0100, Daniel Gustafsson wrote:
> +   color in diagnostics messages.  Possible values are
> +   always, auto,
> +   never.
> 
> Not being a native English speaker, I might have it backwards, but I find
> lists of values in a sentence like this easier to read when the final
> value is separated by a conjunction, like:
> 
>   value1, value2, ..., and valueN

Point received.  Your suggestion is more natural to me as well.  That said,
the existing docs don't follow that style, so I chose consistency.
--
Michael


signature.asc
Description: PGP signature


Re: Cast to uint16 in pg_checksum_page()

2020-03-04 Thread Michael Paquier
On Wed, Mar 04, 2020 at 07:02:43AM -0500, David Steele wrote:
> Yes, this is being included in an external project. Previously we have used
> a highly marked-up version but we are now trying to pull in the header more
> or less verbatim.
> 
> Since this header is specifically designated as something external projects
> may want to use I think it makes sense to fix the warning.

This sounds like a sensible argument, similar to the ones raised on
the other thread, so no objections from me to improve things here.  I
can look at that tomorrow, except if somebody else beats me to it.
--
Michael


signature.asc
Description: PGP signature


Re: Minor issues in .pgpass

2020-03-04 Thread Hamid Akhtar
On Wed, Mar 4, 2020 at 4:54 PM Fujii Masao 
wrote:

>
>
> On 2020/03/04 20:39, Hamid Akhtar wrote:
> >
> >
> > On Tue, Mar 3, 2020 at 8:57 PM Fujii Masao  > wrote:
> >
> >
> >
> > On 2020/03/03 21:38, Hamid Akhtar wrote:
> >  >
> >  >
> >  > On Mon, Mar 2, 2020 at 6:07 PM Fujii Masao
> >  > <masao.fu...@oss.nttdata.com> wrote:
> >  >
> >  >
> >  >
> >  > On 2020/02/29 0:46, Hamid Akhtar wrote:
> >  >  > The following review has been posted through the
> commitfest application:
> >  >  > make installcheck-world:  not tested
> >  >  > Implements feature:   not tested
> >  >  > Spec compliant:   not tested
> >  >  > Documentation:not tested
> >  >  >
> >  >  > First of all, this seems like fixing a valid issue,
> albeit, the probability of somebody messing is low, but it is still better
> to fix this problem.
> >  >  >
> >  >  > I've not tested the patch in any detail, however, there
> are a couple of comments I have before I proceed on with detailed testing.
> >  >
> >  > Thanks for the review and comments!
> >  >
> >  >  > 1. pgindent is showing a few issues with formatting.
> Please have a look and resolve those.
> >  >
> >  > Yes.
> >  >
> >  >  > 2. I think you can potentially use "len" variable instead
> of introducing "buflen" and "tmplen" variables.
> >  >
> >  > Basically I don't want to use the same variable for several
> purposes
> >  > because that would decrease the code readability.
> >  >
> >  >  > Also, I would choose a more appropriate name for "tmp"
> variable.
> >  >
> >  > Yeah, so what about "rest" as the variable name?
> >  >
> >  >  > I believe if you move the following lines before the
> conditional statement and simply and change the if statement to "if (len >=
> sizeof(buf) - 1)", it will serve the purpose.
> >  >
> >  > ISTM that this doesn't work correctly when the "buf" contains
> >  > trailing carriage returns but not newlines (i.e., this line
> is too long
> >  > so the "buf" doesn't include newline). In this case,
> pg_strip_crlf()
> >  > shorten the "buf" and then its return value "len" should
> become
> >  > less than sizeof(buf). So the following condition always
> becomes
> >  > false unexpectedly in that case even though there is still
> rest of
> >  > the line to eat.
> >  >
> >  >
> >  > Per code comments for pg_strip_crlf:
> >  > "pg_strip_crlf -- Remove any trailing newline and carriage return"
> >  > If the buf read contains a newline or a carriage return at the
> end, then clearly the line
> >  > is not exceeding the sizeof(buf).
> >
> > No if the length of the setting line exceeds sizeof(buf) and
> > the buf contains only a carriage return at the end and not newline.
> > This case can happen because fgets() stops reading when a newline
> > (not a carriage return) is found. Normal users are very unlikely to
> > add a carriage return into the middle of the pgpass setting line
> > in practice, though. But IMO the code should handle even this
> > case because it *can* happen, if the code is not so complicated.
> >
> >
> > I'm not sure if I understand your comment here. From the code of
> pg_strip_crlf
> > I see that it is handling both carriage return and/or new line at the
> end of a
> > string:
>
> So if "buf" contains a carriage return at the end, it's removed and
> the "len" that pg_strip_crlf() returns obviously should be smaller
> than sizeof(buf). This causes the condition you proposed to always
> be false (i.e., len < sizeof(buf) - 1) even when there is still rest
> of the line. So we cannot eat the rest of the line even though it
> exists. Am I missing something?
>

No, you are perfectly fine. I now understand where you are coming from. So,
all good now.


>
> +   if (len >= sizeof(buf) - 1)
> +   {
+   char    tmp[LINELEN];
> +
> +   /*
> +* Warn if this password setting line is too long,
> +* because it's unexpectedly truncated.
> +*/
> +   if (buf[0] != '#')
> +   fprintf(stderr,
> +   libpq_gettext("WARNING:
> line %d too long in password file \"%s\"\n"),
> +   line_number, pgpassfile);
> +
> +   /* eat rest of the line */
> +   while (!feof(fp) && !ferror(fp))
>
> Regards,
>
> --
> Fujii Masao
> NTT DATA CORPORATION
> Advanced Platform Technology Group
> 

Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-03-04 Thread David Steele

On 3/4/20 5:36 AM, Fujii Masao wrote:

On Tue, Mar 13, 2018 at 10:06 PM David Steele  wrote:

On 3/6/18 9:25 PM, Michael Paquier wrote:

On Tue, Mar 06, 2018 at 02:24:24PM +0300, Ivan Kartyshov wrote:

Hello, I am now preparing the patch for the syntax that Simon offered, and
in a few days I will update the patch.
Thank you for your interest in the thread.


It has been more than one month since a patch update has been requested,
and time is growing short.  This refactored patch introduces a whole new
concept as well, so my recommendation would be to mark this patch as
returned with feedback, and then review it freshly for v12 if this
concept is still alive and around.


This patch wasn't updated at the beginning of the CF and still hasn't
been updated after almost two weeks.

I have marked the patch Returned with Feedback.  Please resubmit to a
new CF when you have an updated patch.


There have been no updates for about two years, but this patch
has been registered in CF 2020-03. Not sure why. Should it be marked
as Returned with Feedback again?


Worse, it was marked Needs Review even though no new patch was provided.

I'm going to set this back to Returned with Feedback.  If anyone has a 
good reason that it should be in the CF we can always revive it.


Regards,
--
-David
da...@pgmasters.net




Re: Identifying user-created objects

2020-03-04 Thread Fujii Masao




On 2020/03/04 19:14, Masahiko Sawada wrote:

On Wed, 4 Mar 2020 at 18:57, Fujii Masao  wrote:




On 2020/03/04 18:36, Masahiko Sawada wrote:

On Wed, 4 Mar 2020 at 18:02, Fujii Masao  wrote:




On 2020/03/04 17:05, Masahiko Sawada wrote:

On Wed, 4 Mar 2020 at 16:43, Fujii Masao  wrote:




On 2020/02/05 20:26, Masahiko Sawada wrote:

Hi,

User can create database objects such as functions into pg_catalog.
But if I'm not missing something, currently there is no
straightforward way to identify if the object is a user created object
or a system object which is created during initdb. If we can do that
user will be able to check if malicious functions are not created in
the database, which is important from the security perspective.


Is the function that you are proposing really enough for this use case?
What if malicious users directly change the oid of a function
to < FirstNormalObjectId? Or are you assuming that malicious users will
never log in as superusers and so will not be able to change the oid?


That's a good point! I'm surprised that a user is allowed to update the
oid of a database object. In addition, surprisingly, we can update it to
0, which in turn leads to an assertion failure:


Since non-superusers are not allowed to do that by default,
that's not so bad? That is, to avoid such an unexpected change of oid,
the admin should just prevent malicious users from logging in as
superusers and not grant permission on the system catalogs to such users.



I think there are still insider threats. As long as we depend on
superuser privileges to do some DBA work, a malicious DBA might be able
to log in as superuser and modify oids.


Yes. But I'm sure that the DBA has already considered measures
against such threats. Otherwise malicious users can do things
far more malicious than changing an oid.


Agreed. So that's not a serious problem in practice, but we cannot say
the checking by pg_is_user_object() is totally enough for checking
whether a malicious object exists or not. Is that right?


Yes.

My opinion is that, if malicious users are not allowed to log in
as superusers and the admin gives them no permission on the system
schema/catalogs, checking whether the object is defined under the
pg_catalog schema is enough for your purpose, because they are also
not allowed to create objects under pg_catalog. pg_is_user_object()
seems unnecessary in that case.

OTOH, if you address the case where malicious users can create
objects under pg_catalog, then of course checking whether the object
is defined under the pg_catalog schema is not enough for the purpose.
But pg_is_user_object() is also not enough, because such users can
change the oid.
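
For reference, the schema-based check mentioned above can be done today
with a plain catalog query; a sketch (extend the namespace filter as
needed):

-- List functions living outside pg_catalog and information_schema,
-- i.e. candidates for "not installed by initdb".
SELECT n.nspname, p.proname
  FROM pg_proc p
  JOIN pg_namespace n ON n.oid = p.pronamespace
 WHERE n.nspname NOT IN ('pg_catalog', 'information_schema');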

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Cast to uint16 in pg_checksum_page()

2020-03-04 Thread David Steele

On 3/4/20 1:05 AM, Tom Lane wrote:

Michael Paquier  writes:

On Tue, Mar 03, 2020 at 06:37:36PM -0500, David Steele wrote:

It seems like an explicit cast to uint16 would be better?



Attempting to compile the backend code with -Wconversion leads to many
warnings, still there has been at least one fix in the past to ease
the use of the headers in this case, with b5b3229 (this made the code
more readable).  Should we really care about this case?


Per the commit message for b5b3229, it might be worth getting rid of
such messages for code that's exposed in header files, even if removing
all of those warnings would be too much work.  Perhaps David's use-case
is an extension that's using that header?


Yes, this is being included in an external project. Previously we
used a highly marked-up version, but we are now trying to pull in the
header more or less verbatim.


Since this header is specifically designated as something external
projects may want to use, I think it makes sense to fix the warning.
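
For reference, the fix under discussion is just making the narrowing
explicit in the header's inline function; roughly the following, though
the exact expression in checksum_impl.h may differ slightly:

    /* before: the int arithmetic result is implicitly narrowed to
     * uint16, which -Wconversion flags */
    return (checksum % 65535) + 1;

    /* after: an explicit cast documents and silences the truncation */
    return (uint16) ((checksum % 65535) + 1);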


Regards,
--
-David
da...@pgmasters.net




Re: replay pause vs. standby promotion

2020-03-04 Thread Sergei Kornilov
Hello

> I want to start this discussion because this is related to the patch
> (proposed at the thread [1]) that I'm reviewing. It does that partially,
> i.e., prefers the promotion only when the pause is requested by
> recovery_target_action=pause. But I think that it's reasonable and
> more consistent to do that whichever way the pause is requested,
> whether by pg_wal_replay_pause() or recovery_target_action.

+1.
I'm just not sure if this is safe for replay logic, so I did not touch this 
behavior in my proposal. (hmm, I wanted to mention this, but apparently forgot)

regards, Sergei




Re: Minor issues in .pgpass

2020-03-04 Thread Fujii Masao




On 2020/03/04 20:39, Hamid Akhtar wrote:



On Tue, Mar 3, 2020 at 8:57 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:



On 2020/03/03 21:38, Hamid Akhtar wrote:
 >
 >
 >  > On Mon, Mar 2, 2020 at 6:07 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
 >
 >
 >
 >     On 2020/02/29 0:46, Hamid Akhtar wrote:
 >      > The following review has been posted through the commitfest 
application:
 >      > make installcheck-world:  not tested
 >      > Implements feature:       not tested
 >      > Spec compliant:           not tested
 >      > Documentation:            not tested
 >      >
 >      > First of all, this seems like fixing a valid issue, albeit, the 
probability of somebody messing is low, but it is still better to fix this problem.
 >      >
 >      > I've not tested the patch in any detail, however, there are a 
couple of comments I have before I proceed on with detailed testing.
 >
 >     Thanks for the review and comments!
 >
 >      > 1. pgindent is showing a few issues with formatting. Please have 
a look and resolve those.
 >
 >     Yes.
 >
 >      > 2. I think you can potentially use "len" variable instead of introducing 
"buflen" and "tmplen" variables.
 >
 >     Basically I don't want to use the same variable for several purposes
 >     because that would decrease the code readability.
 >
 >      > Also, I would choose a more appropriate name for "tmp" variable.
 >
 >     Yeah, so what about "rest" as the variable name?
 >
 >      > I believe if you move the following lines before the conditional statement and 
simply and change the if statement to "if (len >= sizeof(buf) - 1)", it will serve 
the purpose.
 >
 >     ISTM that this doesn't work correctly when the "buf" contains
 >     trailing carriage returns but not newlines (i.e., this line is too 
long
 >     so the "buf" doesn't include newline). In this case, pg_strip_crlf()
 >     shorten the "buf" and then its return value "len" should become
 >     less than sizeof(buf). So the following condition always becomes
 >     false unexpectedly in that case even though there is still rest of
 >     the line to eat.
 >
 >
 > Per code comments for pg_strip_crlf:
 > "pg_strip_crlf -- Remove any trailing newline and carriage return"
 > If the buf read contains a newline or a carriage return at the end, then 
clearly the line
 > is not exceeding the sizeof(buf).

No: if the length of the setting line exceeds sizeof(buf), the buf can
contain only a carriage return at the end and no newline. This case can
happen because fgets() stops reading when a newline (not a carriage
return) is found. Normal users are very unlikely to add a carriage
return into the middle of a pgpass setting line in practice, though.
But IMO the code should handle even this case because it *can* happen,
as long as the code does not get too complicated.


I'm not sure if I understand your comment here. From the code of pg_strip_crlf
I see that it is handling both carriage return and/or new line at the end of a
string:


So if "buf" contains a carriage return at the end, it's removed and
the "len" that pg_strip_crlf() returns obviously should be smaller
than sizeof(buf). This causes the following condition that you
proposed as follows to always be false (i.e., len < sizeof(buf) - 1)
even when there are still rest of line. So we cannot eat rest of
the line even though it exists. I'm missing something?

+   if (len >= sizeof(buf) - 1)
+   {
+   char    tmp[LINELEN];
+
+   /*
+* Warn if this password setting line is too long,
+* because it's unexpectedly truncated.
+*/
+   if (buf[0] != '#')
+   fprintf(stderr,
+   libpq_gettext("WARNING: line %d too long in 
password file \"%s\"\n"),
+   line_number, pgpassfile);
+
+   /* eat rest of the line */
+   while (!feof(fp) && !ferror(fp))

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




replay pause vs. standby promotion

2020-03-04 Thread Fujii Masao

Hi,

Currently if pg_wal_replay_pause() is called after the standby
promotion is triggered but before the promotion has successfully
finished, WAL replay is paused. That is, the replay pause is
preferred over the promotion. Is this desirable behavior?

ISTM that most users, including me, want the recovery to complete
as soon as possible and the server to become the master when
they request the promotion. So I'm thinking of changing
the recovery so that it ignores the pause request after the promotion
is triggered. Thoughts?

I want to start this discussion because this is related to the patch
(proposed at the thread [1]) that I'm reviewing. It does that partially,
i.e., prefers the promotion only when the pause is requested by
recovery_target_action=pause. But I think that it's reasonable and
more consistent to do that whichever way the pause is requested,
whether by pg_wal_replay_pause() or recovery_target_action.

BTW, regarding "replay pause vs. delayed standby", any wait by
recovery_min_apply_delay doesn't happen after the promotion
is triggerred. IMO "pause" should be treated as the similar.
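
A minimal sketch of the proposed behavior change in the startup
process's pause loop; the helper names are taken from xlog.c, but this
is an illustration, not a patch:

static void
recoveryPausesHere(void)
{
    /*
     * Proposed: once promotion has been triggered, ignore any pause
     * request so that recovery can finish as soon as possible.
     */
    if (CheckForStandbyTrigger())
        return;

    while (RecoveryIsPaused())
    {
        HandleStartupProcInterrupts();
        pg_usleep(1000000L);    /* 1 second */
    }
}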

[1]
https://www.postgresql.org/message-id/flat/19168211580382...@myt5-b646bde4b8f3.qloud-c.yandex.net

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Minor issues in .pgpass

2020-03-04 Thread Hamid Akhtar
On Tue, Mar 3, 2020 at 8:57 PM Fujii Masao 
wrote:

>
>
> On 2020/03/03 21:38, Hamid Akhtar wrote:
> >
> >
> > On Mon, Mar 2, 2020 at 6:07 PM Fujii Masao  > wrote:
> >
> >
> >
> > On 2020/02/29 0:46, Hamid Akhtar wrote:
> >  > The following review has been posted through the commitfest
> application:
> >  > make installcheck-world:  not tested
> >  > Implements feature:   not tested
> >  > Spec compliant:   not tested
> >  > Documentation:not tested
> >  >
> >  > First of all, this seems like fixing a valid issue, albeit, the
> probability of somebody messing is low, but it is still better to fix this
> problem.
> >  >
> >  > I've not tested the patch in any detail, however, there are a
> couple of comments I have before I proceed on with detailed testing.
> >
> > Thanks for the review and comments!
> >
> >  > 1. pgindent is showing a few issues with formatting. Please have
> a look and resolve those.
> >
> > Yes.
> >
> >  > 2. I think you can potentially use "len" variable instead of
> introducing "buflen" and "tmplen" variables.
> >
> > Basically I don't want to use the same variable for several purposes
> > because that would decrease the code readability.
> >
> >  > Also, I would choose a more appropriate name for "tmp" variable.
> >
> > Yeah, so what about "rest" as the variable name?
> >
> >  > I believe if you move the following lines before the conditional
> statement and simply and change the if statement to "if (len >= sizeof(buf)
> - 1)", it will serve the purpose.
> >
> > ISTM that this doesn't work correctly when the "buf" contains
> > trailing carriage returns but not newlines (i.e., this line is too
> long
> > so the "buf" doesn't include newline). In this case, pg_strip_crlf()
> > shorten the "buf" and then its return value "len" should become
> > less than sizeof(buf). So the following condition always becomes
> > false unexpectedly in that case even though there is still rest of
> > the line to eat.
> >
> >
> > Per code comments for pg_strip_crlf:
> > "pg_strip_crlf -- Remove any trailing newline and carriage return"
> > If the buf read contains a newline or a carriage return at the end, then
> clearly the line
> > is not exceeding the sizeof(buf).
>
> No if the length of the setting line exceeds sizeof(buf) and
> the buf contains only a carriage return at the end and not newline.
> This case can happen because fgets() stops reading when a newline
> (not a carriage return) is found. Normal users are very unlikely to
> add a carriage return into the middle of the pgpass setting line
> in practice, though. But IMO the code should handle even this
> case because it *can* happen, if the code is not so complicated.
>

I'm not sure if I understand your comment here. From the code of
pg_strip_crlf
I see that it is handling both carriage return and/or new line at the end
of a
string:
=
src/common/string.c
=
while (len > 0 && (str[len - 1] == '\n' || str[len - 1] == '\r'))
str[--len] = '\0';
=
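
A standalone demonstration of the corner case being discussed: a line
longer than the buffer whose last buffered byte happens to be a
carriage return (POSIX fmemopen just fakes the file here):

#include <stdio.h>
#include <string.h>

/* simplified local copy of pg_strip_crlf's documented behavior */
static int
strip_crlf(char *str)
{
    int     len = strlen(str);

    while (len > 0 && (str[len - 1] == '\n' || str[len - 1] == '\r'))
        str[--len] = '\0';
    return len;
}

int
main(void)
{
    char    buf[8];
    static char line[] = "abcdef\rmore-data\n";     /* byte 7 is '\r' */
    FILE   *fp = fmemopen(line, sizeof(line) - 1, "r");

    fgets(buf, sizeof(buf), fp);        /* fills buf: "abcdef\r", no '\n' */
    int     len = strip_crlf(buf);      /* len == 6 < sizeof(buf) - 1 */

    /*
     * A check like "len >= sizeof(buf) - 1" now wrongly concludes that
     * the whole line was consumed, leaving "more-data\n" unread.
     */
    printf("len=%d, next read: %s", len, fgets(buf, sizeof(buf), fp));
    fclose(fp);
    return 0;
}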


> Regards,
>
>
> --
> Fujii Masao
> NTT DATA CORPORATION
> Advanced Platform Technology Group
> Research and Development Headquarters
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL: +923335449950  EMAIL: hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: [PATCH] kNN for btree

2020-03-04 Thread Alexander Korotkov
On Wed, Mar 4, 2020 at 4:58 AM Peter Geoghegan  wrote:
> On Mon, Mar 2, 2020 at 1:27 PM Alexander Korotkov
>  wrote:
> > I've rebased the patchset to the current master and made some
> > refactoring.  I hope it would be possible to bring it to committable
> > shape during this CF.  This need more refactoring though.
>
> This patch doesn't change anything about the B-Tree on-disk format -- right?

Yes, this is correct.  No on-disk format changes, just a new scanning strategy.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: PATCH: Add uri percent-encoding for binary data

2020-03-04 Thread Daniel Gustafsson
> On 20 Feb 2020, at 23:27, Alvaro Herrera  wrote:
> 
> On 2019-Oct-07, Anders Åstrand wrote:
> 
>> Attached is a patch for adding uri as an encoding option for
>> encode/decode. It uses what's called "percent-encoding" in rfc3986
>> (https://tools.ietf.org/html/rfc3986#section-2.1).
> 
> Thanks.  Seems useful.  I made a few cosmetic tweaks and it looks almost
> ready to me;

I agree that uri decoding/encoding would be useful, but I'm not convinced
that this patch does the functionality enough justice to be useful.  What is
the use case we envision solving when not taking the scheme into
consideration?

Reserved characters have different meanings based on context and scheme, and
should not be encoded when used as delimiters.  Handling this does make the
patch a lot more complicated, but if we provide a uri encoding which
percent-encodes the delimiters in https:// URLs, I would expect that to be
reported to pgsql-bugs@ repeatedly.  URIs with userinfo make it even more
problematic, as encoding the @ delimiter will break them.
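
To illustrate the userinfo problem with the patch's proposed 'uri'
format (the call and its output are hypothetical, assuming the rfc3986
unreserved set as posted):

-- Percent-encoding the whole string is right for a single path segment
-- or query value, but wrong for a full URI: the userinfo delimiter '@'
-- and the scheme/port delimiter ':' must survive as-is.
SELECT encode('user@example.com:secret'::bytea, 'uri');
--  hypothetically yields: user%40example.com%3Asecret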

Further, RFC 6874 specifies that IPv6 URIs with zone identifiers are written
as: IPv6address "%25" ZoneID.  With this patch that would be encoded as
%2525 ZoneID, which is incorrect.

That being said, if we do look at the scheme then we'll need to decide which
URI standard we want to stick to as RFC3986 and WHATWG URL-spec aren't
compatible.

Perhaps not calling it 'uri' and instead naming it 'percent-encoding' could
make it clearer, while sticking to the proposed feature?

cheers ./daniel



Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-04 Thread Mahendra Singh Thalor
On Wed, 4 Mar 2020 at 12:03, Dilip Kumar  wrote:
>
> On Wed, Mar 4, 2020 at 11:45 AM Mahendra Singh Thalor
>  wrote:
> >
> > On Mon, 24 Feb 2020 at 15:39, Amit Kapila  wrote:
> > >
> > > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund  wrote:
> > > >
> > > > Hi,
> > > >
> > > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote:
> > > > > I think till we know the real need for changing group locking, going
> > > > > in the direction of what Tom suggested to use an array of LWLocks [1]
> > > > > to address the problems in hand is a good idea.
> > > >
> > > > -many
> > > >
> > > > I think that building yet another locking subsystem is the entirely
> > > > wrong idea - especially when there's imo no convincing architectural
> > > > reasons to do so.
> > > >
> > >
> > > Hmm, AFAIU, it will be done by having an array of LWLocks which we do
> > > at other places as well (like BufferIO locks).  I am not sure if we
> > > can call it as new locking subsystem, but if we decide to continue
> > > using lock.c and change group locking then I think we can do that as
> > > well, see my comments below regarding that.
> > >
> > > >
> > > > > It is not very clear to me that are we thinking to give up on Tom's
> > > > > idea [1] and change group locking even though it is not clear or at
> > > > > least nobody has proposed an idea/patch which requires that?  Or are
> > > > > we thinking that we can do what Tom suggested for relation extension
> > > > > lock and also plan to change group locking for future parallel
> > > > > operations that might require it?
> > > >
> > > > What I'm advocating is that extension locks should continue to go
> > > > through lock.c. And yes, that requires some changes to group locking,
> > > > but I still don't see why they'd be complicated.
> > > >
> > >
> > > Fair position, as per initial analysis, I think if we do below three
> > > things, it should work out without changing to a new way of locking
> > > for relation extension or page type locks.
> > > a. As per the discussion above, ensure in code we will never try to
> > > acquire another heavy-weight lock after acquiring relation extension
> > > or page type locks (probably by having Asserts in code or maybe some
> > > other way).
> > > b. Change lock.c so that group locking is not considered for these two
> > > lock types. For ex. in LockCheckConflicts, along with the check (if
> > > (proclock->groupLeader == MyProc && MyProc->lockGroupLeader == NULL)),
> > > we also check lock->tag and call it a conflict for these two locks.
> > > c. The deadlock detector can ignore checking these two types of locks
> > > because point (a) ensures that those won't lead to deadlock.  One idea
> > > could be that FindLockCycleRecurseMember just ignores these two types
> > > of locks by checking the lock tag.
> >
> > Thanks Amit for summary.
> >
> > Based on above 3 points, here attaching 2 patches for review.
> >
> > 1. v01_0001-Conflict-EXTENTION-lock-in-group-member.patch (Patch by Dilip 
> > Kumar)
> > Basically this patch is for point b and c.
> >
> > 2. v01_0002-Added-assert-to-verify-that-we-never-try-to-take-any.patch
> > (Patch by me)
> > This patch is for point a.
> >
> > After applying both the patches, make check-world is passing.
> >
> > We are testing both the patches and will post results.

Hi all,

I am planning to test the below 3 points on the v1 patch set:

1. We will check that new added assert can be hit by hacking code
(while holding extension lock, try to take any heavyweight lock)
2. In FindLockCycleRecurseMember, for testing purposes, we can put an
additional loop to check that no relation-extension lock holder has
any outer edge.
3. Test that group members are not granted the lock for the relation
extension lock (group members should conflict).

Please let me know your thoughts on testing this patch.
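
For point 3, the shape of the change being tested is roughly the
following; an illustration only, not the v01 patch text:

/*
 * Sketch: lock tags for which group locking is bypassed, so members of
 * the same parallel group still conflict with each other.  Used from
 * LockCheckConflicts(); the deadlock detector can then skip these tags
 * (point c in the summary), because point (a) guarantees no other
 * heavyweight lock is acquired while holding them.
 */
static bool
LockTagBypassesGroupLocking(const LOCKTAG *tag)
{
    return tag->locktag_type == LOCKTAG_RELATION_EXTEND ||
           tag->locktag_type == LOCKTAG_PAGE;
}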

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com




Re: Identifying user-created objects

2020-03-04 Thread Masahiko Sawada
On Wed, 4 Mar 2020 at 15:28, vignesh C  wrote:
>
> On Wed, Mar 4, 2020 at 9:02 AM Masahiko Sawada
>  wrote:
> >
> > On Tue, 3 Mar 2020 at 23:33, vignesh C  wrote:
> > >
> > > Should we add some check if object exists or not here:
> > > +Datum
> > > +pg_is_user_object(PG_FUNCTION_ARGS)
> > > +{
> > > +Oid oid = PG_GETARG_OID(0);
> > > +
> > > +PG_RETURN_BOOL(ObjectIsUserObject(oid));
> > > +}
> > >
> > > I was trying some scenarios where we pass an object which does not exist:
> > > postgres=# SELECT pg_is_user_object(0);
> > >  pg_is_user_object
> > > ---
> > >  f
> > > (1 row)
> > > postgres=# SELECT pg_is_user_object(22);
> > >  pg_is_user_object
> > > ---
> > >  t
> > > (1 row)
> > > SELECT pg_is_user_object('pg_class1'::regclass);
> > > ERROR:  relation "pg_class1" does not exist
> > > LINE 1: SELECT pg_is_user_object('pg_class1'::regclass);
> > >  ^
> > > I felt this behavior seems slightly inconsistent.
> > > Thoughts?
> > >
> >
> > Hmm, I'm not sure we should add an existence check in that function. The
> > main use case would be passing the oid of a tuple of a system catalog to
> > that function to check if the given object was created while in
> > multi-user mode. So I think this function can assume that the given
> > object id exists. And if we want to do that check, we will end up
> > checking whether an object with that oid exists in every system catalog,
> > which is very costly I think.
> >
> > I suspect perhaps the function name pg_is_user_object led to that
> > confusion. That name looks like it checks if the given 'object' was
> > created while in multi-user mode. So maybe we can improve it either by
> > renaming it to pg_is_user_object_id (or pg_is_user_oid?) or leaving the
> > name but describing it in the doc (based on Amit's suggestion in a
> > previous mail):
>
> I liked pg_is_user_oid over pg_is_user_object_id but this liking may
> vary from person to person, so I'm still ok if you don't change the
> name. I'm fine with adding the information to the document unless
> someone else feels that this check is required in this function.
>

Attached is an updated patch that incorporates the comments from Amit and Vignesh.
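
With the patch applied, the intended usage would look something like the
session below (hypothetical output; ObjectIsUserObject() in the patch
presumably just compares the oid against FirstNormalObjectId):

-- Functions created in pg_catalog after initdb, per the OID heuristic.
postgres=# SELECT proname FROM pg_proc
postgres-#  WHERE pronamespace = 'pg_catalog'::regnamespace
postgres-#    AND pg_is_user_object(oid);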


Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 323366feb6..3a2c7fa6a6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18752,6 +18752,10 @@ SELECT collation for ('foo' COLLATE "de_DE");
 pg_get_object_address

 
+   
+pg_is_user_object
+   
+
   
 lists functions related to
database object identification and addressing.
@@ -18785,6 +18789,14 @@ SELECT collation for ('foo' COLLATE "de_DE");
classid oid, objid oid, objsubid integer
get address of a database object from its external representation
   
+  
+   pg_is_user_object(oid)
+   bool
+   
+true for OIDs assigned while database is operating in normal multi-user
+mode, as opposed to single-user mode (see ).
+   
+  
  
 

diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index c5eea7af3f..7888e00d03 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -106,7 +106,7 @@ is_publishable_class(Oid relid, Form_pg_class reltuple)
 	return reltuple->relkind == RELKIND_RELATION &&
 		!IsCatalogRelationOid(relid) &&
 		reltuple->relpersistence == RELPERSISTENCE_PERMANENT &&
-		relid >= FirstNormalObjectId;
+		ObjectIsUserObject(relid);
 }
 
 /*
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 567eab1e01..e4f00fc309 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -204,7 +204,7 @@ json_categorize_type(Oid typoid,
 /* It's probably the general case ... */
 *tcategory = JSONTYPE_OTHER;
 /* but let's look for a cast to json, if it's not built-in */
-if (typoid >= FirstNormalObjectId)
+if (ObjectIsUserObject(typoid))
 {
 	Oid			castfunc;
 	CoercionPathType ctype;
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index fea4335951..6869ea9954 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -690,7 +690,7 @@ jsonb_categorize_type(Oid typoid,
  * but first let's look for a cast to json (note: not to
  * jsonb) if it's not built-in.
  */
-if (typoid >= FirstNormalObjectId)
+if (ObjectIsUserObject(typoid))
 {
 	Oid			castfunc;
 	CoercionPathType ctype;
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 158784474d..15b1ad13eb 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3202,6 +3202,16 @@ pg_get_function_arg_default(PG_FUNCTION_ARGS)
 	PG_RETURN_TEXT_P(string_to_text(str));
 }
 
+/*
+ * Return 

Re: Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-03-04 Thread Fujii Masao
On Tue, Mar 13, 2018 at 10:06 PM David Steele  wrote:
>
> Hi Ivan,
>
> On 3/6/18 9:25 PM, Michael Paquier wrote:
> > On Tue, Mar 06, 2018 at 02:24:24PM +0300, Ivan Kartyshov wrote:
> >> Hello, I now is preparing the patch over syntax that Simon offered. And in
> >> few day I will update the patch.
> >> Thank you for your interest in thread.
> >
> > It has been more than one month since a patch update has been requested,
> > and time is growing short.  This refactored patch introduces a whole new
> > concept as well, so my recommendation would be to mark this patch as
> > returned with feedback, and then review it freshly for v12 if this
> > concept is still alive and around.
>
> This patch wasn't updated at the beginning of the CF and still hasn't
> been updated after almost two weeks.
>
> I have marked the patch Returned with Feedback.  Please resubmit to a
> new CF when you have an updated patch.

There have been no updates for about two years, but this patch
has been registered in CF 2020-03. Not sure why. Should it be marked
as Returned with Feedback again?

Regards,

-- 
Fujii Masao




Re: logical replication empty transactions

2020-03-04 Thread Dilip Kumar
On Wed, Mar 4, 2020 at 3:47 PM Amit Kapila  wrote:
>
> On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar  wrote:
> >
> > On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila  wrote:
> > >
> > > On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar  wrote:
> > > >
> > > >
> > > > IMHO, the threshold should be based on the commit LSN.  The main
> > > > reason we want to send empty transactions after a certain number of
> > > > transactions/duration is that we want the restart_lsn to keep moving
> > > > forward, so that if we need to restart the replication slot we don't
> > > > need to process a lot of extra WAL.  If we set the threshold based
> > > > on transaction count, there is still a possibility that we might
> > > > process a few very big transactions and then have to process
> > > > them again after the restart.
> > > >
> > >
> > > Won't the subscriber eventually send the flush location for the large
> > > transactions which will move the restart_lsn?
> >
> > I meant large empty transactions (ones where we cannot send anything
> > to the subscriber).  My point was: suppose there are only large
> > transactions in the system that we cannot stream because those
> > tables are not published.  Then keeping the threshold based on
> > transaction count will not help much, because even if we don't reach
> > the transaction count threshold, we still might need to process a lot
> > of data if we don't stream the commits of the empty transactions.  So
> > instead of tracking the transaction count, can we track the LSN, and
> > if the LSN difference since we last streamed some change crosses the
> > threshold, stream the next empty transaction?
> >
>
> You have a point, and it may be better to keep the threshold based on LSN
> if we want to keep any threshold, but keying on transaction count
> seems a bit more straightforward.  Let us see if anyone else has an
> opinion on this matter.

OK, that makes sense.
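
For concreteness, an LSN-based threshold could be as simple as the
sketch below in the output plugin's commit path; all names here
(keepalive_due, empty_xact_lsn_threshold) are hypothetical, not from the
posted patch:

static XLogRecPtr last_sent_lsn = InvalidXLogRecPtr;
static uint64 empty_xact_lsn_threshold = 16 * 1024 * 1024;  /* hypothetical */

/* Decide whether an otherwise-empty commit should still be streamed. */
static bool
keepalive_due(XLogRecPtr commit_lsn)
{
    /*
     * Stream one empty commit whenever more than the threshold's worth
     * of WAL has been skipped since we last sent anything, so that the
     * subscriber's feedback keeps restart_lsn advancing.
     */
    if (XLogRecPtrIsInvalid(last_sent_lsn) ||
        commit_lsn - last_sent_lsn >= empty_xact_lsn_threshold)
    {
        last_sent_lsn = commit_lsn;
        return true;
    }
    return false;
}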

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: PG_COLOR not mentioned in docs of vacuumlo, oid2name and pgbench

2020-03-04 Thread Fabien COELHO


Bonjour Michaël,


All the tools mentioned in $subject have been switched recently to use
the central logging infrastructure, which means that they have gained
coloring output.  However we (mostly I) forgot to update the docs.

Attached is a patch to fix this issue.  Please let me know if there
are comments and/or objections.


No objection. I did not know there was such a thing…

Maybe a more detailed explanation about PG_COLOR could be stored 
somewhere, and all affected tools could link to it? Or not.


For "pgbench", you could also add the standard sentence that it uses libpq 
environment variables, as it is also missing?


--
Fabien.

Re: backup manifests

2020-03-04 Thread tushar
Another scenario: if we modify the "Manifest-Checksum" value in the
backup_manifest file, we are not getting an error.


[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
pg_validatebackup: * manifest_checksum = 
28d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d

pg_validatebackup: backup successfully verified

open backup_manifest file and replace

"Manifest-Checksum": 
"8d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d"}

with
"Manifest-Checksum": "Hello World"}

rerun the pg_validatebackup

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup data/
pg_validatebackup: * manifest_checksum = Hello World
pg_validatebackup: backup successfully verified
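
This suggests that, as of these patches, the validator parses the
manifest's own checksum but never recomputes it. A self-check would be
along the lines of the sketch below; read_file() and sha256_hex() are
invented helper names rather than pg_validatebackup's API, and the exact
byte range the real checksum covers may differ:

/*
 * Sketch: hash every byte preceding the "Manifest-Checksum" key and
 * compare against the stored value, so that any edit to the file
 * (including to the checksum itself) fails validation.
 */
static bool
manifest_self_check(const char *path)
{
    size_t  len;
    char   *data = read_file(path, &len);   /* whole manifest, hypothetical */
    char   *key = strstr(data, "\"Manifest-Checksum\"");
    char    computed[65];                   /* hex SHA-256 + NUL */

    if (key == NULL)
        return false;                       /* no checksum present at all */
    sha256_hex(data, (size_t) (key - data), computed);  /* hypothetical */
    return strstr(key, computed) != NULL;   /* stored == computed? */
}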

regards,

On 3/4/20 3:26 PM, tushar wrote:

Hi,
Another observation: if I change the ownership of a file which is
under the global/ directory

i.e

[root@tushar-ldap-docker global]# chown enterprisedb 2396

and run the pg_validatebackup command, I am getting this message -

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup 
pg_validatebackup: * manifest_checksum = 
e8cb007bcc9c0deab6eff51cd8d9d9af6af35b86e02f3055e60e70e56737e877
pg_validatebackup: error: could not open file "global/2396": 
Permission denied
*** Error in `./pg_validatebackup': double free or corruption (!prev): 
0x01850ba0 ***

=== Backtrace: =
/lib64/libc.so.6(+0x81679)[0x7fa2248e3679]
./pg_validatebackup[0x401f4c]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa224884505]
./pg_validatebackup[0x402049]
=== Memory map: 
0040-00415000 r-xp  fd:03 4044545 
/home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
00614000-00615000 r--p 00014000 fd:03 4044545 
/home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
00615000-00616000 rw-p 00015000 fd:03 4044545 
/home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
017f3000-01878000 rw-p  00:00 
0  [heap]

7fa21800-7fa218021000 rw-p  00:00 0
7fa218021000-7fa21c00 ---p  00:00 0
7fa21e122000-7fa21e137000 r-xp  fd:03 
141697 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e137000-7fa21e336000 ---p 00015000 fd:03 
141697 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e336000-7fa21e337000 r--p 00014000 fd:03 
141697 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e337000-7fa21e338000 rw-p 00015000 fd:03 
141697 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e338000-7fa224862000 r--p  fd:03 
266442 /usr/lib/locale/locale-archive
7fa224862000-7fa224a25000 r-xp  fd:03 
134456 /usr/lib64/libc-2.17.so
7fa224a25000-7fa224c25000 ---p 001c3000 fd:03 
134456 /usr/lib64/libc-2.17.so
7fa224c25000-7fa224c29000 r--p 001c3000 fd:03 
134456 /usr/lib64/libc-2.17.so
7fa224c29000-7fa224c2b000 rw-p 001c7000 fd:03 
134456 /usr/lib64/libc-2.17.so

7fa224c2b000-7fa224c3 rw-p  00:00 0
7fa224c3-7fa224c47000 r-xp  fd:03 
134485 /usr/lib64/libpthread-2.17.so
7fa224c47000-7fa224e46000 ---p 00017000 fd:03 
134485 /usr/lib64/libpthread-2.17.so
7fa224e46000-7fa224e47000 r--p 00016000 fd:03 
134485 /usr/lib64/libpthread-2.17.so
7fa224e47000-7fa224e48000 rw-p 00017000 fd:03 
134485 /usr/lib64/libpthread-2.17.so

7fa224e48000-7fa224e4c000 rw-p  00:00 0
7fa224e4c000-7fa224e9 r-xp  fd:03 4044478 
/home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa224e9-7fa22509 ---p 00044000 fd:03 4044478 
/home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa22509-7fa225093000 r--p 00044000 fd:03 4044478 
/home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa225093000-7fa225094000 rw-p 00047000 fd:03 4044478 
/home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa225094000-7fa2250b6000 r-xp  fd:03 
130333 /usr/lib64/ld-2.17.so

7fa22527d000-7fa2252a2000 rw-p  00:00 0
7fa2252b3000-7fa2252b5000 rw-p  00:00 0
7fa2252b5000-7fa2252b6000 r--p 00021000 fd:03 
130333 /usr/lib64/ld-2.17.so
7fa2252b6000-7fa2252b7000 rw-p 00022000 fd:03 
130333 /usr/lib64/ld-2.17.so

7fa2252b7000-7fa2252b8000 rw-p  00:00 0
7ffdf354f000-7ffdf357 rw-p  00:00 
0  [stack]
7ffdf3572000-7ffdf3574000 r-xp  00:00 
0  [vdso]
ff60-ff601000 r-xp  00:00 
0  [vsyscall]

Aborted
[centos@tushar-ldap-docker bin]$


I am getting the expected error message, but it is accompanied by the
"*** Error in `./pg_validatebackup': double free or corruption (!prev):
0x01850ba0 ***" output.

Is this expected?

regards,

On 3/3/20 8:19 PM, tushar wrote:

On 3/3/20 4:04 PM, tushar wrote:
Thanks Robert.  After applying all the 5 patches (v8-00*) against PG 
v13 (co

Re: logical replication empty transactions

2020-03-04 Thread Amit Kapila
On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar  wrote:
>
> On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila  wrote:
> >
> > On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar  wrote:
> > >
> > >
> > > IMHO, the threshold should be based on the commit LSN.  The main
> > > reason we want to send empty transactions after a certain number of
> > > transactions/duration is that we want the restart_lsn to keep moving
> > > forward, so that if we need to restart the replication slot we don't
> > > need to process a lot of extra WAL.  If we set the threshold based
> > > on transaction count, there is still a possibility that we might
> > > process a few very big transactions and then have to process
> > > them again after the restart.
> > >
> >
> > Won't the subscriber eventually send the flush location for the large
> > transactions which will move the restart_lsn?
>
> I meant large empty transactions (ones where we cannot send anything
> to the subscriber).  My point was: suppose there are only large
> transactions in the system that we cannot stream because those
> tables are not published.  Then keeping the threshold based on
> transaction count will not help much, because even if we don't reach
> the transaction count threshold, we still might need to process a lot
> of data if we don't stream the commits of the empty transactions.  So
> instead of tracking the transaction count, can we track the LSN, and
> if the LSN difference since we last streamed some change crosses the
> threshold, stream the next empty transaction?
>

You have a point, and it may be better to keep the threshold based on LSN
if we want to keep any threshold, but keying on transaction count
seems a bit more straightforward.  Let us see if anyone else has an
opinion on this matter.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Identifying user-created objects

2020-03-04 Thread Masahiko Sawada
On Wed, 4 Mar 2020 at 18:57, Fujii Masao  wrote:
>
>
>
> On 2020/03/04 18:36, Masahiko Sawada wrote:
> > On Wed, 4 Mar 2020 at 18:02, Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2020/03/04 17:05, Masahiko Sawada wrote:
> >>> On Wed, 4 Mar 2020 at 16:43, Fujii Masao  
> >>> wrote:
> 
> 
> 
>  On 2020/02/05 20:26, Masahiko Sawada wrote:
> > Hi,
> >
> > User can create database objects such as functions into pg_catalog.
> > But if I'm not missing something, currently there is no
> > straightforward way to identify if the object is a user created object
> > or a system object which is created during initdb. If we can do that
> > user will be able to check if malicious functions are not created in
> > the database, which is important from the security perspective.
> 
>  The function that you are proposing is really enough for this use case?
>  What if malicious users directly change the oid of function
>  to < FirstNormalObjectId? Or you're assuming that malicious users will
>  never log in as superuser and not be able to change the oid?
> >>>
> >>> That's a good point! I'm surprised that user is allowed to update an
> >>> oid of database object. In addition, surprisingly we can update it to
> >>> 0, which in turn leads the assertion failure:
> >>
> >> Since non-superusers are not allowed to do that by default,
> >> that's not so bad? That is, to avoid such unexpected change of oid,
> >> admin just should prevent malicious users from logging in as superusers
> >> and not give the permission on system catalogs to such users.
> >>
> >
> > I think there are still insider threats. As long as we depend on
> > superuser privileges to do some DBA work, a malicious DBA might be able
> > to log in as superuser and modify oids.
>
> Yes. But I'm sure that the DBA has already considered measures
> against such threats. Otherwise malicious users can do things
> far more malicious than changing an oid.

Agreed. So that's not a serious problem in practice, but we cannot say
the checking by pg_is_user_object() is totally enough for checking
whether a malicious object exists or not. Is that right?

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Minor issues in .pgpass

2020-03-04 Thread Fujii Masao



On 2020/03/03 22:07, Hamid Akhtar wrote:

On Tue, Mar 3, 2020 at 5:38 PM Hamid Akhtar <hamid.akh...@gmail.com> wrote:



On Mon, Mar 2, 2020 at 6:07 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:



On 2020/02/29 0:46, Hamid Akhtar wrote:
 > The following review has been posted through the commitfest 
application:
 > make installcheck-world:  not tested
 > Implements feature:       not tested
 > Spec compliant:           not tested
 > Documentation:            not tested
 >
 > First of all, this seems like fixing a valid issue, albeit, the 
probability of somebody messing is low, but it is still better to fix this problem.
 >
 > I've not tested the patch in any detail, however, there are a couple 
of comments I have before I proceed on with detailed testing.

Thanks for the review and comments!

 > 1. pgindent is showing a few issues with formatting. Please have a 
look and resolve those.

Yes.


Fixed. Attached is the updated version of the patch.
I marked this CF entry as "Needs Review" again.


 > 2. I think you can potentially use "len" variable instead of introducing 
"buflen" and "tmplen" variables.

Basically I don't want to use the same variable for several purposes
because that would decrease the code readability.


That is fine.


 > Also, I would choose a more appropriate name for "tmp" variable.

Yeah, so what about "rest" as the variable name?


Maybe something like "excess_buf" or another name that conveys that these
bytes are to be discarded.


Thanks for the comment! But IMO "rest" is not so bad a choice, so for now
I used "rest" in the latest patch.

Regards,


--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/src/interfaces/libpq/fe-connect.c 
b/src/interfaces/libpq/fe-connect.c
index 408000af83..0157c619aa 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -6949,6 +6949,7 @@ passwordFromFile(const char *hostname, const char *port, 
const char *dbname,
 {
FILE   *fp;
struct stat stat_buf;
+   int line_number = 0;
 
 #define LINELEN NAMEDATALEN*5
charbuf[LINELEN];
@@ -7014,10 +7015,42 @@ passwordFromFile(const char *hostname, const char 
*port, const char *dbname,
   *p1,
   *p2;
int len;
+   int buflen;
 
if (fgets(buf, sizeof(buf), fp) == NULL)
break;
 
+   line_number++;
+   buflen = strlen(buf);
+   if (buflen >= sizeof(buf) - 1 && buf[buflen - 1] != '\n')
+   {
+   char    rest[LINELEN];
+   int restlen;
+
+   /*
+* Warn if this password setting line is too long, 
because it's
+* unexpectedly truncated.
+*/
+   if (buf[0] != '#')
+   fprintf(stderr,
+   libpq_gettext("WARNING: line %d 
too long in password file \"%s\"\n"),
+   line_number, pgpassfile);
+
+   /* eat rest of the line */
+   while (!feof(fp) && !ferror(fp))
+   {
+   if (fgets(rest, sizeof(rest), fp) == NULL)
+   break;
+   restlen = strlen(rest);
+   if (restlen < sizeof(rest) - 1 || rest[restlen 
- 1] == '\n')
+   break;
+   }
+   }
+
+   /* ignore comments */
+   if (buf[0] == '#')
+   continue;
+
/* strip trailing newline and carriage return */
len = pg_strip_crlf(buf);
 


Re: Identifying user-created objects

2020-03-04 Thread Fujii Masao




On 2020/03/04 18:36, Masahiko Sawada wrote:

On Wed, 4 Mar 2020 at 18:02, Fujii Masao  wrote:




On 2020/03/04 17:05, Masahiko Sawada wrote:

On Wed, 4 Mar 2020 at 16:43, Fujii Masao  wrote:




On 2020/02/05 20:26, Masahiko Sawada wrote:

Hi,

User can create database objects such as functions into pg_catalog.
But if I'm not missing something, currently there is no
straightforward way to identify if the object is a user created object
or a system object which is created during initdb. If we can do that
user will be able to check if malicious functions are not created in
the database, which is important from the security perspective.


Is the function that you are proposing really enough for this use case?
What if malicious users directly change the oid of a function
to < FirstNormalObjectId? Or are you assuming that malicious users will
never log in as superusers and so will not be able to change the oid?


That's a good point! I'm surprised that a user is allowed to update the
oid of a database object. In addition, surprisingly, we can update it to
0, which in turn leads to an assertion failure:


Since non-superusers are not allowed to do that by default,
that's not so bad? That is, to avoid such an unexpected change of oid,
the admin should just prevent malicious users from logging in as
superusers and not grant permission on the system catalogs to such users.



I think there are still insider threats. As long as we depend on
superuser privileges to do some DBA work, a malicious DBA might be able
to log in as superuser and modify oids.


Yes. But I'm sure that the DBA has already considered measures
against such threats. Otherwise malicious users can do things
far more malicious than changing an oid.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: backup manifests

2020-03-04 Thread tushar

Hi,
Another observation: if I change the ownership of a file which is under
the global/ directory

i.e

[root@tushar-ldap-docker global]# chown enterprisedb 2396

and run the pg_validatebackup command, I am getting this message -

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup 
pg_validatebackup: * manifest_checksum = 
e8cb007bcc9c0deab6eff51cd8d9d9af6af35b86e02f3055e60e70e56737e877
pg_validatebackup: error: could not open file "global/2396": Permission 
denied
*** Error in `./pg_validatebackup': double free or corruption (!prev): 
0x01850ba0 ***

=== Backtrace: =
/lib64/libc.so.6(+0x81679)[0x7fa2248e3679]
./pg_validatebackup[0x401f4c]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa224884505]
./pg_validatebackup[0x402049]
=== Memory map: 
0040-00415000 r-xp  fd:03 4044545 
/home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
00614000-00615000 r--p 00014000 fd:03 4044545 
/home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
00615000-00616000 rw-p 00015000 fd:03 4044545 
/home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
017f3000-01878000 rw-p  00:00 0  
[heap]

7fa21800-7fa218021000 rw-p  00:00 0
7fa218021000-7fa21c00 ---p  00:00 0
7fa21e122000-7fa21e137000 r-xp  fd:03 141697 
/usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e137000-7fa21e336000 ---p 00015000 fd:03 141697 
/usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e336000-7fa21e337000 r--p 00014000 fd:03 141697 
/usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e337000-7fa21e338000 rw-p 00015000 fd:03 141697 
/usr/lib64/libgcc_s-4.8.5-20150702.so.1
7fa21e338000-7fa224862000 r--p  fd:03 266442 
/usr/lib/locale/locale-archive
7fa224862000-7fa224a25000 r-xp  fd:03 134456 
/usr/lib64/libc-2.17.so
7fa224a25000-7fa224c25000 ---p 001c3000 fd:03 134456 
/usr/lib64/libc-2.17.so
7fa224c25000-7fa224c29000 r--p 001c3000 fd:03 134456 
/usr/lib64/libc-2.17.so
7fa224c29000-7fa224c2b000 rw-p 001c7000 fd:03 134456 
/usr/lib64/libc-2.17.so

7fa224c2b000-7fa224c3 rw-p  00:00 0
7fa224c3-7fa224c47000 r-xp  fd:03 134485 
/usr/lib64/libpthread-2.17.so
7fa224c47000-7fa224e46000 ---p 00017000 fd:03 134485 
/usr/lib64/libpthread-2.17.so
7fa224e46000-7fa224e47000 r--p 00016000 fd:03 134485 
/usr/lib64/libpthread-2.17.so
7fa224e47000-7fa224e48000 rw-p 00017000 fd:03 134485 
/usr/lib64/libpthread-2.17.so

7fa224e48000-7fa224e4c000 rw-p  00:00 0
7fa224e4c000-7fa224e9 r-xp  fd:03 4044478 
/home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa224e9-7fa22509 ---p 00044000 fd:03 4044478 
/home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa22509-7fa225093000 r--p 00044000 fd:03 4044478 
/home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa225093000-7fa225094000 rw-p 00047000 fd:03 4044478 
/home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
7fa225094000-7fa2250b6000 r-xp  fd:03 130333 
/usr/lib64/ld-2.17.so

7fa22527d000-7fa2252a2000 rw-p  00:00 0
7fa2252b3000-7fa2252b5000 rw-p  00:00 0
7fa2252b5000-7fa2252b6000 r--p 00021000 fd:03 130333 
/usr/lib64/ld-2.17.so
7fa2252b6000-7fa2252b7000 rw-p 00022000 fd:03 130333 
/usr/lib64/ld-2.17.so

7fa2252b7000-7fa2252b8000 rw-p  00:00 0
7ffdf354f000-7ffdf357 rw-p  00:00 0  
[stack]
7ffdf3572000-7ffdf3574000 r-xp  00:00 0  
[vdso]
ff60-ff601000 r-xp  00:00 0  
[vsyscall]

Aborted
[centos@tushar-ldap-docker bin]$


I am getting the error message, but along with it the "*** Error in
`./pg_validatebackup': double free or corruption (!prev):
0x01850ba0 ***" message.

Is this expected?

regards,

On 3/3/20 8:19 PM, tushar wrote:

On 3/3/20 4:04 PM, tushar wrote:
Thanks, Robert. After applying all 5 patches (v8-00*) against PG
v13 (commit id afb5465e0cfce7637066eaaaeecab30b0f23fbe3),


There is a scenario where pg_validatebackup does not throw an error
when a file is deleted from the pg_wal/ folder, but later, at restore
time, we get an error


[centos@tushar-ldap-docker bin]$ ./pg_basebackup  -D test1

[centos@tushar-ldap-docker bin]$ ls test1/pg_wal/
00010010  archive_status

[centos@tushar-ldap-docker bin]$ rm -rf test1/pg_wal/*

[centos@tushar-ldap-docker bin]$ ./pg_validatebackup test1
pg_validatebackup: * manifest_checksum = 
88f1ed995c83e86252466a2c88b3e660a69cfc76c169991134b101c4f16c9df7

pg_validatebackup: backup successfully verified

[centos@tushar-ldap-docker bin]$ ./pg_ctl -D test1 start -o '-p '
waiting for server to start...

Re: Use compiler intrinsics for bit ops in hash

2020-03-04 Thread John Naylor
On Tue, Mar 3, 2020 at 4:46 AM Jesse Zhang  wrote:
> The math in the new (from v4 to v6) patch is wrong: it yields
> ceil_log2(1) = 1 or next_power_of_2(1) = 2.

I think you're right.
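
For concreteness, here is a minimal sketch of the corrected arithmetic
(my illustration, not the patch's code) that yields ceil_log2(1) = 0 and
next_power_of_2(1) = 1, assuming a leftmost-one-pos primitive that, like
pg_leftmost_one_pos32() in HEAD, requires a non-zero argument
(__builtin_clz stands in for the real implementation and is
GCC/clang-specific):

#include <assert.h>
#include <stdint.h>

/* 0-based position of the most significant set bit; word must be non-zero */
static inline int
leftmost_one_pos32(uint32_t word)
{
    assert(word != 0);
    return 31 - __builtin_clz(word);
}

static inline uint32_t
ceil_log2_32(uint32_t num)
{
    assert(num > 0);
    if (num == 1)
        return 0;               /* ceil(log2(1)) = 0, not 1 */
    return leftmost_one_pos32(num - 1) + 1;
}

/* caller must keep num <= UINT32_C(1) << 31 to avoid shift overflow */
static inline uint32_t
next_power_of_2_32(uint32_t num)
{
    return UINT32_C(1) << ceil_log2_32(num);    /* 1 -> 1, 5 -> 8 */
}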

> I can see that you lifted
> the restriction of "num greater than one" for ceil_log2() in this patch
> set, but it's now _more_ problematic to base those functions on
> pg_leftmost_one_pos().

> I'm not comfortable with your changes to pg_leftmost_one_pos() to remove
> the restriction on word being non-zero. Specifically
> pg_leftmost_one_pos() is made to return 0 on 0 input. While none of its
> current callers (in HEAD) is harmed, this introduces muddy semantics:
>
> 1. pg_leftmost_one_pos is semantically undefined on 0 input: scanning
> for a set bit in a zero word won't find it anywhere.

Right.

> I've quickly put together a PoC patch on top of yours, which
> re-implements ceil_log2 using LZCNT coupled with a CPUID check.
> Thoughts?

This patch seems to assume that an indirect function call is faster
than taking a branch (in inlined code) that the CPU will almost always
predict correctly. It would be nice to have some numbers to compare
against pg_count_leading_zeros_* using the "slow" versions, but
statically inlined.
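
For readers following along, the pattern being weighed looks roughly like
the sketch below (my illustration with made-up function names, not the
patch's code; it assumes GCC or clang on x86): resolve the implementation
once via a CPUID check, then dispatch every later call through a function
pointer, versus simply inlining the "slow" version behind a predictable
branch.

#include <stdint.h>
#include <cpuid.h>          /* __get_cpuid(), GCC/clang on x86 */
#include <immintrin.h>      /* _lzcnt_u32() */

static int ceil_log2_choose(uint32_t num);
static int (*ceil_log2_impl) (uint32_t num) = ceil_log2_choose;

/* portable fallback: count doublings until we reach num (num >= 1) */
static int
ceil_log2_slow(uint32_t num)
{
    int         result = 0;
    uint32_t    limit = 1;

    while (limit < num)
    {
        limit <<= 1;
        result++;
    }
    return result;
}

/* LZCNT version; lzcnt(0) is well-defined (= 32), so num == 1 needs no
 * special case */
__attribute__((target("lzcnt")))
static int
ceil_log2_lzcnt(uint32_t num)
{
    return 32 - (int) _lzcnt_u32(num - 1);
}

/* the first call lands here, picks an implementation, repoints the pointer */
static int
ceil_log2_choose(uint32_t num)
{
    unsigned int eax, ebx, ecx, edx;

    /* LZCNT is advertised as ABM: CPUID leaf 0x80000001, ECX bit 5 */
    if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx) && ((ecx >> 5) & 1))
        ceil_log2_impl = ceil_log2_lzcnt;
    else
        ceil_log2_impl = ceil_log2_slow;
    return ceil_log2_impl(num);
}

/* every later call pays one indirect call instead of an inlined branch */
static inline int
ceil_log2(uint32_t num)
{
    return ceil_log2_impl(num);
}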

Stylistically, "8 * sizeof(num)" is a bit overly formal, since the
hard-coded number we want is in the name of the function.
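
That is, in a function whose name already says it operates on 32 bits,
something like (using the pg_count_leading_zeros_* naming from above)

    return 32 - pg_count_leading_zeros_32(num);

reads more directly than

    return 8 * sizeof(num) - pg_count_leading_zeros_32(num);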

-- 
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Identifying user-created objects

2020-03-04 Thread Masahiko Sawada
On Wed, 4 Mar 2020 at 18:02, Fujii Masao  wrote:
>
>
>
> On 2020/03/04 17:05, Masahiko Sawada wrote:
> > On Wed, 4 Mar 2020 at 16:43, Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2020/02/05 20:26, Masahiko Sawada wrote:
> >>> Hi,
> >>>
> >>> Users can create database objects such as functions in pg_catalog.
> >>> But if I'm not missing something, there is currently no
> >>> straightforward way to identify whether an object is a user-created
> >>> object or a system object created during initdb. If we could do
> >>> that, users would be able to check that no malicious functions have
> >>> been created in the database, which is important from a security
> >>> perspective.
> >>
> >> Is the function that you are proposing really enough for this use
> >> case? What if malicious users directly change the oid of a function
> >> to < FirstNormalObjectId? Or are you assuming that malicious users
> >> will never log in as superuser and so will not be able to change the
> >> oid?
> >
> > That's a good point! I'm surprised that a user is allowed to update
> > the oid of a database object. In addition, surprisingly, we can
> > update it to 0, which in turn leads to an assertion failure:
>
> Since non-superusers are not allowed to do that by default,
> that's not so bad? That is, to avoid such an unexpected change of oid,
> the admin should just prevent malicious users from logging in as
> superuser and not grant them permissions on the system catalogs.
>

I think there are still insider threats. As long as we depend on
superuser privileges for some DBA work, a malicious DBA might be able
to log in as superuser and modify an oid.

This behavior was introduced in PG12, where we made the oid column a
regular (non-system) column. A table having oid = 0 is still shown in
pg_class, but we cannot drop it.
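
To make the hazard concrete, here is a hypothetical superuser-only
sequence (the table name is made up; FirstNormalObjectId = 16384 comes
from access/transam.h):

-- Objects created after initdb normally have oid >= FirstNormalObjectId,
-- which is the kind of check being discussed:
SELECT relname, oid FROM pg_class
WHERE relname = 'my_table' AND oid >= 16384;

-- But since oid is now an updatable column, a superuser can rewrite it
-- and defeat that check:
UPDATE pg_class SET oid = 0 WHERE relname = 'my_table';

-- The row is still visible in pg_class, but DROP TABLE my_table no
-- longer works, and an assertion-enabled build may abort.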

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PG14 target version?

2020-03-04 Thread Dave Page
On Wed, Mar 4, 2020 at 9:14 AM Daniel Gustafsson  wrote:

> > On 4 Mar 2020, at 06:28, Michael Paquier  wrote:
> >
> > On Tue, Mar 03, 2020 at 01:10:26PM -0500, David Steele wrote:
> >> Anybody know how to add 14 to the "Target version" dropdown in the
> >> CF app?
> >
> > The only person who knows that stuff is, I think, Magnus. I don't
> > have access to that.
>
> Magnus, or someone else on the infra team, since it requires an update
> to the database. Looping in -www for visibility.
>

Hmm, I just tried to log in to the admin site to do this and got a 500
error. Unfortunately I'm now off to a number of meetings, so I can't
look more closely myself.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

