Re: warn_unused_results

2020-11-08 Thread Peter Eisentraut

On 2020-11-09 07:56, Michael Paquier wrote:

This is accepted by clang, and MSVC apparently has an equivalent for
that, as of VS 2012:
#elif defined(_MSC_VER) && (_MSC_VER >= 1700)
#define pg_nodiscard _Check_return_
We don't care about the 1700 condition as we support only >= 1800 on
HEAD, and in this case the addition of pg_nodiscard would be required
on both the definition and the declaration.  Should it be added?  It is
much more invasive than the gcc/clang equivalent, though.


AFAICT from the documentation, this only applies for special "analyze" 
runs, not as a normal compiler warning.  Do we have any support for 
analyze runs with MSVC?



FWIW, I saw an extra case with parsePGArray() today.


AFAICT, that's more in the category of checking for error returns, which 
is similar to the "fortify" options that some environments have for 
checking the return of fwrite() etc.



I am not sure
about the addition of repalloc(), as it is quite obvious that one has
to use its result.  Lists are fine; they are specific to PG internals,
and beginners tend to be easily confused about how to use them.


realloc() is listed in the GCC documentation as the reason this option 
exists, and glibc tags its realloc() with this attribute, so doing the 
same for repalloc() seems sensible.
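
To illustrate the class of bug this catches (a minimal sketch, not code
from the patch itself):

    char   *buf = palloc(32);

    repalloc(buf, 64);          /* now warns: result ignored, and the chunk
                                 * may have moved, leaving buf dangling */
    buf = repalloc(buf, 64);    /* correct: keep the returned pointer */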


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: warn_unused_results

2020-11-08 Thread Michael Paquier
On Sat, Oct 17, 2020 at 08:57:51AM +0200, Peter Eisentraut wrote:
> Forgetting to assign the return value of list APIs such as lappend() is a
> perennial favorite.  The compiler can help point out such mistakes. GCC has
> an attribute warn_unused_results.  Also C++ has standardized this under the
> name "nodiscard", and C has a proposal to do the same [0].  In my patch I
> call the symbol pg_nodiscard, so that perhaps in a distant future one only
> has to do s/pg_nodiscard/nodiscard/ or something similar.  Also, the name is
> short enough that it doesn't mess up the formatting of function declarations
> too much.

I have seen this stuff being a source of confusion in the past as
well.
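
For anyone following along, the perennial mistake and the decoration
that catches it look roughly like this (a sketch; the patch does
decorate lappend() among the other list functions):

    extern pg_nodiscard List *lappend(List *list, void *datum);

    List   *tlist = NIL;

    lappend(tlist, tle);            /* now warns: the new list is lost
                                     * and tlist is still NIL */
    tlist = lappend(tlist, tle);    /* correct */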

> I have added pg_nodiscard decorations to all the list functions where I
> found it sensible, as well as repalloc() for good measure, since realloc()
> is usually mentioned as an example where this function attribute is useful.

+#ifdef __GNUC__
+#define pg_nodiscard __attribute__((warn_unused_result))
+#else
+#define pg_nodiscard
+#endif

This is accepted by clang, and MSVC apparently has an equivalent for
that, as of VS 2012:
#elif defined(_MSC_VER) && (_MSC_VER >= 1700)
#define pg_nodiscard _Check_return_
We don't care about the 1700 condition as we support only >= 1800 on
HEAD, and in this case the addition of pg_nodiscard would be required
on both the definition and the declaration.  Should it be added?  It is
much more invasive than the gcc/clang equivalent, though.

> I have found two places in the existing code where this creates warnings.
> Both places are correct as is, but make assumptions about the internals of
> the list APIs and it seemed better just to fix the warning than to write a
> treatise about why it's correct as is.

FWIW, I saw an extra case with parsePGArray() today.  I am not sure
about the addition of repalloc(), as it is quite obvious that one has
to use its result.  Lists are fine; they are specific to PG internals,
and beginners tend to be easily confused about how to use them.
--
Michael




Re: [PATCH] Add features to pg_stat_statements

2020-11-08 Thread Seino Yuki


However, let me confirm the following.

Is this information really useful?
If there is no valid use case for this, I'd like to drop it.
Thought?


In the use case I assumed, I thought it would be easy for users to see
at a glance whether tuning is needed: if the last modified date and
time is old, there is no need to adjust anything at all, and if the
last modified date and time is recent, it is easy to understand that
the parameters need to be adjusted.
What do you think?


Even without the last deallocation timestamp, you can presume
when the deallocation of entries happened by periodically monitoring
the total number of deallocations and looking at their history. Or, IMO,
it's better to log whenever a deallocation happens, as proposed
upthread.

Then you can easily check the history of occurrences of deallocations
in the log file.

Regards,


+1. I think you should output the deallocation history to the log as well.
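
To make that concrete, a hypothetical sketch of such logging inside
pg_stat_statements' entry_dealloc() (the wording and placement are
illustrative, not taken from the posted patch):

    /* after the least-used entries have been removed */
    ereport(LOG,
            (errmsg("pg_stat_statements: deallocated %d least-used entries",
                    nvictims)));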


In light of the above, I've posted a patch that reflects the points 
made.


Regards,
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 081f997d70..3ec627b956 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,7 @@ OBJS = \
 	pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql \
+DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \
 	pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
 	pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
 	pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index e0edb134f3..85e78c7f46 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -859,4 +859,19 @@ SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE
  SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 1 | 0 |0
 (6 rows)
 
+--
+-- Checking the execution of the pg_stat_statements_info view
+--
+SELECT pg_stat_statements_reset(0,0,0);
+ pg_stat_statements_reset 
+--------------------------
+ 
+(1 row)
+
+SELECT * FROM pg_stat_statements_info;
+ dealloc 
+---------
+   0
+(1 row)
+
 DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql b/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql
new file mode 100644
index 00..5c491f242a
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql
@@ -0,0 +1,17 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.9'" to load this file. \quit
+
+--- Define pg_stat_statements_info
+CREATE FUNCTION pg_stat_statements_info (
+	OUT dealloc bigint
+)
+RETURNS bigint
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+CREATE VIEW pg_stat_statements_info AS
+	SELECT * FROM pg_stat_statements_info();
+
+GRANT SELECT ON pg_stat_statements_info TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 1eac9edaee..65cf4fb0a7 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -193,6 +193,15 @@ typedef struct Counters
 	uint64		wal_bytes;		/* total amount of WAL bytes generated */
 } Counters;
 
+/*
+ * Counter for pg_stat_statements_info
+ */
+typedef struct pgssInfoCounters
+{
+	int64		dealloc;		/* # of deallocation */
+	slock_t		mutex;			/* protects the counters only */
+} pgssInfoCounters;
+
 /*
  * Statistics per statement
  *
@@ -279,6 +288,7 @@ static ProcessUtility_hook_type prev_ProcessUtility = NULL;
 /* Links to shared memory state */
 static pgssSharedState *pgss = NULL;
 static HTAB *pgss_hash = NULL;
+static pgssInfoCounters *pgss_info = NULL;
 
 /* GUC variables */
 
@@ -327,6 +337,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_8);
 PG_FUNCTION_INFO_V1(pg_stat_statements);
+PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
@@ -380,6 +391,8 @@ static char *generate_normalized_query(pgssJumbleState *jstate, const char *quer
 static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
 	 int query_loc);
 static int	comp_location(const void *a, const void *b);
+static void pgss_info_update(void);
+static void pgss_info_reset(void);
 
 
 /*
@@ -518,6 +531,7 @@ static void
 pgss_shmem_startup(void)
 {
 	bool		found;
+	bool		found_info;
 	HASHCTL		info;
 	FILE	   *file = NULL;
 	FIL

RE: Disable WAL logging to speed up data loading

2020-11-08 Thread osumi.takami...@fujitsu.com
Hello, Stephen


On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost  wrote:
> * Magnus Hagander (mag...@hagander.net) wrote:
> > On Mon, Nov 2, 2020 at 4:28 PM Robert Haas  wrote:
> > > On Thu, Oct 29, 2020 at 4:00 PM Fujii Masao  wrote:
> > > > Yes. What I meant was such a safe guard needs to be implemented.
> > > >
> > > > This may mean that if we want to recover the database from that
> > > > backup, we need to specify the recovery target so that the archive
> > > > recovery stops just before the WAL record indicating wal_level change.
> > >
> > > Yeah, I think we need these kinds of safeguards, for sure.
> > >
> > > I'm also concerned about the way that this proposed feature
> > > interacts with incremental backup capabilities that already exist in
> > > tools like pgBackRest, EDB's BART, pg_probackup, and future things
> > > we might want to introduce into core, along the lines of what I have
> > > previously proposed. Now, I think pgBackRest uses only timestamps
> > > and checksums, so it probably doesn't care, but some of the other
> > > solutions rely on WAL-scanning to gather a list of changed blocks. I
> > > guess there's no reason that they can't notice the wal_level being
> > > changed and do the right thing; they should probably have that kind
> > > of capability already. Still, it strikes me that it might be useful
> > > if we had a stronger mechanism.
> 
> Checking the WAL level certainly seems critical to anything that's reading the
> WAL.  We certainly do this already when running as a
> replica:
> 
> ereport(WARNING,
>         (errmsg("WAL was generated with wal_level=minimal, data may be missing"),
>          errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));
> 
> There's definitely a question about if a WARNING there is really sufficient or
> not, considering that you could end up with 'logged'
> tables on the replica that are missing data, but I'm not sure that inventing a
> new, independent, mechanism for checking WAL level changes makes sense.
> 
> While pgbackrest backups of a primary wouldn't be impacted, it does support
> backing up from a replica (as do other backup systems, of
> course) and if data that's supposed to be on the replica isn't there because
> someone restarted PG with wal_level=minimal and then flipped it back to
> replica and got the replica to move past that part of the WAL by turning off 
> hot
> standby, replaying, and then turning it back on, then the backup is going to
> also be missing that data.
> 
> Perhaps that's where we need to have a stronger mechanism though- that is,
> if we hit the above WARNING, ever, set a flag somewhere that tools can check
> and which we could also check and throw further warnings about.  In other
> words, every time this replica is started, we could check for this flag and
> throw the same warning above about how data may be missing, and we could
> have pg_basebackup flat out refuse to back up from a replica where this flag
> has been set (maybe with some kind of override mechanism...  maybe not).
> After all, that's where the real issue here is, isn't it?
> 
> > > I'm not exactly sure what that would look like, but suppose we had a
> > > feature where every time wal_level drops below replica, a counter
> > > gets incremented by 1, and that counter is saved in the control
> > > file. Or maybe when wal_level drops below minimal to none. Or maybe
> > > there are two counters. Anyway, the idea is that if you have a
> > > snapshot of the cluster at one time and a snapshot at another time,
> > > you can see whether anything scary has happened in the middle
> > > without needing all of the WAL in between.
> > >
> > > Maybe this is off-topic for this thread or not really needed, but
> > > I'm not sure. I don't think wal_level=none is a bad idea
> > > intrinsically, but I think it would be easy to implement it poorly
> > > and end up harming a lot of users. I have no problem with giving
> > > people a way to do dangerous things, but we should do our best to
> > > let people know how much danger they've incurred.
> 
> I'm not sure that wal_level=none is really the right way to address this
> use-case.  We already have unlogged tables and that's pretty clean and
> meets the "we want to load data fast without having to pay for WAL" use case.
> The argument here seems to be that to take advantage of unlogged tables
> requires the tools using PG to know how to issue a 'CREATE UNLOGGED
> TABLE' command instead of a 'CREATE TABLE' command.  That doesn't
> seem like a huge leap, but we could make it easier by just adding a
> 'table_log_default' or such GUC that could be set on the data loading role to
> have all tables created by it be unlogged.
I'm afraid that in the case where all tables are set up as unlogged,
users are forced to back up *all* commands from the application,
in preparation for an unexpected crash.
This is because whenever the server crashes,
the unlogged tab

Re: Temporary tables versus wraparound... again

2020-11-08 Thread Greg Stark
On Mon, 9 Nov 2020 at 00:17, Noah Misch  wrote:
>
> > 2) adding the dependency on heapam.h to heap.c makes sense because of
> > heap_inplace_update bt it may be a bit annoying because I suspect
> > that's a useful sanity check that the tableam stuff hasn't been
> > bypassed
>
> That is not terrible.  How plausible would it be to call vac_update_relstats()
> for this, instead of reimplementing part of it?

It didn't seem worth it to change its API to add boolean flags to skip
setting some of the variables (I was originally only doing
relfrozenxid and relminmxid). Now that I'm doing most of the variables,
maybe it makes a bit more sense.

> > @@ -3340,6 +3383,7 @@ heap_truncate_one_rel(Relation rel)
> >
> >   /* Truncate the underlying relation */
> >   table_relation_nontransactional_truncate(rel);
> > + ResetVacStats(rel);
>
> I didn't test, but I expect this will cause a stats reset for the second
> TRUNCATE here:
>
> CREATE TABLE t ();
> ...
> BEGIN;
> TRUNCATE t;
> TRUNCATE t;  -- inplace relfrozenxid reset
> ROLLBACK;  -- inplace reset survives
>
> Does that indeed happen?

Apparently not; see below.  I have to say I was pretty puzzled by the
actual behaviour, which is that the rollback actually does roll back
the inplace update. But I *think* what is happening is that the first
truncate does an MVCC update, so the inplace update happens only to the
newly created tuple, which is never committed.

Thinking about this, it does worry me a bit. I wonder whether an
inplace update is really safe outside of vacuum, where we know we're
not in a transaction that can be rolled back. But IIRC doing a
non-inplace update on pg_class for these columns breaks other things.
I don't know if that's still true.

Also, in checking this question I realized I had missed 3d351d91. I
should be initializing reltuples to -1 not 0.
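
For reference, since that commit -1 is the sentinel meaning "tuple
count unknown", so the corresponding one-line fix in the patch's
ResetVacStats() would be:

    pgcform->reltuples = -1;    /* -1, not 0, means "not yet analyzed" */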


postgres=# vacuum t;
VACUUM
postgres=# select
relname,relpages,reltuples,relallvisible,relfrozenxid from pg_class
where oid='t'::regclass;
 relname | relpages | reltuples | relallvisible | relfrozenxid
---------+----------+-----------+---------------+--------------
 t       |        9 |      2000 |             9 |        15557
(1 row)

postgres=# begin;
BEGIN
postgres=*# truncate t;
TRUNCATE TABLE
postgres=*# truncate t;
TRUNCATE TABLE
postgres=*# select
relname,relpages,reltuples,relallvisible,relfrozenxid from pg_class
where oid='t'::regclass;
 relname | relpages | reltuples | relallvisible | relfrozenxid
---------+----------+-----------+---------------+--------------
 t       |        0 |         0 |             0 |        15562
(1 row)

postgres=*# abort;
ROLLBACK
postgres=# select
relname,relpages,reltuples,relallvisible,relfrozenxid from pg_class
where oid='t'::regclass;
 relname | relpages | reltuples | relallvisible | relfrozenxid
---------+----------+-----------+---------------+--------------
 t       |        9 |      2000 |             9 |        15557
(1 row)




-- 
greg




Re: document pg_settings view doesn't display custom options

2020-11-08 Thread Fujii Masao




On 2020/11/07 0:42, Fujii Masao wrote:



On 2020/10/31 2:06, John Naylor wrote:



On Fri, Oct 30, 2020 at 12:48 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

    John Naylor <john.nay...@enterprisedb.com> writes:
 > Okay, along those lines here's a patch using "this view" in a new
 > paragraph for simplicity.

    Basically OK with me, but ...

    
    It seems fairly weird to use a nonspecific reference first and then a
    specific one.  That is, I'd expect to read "The pg_settings view ..."
    and then "This view ...", not the other way around.  So we could
    put this para second, or put it first but make this para say
    "The pg_settings view ..." while the existing text gets reduced to
    "This view ...".

    Or just make them both say "This view ..." so we don't have to have
    this discussion again the next time somebody wants to add a para here.
    


Okay, how's this?


Looks good to me. Barring any objection, I will commit the patch.


Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: abstract Unix-domain sockets

2020-11-08 Thread Michael Paquier
On Thu, Oct 22, 2020 at 09:03:49AM +0200, Peter Eisentraut wrote:
> On 2020-10-09 09:28, Peter Eisentraut wrote:
>> During the discussion on Unix-domain sockets on Windows, someone pointed
>> out[0] abstract Unix-domain sockets.  This is a variant of the normal
>> Unix-domain sockets that don't use the file system but a separate
>> "abstract" namespace.  At the user interface, such sockets are
>> represented by names starting with "@".  I took a look at this and it
>> wasn't hard to get working, so here is a patch.  It's supposed to be
>> supported on Linux and Windows right now, but I haven't tested on Windows.

Yeah, peeking at the Windows docs, what you are trying to do here
should be supported (please note that I have not tested).  One
reference:
https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/

>> And then some extra patches for surrounding cleanup.  During testing I
>> noticed that the bind() failure hint "Is another postmaster already
>> running ..." was shown in inappropriate situations, so I changed that to
>> only show for EADDRINUSE errors.  (Maybe other error codes could be
>> appropriate, but I couldn't find any more.)
>> 
>> And then looking for other uses of EADDRINUSE I found some dead
>> Windows-related code that can be cleaned up.
> 
> This last piece has been committed.

+   
+A value that starts with @ specifies that a
+Unix-domain socket in the abstract namespace should be created
+(currently supported on Linux and Windows).  In that case, this value
+does not specify a directory but a prefix from which
+the actual socket name is computed in the same manner as for the
+file-system namespace.  While the abstract socket name prefix can be
+chosen freely, since it is not a file-system location, the convention
+is to nonetheless use file-system-like values such as
+@/tmp.
+   

As abstract namespaces don't have permissions, anyone knowing the name
of the path, which should be unique, can access the server.
Do you think that the documentation should warn the user about that?
This feature is about easing the management of socket paths
while throwing away their security aspect.

When attempting to start a server that listens on the same port and
uses the same abstract path, the second server started still shows
a hint referring to a file that does not exist:
LOG:  could not bind Unix address "@tmp/.s.PGSQL.5432": Address already in use
HINT:  Is another postmaster already running on port 5432? If not, remove socket file "@tmp/.s.PGSQL.5432" and retry.

Instead of showing paths with at signs, wouldn't it be better to
mention it is an abstract socket address?

I am not sure that 0002 is an improvement.  It would be more readable
to move the choice of hint into a first block that selects the hint
string, rather than having the whole thing in a single elog() call.
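
Something along these lines, perhaps (a sketch only; variable names
such as isAbstract are illustrative assumptions, not taken from the
actual 0002 patch):

    const char *hint = NULL;

    if (errno == EADDRINUSE)
    {
        if (isAbstract)
            hint = gettext_noop("Is another postmaster already running on port %d?");
        else
            hint = gettext_noop("Is another postmaster already running on port %d?"
                                " If not, remove socket file \"%s\" and retry.");
    }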
--
Michael




Re: logical streaming of xacts via test_decoding is broken

2020-11-08 Thread Amit Kapila
On Mon, Nov 9, 2020 at 11:21 AM Dilip Kumar  wrote:
>
> On Mon, Nov 9, 2020 at 11:04 AM Dilip Kumar  wrote:
> >
> > On Mon, Nov 9, 2020 at 11:00 AM Amit Kapila  wrote:
> > >
> > > Michael reported a BF failure [1] related to one of the logical
> > > streaming test cases, and I've analyzed the issue. As responded on
> > > pgsql-committers [2], the issue here is that streaming
> > > transactions can be interleaved, and because we are maintaining
> > > xact_wrote_changes at the LogicalDecodingContext level, a later
> > > transaction can overwrite the flag for a previously streamed
> > > transaction. I think it is logical to have this flag at each
> > > transaction level (i.e., in ReorderBufferTxn); until now it was
> > > fine because the changes of each transaction were decoded in one
> > > shot, which will no longer be true. We can keep an
> > > output_plugin_private data pointer in ReorderBufferTxn which will be
> > > used by the test_decoding module to keep this and any other such
> > > flags in the future. We need to set this flag in the begin_cb and
> > > stream_start_cb APIs and then reset/remove it in the
> > > stream_commit_cb, stream_abort_cb and stream_stop_cb APIs.
>
> So IIUC, we need to keep 'output_plugin_private' in the
> LogicalDecodingContext as well as in ReorderBufferTxn, so the
> output_plugin_private in the ReorderBufferTxn will currently just keep
> one flag, xact_wrote_changes, and the remaining things will still be
> maintained in the output_plugin_private of the LogicalDecodingContext.  Is
> my understanding correct?
>

Yes. But keep it as void * so that we can add more things later if required.

-- 
With Regards,
Amit Kapila.




Re: logical streaming of xacts via test_decoding is broken

2020-11-08 Thread Dilip Kumar
On Mon, Nov 9, 2020 at 11:04 AM Dilip Kumar  wrote:
>
> On Mon, Nov 9, 2020 at 11:00 AM Amit Kapila  wrote:
> >
> > Michael reported a BF failure [1] related to one of the logical
> > streaming test cases, and I've analyzed the issue. As responded on
> > pgsql-committers [2], the issue here is that streaming
> > transactions can be interleaved, and because we are maintaining
> > xact_wrote_changes at the LogicalDecodingContext level, a later
> > transaction can overwrite the flag for a previously streamed
> > transaction. I think it is logical to have this flag at each
> > transaction level (i.e., in ReorderBufferTxn); until now it was
> > fine because the changes of each transaction were decoded in one
> > shot, which will no longer be true. We can keep an
> > output_plugin_private data pointer in ReorderBufferTxn which will be
> > used by the test_decoding module to keep this and any other such
> > flags in the future. We need to set this flag in the begin_cb and
> > stream_start_cb APIs and then reset/remove it in the
> > stream_commit_cb, stream_abort_cb and stream_stop_cb APIs.

So IIUC, we need to keep 'output_plugin_private' in the
LogicalDecodingContext as well as in ReorderBufferTxn, so the
output_plugin_private in the ReorderBufferTxn will currently just keep
one flag, xact_wrote_changes, and the remaining things will still be
maintained in the output_plugin_private of the LogicalDecodingContext.  Is
my understanding correct?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: logical streaming of xacts via test_decoding is broken

2020-11-08 Thread Dilip Kumar
On Mon, Nov 9, 2020 at 11:00 AM Amit Kapila  wrote:
>
> Michael reported a BF failure [1] related to one of the logical
> streaming test cases, and I've analyzed the issue. As responded on
> pgsql-committers [2], the issue here is that streaming
> transactions can be interleaved, and because we are maintaining
> xact_wrote_changes at the LogicalDecodingContext level, a later
> transaction can overwrite the flag for a previously streamed
> transaction. I think it is logical to have this flag at each
> transaction level (i.e., in ReorderBufferTxn); until now it was
> fine because the changes of each transaction were decoded in one
> shot, which will no longer be true. We can keep an
> output_plugin_private data pointer in ReorderBufferTxn which will be
> used by the test_decoding module to keep this and any other such
> flags in the future. We need to set this flag in the begin_cb and
> stream_start_cb APIs and then reset/remove it in the
> stream_commit_cb, stream_abort_cb and stream_stop_cb APIs.
>
> Additionally, we can extend the existing test case
> concurrent_stream.spec to cover this scenario by adding a step to have
> an empty transaction before the commit of transaction which we are
> going to stream changes for (before s1_commit).
>
> Thoughts?

The analysis seems correct to me; I will work on it.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




logical streaming of xacts via test_decoding is broken

2020-11-08 Thread Amit Kapila
Michael reported a BF failure [1] related to one of the logical
streaming test cases, and I've analyzed the issue. As responded on
pgsql-committers [2], the issue here is that streaming
transactions can be interleaved, and because we are maintaining
xact_wrote_changes at the LogicalDecodingContext level, a later
transaction can overwrite the flag for a previously streamed
transaction. I think it is logical to have this flag at each
transaction level (i.e., in ReorderBufferTxn); until now it was
fine because the changes of each transaction were decoded in one shot,
which will no longer be true. We can keep an output_plugin_private data
pointer in ReorderBufferTxn which will be used by the test_decoding
module to keep this and any other such flags in the future. We need to
set this flag in the begin_cb and stream_start_cb APIs and then
reset/remove it in the stream_commit_cb, stream_abort_cb and
stream_stop_cb APIs.

Additionally, we can extend the existing test case
concurrent_stream.spec to cover this scenario by adding a step to have
an empty transaction before the commit of transaction which we are
going to stream changes for (before s1_commit).

Thoughts?
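
To sketch what that could look like in test_decoding, assuming the
proposed output_plugin_private field is added to ReorderBufferTXN (the
struct and field names below are illustrative, not a finished patch):

    typedef struct TestDecodingTxnData
    {
        bool    xact_wrote_changes;    /* has this xact written anything yet? */
    } TestDecodingTxnData;

    static void
    pg_decode_stream_start(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
    {
        TestDecodingTxnData *txndata =
            MemoryContextAllocZero(ctx->context, sizeof(TestDecodingTxnData));

        txndata->xact_wrote_changes = false;
        txn->output_plugin_private = txndata;
        /* to be freed and cleared again in stream_stop/stream_abort/stream_commit */
    }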

[1] - https://www.postgresql.org/message-id/20201109014118.GD1695%40paquier.xyz
[2] - 
https://www.postgresql.org/message-id/CAA4eK1JMCm9HURVmOapo%2Bv2u2EEABOuzgp7XJ32C072ygcKktQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Temporary tables versus wraparound... again

2020-11-08 Thread Noah Misch
On Sun, Nov 08, 2020 at 06:19:57PM -0500, Greg Stark wrote:
> However in the case of ON COMMIT DELETE ROWS we can do better. Why not
> just reset the relfrozenxid and other stats as if the table was
> freshly created when it's truncated?

That concept is sound.

> 1) Should we update relpages and reltuples. I think so but an argument
> could be made that people might be depending on running analyze once
> when the data is loaded and then not having to run analyze on every
> data load.

I'd wager no, we should not.  An app that ever analyzes an ON COMMIT DELETE
ROWS temp table probably analyzes it every time.  If not, it's fair to guess
that similar statistics recur in each xact.

> 2) adding the dependency on heapam.h to heap.c makes sense because of
> heap_inplace_update bt it may be a bit annoying because I suspect
> that's a useful sanity check that the tableam stuff hasn't been
> bypassed

That is not terrible.  How plausible would it be to call vac_update_relstats()
for this, instead of reimplementing part of it?

> 3) I added a check to the regression tests but I'm not sure it's a
> good idea to actually commit this. It could fail if there's a parallel
> transaction going on and even moving the test to the serial schedule
> might not guarantee that never happens due to autovacuum running
> analyze?

Right.

> @@ -3340,6 +3383,7 @@ heap_truncate_one_rel(Relation rel)
>  
>   /* Truncate the underlying relation */
>   table_relation_nontransactional_truncate(rel);
> + ResetVacStats(rel);

I didn't test, but I expect this will cause a stats reset for the second
TRUNCATE here:

CREATE TABLE t ();
...
BEGIN;
TRUNCATE t;
TRUNCATE t;  -- inplace relfrozenxid reset
ROLLBACK;  -- inplace reset survives

Does that indeed happen?




Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

2020-11-08 Thread Noah Misch
On Wed, Oct 28, 2020 at 09:01:59PM -0700, Noah Misch wrote:
> On Sat, Aug 29, 2020 at 10:34:33PM -0700, Noah Misch wrote:
> > On Mon, May 25, 2020 at 12:00:33AM -0700, Noah Misch wrote:
> > > [last non-rebase change]
> > 
> > Rebased the second patch.  The first patch did not need a rebase.
> 
> Rebased both patches, necessitated by commit dee663f changing many of the same
> spots.

Rebased both patches, necessitated by commit c732c3f (a repair of commit
dee663f).  As I mentioned on another branch of the thread, I'd be content if
someone reviews the slru-truncate-modulo patch and disclaims knowledge of the
slru-truncate-insurance patch; I would then abandon the latter patch.
Author: Noah Misch 
Commit: Noah Misch 

Prevent excess SimpleLruTruncate() deletion.

Every core SLRU wraps around.  With the exception of pg_notify, the wrap
point can fall in the middle of a page.  Account for this in the
PagePrecedes callback specification and in SimpleLruTruncate()'s use of
said callback.  Update each callback implementation to fit the new
specification.  This changes SerialPagePrecedesLogically() from the
style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes().
(Whereas pg_clog and pg_serial share a key space, pg_serial is nothing
like pg_notify.)  The bug fixed here has the same symptoms and user
followup steps as 592a589a04bd456410b853d86bd05faa9432cbbb.  Back-patch
to 9.5 (all supported versions).

Reviewed by Tom Lane.

Discussion: https://postgr.es/m/20190202083822.gc32...@gust.leadboat.com

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 034349a..55bdac4 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -694,6 +694,7 @@ CLOGShmemInit(void)
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
 				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
 				  SYNC_HANDLER_CLOG);
+	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
 /*
@@ -912,13 +913,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
 
 
 /*
- * Decide which of two CLOG page numbers is "older" for truncation purposes.
+ * Decide whether a CLOG page number is "older" for truncation purposes.
  *
  * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic.  However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs.  So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * thing with wraparound XID arithmetic.  However, TransactionIdPrecedes()
+ * would get weird about permanent xact IDs.  So, offset both such that xid1,
+ * xid2, and xid2 + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset is
+ * relevant to page 0 and to the page preceding page 0.
+ *
+ * The page containing oldestXact-2^31 is the important edge case.  The
+ * portion of that page equaling or following oldestXact-2^31 is expendable,
+ * but the portion preceding oldestXact-2^31 is not.  When oldestXact-2^31 is
+ * the first XID of a page and segment, the entire page and segment is
+ * expendable, and we could truncate the segment.  Recognizing that case would
+ * require making oldestXact, not just the page containing oldestXact,
+ * available to this callback.  The benefit would be rare and small, so we
+ * don't optimize that edge case.
  */
 static bool
 CLOGPagePrecedes(int page1, int page2)
@@ -927,11 +937,12 @@ CLOGPagePrecedes(int page1, int page2)
TransactionId xid2;
 
xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE;
-   xid1 += FirstNormalTransactionId;
+   xid1 += FirstNormalTransactionId + 1;
xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE;
-   xid2 += FirstNormalTransactionId;
+   xid2 += FirstNormalTransactionId + 1;
 
-   return TransactionIdPrecedes(xid1, xid2);
+   return (TransactionIdPrecedes(xid1, xid2) &&
+           TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1));
 }
 
 
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 2fe551f..ae45777 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -557,6 +557,7 @@ CommitTsShmemInit(void)
 				  CommitTsSLRULock, "pg_commit_ts",
 				  LWTRANCHE_COMMITTS_BUFFER,
 				  SYNC_HANDLER_COMMIT_TS);
+	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
 									 sizeof(CommitTimestampShared),
@@ -927,14 +928,27 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-08 Thread Andy Fan
On Mon, Nov 9, 2020 at 10:07 AM David Rowley  wrote:

> On Mon, 9 Nov 2020 at 03:52, Andy Fan  wrote:
> > then I did a perf on the 2 versions.  Is it possible that you called
> > tts_minimal_clear twice in the v9 version?  Both ExecClearTuple and
> > ExecStoreMinimalTuple called tts_minimal_clear on the same slot.
> >
> > With the following changes:
>
> Thanks for finding that.  After applying that fix I did a fresh set of
> benchmarks on the latest master, latest master + v8 and latest master
> + v9 using the attached script. (resultcachebench2.sh.txt)
>
> I ran this on my zen2 AMD64 machine and formatted the results into the
> attached resultcache_master_vs_v8_vs_v9.csv file
>
> If I load this into PostgreSQL:
>
> # create table resultcache_bench (tbl text, target text, col text,
> latency_master numeric(10,3), latency_v8 numeric(10,3), latency_v9
> numeric(10,3));
> # copy resultcache_bench from
> '/path/to/resultcache_master_vs_v8_vs_v9.csv' with(format csv);
>
> and run:
>
> # select col,tbl,target, sum(latency_v8) v8, sum(latency_v9) v9,
> round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9 from
> resultcache_bench group by 1,2,3 order by 2,1,3;
>
> I've attached the results of the above query. (resultcache_v8_vs_v9.txt)
>
> Out of the 24 tests done on each branch, only 6 of 24 are better on v9
> compared to v8. So v8 wins on 75% of the tests.


I think either version is OK for me, and I like this patch overall.
However, I believed v9 should be no worse than v8 all the time; is
there any theory to explain your result?


> v9 never wins using
> the lookup1 table (1 row per lookup). It only wins on 50% of the
> lookup100 queries (100 inner rows per outer row). However, despite the
> draw in won tests for the lookup100 test, v8 takes less time overall,
> as indicated by the following query:
>
> postgres=# select round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9
> from resultcache_bench where tbl='lookup100';
>  v8_vs_v9
> --
>  99.3
> (1 row)
>
> Ditching the WHERE clause and simply doing:
>
> postgres=# select round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9
> from resultcache_bench;
>  v8_vs_v9
> --
>  96.2
> (1 row)
>
> indicates that v8 is 3.8% faster than v9. Altering that query
> accordingly indicates v8 is 11.5% faster than master and v9 is only 7%
> faster than master.
>
> Of course, scaling up the test will yield both versions being even
> more favourable than master, but the point here is comparing v8 to v9.
>
> David
>


-- 
Best Regards
Andy Fan


RE: extension patch of CREATE OR REPLACE TRIGGER

2020-11-08 Thread osumi.takami...@fujitsu.com
Hi,


On Saturday, Nov 7, 2020 2:06 PM Dilip Kumar  wrote:
> The patch looks fine to me; however, I feel that in the test case there
> are a lot of duplicate statements which can be reduced, e.g.
> +-- 1. Overwrite existing regular trigger with regular trigger (without OR REPLACE)
> +create trigger my_trig
> +  after insert on my_table
> +  for each row execute procedure funcA();
> +create trigger my_trig
> +  after insert on my_table
> +  for each row execute procedure funcB(); -- should fail
> +drop trigger my_trig on my_table;
> +
> +-- 2. Overwrite existing regular trigger with regular trigger (with OR REPLACE)
> +create trigger my_trig
> +  after insert on my_table
> +  for each row execute procedure funcA();
> +insert into my_table values (1);
> +create or replace trigger my_trig
> +  after insert on my_table
> +  for each row execute procedure funcB(); -- OK
> +insert into my_table values (1);
> +drop trigger my_trig on my_table;
> 
> In this test, test 1 failed because it tried to change the trigger
> function without OR REPLACE, which is fine, but test 2 can then
> continue from there; I mean, we don't need to drop the trigger at the
> end of test 1, and test 2 can try it with the OR REPLACE syntax.  This
> way we can avoid executing extra statements that are not necessary.
OK. That makes sense.

Attached the revised version.
The tests in this patch no longer include such redundancy.
I checked the tests of trigger replacement for partition tables as well.

Here, I did not (and will not) delete the comments numbered 1 to 8, so
that other developers can easily check whether all the cases are
covered.


Best,
Takamichi Osumi


CREATE_OR_REPLACE_TRIGGER_v18.patch
Description: CREATE_OR_REPLACE_TRIGGER_v18.patch


Re: Some doubious code in pgstat.c

2020-11-08 Thread Kyotaro Horiguchi
At Fri, 6 Nov 2020 16:40:39 +0530, Amit Kapila  wrote 
in 
> On Thu, Nov 5, 2020 at 2:13 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Thu, 5 Nov 2020 11:48:24 +0530, Amit Kapila  
> > wrote in
> > > On Thu, Nov 5, 2020 at 9:44 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Thu, Nov 5, 2020 at 11:18 AM Kyotaro Horiguchi
> > > >  wrote:
> > > > > As another issue, just replacing memcpy with strlcpy makes the compiler
> > > > > complain of a type mismatch, as the first parameter to memcpy had a
> > > > > needless "&" operator. I removed it in this patch.
> > > > >
> > > > > (&msg.m_slotname is a "char (*)[NAMEDATALEN]", not a "char *".)
> > > > >
> > > >
> > > > The patch looks good to me.
> > > >
> > >
> > > LGTM as well but the proposed commit message seems to be a bit
> > > unclear. How about something like this:
> > > "Use strlcpy instead of memcpy for copying the slot name in pgstat.c.
> > >
> > > There is no outright bug here but it is better to be consistent with
> > > the usage at other places in the same file. In passing, fix a wrong
> > > Assertion in pgstat_recv_replslot."
> >
> > Looks better, thanks.
> >
> 
> Pushed!

Thanks!
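
The change in question was essentially this (paraphrased here, not the
literal committed diff):

    -   memcpy(&msg.m_slotname, slotname, NAMEDATALEN);  /* char (*)[NAMEDATALEN] */
    +   strlcpy(msg.m_slotname, slotname, NAMEDATALEN);  /* char *, NUL-terminated */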

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Protect syscache from bloating with negative cache entries

2020-11-08 Thread Kyotaro Horiguchi
At Mon, 09 Nov 2020 11:13:31 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Now the branch for the counter increment is removed.  For the similar
> branches on the counter-decrement side in CatCacheCleanupOldEntries(),
> Min() is compiled into cmovbe and the branch was removed.

Mmm. Sorry, I sent this by mistake. Please ignore it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Protect syscache from bloating with negative cache entries

2020-11-08 Thread Kyotaro Horiguchi
At Fri, 6 Nov 2020 10:42:15 +0200, Heikki Linnakangas  wrote 
in 
> On 06/11/2020 10:24, Kyotaro Horiguchi wrote:
> > Thank you for the comment!
> > First off, I thought that I managed to eliminate the degradation
> > observed on the previous versions, but significant degradation (1.1%
> > slower) is still seen in one case.
> 
> One thing to keep in mind with micro-benchmarks like this is that even
> completely unrelated code changes can change the layout of the code in
> memory, which in turn can affect CPU caching effects in surprising
> ways. If you're lucky, you can see 1-5% differences just by adding a
> function that's never called, for example, if it happens to move other
> code in memory so that a some hot codepath or struct gets split across
> CPU cache lines. It can be infuriating when benchmarking.

True.  I sometimes had to make distclean to stabilize such benchmarks..

> > At Thu, 5 Nov 2020 11:09:09 +0200, Heikki Linnakangas
> >  wrote in
> > (A) original test patch
> > I naively thought that the code path is too short to bury the
> > degradation of additional a few instructions.  Actually I measured
> > performance again with the same patch set on the current master and
> > had more or less the same result.
> > master 8195.58ms, patched 8817.40 ms: +10.75%
> > However, I noticed that the additional call was a recursive call and a
> > jmp inserted for the recursive call seems to take significant
> > time. After avoiding the recursive call, the difference reduced to
> > +0.96% (master 8268.71ms : patched 8348.30ms)
> > Just two instructions below are inserted in this case, which looks
> > reasonable.
> >8720ff <+31>: cmpl $0x0,0x4ba942(%rip) # 0xd2ca48
> >
> >872106 <+38>: jl 0x872240  (call to a function)
> 
> That's interesting. I think a 1% degradation would be acceptable.
> 
> I think we'd like to enable this feature by default though, so the
> performance when it's enabled is also very important.
> 
> > (C) inserting bare counter-update code without a branch
> > 
> >> Do we actually need a branch there? If I understand correctly, the
> >> point is to bump up a usage counter on the catcache entry. You could
> >> increment the counter unconditionally, even if the feature is not
> >> used, and avoid the branch that way.
> > That change causes 4.9% degradation, which is worse than having a
> > branch.
> > master 8364.54ms, patched 8666.86ms (+4.9%)
> > The additional instructions follow.
> > + 8721ab <+203>:mov0x30(%rbx),%eax  # %eax = ct->naccess
> > + 8721ae <+206>:mov$0x2,%edx
> > + 8721b3 <+211>:add$0x1,%eax# %eax++
> > + 8721b6 <+214>: cmove %edx,%eax # if %eax == 0 then %eax = 2
> > 
> > + 8721bf <+223>:mov%eax,0x30(%rbx)  # ct->naccess = %eax
> > + 8721c2 <+226>: mov 0x4cfe9f(%rip),%rax # 0xd42068 
> > + 8721c9 <+233>:mov%rax,0x38(%rbx)  # ct->lastaccess = %rax
> 
> Do you need the "ntaccess == 2" test? You could always increment the
> counter, and in the code that uses ntaccess to decide what to evict,
> treat all values >= 2 the same.
> 
> Need to handle integer overflow somehow. Or maybe not: integer
> overflow is so infrequent that even if a hot syscache entry gets
> evicted prematurely because its ntaccess count wrapped around to 0, it
> will happen so rarely that it won't make any difference in practice.

Agreed. OK, I had prioritized completely avoiding degradation on the
normal path, but relaxing that restriction to 1% or so makes the code
far simpler and makes the expiration path significantly faster.

Now the branch for the counter increment is removed.  For the similar
branches on the counter-decrement side in CatCacheCleanupOldEntries(),
Min() is compiled into cmovbe and a branch was removed.
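
In code form, the update is now roughly the following (matching the
instructions shown upthread; the field names follow the patch):

    ct->naccess++;
    if (ct->naccess == 0)
        ct->naccess = 2;            /* compiled to a cmov, not a branch */
    ct->lastaccess = catcacheclock;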






Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-08 Thread David Rowley
On Mon, 9 Nov 2020 at 03:52, Andy Fan  wrote:
> then I did a perf on the 2 versions.  Is it possible that you called
> tts_minimal_clear twice in
> the v9 version?  Both ExecClearTuple and  ExecStoreMinimalTuple called 
> tts_minimal_clear
> on the same  slot.
>
> With the following changes:

Thanks for finding that.  After applying that fix I did a fresh set of
benchmarks on the latest master, latest master + v8 and latest master
+ v9 using the attached script. (resultcachebench2.sh.txt)

I ran this on my zen2 AMD64 machine and formatted the results into the
attached resultcache_master_vs_v8_vs_v9.csv file

If I load this into PostgreSQL:

# create table resultcache_bench (tbl text, target text, col text,
latency_master numeric(10,3), latency_v8 numeric(10,3), latency_v9
numeric(10,3));
# copy resultcache_bench from
'/path/to/resultcache_master_vs_v8_vs_v9.csv' with(format csv);

and run:

# select col,tbl,target, sum(latency_v8) v8, sum(latency_v9) v9,
round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9 from
resultcache_bench group by 1,2,3 order by 2,1,3;

I've attached the results of the above query. (resultcache_v8_vs_v9.txt)

Out of the 24 tests done on each branch, only 6 of 24 are better on v9
compared to v8. So v8 wins on 75% of the tests.  v9 never wins using
the lookup1 table (1 row per lookup). It only wins on 50% of the
lookup100 queries (100 inner rows per outer row). However, despite the
draw in won tests for the lookup100 test, v8 takes less time overall,
as indicated by the following query:

postgres=# select round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9
from resultcache_bench where tbl='lookup100';
 v8_vs_v9
--
 99.3
(1 row)

Ditching the WHERE clause and simply doing:

postgres=# select round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9
from resultcache_bench;
 v8_vs_v9
--
 96.2
(1 row)

indicates that v8 is 3.8% faster than v9. Altering that query
accordingly indicates v8 is 11.5% faster than master and v9 is only 7%
faster than master.

Of course, scaling up the test will yield both versions being even
more favourable than master, but the point here is comparing v8 to v9.

David
#!/bin/bash
seconds=60

psql -c "drop table if exists hundredk, lookup1, lookup100;" postgres
psql -c "create table hundredk (hundredk int, tenk int, thousand int, hundred int, ten int, one int);" postgres
psql -c "insert into hundredk select x%100000,x%10000,x%1000,x%100,x%10,1 from generate_Series(1,100000) x;" postgres

psql -c "create table lookup100 (a int);" postgres
psql -c "insert into lookup100 select x from generate_Series(1,100000) x, generate_Series(1,100);" postgres
psql -c "create index on lookup100 (a);" postgres

psql -c "create table lookup1 (a int);" postgres
psql -c "insert into lookup1 select x from generate_Series(1,100000) x;" postgres
psql -c "create index on lookup1 (a);" postgres

psql -c "vacuum analyze lookup1, lookup100, hundredk;" postgres

for branch in master resultcache_v8 resultcache_v9
do
	cd pg_src
	git checkout $branch
	make clean -s
	make -j -s
	make install -s
	cd
	./ss.sh
	sleep 1
	psql -c "select pg_prewarm('lookup1'), pg_prewarm('lookup100'), pg_prewarm('hundredk');" postgres
	for tbl in lookup1 lookup100
	do
		for target in "count(*)" "count(l.a)" "'*'"
		do
			for col in thousand hundred ten one
			do
				echo "select $target from hundredk hk inner join $tbl l on hk.$col = l.a" > bench.sql
				echo Testing $branch $tbl $target $col >> bench.log
				pgbench -n -T $seconds -f bench.sql postgres | grep latency >> bench.log
			done
		done
	done
done


resultcache_master_vs_v8_vs_v9.csv
Description: MS-Excel spreadsheet
   col    |    tbl    |   target   |    v8    |    v9    | v8_vs_v9
----------+-----------+------------+----------+----------+----------
 hundred  | lookup1   | '*'        |   42.484 |   44.511 |     95.4
 hundred  | lookup1   | count(*)   |   30.513 |   33.016 |     92.4
 hundred  | lookup1   | count(l.a) |   32.651 |   35.471 |     92.0
 one      | lookup1   | '*'        |   42.084 |   43.668 |     96.4
 one      | lookup1   | count(*)   |   29.255 |   32.162 |     91.0
 one      | lookup1   | count(l.a) |   31.772 |   35.139 |     90.4
 ten      | lookup1   | '*'        |   40.286 |   42.439 |     94.9
 ten      | lookup1   | count(*)   |   29.286 |   32.009 |     91.5
 ten      | lookup1   | count(l.a) |   31.392 |   34.053 |     92.2
 thousand | lookup1   | '*'        |   43.771 |   45.711 |     95.8
 thousand | lookup1   | count(*)   |   31.531 |   33.845 |     93.2
 thousand | lookup1   | count(l.a) |   33.339 |   35.903 |     92.9
 hundred  | lookup100 | '*'        | 1494.440 | 1471.999 |    101.5
 hundred  | lookup100 | count(*)   |  266.988 |  26

Re: array_cat anycompatible change is breaking xversion upgrade tests

2020-11-08 Thread Tom Lane
I wrote:
> I think the most plausible response is to add this aggregate to the filter
> logic that already exists in the xversion tests.  Perhaps we could
> alternatively change this test case so that it relies on some other
> polymorphic function, but I'm not quite sure what a good candidate
> would be.

After looking at the commit that added array_cat_accum() (65d9aedb1),
I decided that it's fine to replace this aggregate with one using another
anyarray function, such as array_larger().  The point of that test is just
to show that the array argument can be array-of-record, so we don't need
the operation to be array_cat() specifically.  So I propose the attached
patches to un-break the xversion tests.

I'll hold off pushing this till after this week's wraps, though.

regards, tom lane

diff --git a/src/test/regress/expected/polymorphism.out b/src/test/regress/expected/polymorphism.out
index 2c3bb0a60b..b33b004a7f 100644
--- a/src/test/regress/expected/polymorphism.out
+++ b/src/test/regress/expected/polymorphism.out
@@ -729,24 +729,24 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl;
 (5 rows)
 
 -- another sort of polymorphic aggregate
-CREATE AGGREGATE array_cat_accum (anycompatiblearray)
+CREATE AGGREGATE array_larger_accum (anyarray)
 (
-sfunc = array_cat,
-stype = anycompatiblearray,
+sfunc = array_larger,
+stype = anyarray,
 initcond = '{}'
 );
-SELECT array_cat_accum(i)
+SELECT array_larger_accum(i)
 FROM (VALUES (ARRAY[1,2]), (ARRAY[3,4])) as t(i);
- array_cat_accum 
------------------
- {1,2,3,4}
+ array_larger_accum 
+--------------------
+ {3,4}
 (1 row)
 
-SELECT array_cat_accum(i)
+SELECT array_larger_accum(i)
 FROM (VALUES (ARRAY[row(1,2),row(3,4)]), (ARRAY[row(5,6),row(7,8)])) as t(i);
-  array_cat_accum  
-----------------------------------
- {"(1,2)","(3,4)","(5,6)","(7,8)"}
+ array_larger_accum 
+--------------------
+ {"(5,6)","(7,8)"}
 (1 row)
 
 -- another kind of polymorphic aggregate
diff --git a/src/test/regress/sql/polymorphism.sql b/src/test/regress/sql/polymorphism.sql
index 70a21c8978..527899e8ad 100644
--- a/src/test/regress/sql/polymorphism.sql
+++ b/src/test/regress/sql/polymorphism.sql
@@ -498,17 +498,17 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl;
 
 -- another sort of polymorphic aggregate
 
-CREATE AGGREGATE array_cat_accum (anycompatiblearray)
+CREATE AGGREGATE array_larger_accum (anyarray)
 (
-sfunc = array_cat,
-stype = anycompatiblearray,
+sfunc = array_larger,
+stype = anyarray,
 initcond = '{}'
 );
 
-SELECT array_cat_accum(i)
+SELECT array_larger_accum(i)
 FROM (VALUES (ARRAY[1,2]), (ARRAY[3,4])) as t(i);
 
-SELECT array_cat_accum(i)
+SELECT array_larger_accum(i)
 FROM (VALUES (ARRAY[row(1,2),row(3,4)]), (ARRAY[row(5,6),row(7,8)])) as t(i);
 
 -- another kind of polymorphic aggregate
diff --git a/src/test/regress/expected/polymorphism.out b/src/test/regress/expected/polymorphism.out
index 1ff40764d9..980e2e861c 100644
--- a/src/test/regress/expected/polymorphism.out
+++ b/src/test/regress/expected/polymorphism.out
@@ -729,24 +729,24 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl;
 (5 rows)
 
 -- another sort of polymorphic aggregate
-CREATE AGGREGATE array_cat_accum (anyarray)
+CREATE AGGREGATE array_larger_accum (anyarray)
 (
-sfunc = array_cat,
+sfunc = array_larger,
 stype = anyarray,
 initcond = '{}'
 );
-SELECT array_cat_accum(i)
+SELECT array_larger_accum(i)
 FROM (VALUES (ARRAY[1,2]), (ARRAY[3,4])) as t(i);
- array_cat_accum 
------------------
- {1,2,3,4}
+ array_larger_accum 
+--------------------
+ {3,4}
 (1 row)
 
-SELECT array_cat_accum(i)
+SELECT array_larger_accum(i)
 FROM (VALUES (ARRAY[row(1,2),row(3,4)]), (ARRAY[row(5,6),row(7,8)])) as t(i);
-  array_cat_accum  
-----------------------------------
- {"(1,2)","(3,4)","(5,6)","(7,8)"}
+ array_larger_accum 
+--------------------
+ {"(5,6)","(7,8)"}
 (1 row)
 
 -- another kind of polymorphic aggregate
diff --git a/src/test/regress/sql/polymorphism.sql b/src/test/regress/sql/polymorphism.sql
index e5222f1f81..15b8dfc6ce 100644
--- a/src/test/regress/sql/polymorphism.sql
+++ b/src/test/regress/sql/polymorphism.sql
@@ -498,17 +498,17 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl;
 
 -- another sort of polymorphic aggregate
 
-CREATE AGGREGATE array_cat_accum (anyarray)
+CREATE AGGREGATE array_larger_accum (anyarray)
 (
-sfunc = array_cat,
+sfunc = array_larger,
 stype = anyarray,
 initcond = '{}'
 );
 
-SELECT array_cat_accum(i)
+SELECT array_larger_accum(i)
 FROM (VALUES (ARRAY[1,2]), (ARRAY[3,4])) as t(i);
 
-SELECT array_cat_accum(i)
+SELECT array_larger_accum(i)
 FROM (VALUES (ARRAY[row(1,2),row(3,4)]), (ARRAY[row(5,6),row(7,8)])) as t(i);
 
 -- another kind of polymorphic aggregate


Re: upcoming API changes for LLVM 12

2020-11-08 Thread Tom Lane
Andres Freund  writes:
> Yea, I'll try to do that in the next few days (was planning to last week,
> but due to a hand injury I was typing one-handed last week - makes it
> pretty annoying to clean up code. But I just started being able to at
> least use my left thumb again...).

Ouch.  Get well soon, and don't overstress your hand --- that's a
good recipe for long-term problems.

regards, tom lane




Temporary tables versus wraparound... again

2020-11-08 Thread Greg Stark
We had an outage caused by transaction wraparound. And yes, one of the
first things I did on this site was check that we didn't have any
databases that were in danger of wraparound.

However since then we added a monitoring job that used a temporary
table with ON COMMIT DELETE ROWS. Since it was a simple monitoring job
it stayed connected to the database and used this small temporary
table for a very long period of time.

The temporary table never got vacuumed by autovacuum and never by the
monitoring job (since it was being truncated on every transaction, why
would it need to be vacuumed...).

We've been around this bush before. Tom added orphaned table
protection to autovacuum precisely because temporary tables can cause
the datfrozenxid to get held back indefinitely. Then Michael Paquier
and Tsunakawa Takayuki both found it worth making this more
aggressive.

But none of that helped as the temporary schema was still in use so
they were not considered "orphaned" temp tables at all.

I think we need to add some warnings to autovacuum when it detects
*non* orphaned temporary tables that are older than the freeze
threshold.

However in the case of ON COMMIT DELETE ROWS we can do better. Why not
just reset the relfrozenxid and other stats as if the table was
freshly created when it's truncated?

I put together this quick patch to check the idea and it seems to
integrate fine in the code. I'm not sure about a few points but I
don't think they're showstoppers.

1) Should we update relpages and reltuples. I think so but an argument
could be made that people might be depending on running analyze once
when the data is loaded and then not having to run analyze on every
data load.

2) adding the dependency on heapam.h to heap.c makes sense because of
heap_inplace_update bt it may be a bit annoying because I suspect
that's a useful sanity check that the tableam stuff hasn't been
bypassed

3) I added a check to the regression tests but I'm not sure it's a
good idea to actually commit this. It could fail if there's a parallel
transaction going on and even moving the test to the serial schedule
might not guarantee that never happens due to autovacuum running
analyze?

I didn't actually add the warning to autovacuum yet.

-- 
greg
From 76eb00c43fba2a293dc4a079307e675e0eeaff06 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Sun, 8 Nov 2020 11:54:50 -0500
Subject: [PATCH] update relfrozenxmin when truncating temp tables

---
 src/backend/catalog/heap.c | 45 ++
 src/test/regress/expected/temp.out | 21 ++
 src/test/regress/parallel_schedule |  9 +---
 src/test/regress/sql/temp.sql  | 19 
 4 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4cd7d76..ffe36bb 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -30,6 +30,7 @@
 #include "postgres.h"
 
 #include "access/genam.h"
+#include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/relation.h"
@@ -3277,6 +3278,48 @@ RelationTruncateIndexes(Relation heapRelation)
 }
 
 /*
+ * Reset the relfrozenxid and other stats to the same values used when
+ * creating tables. This is used after non-transactional truncation.
+ *
+ * This reduces the need for long-running programs to vacuum their own
+ * temporary tables (since they're not covered by autovacuum) at least in the
+ * case where they're ON COMMIT DELETE ROWS.
+ *
+ * see also src/backend/commands/vacuum.c vac_update_relstats()
+ * also see AddNewRelationTuple() above
+ */
+
+static void
+ResetVacStats(Relation rel)
+{
+	HeapTuple	ctup;
+	Form_pg_class pgcform;
+	Relation classRel;
+
+	/* Fetch a copy of the tuple to scribble on */
+	classRel = table_open(RelationRelationId, RowExclusiveLock);
+	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(RelationGetRelid(rel)));
+	if (!HeapTupleIsValid(ctup))
+		elog(ERROR, "pg_class entry for relid %u vanished during truncation",
+			 RelationGetRelid(rel));
+	pgcform = (Form_pg_class) GETSTRUCT(ctup);
+
+	/*
+	 * Reset relfrozenxid, relminmxid, and planner stats to "new table" values
+	 */
+
+	pgcform->relpages = 0;
+	pgcform->reltuples = 0;
+	pgcform->relallvisible = 0;
+	pgcform->relfrozenxid = RecentXmin;
+	pgcform->relminmxid = GetOldestMultiXactId();
+
+	heap_inplace_update(classRel, ctup);
+
+	table_close(classRel, RowExclusiveLock);
+}
+
+/*
  *	 heap_truncate
  *
  *	 This routine deletes all data within all the specified relations.
@@ -3340,6 +3383,7 @@ heap_truncate_one_rel(Relation rel)
 
 	/* Truncate the underlying relation */
 	table_relation_nontransactional_truncate(rel);
+	ResetVacStats(rel);
 
 	/* If the relation has indexes, truncate the indexes too */
 	RelationTruncateIndexes(rel);
@@ -3351,6 +3395,7 @@ heap_truncate_one_rel(Relation rel)
 		Relation	toastrel = table_open(toastrelid, AccessExclusiveLock);
 
 		table_relation_nontransactional_truncate(toastrel);
+		ResetVacStats(toastrel);

Re: upcoming API changes for LLVM 12

2020-11-08 Thread Andres Freund
Hi,

On 2020-11-08 17:35:20 -0500, Tom Lane wrote:
> I suppose this is related to what you are talking about here.

Yes.


> If so, could we prioritize getting that committed?  It's annoying
> to have the buildfarm failures page so full of this one issue.

Yea, I'll try to do that in the next few days (I was planning to last
week, but due to a hand injury I was typing one-handed, which makes it
pretty annoying to clean up code. But I just started being able to at
least use my left thumb again...).

Greetings,

Andres Freund




Re: upcoming API changes for LLVM 12

2020-11-08 Thread Tom Lane
Andres Freund  writes:
> On 2020-11-02 10:28:33 -0800, Jesse Zhang wrote:
>> On Thu, Oct 15, 2020 at 6:12 PM Andres Freund  wrote:
>>> There will be a breaking API change for JIT related API in LLVM
>>> 12.

seawasp, which runs some bleeding-edge version of clang, has been falling
over for the last couple of weeks:

/home/fabien/clgtk/bin/clang -Wno-ignored-attributes -fno-strict-aliasing 
-fwrapv -O2  -D__STDC_LIMIT_MACROS -D__STDC_FORMAT_MACROS 
-D__STDC_CONSTANT_MACROS -D_DEBUG -D_GNU_SOURCE -I/home/fabien/clgtk/include  
-I../../../../src/include  -D_GNU_SOURCE -I/usr/include/libxml2  -flto=thin 
-emit-llvm -c -o llvmjit_types.bc llvmjit_types.c
llvmjit.c:21:10: fatal error: 'llvm-c/OrcBindings.h' file not found
#include <llvm-c/OrcBindings.h>
         ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.

I suppose this is related to what you are talking about here.
If so, could we prioritize getting that committed?  It's annoying
to have the buildfarm failures page so full of this one issue.

regards, tom lane




2020-11-12 Release Announcement Draft

2020-11-08 Thread Jonathan S. Katz
Hi,

Attached is a draft of the release announcement for the upcoming
2020-11-12 update release.

Corrections and feedback welcome, so long as it is submitted by
2020-11-11 AoE[1].

Thanks!

Jonathan

[1] https://en.wikipedia.org/wiki/Anywhere_on_Earth
The PostgreSQL Global Development Group has released an update to all supported
versions of our database system, including 13.1, 12.5, 11.10, 10.15, 9.6.20, and
9.5.24. This release fixes over 65 bugs reported over the last three months.

Please plan to update at your earliest convenience.

Additionally, this is the second-to-last release of PostgreSQL 9.5. If you are
running PostgreSQL 9.5 in a production environment, we suggest that you make
plans to upgrade.

Bug Fixes and Improvements
--

This update fixes over 65 bugs that were reported over the last several
months. Some of these issues affect only version 13, while others may
also apply to other supported versions.

Some of these fixes include:

* Fix a breakage in the replication protocol by ensuring that two "command
completion" events are expected for `START_REPLICATION`.
* Several fixes for potential data loss issues, though the circumstances where
these issues manifest are rare.
* Fix `ALTER ROLE` usage for users with the `BYPASSRLS` permission.
* `ALTER TABLE ONLY ... DROP EXPRESSION` is not allowed on partitioned tables
when there are child tables.
* Ensure that `ALTER TABLE ONLY ... ENABLE/DISABLE TRIGGER` does not apply to
child tables.
* Fix for `ALTER TABLE ... SET NOT NULL` on partitioned tables to avoid a
potential deadlock in parallel `pg_restore`.
* Fix handling of expressions in `CREATE TABLE LIKE` with inheritance.
* `DROP INDEX CONCURRENTLY` is not allowed on partitioned tables.
* Allow `LOCK TABLE` to succeed on a self-referential view instead of throwing
an error.
* Several fixes around statistics collection and progress reporting for
`REINDEX CONCURRENTLY`.
* Ensure that `GENERATED` columns are updated when any columns they depend on
are updated via a rule or an updatable view.
* Support hash partitioning using text array columns as a partition key.
* Allow the jsonpath `.datetime()` method to accept ISO 8601-format timestamps.
* During a "smart" shutdown, ensure background processes are not terminated
until all foreground client sessions are completed, fixing an issue that broke
the processing of parallel queries.
* Several fixes for the query planner and optimizer.
* Ensure that data is de-toasted before being inserted into a BRIN index. This
could manifest itself with errors like "missing chunk number 0 for toast value
NNN". If you have seen a similar error in an existing BRIN index, you should be
able to correct it by using `REINDEX` on the index.
* Additional fixes for B-tree, BRIN, GiST, and GIN indexes.
* Fix the output of `EXPLAIN` to have the correct XML tag nesting for
incremental sort plans.
* Several fixes for memory leaks, including ones involving RLS policies, using
`CALL` with PL/pgSQL, `SIGHUP` processing a configuration parameter that cannot
be applied without a restart, and an edge-case for index lookup for a partition.
* libpq can now support arbitrary-length lines in the .pgpass file.
* On Windows, `psql` now reads the output of a backtick command in text mode,
not binary mode, so it can now properly handle newlines.
* Fix how `pg_dump`, `pg_restore`, `clusterdb`, `reindexdb`, and `vacuumdb` use
complex connection-string parameters.
* When the `\connect` command of `psql` reuses connection parameters, ensure
that all non-overridden parameters from a previous connection string are also
re-used.
* Ensure that `pg_dump` collects per-column information about extension
configuration tables, avoiding crashes when specifying `--inserts`.
* Ensure that parallel `pg_restore` processes foreign keys referencing
partitioned tables in the correct order.
* Several fixes for `contrib/pgcrypto`, including a memory leak fix.

This update also contains tzdata release 2020d for DST law changes in Fiji,
Morocco, Palestine, the Canadian Yukon, Macquarie Island, and Casey Station
(Antarctica); plus historical corrections for France, Hungary, Monaco, and
Palestine.

For the full list of changes available, please review the
[release notes](https://www.postgresql.org/docs/current/release.html).

PostgreSQL 9.5 EOL Notice
-

PostgreSQL 9.5 will stop receiving fixes on February 11, 2021. If you are
running PostgreSQL 9.5 in a production environment, we suggest that you make
plans to upgrade to a newer, supported version of PostgreSQL. Please see our
[versioning policy](https://www.postgresql.org/support/versioning/) for more
information.

Updating


All PostgreSQL update releases are cumulative. As with other minor releases,
users are not required to dump and reload their database or use `pg_upgrade` in
order to apply this update release; you may simply shut down PostgreSQL and
update its binaries.

Users who have skipped one or mo

Re: list of extended statistics on psql

2020-11-08 Thread Tomas Vondra
Hi,

I took a look at this today, and I think the code is ready, but the
regression test needs a bit more work:

1) It's probably better to use somewhat more specific names for the
objects, especially when created in the public schema. It decreases the
chance of a collision with other tests (which may be hard to notice
because of timing). I suggest we use a "stts_" prefix or something like
that, per the attached 0002 patch and the sketch below. (0001 is just
the v7 patch)

2) The test is failing intermittently because it's executed in parallel
with stats_ext test, which is also creating extended statistics. So
depending on the timing the \dX may list some of the stats_ext stuff.
I'm not sure what to do about this. Either this part needs to be moved
to a separate test executed in a different group, or maybe we should
simply move it to stats_ext.
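
For instance (hypothetical object names), the naming convention from
point 1 would look like this:

    CREATE TABLE stts_t1 (a int, b int);
    CREATE STATISTICS stts_s1 (dependencies) ON a, b FROM stts_t1;
    DROP TABLE stts_t1;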


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 1a7e99c931c0b3ed5560abd3ff656412f8b353f1 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 8 Nov 2020 21:53:54 +0100
Subject: [PATCH 1/2] v7

---
 doc/src/sgml/ref/psql-ref.sgml |  14 
 src/bin/psql/command.c |   3 +
 src/bin/psql/describe.c| 100 +
 src/bin/psql/describe.h|   3 +
 src/bin/psql/help.c|   1 +
 src/bin/psql/tab-complete.c|   4 +-
 src/test/regress/expected/psql.out |  77 ++
 src/test/regress/sql/psql.sql  |  29 +
 8 files changed, 230 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 221a967bfe..fd860776af 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1918,6 +1918,20 @@ testdb=>
 
 
   
+  
+  
+\dX [ pattern ]
+
+
+Lists extended statistics.
+If pattern
+is specified, only those extended statistics whose names match the pattern
+are listed.
If + is appended to the command name, each extended statistics object
is listed with its size.
+
+
+  
 
   
 \dy[+] [ pattern ]
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index c7a83d5dfc..c6f1653cb7 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -930,6 +930,9 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd)
 else
 	success = listExtensions(pattern);
 break;
+			case 'X':			/* Extended Statistics */
+success = listExtendedStats(pattern, show_verbose);
+break;
 			case 'y':			/* Event Triggers */
 success = listEventTriggers(pattern, show_verbose);
 break;
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 07d640021c..1807324542 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4397,6 +4397,106 @@ listEventTriggers(const char *pattern, bool verbose)
 	return true;
 }
 
+/*
+ * \dX
+ *
+ * Briefly describes extended statistics.
+ */
+bool
+listExtendedStats(const char *pattern, bool verbose)
+{
+	PQExpBufferData buf;
+	PGresult   *res;
+	printQueryOpt myopt = pset.popt;
+
+	if (pset.sversion < 10)
+	{
+		char		sverbuf[32];
+
+		pg_log_error("The server (version %s) does not support extended statistics.",
+	 formatPGVersionNumber(pset.sversion, false,
+		   sverbuf, sizeof(sverbuf)));
+		return true;
+	}
+
+	initPQExpBuffer(&buf);
+	printfPQExpBuffer(&buf,
+	  "SELECT \n"
+	  "es.stxnamespace::pg_catalog.regnamespace::text AS \"%s\", \n"
+	  "es.stxname AS \"%s\", \n"
+	  "pg_catalog.format('%%s FROM %%s', \n"
+	  "  (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ') \n"
+	  "   FROM pg_catalog.unnest(es.stxkeys) s(attnum) \n"
+	  "   JOIN pg_catalog.pg_attribute a \n"
+	  "   ON (es.stxrelid = a.attrelid \n"
+	  "   AND a.attnum = s.attnum \n"
+	  "   AND NOT a.attisdropped)), \n"
+	  "es.stxrelid::regclass) AS \"%s\", \n"
+	  "CASE WHEN esd.stxdndistinct IS NOT NULL THEN 'built' \n"
+	  " WHEN 'd' = any(es.stxkind) THEN 'defined' \n"
+	  "END AS \"%s\", \n"
+	  "CASE WHEN esd.stxddependencies IS NOT NULL THEN 'built' \n"
+	  " WHEN 'f' = any(es.stxkind) THEN 'defined' \n"
+	  "END AS \"%s\", \n"
+	  "CASE WHEN esd.stxdmcv IS NOT NULL THEN 'built' \n"
+	  " WHEN 'm' = any(es.stxkind) THEN 'defined' \n"
+	  "END AS \"%s\"",
+	  gettext_noop("Schema"),
+	  gettext_noop("Name"),
+	  gettext_noop("Definition"),
+	  gettext_noop("N_distinct"),
+	  gettext_noop("Dependencies"),
+	  gettext_noop("Mcv"));
+
+	if (verbose)
+	{
+		appendPQExpBuffer(&buf,
+		  ",\nCASE WHEN esd.stxdndistinct IS NOT NULL THEN \n"
+		  "  pg_catalog.pg_size_pretty(pg_catalog.length(stxdndistinct)::bigint) \n"
+		  " WHEN 'd' = any(stxkind) THEN '0 bytes' \n"
+		  "END AS \"%s\", \n"
+		  "CASE

Re: [BUG]: segfault during update

2020-11-08 Thread Tom Lane
I wrote:
> Yeah, this is sufficient reason why we must use the more invasive
> patch on those branches.  What I'm wondering now is if there's a
> way to break even-older branches based on failure to handle dropped
> columns here.

After tracing through the example in v11, I see why those branches
are not broken: when ExecBRUpdateTriggers decides to return the
trigger-returned tuple, it sticks it into a target slot like this:

	/*
	 * Return the modified tuple using the es_trig_tuple_slot.  We assume
	 * the tuple was allocated in per-tuple memory context, and therefore
	 * will go away by itself. The tuple table slot should not try to
	 * clear it.
	 */
	TupleTableSlot *newslot = estate->es_trig_tuple_slot;
	TupleDesc	tupdesc = RelationGetDescr(relinfo->ri_RelationDesc);

	if (newslot->tts_tupleDescriptor != tupdesc)
		ExecSetSlotDescriptor(newslot, tupdesc);
	ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);

So the slot that ExecConstraints et al will be working with contains
the relation's actual tuple descriptor, not the approximate descr
obtained by looking at the plan tlist.

This logic is entirely gone in v12, which confirms my instinct that
there was something about Andres' slot-manipulation changes that
broke this scenario.  In v12 we end up using the junkfilter's output
slot, which does not have a sufficiently accurate tupdesc to deal with
an on-disk tuple rather than one constructed by the executor.

So I'll go see about back-patching 20d3fe900.

regards, tom lane




Re: [BUG]: segfault during update

2020-11-08 Thread Tom Lane
"Drouvot, Bertrand"  writes:
> Here is a scenario that produces segfault during update (on version 12 
> and 13):

Hm.  So the point about failing to reproduce dropped columns is more
critical than I thought.  I wonder how come we get away with that before
v12?

> So, we would need to back port this commit on 12 and 13.

Yeah, this is sufficient reason why we must use the more invasive
patch on those branches.  What I'm wondering now is if there's a
way to break even-older branches based on failure to handle dropped
columns here.

regards, tom lane




Re: Yet another (minor) fix in BRIN

2020-11-08 Thread Tomas Vondra
On 11/8/20 12:34 PM, Michael Paquier wrote:
> On Sat, Nov 07, 2020 at 10:45:26PM -0500, Tom Lane wrote:
>> The weekend before stable-branch releases is probably not the best
>> time to be pushing "minor" fixes into those branches.  I got my
>> fingers burned today, and so did Peter.  Don't follow our example ;-)
> 
> You could just apply your stuff after the version is tagged (not
> stamped, as there could be urgent bug fixes between the stamp time and
> the tag time, like packaging issues).

Yeah, that's what I was planning to do - I was not suggesting I'll push
this right away. Or at least I did not mean to. Sorry if that was not
quite clear.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgbench stopped supporting large number of client connections on Windows

2020-11-08 Thread Marina Polyakova

On 2020-11-07 01:01, Fabien COELHO wrote:

Hello Marina,


Hello, Fabien!

Thank you for your comments!

While trying to test a patch that adds a synchronization barrier in 
pgbench [1] on Windows,


Thanks for trying that, I do not have a windows setup for testing, and
the sync code I wrote for Windows is basically blind coding:-(


FYI:

1) It looks like pgbench will no longer support Windows XP due to the 
function DeleteSynchronizationBarrier. From 
https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletesynchronizationbarrier 
:


Minimum supported client: Windows 8 [desktop apps only]
Minimum supported server: Windows Server 2012 [desktop apps only]

On Windows Server 2008 R2 (MSVC 2013), the 6th version of the patch [1]
compiled without (new) warnings, but when running pgbench I got the
following error:


The procedure entry point DeleteSynchronizationBarrier could not be 
located in the dynamic link library KERNEL32.dll.


2) On Windows Server 2019 (MSVC 2019), the 6th version of the patch [1]
with fix_max_client_conn_on_Windows.patch compiled without (new)
warnings. I made a few runs (-M prepared -c 100 -j 10 -T 10 -P1 -S) with
and without your patches. On Linux (-M prepared -c 1000 -j 500 -T 10 -P1
-S) your patches fix the problems with progress reports as in [2], but on
Windows I did not notice such changes; see the attached
pgbench_runs_linux_vs_windows.zip.


Almost the same thing happens with reindexdb and vacuumdb (built on
commit [3]):


Windows fd implementation is somehow buggy because it does not return
the smallest number available, and then with the assumption that
select uses a dense array indexed with them (true on linux, less so on
Windows which probably uses a sparse array), so that the number gets
over the limit, even if less are actually used, hence the catch, as
you noted.


I agree with you. It looks like on Windows the fd_set structure just
contains the sockets used by this application, and the macro FD_SETSIZE
is only used here.


From 
https://docs.microsoft.com/en-us/windows/win32/api/winsock/ns-winsock-fd_set 
:


typedef struct fd_set {
  u_int  fd_count;
  SOCKET fd_array[FD_SETSIZE];
} fd_set, FD_SET, *PFD_SET, *LPFD_SET;

From 
https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2 
:


The maximum number of sockets that a Windows Sockets application can use 
is not affected by the manifest constant FD_SETSIZE. This value defined 
in the Winsock2.h header file is used in constructing the FD_SET 
structures used with select function.


IIUC the checks below are not correct on Windows, since on this system 
sockets can have values equal to or greater than FD_SETSIZE (see 
Windows documentation [4] and pgbench debug output in attached 
pgbench_debug.txt).


Okay.

But then, how may one detect that there are too many fds in the set?

I think that an earlier version of the code needed to make assumptions
about the internal implementation of Windows (there is a counter
somewhere in the Windows fd_set struct), which was rejected because it
was breaking the interface. Now your patch is basically resurrecting
that.


I tried to keep the behaviour "we check if the socket value can be used 
in select() at runtime", but now I will also read that thread...



Why not, if there is no other solution, but this is quite depressing,
and because it goes behind the documented interface, it would break if
Windows changed its internals for some reason :-(


It looks like, if the internals of the fd_set structure are ever
changed, we will also have problems with the function pgwin32_select
from src/backend/port/win32/socket.c, because it uses fd_set.fd_count
too?..


(I'm writing responses to the rest of your comments but it takes 
time...)


[1] 
https://www.postgresql.org/message-id/alpine.DEB.2.22.394.2011021726390.989361%40pseudo
[2] 
https://www.postgresql.org/message-id/20200227185129.hikscyenomnlrord%40alap3.anarazel.de


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-08 Thread Andy Fan
On Mon, Nov 2, 2020 at 3:44 PM David Rowley  wrote:

> On Tue, 20 Oct 2020 at 22:30, David Rowley  wrote:
> >
> > So far benchmarking shows there's still a regression from the v8
> > version of the patch. This is using count(*). An earlier test [1] did
> > show speedups when we needed to deform tuples returned by the nested
> > loop node. I've not yet repeated that test again. I was disappointed
> > to see v9 slower than v8 after having spent about 3 days rewriting the
> > patch
>
> I did some further tests this time with some tuple deforming.  Again,
> it does seem that v9 is slower than v8.
>

I ran your test case on v8 and v9, and I can reproduce a stable
difference between them.

v8:
statement latencies in milliseconds:
  1603.611  select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;

v9:
statement latencies in milliseconds:
  1772.287  select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;

Then I ran perf on the two versions. Is it possible that you called
tts_minimal_clear twice in the v9 version? Both ExecClearTuple and
ExecStoreMinimalTuple call tts_minimal_clear on the same slot.

With the following changes:

diff --git a/src/backend/executor/execMRUTupleCache.c
b/src/backend/executor/execMRUTupleCache.c
index 3553dc26cb..b82d8e98b8 100644
--- a/src/backend/executor/execMRUTupleCache.c
+++ b/src/backend/executor/execMRUTupleCache.c
@@ -203,10 +203,9 @@ prepare_probe_slot(MRUTupleCache *mrucache, MRUCacheKey *key)
 	TupleTableSlot *tslot = mrucache->tableslot;
 	int			numKeys = mrucache->nkeys;
 
-	ExecClearTuple(pslot);
-
 	if (key == NULL)
 	{
+		ExecClearTuple(pslot);
 		/* Set the probeslot's values based on the current parameter values */
 		for (int i = 0; i < numKeys; i++)
 			pslot->tts_values[i] = ExecEvalExpr(mrucache->param_exprs[i],
@@ -641,7 +640,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
 		{
 			mrucache->state = MRUCACHE_FETCH_NEXT_TUPLE;
 
-			ExecClearTuple(mrucache->cachefoundslot);
+			// ExecClearTuple(mrucache->cachefoundslot);
 			slot = mrucache->cachefoundslot;
 			ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
 			return slot;
@@ -740,7 +739,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
 			return NULL;
 		}
 
-		ExecClearTuple(mrucache->cachefoundslot);
+		// ExecClearTuple(mrucache->cachefoundslot);
 		slot = mrucache->cachefoundslot;
 		ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
 		return slot;


v9 has the following result:
  1608.048  select count(*) from hundredk hk inner join lookup l on
hk.thousand = l.a;



> Graphs attached
>
> Looking at profiles, I don't really see any obvious reason as to why
> this is.  I'm very much inclined to just pursue the v8 patch (separate
> Result Cache node) and just drop the v9 idea altogether.
>
> David
>


-- 
Best Regards
Andy Fan


Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-11-08 Thread Andy Fan
On Fri, Nov 6, 2020 at 6:13 AM David Rowley  wrote:

> On Mon, 2 Nov 2020 at 20:43, David Rowley  wrote:
> >
> > On Tue, 20 Oct 2020 at 22:30, David Rowley  wrote:
> > I did some further tests this time with some tuple deforming.  Again,
> > it does seem that v9 is slower than v8.
> >
> > Graphs attached
> >
> > Looking at profiles, I don't really see any obvious reason as to why
> > this is.  I'm very much inclined to just pursue the v8 patch (separate
> > Result Cache node) and just drop the v9 idea altogether.
>
> Nobody raised any objections, so I'll start taking a more serious look
> at the v8 version (the patch with the separate Result Cache node).
>
> One thing that I had planned to come back to about now is the name
> "Result Cache".  I admit to not thinking for too long on the best name
> and always thought it was something to come back to later when there's
> some actual code to debate a better name for. "Result Cache" was
> always a bit of a placeholder name.
>
> Some other names that I'd thought of were:
>
> "MRU Hash"
> "MRU Cache"
> "Parameterized Tuple Cache" (bit long)
> "Parameterized Cache"
> "Parameterized MRU Cache"
>
>
I think "Tuple Cache" would be OK which means it is a cache for tuples.
Telling MRU/LRU would be too internal for an end user and "Parameterized"
looks redundant given that we have said "Cache Key" just below the node
name.

Just my $0.01.

-- 
Best Regards
Andy Fan


Re: Yet another (minor) fix in BRIN

2020-11-08 Thread Michael Paquier
On Sat, Nov 07, 2020 at 10:45:26PM -0500, Tom Lane wrote:
> The weekend before stable-branch releases is probably not the best
> time to be pushing "minor" fixes into those branches.  I got my
> fingers burned today, and so did Peter.  Don't follow our example ;-)

You could just apply your stuff after the version is tagged (not
stamped, as there could be urgent bug fixes between the stamp time and
the tag time, like packaging issues).
--
Michael


signature.asc
Description: PGP signature


Re: Collation versioning

2020-11-08 Thread Thomas Munro
On Fri, Oct 4, 2019 at 1:25 AM Thomas Munro  wrote:
> Ok.  Here's one like that.  Also, a WIP patch for FreeBSD.

Here's an updated patch for FreeBSD, which I'll sit on for a bit
longer.  It needs bleeding edge 13-CURRENT (due out Q1ish).
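
Once this is in, the recorded versions can be checked with a query like
the following (illustrative only, not part of the patch):

    SELECT collname, collprovider, collversion
      FROM pg_collation
     WHERE collprovider = 'c';
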
From b9cb5562457c214c48c0a6334b8ed3264f50e3d6 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sun, 8 Nov 2020 21:12:12 +1300
Subject: [PATCH] Add collation versions for FreeBSD.

On FreeBSD 13, use querylocale() to read the current version of libc
collations.
---
 doc/src/sgml/charset.sgml |  3 ++-
 src/backend/utils/adt/pg_locale.c | 20 
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 832a701523..e151987eff 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -973,7 +973,8 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
 Version information is available from the
 icu provider on all operating systems.  For the
 libc provider, versions are currently only available
-on systems using the GNU C library (most Linux systems) and Windows.
+on systems using the GNU C library (most Linux systems), FreeBSD and
+Windows.

 

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 1dfe343b79..b1a8eb9a4e 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1684,6 +1684,26 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 
 		/* Use the glibc version because we don't have anything better. */
 		collversion = pstrdup(gnu_get_libc_version());
+#elif defined(LC_VERSION_MASK)
+		locale_t	loc;
+
+		/* C[.encoding] and POSIX never change. */
+		if (strcmp("C", collcollate) == 0 ||
+			strncmp("C.", collcollate, 2) == 0 ||
+			strcmp("POSIX", collcollate) == 0)
+			return NULL;
+
+		/* Look up FreeBSD collation version. */
+		loc = newlocale(LC_COLLATE, collcollate, NULL);
+		if (loc)
+		{
+			collversion =
+pstrdup(querylocale(LC_COLLATE_MASK | LC_VERSION_MASK, loc));
+			freelocale(loc);
+		}
+		else
+			ereport(ERROR,
+	(errmsg("could not load locale \"%s\"", collcollate)));
 #elif defined(WIN32) && _WIN32_WINNT >= 0x0600
 		/*
 		 * If we are targeting Windows Vista and above, we can ask for a name
-- 
2.28.0



Re: Allow "snapshot too old" error, to prevent bloat

2020-11-08 Thread Andy Fan
On Sun, Nov 8, 2020 at 5:14 PM Tom Lane  wrote:

> Jim Nasby  writes:
> > On 2/15/15 10:36 AM, Tom Lane wrote:
> >> I wonder if we couldn't achieve largely the same positive effects
> through
> >> adding a simple transaction-level timeout option.
>
> > A common use-case is long-running reports hitting relatively stable data
> > in a database that also has tables with a high churn rate (ie: queue
> > tables). In those scenarios your only options right now are to suffer
> > huge amounts of bloat in the high-churn or not do your reporting. A
> > simple transaction timeout only "solves" this by denying you reporting
> > queries.
>
> Agreed, but Kevin's proposal has exactly the same problem only worse,
> because (a) the reporting transactions might or might not fail (per
> Murphy, they'll fail consistently only when you're on deadline), and
> (b) if they do fail, there's nothing you can do short of increasing the
> slack db-wide.
>
> > An idea that I've had on this would be some way to "lock down" the
> > tables that a long-running transaction could access. That would allow
> > vacuum to ignore any snapshots that transaction had for tables it wasn't
> > accessing. That's something that would be deterministic.
>
> There might be something in that, but again it's not much like this patch.
> The key point I think we're both making is that nondeterministic failures
> are bad,


Based on my experience, long transactions caused by a single
long-running query are not that common. A more common case I have met is
a user starting a transaction and then forgetting to close it, and I am
thinking about whether we can do something for that situation.

Since most users run at the Read Committed isolation level, the next
query after a completed one will use a fresh snapshot anyway, so there
is no need for such an idle transaction to hold back the oldest xmin.
Would it be correct to advance the oldest xmin in this case? If yes,
what would be the blockers in implementing this feature?
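
For what it's worth, such sessions are easy to spot with a query like
this illustrative one (backend_xmin is what holds back vacuum):

    SELECT pid, state, backend_xmin, xact_start
      FROM pg_stat_activity
     WHERE state = 'idle in transaction';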

-- 
Best Regards
Andy Fan


Re: pgbench stopped supporting large number of client connections on Windows

2020-11-08 Thread Fabien COELHO


Hello Tom,


Use ppoll, and start more threads but not too many?


Does ppoll exist on Windows?


Some g*gling suggests that the answer is no.

There was a prior thread on this topic, which seems to have drifted off 
into the sunset:


Indeed. I may have contributed to this dwindling by not adding a CF entry 
for this thread, so that there was no reminder anywhere.



https://www.postgresql.org/message-id/flat/BL0PR1901MB1985F219C46C61EDE7036C34ED8E0%40BL0PR1901MB1985.namprd19.prod.outlook.com


It seems that there is no simple good solution around the Windows wait
event implementations:

 - the timeout granularity seems to be milliseconds in everything I saw
 - 64 is an intrinsic limit, probably because of the underlying n²
implementations

Maybe we could provide a specific Windows implementation limited to 64
fds (which is not a bad thing bench-wise) but with a rounded-down
timeout, so that it could end up in an active spinning wait in some
cases, which is probably not a big issue, all things considered.


--
Fabien.