Re: warn_unused_results
On 2020-11-09 07:56, Michael Paquier wrote:
> This is accepted by clang, and MSVC has visibly an equivalent for that,
> as of VS 2012:
>
> #elif defined(_MSC_VER) && (_MSC_VER >= 1700)
> #define pg_nodiscard _Check_return_
>
> We don't care about the 1700 condition as we support only >= 1800 on
> HEAD, and in this case the addition of pg_nodiscard would be required
> on the definition and the declaration.  Should it be added?  It is much
> more invasive than the gcc/clang equivalent though..

AFAICT from the documentation, this only applies for special "analyze" runs, not as a normal compiler warning.  Do we have any support for analyze runs with MSVC?

> FWIW, I saw an extra case with parsePGArray() today.

AFAICT, that's more in the category of checking for error returns, which is similar to the "fortify" options that some environments have for checking the return of fwrite() etc.

> I am not sure about the addition of repalloc(), as it is quite obvious
> that one has to use its result.  Lists are fine, these are proper to PG
> internals and beginners tend to be easily confused in the way to use
> them.

realloc() is listed in the GCC documentation as the reason this option exists, and glibc tags its realloc() with this attribute, so doing the same for repalloc() seems sensible.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
Re: warn_unused_results
On Sat, Oct 17, 2020 at 08:57:51AM +0200, Peter Eisentraut wrote:
> Forgetting to assign the return value of list APIs such as lappend() is a
> perennial favorite.  The compiler can help point out such mistakes.  GCC has
> an attribute warn_unused_results.  Also C++ has standardized this under the
> name "nodiscard", and C has a proposal to do the same [0].  In my patch I
> call the symbol pg_nodiscard, so that perhaps in a distant future one only
> has to do s/pg_nodiscard/nodiscard/ or something similar.  Also, the name is
> short enough that it doesn't mess up the formatting of function declarations
> too much.

I have seen as well this stuff being a source of confusion in the past.

> I have added pg_nodiscard decorations to all the list functions where I
> found it sensible, as well as repalloc() for good measure, since realloc()
> is usually mentioned as an example where this function attribute is useful.

+#ifdef __GNUC__
+#define pg_nodiscard __attribute__((warn_unused_result))
+#else
+#define pg_nodiscard
+#endif

This is accepted by clang, and MSVC has visibly an equivalent for that, as of VS 2012:

#elif defined(_MSC_VER) && (_MSC_VER >= 1700)
#define pg_nodiscard _Check_return_

We don't care about the 1700 condition as we support only >= 1800 on HEAD, and in this case the addition of pg_nodiscard would be required on the definition and the declaration.  Should it be added?  It is much more invasive than the gcc/clang equivalent though..

> I have found two places in the existing code where this creates warnings.
> Both places are correct as is, but make assumptions about the internals of
> the list APIs and it seemed better just to fix the warning than to write a
> treatise about why it's correct as is.

FWIW, I saw an extra case with parsePGArray() today.

I am not sure about the addition of repalloc(), as it is quite obvious that one has to use its result.  Lists are fine, these are proper to PG internals and beginners tend to be easily confused in the way to use them.

-- 
Michael
Re: [PATCH] Add features to pg_stat_statements
However, let me confirm the following. Is this information really useful? If there is no valid use case for this, I'd like to drop it. Thought?

I thought it would be easy for users to see at a glance whether the parameters need adjusting: if the last modified date and time is old, there is no need to adjust at all, and if the last modified date and time is recent, the parameters need to be adjusted. What do you think?

Even without the last deallocation timestamp, you can presume when the deallocation of entries happened by periodically monitoring the total number of deallocations and seeing their history. Or IMO it's better to log whenever the deallocation happens, as proposed upthread. Then you can easily check the history of occurrences of deallocations from the log file. Regards,

+1. I think you should output the deallocation history to the log as well.

In light of the above, I've posted a patch that reflects the points made. Regards,

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 081f997d70..3ec627b956 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,7 @@ OBJS = \
 	pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql \
+DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \
 	pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
 	pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
 	pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index e0edb134f3..85e78c7f46 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -859,4 +859,19 @@ SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE
  SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 1 | 0 | 0
 (6 rows)
 
+--
+-- Checking the execution of the pg_stat_statements_info view
+--
+SELECT pg_stat_statements_reset(0,0,0);
+ pg_stat_statements_reset
+--------------------------
+ 
+(1 row)
+
+SELECT * FROM pg_stat_statements_info;
+ dealloc
+---------
+       0
+(1 row)
+
 DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql b/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql
new file mode 100644
index 00..5c491f242a
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql
@@ -0,0 +1,17 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.9'" to load this file. \quit
+
+--- Define pg_stat_statements_info
+CREATE FUNCTION pg_stat_statements_info (
+    OUT dealloc bigint
+)
+RETURNS bigint
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+CREATE VIEW pg_stat_statements_info AS
+  SELECT * FROM pg_stat_statements_info();
+
+GRANT SELECT ON pg_stat_statements_info TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 1eac9edaee..65cf4fb0a7 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -193,6 +193,15 @@ typedef struct Counters
 	uint64		wal_bytes;		/* total amount of WAL bytes generated */
 } Counters;
 
+/*
+ * Counter for pg_stat_statements_info
+ */
+typedef struct pgssInfoCounters
+{
+	int64		dealloc;		/* # of deallocation */
+	slock_t		mutex;			/* protects the counters only */
+} pgssInfoCounters;
+
 /*
  * Statistics per statement
  *
@@ -279,6 +288,7 @@ static ProcessUtility_hook_type prev_ProcessUtility = NULL;
 /* Links to shared memory state */
 static pgssSharedState *pgss = NULL;
 static HTAB *pgss_hash = NULL;
+static pgssInfoCounters *pgss_info = NULL;
 
 /* GUC variables */
 
@@ -327,6 +337,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_8);
 PG_FUNCTION_INFO_V1(pg_stat_statements);
+PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
 static void pgss_shmem_startup(void);
 static void pgss_shmem_shutdown(int code, Datum arg);
@@ -380,6 +391,8 @@ static char *generate_normalized_query(pgssJumbleState *jstate, const char *quer
 static void fill_in_constant_lengths(pgssJumbleState *jstate, const char *query, int query_loc);
 static int	comp_location(const void *a, const void *b);
+static void pgss_info_update(void);
+static void pgss_info_reset(void);
 
@@ -518,6 +531,7 @@ static void
 pgss_shmem_startup(void)
 {
 	bool		found;
+	bool		found_info;
 	HASHCTL		info;
 	FILE	   *file = NULL;
 	FIL
RE: Disable WAL logging to speed up data loading
Hello, Stephen

On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost wrote:
> * Magnus Hagander (mag...@hagander.net) wrote:
> > On Mon, Nov 2, 2020 at 4:28 PM Robert Haas wrote:
> > > On Thu, Oct 29, 2020 at 4:00 PM Fujii Masao wrote:
> > > > Yes. What I meant was such a safe guard needs to be implemented.
> > > >
> > > > This may mean that if we want to recover the database from that
> > > > backup, we need to specify the recovery target so that the archive
> > > > recovery stops just before the WAL record indicating wal_level change.
> > >
> > > Yeah, I think we need these kinds of safeguards, for sure.
> > >
> > > I'm also concerned about the way that this proposed feature
> > > interacts with incremental backup capabilities that already exist in
> > > tools like pgBackRest, EDB's BART, pg_probackup, and future things
> > > we might want to introduce into core, along the lines of what I have
> > > previously proposed. Now, I think pgBackRest uses only timestamps
> > > and checksums, so it probably doesn't care, but some of the other
> > > solutions rely on WAL-scanning to gather a list of changed blocks. I
> > > guess there's no reason that they can't notice the wal_level being
> > > changed and do the right thing; they should probably have that kind
> > > of capability already. Still, it strikes me that it might be useful
> > > if we had a stronger mechanism.
>
> Checking the WAL level certainly seems critical to anything that's
> reading the WAL. We certainly do this already when running as a
> replica:
>
> ereport(WARNING,
>         (errmsg("WAL was generated with wal_level=minimal, data may be missing"),
>          errhint("This happens if you temporarily set wal_level=minimal without taking a new base backup.")));
>
> There's definitely a question about if a WARNING there is really
> sufficient or not, considering that you could end up with 'logged'
> tables on the replica that are missing data, but I'm not sure that
> inventing a new, independent, mechanism for checking WAL level changes
> makes sense.
>
> While pgbackrest backups of a primary wouldn't be impacted, it does
> support backing up from a replica (as do other backup systems, of
> course) and if data that's supposed to be on the replica isn't there
> because someone restarted PG with wal_level=minimal and then flipped it
> back to replica and got the replica to move past that part of the WAL
> by turning off hot standby, replaying, and then turning it back on,
> then the backup is going to also be missing that data.
>
> Perhaps that's where we need to have a stronger mechanism though- that
> is, if we hit the above WARNING, ever, set a flag somewhere that tools
> can check and which we could also check and throw further warnings
> about. In other words, every time this replica is started, we could
> check for this flag and throw the same warning above about how data may
> be missing, and we could have pg_basebackup flat out refuse to back up
> from a replica where this flag has been set (maybe with some kind of
> override mechanism... maybe not). After all, that's where the real
> issue here is, isn't it?
>
> > > I'm not exactly sure what that would look like, but suppose we had a
> > > feature where every time wal_level drops below replica, a counter
> > > gets incremented by 1, and that counter is saved in the control
> > > file. Or maybe when wal_level drops below minimal to none. Or maybe
> > > there are two counters. Anyway, the idea is that if you have a
> > > snapshot of the cluster at one time and a snapshot at another time,
> > > you can see whether anything scary has happened in the middle
> > > without needing all of the WAL in between.
> > >
> > > Maybe this is off-topic for this thread or not really needed, but
> > > I'm not sure. I don't think wal_level=none is a bad idea
> > > intrinsically, but I think it would be easy to implement it poorly
> > > and end up harming a lot of users. I have no problem with giving
> > > people a way to do dangerous things, but we should do our best to
> > > let people know how much danger they've incurred.
>
> I'm not sure that wal_level=none is really the right way to address
> this use-case. We already have unlogged tables and that's pretty clean
> and meets the "we want to load data fast without having to pay for WAL"
> use case. The argument here seems to be that to take advantage of
> unlogged tables requires the tools using PG to know how to issue a
> 'CREATE UNLOGGED TABLE' command instead of a 'CREATE TABLE' command.
> That doesn't seem like a huge leap, but we could make it easier by just
> adding a 'table_log_default' or such GUC that could be set on the data
> loading role to have all tables created by it be unlogged.

I'm afraid to say that in the case to setup all tables as unlogged, the user are forced to be under tension to back up *all* commands from application, in preparation for unexpected crash. This is because whenever the server crashes, the unlogged tab
Re: Temporary tables versus wraparound... again
On Mon, 9 Nov 2020 at 00:17, Noah Misch wrote:
>
> > 2) adding the dependency on heapam.h to heap.c makes sense because of
> > heap_inplace_update but it may be a bit annoying because I suspect
> > that's a useful sanity check that the tableam stuff hasn't been
> > bypassed
>
> That is not terrible. How plausible would it be to call vac_update_relstats()
> for this, instead of reimplementing part of it?

It didn't seem worth it to change its API to add boolean flags to skip setting some of the variables (I was originally only doing relfrozenxid and relminmxid). Now that I'm doing most of the variables, maybe it makes a bit more sense.

> > @@ -3340,6 +3383,7 @@ heap_truncate_one_rel(Relation rel)
> >
> > 	/* Truncate the underlying relation */
> > 	table_relation_nontransactional_truncate(rel);
> > +	ResetVacStats(rel);
>
> I didn't test, but I expect this will cause a stats reset for the second
> TRUNCATE here:
>
> CREATE TABLE t ();
> ...
> BEGIN;
> TRUNCATE t;
> TRUNCATE t;  -- inplace relfrozenxid reset
> ROLLBACK;  -- inplace reset survives
>
> Does that indeed happen?

Apparently not, see below. I have to say I was pretty puzzled by the actual behaviour, which is that the rollback actually does roll back the inplace update. But I *think* what is happening is that the first truncate does an MVCC update, so the inplace update happens only to the newly created tuple, which is never committed.

Thinking about this does worry me a bit. I wonder if inplace update is really safe outside of vacuum, where we know we're not in a transaction that can be rolled back. But IIRC doing a non-inplace update on pg_class for these columns breaks other things. I don't know if that's still true.

Also, in checking this question I realized I had missed 3d351d91. I should be initializing reltuples to -1 not 0.
postgres=# vacuum t;
VACUUM
postgres=# select relname,relpages,reltuples,relallvisible,relfrozenxid from pg_class where oid='t'::regclass;
 relname | relpages | reltuples | relallvisible | relfrozenxid
---------+----------+-----------+---------------+--------------
 t       |        9 |      2000 |             9 |        15557
(1 row)

postgres=# begin;
BEGIN
postgres=*# truncate t;
TRUNCATE TABLE
postgres=*# truncate t;
TRUNCATE TABLE
postgres=*# select relname,relpages,reltuples,relallvisible,relfrozenxid from pg_class where oid='t'::regclass;
 relname | relpages | reltuples | relallvisible | relfrozenxid
---------+----------+-----------+---------------+--------------
 t       |        0 |         0 |             0 |        15562
(1 row)

postgres=*# abort;
ROLLBACK
postgres=# select relname,relpages,reltuples,relallvisible,relfrozenxid from pg_class where oid='t'::regclass;
 relname | relpages | reltuples | relallvisible | relfrozenxid
---------+----------+-----------+---------------+--------------
 t       |        9 |      2000 |             9 |        15557
(1 row)

-- 
greg
Re: document pg_settings view doesn't display custom options
On 2020/11/07 0:42, Fujii Masao wrote:
> On 2020/10/31 2:06, John Naylor wrote:
> > On Fri, Oct 30, 2020 at 12:48 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > > John Naylor <john.nay...@enterprisedb.com> writes:
> > > > Okay, along those lines here's a patch using "this view" in a new
> > > > paragraph for simplicity.
> > >
> > > Basically OK with me, but ... It seems fairly weird to use a
> > > nonspecific reference first and then a specific one. That is, I'd
> > > expect to read "The pg_settings view ..." and then "This view ...",
> > > not the other way around. So we could put this para second, or put it
> > > first but make this para say "The pg_settings view ..." while the
> > > existing text gets reduced to "This view ...". Or just make them both
> > > say "This view ..." so we don't have to have this discussion again
> > > the next time somebody wants to add a para here.
> >
> > Okay, how's this?
>
> Looks good to me. Barring any objection, I will commit the patch.

Pushed. Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: abstract Unix-domain sockets
On Thu, Oct 22, 2020 at 09:03:49AM +0200, Peter Eisentraut wrote:
> On 2020-10-09 09:28, Peter Eisentraut wrote:
> > During the discussion on Unix-domain sockets on Windows, someone pointed
> > out[0] abstract Unix-domain sockets. This is a variant of the normal
> > Unix-domain sockets that don't use the file system but a separate
> > "abstract" namespace. At the user interface, such sockets are
> > represented by names starting with "@". I took a look at this and it
> > wasn't hard to get working, so here is a patch. It's supposed to be
> > supported on Linux and Windows right now, but I haven't tested on Windows.

Yeah, peeking at the Windows docs, what you are trying to do here should be supported (please note that I have not tested). One reference:
https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/

> > And then some extra patches for surrounding cleanup. During testing I
> > noticed that the bind() failure hint "Is another postmaster already
> > running ..." was shown in inappropriate situations, so I changed that to
> > only show for EADDRINUSE errors. (Maybe other error codes could be
> > appropriate, but I couldn't find any more.)
> >
> > And then looking for other uses of EADDRINUSE I found some dead
> > Windows-related code that can be cleaned up.
>
> This last piece has been committed.

+
+        A value that starts with @ specifies that a
+        Unix-domain socket in the abstract namespace should be created
+        (currently supported on Linux and Windows).  In that case, this value
+        does not specify a directory but a prefix from which
+        the actual socket name is computed in the same manner as for the
+        file-system namespace.  While the abstract socket name prefix can be
+        chosen freely, since it is not a file-system location, the convention
+        is to nonetheless use file-system-like values such as
+        @/tmp.
+

As abstract namespaces don't have permissions, anyone knowing the name of the path, which should be unique, can access the server. Do you think that the documentation should warn the user about that? This feature eases the management of socket paths while throwing away the security aspect of them.

When attempting to start a server that listens on the same port and uses the same abstract path, the second server started still shows a hint referring to a file that does not exist:

LOG:  could not bind Unix address "@tmp/.s.PGSQL.5432": Address already in use
HINT:  Is another postmaster already running on port 5432? If not, remove socket file "@tmp/.s.PGSQL.5432" and retry.

Instead of showing paths with at signs, wouldn't it be better to mention that this is an abstract socket address?

I am not sure that 0002 is an improvement. It would be more readable to move the part choosing which hint is appropriate into a first block that selects the hint string, rather than have the whole thing in a single elog() call.

-- 
Michael
Re: logical streaming of xacts via test_decoding is broken
On Mon, Nov 9, 2020 at 11:21 AM Dilip Kumar wrote:
>
> On Mon, Nov 9, 2020 at 11:04 AM Dilip Kumar wrote:
> >
> > On Mon, Nov 9, 2020 at 11:00 AM Amit Kapila wrote:
> > >
> > > Michael reported a BF failure [1] related to one of the logical
> > > streaming test cases and I've analyzed the issue. As responded on
> > > pgsql-committers [2], the issue here is that streaming transactions
> > > can be interleaved, and because we are maintaining
> > > xact_wrote_changes at the LogicalDecodingContext level, a later
> > > transaction can overwrite the flag for a previously streaming
> > > transaction. I think it is logical to have this flag at each
> > > transaction level (aka in ReorderBufferTxn); however, till now it
> > > was fine because the changes of each transaction were decoded in
> > > one shot, which will no longer be true. We can keep an
> > > output_plugin_private data pointer in ReorderBufferTxn which will be
> > > used by the test_decoding module to keep this and any other such
> > > flags in future. We need to set this flag in the begin_cb and
> > > stream_start_cb APIs and then reset/remove it in the
> > > stream_commit_cb, stream_abort_cb and stream_stop_cb APIs.
>
> So IIUC, we need to keep 'output_plugin_private' in
> LogicalDecodingContext as well as in ReorderBufferTxn, so the
> output_plugin_private in the ReorderBufferTxn will currently just keep
> one flag, xact_wrote_changes, and the remaining things will still be
> maintained in the output_plugin_private of the LogicalDecodingContext.
> Is my understanding correct?

Yes. But keep it as void * so that we can add more things later if required.

-- 
With Regards,
Amit Kapila.
Re: logical streaming of xacts via test_decoding is broken
On Mon, Nov 9, 2020 at 11:04 AM Dilip Kumar wrote:
>
> On Mon, Nov 9, 2020 at 11:00 AM Amit Kapila wrote:
> >
> > Michael reported a BF failure [1] related to one of the logical
> > streaming test cases and I've analyzed the issue. As responded on
> > pgsql-committers [2], the issue here is that streaming transactions
> > can be interleaved, and because we are maintaining xact_wrote_changes
> > at the LogicalDecodingContext level, a later transaction can overwrite
> > the flag for a previously streaming transaction. I think it is logical
> > to have this flag at each transaction level (aka in ReorderBufferTxn);
> > however, till now it was fine because the changes of each transaction
> > were decoded in one shot, which will no longer be true. We can keep an
> > output_plugin_private data pointer in ReorderBufferTxn which will be
> > used by the test_decoding module to keep this and any other such flags
> > in future. We need to set this flag in the begin_cb and
> > stream_start_cb APIs and then reset/remove it in the stream_commit_cb,
> > stream_abort_cb and stream_stop_cb APIs.

So IIUC, we need to keep 'output_plugin_private' in LogicalDecodingContext as well as in ReorderBufferTxn, so the output_plugin_private in the ReorderBufferTxn will currently just keep one flag, xact_wrote_changes, and the remaining things will still be maintained in the output_plugin_private of the LogicalDecodingContext. Is my understanding correct?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: logical streaming of xacts via test_decoding is broken
On Mon, Nov 9, 2020 at 11:00 AM Amit Kapila wrote:
>
> Michael reported a BF failure [1] related to one of the logical
> streaming test cases and I've analyzed the issue. As responded on
> pgsql-committers [2], the issue here is that streaming transactions
> can be interleaved, and because we are maintaining xact_wrote_changes
> at the LogicalDecodingContext level, a later transaction can overwrite
> the flag for a previously streaming transaction. I think it is logical
> to have this flag at each transaction level (aka in ReorderBufferTxn);
> however, till now it was fine because the changes of each transaction
> were decoded in one shot, which will no longer be true. We can keep an
> output_plugin_private data pointer in ReorderBufferTxn which will be
> used by the test_decoding module to keep this and any other such flags
> in future. We need to set this flag in the begin_cb and stream_start_cb
> APIs and then reset/remove it in the stream_commit_cb, stream_abort_cb
> and stream_stop_cb APIs.
>
> Additionally, we can extend the existing test case
> concurrent_stream.spec to cover this scenario by adding a step to have
> an empty transaction before the commit of the transaction whose changes
> we are going to stream (before s1_commit).
>
> Thoughts?

The analysis seems correct to me, I will work on it.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
logical streaming of xacts via test_decoding is broken
Michael reported a BF failure [1] related to one of the logical streaming test cases and I've analyzed the issue. As responded on pgsql-committers [2], the issue here is that streaming transactions can be interleaved, and because we are maintaining xact_wrote_changes at the LogicalDecodingContext level, a later transaction can overwrite the flag for a previously streaming transaction. I think it is logical to have this flag at each transaction level (aka in ReorderBufferTxn); however, till now it was fine because the changes of each transaction were decoded in one shot, which will no longer be true. We can keep an output_plugin_private data pointer in ReorderBufferTxn which will be used by the test_decoding module to keep this and any other such flags in future. We need to set this flag in the begin_cb and stream_start_cb APIs and then reset/remove it in the stream_commit_cb, stream_abort_cb and stream_stop_cb APIs.

Additionally, we can extend the existing test case concurrent_stream.spec to cover this scenario by adding a step to have an empty transaction before the commit of the transaction whose changes we are going to stream (before s1_commit).

Thoughts?

[1] - https://www.postgresql.org/message-id/20201109014118.GD1695%40paquier.xyz
[2] - https://www.postgresql.org/message-id/CAA4eK1JMCm9HURVmOapo%2Bv2u2EEABOuzgp7XJ32C072ygcKktQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
Re: Temporary tables versus wraparound... again
On Sun, Nov 08, 2020 at 06:19:57PM -0500, Greg Stark wrote:
> However in the case of ON COMMIT DELETE ROWS we can do better. Why not
> just reset the relfrozenxid and other stats as if the table was
> freshly created when it's truncated?

That concept is sound.

> 1) Should we update relpages and reltuples. I think so but an argument
> could be made that people might be depending on running analyze once
> when the data is loaded and then not having to run analyze on every
> data load.

I'd wager no, we should not. An app that ever analyzes an ON COMMIT DELETE ROWS temp table probably analyzes it every time. If not, it's fair to guess that similar statistics recur in each xact.

> 2) adding the dependency on heapam.h to heap.c makes sense because of
> heap_inplace_update but it may be a bit annoying because I suspect
> that's a useful sanity check that the tableam stuff hasn't been
> bypassed

That is not terrible. How plausible would it be to call vac_update_relstats() for this, instead of reimplementing part of it?

> 3) I added a check to the regression tests but I'm not sure it's a
> good idea to actually commit this. It could fail if there's a parallel
> transaction going on and even moving the test to the serial schedule
> might not guarantee that never happens due to autovacuum running
> analyze?

Right.

> @@ -3340,6 +3383,7 @@ heap_truncate_one_rel(Relation rel)
>
> 	/* Truncate the underlying relation */
> 	table_relation_nontransactional_truncate(rel);
> +	ResetVacStats(rel);

I didn't test, but I expect this will cause a stats reset for the second TRUNCATE here:

CREATE TABLE t ();
...
BEGIN;
TRUNCATE t;
TRUNCATE t;  -- inplace relfrozenxid reset
ROLLBACK;  -- inplace reset survives

Does that indeed happen?
Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
On Wed, Oct 28, 2020 at 09:01:59PM -0700, Noah Misch wrote:
> On Sat, Aug 29, 2020 at 10:34:33PM -0700, Noah Misch wrote:
> > On Mon, May 25, 2020 at 12:00:33AM -0700, Noah Misch wrote:
> > > [last non-rebase change]
> >
> > Rebased the second patch. The first patch did not need a rebase.
>
> Rebased both patches, necessitated by commit dee663f changing many of
> the same spots.

Rebased both patches, necessitated by commit c732c3f (a repair of commit dee663f). As I mentioned on another branch of the thread, I'd be content if someone reviews the slru-truncate-modulo patch and disclaims knowledge of the slru-truncate-insurance patch; I would then abandon the latter patch.

Author:     Noah Misch
Commit:     Noah Misch

    Prevent excess SimpleLruTruncate() deletion.
    
    Every core SLRU wraps around.  With the exception of pg_notify, the wrap
    point can fall in the middle of a page.  Account for this in the
    PagePrecedes callback specification and in SimpleLruTruncate()'s use of
    said callback.  Update each callback implementation to fit the new
    specification.  This changes SerialPagePrecedesLogically() from the
    style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes().
    (Whereas pg_clog and pg_serial share a key space, pg_serial is nothing
    like pg_notify.)  The bug fixed here has the same symptoms and user
    followup steps as 592a589a04bd456410b853d86bd05faa9432cbbb.  Back-patch
    to 9.5 (all supported versions).
    
    Reviewed by Tom Lane.
    Discussion: https://postgr.es/m/20190202083822.gc32...@gust.leadboat.com

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 034349a..55bdac4 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -694,6 +694,7 @@ CLOGShmemInit(void)
 	SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
 				  XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER,
 				  SYNC_HANDLER_CLOG);
+	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
 /*
@@ -912,13 +913,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
 
 /*
- * Decide which of two CLOG page numbers is "older" for truncation purposes.
+ * Decide whether a CLOG page number is "older" for truncation purposes.
  *
  * We need to use comparison of TransactionIds here in order to do the right
- * thing with wraparound XID arithmetic.  However, if we are asked about
- * page number zero, we don't want to hand InvalidTransactionId to
- * TransactionIdPrecedes: it'll get weird about permanent xact IDs.  So,
- * offset both xids by FirstNormalTransactionId to avoid that.
+ * thing with wraparound XID arithmetic.  However, TransactionIdPrecedes()
+ * would get weird about permanent xact IDs.  So, offset both such that xid1,
+ * xid2, and xid + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset is
+ * relevant to page 0 and to the page preceding page 0.
+ *
+ * The page containing oldestXact-2^31 is the important edge case.  The
+ * portion of that page equaling or following oldestXact-2^31 is expendable,
+ * but the portion preceding oldestXact-2^31 is not.  When oldestXact-2^31 is
+ * the first XID of a page and segment, the entire page and segment is
+ * expendable, and we could truncate the segment.  Recognizing that case would
+ * require making oldestXact, not just the page containing oldestXact,
+ * available to this callback.  The benefit would be rare and small, so we
+ * don't optimize that edge case.
  */
 static bool
 CLOGPagePrecedes(int page1, int page2)
@@ -927,11 +937,12 @@ CLOGPagePrecedes(int page1, int page2)
 	TransactionId xid2;
 
 	xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE;
-	xid1 += FirstNormalTransactionId;
+	xid1 += FirstNormalTransactionId + 1;
 	xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE;
-	xid2 += FirstNormalTransactionId;
+	xid2 += FirstNormalTransactionId + 1;
 
-	return TransactionIdPrecedes(xid1, xid2);
+	return (TransactionIdPrecedes(xid1, xid2) &&
+			TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1));
 }
 
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 2fe551f..ae45777 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -557,6 +557,7 @@ CommitTsShmemInit(void)
 				  CommitTsSLRULock, "pg_commit_ts",
 				  LWTRANCHE_COMMITTS_BUFFER,
 				  SYNC_HANDLER_COMMIT_TS);
+	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
 									 sizeof(CommitTimestampShared),
@@ -927,14 +928,27 @@ AdvanceOldestCommitTsXid(TransactionId oldestXact)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans
On Mon, Nov 9, 2020 at 10:07 AM David Rowley wrote: > On Mon, 9 Nov 2020 at 03:52, Andy Fan wrote: > > then I did a perf on the 2 versions. Is it possible that you called > tts_minimal_clear twice in > > the v9 version? Both ExecClearTuple and ExecStoreMinimalTuple called > tts_minimal_clear > > on the same slot. > > > > With the following changes: > > Thanks for finding that. After applying that fix I did a fresh set of > benchmarks on the latest master, latest master + v8 and latest master > + v9 using the attached script. (resultcachebench2.sh.txt) > > I ran this on my zen2 AMD64 machine and formatted the results into the > attached resultcache_master_vs_v8_vs_v9.csv file > > If I load this into PostgreSQL: > > # create table resultcache_bench (tbl text, target text, col text, > latency_master numeric(10,3), latency_v8 numeric(10,3), latency_v9 > numeric(10,3)); > # copy resultcache_bench from > '/path/to/resultcache_master_vs_v8_vs_v9.csv' with(format csv); > > and run: > > # select col,tbl,target, sum(latency_v8) v8, sum(latency_v9) v9, > round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9 from > resultcache_bench group by 1,2,3 order by 2,1,3; > > I've attached the results of the above query. (resultcache_v8_vs_v9.txt) > > Out of the 24 tests done on each branch, only 6 of 24 are better on v9 > compared to v8. So v8 wins on 75% of the tests. I think either version is OK with me, and I like this patch overall. However, I believe v9 should be no worse than v8 all the time; is there any theory to explain your result? v9 never wins using > the lookup1 table (1 row per lookup). It only wins on 50% of the > lookup100 queries (100 inner rows per outer row). 
However, despite the > draw in won tests for the lookup100 test, v8 takes less time overall, > as indicated by the following query: > > postgres=# select round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9 > from resultcache_bench where tbl='lookup100'; > v8_vs_v9 > -- > 99.3 > (1 row) > > Ditching the WHERE clause and simply doing: > > postgres=# select round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9 > from resultcache_bench; > v8_vs_v9 > -- > 96.2 > (1 row) > > indicates that v8 is 3.8% faster than v9. Altering that query > accordingly indicates v8 is 11.5% faster than master and v9 is only 7% > faster than master. > > Of course, scaling up the test will yield both versions being even > more favourable than master, but the point here is comparing v8 to v9. > > David > -- Best Regards Andy Fan
RE: extension patch of CREATE OR REPLACE TRIGGER
Hi, On Saturday, Nov 7, 2020 2:06 PM Dilip Kumar wrote: > The patch looks fine to me however I feel that in the test case there are a > lot > of duplicate statements which can be reduced e.g. > +-- 1. Overwrite existing regular trigger with regular trigger > (without OR REPLACE) > +create trigger my_trig > + after insert on my_table > + for each row execute procedure funcA(); create trigger my_trig > + after insert on my_table > + for each row execute procedure funcB(); -- should fail drop trigger > +my_trig on my_table; > + > +-- 2. Overwrite existing regular trigger with regular trigger (with OR > +REPLACE) create trigger my_trig > + after insert on my_table > + for each row execute procedure funcA(); insert into my_table values > +(1); create or replace trigger my_trig > + after insert on my_table > + for each row execute procedure funcB(); -- OK insert into my_table > +values (1); drop trigger my_trig on my_table; > > In this test, test 1 failed because it tried to change the trigger function > without > OR REPLACE, which is fine, but now test 2 can continue from there; I mean we > don't need to drop the trigger at the end of > test 1, and then test 2 can try it with OR REPLACE syntax. This way we can > reduce the extra statement execution which is not necessary. OK. That makes sense. Attached is the revised version; the tests in this patch no longer include redundant statements. I checked the tests of trigger replacement for partition tables as well. Here, I did not and will not delete the comments numbered from 1 to 8, so that other developers can easily check whether all cases are covered. Best, Takamichi Osumi CREATE_OR_REPLACE_TRIGGER_v18.patch
Re: Some doubious code in pgstat.c
At Fri, 6 Nov 2020 16:40:39 +0530, Amit Kapila wrote in > On Thu, Nov 5, 2020 at 2:13 PM Kyotaro Horiguchi > wrote: > > > > At Thu, 5 Nov 2020 11:48:24 +0530, Amit Kapila > > wrote in > > > On Thu, Nov 5, 2020 at 9:44 AM Masahiko Sawada > > > wrote: > > > > > > > > On Thu, Nov 5, 2020 at 11:18 AM Kyotaro Horiguchi > > > > wrote: > > > > > As another issue, just replacing memcpy with strlcpy makes the compiler > > > > > complain of a type mismatch, as the first parameter to memcpy had a > > > > > needless "&" operator. I removed it in this patch. > > > > > > > > > > (&msg.m_slotname is a "char (*)[NAMEDATALEN]", not a "char *".) > > > > > > > > > > > > > The patch looks good to me. > > > > > > > > > > LGTM as well but the proposed commit message seems to be a bit > > > unclear. How about something like this: > > > "Use strlcpy instead of memcpy for copying the slot name in pgstat.c. > > > > > > There is no outright bug here but it is better to be consistent with > > > the usage at other places in the same file. In passing, fix a wrong > > > Assertion in pgstat_recv_replslot." > > > > Looks better, thanks. > > > > Pushed! Thanks! -- Kyotaro Horiguchi NTT Open Source Software Center
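The type mismatch being discussed — `&msg.m_slotname` has type `char (*)[NAMEDATALEN]`, not `char *` — can be reproduced in a few lines. The struct below is a cut-down illustration (not the real PgStat message layout), and the strlcpy lookalike is a local sketch of the BSD semantics that PostgreSQL ships in src/port:

```c
#include <assert.h>
#include <string.h>

#define NAMEDATALEN 64			/* illustrative; matches PostgreSQL's default */

typedef struct PgStat_MsgReplSlot
{
	char		m_slotname[NAMEDATALEN];
} PgStat_MsgReplSlot;

/*
 * Minimal strlcpy lookalike: always NUL-terminates and returns strlen(src)
 * so callers can detect truncation (mirroring BSD/PostgreSQL semantics).
 */
static size_t
slot_strlcpy(char *dst, const char *src, size_t siz)
{
	size_t		srclen = strlen(src);

	if (siz > 0)
	{
		size_t		n = (srclen >= siz) ? siz - 1 : srclen;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}

static size_t
copy_slotname(PgStat_MsgReplSlot *msg, const char *name)
{
	/*
	 * memcpy(&msg->m_slotname, ...) would compile because memcpy takes
	 * void *, silently hiding the bogus "&"; strlcpy's char * parameter
	 * rejects it, so the array must be passed directly and decay to char *.
	 */
	return slot_strlcpy(msg->m_slotname, name, NAMEDATALEN);
}
```

This is also why the memcpy version never warned: `void *` swallows any object pointer, while the stricter `char *` signature of strlcpy surfaces the stray address-of operator at compile time.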
Re: Protect syscache from bloating with negative cache entries
At Mon, 09 Nov 2020 11:13:31 +0900 (JST), Kyotaro Horiguchi wrote in > Now the branch for counter-increment is removed. For similar > branches for counter-decrement side in CatCacheCleanupOldEntries(), > Min() is compiled into cmovbe and a branch was removed. Mmm. Sorry, I sent this by mistake. Please ignore it. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
At Fri, 6 Nov 2020 10:42:15 +0200, Heikki Linnakangas wrote in > On 06/11/2020 10:24, Kyotaro Horiguchi wrote: > > Thank you for the comment! > > First off, I thought that I managed to eliminate the degradation > > observed on the previous versions, but significant degradation (1.1% > > slower) is still seen in one case. > > One thing to keep in mind with micro-benchmarks like this is that even > completely unrelated code changes can change the layout of the code in > memory, which in turn can affect CPU caching effects in surprising > ways. If you're lucky, you can see 1-5% differences just by adding a > function that's never called, for example, if it happens to move other > code in memory so that some hot code path or struct gets split across > CPU cache lines. It can be infuriating when benchmarking. True. I sometimes had to make distclean to stabilize such benchmarks.. > > At Thu, 5 Nov 2020 11:09:09 +0200, Heikki Linnakangas > > wrote in > > (A) original test patch > > I naively thought that the code path is too short to bury the > > degradation of an additional few instructions. Actually I measured > > performance again with the same patch set on the current master and > > got more or less the same result. > > master 8195.58ms, patched 8817.40 ms: +10.75% > > However, I noticed that the additional call was a recursive call, and a > > jmp inserted for the recursive call seems to take significant > > time. After avoiding the recursive call, the difference dropped to > > +0.96% (master 8268.71ms : patched 8348.30ms) > > Just the two instructions below are inserted in this case, which looks > > reasonable. > >8720ff <+31>: cmpl $0x,0x4ba942(%rip) # 0xd2ca48 > > >872106 <+38>: jl 0x872240 (call to a function) > > That's interesting. I think a 1% degradation would be acceptable. > > I think we'd like to enable this feature by default though, so the > performance when it's enabled is also very important. 
> > (C) inserting bare counter-update code without a branch > > > >> Do we actually need a branch there? If I understand correctly, the > >> point is to bump up a usage counter on the catcache entry. You could > >> increment the counter unconditionally, even if the feature is not > >> used, and avoid the branch that way. > > That change causes 4.9% degradation, which is worse than having a > > branch. > > master 8364.54ms, patched 8666.86ms (+4.9%) > > The additional instructions follow. > > + 8721ab <+203>: mov 0x30(%rbx),%eax # %eax = ct->naccess > > + 8721ae <+206>: mov $0x2,%edx > > + 8721b3 <+211>: add $0x1,%eax # %eax++ > > + 8721b6 <+214>: cmove %edx,%eax # if %eax == 0 then %eax = 2 > > > > + 8721bf <+223>: mov %eax,0x30(%rbx) # ct->naccess = %eax > > + 8721c2 <+226>: mov 0x4cfe9f(%rip),%rax # 0xd42068 > > + 8721c9 <+233>: mov %rax,0x38(%rbx) # ct->lastaccess = %rax > > Do you need the "ntaccess == 2" test? You could always increment the > counter, and in the code that uses ntaccess to decide what to evict, > treat all values >= 2 the same. > > Need to handle integer overflow somehow. Or maybe not: integer > overflow is so infrequent that even if a hot syscache entry gets > evicted prematurely because its ntaccess count wrapped around to 0, it > will happen so rarely that it won't make any difference in practice. Agreed. OK, I had prioritized completely avoiding degradation on the normal path, but relaxing that restriction to 1% or so makes the code far simpler and makes the expiration path significantly faster. Now the branch for counter-increment is removed. For the similar branches on the counter-decrement side in CatCacheCleanupOldEntries(), Min() is compiled into cmovbe and a branch was removed.
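The cmove-based instruction sequence quoted above boils down to a branch-free counter bump. A hypothetical C rendering (the function name is mine, not from the patch) would be:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Unconditional access-counter bump as discussed above: increment, and if
 * the 32-bit counter wraps around to 0, resume at 2 so that 0 can keep
 * meaning "never accessed".  Compilers typically turn the ternary into a
 * cmov rather than a branch, matching the disassembly quoted above.
 */
static uint32_t
bump_naccess(uint32_t naccess)
{
	uint32_t	next = naccess + 1;

	return (next == 0) ? 2 : next;
}
```

This matches Heikki's suggestion of treating overflow leniently: a wrapped counter merely looks "moderately hot" (value 2) instead of cold, which is harmless given how rare 2^32 accesses to one entry are.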
Re: Hybrid Hash/Nested Loop joins and caching results from subplans
On Mon, 9 Nov 2020 at 03:52, Andy Fan wrote: > then I did a perf on the 2 versions. Is it possible that you called > tts_minimal_clear twice in > the v9 version? Both ExecClearTuple and ExecStoreMinimalTuple called > tts_minimal_clear > on the same slot. > > With the following changes: Thanks for finding that. After applying that fix I did a fresh set of benchmarks on the latest master, latest master + v8 and latest master + v9 using the attached script. (resultcachebench2.sh.txt) I ran this on my zen2 AMD64 machine and formatted the results into the attached resultcache_master_vs_v8_vs_v9.csv file. If I load this into PostgreSQL: # create table resultcache_bench (tbl text, target text, col text, latency_master numeric(10,3), latency_v8 numeric(10,3), latency_v9 numeric(10,3)); # copy resultcache_bench from '/path/to/resultcache_master_vs_v8_vs_v9.csv' with(format csv); and run: # select col,tbl,target, sum(latency_v8) v8, sum(latency_v9) v9, round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9 from resultcache_bench group by 1,2,3 order by 2,1,3; I've attached the results of the above query. (resultcache_v8_vs_v9.txt) Out of the 24 tests done on each branch, only 6 of 24 are better on v9 compared to v8. So v8 wins on 75% of the tests. v9 never wins using the lookup1 table (1 row per lookup). It only wins on 50% of the lookup100 queries (100 inner rows per outer row). 
Of course, scaling up the test will yield both versions being even more favourable then master, but the point here is comparing v8 to v9. David #!/bin/bash seconds=60 psql -c "drop table if exists hundredk, lookup1, lookup100;" postgres psql -c "create table hundredk (hundredk int, tenk int, thousand int, hundred int, ten int, one int);" postgres psql -c "insert into hundredk select x%10,x%1,x%1000,x%100,x%10,1 from generate_Series(1,10) x;" postgres psql -c "create table lookup100 (a int);" postgres psql -c "insert into lookup100 select x from generate_Series(1,10)x,generate_Series(1,100);" postgres psql -c "create index on lookup100 (a);" postgres psql -c "create table lookup1 (a int);" postgres psql -c "insert into lookup1 select x from generate_Series(1,10)x;" postgres psql -c "create index on lookup1 (a);" postgres psql -c "vacuum analyze lookup1, lookup100, hundredk;" postgres for branch in master resultcache_v8 resultcache_v9 do cd pg_src git checkout $branch make clean -s make -j -s make install -s cd ./ss.sh sleep 1 psql -c "select pg_prewarm('lookup1'), pg_prewarm('lookup100'), pg_prewarm('hundredk');" postgres for tbl in lookup1 lookup100 do for target in "count(*)" "count(l.a)" "'*'" do for col in thousand hundred ten one do echo "select $target from hundredk hk inner join $tbl l on hk.$col = l.a" > bench.sql echo Testing $branch $tbl $target $col >> bench.log pgbench -n -T $seconds -f bench.sql postgres | grep latency >> bench.log done done done done resultcache_master_vs_v8_vs_v9.csv Description: MS-Excel spreadsheet col|tbl| target |v8|v9| v8_vs_v9 --+---++--+--+-- hundred | lookup1 | '*'| 42.484 | 44.511 | 95.4 hundred | lookup1 | count(*) | 30.513 | 33.016 | 92.4 hundred | lookup1 | count(l.a) | 32.651 | 35.471 | 92.0 one | lookup1 | '*'| 42.084 | 43.668 | 96.4 one | lookup1 | count(*) | 29.255 | 32.162 | 91.0 one | lookup1 | count(l.a) | 31.772 | 35.139 | 90.4 ten | lookup1 | '*'| 40.286 | 42.439 | 94.9 ten | lookup1 | count(*) | 29.286 | 32.009 | 
91.5 ten | lookup1 | count(l.a) | 31.392 | 34.053 | 92.2 thousand | lookup1 | '*'| 43.771 | 45.711 | 95.8 thousand | lookup1 | count(*) | 31.531 | 33.845 | 93.2 thousand | lookup1 | count(l.a) | 33.339 | 35.903 | 92.9 hundred | lookup100 | '*'| 1494.440 | 1471.999 |101.5 hundred | lookup100 | count(*) | 266.988 | 26
Re: array_cat anycompatible change is breaking xversion upgrade tests
I wrote: > I think the most plausible response is to add this aggregate to the filter > logic that already exists in the xversion tests. Perhaps we could > alternatively change this test case so that it relies on some other > polymorphic function, but I'm not quite sure what a good candidate > would be. After looking at the commit that added array_cat_accum() (65d9aedb1), I decided that it's fine to replace this aggregate with one using another anyarray function, such as array_larger(). The point of that test is just to show that the array argument can be array-of-record, so we don't need the operation to be array_cat() specifically. So I propose the attached patches to un-break the xversion tests. I'll hold off pushing this till after this week's wraps, though. regards, tom lane diff --git a/src/test/regress/expected/polymorphism.out b/src/test/regress/expected/polymorphism.out index 2c3bb0a60b..b33b004a7f 100644 --- a/src/test/regress/expected/polymorphism.out +++ b/src/test/regress/expected/polymorphism.out @@ -729,24 +729,24 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl; (5 rows) -- another sort of polymorphic aggregate -CREATE AGGREGATE array_cat_accum (anycompatiblearray) +CREATE AGGREGATE array_larger_accum (anyarray) ( -sfunc = array_cat, -stype = anycompatiblearray, +sfunc = array_larger, +stype = anyarray, initcond = '{}' ); -SELECT array_cat_accum(i) +SELECT array_larger_accum(i) FROM (VALUES (ARRAY[1,2]), (ARRAY[3,4])) as t(i); - array_cat_accum -- - {1,2,3,4} + array_larger_accum + + {3,4} (1 row) -SELECT array_cat_accum(i) +SELECT array_larger_accum(i) FROM (VALUES (ARRAY[row(1,2),row(3,4)]), (ARRAY[row(5,6),row(7,8)])) as t(i); - array_cat_accum - {"(1,2)","(3,4)","(5,6)","(7,8)"} + array_larger_accum + + {"(5,6)","(7,8)"} (1 row) -- another kind of polymorphic aggregate diff --git a/src/test/regress/sql/polymorphism.sql b/src/test/regress/sql/polymorphism.sql index 70a21c8978..527899e8ad 100644 --- a/src/test/regress/sql/polymorphism.sql 
+++ b/src/test/regress/sql/polymorphism.sql @@ -498,17 +498,17 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl; -- another sort of polymorphic aggregate -CREATE AGGREGATE array_cat_accum (anycompatiblearray) +CREATE AGGREGATE array_larger_accum (anyarray) ( -sfunc = array_cat, -stype = anycompatiblearray, +sfunc = array_larger, +stype = anyarray, initcond = '{}' ); -SELECT array_cat_accum(i) +SELECT array_larger_accum(i) FROM (VALUES (ARRAY[1,2]), (ARRAY[3,4])) as t(i); -SELECT array_cat_accum(i) +SELECT array_larger_accum(i) FROM (VALUES (ARRAY[row(1,2),row(3,4)]), (ARRAY[row(5,6),row(7,8)])) as t(i); -- another kind of polymorphic aggregate diff --git a/src/test/regress/expected/polymorphism.out b/src/test/regress/expected/polymorphism.out index 1ff40764d9..980e2e861c 100644 --- a/src/test/regress/expected/polymorphism.out +++ b/src/test/regress/expected/polymorphism.out @@ -729,24 +729,24 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl; (5 rows) -- another sort of polymorphic aggregate -CREATE AGGREGATE array_cat_accum (anyarray) +CREATE AGGREGATE array_larger_accum (anyarray) ( -sfunc = array_cat, +sfunc = array_larger, stype = anyarray, initcond = '{}' ); -SELECT array_cat_accum(i) +SELECT array_larger_accum(i) FROM (VALUES (ARRAY[1,2]), (ARRAY[3,4])) as t(i); - array_cat_accum -- - {1,2,3,4} + array_larger_accum + + {3,4} (1 row) -SELECT array_cat_accum(i) +SELECT array_larger_accum(i) FROM (VALUES (ARRAY[row(1,2),row(3,4)]), (ARRAY[row(5,6),row(7,8)])) as t(i); - array_cat_accum - {"(1,2)","(3,4)","(5,6)","(7,8)"} + array_larger_accum + + {"(5,6)","(7,8)"} (1 row) -- another kind of polymorphic aggregate diff --git a/src/test/regress/sql/polymorphism.sql b/src/test/regress/sql/polymorphism.sql index e5222f1f81..15b8dfc6ce 100644 --- a/src/test/regress/sql/polymorphism.sql +++ b/src/test/regress/sql/polymorphism.sql @@ -498,17 +498,17 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl; -- another sort of polymorphic aggregate -CREATE 
AGGREGATE array_cat_accum (anyarray) +CREATE AGGREGATE array_larger_accum (anyarray) ( -sfunc = array_cat, +sfunc = array_larger, stype = anyarray, initcond = '{}' ); -SELECT array_cat_accum(i) +SELECT array_larger_accum(i) FROM (VALUES (ARRAY[1,2]), (ARRAY[3,4])) as t(i); -SELECT array_cat_accum(i) +SELECT array_larger_accum(i) FROM (VALUES (ARRAY[row(1,2),row(3,4)]), (ARRAY[row(5,6),row(7,8)])) as t(i); -- another kind of polymorphic aggregate
Re: upcoming API changes for LLVM 12
Andres Freund writes: > Yea, I'll try to do that in the next few days (was planning to last week, > but due to a hand injury I was typing one-handed last week - makes it > pretty annoying to clean up code. But I just started being able to at > least use my left thumb again...). Ouch. Get well soon, and don't overstress your hand --- that's a good recipe for long-term problems. regards, tom lane
Temporary tables versus wraparound... again
We had an outage caused by transaction wraparound. And yes, one of the first things I did on this site was check that we didn't have any databases that were in danger of wraparound. However, since then we added a monitoring job that used a temporary table with ON COMMIT DELETE ROWS. Since it was a simple monitoring job it stayed connected to the database and used this small temporary table for a very long period of time. The temporary table never got vacuumed by autovacuum and never by the monitoring job (since it was being truncated on every transaction, why would it need to be vacuumed...). We've been around this bush before. Tom added orphaned table protection to autovacuum precisely because temporary tables can cause the datfrozenxid to get held back indefinitely. Then Michael Paquier and Tsunakawa Takayuki both found it worth making this more aggressive. But none of that helped as the temporary schema was still in use, so they were not considered "orphaned" temp tables at all. I think we need to add some warnings to autovacuum when it detects *non* orphaned temporary tables that are older than the freeze threshold. However, in the case of ON COMMIT DELETE ROWS we can do better. Why not just reset the relfrozenxid and other stats as if the table was freshly created when it's truncated? I put together this quick patch to check the idea and it seems to integrate fine into the code. I'm not sure about a few points but I don't think they're showstoppers. 1) Should we update relpages and reltuples? I think so, but an argument could be made that people might be depending on running analyze once when the data is loaded and then not having to run analyze on every data load. 
2) adding the dependency on heapam.h to heap.c makes sense because of heap_inplace_update, but it may be a bit annoying because I suspect that's a useful sanity check that the tableam stuff hasn't been bypassed 3) I added a check to the regression tests but I'm not sure it's a good idea to actually commit this. It could fail if there's a parallel transaction going on, and even moving the test to the serial schedule might not guarantee that never happens due to autovacuum running analyze? I didn't actually add the warning to autovacuum yet. -- greg From 76eb00c43fba2a293dc4a079307e675e0eeaff06 Mon Sep 17 00:00:00 2001 From: Greg Stark Date: Sun, 8 Nov 2020 11:54:50 -0500 Subject: [PATCH] update relfrozenxmin when truncating temp tables --- src/backend/catalog/heap.c | 45 ++ src/test/regress/expected/temp.out | 21 ++ src/test/regress/parallel_schedule | 9 +--- src/test/regress/sql/temp.sql | 19 4 files changed, 91 insertions(+), 3 deletions(-) diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 4cd7d76..ffe36bb 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -30,6 +30,7 @@ #include "postgres.h" #include "access/genam.h" +#include "access/heapam.h" #include "access/htup_details.h" #include "access/multixact.h" #include "access/relation.h" @@ -3277,6 +3278,48 @@ RelationTruncateIndexes(Relation heapRelation) } /* + * Reset the relfrozenxid and other stats to the same values used when + * creating tables. This is used after non-transactional truncation. + * + * This reduces the need for long-running programs to vacuum their own + * temporary tables (since they're not covered by autovacuum) at least in the + * case where they're ON COMMIT DELETE ROWS. 
+ * + * see also src/backend/commands/vacuum.c vac_update_relstats() + * also see AddNewRelationTuple() above + */ + +static void +ResetVacStats(Relation rel) +{ + HeapTuple ctup; + Form_pg_class pgcform; + Relation classRel; + + /* Fetch a copy of the tuple to scribble on */ + classRel = table_open(RelationRelationId, RowExclusiveLock); + ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(RelationGetRelid(rel))); + if (!HeapTupleIsValid(ctup)) + elog(ERROR, "pg_class entry for relid %u vanished during truncation", + RelationGetRelid(rel)); + pgcform = (Form_pg_class) GETSTRUCT(ctup); + + /* + * Update relfrozenxid + */ + + pgcform->relpages = 0; + pgcform->reltuples = 0; + pgcform->relallvisible = 0; + pgcform->relfrozenxid = RecentXmin; + pgcform->relminmxid = GetOldestMultiXactId(); + + heap_inplace_update(classRel, ctup); + + table_close(classRel, RowExclusiveLock); +} + +/* * heap_truncate * * This routine deletes all data within all the specified relations. @@ -3340,6 +3383,7 @@ heap_truncate_one_rel(Relation rel) /* Truncate the underlying relation */ table_relation_nontransactional_truncate(rel); + ResetVacStats(rel); /* If the relation has indexes, truncate the indexes too */ RelationTruncateIndexes(rel); @@ -3351,6 +3395,7 @@ heap_truncate_one_rel(Relation rel) Relation toastrel = table_open(toastrelid, AccessExclusiveLock); table_relation_nontransactional_truncate(toastrel); + ResetVacStats(rel);
Re: upcoming API changes for LLVM 12
Hi, On 2020-11-08 17:35:20 -0500, Tom Lane wrote: > I suppose this is related to what you are talking about here. Yes. > If so, could we prioritize getting that committed? It's annoying > to have the buildfarm failures page so full of this one issue. Yea, I'll try to do that in the next few days (was planning to last week, but due to a hand injury I was typing one-handed last week - makes it pretty annoying to clean up code. But I just started being able to at least use my left thumb again...). Greetings, Andres Freund
Re: upcoming API changes for LLVM 12
Andres Freund writes: > On 2020-11-02 10:28:33 -0800, Jesse Zhang wrote: >> On Thu, Oct 15, 2020 at 6:12 PM Andres Freund wrote: >>> There will be a breaking API change for JIT related API in LLVM >>> 12. seawasp, which runs some bleeding-edge version of clang, has been falling over for the last couple of weeks: /home/fabien/clgtk/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -D__STDC_LIMIT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_CONSTANT_MACROS -D_DEBUG -D_GNU_SOURCE -I/home/fabien/clgtk/include -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -flto=thin -emit-llvm -c -o llvmjit_types.bc llvmjit_types.c llvmjit.c:21:10: fatal error: 'llvm-c/OrcBindings.h' file not found #include <llvm-c/OrcBindings.h> ^~~~~~~~~~~~~~~~~~~~~~ 1 error generated. I suppose this is related to what you are talking about here. If so, could we prioritize getting that committed? It's annoying to have the buildfarm failures page so full of this one issue. regards, tom lane
2020-11-12 Release Announcement Draft
Hi, Attached is a draft of the release announcement for the upcoming 2020-11-12 update release. Corrections and feedback welcome, so long as it is submitted by 2020-11-11 AoE[1]. Thanks! Jonathan [1] https://en.wikipedia.org/wiki/Anywhere_on_Earth The PostgreSQL Global Development Group has released an update to all supported versions of our database system, including 13.1, 12.5, 11.10, 10.15, 9.6.20, and 9.5.24. This release fixes over 65 bugs reported over the last three months. Please plan to update at your earliest convenience. Additionally, this is the second-to-last release of PostgreSQL 9.5. If you are running PostgreSQL 9.5 in a production environment, we suggest that you make plans to upgrade. Bug Fixes and Improvements -- This update also fixes over 65 bugs that were reported in the last several months. Some of these issues only affect version 13, but may also apply to other supported versions. Some of these fixes include: * Fix a breakage in the replication protocol by ensuring that two "command completion" events are expected for `START_REPLICATION`. * Several fixes for potential data loss issues, though the circumstances where these issues manifest are rare. * Fix `ALTER ROLE` usage for users with the `BYPASSRLS` permission. * `ALTER TABLE ONLY ... DROP EXPRESSION` is not allowed on partitioned tables when there are child tables. * Ensure that `ALTER TABLE ONLY ... ENABLE/DISABLE TRIGGER` does not apply to child tables. * Fix for `ALTER TABLE ... SET NOT NULL` on partitioned tables to avoid a potential deadlock in parallel `pg_restore`. * Fix handling of expressions in `CREATE TABLE LIKE` with inheritance. * `DROP INDEX CONCURRENTLY` is not allowed on partitioned tables. * Allow `LOCK TABLE` to succeed on a self-referential view instead of throwing an error. * Several fixes around statistics collection and progress reporting for `REINDEX CONCURRENTLY`. 
* Ensure that `GENERATED` columns are updated when any columns they depend on are updated via a rule or an updatable view. * Support hash partitioning using text array columns as a partition key. * Allow the jsonpath `.datetime()` method to accept ISO 8601-format timestamps. * During a "smart" shutdown, ensure background processes are not terminated until all foreground client sessions are completed, fixing an issue that broke the processing of parallel queries. * Several fixes for the query planner and optimizer. * Ensure that data is de-toasted before being inserted into a BRIN index. This could manifest itself with errors like "missing chunk number 0 for toast value NNN". If you have seen a similar error in an existing BRIN index, you should be able to correct it by using `REINDEX` on the index. * Additional fixes for B-tree, BRIN, GiST, and GIN indexes. * Fix the output of `EXPLAIN` to have the correct XML tag nesting for incremental sort plans. * Several fixes for memory leaks, including ones involving RLS policies, using `CALL` with PL/pgSQL, `SIGHUP` processing a configuration parameter that cannot be applied without a restart, and an edge-case for index lookup for a partition. * libpq can now support arbitrary-length lines in the .pgpass file. * On Windows, `psql` now reads the output of a backtick command in text mode, not binary mode, so it can now properly handle newlines. * Fix how `pg_dump`, `pg_restore`, `clusterdb`, `reindexdb`, and `vacuumdb` use complex connection-string parameters. * When the `\connect` command of `psql` reuses connection parameters, ensure that all non-overridden parameters from a previous connection string are also re-used. * Ensure that `pg_dump` collects per-column information about extension configuration tables, avoiding crashes when specifying `--inserts`. * Ensure that parallel `pg_restore` processes foreign keys referencing partitioned tables in the correct order. 
* Several fixes for `contrib/pgcrypto`, including a memory leak fix. This update also contains tzdata release 2020d for DST law changes in Fiji, Morocco, Palestine, the Canadian Yukon, Macquarie Island, and Casey Station (Antarctica); plus historical corrections for France, Hungary, Monaco, and Palestine. For the full list of changes, please review the [release notes](https://www.postgresql.org/docs/current/release.html). PostgreSQL 9.5 EOL Notice - PostgreSQL 9.5 will stop receiving fixes on February 11, 2021. If you are running PostgreSQL 9.5 in a production environment, we suggest that you make plans to upgrade to a newer, supported version of PostgreSQL. Please see our [versioning policy](https://www.postgresql.org/support/versioning/) for more information. Updating All PostgreSQL update releases are cumulative. As with other minor releases, users are not required to dump and reload their database or use `pg_upgrade` in order to apply this update release; you may simply shut down PostgreSQL and update its binaries. Users who have skipped one or mo
Re: list of extended statistics on psql
Hi, I took a look at this today, and I think the code is ready, but the regression test needs a bit more work: 1) It's probably better to use somewhat more specific names for the objects, especially when created in public schema. It decreases the chance of a collision with other tests (which may be hard to notice because of timing). I suggest we use "stts_" prefix or something like that, per the attached 0002 patch. (0001 is just the v7 patch) 2) The test is failing intermittently because it's executed in parallel with stats_ext test, which is also creating extended statistics. So depending on the timing the \dX may list some of the stats_ext stuff. I'm not sure what to do about this. Either this part needs to be moved to a separate test executed in a different group, or maybe we should simply move it to stats_ext. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company >From 1a7e99c931c0b3ed5560abd3ff656412f8b353f1 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Sun, 8 Nov 2020 21:53:54 +0100 Subject: [PATCH 1/2] v7 --- doc/src/sgml/ref/psql-ref.sgml | 14 src/bin/psql/command.c | 3 + src/bin/psql/describe.c| 100 + src/bin/psql/describe.h| 3 + src/bin/psql/help.c| 1 + src/bin/psql/tab-complete.c| 4 +- src/test/regress/expected/psql.out | 77 ++ src/test/regress/sql/psql.sql | 29 + 8 files changed, 230 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 221a967bfe..fd860776af 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1918,6 +1918,20 @@ testdb=> + + +\dX [ pattern ] + + +Lists extended statistics. +If pattern +is specified, only those extended statistics whose names match the pattern +are listed. +If + is appended to the command name, each extended statistics +is listed with its size. 
+ + + \dy[+] [ pattern ] diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index c7a83d5dfc..c6f1653cb7 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -930,6 +930,9 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd) else success = listExtensions(pattern); break; + case 'X': /* Extended Statistics */ +success = listExtendedStats(pattern, show_verbose); +break; case 'y': /* Event Triggers */ success = listEventTriggers(pattern, show_verbose); break; diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 07d640021c..1807324542 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -4397,6 +4397,106 @@ listEventTriggers(const char *pattern, bool verbose) return true; } +/* + * \dX + * + * Briefly describes extended statistics. + */ +bool +listExtendedStats(const char *pattern, bool verbose) +{ + PQExpBufferData buf; + PGresult *res; + printQueryOpt myopt = pset.popt; + + if (pset.sversion < 10) + { + char sverbuf[32]; + + pg_log_error("The server (version %s) does not support extended statistics.", + formatPGVersionNumber(pset.sversion, false, + sverbuf, sizeof(sverbuf))); + return true; + } + + initPQExpBuffer(&buf); + printfPQExpBuffer(&buf, + "SELECT \n" + "es.stxnamespace::pg_catalog.regnamespace::text AS \"%s\", \n" + "es.stxname AS \"%s\", \n" + "pg_catalog.format('%%s FROM %%s', \n" + " (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ') \n" + " FROM pg_catalog.unnest(es.stxkeys) s(attnum) \n" + " JOIN pg_catalog.pg_attribute a \n" + " ON (es.stxrelid = a.attrelid \n" + " AND a.attnum = s.attnum \n" + " AND NOT a.attisdropped)), \n" + "es.stxrelid::regclass) AS \"%s\", \n" + "CASE WHEN esd.stxdndistinct IS NOT NULL THEN 'built' \n" + " WHEN 'd' = any(es.stxkind) THEN 'defined' \n" + "END AS \"%s\", \n" + "CASE WHEN esd.stxddependencies IS NOT NULL THEN 'built' \n" + " WHEN 'f' = any(es.stxkind) THEN 'defined' \n" + "END AS \"%s\", \n" + "CASE WHEN 
esd.stxdmcv IS NOT NULL THEN 'built' \n" + " WHEN 'm' = any(es.stxkind) THEN 'defined' \n" + "END AS \"%s\"", + gettext_noop("Schema"), + gettext_noop("Name"), + gettext_noop("Definition"), + gettext_noop("N_distinct"), + gettext_noop("Dependencies"), + gettext_noop("Mcv")); + + if (verbose) + { + appendPQExpBuffer(&buf, + ",\nCASE WHEN esd.stxdndistinct IS NOT NULL THEN \n" + " pg_catalog.pg_size_pretty(pg_catalog.length(stxdndistinct)::bigint) \n" + " WHEN 'd' = any(stxkind) THEN '0 bytes' \n" + "END AS \"%s\", \n" + "CASE
Re: [BUG]: segfault during update
I wrote:
> Yeah, this is sufficient reason why we must use the more invasive
> patch on those branches. What I'm wondering now is if there's a
> way to break even-older branches based on failure to handle dropped
> columns here.

After tracing through the example in v11, I see why those branches are not broken: when ExecBRUpdateTriggers decides to return the trigger-returned tuple, it sticks it into a target slot like this:

    /*
     * Return the modified tuple using the es_trig_tuple_slot. We assume
     * the tuple was allocated in per-tuple memory context, and therefore
     * will go away by itself. The tuple table slot should not try to
     * clear it.
     */
    TupleTableSlot *newslot = estate->es_trig_tuple_slot;
    TupleDesc   tupdesc = RelationGetDescr(relinfo->ri_RelationDesc);

    if (newslot->tts_tupleDescriptor != tupdesc)
        ExecSetSlotDescriptor(newslot, tupdesc);
    ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);

So the slot that ExecConstraints et al will be working with contains the relation's actual tuple descriptor, not the approximate descriptor obtained by looking at the plan tlist. This logic is entirely gone in v12, which confirms my instinct that there was something about Andres' slot-manipulation changes that broke this scenario. In v12 we end up using the junkfilter's output slot, which does not have a sufficiently accurate tupdesc to deal with an on-disk tuple rather than one constructed by the executor.

So I'll go see about back-patching 20d3fe900.

regards, tom lane
Re: [BUG]: segfault during update
"Drouvot, Bertrand" writes:
> Here is a scenario that produces a segfault during update (on versions 12
> and 13):

Hm. So the point about failing to reproduce dropped columns is more critical than I thought. I wonder how we got away with that before v12?

> So, we would need to back-port this commit to 12 and 13.

Yeah, this is sufficient reason why we must use the more invasive patch on those branches. What I'm wondering now is if there's a way to break even-older branches based on failure to handle dropped columns here.

regards, tom lane
Re: Yet another (minor) fix in BRIN
On 11/8/20 12:34 PM, Michael Paquier wrote:
> On Sat, Nov 07, 2020 at 10:45:26PM -0500, Tom Lane wrote:
>> The weekend before stable-branch releases is probably not the best
>> time to be pushing "minor" fixes into those branches. I got my
>> fingers burned today, and so did Peter. Don't follow our example ;-)
>
> You could just apply your stuff after the version is tagged (not
> stamped, as there could be urgent bug fixes between the stamp time and
> the tag time, like packaging issues).

Yeah, that's what I was planning to do - I was not suggesting I'll push this right away. Or at least I did not mean to. Sorry if that was not quite clear.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: pgbench stopped supporting large number of client connections on Windows
On 2020-11-07 01:01, Fabien COELHO wrote:
> Hello Marina,

Hello, Fabien! Thank you for your comments!

>> While trying to test a patch that adds a synchronization barrier in
>> pgbench [1] on Windows,
>
> Thanks for trying that, I do not have a windows setup for testing, and
> the sync code I wrote for Windows is basically blind coding :-(

FYI:

1) It looks like pgbench will no longer support Windows XP due to the function DeleteSynchronizationBarrier. From https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletesynchronizationbarrier :

Minimum supported client: Windows 8 [desktop apps only]
Minimum supported server: Windows Server 2012 [desktop apps only]

On Windows Server 2008 R2 (MSVC 2013) the 6th version of the patch [1] compiled without (new) warnings, but when running pgbench I got the following error:

The procedure entry point DeleteSynchronizationBarrier could not be located in the dynamic link library KERNEL32.dll.

2) On Windows Server 2019 (MSVC 2019) the 6th version of the patch [1] with fix_max_client_conn_on_Windows.patch compiled without (new) warnings. I made a few runs (-M prepared -c 100 -j 10 -T 10 -P1 -S) with and without your patches. On Linux (-M prepared -c 1000 -j 500 -T 10 -P1 -S) your patches fix problems with progress reports as in [2], but on Windows I did not notice such changes, see attached pgbench_runs_linux_vs_windows.zip. Almost the same thing happens with reindexdb and vacuumdb (built on commit [3]):

> Windows fd implementation is somehow buggy because it does not return
> the smallest number available, and then with the assumption that select
> uses a dense array indexed with them (true on linux, less so on Windows
> which probably uses a sparse array), so that the number gets over the
> limit, even if less are actually used, hence the catch, as you noted.

I agree with you. It looks like the structure fd_set just contains the sockets used by this application on Windows, and the macro FD_SETSIZE is used only here.
From https://docs.microsoft.com/en-us/windows/win32/api/winsock/ns-winsock-fd_set :

typedef struct fd_set {
  u_int  fd_count;
  SOCKET fd_array[FD_SETSIZE];
} fd_set, FD_SET, *PFD_SET, *LPFD_SET;

From https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2 :

The maximum number of sockets that a Windows Sockets application can use is not affected by the manifest constant FD_SETSIZE. This value defined in the Winsock2.h header file is used in constructing the FD_SET structures used with select function.

IIUC the checks below are not correct on Windows, since on this system sockets can have values equal to or greater than FD_SETSIZE (see Windows documentation [4] and pgbench debug output in attached pgbench_debug.txt).

> Okay. But then, how may one detect that there are too many fds in the
> set? I think that an earlier version of the code needed to make
> assumptions about the internal implementation of windows (there is a
> counter somewhere in windows fd_set struct), which was rejected because
> it was breaking the interface. Now your patch is basically resurrecting
> that.

I tried to keep the behaviour "we check if the socket value can be used in select() at runtime", but now I will also read that thread...

> Why not if there is no other solution, but this is quite depressing,
> and because it breaks the interface it would be broken if windows
> changed its internals for some reason :-(

It looks like if the internals of the structure fd_set are changed, we will also have problems with the function pgwin32_select from src/backend/port/win32/socket.c, because it uses fd_set.fd_count too?..

(I'm writing responses to the rest of your comments but it takes time...)

[1] https://www.postgresql.org/message-id/alpine.DEB.2.22.394.2011021726390.989361%40pseudo
[2] https://www.postgresql.org/message-id/20200227185129.hikscyenomnlrord%40alap3.anarazel.de

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Hybrid Hash/Nested Loop joins and caching results from subplans
On Mon, Nov 2, 2020 at 3:44 PM David Rowley wrote:
> On Tue, 20 Oct 2020 at 22:30, David Rowley wrote:
> >
> > So far benchmarking shows there's still a regression from the v8
> > version of the patch. This is using count(*). An earlier test [1] did
> > show speedups when we needed to deform tuples returned by the nested
> > loop node. I've not yet repeated that test again. I was disappointed
> > to see v9 slower than v8 after having spent about 3 days rewriting the
> > patch
>
> I did some further tests this time with some tuple deforming. Again,
> it does seem that v9 is slower than v8.

I ran your test case on v8 and v9, and I can reproduce a stable difference between them.

v8:

statement latencies in milliseconds:
  1603.611  select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a;

v9:

statement latencies in milliseconds:
  1772.287  select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a;

Then I ran perf on the two versions. Is it possible that tts_minimal_clear is called twice in the v9 version? Both ExecClearTuple and ExecStoreMinimalTuple call tts_minimal_clear on the same slot.
With the following changes:

diff --git a/src/backend/executor/execMRUTupleCache.c b/src/backend/executor/execMRUTupleCache.c
index 3553dc26cb..b82d8e98b8 100644
--- a/src/backend/executor/execMRUTupleCache.c
+++ b/src/backend/executor/execMRUTupleCache.c
@@ -203,10 +203,9 @@ prepare_probe_slot(MRUTupleCache *mrucache, MRUCacheKey *key)
 	TupleTableSlot *tslot = mrucache->tableslot;
 	int			numKeys = mrucache->nkeys;
 
-	ExecClearTuple(pslot);
-
 	if (key == NULL)
 	{
+		ExecClearTuple(pslot);
 		/* Set the probeslot's values based on the current parameter values */
 		for (int i = 0; i < numKeys; i++)
 			pslot->tts_values[i] = ExecEvalExpr(mrucache->param_exprs[i],
@@ -641,7 +640,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
 		{
 			mrucache->state = MRUCACHE_FETCH_NEXT_TUPLE;
 
-			ExecClearTuple(mrucache->cachefoundslot);
+			// ExecClearTuple(mrucache->cachefoundslot);
 			slot = mrucache->cachefoundslot;
 			ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
 			return slot;
@@ -740,7 +739,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
 			return NULL;
 		}
 
-		ExecClearTuple(mrucache->cachefoundslot);
+		// ExecClearTuple(mrucache->cachefoundslot);
 		slot = mrucache->cachefoundslot;
 		ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
 		return slot;

v9 has the following result:

statement latencies in milliseconds:
  1608.048  select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a;

> Graphs attached
>
> Looking at profiles, I don't really see any obvious reason as to why
> this is. I'm very much inclined to just pursue the v8 patch (separate
> Result Cache node) and just drop the v9 idea altogether.
>
> David

--
Best Regards
Andy Fan
Re: Hybrid Hash/Nested Loop joins and caching results from subplans
On Fri, Nov 6, 2020 at 6:13 AM David Rowley wrote:
> On Mon, 2 Nov 2020 at 20:43, David Rowley wrote:
> >
> > On Tue, 20 Oct 2020 at 22:30, David Rowley wrote:
> > I did some further tests this time with some tuple deforming. Again,
> > it does seem that v9 is slower than v8.
> >
> > Graphs attached
> >
> > Looking at profiles, I don't really see any obvious reason as to why
> > this is. I'm very much inclined to just pursue the v8 patch (separate
> > Result Cache node) and just drop the v9 idea altogether.
>
> Nobody raised any objections, so I'll start taking a more serious look
> at the v8 version (the patch with the separate Result Cache node).
>
> One thing that I had planned to come back to about now is the name
> "Result Cache". I admit to not thinking for too long on the best name
> and always thought it was something to come back to later when there's
> some actual code to debate a better name for. "Result Cache" was
> always a bit of a placeholder name.
>
> Some other names that I'd thought of were:
>
> "MRU Hash"
> "MRU Cache"
> "Parameterized Tuple Cache" (bit long)
> "Parameterized Cache"
> "Parameterized MRU Cache"

I think "Tuple Cache" would be OK; it conveys that it is a cache for tuples. Mentioning MRU/LRU would be too internal for an end user, and "Parameterized" looks redundant given that we have already said "Cache Key" just below the node name. Just my $0.01.

--
Best Regards
Andy Fan
Re: Yet another (minor) fix in BRIN
On Sat, Nov 07, 2020 at 10:45:26PM -0500, Tom Lane wrote:
> The weekend before stable-branch releases is probably not the best
> time to be pushing "minor" fixes into those branches. I got my
> fingers burned today, and so did Peter. Don't follow our example ;-)

You could just apply your stuff after the version is tagged (not stamped, as there could be urgent bug fixes between the stamp time and the tag time, like packaging issues).

--
Michael
Re: Collation versioning
On Fri, Oct 4, 2019 at 1:25 AM Thomas Munro wrote:
> Ok. Here's one like that. Also, a WIP patch for FreeBSD.

Here's an updated patch for FreeBSD, which I'll sit on for a bit longer. It needs bleeding edge 13-CURRENT (due out Q1ish).

From b9cb5562457c214c48c0a6334b8ed3264f50e3d6 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Sun, 8 Nov 2020 21:12:12 +1300
Subject: [PATCH] Add collation versions for FreeBSD.

On FreeBSD 13, use querylocale() to read the current version of libc
collations.
---
 doc/src/sgml/charset.sgml         |  3 ++-
 src/backend/utils/adt/pg_locale.c | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 832a701523..e151987eff 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -973,7 +973,8 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
     Version information is available from the icu provider on all operating
     systems. For the libc provider, versions are currently only available
-    on systems using the GNU C library (most Linux systems) and Windows.
+    on systems using the GNU C library (most Linux systems), FreeBSD and
+    Windows.
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 1dfe343b79..b1a8eb9a4e 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1684,6 +1684,26 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 		/* Use the glibc version because we don't have anything better. */
 		collversion = pstrdup(gnu_get_libc_version());
+#elif defined(LC_VERSION_MASK)
+	locale_t	loc;
+
+	/* C[.encoding] and POSIX never change. */
+	if (strcmp("C", collcollate) == 0 ||
+		strncmp("C.", collcollate, 2) == 0 ||
+		strcmp("POSIX", collcollate) == 0)
+		return NULL;
+
+	/* Look up FreeBSD collation version. */
+	loc = newlocale(LC_COLLATE, collcollate, NULL);
+	if (loc)
+	{
+		collversion =
+			pstrdup(querylocale(LC_COLLATE_MASK | LC_VERSION_MASK, loc));
+		freelocale(loc);
+	}
+	else
+		ereport(ERROR,
+				(errmsg("could not load locale \"%s\"", collcollate)));
 #elif defined(WIN32) && _WIN32_WINNT >= 0x0600
 	/*
 	 * If we are targeting Windows Vista and above, we can ask for a name
--
2.28.0
Re: Allow "snapshot too old" error, to prevent bloat
On Sun, Nov 8, 2020 at 5:14 PM Tom Lane wrote:
> Jim Nasby writes:
> > On 2/15/15 10:36 AM, Tom Lane wrote:
> >> I wonder if we couldn't achieve largely the same positive effects through
> >> adding a simple transaction-level timeout option.
>
> > A common use-case is long-running reports hitting relatively stable data
> > in a database that also has tables with a high churn rate (ie: queue
> > tables). In those scenarios your only options right now are to suffer
> > huge amounts of bloat in the high-churn or not do your reporting. A
> > simple transaction timeout only "solves" this by denying you reporting
> > queries.
>
> Agreed, but Kevin's proposal has exactly the same problem only worse,
> because (a) the reporting transactions might or might not fail (per
> Murphy, they'll fail consistently only when you're on deadline), and
> (b) if they do fail, there's nothing you can do short of increasing the
> slack db-wide.
>
> > An idea that I've had on this would be some way to "lock down" the
> > tables that a long-running transaction could access. That would allow
> > vacuum to ignore any snapshots that transaction had for tables it wasn't
> > accessing. That's something that would be deterministic.
>
> There might be something in that, but again it's not much like this patch.
> The key point I think we're both making is that nondeterministic failures
> are bad,

In my experience, long transactions caused by a genuinely long-running query are not that common. A more common case I have met is that a user starts a transaction but forgets to close it, and I am wondering if we can do something for that situation. Most users run at the Read Committed isolation level, so after a user completes a query, the next query will use a fresh snapshot; there is no need for such an idle session to hold back the oldest xmin. Would it be correct to advance the oldest xmin in this case? If yes, what would be the blocker in implementing this feature?

--
Best Regards
Andy Fan
Re: pgbench stopped supporting large number of client connections on Windows
Hello Tom,

>> Use ppoll, and start more threads but not too many?
>
> Does ppoll exist on Windows? Some g*gling suggests that the answer is
> no. There was a prior thread on this topic, which seems to have
> drifted off into the sunset:

Indeed. I may have contributed to this dwindling by not adding a CF entry for this thread, so that there was no reminder anywhere.

> https://www.postgresql.org/message-id/flat/BL0PR1901MB1985F219C46C61EDE7036C34ED8E0%40BL0PR1901MB1985.namprd19.prod.outlook.com

It seems that there is no simple good solution around the windows wait event implementations:

- the timeout seems to be in milliseconds in everything I saw
- 64 is an intrinsic limit, probably because of the underlying n² implementations

Maybe we could provide a specific windows implementation limited to 64 fds (which is not a bad thing bench-wise) but with a rounded-down timeout, so that it could end up in an active spinning wait in some cases, which is probably not a big issue, all things considered.

--
Fabien.