Re: proposal: type info support functions for functions that use "any" type
On Wed, Jan 15, 2020 at 11:04 Pavel Stehule wrote:
>
> On Tue, Jan 14, 2020 at 22:09 Tom Lane wrote:
>> Pavel Stehule writes:
>>> [ parser-support-function-with-demo-20191128.patch ]
>>
>> TBH, I'm still not convinced that this is a good idea. Restricting
>> the support function to only change the function's return type is
>> safer than the original proposal, but it's still not terribly safe.
>> If you change the support function's algorithm in any way, how do
>> you know whether you've broken existing stored queries? If the
>> support function consults external resources to make its choice
>> (perhaps checking the existence of a cast), where could we record
>> that the query depends on the existence of that cast? There'd be
>> no visible trace of that in the query parsetree.
>
> This risk is real and cannot be simply solved without more complications.
>
> Could a solution be to limit and enforce this functionality to
> extensions that are initialized from shared_preload_libraries or
> local_preload_libraries?
>
> When we detect that the support function comes from a dynamically
> loaded extension, we can raise an error. That is not great for
> upgrades, but I expect an upgrade of this kind of extension to look
> much like a Postgres upgrade - the restart can cover both.
>
>> I'm also still not convinced that this idea allows doing anything
>> that can't be done just as well with polymorphism. It would be a
>> really bad idea for the support function to be examining the values
>> of the arguments (else what happens when they're not constants?).
>> So all you can do is look at their types, and then it seems like
>> the things you can usefully do are pretty much like polymorphism,
>> i.e. select some one of the input types, or a related type such
>> as an array type or element type. If there are gaps in what you
>> can express with polymorphism, I'd much rather spend effort on
>> improving that facility than in adding something that is only
>> accessible to advanced C coders. (Yes, I know I've been slacking
>> on reviewing [1].)
>
> For my purposes the critical information is the type. I don't need to
> work with constants, but I can imagine that some API for working with
> constant values could be nice.
>
> Yes, I can solve a lot of things with patch [1], but not all of them,
> and this patch is shorter and almost trivial.

All this discussion is motivated by my work on the Orafce extension -
https://github.com/orafce/orafce - where, unfortunately, implementing the
"decode" functions is not possible with patch [1]. I currently have 55
instances of the "decode" function and I am sure I don't cover all cases.
With this patch (polymorphism on steroids :)) I can implement them very
simply and quickly, along with other similar functions. The patch was
very simple, so I thought, maybe wrongly, that it was an acceptable
approach. Our polymorphism is strong, and when I design code natively for
Postgres it is perfect. But it doesn't allow implementing some simple
functions that are used in other databases. With this small patch I can
cover almost all situations, and very simply. I don't want to increase
the complexity of the polymorphism rules further - [1] is the maximum we
can implement with acceptable cost - but this generic system is sometimes
not enough. I welcome any design for how this problem could be solved.
Any ideas?

>> Lastly, I still think that this patch doesn't begin to address
>> all the places that would have to know about the feature. There's
>> a lot of places that know about polymorphism --- if this is
>> polymorphism on steroids, which it is, then why don't all of those
>> places need to be touched?
>
> I am sorry, I don't understand the last sentence.
>
>> On the whole I think we should reject this idea.
>>
>> regards, tom lane
>>
>> [1] https://commitfest.postgresql.org/26/1911/
Re: Expose lock group leader pid in pg_stat_activity
On Thu, Jan 16, 2020 at 04:27:27PM +0900, Michael Paquier wrote:
> While looking at the code, I think that we could refactor things a bit
> for raw_wait_event, wait_event_type and wait_event, which have some
> duplicated code for backend and auxiliary processes. What about
> filling in the wait event data after fetching the PGPROC entry, and
> also filling in leader_pid for auxiliary processes? This does not
> matter now, and perhaps it never will, but that would make the code
> much more consistent.

And actually, the way you are looking at the leader's PID is visibly
incorrect and inconsistent, because the patch takes no shared LWLock on
the leader using LockHashPartitionLockByProc() followed by
LWLockAcquire(), no? That's incorrect because it would be perfectly
possible to crash with this code between the moment you check whether
lockGroupLeader is NULL and the moment you look at lockGroupLeader->pid,
if a process being stopped in-between removes itself from its lock group
in ProcKill(). That's also inconsistent because it would be perfectly
possible to finish with an incorrect view of the data while scanning all
the backend entries, like a leader set to NULL with workers still
pointing to the leader, or workers incorrectly marked as NULL. The second
one may not be a problem, but the first one could be confusing.
--
Michael
Re: Expose lock group leader pid in pg_stat_activity
On Fri, Dec 27, 2019 at 10:15:33AM +0100, Julien Rouhaud wrote:
> I think that not using "parallel" to name this field will help to
> avoid confusion if the lock group infrastructure is eventually used
> for something else, but that's only true if indeed we explain what a
> lock group is.

As you already pointed out, src/backend/storage/lmgr/README includes a
full description of this stuff under the section "Group Locking". So I
agree that the patch ought to document your new field in a much better
way. Avoiding the term "group locking" is even better, so as not to
confuse the reader, because this term is not present in the docs at all.

> The leader_pid is NULL for processes not involved in parallel query.
> When a process wants to cooperate with parallel workers, it becomes a
> lock group leader, which means that this field will be valued to its
> own pid. When a parallel worker starts up, this field will be valued
> with the leader pid.

The first sentence is good to have. Now instead of "lock group leader",
I think that we had better use "parallel group leader" as in other parts
of the docs (see wait events for example). Then we just need to say that
if leader_pid has the same value as pg_stat_activity.pid, then we have a
group leader. If not, then it is a parallel worker process initially
spawned by the leader whose PID is leader_pid (when executing Gather,
Gather Merge, the soon-to-be parallel vacuum, or a parallel btree build,
but this does not need a mention in the docs). There could be an
argument as well for having leader_pid set to NULL for a leader, but
that would be inconsistent with what the PGPROC entry reports for the
backend.

While looking at the code, I think that we could refactor things a bit
for raw_wait_event, wait_event_type and wait_event, which have some
duplicated code for backend and auxiliary processes. What about filling
in the wait event data after fetching the PGPROC entry, and also filling
in leader_pid for auxiliary processes? This does not matter now, and
perhaps it never will, but that would make the code much more
consistent.
--
Michael
Re: [Proposal] Global temporary tables
On 15.01.2020 16:10, 曾文旌(义从) wrote:

> > I do not see a principle difference here with the scenario where 50
> > sessions create a (local) temp table, populate it with a GB of data,
> > and create an index for it.
>
> I think the problem is that when one session completes the creation of
> the index on the GTT, it will trigger the other sessions to build
> their own local index of the GTT at a centralized time. This will
> consume a lot of hardware resources (CPU, IO, memory) in a short time,
> and the database service may even become slow, because 50 sessions are
> building the index. I think this is not what we expected.

First of all, creating an index for a GTT in one session doesn't
immediately initiate building indexes in all other sessions. Indexes are
built on demand. If a session is not using this GTT any more, then the
index will not be built for it at all. And if the GTT really is actively
used by all sessions, then building the index and using it to construct
an optimal execution plan is better than continuing to use a sequential
scan and reading all GTT data from disk.

And as I already mentioned, I do not see any principle difference in
resource consumption compared with the current usage of local temp
tables. If we have many sessions, each creating a temp table, populating
it with data, and building an index for it, then we will observe the
same CPU utilization and memory consumption as in the case of using a
GTT and creating an index for it.

Sorry, but I am still not convinced by your and Tomas's arguments. Yes,
building a GTT index may cause high memory consumption
(maintenance_work_mem * n_backends). But such consumption can be
observed without GTTs as well, and it has to be taken into account when
choosing a value for maintenance_work_mem. From my point of view it is
much more important to make the behavior of GTTs as compatible with
normal tables as possible. Also, from a database administration point of
view, the necessity of restarting sessions to make them use new indexes
seems very strange and inconvenient. Alternatively, a DBA can address
the problem of high memory consumption by adjusting
maintenance_work_mem, so this solution is more flexible.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Append with naive multiplexing of FDWs
Thank you very much for the testing of the patch, Ahsan!

At Wed, 15 Jan 2020 15:41:04 -0500, Bruce Momjian wrote in
> On Tue, Jan 14, 2020 at 02:37:48PM +0500, Ahsan Hadi wrote:
> > Summary
> > The patch is pretty good; it works well when there is little data
> > sent back to the parent node. The patch doesn't provide parallel FDW
> > scan; it ensures that child nodes can send data to the parent in
> > parallel, but the parent can only sequentially process the data from
> > the data nodes.

"Parallel scan" at the moment means multiple workers fetching unique
blocks from *one* table in an arbitrated manner. In this sense "parallel
FDW scan" would mean multiple local workers fetching unique bundles of
tuples from *one* foreign table, which means it runs on a single
session. That doesn't offer an advantage. If parallel query processing
worked in a worker-per-table mode, especially on partitioned tables,
maybe the current FDW would work without much modification. But I
believe asynchronous append over foreign tables in a single process is
far more resource-efficient and moderately faster than parallel append.

> > Provided there is no performance degradation for non-FDW append
> > queries, I would recommend considering this patch as an interim
> > solution while we are waiting for parallel FDW scan.
>
> Wow, these are very impressive results!

Thanks.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: SlabCheck leaks memory into TopMemoryContext
Hi,

On 2020-01-16 01:25:00 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2020-01-16 00:09:53 -0500, Tom Lane wrote:
> >> It's basically assuming that the memory management mechanism is sane,
> >> which makes the whole thing fundamentally circular, even if it's
> >> relying on some other context to be sane. Is there a way to do the
> >> checking without extra allocations?
>
> > Probably not easily.
>
> In the AllocSet code, we don't hesitate to expend extra space per-chunk
> for debug support:
>
> typedef struct AllocChunkData
> {
> ...
> #ifdef MEMORY_CONTEXT_CHECKING
> 	Size		requested_size;
> #endif
> ...
>
> I don't see a pressing reason why SlabContext couldn't do something
> similar, either per-chunk or per-context, whatever makes sense.

Well, what I suggested upthread was to just have two globals like

	int slabcheck_freechunks_size;
	bool *slabcheck_freechunks;

and realloc that to the larger size in SlabContextCreate() if necessary,
based on the computed chunksPerBlock? I thought you were asking whether
any additional memory could just be avoided...

Greetings,

Andres Freund
Re: SlabCheck leaks memory into TopMemoryContext
Andres Freund writes:
> On 2020-01-16 00:09:53 -0500, Tom Lane wrote:
>> It's basically assuming that the memory management mechanism is sane,
>> which makes the whole thing fundamentally circular, even if it's
>> relying on some other context to be sane. Is there a way to do the
>> checking without extra allocations?

> Probably not easily.

In the AllocSet code, we don't hesitate to expend extra space per-chunk
for debug support:

typedef struct AllocChunkData
{
...
#ifdef MEMORY_CONTEXT_CHECKING
	Size		requested_size;
#endif
...

I don't see a pressing reason why SlabContext couldn't do something
similar, either per-chunk or per-context, whatever makes sense.

			regards, tom lane
Re: SlabCheck leaks memory into TopMemoryContext
Hi,

On 2020-01-16 00:09:53 -0500, Tom Lane wrote:
> Andres Freund writes:
> > I just noticed that having a slab context around in an assertion build
> > leads to performance degrading and memory usage going up. A bit of
> > poking revealed that SlabCheck() doesn't free the freechunks it
> > allocates.
>
> > It's on its own obviously trivial to fix.
>
> It seems like having a context check function do new allocations
> is something to be avoided in the first place.

Yea, that's why I was wondering about doing the allocation during
context creation, for the largest size necessary of any context:

	/* bitmap of free chunks on a block */
	freechunks = palloc(slab->chunksPerBlock * sizeof(bool));

or at the very least using malloc(), rather than another context.

> It's basically assuming that the memory management mechanism is sane,
> which makes the whole thing fundamentally circular, even if it's
> relying on some other context to be sane. Is there a way to do the
> checking without extra allocations?

Probably not easily. I was wondering if the bitmap could be made denser
and allocated on the stack. It could actually use one bit for each
tracked chunk, instead of one byte. We would have to put in a clear
lower limit on the allowed chunk size, in relation to the block size, to
stay in a reasonable range.

Greetings,

Andres Freund
Re: relcache sometimes initially ignores has_generated_stored
At Wed, 15 Jan 2020 10:11:05 -0800, Andres Freund wrote in
> Hi,
>
> I think it's probably not relevant, but it confused me for a moment
> that RelationBuildTupleDesc() might set constr->has_generated_stored to
> true, but then throw away the constraint at the end, because nothing
> matches the
>
> 	/*
> 	 * Set up constraint/default info
> 	 */
> 	if (has_not_null || ndef > 0 ||
> 		attrmiss || relation->rd_rel->relchecks)
>
> test, i.e. there are no defaults.

It was as follows before 16828d5c02.

-	if (constr->has_not_null || ndef > 0 || relation->rd_rel->relchecks)

At that time TupleConstr had only the members defval, check and
has_not_null, other than subsidiary members. The condition apparently
checked all of the members. Then the commit added attrmiss to the
condition, since the corresponding member had been added to TupleConstr.

+	if (constr->has_not_null || ndef > 0 ||
+		attrmiss || relation->rd_rel->relchecks)

Later, fc22b6623b introduced has_generated_stored to TupleConstr but
didn't add the corresponding check.

> A quick assert confirms we do indeed pfree() constr in cases where
> has_generated_stored == true.
>
> I suspect that's just an intermediate catalog, however, e.g. when
> DefineRelation() does
> 	heap_create_with_catalog();
> 	CommandCounterIncrement();
> 	relation_open();
> 	AddRelationNewConstraints().
>
> It does still strike me as not great that we can get a different
> relcache entry, even if transient, depending on whether there are other
> reasons to create a TupleConstr. Say a NOT NULL column.
>
> I'm inclined to think we should just also check has_generated_stored in
> the if quoted above?

I agree with that. We could have a local boolean "has_any_constraint" to
merge them, but that would be overkill.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: remove some STATUS_* symbols
On Sat, Jan 11, 2020 at 08:14:17AM +0100, Peter Eisentraut wrote:
> OK, pushed as it was then.

Thanks, that looks fine. I am still not sure whether the second patch,
which adds an enum via ProcWaitStatus, improves the code readability
though, so my take would be to discard it for now. Perhaps others think
differently, I don't know.
--
Michael
Re: TRUNCATE on foreign tables
On Wed, Jan 15, 2020 at 11:33:07PM +0900, Kohei KaiGai wrote:
> On Wed, Jan 15, 2020 at 17:11 Michael Paquier wrote:
>> I have done a quick read through the patch. You have modified the
>> patch to pass down to the callback a list of relation OIDs to execute
>> one command for all, and there are tests for FKs, so that coverage
>> looks fine.
>>
>> Regression tests are failing with this patch:
>> -- TRUNCATE doesn't work on foreign tables, either directly or
>> recursively
>> TRUNCATE ft2; -- ERROR
>> -ERROR: "ft2" is not a table
>> +ERROR: foreign-data wrapper "dummy" has no handler
>> You visibly just need to update the output, because no handlers are
>> available for truncate in this case.
>
> What error message is better in this case? It does not print "ft2"
> anywhere, so the user may not notice that "ft2" is the source of the
> error. How about 'foreign table "ft2" does not support truncate'?

It sounds to me that this message is kind of right. This FDW "dummy"
does not have any FDW handler at all, so it complains about that.
Having no support for TRUNCATE is something that may happen after that.
Actually, this error message from your patch, used for an FDW which has
a handler but no TRUNCATE support, could be reworked:

+	if (!fdwroutine->ExecForeignTruncate)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("\"%s\" is not a supported foreign table",
+						relname)));

Something like "Foreign-data wrapper \"%s\" does not support TRUNCATE"?

>> + frels_list is a list of foreign tables that are
>> + connected to a particular foreign server; thus, these foreign tables
>> + should have identical foreign server ID
>> The list is built by the backend code, so that has to be true.
>>
>> + foreach (lc, frels_list)
>> + {
>> +     Relation	frel = lfirst(lc);
>> +     Oid		frel_oid = RelationGetRelid(frel);
>> +
>> +     if (server_id == GetForeignServerIdByRelId(frel_oid))
>> +     {
>> +         frels_list = foreach_delete_current(frels_list, lc);
>> +         curr_frels = lappend(curr_frels, frel);
>> +     }
>> + }
>> Wouldn't it be better to fill in a hash table for each server with a
>> list of relations?
>
> It's just a matter of preference. A temporary hash table with
> server-id and a list of foreign tables is an idea. Let me try to
> revise.

Thanks. It would not matter much for relations without inheritance
children, but when truncating a partition tree with many foreign tables
using various FDWs, that could matter performance-wise.
--
Michael
Re: [HACKERS] WAL logging problem in 9.4.3?
All the known defects are fixed.

At Wed, 25 Dec 2019 16:15:21 -0800, Noah Misch wrote in
> === Defect 3: storage.c checks size decrease of MAIN_FORKNUM only
>
> storage.c tracks only MAIN_FORKNUM in pendingsync->max_truncated. Is it
> possible for MAIN_FORKNUM to have a net size increase while FSM_FORKNUM
> has a net size decrease? I haven't tested, but this sequence seems
> possible:
>
> TRUNCATE
>   reduces MAIN_FORKNUM from 100 blocks to 0 blocks
>   reduces FSM_FORKNUM from 3 blocks to 0 blocks
> COPY
>   raises MAIN_FORKNUM from 0 blocks to 110 blocks
>   does not change FSM_FORKNUM
> COMMIT
>   should fsync, but wrongly chooses log_newpage_range() approach
>
> If that's indeed a problem, beside the obvious option of tracking every
> fork's max_truncated, we could convert max_truncated to a bool and use
> fsync anytime the relation experienced an mdtruncate(). (While
> FSM_FORKNUM is not critical for database operations, the choice to
> subject it to checksums entails protecting it here.) If that's not a
> problem, would you explain?

That causes a page-load failure, since the FSM can offer a nonexistent
heap block, and that failure leads to an ERROR for an SQL statement.
It's not critical, but surely a problem. I'd like to take the bool
option, because an insert-after-truncate sequence rarely happens. That
case is not the main target of the optimization, so it is enough for us
to make sure that the operation doesn't lead to such errors.

The attached is the nm30 patch followed by the three fix patches for the
three defects. The new member "RelationData.isremoved" is renamed to
"isdropped" in this version.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 3a811c76874ae3c596e138369766ad00888c572c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 21 Nov 2019 15:28:06 +0900
Subject: [PATCH v32 1/4] Rework WAL-skipping optimization

While wal_level=minimal we omit WAL-logging for certain operations on
relfilenodes that are created in the current transaction.
The files are fsynced at commit. The machinery accelerates bulk-insertion operations but it fails in certain sequence of operations and a crash just after commit may leave broken table files. This patch overhauls the machinery so that WAL-loggings on all operations are omitted for such relfilenodes. This patch also introduces a new feature that small files are emitted as a WAL record instead of syncing. The new GUC variable wal_skip_threshold controls the threshold. --- doc/src/sgml/config.sgml| 43 ++- doc/src/sgml/perform.sgml | 47 +-- src/backend/access/common/toast_internals.c | 4 +- src/backend/access/gist/gistutil.c | 31 +- src/backend/access/gist/gistxlog.c | 21 ++ src/backend/access/heap/heapam.c| 45 +-- src/backend/access/heap/heapam_handler.c| 22 +- src/backend/access/heap/rewriteheap.c | 21 +- src/backend/access/nbtree/nbtsort.c | 41 +-- src/backend/access/rmgrdesc/gistdesc.c | 5 + src/backend/access/transam/README | 45 ++- src/backend/access/transam/xact.c | 15 + src/backend/access/transam/xloginsert.c | 10 +- src/backend/access/transam/xlogutils.c | 18 +- src/backend/catalog/heap.c | 4 + src/backend/catalog/storage.c | 248 - src/backend/commands/cluster.c | 12 +- src/backend/commands/copy.c | 58 +-- src/backend/commands/createas.c | 11 +- src/backend/commands/matview.c | 12 +- src/backend/commands/tablecmds.c| 11 +- src/backend/storage/buffer/bufmgr.c | 125 ++- src/backend/storage/lmgr/lock.c | 12 + src/backend/storage/smgr/md.c | 36 +- src/backend/storage/smgr/smgr.c | 35 ++ src/backend/utils/cache/relcache.c | 159 +++-- src/backend/utils/misc/guc.c| 13 + src/include/access/gist_private.h | 2 + src/include/access/gistxlog.h | 1 + src/include/access/heapam.h | 3 - src/include/access/rewriteheap.h| 2 +- src/include/access/tableam.h| 15 +- src/include/catalog/storage.h | 6 + src/include/storage/bufmgr.h| 4 + src/include/storage/lock.h | 3 + src/include/storage/smgr.h | 1 + src/include/utils/rel.h | 57 ++- src/include/utils/relcache.h| 8 +- 
src/test/recovery/t/018_wal_optimize.pl | 374 src/test/regress/expected/alter_table.out | 6 + src/test/regress/sql/alter_table.sql| 7 + 41 files changed, 1242 insertions(+), 351 deletions(-) create mode 100644 src/test/recovery/t/018_wal_optimize.pl diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 5d45b6f7cb..0e7a0bc0ee 100644 --- a/doc/src/sgml/config.sgml +
Re: [HACKERS] Block level parallel vacuum
On Thu, Jan 16, 2020 at 10:11 AM Mahendra Singh Thalor wrote:
>
> On Thu, 16 Jan 2020 at 08:22, Amit Kapila wrote:
> >
> > > 2.
> > > I checked the time taken by the vacuum.sql test. Execution time is
> > > almost the same with and without the v45 patch.
> > >
> > > Without v45 patch:
> > > Run1) vacuum ... ok 701 ms
> > > Run2) vacuum ... ok 549 ms
> > > Run3) vacuum ... ok 559 ms
> > > Run4) vacuum ... ok 480 ms
> > >
> > > With v45 patch:
> > > Run1) vacuum ... ok 842 ms
> > > Run2) vacuum ... ok 808 ms
> > > Run3) vacuum ... ok 774 ms
> > > Run4) vacuum ... ok 792 ms
> >
> > I see some variance in the results; have you run with autovacuum off?
> > I was expecting that this might speed up some cases where parallel
> > vacuum is used by default.
>
> I think this is an expected difference in timing, because we are
> adding some vacuum-related tests. I am not starting the server
> manually (that is, I am starting the server with only the default
> settings).

Can you test once with autovacuum = off? Autovacuum leads to variability
in test timing.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: SlabCheck leaks memory into TopMemoryContext
Andres Freund writes:
> I just noticed that having a slab context around in an assertion build
> leads to performance degrading and memory usage going up. A bit of
> poking revealed that SlabCheck() doesn't free the freechunks it
> allocates.

> It's on its own obviously trivial to fix.

It seems like having a context check function do new allocations is
something to be avoided in the first place. It's basically assuming
that the memory management mechanism is sane, which makes the whole
thing fundamentally circular, even if it's relying on some other
context to be sane. Is there a way to do the checking without extra
allocations?

			regards, tom lane
Re: Minimal logical decoding on standbys
On Fri, 10 Jan 2020 at 17:50, Rahila Syed wrote:
>
> Hi Amit,
>
> Can you please rebase the patches as they don't apply on latest master?

Thanks for notifying. Attached is the rebased version.
SlabCheck leaks memory into TopMemoryContext
Hi Tomas,

I just noticed that having a slab context around in an assertion build
leads to performance degrading and memory usage going up. A bit of
poking revealed that SlabCheck() doesn't free the freechunks it
allocates.

It's on its own obviously trivial to fix. But there's also this note at
the top:

 * NOTE: report errors as WARNING, *not* ERROR or FATAL. Otherwise you'll
 * find yourself in an infinite loop when trouble occurs, because this
 * routine will be entered again when elog cleanup tries to release memory!

which seems to be violated by doing:

	/* bitmap of free chunks on a block */
	freechunks = palloc(slab->chunksPerBlock * sizeof(bool));

I guess it'd be better to fall back to malloc() here, and handle the
allocation failure gracefully? Or perhaps we can just allocate
something persistently during SlabCreate()?

Greetings,

Andres Freund
Re: [HACKERS] Block level parallel vacuum
On Thu, 16 Jan 2020 at 08:22, Amit Kapila wrote: > > On Thu, Jan 16, 2020 at 1:02 AM Mahendra Singh Thalor > wrote: > > > > On Wed, 15 Jan 2020 at 19:31, Mahendra Singh Thalor > > wrote: > > > > > > On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor > > > wrote: > > > > > > > > > > > > I reviewed v48 patch and below are some comments. > > > > > > > > 1. > > > > +* based on the number of indexes. -1 indicates a parallel vacuum > > > > is > > > > > > > > I think, above should be like "-1 indicates that parallel vacuum is" > > > > > > I am not an expert in this matter, but I am not sure if your > suggestion is correct. I thought an article is required here, but I > could be wrong. Can you please clarify? > > > > > 2. > > > > +/* Variables for cost-based parallel vacuum */ > > > > > > > > At the end of comment, there is 2 spaces. I think, it should be only 1 > > > > space. > > > > > > > > 3. > > > > I think, we should add a test case for parallel option(when degree is > > > > not specified). > > > > Ex: > > > > postgres=# VACUUM (PARALLEL) tmp; > > > > ERROR: parallel option requires a value between 0 and 1024 > > > > LINE 1: VACUUM (PARALLEL) tmp; > > > > ^ > > > > postgres=# > > > > > > > > Because above error is added in this parallel patch, so we should have > > > > test case for this to increase code coverage. > > > > > > I thought about it but was not sure to add a test for it. We might > not want to add a test for each and every case as that will increase > the number and time of tests without a significant advantage. Now > that you have pointed this, I can add a test for it unless someone > else thinks otherwise. > > > > > 1. > > With v45 patch, compute_parallel_delay is never called so function hit > > is zero. I think, we can add some delay options into vacuum.sql test > > to hit function. > > > > But how can we meaningfully test the functionality of the delay? It > would be tricky to come up with a portable test that can always > produce consistent results. 
> > > 2.
> > > I checked the time taken by the vacuum.sql test. Execution time is
> > > almost the same with and without the v45 patch.
> > >
> > > Without v45 patch:
> > > Run1) vacuum ... ok 701 ms
> > > Run2) vacuum ... ok 549 ms
> > > Run3) vacuum ... ok 559 ms
> > > Run4) vacuum ... ok 480 ms
> > >
> > > With v45 patch:
> > > Run1) vacuum ... ok 842 ms
> > > Run2) vacuum ... ok 808 ms
> > > Run3) vacuum ... ok 774 ms
> > > Run4) vacuum ... ok 792 ms
> >
> > I see some variance in the results; have you run with autovacuum off?
> > I was expecting that this might speed up some cases where parallel
> > vacuum is used by default.

I think this is an expected difference in timing, because we are adding
some vacuum-related tests. I am not starting the server manually (that
is, I am starting the server with only the default settings). If we
start the server with the default settings, then the vacuum-related
test cases will not go parallel, because the index relations are very
small, so we will not do a parallel vacuum.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema
On Fri, Jan 10, 2020 at 08:07:48PM +0900, Michael Paquier wrote:
> Thinking more about it, this has a race condition if a temporary
> schema is removed after collecting the OIDs in the drop phase. So the
> updated attached is actually much more conservative and does not need
> an update of the log message, without giving up on the improvements
> done in v11~. In 9.4~10, the code of the second phase relies on
> GetTempNamespaceBackendId(), which causes an orphaned relation to not
> be dropped in the event of a missing namespace. I'll just leave that
> alone for a couple of days now.

And back on that one, I still prefer the solution in the attached, which
skips any relation whose namespace has gone missing, as the intention of
246a6c87 was only to allow orphaned temp relations to be dropped by
autovacuum when a backend slot is connected but not yet using its own
temp namespace. If we want the drop of temp relations to work properly,
more thought is needed regarding the storage part, and I am not actually
sure that it is autovacuum's job to handle that better. Any thoughts?
--
Michael

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f0e40e36af..9eb8132a37 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2071,9 +2071,11 @@ do_autovacuum(void)
 		{
 			/*
 			 * We just ignore it if the owning backend is still active and
-			 * using the temporary schema.
+			 * using the temporary schema. If the namespace does not exist
+			 * ignore the entry.
 			 */
-			if (!isTempNamespaceInUse(classForm->relnamespace))
+			if (!isTempNamespaceInUse(classForm->relnamespace) &&
+				get_namespace_name(classForm->relnamespace) != NULL)
 			{
 				/*
 				 * The table seems to be orphaned -- although it might be that
@@ -2202,6 +2204,7 @@ do_autovacuum(void)
 		Oid			relid = lfirst_oid(cell);
 		Form_pg_class classForm;
 		ObjectAddress object;
+		char	   *nspname;

 		/*
 		 * Check for user-requested abort.
@@ -2243,7 +2246,15 @@ do_autovacuum(void)
 			continue;
 		}

-		if (isTempNamespaceInUse(classForm->relnamespace))
+		nspname = get_namespace_name(classForm->relnamespace);
+
+		/*
+		 * Nothing to do for a relation with a missing namespace. This
+		 * check is the same as above when building the list of orphan
+		 * relations.
+		 */
+		if (isTempNamespaceInUse(classForm->relnamespace) ||
+			nspname == NULL)
 		{
 			UnlockRelationOid(relid, AccessExclusiveLock);
 			continue;
@@ -2253,8 +2264,7 @@ do_autovacuum(void)
 		ereport(LOG,
 				(errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"",
 						get_database_name(MyDatabaseId),
-						get_namespace_name(classForm->relnamespace),
-						NameStr(classForm->relname))));
+						nspname, NameStr(classForm->relname))));

 		object.classId = RelationRelationId;
 		object.objectId = relid;
Re: Implementing Incremental View Maintenance
Aggregate operations on a user-defined type cannot be specified (commit e150d964df7e3aeb768e4bae35d15764f8abd284) A SELECT statement using the MIN() and MAX() functions can be executed on a user-defined type column that implements the aggregate functions MIN() and MAX(). However, if the same SELECT statement is specified in the AS clause of CREATE INCREMENTAL MATERIALIZED VIEW, the following error will occur. ``` SELECT MIN(data) data_min, MAX(data) data_max FROM foo; data_min | data_max --+-- 1/3 | 2/3 (1 row) CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data) data_min FROM foo; psql:extension-agg.sql:14: ERROR: aggregate function min is not supported CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data) data_max FROM foo; psql:extension-agg.sql:15: ERROR: aggregate function max is not supported ``` Are queries that include aggregate operations on user-defined types not supported by INCREMENTAL MATERIALIZED VIEW? An execution example is shown below. ``` [ec2-user@ip-10-0-1-10 ivm]$ cat extension-agg.sql -- -- pg_fraction: https://github.com/nuko-yokohama/pg_fraction -- DROP EXTENSION IF EXISTS pg_fraction CASCADE; DROP TABLE IF EXISTS foo CASCADE; CREATE EXTENSION IF NOT EXISTS pg_fraction; \dx \dT+ fraction CREATE TABLE foo (id int, data fraction); INSERT INTO foo (id, data) VALUES (1,'2/3'),(2,'1/3'),(3,'1/2'); SELECT MIN(data) data_min, MAX(data) data_max FROM foo; CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data) data_min FROM foo; CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data) data_max FROM foo; SELECT MIN(id) id_min, MAX(id) id_max FROM foo; CREATE INCREMENTAL MATERIALIZED VIEW foo_id_imv AS SELECT MIN(id) id_min, MAX(id) id_max FROM foo; ``` Best regards. On Thu, Dec 27, 2018 at 21:57, Yugo Nagata wrote: > Hi, > > I would like to implement Incremental View Maintenance (IVM) on > PostgreSQL.
> IVM is a technique to maintain materialized views which computes and > applies > only the incremental changes to the materialized views rather than > recomputing the contents as the current REFRESH command does. > > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > [1]. > Our implementation uses row OIDs to compute deltas for materialized > views. > The basic idea is that if we have information about which rows in base > tables > are contributing to generate a certain row in a matview then we can > identify > the affected rows when a base table is updated. This is based on an idea of > Dr. Masunaga [2], who is a member of our group, and inspired by the ID-based > approach [3]. > > In our implementation, the mapping of the row OIDs of the materialized view > and the base tables is stored in an "OID map". When a base relation is > modified, > an AFTER trigger is executed and the delta is recorded in delta tables using > the transition table feature. The actual update of the matview is triggered > by the REFRESH command with the INCREMENTALLY option. > > However, we realized problems with our implementation. First, WITH OIDS will > be removed as of PG12, so OIDs are no longer available. Besides this, it > would > be hard to implement this since it needs many changes to executor nodes to > collect base tables' OIDs during query execution. Also, the cost of > maintaining > the OID map would be high. > > For these reasons, we started to think about implementing IVM without relying on > OIDs > and did a bit more surveying. > > We also looked at Kevin Grittner's discussion [4] on incremental matview > maintenance. In this discussion, Kevin proposed to use the counting algorithm > [5] > to handle projection views (using DISTINCT) properly. This algorithm needs > an > additional system column, count_t, in materialized views and the delta tables > of > base tables. > > However, the discussion about IVM has now stopped, so we would like to > restart and > make progress on this.
> > > Through our PoC implementation and surveys, I think we need to consider at > least > the following for implementing IVM. > > 1. How to extract changes on base tables > > I think there would be at least two approaches for it. > > - Using transition tables in AFTER triggers > - Extracting changes from WAL using logical decoding > > In our PoC implementation, we used AFTER triggers and transition tables, > but using > logical decoding might be better from the standpoint of the performance of base > table > modification. > > If we can represent a change of UPDATE on a base table as query-like > rather than > OLD and NEW, it may be possible to update the materialized view directly > instead > of performing delete & insert. > > > 2. How to compute the delta to be applied to materialized views > > Essentially, IVM is based on relational algebra. Theoretically, changes on > base > tables are represented as deltas on this, like "R <- R + dR", and the > delta on > the materialized view is computed from the base table deltas based on "change > propagation equations". For implementation, we have to derive the
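The "change propagation equations" mentioned here are well covered in the IVM literature; as a reference point (these are textbook rules, not taken from the PoC patch), the insertion-delta rules for selection, projection, and join can be written as:

```latex
% Textbook change-propagation rules for insertion deltas (\Delta R, \Delta S):
\Delta\bigl(\sigma_p(R)\bigr) = \sigma_p(\Delta R)
\Delta\bigl(\pi_A(R)\bigr)    = \pi_A(\Delta R) \quad \text{(tracking multiplicities, e.g.\ a count\_t column)}
\Delta(R \bowtie S) = (\Delta R \bowtie S) \;\cup\; (R \bowtie \Delta S) \;\cup\; (\Delta R \bowtie \Delta S)
```

The join rule follows directly from expanding (R + ΔR) ⋈ (S + ΔS) and subtracting R ⋈ S; the counting algorithm's count_t column is what makes the projection rule correct under duplicates.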
Re: isTempNamespaceInUse() is incorrect with its handling of MyBackendId
On Tue, Jan 14, 2020 at 07:23:19AM +0900, Michael Paquier wrote: > Yes, I'd rather keep this routine in its simplest shape for now. If > the optimization makes sense, though in most cases it won't because it > just helps sessions to detect faster their own temp schema, then let's > do it. I'll let this patch aside for a couple of days to let others > comment on it, and if there are no objections, I'll commit the fix. > Thanks for the lookup! For the archive's sake: this has been fixed with ac5bdf6, down to 11. -- Michael
Re: Reorderbuffer crash during recovery
On Tue, Dec 31, 2019 at 11:35 AM vignesh C wrote: > > On Mon, Dec 30, 2019 at 11:17 AM Amit Kapila wrote: > > > > On Fri, Dec 27, 2019 at 8:37 PM Alvaro Herrera > > wrote: > > > > > > On 2019-Dec-27, vignesh C wrote: > > > > > > > I felt Amit's solution also solves the problem. Attached patch has the > > > > fix based on the solution proposed. > > > > Thoughts? > > > > > > This seems a sensible fix to me, though I didn't try to reproduce the > > > failure. > > > > > > > @@ -2472,6 +2457,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, > > > > ReorderBufferTXN *txn) > > > > } > > > > > > > > ReorderBufferSerializeChange(rb, txn, fd, change); > > > > + txn->final_lsn = change->lsn; > > > > dlist_delete(&change->node); > > > > ReorderBufferReturnChange(rb, change); > > > > > > Should this be done inside ReorderBufferSerializeChange itself, instead > > > of in its caller? > > > > > > > makes sense. But, I think we should add a comment specifying the > > reason why it is important to set final_lsn while serializing the > > change. > > Fixed > > > > Also, would it be sane to verify that the TXN > > > doesn't already have a newer final_lsn? Maybe as an Assert. > > > > > > > I don't think this is a good idea because we update the final_lsn with > > commit_lsn in ReorderBufferCommit after which we can try to serialize > > the remaining changes. Instead, we should update it only if the > > change_lsn value is greater than final_lsn. > > > > Fixed. > Thanks Alvaro & Amit for your suggestions. I have made the changes > based on your suggestions. Please find the updated patch for the same. > I have also verified the patch in back branches. A separate patch was > required for the Release-10 branch, attached as > 0001-Reorder-buffer-crash-while-aborting-old-transactions-REL_10.patch. > Thoughts? One minor comment. Otherwise, the patch looks fine to me.
+ /* + * We set final_lsn on a transaction when we decode its commit or abort + * record, but we never see those records for crashed transactions. To + * ensure cleanup of these transactions, set final_lsn to that of their + * last change; this causes ReorderBufferRestoreCleanup to do the right + * thing. Final_lsn would have been set with commit_lsn earlier when we + * decode it commit, no need to update in that case + */ + if (txn->final_lsn < change->lsn) + txn->final_lsn = change->lsn; /decode it commit,/decode its commit, -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: making the backend's json parser work in frontend code
On Wed, Jan 15, 2020 at 09:39:13PM -0500, Robert Haas wrote: > On Wed, Jan 15, 2020 at 6:40 PM Andres Freund wrote: >> It's not obvious why the better approach here wouldn't be to just have a >> very simple ereport replacement, that needs to be explicitly included >> from frontend code. It'd not be meaningfully harder, imo, and it'd >> require fewer adaptations, and it'd look more familiar. > > I agree that it's far from obvious that the hacks in the patch are > best; to the contrary, they are hacks. That said, I feel that the > semantics of throwing an error are not very well-defined in a > front-end environment. I mean, in a backend context, throwing an error > is going to abort the current transaction, with all that this implies. > If the frontend equivalent is to do nothing and hope for the best, I > doubt it will survive anything more than the simplest use cases. This > is one of the reasons I've been very reluctant to go down this > whole path in the first place. Error handling is a well-defined concept in the backend. If connected to a database, you know that a session has to roll back any existing activity, etc. The clients have to be more flexible, because how an error should be handled depends a lot on how the tool is designed and how it should react to an error. So the backend code in charge of logging an error does the best it can: it throws an error, then lets the caller decide what to do with it. I agree with the feeling that having a simple replacement for ereport() in the frontend would be nice, as that would mean less code churn in the parts shared by backend and frontend. > Not sure how. pg_mblen() and PQmblen() are both existing interfaces, > and they're not compatible with each other. I guess we could make > PQmblen() available to backend code, but given that the function name > implies an origin in libpq, that seems wicked confusing. Well, the problem here is the encoding part, and the code looks at the same table pg_wchar_table[] at the end, so this needs some thought.
On top of that, on the client side we don't know exactly what kind of encoding is available (this led for example to several assumptions/hiccups behind the implementation of SCRAM, as it requires UTF-8 per its RFC, when working on the libpq part). -- Michael
Re: Setting min/max TLS protocol in clientside libpq
On Wed, Jan 15, 2020 at 02:58:09PM +0900, Michael Paquier wrote: > On Tue, Jan 14, 2020 at 11:01:00PM +0100, Daniel Gustafsson wrote: >> Files renamed to match existing naming convention, the rest of the patch left >> unchanged. > > [previous review] One thing I remembered after sleeping on it is that we can split the patch into two parts: the refactoring pieces and the addition of the options for libpq. The previous review mostly impacts the libpq part, and the split is straight-forward, so attached is a patch for only the refactoring pieces with some fixes and tweaks. I have tested it with and without OpenSSL, using 1.0.2 and 1.1.0 on Linux and Windows (MSVC). Those tests have allowed me to find an error in the previous patch that I missed: the new files openssl.h and protocol_openssl.c still declared SSL_CTX_set_min/max_proto_version as static functions, so compilation was broken when trying to use OpenSSL <= 1.0.2. If that looks fine, I would like to get that part committed first. Daniel, any thoughts? 
-- Michael diff --git a/src/include/common/openssl.h b/src/include/common/openssl.h new file mode 100644 index 00..47fa129994 --- /dev/null +++ b/src/include/common/openssl.h @@ -0,0 +1,28 @@ +/*- + * + * openssl.h + * OpenSSL supporting functionality shared between frontend and backend + * + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * IDENTIFICATION + * src/include/common/openssl.h + * + *- + */ +#ifndef COMMON_OPENSSL_H +#define COMMON_OPENSSL_H + +#ifdef USE_OPENSSL +#include + +/* src/common/protocol_openssl.c */ +#ifndef SSL_CTX_set_min_proto_version +extern int SSL_CTX_set_min_proto_version(SSL_CTX *ctx, int version); +extern int SSL_CTX_set_max_proto_version(SSL_CTX *ctx, int version); +#endif + +#endif + +#endif /* COMMON_OPENSSL_H */ diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c index 62f1fcab2b..0cc59f1be1 100644 --- a/src/backend/libpq/be-secure-openssl.c +++ b/src/backend/libpq/be-secure-openssl.c @@ -36,6 +36,7 @@ #include #endif +#include "common/openssl.h" #include "libpq/libpq.h" #include "miscadmin.h" #include "pgstat.h" @@ -69,11 +70,6 @@ static bool ssl_is_server_start; static int ssl_protocol_version_to_openssl(int v, const char *guc_name, int loglevel); -#ifndef SSL_CTX_set_min_proto_version -static int SSL_CTX_set_min_proto_version(SSL_CTX *ctx, int version); -static int SSL_CTX_set_max_proto_version(SSL_CTX *ctx, int version); -#endif - /* */ /* Public interface */ @@ -1314,96 +1310,3 @@ ssl_protocol_version_to_openssl(int v, const char *guc_name, int loglevel) GetConfigOption(guc_name, false, false; return -1; } - -/* - * Replacements for APIs present in newer versions of OpenSSL - */ -#ifndef SSL_CTX_set_min_proto_version - -/* - * OpenSSL versions that support TLS 1.3 shouldn't get here because they - * already have these functions. 
So we don't have to keep updating the below - * code for every new TLS version, and eventually it can go away. But let's - * just check this to make sure ... - */ -#ifdef TLS1_3_VERSION -#error OpenSSL version mismatch -#endif - -static int -SSL_CTX_set_min_proto_version(SSL_CTX *ctx, int version) -{ - int ssl_options = SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3; - - if (version > TLS1_VERSION) - ssl_options |= SSL_OP_NO_TLSv1; - /* - * Some OpenSSL versions define TLS*_VERSION macros but not the - * corresponding SSL_OP_NO_* macro, so in those cases we have to return - * unsuccessfully here. - */ -#ifdef TLS1_1_VERSION - if (version > TLS1_1_VERSION) - { -#ifdef SSL_OP_NO_TLSv1_1 - ssl_options |= SSL_OP_NO_TLSv1_1; -#else - return 0; -#endif - } -#endif -#ifdef TLS1_2_VERSION - if (version > TLS1_2_VERSION) - { -#ifdef SSL_OP_NO_TLSv1_2 - ssl_options |= SSL_OP_NO_TLSv1_2; -#else - return 0; -#endif - } -#endif - - SSL_CTX_set_options(ctx, ssl_options); - - return 1; /* success */ -} - -static int -SSL_CTX_set_max_proto_version(SSL_CTX *ctx, int version) -{ - int ssl_options = 0; - - AssertArg(version != 0); - - /* - * Some OpenSSL versions define TLS*_VERSION macros but not the - * corresponding SSL_OP_NO_* macro, so in those cases we have to return - * unsuccessfully here. - */ -#ifdef TLS1_1_VERSION - if (version < TLS1_1_VERSION) - { -#ifdef SSL_OP_NO_TLSv1_1 - ssl_options |= SSL_OP_NO_TLSv1_1; -#else - return 0; -#endif - } -#endif -#ifdef TLS1_2_VERSION - if (version < TLS1_2_VERSION) - { -#ifdef SSL_OP_NO_TLSv1_2 - ssl_options |= SSL_OP_NO_TLSv1_2; -#else - return 0; -#endif - } -#endif - - SSL_CTX_set_options(ctx, ssl_options); - - return 1; /* success */ -} - -#endif /* !SSL_CTX_set_min_
Re: aggregate crash
Hi, On 2020-01-15 12:47:47 -0800, Andres Freund wrote: > FWIW, I'm working on narrowing it down to something small. I can > reliably trigger the bug, and I understand the mechanics, I > think. Interestingly enough the reproducer currently only triggers on > v12, not on v11 and before. That's just happenstance due to allocation changes in plpgsql, though. The attached small reproducer, for me, reliably triggers crashes on 10 - master. It's hard to hit intentionally, because plpgsql does a datumCopy() to its non-null return value, which means that to hit the bug, one needs different numbers of allocations between setting up the transition value with transvalueisnull = true, transvalue = 0xsomepointer (because plpgsql doesn't copy NULLs), and the transition output with transvalueisnull = false, transvalue = 0xsomepointer. Which is necessary to trigger the bug, as it's then not reparented into a long lived enough context. To be then freed/accessed for the next group input value. I think this is too finnicky to actually keep as a regression test. The bug, in a way, exists all the way back, but it's a bit harder to create NULL values where the datum component isn't 0. 
To fix this, I suggest we, in all branches, do the equivalent of adding something like: diff --git i/src/backend/executor/execExprInterp.c w/src/backend/executor/execExprInterp.c index 790380051be..3260a63ac6b 100644 --- i/src/backend/executor/execExprInterp.c +++ w/src/backend/executor/execExprInterp.c @@ -4199,6 +4199,12 @@ ExecAggTransReparent(AggState *aggstate, AggStatePerTrans pertrans, pertrans->transtypeByVal, pertrans->transtypeLen); } +else +{ +/* ensure datum component is 0 for NULL transition values */ +newValue = (Datum) 0; +} + if (!oldValueIsNull) { if (DatumIsReadWriteExpandedObject(oldValue, and a comment explaining why it's (now) safe to rely on datum comparisons for if (DatumGetPointer(newVal) != DatumGetPointer(pergroup->transValue)) I don't think it makes sense to add something like it to the byval case - there are plenty of other ways a function returning != 0 with fcinfo->isnull == true can cause such values to exist. And that's longstanding. A separate question is whether it's worth adding code to e.g. EEO_CASE(EEOP_FUNCEXPR_STRICT) also resetting *op->resvalue to (Datum) 0. I personally don't think ensuring the datum is always 0 when isnull is true is all that helpful, if we can't guarantee it everywhere. So I'm a bit loath to add cycles to places that don't need it, and are hot. Regards, Andres
Re: pgindent && weirdness
On Wed, Jan 15, 2020 at 11:30 AM Tom Lane wrote: > Alvaro Herrera writes: > > I just ran pgindent over some patch, and noticed that this hunk ended up > > in my working tree: > > > - if (IsA(leftop, Var) && IsA(rightop, Const)) > > + if (IsA(leftop, Var) &&IsA(rightop, Const)) > > Yeah, it's been doing that for decades. I think the triggering > factor is the typedef name (Var, here) preceding the &&. > > It'd be nice to fix properly, but I've tended to take the path > of least resistance by breaking such lines to avoid the ugliness: > > if (IsA(leftop, Var) && > IsA(rightop, Const)) I am on vacation away from the Internet this week but somehow saw this on my phone and couldn't stop myself from peeking at pg_bsd_indent again. Yeah, "(Var)" (where Var is a known typename) causes it to think that any following operator must be unary. One way to fix that in the cases Alvaro is referring to is to override the setting so that && (and likewise ||) are never considered to be unary, though I haven't tested this much and there are surely other ways to achieve this: diff --git a/lexi.c b/lexi.c index d43723c..6de3227 100644 --- a/lexi.c +++ b/lexi.c @@ -655,6 +655,12 @@ stop_lit: unary_delim = state->last_u_d; break; } + + /* && and || are never unary */ + if ((token[0] == '&' && *buf_ptr == '&') || + (token[0] == '|' && *buf_ptr == '|')) + state->last_u_d = false; + while (*(e_token - 1) == *buf_ptr || *buf_ptr == '=') { /* * handle ||, &&, etc, and also things as in int *i The problem with that is that && sometimes *should* be formatted like a unary operator: when it's part of the nonstandard GCC computed goto syntax.
Re: [HACKERS] Block level parallel vacuum
On Thu, Jan 16, 2020 at 1:02 AM Mahendra Singh Thalor wrote: > > On Wed, 15 Jan 2020 at 19:31, Mahendra Singh Thalor > wrote: > > > > On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor > > wrote: > > > > > > > > > I reviewed v48 patch and below are some comments. > > > > > > 1. > > > +* based on the number of indexes. -1 indicates a parallel vacuum is > > > > > > I think, above should be like "-1 indicates that parallel vacuum is" > > > I am not an expert in this matter, but I am not sure if your suggestion is correct. I thought an article is required here, but I could be wrong. Can you please clarify? > > > 2. > > > +/* Variables for cost-based parallel vacuum */ > > > > > > At the end of comment, there is 2 spaces. I think, it should be only 1 > > > space. > > > > > > 3. > > > I think, we should add a test case for parallel option(when degree is not > > > specified). > > > Ex: > > > postgres=# VACUUM (PARALLEL) tmp; > > > ERROR: parallel option requires a value between 0 and 1024 > > > LINE 1: VACUUM (PARALLEL) tmp; > > > ^ > > > postgres=# > > > > > > Because above error is added in this parallel patch, so we should have > > > test case for this to increase code coverage. > > > I thought about it but was not sure to add a test for it. We might not want to add a test for each and every case as that will increase the number and time of tests without a significant advantage. Now that you have pointed this, I can add a test for it unless someone else thinks otherwise. > > 1. > With v45 patch, compute_parallel_delay is never called so function hit > is zero. I think, we can add some delay options into vacuum.sql test > to hit function. > But how can we meaningfully test the functionality of the delay? It would be tricky to come up with a portable test that can always produce consistent results. > 2. > I checked time taken by vacuum.sql test. Execution time is almost same > with and without v45 patch. > > Without v45 patch: > Run1) vacuum ... 
ok 701 ms > Run2) vacuum ... ok 549 ms > Run3) vacuum ... ok 559 ms > Run4) vacuum ... ok 480 ms > > With v45 patch: > Run1) vacuum ... ok 842 ms > Run2) vacuum ... ok 808 ms > Run3) vacuum ... ok 774 ms > Run4) vacuum ... ok 792 ms > I see some variance in the results; have you run with autovacuum off? I was expecting that this might speed up some cases where parallel vacuum is used by default. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: making the backend's json parser work in frontend code
On Wed, Jan 15, 2020 at 6:40 PM Andres Freund wrote: > It's not obvious why the better approach here wouldn't be to just have a > very simple ereport replacement, that needs to be explicitly included > from frontend code. It'd not be meaningfully harder, imo, and it'd > require fewer adaptations, and it'd look more familiar. I agree that it's far from obvious that the hacks in the patch are best; to the contrary, they are hacks. That said, I feel that the semantics of throwing an error are not very well-defined in a front-end environment. I mean, in a backend context, throwing an error is going to abort the current transaction, with all that this implies. If the frontend equivalent is to do nothing and hope for the best, I doubt it will survive anything more than the simplest use cases. This is one of the reasons I've been very reluctant to go down this whole path in the first place. > > +#ifdef FRONTEND > > + lex->token_terminator = s + > > PQmblen(s, PG_UTF8); > > +#else > > lex->token_terminator = s + > > pg_mblen(s); > > +#endif > > If we were to go this way, it seems like the ifdef should rather be in a > helper function, rather than all over. Sure... like I said, this is just to illustrate the problem. > It seems like it should be > unproblematic to have a common interface for both frontend/backend? Not sure how. pg_mblen() and PQmblen() are both existing interfaces, and they're not compatible with each other. I guess we could make PQmblen() available to backend code, but given that the function name implies an origin in libpq, that seems wicked confusing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: src/test/recovery regression failure on bionic
On Thu, Jan 16, 2020 at 2:10 AM Christoph Berg wrote: > > Re: Amit Kapila 2020-01-09 > > > The point is that we know what is going wrong on sidewinder on back > > branches. However, we still don't know what is going wrong with tern > > and mandrill on v10 [1][2] where the log is: > > Fwiw, the problem on bionic disappeared yesterday with the build > triggered by "Revert test added by commit d207038053". > Thanks for the update. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Improve errors when setting incorrect bounds for SSL protocols
On Wed, Jan 15, 2020 at 06:34:39PM +0100, Daniel Gustafsson wrote: > This is pretty much exactly the patch I was intending to write for this, so +1 > from me. Thanks for the review. Let's wait a couple of days to see if others have objections or more comments about this patch, but I'd like to fix the issue and backpatch down to 12 where the parameters have been introduced. -- Michael
Re: [PATCH v1] pg_ls_tmpdir to show directories
On Wed, Jan 15, 2020 at 11:21:36AM +0100, Fabien COELHO wrote: > I'm trying to think about how to get rid of the strange structure and hacks, > and the arbitrary looking size 2 array. > > Also the recursion is one step, but I'm not sure why, ISTM it could/should > go on always? Because tmpfiles only go one level deep. > Looking at the code, ISTM that relying on a stack/list would be much cleaner > and easier to understand. The code could look like: I'm willing to change the implementation, but only after there's an agreement about the desired behavior (extra column, one level, etc). Justin
Re: Use compiler intrinsics for bit ops in hash
On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote: > > The changes in hash AM and SIMPLEHASH do look like a net positive > > improvement. My biggest cringe might be in pg_bitutils: > > > > 1. Is ceil_log2_64 dead code? > > Let's call it nascent code. I suspect there are places it could go, if > I look for them. Also, it seemed silly to have one without the other. > While not absolutely required, I'd like us to find at least one place and start using it. (Clang also nags at me when we have unused functions). > On Tue, Jan 14, 2020 at 12:21:41PM -0800, Jesse Zhang wrote: > > 4. It seems like you *really* would like an operation like LZCNT in x86 > > (first appearing in Haswell) that is well defined on zero input. ISTM > > the alternatives are: > > > >a) Special case 1. That seems straightforward, but the branching cost > >on a seemingly unlikely condition seems to be a lot of performance > >loss > > > >b) Use architecture specific intrinsic (and possibly with CPUID > >shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ > >intrinsic elsewhere. The CLZ GCC intrinsic seems to map to > >instructions that are well defined on zero in most ISA's other than > >x86, so maybe we can get away with special-casing x86? i. We can detect the LZCNT instruction by checking one of the "extended feature" (EAX=0x80000001) bits using CPUID. Unlike the "basic features" (EAX=1), extended feature flags have been more vendor-specific, but fortunately it seems that the feature bit for LZCNT is the same [1][2]. ii. We'll most likely still need to provide a fallback implementation for processors that don't have LZCNT (either because they are from a different vendor, or an older Intel/AMD processor). I wonder if simply checking for 1 is "good enough". Maybe a micro benchmark is in order? > Is there some way to tilt the tools so that this happens? We have a couple options here: 1. Use a separate object (a la our SSE 4.2 implementation of CRC).
On Clang and GCC (I don't have MSVC at hand), -mabm or -mlzcnt should cause __builtin_clz to generate the LZCNT instruction, which is well defined on zero input. The default configuration would translate __builtin_clz to code that subtracts BSR from the width of the input, but BSR leaves the destination undefined on zero input. 2. (My least favorite) use inline asm (a la our popcount implementation). > b) seems much more attractive. Is there some way to tilt the tools so > that this happens? What should I be reading up on? The enclosed references hopefully are good places to start. Let me know if you have more ideas. Cheers, Jesse References: [1] "How to detect New Instruction support in the 4th generation Intel® Core™ processor family" https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family [2] "Bit Manipulation Instruction Sets" https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets
Re: making the backend's json parser work in frontend code
Hi, On 2020-01-15 16:02:49 -0500, Robert Haas wrote: > The discussion on the backup manifest thread has gotten bogged down on > the issue of the format that should be used to store the backup > manifest file. I want something simple and ad-hoc; David Steele and > Stephen Frost prefer JSON. That is problematic because our JSON parser > does not work in frontend code, and I want to be able to validate a > backup against its manifest, which involves being able to parse the > manifest from frontend code. The latest development over there is that > David Steele has posted the JSON parser that he wrote for pgbackrest > with an offer to try to adapt it for use in front-end PostgreSQL code, > an offer which I genuinely appreciate. I'll write more about that over > on that thread. I'm not sure where I come down between using json and a simple ad-hoc format, when the dependency for the former is making the existing json parser work in the frontend. But if the alternative is to add a second json parser, it very clearly shifts towards using an ad-hoc format. Having to maintain a simple ad-hoc parser is a lot less technical debt than having a second full blown json parser. Imo even when an external project or three also has to have that simple parser. If the alternative were to use that newly proposed json parser to *replace* the backend one too, the story would again be different. > 0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm > missing something, this seems like an overdue cleanup. It's long been > the case that wchar.c is actually compiled and linked into both > frontend and backend code. Commit > 60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common > that depends on wchar.c being available, but didn't actually make > wchar.c part of src/common, which seems like an odd decision: the > functions in the library are dependent on code that is not part of any > library but whose source files get copied around where needed. Eh? Cool. 
> 0002 does some basic header cleanup to make it possible to include the > existing header file jsonapi.h in frontend code. The state of the JSON > headers today looks generally poor. There seems not to have been much > attempt to get the prototypes for a given source file, say foo.c, into > a header file with the same name, say foo.h. Also, dependencies > between various header files seem to be have added somewhat freely. > This patch does not come close to fixing all that, but I consider it a > modest down payment on a cleanup that probably ought to be taken > further. Yea, this seems like a necessary cleanup (or well, maybe the start of it). > 0003 splits json.c into two files, json.c and jsonapi.c. All the > lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into > jsonapi.c, while the stuff that pertains to the 'json' data type > remains in json.c. This also seems like a good cleanup, because to me, > at least, it's not a great idea to mix together code that is used by > both the json and jsonb data types as well as other things in the > system that want to generate or parse json together with things that > are specific to the 'json' data type. +1 > On the other hand, 0004, 0005, and 0006 are charitably described as > experimental or WIP. 0004 and 0005 hack up jsonapi.c so that it can > still be compiled even if #include "postgres.h" is changed to #include > "postgres-fe.h" and 0006 moves it into src/common. Note that I say > that they make it compile, not work. It's not just untested; it's > definitely broken. But it gives a feeling for what the remaining > obstacles to making this code available in a frontend environment are. > Since I wrote my very first email complaining about the difficulty of > making the backend's JSON parser work in a frontend environment, one > obstacle has been knocked down: StringInfo is now available in > front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). 
> The remaining problems (that I know about) have to do with error reporting
> and multibyte character support; a read of the patches is suggested
> for those wanting further details.

> From d05e1fc82a51cb583a0367e72b1afc0de561dd00 Mon Sep 17 00:00:00 2001
> From: Robert Haas
> Date: Wed, 15 Jan 2020 10:36:52 -0500
> Subject: [PATCH 4/6] Introduce json_error() macro.
>
> ---
>  src/backend/utils/adt/jsonapi.c | 221 +---
>  1 file changed, 90 insertions(+), 131 deletions(-)
>
> diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
> index fc8af9f861..20f7f0f7ac 100644
> --- a/src/backend/utils/adt/jsonapi.c
> +++ b/src/backend/utils/adt/jsonapi.c
> @@ -17,6 +17,9 @@
>  #include "miscadmin.h"
>  #include "utils/jsonapi.h"
>
> +#define json_error(rest) \
> +	ereport(ERROR, (rest, report_json_context(lex)))
> +

It's not obvious why the better approach here wouldn't be to just have a very simple ereport replacement, that needs to be explicitly included from frontend
Enabling B-Tree deduplication by default
There are some outstanding questions about how B-Tree deduplication [1] should be configured, and whether or not it should be enabled by default. I'm starting this new thread in the hopes of generating discussion on these high level questions. The commit message of the latest version of the patch has a reasonable summary of its overall design, that might be worth reviewing before reading on: https://www.postgresql.org/message-id/CAH2-Wz=Tr6mxMsKRmv_=9-05_o9qwqozq8gwerv2dxs6+y3...@mail.gmail.com (If you want to go deeper than that, check out the nbtree README changes.) B-Tree deduplication comes in two basic varieties: * Unique index deduplication. * Non-unique index deduplication. Each variety works in essentially the same way. However, they differ in how and when deduplication is actually applied. We can infer quite a lot about versioning within a unique index, and we know that version churn is the only problem that we should try to address -- it's not really about space efficiency, since we know that there won't be any duplicates in the long run. Also, workloads that have a lot of version churn in unique indexes are generally quite reliant on LP_DEAD bit setting within _bt_check_unique() (this is not to be confused with the similar kill_prior_tuple LP_DEAD bit optimization) -- we need to be sensitive to that. Despite the fact that there are many more similarities than differences here, I would like to present each variety as a different thing to users (actually, I don't really want users to have to think about unique index deduplication at all). I believe that the best path forward for users is to make deduplication in non-unique indexes a user-visible feature with documented knobs (a GUC and an index storage parameter), while leaving unique index deduplication as an internal thing that is barely documented in the sgml user docs (I'll have a paragraph about it in the B-Tree internals chapter). 
Unique index deduplication won't care about the GUC, and will only trigger a deduplication pass when the incoming tuple is definitely a duplicate of an existing tuple (this signals version churn cheaply but reliably). The index storage parameter will be respected with unique indexes, but only as a debugging option -- our presumption is that nobody will want to disable deduplication in unique indexes, since leaving it on has virtually no downside (because we know exactly when to trigger it, and when not to). Unique index deduplication is an internal thing that users benefit from, but aren't really aware of, much like opportunistic deletion of LP_DEAD items. (A secondary benefit of this approach is that we don't have to have an awkward section in the documentation that explains why deduplication in unique indexes isn't actually an oxymoron.)

Thoughts on presenting unique index deduplication as a separate internal-only optimization, one that doesn't care about the GUC?

I also want to talk about a related but separate topic. I propose that the deduplicate_btree_items GUC (which only affects non-unique indexes) be turned on by default in Postgres 13. This can be reviewed at the end of the beta period, just like it was with parallelism and with JIT. Note that this means that we'd opportunistically perform a deduplication pass at the point that we usually have to split the page, even though in general there is no reason to think that that will work out. (We have no better way of applying deduplication than "try it and see", unless it's a unique index that has the version churn trigger heuristic that I just described.) Append-only tables with multiple non-unique indexes that happen to have only unique key values will pay a cost without seeing a benefit. I believe that I saw a 3% throughput cost when I assessed this using something called the "insert benchmark" [2] a couple of months ago (this is maintained by Mark Callaghan, the Facebook MyRocks guy).
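For reference, under this proposal the user-facing knobs for non-unique indexes might look roughly like the following. The GUC name is the one proposed in this thread; the storage parameter name is hypothetical, since the thread mentions such a parameter without naming it:

```sql
-- GUC proposed in the thread; under this proposal it would default to on in v13
SET deduplicate_btree_items = off;

-- hypothetical per-index override via a storage parameter
CREATE INDEX foo_idx ON foo (bar) WITH (deduplicate_items = off);
```

For unique indexes, per the above, neither knob would normally matter: deduplication there is triggered internally and only when an incoming tuple is definitely a duplicate.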
I need to do more work on quantifying the cost with a recent version of the patch, especially the cost of only having one LP_DEAD bit for an entire posting list tuple, but in general I think that this is likely to be worth it. Consider these points in favor of enabling deduplication by default: * This insert-only workload is something that Postgres does particularly well with -- append-only tables are reported to have greater throughput in Postgres than in other comparable RDBMSs. Presumably this is because we don't need UNDO logs, and don't do index key locking in the same way (apparently even Oracle has locks in indexes that protect logical transaction state). We are paying a small cost by enabling deduplication by default, but it is paid in an area where we already do particularly well. * Deduplication can be turned off at any time. nbtree posting list tuples are really not like the ones in GIN, in that we're not maintaining them over time. A deduplication pass fundamentally works by generating an alternative physical representation for the same logical contents --
Re: making the backend's json parser work in frontend code
Robert Haas writes: > ... However, I decided to spend today doing some further > investigation of an alternative approach, namely making the backend's > existing JSON parser work in frontend code as well. I did not solve > all the problems there, but I did come up with some patches which I > think would be worth committing on independent grounds, and I think > the whole series is worth posting. So here goes. In general, if we can possibly get to having one JSON parser in src/common, that seems like an obviously better place to be than having two JSON parsers. So I'm encouraged that it might be feasible after all. > 0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm > missing something, this seems like an overdue cleanup. FWIW, I've been wanting to do that for awhile. I've not studied your patch, but +1 for the idea. We might also need to take a hard look at mbutils.c to see if any of that code can/should move. > Since I wrote my very first email complaining about the difficulty of > making the backend's JSON parser work in a frontend environment, one > obstacle has been knocked down: StringInfo is now available in > front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The > remaining problems (that I know about) have to do with error reporting > and multibyte character support; a read of the patches is suggested > for those wanting further details. The patch I just posted at <2863.1579127...@sss.pgh.pa.us> probably affects this in small ways, but not anything major. regards, tom lane
Re: Unicode escapes with any backend encoding
I wrote: > Andrew Dunstan writes: >> Perhaps I expressed myself badly. What I meant was that we should keep >> the json and text escape rules in sync, as they are now. Since we're >> changing the text rules to allow resolvable non-ascii unicode escapes >> in non-utf8 locales, we should do the same for json. > Got it. I'll make the patch do that in a little bit. OK, here's v2, which brings JSONB into the fold and also makes some effort to produce an accurate error cursor for invalid Unicode escapes. As it's set up, we only pay the extra cost of setting up an error context callback when we're actually processing a Unicode escape, so I think that's an acceptable cost. (It's not much of a cost, anyway.) The callback support added here is pretty much a straight copy-and-paste of the existing functions setup_parser_errposition_callback() and friends. That's slightly annoying --- we could perhaps merge those into one. But I didn't see a good common header to put such a thing into, so I just did it like this. Another note is that we could use the additional scanner infrastructure to produce more accurate error pointers for other cases where we're whining about a bad escape sequence, or some other sub-part of a lexical token. I think that'd likely be a good idea, since the existing cursor placement at the start of the token isn't too helpful if e.g. you're dealing with a very long string constant. But to keep this focused, I only touched the behavior for Unicode escapes. The rest could be done as a separate patch. This also mops up after 7f380c59 by making use of the new pg_wchar.c exports is_utf16_surrogate_first() etc everyplace that they're relevant (which is just the JSON code I was touching anyway, as it happens). I also made a bit of an effort to ensure test coverage of all the code touched in that patch and this one. 
regards, tom lane diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml index 6ff8751..0f0d0c6 100644 --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -61,8 +61,8 @@ - PostgreSQL allows only one character set - encoding per database. It is therefore not possible for the JSON + RFC 7159 specifies that JSON strings should be encoded in UTF8. + It is therefore not possible for the JSON types to conform rigidly to the JSON specification unless the database encoding is UTF8. Attempts to directly include characters that cannot be represented in the database encoding will fail; conversely, @@ -77,13 +77,13 @@ regardless of the database encoding, and are checked only for syntactic correctness (that is, that four hex digits follow \u). However, the input function for jsonb is stricter: it disallows - Unicode escapes for non-ASCII characters (those above U+007F) - unless the database encoding is UTF8. The jsonb type also + Unicode escapes for characters that cannot be represented in the database + encoding. The jsonb type also rejects \u (because that cannot be represented in PostgreSQL's text type), and it insists that any use of Unicode surrogate pairs to designate characters outside the Unicode Basic Multilingual Plane be correct. Valid Unicode escapes - are converted to the equivalent ASCII or UTF8 character for storage; + are converted to the equivalent single character for storage; this includes folding surrogate pairs into a single character. @@ -96,9 +96,8 @@ not jsonb. The fact that the json input function does not make these checks may be considered a historical artifact, although it does allow for simple storage (without processing) of JSON Unicode - escapes in a non-UTF8 database encoding. In general, it is best to - avoid mixing Unicode escapes in JSON with a non-UTF8 database encoding, - if possible. + escapes in a database encoding that does not support the represented + characters. 
@@ -144,8 +143,8 @@ string text -\u is disallowed, as are non-ASCII Unicode - escapes if database encoding is not UTF8 +\u is disallowed, as are Unicode escapes + representing characters not available in the database encoding number diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index c908e0b..e134877 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -189,6 +189,23 @@ UPDATE "my_table" SET "a" = 5; ampersands. The length limitation still applies. + +Quoting an identifier also makes it case-sensitive, whereas +unquoted names are always folded to lower case. For example, the +identifiers FOO, foo, and +"foo" are considered the same by +PostgreSQL, but +"Foo" and "FOO" are +different from these three and each other. (The folding of +unquoted names to lower case in PostgreSQL is +incompatible with the SQL standard, which says that unquoted names +should be folded to upper case.
making the backend's json parser work in frontend code
The discussion on the backup manifest thread has gotten bogged down on the issue of the format that should be used to store the backup manifest file. I want something simple and ad-hoc; David Steele and Stephen Frost prefer JSON. That is problematic because our JSON parser does not work in frontend code, and I want to be able to validate a backup against its manifest, which involves being able to parse the manifest from frontend code. The latest development over there is that David Steele has posted the JSON parser that he wrote for pgbackrest with an offer to try to adapt it for use in front-end PostgreSQL code, an offer which I genuinely appreciate. I'll write more about that over on that thread. However, I decided to spend today doing some further investigation of an alternative approach, namely making the backend's existing JSON parser work in frontend code as well. I did not solve all the problems there, but I did come up with some patches which I think would be worth committing on independent grounds, and I think the whole series is worth posting. So here goes. 0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm missing something, this seems like an overdue cleanup. It's long been the case that wchar.c is actually compiled and linked into both frontend and backend code. Commit 60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common that depends on wchar.c being available, but didn't actually make wchar.c part of src/common, which seems like an odd decision: the functions in the library are dependent on code that is not part of any library but whose source files get copied around where needed. Eh? 0002 does some basic header cleanup to make it possible to include the existing header file jsonapi.h in frontend code. The state of the JSON headers today looks generally poor. There seems not to have been much attempt to get the prototypes for a given source file, say foo.c, into a header file with the same name, say foo.h. 
Also, dependencies between various header files seem to have been added somewhat freely. This patch does not come close to fixing all that, but I consider it a modest down payment on a cleanup that probably ought to be taken further.

0003 splits json.c into two files, json.c and jsonapi.c. All the lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into jsonapi.c, while the stuff that pertains to the 'json' data type remains in json.c. This also seems like a good cleanup, because to me, at least, it's not a great idea to mix together code that is used by both the json and jsonb data types as well as other things in the system that want to generate or parse json together with things that are specific to the 'json' data type.

As far as I know all three of the above patches are committable as-is; review and contrary opinions welcome.

On the other hand, 0004, 0005, and 0006 are charitably described as experimental or WIP. 0004 and 0005 hack up jsonapi.c so that it can still be compiled even if #include "postgres.h" is changed to #include "postgres-fe.h" and 0006 moves it into src/common. Note that I say that they make it compile, not work. It's not just untested; it's definitely broken. But it gives a feeling for what the remaining obstacles to making this code available in a frontend environment are. Since I wrote my very first email complaining about the difficulty of making the backend's JSON parser work in a frontend environment, one obstacle has been knocked down: StringInfo is now available in front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The remaining problems (that I know about) have to do with error reporting and multibyte character support; a read of the patches is suggested for those wanting further details. Suggestions welcome.
Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company 0001-Move-wchar.c-to-src-common.patch Description: Binary data 0002-Adjust-src-include-utils-jsonapi.h-so-it-s-not-backe.patch Description: Binary data 0003-Split-JSON-lexer-parser-from-json-data-type-support.patch Description: Binary data 0004-Introduce-json_error-macro.patch Description: Binary data 0005-Frontendify-jsonapi.c.patch Description: Binary data 0006-Move-jsonapi.c-to-src-common.patch Description: Binary data
Re: Corruption with duplicate primary key
On Thu., December 12, 2019 at 5:25 PM, Tomas Vondra wrote: >On Wed, Dec 11, 2019 at 11:46:40PM +, Alex Adriaanse wrote: >>On Thu., December 5, 2019 at 5:45 PM, Tomas Vondra wrote: >>> At first I thought maybe this might be due to collations changing and >>> breaking the index silently. What collation are you using? >> >>We're using en_US.utf8. We did not make any collation changes to my >>knowledge. > >Well, the idea was more that glibc got updated and the collations >changed because of that (without PostgreSQL having a chance to even >notice that). Closing the loop on this, I've investigated this some more and it turns out this is exactly what happened. As you suspected, the issue had nothing to do with pg_upgrade or PG12, but rather the glibc upgrade that was seen in Debian Buster. The postgres:10 and postgres:11 images are based on Debian Stretch, whereas postgres:12 is based on Buster. When I kept the database on an older version of Postgres (10 or 11) but switched from the older Docker image to the postgres:12 or debian:buster(-slim) image, manually installing older Postgres packages inside those images, I saw index corruption there too. Thanks for the input! Alex
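As a side note for anyone hitting the same failure mode: the amcheck contrib module can detect ordering violations caused by a glibc collation change before they manifest as duplicate keys. bt_index_check() is the real amcheck entry point; the index name below is illustrative:

```sql
CREATE EXTENSION IF NOT EXISTS amcheck;

-- raises an error if the index order disagrees with the
-- collation rules of the currently installed glibc
SELECT bt_index_check('psql_error_pkey'::regclass);

-- rebuilding under the new glibc restores a consistent ordering
REINDEX INDEX psql_error_pkey;
```

Running such a check after any OS upgrade that changes glibc (e.g. the Stretch-to-Buster move described above) is a reasonable precaution.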
Re: aggregate crash
Hi, On 2020-01-14 23:27:02 -0800, Andres Freund wrote: > On 2020-01-14 17:54:16 -0500, Tom Lane wrote: > > Andres Freund writes: > > > I'm still not sure I actually fully understand the bug. It's obvious how > > > returning the input value again could lead to memory not being freed (so > > > that leak seems to go all the way back). And similarly, since the > > > introduction of expanded objects, it can also lead to the expanded > > > object not being deleted. > > > But that's not the problem causing the crash here. What I think must > > > instead be the problem is that pergroupstate->transValueIsNull, but > > > pergroupstate->transValue is set to something looking like a > > > pointer. Which caused us not to datumCopy() a new transition value into > > > a long lived context. and then a later transition causes us to free the > > > short-lived value? > > > > Yeah, I was kind of wondering that too. While formally the Datum value > > for a null is undefined, I'm not aware offhand of any functions that > > wouldn't return zero --- and this would have to be an aggregate transition > > function doing so, which reduces the universe of candidates quite a lot. > > Plus there's the question of how often a transition function would return > > null for non-null input at all. > > > > Could we see a test case that provokes this crash, even if it doesn't > > do so reliably? > > There's a larger reproducer referenced in the first message. I had hoped > that Teodor could narrow it down - I guess I'll try to do that tomorrow... FWIW, I'm working on narrowing it down to something small. I can reliably trigger the bug, and I understand the mechanics, I think. Interestingly enough the reproducer currently only triggers on v12, not on v11 and before. 
As you say, this requires a transition function returning a NULL that has the datum part set - the reproducer here defines a non-strict aggregate transition function that can indirectly do so:

CREATE FUNCTION public.state_max_bytea(st bytea, inp bytea) RETURNS bytea
LANGUAGE plpgsql AS $$
BEGIN
    if st is null then
        return inp;
    elseif st ...

The relevant executor code (the strict-function path) is:

    FunctionCallInfo fcinfo = op->d.func.fcinfo_data;
    NullableDatum *args = fcinfo->args;
    int         argno;
    Datum       d;

    /* strict function, so check for NULL args */
    for (argno = 0; argno < op->d.func.nargs; argno++)
    {
        if (args[argno].isnull)
        {
            *op->resnull = true;
            goto strictfail;
        }
    }
    fcinfo->isnull = false;
    d = op->d.func.fn_addr(fcinfo);
    *op->resvalue = d;
    *op->resnull = fcinfo->isnull;

strictfail:
    EEO_NEXT();

I.e. if the transition's argument is a strict function, and that strict function is not evaluated because of a NULL input, we set op->resnull = true, but do *not* touch op->resvalue. If there was a previous row that actually set resvalue to something meaningful, we get an input to the transition function consisting of the old resvalue (!= 0), but the new resnull = true. If the transition function returns that unchanged, ExecAggTransReparent() doesn't do anything, because the new value is null. Afterwards pergroup->transValue is set != 0, even though transValueIsNull = true.

The somewhat tricky bit is arranging this to happen with pointers that are the same. I think I'm on the way to narrowing that down, but it'll take me a bit longer.

To fix this I think we should set newVal = 0 in ExecAggTransReparent(), as a new else branch for the !newValueIsNull case. That should not add any additional branches, I think. In contrast to always doing so when checking whether ExecAggTransReparent() ought to be called.

Greetings,

Andres Freund
Re: src/test/recovery regression failure on bionic
Re: Amit Kapila 2020-01-09 > The point is that we know what is going wrong on sidewinder on back > branches. However, we still don't know what is going wrong with tern > and mandrill on v10 [1][2] where the log is: Fwiw, the problem on bionic disappeared yesterday with the build triggered by "Revert test added by commit d207038053". Christoph
Re: Append with naive multiplexing of FDWs
On Tue, Jan 14, 2020 at 02:37:48PM +0500, Ahsan Hadi wrote:
> Summary
> The patch is pretty good; it works well when there is little data to send back to the parent node. The patch doesn't provide a parallel FDW scan: it ensures that child nodes can send data to the parent in parallel, but the parent can only sequentially process the data from the data nodes.
>
> Provided there is no performance degradation for non-FDW append queries, I would recommend considering this patch as an interim solution while we are waiting for parallel FDW scan.

Wow, these are very impressive results!

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: pgsql: Add basic TAP tests for psql's tab-completion logic.
Re: Tom Lane 2020-01-09 <16328.1578606...@sss.pgh.pa.us> > build part.) I pushed a patch using SKIP_READLINE_TESTS. > Christoph should be able to set that for the Ubuntu branches > where the test is failing. That "fixed" the problem on xenial, thanks. Christoph
Re: [HACKERS] Block level parallel vacuum
On Wed, 15 Jan 2020 at 19:31, Mahendra Singh Thalor wrote:
>
> On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor wrote:
> >
> > On Wed, 15 Jan 2020 at 17:27, Amit Kapila wrote:
> > >
> > > On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada wrote:
> > > >
> > > > Thank you for updating the patch! I have a few small comments.
> > >
> > > I have adapted all your changes, and fixed the comment by Mahendra related to initializing parallel state only when there are at least two indexes. Additionally, I have changed a few comments (to make the reference to parallel vacuum consistent; at some places we were referring to it as 'parallel lazy vacuum' and at other places it was 'parallel index vacuum').
> > >
> > > > The rest looks good to me.
> > >
> > > Okay, I think the patch is in good shape. I am planning to read it a few more times (at least 2 times) and then probably will commit it early next week (Monday or Tuesday) unless there are any major comments. I have already committed the API patch (4d8a8d0c73).
> >
> > Hi,
> > Thanks Amit for fixing the review comments.
> >
> > I reviewed the v48 patch and below are some comments.
> >
> > 1.
> > +    * based on the number of indexes. -1 indicates a parallel vacuum is
> >
> > I think the above should read "-1 indicates that parallel vacuum is".
> >
> > 2.
> > +    /* Variables for cost-based parallel vacuum */
> >
> > At the end of the comment there are 2 spaces; I think there should be only 1 space.
> >
> > 3.
> > I think we should add a test case for the parallel option (when the degree is not specified).
> > Ex:
> > postgres=# VACUUM (PARALLEL) tmp;
> > ERROR:  parallel option requires a value between 0 and 1024
> > LINE 1: VACUUM (PARALLEL) tmp;
> >                ^
> > postgres=#
> >
> > Because the above error is added by this parallel patch, we should have a test case for it to increase code coverage.
>
> Hi
> Below are some more review comments for the v48 patch.
>
> 1.
> #include "storage/bufpage.h"
> #include "storage/lockdefs.h"
> +#include "storage/shm_toc.h"
> +#include "storage/dsm.h"
>
> Here the header files are not in alphabetical order (storage/dsm.h should come before storage/lockdefs.h).
>
> 2.
> +    /* No index supports parallel vacuum */
> +    if (nindexes_parallel == 0)
> +        return 0;
> +
> +    /* The leader process takes one index */
> +    nindexes_parallel--;
>
> The above code can be rearranged as:
>
> +    /* The leader process takes one index */
> +    nindexes_parallel--;
> +
> +    /* No index supports parallel vacuum */
> +    if (nindexes_parallel <= 0)
> +        return 0;
>
> If we do it like this, then in some cases we can skip some of the parallel-worker calculations.
>
> --
> Thanks and Regards
> Mahendra Singh Thalor
> EnterpriseDB: http://www.enterprisedb.com

Hi,
I checked code coverage and the time taken by the vacuum.sql test with and without the v48 patch. Below are some findings (I ran "make check-world -i" to get coverage).

1. With the v45 patch, compute_parallel_delay is never called, so the function hit count is zero. I think we can add some delay options to the vacuum.sql test to hit the function.

2. I checked the time taken by the vacuum.sql test. Execution time is almost the same with and without the v45 patch.

Without v45 patch:
Run1) vacuum ... ok 701 ms
Run2) vacuum ... ok 549 ms
Run3) vacuum ... ok 559 ms
Run4) vacuum ... ok 480 ms

With v45 patch:
Run1) vacuum ... ok 842 ms
Run2) vacuum ... ok 808 ms
Run3) vacuum ... ok 774 ms
Run4) vacuum ... ok 792 ms

-- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: our checks for read-only queries are not great
Greetings, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 15, 2020 at 10:25 AM Peter Eisentraut > wrote: > > Well, if the transaction was declared read-only, then committing it > > (directly or 2PC) shouldn't change anything. This appears to be a > > circular argument. > > I don't think it's a circular argument. Suppose that someone decrees > that, as of 5pm Eastern time, no more read-write transactions are > permitted. And because the person issuing the decree has a lot of > power, everybody obeys. Now, every pg_dump taken after that time will > be semantically equivalent to every other pg_dump taken after that > time, with one tiny exception. That exception is that someone could > still do COMMIT PREPARED of a read-write transaction that was prepared > before 5pm. If the goal of the powerful person who issued the decree > was to make sure that the database couldn't change - e.g. so they > could COPY each table individually without keeping a snapshot open and > still get a consistent backup - they might fail to achieve it if, as > of the moment of the freeze, there are some prepared write > transactions. > > I'm not saying we have to change the behavior or anything. I'm just > saying that there seems to be one, and only one, way to make the > apparent contents of the database change in a read-only transaction > right now. And that's a COMMIT PREPARED of a read-write transaction. Yeah, allowing a read-only transaction to start and then commit a read-write transaction doesn't seem sensible. I'd be in favor of changing that. Thanks, Stephen signature.asc Description: PGP signature
Re: our checks for read-only queries are not great
On Wed, Jan 15, 2020 at 10:25 AM Peter Eisentraut wrote: > Well, if the transaction was declared read-only, then committing it > (directly or 2PC) shouldn't change anything. This appears to be a > circular argument. I don't think it's a circular argument. Suppose that someone decrees that, as of 5pm Eastern time, no more read-write transactions are permitted. And because the person issuing the decree has a lot of power, everybody obeys. Now, every pg_dump taken after that time will be semantically equivalent to every other pg_dump taken after that time, with one tiny exception. That exception is that someone could still do COMMIT PREPARED of a read-write transaction that was prepared before 5pm. If the goal of the powerful person who issued the decree was to make sure that the database couldn't change - e.g. so they could COPY each table individually without keeping a snapshot open and still get a consistent backup - they might fail to achieve it if, as of the moment of the freeze, there are some prepared write transactions. I'm not saying we have to change the behavior or anything. I'm just saying that there seems to be one, and only one, way to make the apparent contents of the database change in a read-only transaction right now. And that's a COMMIT PREPARED of a read-write transaction. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
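Concretely, the loophole being described looks like this (table, value, and transaction identifier are illustrative):

```sql
-- session 1, before the "freeze": prepare a read-write transaction
BEGIN;
INSERT INTO t VALUES (1);
PREPARE TRANSACTION 'pending_write';

-- session 2, after the freeze, with read-only transactions enforced...
SET default_transaction_read_only = on;
-- ...the prepared write can still be committed, changing visible data:
COMMIT PREPARED 'pending_write';
```

The prepared transaction did all its writing before 5pm, so the read-only check at COMMIT PREPARED time currently has nothing to object to; whether it should is the question under discussion.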
Re: Making psql error out on output failures
Right, the file difference is caused by "-At".

On the other hand, in order to keep the output message more consistent with other tools, I did a little bit more investigation of how pg_dump handles this situation. Here are my findings.

pg_dump uses WRITE_ERROR_EXIT to report the error when "(bytes_written != size * nmemb)", where WRITE_ERROR_EXIT calls fatal("could not write to output file: %m") and then "pg_log_generic(PG_LOG_ERROR, __VA_ARGS__)". After running a quick test in the same situation, I got a message like the one below:

$ pg_dump -h localhost -p 5432 -d postgres -t psql_error -f /mnt/ramdisk/file
pg_dump: error: could not write to output file: No space left on device

If I change the error log message as below, where "%m" is used to pass the value of strerror(errno) and "could not write to output file:" is copied from WRITE_ERROR_EXIT,

- pg_log_error("Error printing tuples");
+ pg_log_error("could not write to output file: %m");

then the output message is something like below, which, I believe, is more consistent with pg_dump:

$ psql -d postgres -t -c "select repeat('111', 100)" -o /mnt/ramdisk/file
could not write to output file: No space left on device

$ psql -d postgres -t -c "select repeat('111', 100)" > /mnt/ramdisk/file
could not write to output file: No space left on device

Hope the information helps.

David

---
Highgo Software Inc. (Canada)
www.highgo.ca
Re: Allow to_date() and to_timestamp() to accept localized names
On Wed, Jan 15, 2020 at 11:20 AM Arthur Zakirov wrote:
>
> I have a couple of picky notes.

Thanks for the review.

> > + if (name_len != norm_len)
> > + pfree(norm_name);
>
> I'm not sure here. Is it safe to assume that name_len and norm_len will always differ whenever a new norm_name was allocated?

Good call, that check can be more robust.

> > static const char *const months_full[] = {
> > "January", "February", "March", "April", "May", "June", "July",
> > - "August", "September", "October", "November", "December", NULL
> > + "August", "September", "October", "November", "December"
> > };
>
> Unfortunately I didn't see from the patch why it was necessary to remove the last null element from months_full, days_short, adbc_strings, adbc_strings_long and the other arrays. I think it is not good, because it unnecessarily increases the patch and adds code like the following:
>
> > + from_char_seq_search(&value, &s, months, localized_names,
> > + ONE_UPPER, MAX_MON_LEN, n, have_error,
> > + lengthof(months_full));
>
> Here it passes "months" from datetime.c to from_char_seq_search() and calculates its length using the "months_full" array from formatting.c.

With the introduction of seq_search_localized() that can be avoided, minimizing code churn.

Please find attached a version addressing the above.

Regards,

Juan José Santamaría Flecha

0001-Allow-localized-month-names-to_date-v6.patch Description: Binary data
Re: Rearranging ALTER TABLE to avoid multi-operations bugs
Alvaro Herrera writes: > I didn't review in detail, but it seems good to me. I especially liked > getting rid of the ProcessedConstraint code, and the additional test > cases. Thanks for looking! Yeah, all those test cases expose situations where we misbehave today :-(. I wish this were small enough to be back-patchable, but it's not feasible. regards, tom lane
relcache sometimes initially ignores has_generated_stored
Hi, I think it's probably not relevant, but it confused me for a moment that RelationBuildTupleDesc() might set constr->has_generated_stored to true, but then throw away the constraint at the end, because nothing matches the /* * Set up constraint/default info */ if (has_not_null || ndef > 0 || attrmiss || relation->rd_rel->relchecks) test, i.e. there are no defaults. A quick assert confirms we do indeed pfree() constr in cases where has_generated_stored == true. I suspect that's just an intermediate catalog, however, e.g. when DefineRelation() does heap_create_with_catalog(); CommandCounterIncrement(); relation_open(); AddRelationNewConstraints(). It does still strike me as not great that we can get a different relcache entry, even if transient, depending on whether there are other reasons to create a TupleConstr. Say a NOT NULL column. I'm inclined to think we should just also check has_generated_stored in the if quoted above? Greetings, Andres Freund
Re: Improve errors when setting incorrect bounds for SSL protocols
> On 15 Jan 2020, at 03:28, Michael Paquier wrote: > > On Tue, Jan 14, 2020 at 11:21:53AM +0100, Daniel Gustafsson wrote: >> On 14 Jan 2020, at 04:54, Michael Paquier wrote: >>> Please note that OpenSSL 1.1.0 has added two routines to be able to >>> get the min/max protocols set in a context, called >>> SSL_CTX_get_min/max_proto_version. Thinking about older versions of >>> OpenSSL I think that it is better to use >>> ssl_protocol_version_to_openssl to do the parsing work. I also found >>> that it is easier to check for compatible versions after setting both >>> bounds in the SSL context, so there is no need to worry about >>> invalid values depending on the build of OpenSSL used. >> >> I'm not convinced that it's a good idea to check for incompatible protocol >> range in the OpenSSL backend. We've spent a lot of energy to make the TLS >> code >> library agnostic and pluggable, and since identifying a basic configuration >> error isn't OpenSSL specific I think it should be in the guc code. That >> would >> keep the layering as well as ensure that we don't mistakenly treat this >> differently should we get a second TLS backend. > > Good points. And the get routines are not that portable in OpenSSL > either even if HEAD supports 1.0.1 and newer versions... Attached is > an updated patch which uses a GUC check for both parameters, and > provides a hint on top of the original error message. The SSL context > does not get reloaded if there is an error, so the errors from OpenSSL > cannot be triggered as far as I checked (after mixing a couple of > correct and incorrect combinations manually). This is pretty much exactly the patch I was intending to write for this, so +1 from me. cheers ./daniel
Re: Duplicate Workers entries in some EXPLAIN plans
Sounds good, I'll try that format. Any idea how to test YAML? With the JSON format, I was able to rely on Postgres' own JSON-manipulating functions to strip or canonicalize fields that can vary across executions--I can't really do that with YAML. Or should I run EXPLAIN with COSTS OFF, TIMING OFF, SUMMARY OFF and assume that for simple queries the BUFFERS output (and other fields I can't turn off like Sort Space Used) *is* going to be stable?
Re: Rearranging ALTER TABLE to avoid multi-operations bugs
On 2020-Jan-14, Tom Lane wrote: > I wrote: > > [ fix-alter-table-order-of-operations-3.patch ] > > Rebased again, fixing a minor conflict with f595117e2. > > > I'd kind of like to get this cleared out of my queue soon. > > Does anyone intend to review it further? > > If I don't hear objections pretty darn quick, I'm going to > go ahead and push this. I didn't review in detail, but it seems good to me. I especially liked getting rid of the ProcessedConstraint code, and the additional test cases. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: progress report for ANALYZE
I just pushed this after some small extra tweaks. Thanks, Yamada-san, for seeing this to completion! -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Complete data erasure
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Tomas Vondra writes: > > But let's assume it makes sense - is this really the right solution? I > > think what I'd prefer is encryption + rotation of the keys. Which should > > work properly even on COW filesystems, the performance impact is kinda > > low and amortized etc. Of course, we're discussing built-in encryption > > for quite a bit of time. > > Yeah, it seems to me that encrypted storage would solve strictly more > cases than this proposal does, and it has fewer foot-guns. I still view that as strictly a different solution and one that certainly won't fit in all cases, not to mention that it's a great deal more complicated and we're certainly nowhere near having it. > One foot-gun that had been vaguely bothering me yesterday, but the point > didn't quite crystallize till just now, is that the only place that we > could implement such file zeroing is post-commit in a transaction that > has done DROP TABLE or TRUNCATE. Of course post-commit is an absolutely > horrid place to be doing anything that could fail, since there's no very > good way to recover from an error. It's an even worse place to be doing > anything that could take a long time --- if the user loses patience and > kills the session, now your problems are multiplied. This is presuming that we make this feature something that can be run in a transaction and rolled back. I don't think there's been any specific expression that there is such a requirement, and you bring up good points that show why providing that functionality would be particularly challenging, but that isn't really an argument against this feature in general. Thanks, Stephen signature.asc Description: PGP signature
Re: Complete data erasure
Tomas Vondra writes: > But let's assume it makes sense - is this really the right solution? I > think what I'd prefer is encryption + rotation of the keys. Which should > work properly even on COW filesystems, the performance impact is kinda > low and amortized etc. Of course, we're discussing built-in encryption > for quite a bit of time. Yeah, it seems to me that encrypted storage would solve strictly more cases than this proposal does, and it has fewer foot-guns. One foot-gun that had been vaguely bothering me yesterday, but the point didn't quite crystallize till just now, is that the only place that we could implement such file zeroing is post-commit in a transaction that has done DROP TABLE or TRUNCATE. Of course post-commit is an absolutely horrid place to be doing anything that could fail, since there's no very good way to recover from an error. It's an even worse place to be doing anything that could take a long time --- if the user loses patience and kills the session, now your problems are multiplied. Right now our risks in that area are confined to leaking files if unlink() fails, which isn't great but it isn't catastrophic either. With this proposal, erroring out post-commit becomes a security failure, if it happens anywhere before we've finished a possibly large amount of zero-writing. I don't want to go there. regards, tom lane
Re: pgindent && weirdness
On Tue, Jan 14, 2020 at 05:30:21PM -0500, Tom Lane wrote: > Alvaro Herrera writes: > > I just ran pgindent over some patch, and noticed that this hunk ended up > > in my working tree: > > > - if (IsA(leftop, Var) && IsA(rightop, Const)) > > + if (IsA(leftop, Var) &&IsA(rightop, Const)) > > Yeah, it's been doing that for decades. I think the triggering > factor is the typedef name (Var, here) preceding the &&. > > It'd be nice to fix properly, but I've tended to take the path > of least resistance by breaking such lines to avoid the ugliness: > > if (IsA(leftop, Var) && > IsA(rightop, Const)) In the past I would use a post-processing step after BSD indent to fix up these problems. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Rearranging ALTER TABLE to avoid multi-operations bugs
Sergei Kornilov writes: > I am clearly not a good reviewer for such changes... But for a note: I read > the v4 patch and have no useful comments. Good new tests, reasonable code > changes to fix multiple bug reports. Thanks for looking! > The patch is proposed only for the master branch, right? Yes, it seems far too risky for the back branches. regards, tom lane
Re: Complete data erasure
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On Wed, Jan 15, 2020 at 10:23:22AM -0500, Stephen Frost wrote: > >I disagree entirely. If the operating system and hardware level provide > >a way for this to work, which is actually rather common when you > >consider that ext4 is an awfully popular filesystem where this would work > >just fine with nearly all traditional hardware underneath it, then we're > >just blocking enabling this capability for no justifiable reason. > > Not sure. I agree the goal (securely discarding data) is certainly > worthwhile, although I suspect it's just one of many things you'd need to > care about. That is, there's probably a million other things you'd need > to worry about (logs, WAL, CSV files, temp files, ...), so with actually > sensitive data I'd expect people to just dump/load the data into a clean > system and rebuild the old one (zero drives, ...). Of course there are other things that one would need to worry about, but saying this isn't useful because PG isn't also scanning your entire infrastructure for CSV files that have this sensitive information isn't a sensible argument. I agree that there are different levels of sensitive data, and for many, many cases what is being proposed here will work just fine, even if that level of sensitive data isn't considered "actually sensitive data" by other people. > But let's assume it makes sense - is this really the right solution? I > think what I'd prefer is encryption + rotation of the keys. Which should > work properly even on COW filesystems, the performance impact is kinda > low and amortized etc. Of course, we're discussing built-in encryption > for quite a bit of time. In some cases that may make sense as an alternative, in other situations it doesn't. In other words, I disagree that what you're proposing is just a different implementation of the same thing (which I'd be happy to discuss), it's an entirely different feature with different trade-offs. 
I'm certainly on-board with the idea of having table level and column level encryption to facilitate an approach like this, but I disagree strongly that we shouldn't have this simple "just overwrite the data a few times" capability because we might, one day, have useful in-core encryption, or that, even if we had it, we should tell everyone to use that instead of providing this capability. I don't uninstall 'shred' when I install 'gpg' on my system. Thanks, Stephen signature.asc Description: PGP signature
Re: Complete data erasure
On Wed, Jan 15, 2020 at 10:23:22AM -0500, Stephen Frost wrote: Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: "asaba.takan...@fujitsu.com" writes: > I want to add the feature to erase data so that it cannot be restored > because it prevents attackers from stealing data from released data area. I think this is fairly pointless, unfortunately. I disagree- it's a feature that's been asked for multiple times and does have value in some situations. Dropping or truncating tables is as much as we can do without making unwarranted assumptions about the filesystem's behavior. You say you want to zero out the files first, but what will that accomplish on copy-on-write filesystems? What about filesystems which are not copy-on-write though? Even granting that zeroing our storage files is worth something, it's not worth much if there are more copies of the data elsewhere. Backups are not our responsibility to worry about, or, at least, it'd be an independent feature if we wanted to add something like this to pg_basebackup, and not something the initial feature would need to worry about. And any well-run database is going to have backups, or standby servers, or both. There's no way for the primary node to force standbys to erase themselves (and it'd be a different sort of security hazard if there were). A user can't "force" PG to do anything more than we can "force" a replica to do something, but a user can issue a request to a primary and that primary can then pass that request along to the replica as part of the WAL stream. Also to the point: if your assumption is that an attacker has access to the storage medium at a sufficiently low level that they can examine previously-deleted files, what's stopping them from reading the data *before* it's deleted? This argument certainly doesn't make any sense- who said they had access to the storage medium at a time before the files were deleted? What if they only had access after the files were zero'd? 
When you consider the lifetime of a storage medium, it's certainly a great deal longer than the length of time that a given piece of sensitive data might reside on it. So I think doing anything useful in this area is a bit outside Postgres' remit. You'd need to be thinking at an operating-system or hardware level. I disagree entirely. If the operating system and hardware level provide a way for this to work, which is actually rather common when you consider that ext4 is an awfully popular filesystem where this would work just fine with nearly all traditional hardware underneath it, then we're just blocking enabling this capability for no justifiable reason. Not sure. I agree the goal (securely discarding data) is certainly worthwhile, although I suspect it's just one of many things you'd need to care about. That is, there's probably a million other things you'd need to worry about (logs, WAL, CSV files, temp files, ...), so with actually sensitive data I'd expect people to just dump/load the data into a clean system and rebuild the old one (zero drives, ...). But let's assume it makes sense - is this really the right solution? I think what I'd prefer is encryption + rotation of the keys. Which should work properly even on COW filesystems, the performance impact is kinda low and amortized etc. Of course, we're discussing built-in encryption for quite a bit of time. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Rearranging ALTER TABLE to avoid multi-operations bugs
Hello Thank you! I am clearly not a good reviewer for such changes... But for a note: I read the v4 patch and have no useful comments. Good new tests, reasonable code changes to fix multiple bug reports. The patch is proposed only for the master branch, right? regards, Sergei
Re: Extracting only the columns needed for a query
> On Thu, Jan 02, 2020 at 05:21:55PM -0800, Melanie Plageman wrote: > > That makes me think that maybe the function name, > extract_used_columns() is bad, though. Maybe extract_scan_columns()? > I tried this in the attached, updated patch. Thanks, I'll take a look at the new version. Following your explanation, extract_scan_columns sounds better, but at the same time "scan" is a pretty broad term, and one could misunderstand what kind of scan the name refers to. > For UPDATE, we need all of the columns in the table because of the > table_lock() API's current expectation that the slot has all of the > columns populated. If we want UPDATE to only need to insert the column > values which have changed, table_tuple_lock() will have to change. Can you please elaborate on this part? I'm probably missing something, since I don't immediately see where those expectations are expressed. > > AM callback relation_estimate_size exists currently which the planner leverages. > > Via this callback it fetches tuples, pages, etc. So, our thought is to > > extend > > this API if possible to pass down the needed columns and help perform better > > costing > > for the query. Though we think that if we wish to leverage this function, we need to > > know > > the list of columns before planning, hence we might need to use the query tree. > > I believe it would be beneficial to add this potential API extension patch > into > the thread (as an example of an interface defining how scanCols could be used) > and review them together. I still find this question from my previous email interesting to explore.
Re: our checks for read-only queries are not great
On 2020-01-13 20:14, Robert Haas wrote: On Mon, Jan 13, 2020 at 5:57 AM Peter Eisentraut wrote: On 2020-01-10 14:41, Robert Haas wrote: This rule very nearly matches the current behavior: it explains why temp table operations are allowed, and why ALTER SYSTEM is allowed, and why REINDEX etc. are allowed. However, there's a notable exception: PREPARE, COMMIT PREPARED, and ROLLBACK PREPARED are allowed in a read-only transaction. Under the "doesn't change pg_dump output" criteria, the first and third ones should be permitted but COMMIT PREPARED should be denied, except maybe if the prepared transaction didn't do any writes (and in that case, why did we bother preparing it?). Despite that, this rule does a way better job explaining the current behavior than anything else suggested so far. I don't follow. Does pg_dump dump prepared transactions? No, but committing one changes the database contents as seen by a subsequent pg_dump. Well, if the transaction was declared read-only, then committing it (directly or 2PC) shouldn't change anything. This appears to be a circular argument. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Complete data erasure
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > "asaba.takan...@fujitsu.com" writes: > > I want to add the feature to erase data so that it cannot be restored > > because it prevents attackers from stealing data from released data area. > > I think this is fairly pointless, unfortunately. I disagree- it's a feature that's been asked for multiple times and does have value in some situations. > Dropping or truncating tables is as much as we can do without making > unwarranted assumptions about the filesystem's behavior. You say > you want to zero out the files first, but what will that accomplish > on copy-on-write filesystems? What about filesystems which are not copy-on-write though? > Even granting that zeroing our storage files is worth something, > it's not worth much if there are more copies of the data elsewhere. Backups are not our responsibility to worry about, or, at least, it'd be an independent feature if we wanted to add something like this to pg_basebackup, and not something the initial feature would need to worry about. > And any well-run database is going to have backups, or standby servers, > or both. There's no way for the primary node to force standbys to erase > themselves (and it'd be a different sort of security hazard if there > were). A user can't "force" PG to do anything more than we can "force" a replica to do something, but a user can issue a request to a primary and that primary can then pass that request along to the replica as part of the WAL stream. > Also to the point: if your assumption is that an attacker has access > to the storage medium at a sufficiently low level that they can examine > previously-deleted files, what's stopping them from reading the data > *before* it's deleted? This argument certainly doesn't make any sense- who said they had access to the storage medium at a time before the files were deleted? What if they only had access after the files were zero'd? 
When you consider the lifetime of a storage medium, it's certainly a great deal longer than the length of time that a given piece of sensitive data might reside on it. > So I think doing anything useful in this area is a bit outside > Postgres' remit. You'd need to be thinking at an operating-system > or hardware level. I disagree entirely. If the operating system and hardware level provide a way for this to work, which is actually rather common when you consider that ext4 is an awfully popular filesystem where this would work just fine with nearly all traditional hardware underneath it, then we're just blocking enabling this capability for no justifiable reason. Thanks, Stephen signature.asc Description: PGP signature
Re: base backup client as auxiliary backend process
On 2020-01-09 11:57, Alexandra Wang wrote: Back to the base backup stuff, I don't quite understand all the benefits you mentioned above. It seems to me the greatest benefit with this patch is that postmaster takes care of pg_basebackup itself, which reduces the human wait in between running the pg_basebackup and pg_ctl/postgres commands. Is that right? I personally don't mind the --write-recovery-conf option because it helps me write the primary_conninfo and primary_slot_name gucs into postgresql.auto.conf, which to me as a developer is easier than editing postgres.conf without automation. Sorry about the dumb question but what's so bad about --write-recovery-conf? Making it easier to automate is one major appeal of my proposal. The current way of setting up a standby is very difficult to automate correctly. Are you planning to completely replace pg_basebackup with this? Is there any use case that a user just need a basebackup but not immediately start the backend process? I'm not planning to replace or change pg_basebackup. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: base backup client as auxiliary backend process
On 2020-01-15 01:40, Masahiko Sawada wrote: Could you rebase the main patch that adds base backup client as auxiliary backend process since the previous version patch (v3) conflicts with the current HEAD? attached -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 892ba431956c7d936555f758efc874f58b3679e8 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 15 Jan 2020 16:15:06 +0100 Subject: [PATCH v4] Base backup client as auxiliary backend process Discussion: https://www.postgresql.org/message-id/flat/61b8d18d-c922-ac99-b990-a31ba63cd...@2ndquadrant.com --- doc/src/sgml/protocol.sgml| 12 +- doc/src/sgml/ref/initdb.sgml | 17 + src/backend/access/transam/xlog.c | 102 +++-- src/backend/bootstrap/bootstrap.c | 9 + src/backend/postmaster/pgstat.c | 6 + src/backend/postmaster/postmaster.c | 114 - src/backend/replication/basebackup.c | 70 +++ .../libpqwalreceiver/libpqwalreceiver.c | 400 ++ src/backend/replication/repl_gram.y | 9 +- src/backend/replication/repl_scanner.l| 1 + src/backend/storage/file/fd.c | 36 +- src/bin/initdb/initdb.c | 39 +- src/bin/pg_resetwal/pg_resetwal.c | 6 +- src/include/access/xlog.h | 8 +- src/include/miscadmin.h | 2 + src/include/pgstat.h | 1 + src/include/replication/basebackup.h | 2 + src/include/replication/walreceiver.h | 4 + src/include/storage/fd.h | 2 +- src/include/utils/guc.h | 2 +- src/test/recovery/t/018_basebackup.pl | 29 ++ 21 files changed, 783 insertions(+), 88 deletions(-) create mode 100644 src/test/recovery/t/018_basebackup.pl diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 80275215e0..f54b820edf 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -2466,7 +2466,7 @@ Streaming Replication Protocol -BASE_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ WAL ] [ NOWAIT ] [ MAX_RATE rate ] [ TABLESPACE_MAP ] [ NOVERIFY_CHECKSUMS ] +BASE_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ WAL ] [ NOWAIT ] 
[ MAX_RATE rate ] [ TABLESPACE_MAP ] [ NOVERIFY_CHECKSUMS ] [ EXCLUDE_CONF ] BASE_BACKUP @@ -2576,6 +2576,16 @@ Streaming Replication Protocol + + +EXCLUDE_CONF + + + Do not copy configuration files, that is, files that end in + .conf. + + + diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml index da5c8f5307..1261e02d59 100644 --- a/doc/src/sgml/ref/initdb.sgml +++ b/doc/src/sgml/ref/initdb.sgml @@ -286,6 +286,23 @@ Options + + -r + --replica + + +Initialize a data directory for a physical replication replica. The +data directory will not be initialized with a full database system, +but will instead only contain a minimal set of files. A server that +is started on this data directory will first fetch a base backup and +then switch to standby mode. The connection information for the base +backup has to be configured by setting , and other parameters as desired, +before the server is started. + + + + -S --sync-only diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 7f4f784c0e..36c6cdef82 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -903,8 +903,6 @@ static void CheckRecoveryConsistency(void); static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int whichChkpt, bool report); static bool rescanLatestTimeLine(void); -static void WriteControlFile(void); -static void ReadControlFile(void); static char *str_time(pg_time_t tnow); static bool CheckForStandbyTrigger(void); @@ -4494,7 +4492,7 @@ rescanLatestTimeLine(void) * ReadControlFile() verifies they are correct. We could split out the * I/O and compatibility-check functions, but there seems no need currently. 
*/ -static void +void WriteControlFile(void) { int fd; @@ -4585,7 +4583,7 @@ WriteControlFile(void) XLOG_CONTROL_FILE))); } -static void +void ReadControlFile(void) { pg_crc32c crc; @@ -5075,6 +5073,41 @@ XLOGShmemInit(void) InitSharedLatch(&XLogCtl->recoveryWakeupLatch); } +void +InitControlFile(uint64 sysidentifier) +{ + charmock_auth_nonce[MO
Re: Remove libpq.rc, use win32ver.rc for libpq
On 2020-01-15 07:44, Michael Paquier wrote: I have been testing and checking the patch a bit more seriously, and the information gets generated correctly for dlls and exe files. The rest of the changes look fine to me. For src/makefiles/Makefile.win32, I don't have a MinGW environment at hand so I have not directly tested but the logic looks fine. I have tested MinGW. Patch committed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: TRUNCATE on foreign tables
On Wed, Jan 15, 2020 at 17:11, Michael Paquier wrote: > > On Tue, Jan 14, 2020 at 06:16:17PM +0900, Kohei KaiGai wrote: > > The "frels_list" is a list of foreign tables that are connected to a > > particular > > foreign server, thus the server-id pulled from each foreign table should > > be > > identical for all the relations in the list. > > Due to the API design, this callback shall be invoked for each foreign > > server > > involved in the TRUNCATE command, not on a per-table basis. > > > > The 2nd and 3rd arguments also inform the FDW driver of other options of the > > command. If the FDW has a concept of "cascaded truncate" or "restart sequence", > > it can adjust its remote query. In postgres_fdw, it follows the manner of the > > usual TRUNCATE command. > > I have done a quick read through the patch. You have modified the > patch to pass down to the callback a list of relation OIDs to execute > one command for all, and there are tests for FKs so that coverage > looks fine. > > Regression tests are failing with this patch: > -- TRUNCATE doesn't work on foreign tables, either directly or > recursively > TRUNCATE ft2; -- ERROR > -ERROR: "ft2" is not a table > +ERROR: foreign-data wrapper "dummy" has no handler > You visibly just need to update the output because no handlers are > available for truncate in this case. > What error message would be better in this case? It does not print "ft2" anywhere, so the user may not notice that "ft2" is the source of the error. How about 'foreign table "ft2" does not support truncate' ? > +void > +deparseTruncateSql(StringInfo buf, Relation rel) > +{ > + deparseRelation(buf, rel); > +} > Don't see much point in having this routine. > deparseRelation() is a static function in postgres_fdw/deparse.c. On the other hand, it may be better to move the entire logic for constructing the remote TRUNCATE command into deparse.c, like other commands. > + If FDW does not provide this callback, PostgreSQL considers > + TRUNCATE is not supported on the foreign table. 
> + > This sentence is weird. Perhaps you meant "as not supported"? > Yes. If the FDW does not provide this callback, PostgreSQL behaves as if TRUNCATE is not supported on the foreign table. > + frels_list is a list of foreign tables that are > + connected to a particular foreign server; thus, these foreign tables > + should have identical foreign server ID > The list is built by the backend code, so that has to be true. > > + foreach (lc, frels_list) > + { > + Relationfrel = lfirst(lc); > + Oid frel_oid = RelationGetRelid(frel); > + > + if (server_id == GetForeignServerIdByRelId(frel_oid)) > + { > + frels_list = foreach_delete_current(frels_list, lc); > + curr_frels = lappend(curr_frels, frel); > + } > + } > Wouldn't it be better to fill in a hash table for each server with a > list of relations? > It's just a matter of preference. A temporary hash table mapping server-id to a list of foreign tables is one idea. Let me try to revise. > +typedef void (*ExecForeignTruncate_function) (List *frels_list, > + bool is_cascade, > + bool restart_seqs); > I would recommend to pass down directly DropBehavior instead of a > boolean to the callback. That's more extensible. > Ok, Best regards, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei
Re: remove support for old Python versions
Rémi, please update your build farm member "locust" to a new Python version (>=2.6) for the master branch, or disable the Python option. Thanks. On 2020-01-08 23:04, Peter Eisentraut wrote: On 2019-12-31 10:46, Peter Eisentraut wrote: On 2019-12-09 11:37, Peter Eisentraut wrote: Per discussion in [0], here is a patch set to remove support for Python versions older than 2.6. [0]: https://www.postgresql.org/message-id/6d3b7b69-0970-4d40-671a-268c46e93...@2ndquadrant.com It appears that the removal of old OpenSSL support is stalled or abandoned for now, so this patch set is on hold for now as well, as far as I'm concerned. I have committed the change of the Python exception syntax in the documentation, though. Since the OpenSSL patch went ahead, I have now committed this one as well. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Block level parallel vacuum
On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor wrote: > > On Wed, 15 Jan 2020 at 17:27, Amit Kapila wrote: > > > > On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada > > wrote: > > > > > > Thank you for updating the patch! I have a few small comments. > > > > > > > I have adapted all your changes, fixed the comment by Mahendra related > > to initializing parallel state only when there are at least two > > indexes. Additionally, I have changed a few comments (to make the > > references to parallel vacuum consistent; at some places we were > > referring to it as 'parallel lazy vacuum' and at other places it was > > 'parallel index vacuum'). > > > > > The > > > rest looks good to me. > > > > > > > Okay, I think the patch is in good shape. I am planning to read it a > > few more times (at least 2 times) and then probably will commit it > > early next week (Monday or Tuesday) unless there are any major > > comments. I have already committed the API patch (4d8a8d0c73). > > > > Hi, > Thanks, Amit, for addressing the review comments. > > I reviewed the v48 patch and below are some comments. > > 1. > +* based on the number of indexes. -1 indicates a parallel vacuum is > > I think the above should be "-1 indicates that parallel vacuum is" > > 2. > +/* Variables for cost-based parallel vacuum */ > > At the end of the comment there are 2 spaces; I think it should be only 1 space. > > 3. > I think we should add a test case for the parallel option (when the degree is not > specified). > Ex: > postgres=# VACUUM (PARALLEL) tmp; > ERROR: parallel option requires a value between 0 and 1024 > LINE 1: VACUUM (PARALLEL) tmp; > ^ > postgres=# > > Because the above error is added by this parallel patch, we should have a test > case for it to increase code coverage. > Hi Below are some more review comments for the v48 patch. 1. #include "storage/bufpage.h" #include "storage/lockdefs.h" +#include "storage/shm_toc.h" +#include "storage/dsm.h" Here, the header files are not in alphabetical order. 
(storage/dsm.h should come before storage/lockdefs.h) 2. +/* No index supports parallel vacuum */ +if (nindexes_parallel == 0) +return 0; + +/* The leader process takes one index */ +nindexes_parallel--; Above code can be rearranged as: +/* The leader process takes one index */ +nindexes_parallel--; + +/* No index supports parallel vacuum */ +if (nindexes_parallel <= 0) +return 0; If we do like this, then in some cases, we can skip some calculations of parallel workers. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Block level parallel vacuum
On Wed, 15 Jan 2020 at 17:27, Amit Kapila wrote:
> On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada wrote:
> > Thank you for updating the patch! I have a few small comments.
>
> I have adapted all your changes and fixed the comment by Mahendra about
> initializing the parallel state only when there are at least two indexes.
> Additionally, I have changed a few comments to make the references to
> parallel vacuum consistent; in some places we were referring to it as
> 'parallel lazy vacuum' and in others as 'parallel index vacuum'.
>
> > The rest looks good to me.
>
> Okay, I think the patch is in good shape. I am planning to read it a few
> more times (at least twice) and then will probably commit it early next
> week (Monday or Tuesday) unless there are any major comments. I have
> already committed the API patch (4d8a8d0c73).

Hi,
Thanks Amit for fixing the review comments.

I reviewed the v48 patch; below are some comments.

1.
+* based on the number of indexes. -1 indicates a parallel vacuum is

I think the above should read "-1 indicates that parallel vacuum is".

2.
+/* Variables for cost-based parallel vacuum */

At the end of the comment there are 2 spaces; I think it should be only 1
space.

3.
I think we should add a test case for the parallel option when the degree
is not specified. Ex:
postgres=# VACUUM (PARALLEL) tmp;
ERROR: parallel option requires a value between 0 and 1024
LINE 1: VACUUM (PARALLEL) tmp;
^
postgres=#

Since the above error is added by this parallel patch, we should have a
test case for it to increase code coverage.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
RE: Index Skip Scan
Hi all,

I reviewed the latest version of the patch. Overall some good improvements,
I think. Please find my feedback below.

- I think I mentioned this before - it's not that big of a deal, but it
just looks weird and inconsistent to me:

create table t2 as (select a, b, c, 10 d from generate_series(1,5) a,
generate_series(1,100) b, generate_series(1,1) c);
create index on t2 (a,b,c desc);

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and
b>=5 and b<=5 order by a,b,c desc;
QUERY PLAN
-
Index Only Scan using t2_a_b_c_idx on t2 (cost=0.43..216.25 rows=500 width=12)
  Skip scan: true
  Index Cond: ((a = 2) AND (b >= 5) AND (b <= 5))
(3 rows)

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b=5
order by a,b,c desc;
QUERY PLAN
-
Unique (cost=0.43..8361.56 rows=500 width=12)
  -> Index Only Scan using t2_a_b_c_idx on t2 (cost=0.43..8361.56 rows=9807 width=12)
     Index Cond: ((a = 2) AND (b = 5))
(3 rows)

When doing a DISTINCT ON (params) with equality conditions for all params,
it falls back to the regular index scan even though there's no reason not
to use the skip scan here. It's much faster to write "b between 5 and 5"
now than to write "b=5". I understand this is a limitation of the
unique-keys patch at the moment, which could be addressed in the future. I
think for the sake of consistency it would make sense if this eventually
gets addressed.

- nodeIndexScan.c, line 126
This sets xs_want_itup to true in all cases (even for non-skip scans). I
don't think this is acceptable given the side effects it has (the page will
never be unpinned between returned tuples in _bt_drop_lock_and_maybe_pin).

- nbtsearch.c, _bt_skip, line 1440
_bt_update_skip_scankeys(scan, indexRel);
This function is called twice now - once in the else {} and immediately
after that outside of the else. The second call can be removed, I think.
- nbtsearch.c, _bt_skip, line 1597
LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
scan->xs_itup = (IndexTuple) PageGetItem(page, itemid);
This is an UNLOCK followed by a read of the unlocked page. That looks
incorrect?

- nbtsearch.c, _bt_skip, line 1440
if (BTScanPosIsValid(so->currPos) &&
    _bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir))
Is it allowed to look at the high key / low key of the page without having
a read lock on it?

- nbtsearch.c, line 1634
if (_bt_readpage(scan, indexdir, offnum)) ... else error()
Is it really guaranteed that a match can be found on the page itself? Isn't
it possible that an extra index condition, not part of the scan key, makes
none of the keys on the page match?

- nbtsearch.c in general
Most of the code seems to rely quite heavily on the fact that xs_want_itup
forces _bt_drop_lock_and_maybe_pin to never release the buffer pin. Have
you considered that compaction of a page may still happen even if you hold
the pin? [1] I've been trying to come up with cases in which this may break
the patch, but I haven't been able to produce such a scenario - so it may
be fine. But it would be good to consider again. One thing I was thinking
of was a scenario where page splits and/or compaction happen in between
returning tuples. Could this break the _bt_scankey_within_page check such
that it thinks the scan key is within the current page, while it actually
isn't? Mainly for backward and/or cursor scans. Forward scans shouldn't be
a problem, I think. Apologies for being a bit vague, as I don't have a
clear example ready of when it would go wrong. It may well be fine, but it
was one of the things on my mind.

[1] https://postgrespro.com/list/id/1566683972147.11...@optiver.com

-Floris
Re: [Proposal] Add accumulated statistics for wait event
On 2020/01/13 4:11, Pavel Stehule wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: tested, passed
> Spec compliant: not tested
> Documentation: tested, passed
>
> I like this patch, because I used similar functionality some years ago
> very successfully. The implementation is fairly simple, and the results
> should be valid given the method used.

Thanks for your review!

> The potential problem is the performance impact. Very early tests show an
> impact of circa 3% in the worst case, but I'll try to repeat these tests.

Yes, the performance impact is the main concern. I want to know how it
affects performance in various test cases and on various environments.

> There are some trailing whitespaces and useless tabs.
>
> The new status of this patch is: Waiting on Author

I attach v4 patches removing that trailing whitespace and the useless tabs.

--
Yoshikazu Imai

From b009b1f8f6be47ae61b5e4538e2730d721ee60db Mon Sep 17 00:00:00 2001
From: "imai.yoshikazu"
Date: Wed, 15 Jan 2020 09:13:19 +0000
Subject: [PATCH v4 1/2] Add pg_stat_waitaccum view.

pg_stat_waitaccum shows the count and duration of each wait event.

Each backend/background process counts and measures the time of a wait
event in every pgstat_report_wait_start and pgstat_report_wait_end. They
store that info in local variables and send it to the Statistics
Collector. We can get that info via the Statistics Collector.

To reduce overhead, I implemented a static hash instead of a dynamic hash.

I also implemented track_wait_timing, which determines whether wait event
durations are collected or not.

On Windows, this feature might not work correctly, because it currently
initializes local variables in pg_stat_init, which is not passed to forked
processes on Windows.
--- src/backend/catalog/system_views.sql | 8 + src/backend/postmaster/pgstat.c | 344 ++ src/backend/storage/lmgr/lwlock.c | 19 ++ src/backend/utils/adt/pgstatfuncs.c | 80 ++ src/backend/utils/misc/guc.c | 9 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/catalog/pg_proc.dat | 9 + src/include/pgstat.h | 123 - src/include/storage/lwlock.h | 1 + src/include/storage/proc.h| 1 + src/test/regress/expected/rules.out | 5 + 11 files changed, 598 insertions(+), 2 deletions(-) diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 773edf8..80f2caa 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -957,6 +957,14 @@ CREATE VIEW pg_stat_bgwriter AS pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset; +CREATE VIEW pg_stat_waitaccum AS +SELECT + S.wait_event_type AS wait_event_type, + S.wait_event AS wait_event, + S.calls AS calls, + S.times AS times + FROM pg_stat_get_waitaccum(NULL) AS S; + CREATE VIEW pg_stat_progress_vacuum AS SELECT S.pid AS pid, S.datid AS datid, D.datname AS datname, diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 51c486b..08e10ad 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -123,6 +123,7 @@ */ bool pgstat_track_activities = false; bool pgstat_track_counts = false; +bool pgstat_track_wait_timing = false; intpgstat_track_functions = TRACK_FUNC_OFF; intpgstat_track_activity_query_size = 1024; @@ -153,6 +154,10 @@ static time_t last_pgstat_start_time; static bool pgStatRunningInCollector = false; +WAHash *wa_hash; + +instr_time waitStart; + /* * Structures in which backends store per-table info that's waiting to be * sent to the collector. 
@@ -255,6 +260,7 @@ static int localNumBackends = 0; */ static PgStat_ArchiverStats archiverStats; static PgStat_GlobalStats globalStats; +static PgStat_WaitAccumStats waitAccumStats; /* * List of OIDs of databases we need to write out. If an entry is InvalidOid, @@ -280,6 +286,8 @@ static pid_t pgstat_forkexec(void); #endif NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]) pg_attribute_noreturn(); +static void pgstat_init_waitaccum_hash(WAHash **hash); +static PgStat_WaitAccumEntry *pgstat_add_wa_entry(WAHash *hash, uint32 key); static void pgstat_beshutdown_hook(int code, Datum arg); static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create); @@ -287,8 +295,11 @@ static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create); static void pgstat_write_s
Re: [Proposal] Global temporary tables
> On 13 Jan 2020, at 16:08, Konstantin Knizhnik wrote:
>
> On 12.01.2020 4:51, Tomas Vondra wrote:
>> On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote:
>>>
>>> On 09.01.2020 19:48, Tomas Vondra wrote:
>>>>> The most complex and challenging task is to support GTT for all kinds
>>>>> of indexes. Unfortunately I cannot propose a good universal solution
>>>>> for it. Just patching all existing index implementations seems to be
>>>>> the only choice.
>>>>
>>>> I haven't looked at the indexing issue closely, but IMO we need to
>>>> ensure that every session sees/uses only indexes on a GTT that were
>>>> defined before the session started using the table.
>>>
>>> Why? That contradicts the behavior of normal tables.
>>> Assume that you have active clients and at some point in time the DBA
>>> recognizes that they are spending too much time scanning some GTT.
>>> It can create an index for this GTT, but if existing clients are not
>>> able to use this index, then we need to somehow make these clients
>>> restart their sessions?
>>> In my patch I have implemented building indexes for GTTs on demand: if
>>> an accessed index on a GTT is not yet initialized, it is filled with
>>> local data.
>>
>> Yes, I know the behavior would be different from the behavior for
>> regular tables. And yes, it would not allow fixing slow queries in
>> sessions without interrupting those sessions.
>>
>> I proposed just ignoring those new indexes because it seems much simpler
>> than the alternative solutions I can think of, and it's not like those
>> other solutions don't have issues of their own.
>
> Quite the opposite: prohibiting sessions from seeing indexes created
> before the session started using the GTT requires more effort. We need
> to somehow maintain and check the GTT's first-access time.
>
>> For example, I've looked at the "on demand" building as implemented in
>> global_private_temp-8.patch, and adding a bunch of index build calls
>> into various places in the index code seems somewhat suspicious.
> We in any case have to initialize GTT indexes on demand, even if we
> prohibit use of indexes created after the session's first access to the
> GTT. So the difference is only in one thing: should we just initialize an
> empty index or populate it with local data (if the rules for index
> usability are the same for GTTs as for normal tables)?
> From an implementation point of view there is no big difference. Actually,
> building the index in the standard way is even simpler than constructing
> an empty index. Originally I implemented the first approach (I just
> forgot to consider the case where the GTT was already used by a session).
> Then I rewrote it using the second approach and the patch even became
> simpler.
>
>> * brinbuild is added to brinRevmapInitialize, which is meant to
>> initialize state for scanning. It seems wrong to build the index we're
>> scanning from this function (layering and all that).
>>
>> * btbuild is called from _bt_getbuf. That seems a bit ... suspicious?
>
> As I already mentioned, support of indexes for GTTs is one of the most
> challenging things in my patch. I didn't find a good and universal
> solution. So I agree that the call of btbuild from _bt_getbuf may be
> considered suspicious. I will be pleased if you or somebody else can
> propose a better alternative, and not only for B-tree but for all other
> indexes.
>
> But as I already wrote above, prohibiting a session from using indexes
> created after its first access to the GTT doesn't solve the problem.
> For normal tables (and for local temp tables) indexes are initialized at
> the time of their creation. With GTTs that doesn't work, because each
> session has its own local data for the GTT. We should either
> initialize/build the index on demand (when it is first accessed), or at
> session start initialize indexes for all existing GTTs. The last option
> seems much worse from my point of view: there may be a huge number of
> GTTs, and the session may not need to access GTTs at all.
>
>> ... and so on for other index types. Also, what about custom indexes
>> implemented in extensions? It seems a bit strange each of them has to
>> support this separately.
>
> I have already complained about this: my patch supports GTTs for all
> built-in indexes, but custom indexes have to handle it themselves.
> It looks like to provide a generic solution we need to extend the index
> API, providing two different operations: creation and initialization.
> But extending the index API is a very invasive change... And it also
> doesn't solve the problem for all existing extensions: they in any case
> have to be rewritten to implement the new API version in order to support
> GTTs.
>
>> IMHO if this really is the right solution, we need to make it work for
>> existing indexes without having to tweak them individually. Why don't we
>> track a flag whether an index on a GTT was initialized in a given
>> session, and if it was not then call the build function before calling an
Re: Add pg_file_sync() to adminpack
On Wed, Jan 15, 2020 at 1:49, Julien Rouhaud wrote:

> Actually, pg_write_server_files has enough privileges to execute those
> functions anywhere on the FS as far as the C code is concerned, provided
> that the user running the postgres daemon is allowed to (see
> convert_and_check_filename), but it won't be allowed to do so by default
> as it won't have EXECUTE privilege on the functions.

I see, thanks for your explanation.

--
Regards,
Atsushi Torikoshi
Re: [HACKERS] Block level parallel vacuum
On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada wrote:
>
> Thank you for updating the patch! I have a few small comments.
>

I have adapted all your changes and fixed the comment by Mahendra about
initializing the parallel state only when there are at least two indexes.
Additionally, I have changed a few comments to make the references to
parallel vacuum consistent; in some places we were referring to it as
'parallel lazy vacuum' and in others as 'parallel index vacuum'.

> The rest looks good to me.
>

Okay, I think the patch is in good shape. I am planning to read it a few
more times (at least twice) and then will probably commit it early next
week (Monday or Tuesday) unless there are any major comments. I have
already committed the API patch (4d8a8d0c73).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

v48-0001-Allow-vacuum-command-to-process-indexes-in-parallel.patch
Description: Binary data
Re: Disallow cancellation of waiting for synchronous replication
On 15.01.2020 01:53, Andres Freund wrote:
> On 2020-01-12 16:18:38 +0500, Andrey Borodin wrote:
>>> On 11 Jan 2020, at 7:34, Bruce Momjian wrote:
>>>
>>> Actually, it might be worse than that. In my reading of
>>> RecordTransactionCommit(), we do this:
>>>
>>> write to WAL
>>> flush WAL (durable)
>>> make visible to other backends
>>> replicate
>>> communicate to the client
>>>
>>> I think this means we make the transaction commit visible to all
>>> backends _before_ we replicate it, and potentially wait until we get a
>>> replication reply to return SUCCESS to the client.
>>
>> No. Data is not visible to other backends while we await sync rep.
>
> Yea, as the relevant comment in RecordTransactionCommit() says:
>
> * Note that at this stage we have marked clog, but still show as running
> * in the procarray and continue to hold locks.
> */
> if (wrote_xlog && markXidCommitted)
> SyncRepWaitForLSN(XactLastRecEnd, true);
>
> But it's worthwhile to emphasize that data at that stage actually *can*
> be visible on standbys. The fact that the transaction still shows as
> running via the procarray, on the primary, does not influence visibility
> determinations on the standby.

In the common case, consistent reads across the cluster (even with
remote_apply on) are available (and have to be) only on the master node.
For example, if a random load balancer for read-only queries is placed
above the master and the sync replicas (with remote_apply on), it's
possible to hit the case where earlier reads return data that is absent
from subsequent ones.

Moreover, such visible commits on a sync standby are not durable from the
cluster's point of view. For example, if we have two sync standbys, then
under failover we can switch the master to a sync standby on which the
waiting commit was not replicated, while it was applied (and visible) on
the other standby. This switch is completely safe because the client has
not received an acknowledgment of its commit request, so the transaction
is in an indeterminate state: it can end up either committed or aborted,
depending on which standby is promoted.
-- Best regards, Maksim Milyutin
Re: [PATCH v1] pg_ls_tmpdir to show directories
Hello Justin,

About this v4: it applies cleanly. I'm trying to think about how to get rid
of the strange structure and hacks, and the arbitrary-looking size-2 array.
Also the recursion is one step, but I'm not sure why; ISTM it could/should
always go on? Maybe the recursive implementation was not such a good idea;
if I suggested it, it is because I did not notice the "return next"
re-entrant stuff, shame on me.

Looking at the code, ISTM that relying on a stack/list would be much
cleaner and easier to understand. The code could look like:

ls_func()
  if (first_time)
    initialize description
    set next directory to visit
  while (1)
    if next directory
      init/push next directory to visit as current
    read current directory
    if empty
      pop/close current directory
    elif is_a_dir and recursion allowed
      set next directory to visit
    else
      ... return next tuple
  cleanup

The point is to avoid a hack around the directory_fctx array, to have only
one place to push/init a new directory (i.e. call AllocateDir and play
around with the memory context) instead of two, and to allow deeper
recursion if useful.

Details:
- the snprintf return is not checked. Maybe it should say why an overflow
cannot occur.
- "bool nulls[3] = { 0,};" -> "bool nulls[3] = { false, false, false };"?

--
Fabien.
Re: Allow to_date() and to_timestamp() to accept localized names
Hello!

On 2020/01/13 21:04, Juan José Santamaría Flecha wrote:
> Please, find attached a version addressing the above mentioned.

I have a couple of picky notes.

+ if (name_len != norm_len)
+ pfree(norm_name);

I'm not sure here. Is it safe to assume that, whenever a new norm_name was
allocated, name_len and norm_len will always differ?

static const char *const months_full[] = {
"January", "February", "March", "April", "May", "June", "July",
- "August", "September", "October", "November", "December", NULL
+ "August", "September", "October", "November", "December"
};

Unfortunately I didn't see from the patch why it was necessary to remove
the trailing NULL element from months_full, days_short, adbc_strings,
adbc_strings_long and the other arrays. I think it is not good, because it
unnecessarily enlarges the patch and adds code like the following:

+ from_char_seq_search(&value, &s, months, localized_names,
+ ONE_UPPER, MAX_MON_LEN, n, have_error,
+ lengthof(months_full));

Here it passes "months" from datetime.c to from_char_seq_search() and
calculates its length using the "months_full" array from formatting.c.

--
Arthur
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
I further fixed the last test failure (due to a small bug in the test, not in code). Attached are the new patch series. Let's see the CI pipeline result. v9-0001-Support-node-initialization-from-backup-with-tabl.patch Description: Binary data v9-0003-Fix-replay-of-create-database-records-on-standby.patch Description: Binary data v9-0002-Tests-to-replay-create-database-operation-on-stan.patch Description: Binary data
Re: Duplicate Workers entries in some EXPLAIN plans
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

The current version of the patch (v2) applies cleanly and does what it says
on the box.

> Any thoughts on what we should do with text mode output (which is
> untouched right now)? The output Andres proposed above makes sense to me,
> but I'd like to get more input.

Nothing to add beyond what is stated in the thread.

> Sort Method: external merge Disk: 4920kB
> Buffers: shared hit=682 read=10188, temp read=1415 written=2101
> Worker 0: actual time=130.058..130.324 rows=1324 loops=1
>   Sort Method: external merge Disk: 5880kB
>   Buffers: shared hit=337 read=3489, temp read=505 written=739
> Worker 1: actual time=130.273..130.512 rows=1297 loops=1
>   Buffers: shared hit=345 read=3507, temp read=505 written=744
>   Sort Method: external merge Disk: 5920kB

This proposal seems like a fitting approach. Awaiting v3, which will
include the text version. May I suggest a YAML-format test case? Just to
make certain that no regressions occur in the future.

Thanks,
Re: proposal: type info support functions for functions that use "any" type
On Tue, 14 Jan 2020 at 22:09, Tom Lane wrote:

> Pavel Stehule writes:
> > [ parser-support-function-with-demo-20191128.patch ]
>
> TBH, I'm still not convinced that this is a good idea. Restricting
> the support function to only change the function's return type is
> safer than the original proposal, but it's still not terribly safe.
> If you change the support function's algorithm in any way, how do
> you know whether you've broken existing stored queries? If the
> support function consults external resources to make its choice
> (perhaps checking the existence of a cast), where could we record
> that the query depends on the existence of that cast? There'd be
> no visible trace of that in the query parsetree.

This risk is real and cannot be simply solved without more complications.

Could a solution be to limit and enforce this functionality to extensions
that are initialized from shared_preload_libraries or
local_preload_libraries?

> I'm also still not convinced that this idea allows doing anything
> that can't be done just as well with polymorphism. It would be a
> really bad idea for the support function to be examining the values
> of the arguments (else what happens when they're not constants?).
> So all you can do is look at their types, and then it seems like
> the things you can usefully do are pretty much like polymorphism,
> i.e. select some one of the input types, or a related type such
> as an array type or element type. If there are gaps in what you
> can express with polymorphism, I'd much rather spend effort on
> improving that facility than in adding something that is only
> accessible to advanced C coders. (Yes, I know I've been slacking
> on reviewing [1].)

For my purposes the critical information is the type. I don't need to work
with the constant, but I can imagine that some API to work with constant
values could be nice.

Yes, I can solve a lot of things with patch [1], but not all, and this
patch is shorter and almost trivial.
> Lastly, I still think that this patch doesn't begin to address
> all the places that would have to know about the feature. There's
> a lot of places that know about polymorphism --- if this is
> polymorphism on steroids, which it is, then why don't all of those
> places need to be touched?

I am sorry, I don't understand the last sentence.

> On the whole I think we should reject this idea.
>
> regards, tom lane
>
> [1] https://commitfest.postgresql.org/26/1911/
Re: pause recovery if pitr target not reached
> "Peter Eisentraut" wrote, 14 January 2020 at 21:13:
>
> On 2019-12-11 12:40, Leif Gunnar Erlandsen wrote:
>> Adding patch written for 13dev from git
>>
>>> "Michael Paquier" wrote, 1 December 2019 at 03:08:
>>> On Fri, Nov 22, 2019 at 11:26:59AM +0000, Leif Gunnar Erlandsen wrote:
>>
>> No it does not. It works well to demonstrate its purpose though.
>> And it might be that stopping with FATAL would be more correct.
>
>> This is still under active discussion. Please note that the latest
>> patch does not apply, so a rebase would be nice to have. I have moved
>> the patch to the next CF, waiting on author.
>
> I reworked your patch a bit. I changed the outcome to be an error, as was
> discussed. I also added tests and documentation. Please take a look.

Thank you; it was not unexpected for the patch to become a little bit
smaller. Although it would have been nice to log where recovery ended
before reporting the fatal error.

And since you use RECOVERY_TARGET_UNSET, RECOVERY_TARGET_IMMEDIATE also
gets included; is this correct?

> --
> Peter Eisentraut http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] WAL logging problem in 9.4.3?
Hello. I added a fix for defect 2.

At Wed, 25 Dec 2019 16:15:21 -0800, Noah Misch wrote in
> === Defect 2: Forgets to skip WAL due to oversimplification in heap_create()
>
> In ALTER TABLE cases where TryReuseIndex() avoided an index rebuild, we
> need to transfer WAL-skipped state to the new index relation. Before
> v24nm, the new index relation skipped WAL unconditionally. Since v24nm,
> the new index relation never skips WAL. I've added a test to
> alter_table.sql that reveals this problem under wal_level=minimal.

The fix for this defect utilizes the mechanism that preserves the relcache
entry for a dropped relation. If ATExecAddIndex can obtain such a relcache
entry for the old relation, that entry should hold the newness flags and we
can copy them to the new relcache entry. I added one member named oldRelId
to the struct IndexStmt to let the function access the relcache entry for
the old index relation.

I forgot to assign version 31 to the last patch, so I reused the number for
this version.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 591872bdd7b18566fe2529d20e4073900dec32fd Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 21 Nov 2019 15:28:06 +0900
Subject: [PATCH v31 1/3] Rework WAL-skipping optimization

While wal_level=minimal, we omit WAL-logging for certain operations on
relfilenodes that are created in the current transaction. The files are
fsynced at commit. The machinery accelerates bulk-insertion operations,
but it fails for certain sequences of operations, and a crash just after
commit may leave broken table files.

This patch overhauls the machinery so that WAL-logging is omitted for all
operations on such relfilenodes.

This patch also introduces a new feature that emits small files as WAL
records instead of syncing them. The new GUC variable wal_skip_threshold
controls the threshold.
--- doc/src/sgml/config.sgml| 43 ++-- doc/src/sgml/perform.sgml | 47 +--- src/backend/access/common/toast_internals.c | 4 +- src/backend/access/gist/gistutil.c | 31 ++- src/backend/access/gist/gistxlog.c | 21 ++ src/backend/access/heap/heapam.c| 45 +--- src/backend/access/heap/heapam_handler.c| 22 +- src/backend/access/heap/rewriteheap.c | 21 +- src/backend/access/nbtree/nbtsort.c | 41 +--- src/backend/access/rmgrdesc/gistdesc.c | 5 + src/backend/access/transam/README | 45 +++- src/backend/access/transam/xact.c | 15 ++ src/backend/access/transam/xloginsert.c | 10 +- src/backend/access/transam/xlogutils.c | 18 +- src/backend/catalog/heap.c | 4 + src/backend/catalog/storage.c | 248 ++-- src/backend/commands/cluster.c | 12 +- src/backend/commands/copy.c | 58 + src/backend/commands/createas.c | 11 +- src/backend/commands/matview.c | 12 +- src/backend/commands/tablecmds.c| 11 +- src/backend/storage/buffer/bufmgr.c | 125 +- src/backend/storage/lmgr/lock.c | 12 + src/backend/storage/smgr/md.c | 36 ++- src/backend/storage/smgr/smgr.c | 35 +++ src/backend/utils/cache/relcache.c | 159 ++--- src/backend/utils/misc/guc.c| 13 + src/include/access/gist_private.h | 2 + src/include/access/gistxlog.h | 1 + src/include/access/heapam.h | 3 - src/include/access/rewriteheap.h| 2 +- src/include/access/tableam.h| 15 +- src/include/catalog/storage.h | 6 + src/include/storage/bufmgr.h| 4 + src/include/storage/lock.h | 3 + src/include/storage/smgr.h | 1 + src/include/utils/rel.h | 57 +++-- src/include/utils/relcache.h| 8 +- src/test/regress/expected/alter_table.out | 6 + src/test/regress/sql/alter_table.sql| 7 + 40 files changed, 868 insertions(+), 351 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 5d45b6f7cb..0e7a0bc0ee 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2481,21 +2481,14 @@ include_dir 'conf.d' levels. This parameter can only be set at server start. 
-In minimal level, WAL-logging of some bulk -operations can be safely skipped, which can make those -operations much faster (see ). -Operations in which this optimization can be applied include: - - CREATE TABLE AS - CREATE INDEX - CLUSTER - COPY into tables that were created or truncated in the same - transaction - -But minimal WAL does not contain enough information to reconstruct the -data from a base backup and the WAL logs, so replica or -higher mus
Re: TRUNCATE on foreign tables
On Tue, Jan 14, 2020 at 06:16:17PM +0900, Kohei KaiGai wrote:
> The "frels_list" is a list of foreign tables that are connected to a
> particular foreign server; thus, the server ID pulled from the foreign
> table IDs should be identical for all the relations in the list.
> Due to the API design, this callback shall be invoked once for each
> foreign server involved in the TRUNCATE command, not once per table.
>
> The 2nd and 3rd arguments also inform the FDW driver of other options of
> the command. If the FDW has a concept of "cascaded truncate" or "restart
> sequence", it can adjust its remote query. postgres_fdw follows the
> manner of the usual TRUNCATE command.

I have done a quick read through the patch. You have modified the patch to
pass down to the callback a list of relation OIDs so as to execute one
command for all, and there are tests for FKs, so that coverage looks fine.

Regression tests are failing with this patch:

-- TRUNCATE doesn't work on foreign tables, either directly or recursively
TRUNCATE ft2; -- ERROR
-ERROR: "ft2" is not a table
+ERROR: foreign-data wrapper "dummy" has no handler

You visibly just need to update the output, because no handler is
available for truncate in this case.

+void
+deparseTruncateSql(StringInfo buf, Relation rel)
+{
+ deparseRelation(buf, rel);
+}

I don't see much point in having this routine.

+ If FDW does not provide this callback, PostgreSQL considers
+ TRUNCATE is not supported on the foreign table.
+

This sentence is weird. Perhaps you meant "as not supported"?

+ frels_list is a list of foreign tables that are
+ connected to a particular foreign server; thus, these foreign tables
+ should have identical foreign server ID

The list is built by the backend code, so that has to be true.
+ foreach (lc, frels_list)
+ {
+ Relation frel = lfirst(lc);
+ Oid frel_oid = RelationGetRelid(frel);
+
+ if (server_id == GetForeignServerIdByRelId(frel_oid))
+ {
+ frels_list = foreach_delete_current(frels_list, lc);
+ curr_frels = lappend(curr_frels, frel);
+ }
+ }

Wouldn't it be better to fill in a hash table for each server with a list
of relations?

+typedef void (*ExecForeignTruncate_function) (List *frels_list,
+ bool is_cascade,
+ bool restart_seqs);

I would recommend passing down DropBehavior directly to the callback
instead of a boolean. That's more extensible.

--
Michael