Re: [proposal] recovery_target "latest"
At Fri, 8 Nov 2019 16:08:47 +0300, Grigory Smolkin wrote in
> > On 11/8/19 7:00 AM, Grigory Smolkin wrote:
> > > On 11/7/19 4:36 PM, Grigory Smolkin wrote:
> > >> I gave it some thought and now think that prohibiting recovery_target
> > >> 'latest' and standby was a bad idea.
> > >> All recovery_targets follow the same pattern of usage, so
> > >> recovery_target "latest" also must be capable of working in standby
> > >> mode.
> > >> All recovery_targets have a clear deterministic 'target' where
> > >> recovery should end.
> > >> In case of recovery_target "latest" this target is the end of
> > >> available WAL, therefore the end of available WAL must be more clearly
> > >> defined.
> > >> I will work on it.
> > >>
> > >> Thank you for the feedback.
> >
> > Attached new patch revision; now the end of available WAL is defined as
> > the fact of missing required WAL.
> > In case of standby, the end of WAL is defined as 2 consecutive
> > switches of WAL source that didn't provide the requested record.
> > In case of streaming standby, each switch of WAL source is forced
> > after 3 failed attempts to get new data from walreceiver.
> >
> > All constants are taken off the top of my head and serve as proof of
> > concept.
> > TAP test is updated.
>
> Attached new revision, it contains some minor refactoring.

Thanks for the new patch. I found that it needs more than I thought, but it seems a bit too complicated and less stable. As the patch does, WaitForWALToBecomeAvailable needs to exit when available sources are exhausted. However, we don't need to count failures to do that. It is enough that the function has two more exit points: when the streaming timeout fires, and when we find that streaming is not set up after archive/wal has failed. In my opinion, it is better to have less dependency on global variables at such deep levels of the call hierarchy. Such information can be stored in XLogPageReadPrivate. I think the doc needs to explain the difference between default and latest.
Please find the attached, which illustrates the first two points of the above.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3b766e66b9..8c8a1c6bf0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -808,6 +808,7 @@ typedef struct XLogPageReadPrivate
 	int			emode;
 	bool		fetching_ckpt;	/* are we fetching a checkpoint record? */
 	bool		randAccess;
+	bool		return_on_eow;	/* returns when reaching End of WAL */
 } XLogPageReadPrivate;

 /*
@@ -887,7 +888,9 @@ static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
 static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 						 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
-										bool fetching_ckpt, XLogRecPtr tliRecPtr);
+										bool fetching_ckpt,
+										XLogRecPtr tliRecPtr,
+										bool return_on_eow);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
 static void PreallocXlogFiles(XLogRecPtr endptr);
@@ -4253,6 +4256,7 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 	private->fetching_ckpt = fetching_ckpt;
 	private->emode = emode;
 	private->randAccess = (RecPtr != InvalidXLogRecPtr);
+	private->return_on_eow = (recoveryTarget == RECOVERY_TARGET_LATEST);

 	/* This is the first attempt to read this page. */
 	lastSourceFailed = false;
@@ -4371,8 +4375,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
 			continue;
 		}

-		/* In standby mode, loop back to retry. Otherwise, give up. */
-		if (StandbyMode && !CheckForStandbyTrigger())
+		/*
+		 * In standby mode, loop back to retry. Otherwise, give up.
+		 * If we are told to return on end of WAL, also give up.
+		 */
+		if (StandbyMode && !CheckForStandbyTrigger() &&
+			!private->return_on_eow)
 			continue;
 		else
 			return NULL;
@@ -7271,7 +7279,7 @@ StartupXLOG(void)
 	 * end of main redo apply loop
 	 */

-	if (reachedStopPoint)
+	if (reachedStopPoint || recoveryTarget == RECOVERY_TARGET_LATEST)
 	{
 		if (!reachedConsistency)
 			ereport(FATAL,
@@ -7473,6 +7481,8 @@ StartupXLOG(void)
 				 recoveryStopName);
 	else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
 		snprintf(reason, sizeof(reason), "reached consistency");
+	else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+		snprintf(reason, sizeof(reason), "end of WAL");
 	else
 		snprintf(reason, sizeof(reason), "no recovery target specified");
@@ -11605,7 +11615,8 @@ retry:
 	if (!WaitForWALToBecomeAvailable(targetPagePtr + reqLen,
 									 private->randAccess,
 									 private->fetching_ckpt,
-									 targetRecPtr))
+									 targetRecPtr,
+									 private->return_on_eow))
 	{
 		if (readFile >= 0)
 			close(readFile);
@@ -11745,7 +11756,9 @@ next_record
Re: FETCH FIRST clause WITH TIES option
On Mon, Nov 11, 2019 at 5:56 PM Alvaro Herrera wrote:
> First, I noticed that there's a significant unanswered issue from Andrew
> Gierth about this using a completely different mechanism, namely an
> implicit window function. I see that Robert was concerned about the
> performance of Andrew's proposed approach, but I don't see any reply
> from Surafel on the whole issue.

I don't know much about the window function implementation, but I am sure that teaching window functions about LIMIT and implementing this LIMIT variant on top of them will not be much simpler or more performant than the current implementation. I also think the more appropriate place to include the LIMIT variant is the Limit node, not some other place, which could cause redundancy and maintainability issues.

regards
Surafel
Re: Coding in WalSndWaitForWal
On Tue, Nov 12, 2019 at 11:27:16AM -0800, Andres Freund wrote:
> It seems to me it'd be better to just remove the "get a more recent
> flush pointer" block - it doesn't seem to currently serve a meaningful
> purpose.

+1. That was actually my suggestion upthread :)
--
Michael

signature.asc Description: PGP signature
Re: [PATCH] use separate PartitionedRelOptions structure to store partitioned table options
On Tue, Nov 12, 2019 at 01:50:03PM +0900, Michael Paquier wrote:
> We have been through great lengths to have build_reloptions, so
> wouldn't it be better to also have this code path do so? Sure you
> need to pass NULL for the parsing table.. But there is a point to
> reduce the code paths using directly parseRelOptions() and the
> follow-up, expected calls to allocate and to fill in the set of
> reloptions.

So I have been looking at this one, and finished with the attached. It looks much better to use build_reloptions() IMO, taking advantage of the same sanity checks present for the other relkinds.
--
Michael

diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h
index db42aa35e0..ee7248ad58 100644
--- a/src/include/access/reloptions.h
+++ b/src/include/access/reloptions.h
@@ -306,6 +306,7 @@ extern bytea *default_reloptions(Datum reloptions, bool validate,
 								 relopt_kind kind);
 extern bytea *heap_reloptions(char relkind, Datum reloptions, bool validate);
 extern bytea *view_reloptions(Datum reloptions, bool validate);
+extern bytea *partitioned_table_reloptions(Datum reloptions, bool validate);
 extern bytea *index_reloptions(amoptions_function amoptions, Datum reloptions,
 							   bool validate);
 extern bytea *attribute_reloptions(Datum reloptions, bool validate);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 8b8b237f0d..1eb323a5d4 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -277,6 +277,16 @@ typedef struct StdRdOptions
 #define HEAP_MIN_FILLFACTOR			10
 #define HEAP_DEFAULT_FILLFACTOR		100

+/*
+ * PartitionedRelOptions
+ *		Binary representation of relation options for partitioned tables.
+ */
+typedef struct PartitionedRelOptions
+{
+	int32		vl_len_;		/* varlena header (do not touch directly!) */
+	/* No options defined yet */
+} PartitionedRelOptions;
+
 /*
  * RelationGetToastTupleTarget
  *		Returns the relation's toast_tuple_target.  Note multiple eval of argument!
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index d8790ad7a3..6600a154ab 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1099,9 +1099,11 @@ extractRelOptions(HeapTuple tuple, TupleDesc tupdesc,
 		case RELKIND_RELATION:
 		case RELKIND_TOASTVALUE:
 		case RELKIND_MATVIEW:
-		case RELKIND_PARTITIONED_TABLE:
 			options = heap_reloptions(classForm->relkind, datum, false);
 			break;
+		case RELKIND_PARTITIONED_TABLE:
+			options = partitioned_table_reloptions(datum, false);
+			break;
 		case RELKIND_VIEW:
 			options = view_reloptions(datum, false);
 			break;
@@ -1571,6 +1573,21 @@ build_reloptions(Datum reloptions, bool validate,
 	return rdopts;
 }

+/*
+ * Option parser for partitioned tables
+ */
+bytea *
+partitioned_table_reloptions(Datum reloptions, bool validate)
+{
+	/*
+	 * There are no options for partitioned tables yet, but this is
+	 * able to do some validation.
+	 */
+	return (bytea *) build_reloptions(reloptions, validate,
+									  RELOPT_KIND_PARTITIONED,
+									  0, NULL, 0);
+}
+
 /*
  * Option parser for views
  */
@@ -1614,9 +1631,6 @@ heap_reloptions(char relkind, Datum reloptions, bool validate)
 		case RELKIND_RELATION:
 		case RELKIND_MATVIEW:
 			return default_reloptions(reloptions, validate, RELOPT_KIND_HEAP);
-		case RELKIND_PARTITIONED_TABLE:
-			return default_reloptions(reloptions, validate,
-									  RELOPT_KIND_PARTITIONED);
 		default:
 			/* other relkinds are not supported */
 			return NULL;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 496c206332..45aae5955d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -719,10 +719,17 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	reloptions = transformRelOptions((Datum) 0, stmt->options, NULL, validnsps,
 									 true, false);

-	if (relkind == RELKIND_VIEW)
-		(void) view_reloptions(reloptions, true);
-	else
-		(void) heap_reloptions(relkind, reloptions, true);
+	switch (relkind)
+	{
+		case RELKIND_VIEW:
+			(void) view_reloptions(reloptions, true);
+			break;
+		case RELKIND_PARTITIONED_TABLE:
+			(void) partitioned_table_reloptions(reloptions, true);
+			break;
+		default:
+			(void) heap_reloptions(relkind, reloptions, true);
+	}

 	if (stmt->ofTypename)
 	{
@@ -12187,9 +12194,11 @@ ATExecSetRelOptions(Relation rel, List *defList, AlterTableType operation,
 		case RELKIND_RELATION:
 		case RELKIND_TOASTVALUE:
 		case RELKIND_MATVIEW:
-		case RELKIND_PARTITIONED_TABLE:
 			(void) heap_reloptions(rel->rd_rel->relkind, newOptions, true);
 			break;
+		case RELKIND_PARTITIONED_TABLE:
+			(void) partitioned_table_reloptions(newOptions, true);
+			break;
 		case RELKIND_VIEW:
 			(void) view_reloptions(newOptions, true);
 			break;

signature.asc Description: PGP signature
Re: [PATCH] Do not use StdRdOptions in Access Methods
On Wed, Nov 13, 2019 at 02:29:49PM +0900, Amit Langote wrote: > On Wed, Nov 13, 2019 at 2:18 PM Michael Paquier wrote: >> There could be an argument for keeping >> GET_STRING_RELOPTION actually which is still useful to get a string >> value in an option set using the stored offset, and we have >> the recently-added dummy_index_am in this category. Any thoughts? > > Not sure, maybe +0.5 on keeping GET_STRING_RELOPTION. Thinking more about it, I would tend to keep this one around. I'll wait a couple of days before coming back to it. -- Michael signature.asc Description: PGP signature
Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?
On Tue, Nov 12, 2019 at 10:33:16AM -0800, Andres Freund wrote: > On 2019-11-12 16:25:06 +0100, Daniel Gustafsson wrote: >> On 11 Nov 2019, at 09:32, Michael Paquier wrote: >> >>> This part has resulted in 75c1921, and we could just change >>> DecodeMultiInsert() so as if there is no tuple data then we'd just >>> leave. However, I don't feel completely comfortable with that either >>> as it would be nice to still check that for normal relations we >>> *always* have a FPW available. >> >> XLH_INSERT_CONTAINS_NEW_TUPLE will only be set in case of catalog relations >> IIUC (that is, non logically decoded relations), so it seems to me that we >> can >> have a fastpath out of DecodeMultiInsert() which inspects that flag without >> problems. Is this proposal along the lines of what you were thinking? > > Maybe I'm missing something, but it's the opposite, no? > boolneed_tuple_data = RelationIsLogicallyLogged(relation); Yep. This checks after IsCatalogRelation(). > ... > if (need_tuple_data) > xlrec->flags |= XLH_INSERT_CONTAINS_NEW_TUPLE; > > or am I misunderstanding what you mean? Not sure that I can think about a good way to properly track if the new tuple data is associated to a catalog or not, aka if the data is expected all the time or not to get a good sanity check when doing the multi-insert decoding. We could extend xl_multi_insert_tuple with a flag to track that, but that seems like an overkill for a simple sanity check.. -- Michael signature.asc Description: PGP signature
RE: Recovery performance of DROP DATABASE with many tablespaces
On Wed, Oct. 2, 2019 5:40 PM, Fujii Masao wrote: > On Tue, Jul 10, 2018 at 3:04 PM Michael Paquier wrote: > > > > On Thu, Jul 05, 2018 at 01:42:20AM +0900, Fujii Masao wrote: > > > TBH, I have no numbers measured by the test. > > > One question about your test is; how did you measure the *recovery > > > time* of DROP DATABASE? Since it's *recovery* performance, basically > > > it's not easy to measure that. > > > > It would be simple to measure the time it takes to replay this single > > DROP DATABASE record by putting two gettimeofday() calls or such > > things and then take the time difference. There are many methods that > > you could use here, and I suppose that with a shared buffer setting of > > a couple of GBs of shared buffers you would see a measurable > > difference with a dozen of tablespaces or so. You could also take a > > base backup after creating all the tablespaces, connect the standby > > and then drop the database on the primary to see the actual time it > > takes. Your patch looks logically correct to me because > > DropDatabaseBuffers is a > > *bottleneck* with large shared_buffers, and it would be nice to see > > numbers. > > Thanks for the comment! > > I measured how long it takes to replay DROP DATABASE with 1000 tablespaces, > in master and patched version. shared_buffers was set to 16GB. > > [master] > It took 8 seconds to replay DROP DATABASE with 1000 tablespaces, as follows. > In this case, 16GB shared_buffers was fully scanned 1000 times. > > 2019-10-02 16:50:14 JST LOG: redo starts at 0/228 > 2019-10-02 16:50:22 JST LOG: redo done at 0/300A298 > > [patched] > It took less than 1 second to replay DROP DATABASE with 1000 tablespaces, > as follows. In this case, 16GB shared_buffers was scanned only one time. > > 2019-10-02 16:47:03 JST LOG: redo starts at 0/228 > 2019-10-02 16:47:03 JST LOG: redo done at 0/3001588 > Hi Fujii-san, It's been a while, so I checked the patch once again. 
It's fairly straightforward and I saw no problems or bugs in the code.

> [patched]
> It took less than 1 second to replay DROP DATABASE with 1000 tablespaces,

The results are good. I want to replicate the performance to confirm the results as well. Could you share how you measured the recovery replay time? Did you actually execute a failover?

Regards,
Kirk Jamison
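The gettimeofday() approach Michael suggested upthread could be sketched as below. This is an illustrative standalone helper, not PostgreSQL code; the elog() call and the redo routine in the usage comment are hypothetical placeholders.

```c
#include <assert.h>
#include <sys/time.h>

/*
 * Illustrative helper: elapsed wall-clock milliseconds between two
 * gettimeofday() samples.
 */
static long
elapsed_ms(const struct timeval *start, const struct timeval *end)
{
	return (end->tv_sec - start->tv_sec) * 1000L +
		   (end->tv_usec - start->tv_usec) / 1000L;
}

/*
 * Usage sketch inside the redo routine (hypothetical):
 *
 *     struct timeval t0, t1;
 *     gettimeofday(&t0, NULL);
 *     ... replay the DROP DATABASE record ...
 *     gettimeofday(&t1, NULL);
 *     elog(LOG, "replay took %ld ms", elapsed_ms(&t0, &t1));
 */
```

Two such samples around the single record replay avoid having to eyeball the "redo starts"/"redo done" log timestamps.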
Re: pg_waldump and PREPARE
12.11.2019 12:41, Fujii Masao wrote:
> Ok, I changed the patch that way.
> Attached is the latest version of the patch.
>
> Regards,

I did not see any problems in this version of the patch. The information displayed by pg_waldump for the PREPARE record is sufficient for use.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
Re: Remove HeapTuple and Buffer dependency for predicate locking functions
On Sun, Nov 10, 2019 at 8:21 PM Thomas Munro wrote:
> I pushed the first two,

Thank you!

> but on another read-through of the main patch
> I didn't like the comments for CheckForSerializableConflictOut() or
> the fact that it checks SerializationNeededForRead() again, after I
> thought a bit about what the contract for this API is now. Here's a
> version with a small fixup that I'd like to squash into the patch.
> Please let me know what you think,

The thought behind having SerializationNeededForRead() inside CheckForSerializableConflictOut() is to keep that API clean and complete by itself. An AM like heap that needs to perform some AM-specific checking only in the serialization-needed case can do so, but is not forced to. So, if an AM, for example Zedstore, doesn't need to do any specific checking, it can directly call CheckForSerializableConflictOut(). With the modified fixup patch, the responsibility is unnecessarily forced onto the caller even if CheckForSerializableConflictOut() can do it. I understand the intent is to avoid a duplicate check for heap.

> or if you see how to improve it
> further.

I had proposed an alternative way in my initial email and also later, but didn't receive comments on it, so I am re-posting. An alternative way to refactor: CheckForSerializableConflictOut() can take a callback function and a (void *) callback argument for the AM-specific check. So, the flow would be the AM calling CheckForSerializableConflictOut() as today, and only if SerializationNeededForRead() passes would it invoke the callback to check with the AM whether more work should be performed. Essentially, HeapCheckForSerializableConflictOut() would become the callback function instead.
So, roughly it would look like:

typedef bool (*AMCheckForSerializableConflictOutCallback) (void *arg);

void
CheckForSerializableConflictOut(Relation relation, TransactionId xid,
                                Snapshot snapshot,
                                AMCheckForSerializableConflictOutCallback callback,
                                void *callback_arg)
{
    if (!SerializationNeededForRead(relation, snapshot))
        return;
    if (callback != NULL && !callback(callback_arg))
        return;
    ...
}

With this, AMs which don't have any specific checks to perform can pass the callback as NULL. So, a function call is involved only if SerializationNeededForRead() passes, and only for AMs which need it. Just due to the void * callback argument aspect I didn't prefer that solution and felt that the AM performing its checks and then calling CheckForSerializableConflictOut() seems better. Please let me know how you feel about this.
Re: dropdb --force
On Wed 13. 11. 2019 at 7:12, Amit Kapila wrote:
> On Tue, Nov 12, 2019 at 11:17 AM Amit Kapila wrote:
> >
> > I am planning to commit this patch tomorrow unless I see more comments
> > or interest from someone else to review this.
> >
> Pushed. Pavel, feel free to submit dropdb utility-related patch if you
> want.

I hope to send this patch today. It's a simple job.

Pavel

> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
Re: dropdb --force
On Tue, Nov 12, 2019 at 11:17 AM Amit Kapila wrote: > > I am planning to commit this patch tomorrow unless I see more comments > or interest from someone else to review this. > Pushed. Pavel, feel free to submit dropdb utility-related patch if you want. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Block level parallel vacuum
On Tue, Nov 12, 2019 at 3:14 PM Mahendra Singh wrote: > > On Mon, 11 Nov 2019 at 16:36, Amit Kapila wrote: > > > > On Mon, Nov 11, 2019 at 2:53 PM Mahendra Singh wrote: > > > > > > > > > For small indexes also, we gained some performance by parallel vacuum. > > > > > > > Thanks for doing all these tests. It is clear with this and previous > > tests that this patch has benefit in wide variety of cases. However, > > we should try to see some worst cases as well. For example, if there > > are multiple indexes on a table and only one of them is large whereas > > all others are very small say having a few 100 or 1000 rows. > > > > Thanks Amit for your comments. > > I did some testing on the above suggested lines. Below is the summary: > Test case:(I created 16 indexes but only 1 index is large, other are very > small) > create table test(a int, b int, c int, d int, e int, f int, g int, h int); > create index i3 on test (a) where a > 2000 and a < 3000; > create index i4 on test (a) where a > 3000 and a < 4000; > create index i5 on test (a) where a > 4000 and a < 5000; > create index i6 on test (a) where a > 5000 and a < 6000; > create index i7 on test (b) where a < 1000; > create index i8 on test (c) where a < 1000; > create index i9 on test (d) where a < 1000; > create index i10 on test (d) where a < 1000; > create index i11 on test (d) where a < 1000; > create index i12 on test (d) where a < 1000; > create index i13 on test (d) where a < 1000; > create index i14 on test (d) where a < 1000; > create index i15 on test (d) where a < 1000; > create index i16 on test (d) where a < 1000; > insert into test select i,i,i,i,i,i,i,i from generate_series(1,100) as i; > delete from test where a %2=0; > > case 1: vacuum without using parallel workers. > vacuum test; > 228.259 ms > > case 2: vacuum with 1 parallel worker. > vacuum (parallel 1) test; > 251.725 ms > > case 3: vacuum with 3 parallel workers. 
> vacuum (parallel 3) test;
> 259.986
>
> From above results, it seems that if indexes are small, then parallel vacuum
> is not beneficial as compared to normal vacuum.

Right, and that is what is expected as well. However, I think it would be better if we could somehow disallow very small indexes from using parallel workers. Can we use min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
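A rough sketch of the gating idea, not actual PostgreSQL code: min_parallel_index_scan_size is an existing GUC (default 512kB, stored as a number of 8kB blocks, so 64), and an index smaller than that would be handled by the leader instead of being assigned to a parallel worker. The helper name and the standalone variable are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the GUC: 64 blocks of 8kB, i.e. the 512kB default. */
static int min_parallel_index_scan_size = 64;

/*
 * Illustrative check: would this index (of the given size in blocks)
 * be eligible for processing by a parallel vacuum worker?
 */
static bool
index_can_use_parallel_worker(int index_nblocks)
{
	return index_nblocks >= min_parallel_index_scan_size;
}
```

With a gate like this, the sixteen tiny partial indexes from the test case above would all fall to the leader, avoiding the worker setup overhead that made the parallel runs slower.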
Re: [HACKERS] Block level parallel vacuum
On Wed, 13 Nov 2019 at 12:43, Dilip Kumar wrote: > > On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada > wrote: > > > > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar wrote: > > > > > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada > > > wrote: > > > > > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar wrote: > > > > > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada > > > > > wrote: > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached > > > > > > the > > > > > > updated version patch that also incorporated some comments I got so > > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and > > > > > > also > > > > > > test the total delay time. > > > > > > > > > > > While reviewing the 0002, I got one doubt related to how we are > > > > > dividing the maintainance_work_mem > > > > > > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int > > > > > nindexes) > > > > > +{ > > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */ > > > > > + lvshared->maintenance_work_mem_worker = > > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : > > > > > maintenance_work_mem; > > > > > +} > > > > > Is it fair to just consider the number of indexes which use > > > > > maintenance_work_mem? Or we need to consider the number of worker as > > > > > well. My point is suppose there are 10 indexes which will use the > > > > > maintenance_work_mem but we are launching just 2 workers then what is > > > > > the point in dividing the maintenance_work_mem by 10. > > > > > > > > > > IMHO the calculation should be like this > > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? > > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers) : > > > > > maintenance_work_mem; > > > > > > > > > > Am I missing something? > > > > > > > > No, I think you're right. 
On the other hand I think that dividing it > > > > by the number of indexes that will use the maintenance_work_mem makes > > > > sense when parallel degree > the number of such indexes. Suppose the > > > > table has 2 indexes and there are 10 workers then we should divide the > > > > maintenance_work_mem by 2 rather than 10 because it's possible that at > > > > most 2 indexes that use the maintenance_work_mem are processed in > > > > parallel at a time. > > > > > > > Right, that's the reason I suggested dividing by Min(nindexes_mwm, > > > nworkers). > > > > Thanks! I'll fix it in the next version patch. > > > One more comment. > > +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel, > + int nindexes, IndexBulkDeleteResult **stats, > + LVParallelState *lps) > +{ > + > > + if (ParallelVacuumIsActive(lps)) > + { > > + > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, > + stats, lps); > + > + } > + > + for (idx = 0; idx < nindexes; idx++) > + { > + /* > + * Skip indexes that we have already vacuumed during parallel index > + * vacuuming. > + */ > + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx)) > + continue; > + > + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, > + vacrelstats->old_live_tuples); > + } > +} > > In this function, if ParallelVacuumIsActive, we perform the parallel > vacuum for all the indexes for which parallel vacuum is supported and > once that is over we finish vacuuming the remaining indexes for which > parallel vacuum is not supported. But, my question is that inside > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers > to finish their job and only then start the sequential vacuuming; > shouldn't we start that immediately as soon as the leader > participation is over in the parallel vacuum?

If we do that, then while the leader process is sequentially vacuuming indexes that don't support parallel vacuum, some workers might still be vacuuming other indexes. Isn't that a problem? If it's not a problem, I think we can tie the indexes that don't support parallel vacuum to the leader and do the parallel index vacuum.

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
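The memory-division formula discussed upthread could be sketched as follows. This is illustrative only, not the patch's code: the function name is hypothetical, the parameter names follow the thread, and nworkers is assumed to be at least 1 when any index uses maintenance_work_mem.

```c
#include <assert.h>

#define Min(x, y) ((x) < (y) ? (x) : (y))

/*
 * Split maintenance_work_mem (in kB) across the index-vacuum participants
 * that actually use it, but never across more participants than there are
 * launched workers, since at most that many such indexes run at a time.
 */
static int
compute_mwm_per_worker(int maintenance_work_mem,
					   int nindexes_mwm, int nworkers)
{
	if (nindexes_mwm <= 0)
		return maintenance_work_mem;
	return maintenance_work_mem / Min(nindexes_mwm, nworkers);
}
```

So with 10 memory-using indexes but only 2 workers, each participant gets half of maintenance_work_mem rather than a tenth, which is the point Dilip raised.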
Re: Invisible PROMPT2
On Wed 13. 11. 2019 at 4:15, Thomas Munro wrote:
> Hello hackers,
>
> From the advanced bikeshedding department: I'd like my psql
> transcripts to have the usual alignment, but be easier to copy and
> paste later without having weird prompt stuff in the middle. How
> about a prompt format directive %w that means "whitespace of the same
> width as %/"? Then you can set your PROMPT2 to '%w ' and it
> becomes invisible:
>
> pgdu=# create table foo (
>          i int,
>          j int
>        );
> CREATE TABLE
> pgdu=#

+1

Pavel
Re: pg_waldump and PREPARE
On Tue, Nov 12, 2019 at 06:41:12PM +0900, Fujii Masao wrote: > Ok, I changed the patch that way. > Attached is the latest version of the patch. Thanks for the new patch. Looks fine to me. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] Block level parallel vacuum
On Wed, Nov 13, 2019 at 9:48 AM Amit Kapila wrote:
>
> Yeah, 0, 2, 3 and 4 sound reasonable to me. Earlier, Dilip also got
> confused with option 1.

Let me try to summarize the discussion on this point and see if others have any opinion on this matter.

We need a way to allow an IndexAM to specify whether it can participate in a parallel vacuum. As we know, there are two phases of index vacuum, bulkdelete and vacuumcleanup, and in many cases the bulkdelete performs the main deletion work while vacuumcleanup just returns index statistics. So, for such cases, we don't want the second phase to be performed by a parallel vacuum worker. Now, if the bulkdelete phase is not performed, then vacuumcleanup can process the entire index, in which case it is better to do that phase via a parallel worker. OTOH, in some cases vacuumcleanup takes another pass over the index to reclaim empty pages and record the same in the FSM even if bulkdelete is performed. This happens in gin and bloom indexes. Then, we have indexes where we do all the work in the cleanup phase, as in the case of brin indexes; for this category of indexes, we want the vacuumcleanup phase to also be performed by a parallel worker. In short, different indexes have different requirements for which phase of index vacuum can be performed in parallel. Just to be clear, we can't perform both phases (bulkdelete and cleanup) in one go, as bulkdelete can happen multiple times on a large index whereas vacuumcleanup is done once at the end. Based on these needs, we came up with a way to allow IndexAMs to specify this information.
Basically, the IndexAM will expose a variable amparallelvacuumoptions which can have the below options:

VACUUM_OPTION_NO_PARALLEL (1 << 0)
  vacuum (neither bulkdelete nor vacuumcleanup) can't be performed in parallel

VACUUM_OPTION_PARALLEL_BULKDEL (1 << 1)
  bulkdelete can be done in parallel (indexes nbtree, hash, gin, gist, spgist, bloom will set this flag)

VACUUM_OPTION_PARALLEL_COND_CLEANUP (1 << 2)
  vacuumcleanup can be done in parallel if bulkdelete is not performed (indexes nbtree, brin, gin, gist, spgist, bloom will set this flag)

VACUUM_OPTION_PARALLEL_CLEANUP (1 << 3)
  vacuumcleanup can be done in parallel even if bulkdelete is already performed (indexes gin, brin, and bloom will set this flag)

We had discussed exposing this information via two variables, but the above seems like a better idea to all the people involved. Any suggestions? Does anyone think this is not the right way to expose this information, or that there is no need to expose it, or have a better idea for it? Sawada-san, Dilip, feel free to correct me.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
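To make the semantics of the flag bits concrete, here is a small sketch. The flag values are the ones proposed above; the helper function is purely hypothetical, just showing how a caller might combine the two cleanup bits with knowledge of whether bulkdelete has run.

```c
#include <assert.h>
#include <stdbool.h>

/* The proposed flag bits, with the values given above. */
#define VACUUM_OPTION_NO_PARALLEL			(1 << 0)
#define VACUUM_OPTION_PARALLEL_BULKDEL		(1 << 1)
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP	(1 << 2)
#define VACUUM_OPTION_PARALLEL_CLEANUP		(1 << 3)

/*
 * Hypothetical helper (not part of the patch): decide whether an index's
 * vacuumcleanup phase may be run by a parallel worker, given the AM's
 * amparallelvacuumoptions and whether bulkdelete was already performed.
 */
static bool
cleanup_can_run_in_parallel(int amoptions, bool bulkdel_done)
{
	if (amoptions & VACUUM_OPTION_PARALLEL_CLEANUP)
		return true;		/* e.g. gin, brin, bloom */
	if ((amoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) && !bulkdel_done)
		return true;		/* e.g. nbtree, when no bulkdelete happened */
	return false;
}
```

For example, an nbtree index setting BULKDEL|COND_CLEANUP would get parallel cleanup only when no bulkdelete pass ran, while a gin index setting PARALLEL_CLEANUP would get it either way.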
Re: [PATCH] Do not use StdRdOptions in Access Methods
On Wed, Nov 13, 2019 at 2:18 PM Michael Paquier wrote: > On Wed, Nov 13, 2019 at 10:52:52AM +0900, Amit Langote wrote: > > Thanks for chiming in about that. I guess that means that we don't > > need those macros, except GET_STRING_RELOPTION_LEN() that's used in > > allocateReloptStruct(), which can be moved to reloptions.c. Is that > > correct? > > I have been looking on the net to see if there are any traces of code > using those macros, but could not find any. The last trace of a macro > use is in 8ebe1e3, which just relies on GET_STRING_RELOPTION_LEN. So > it looks rather convincing now to just remove this code. Attached is > a patch for that. Thank you. > There could be an argument for keeping > GET_STRING_RELOPTION actually which is still useful to get a string > value in an option set using the stored offset, and we have > the recently-added dummy_index_am in this category. Any thoughts? Not sure, maybe +0.5 on keeping GET_STRING_RELOPTION. Thanks, Amit
Re: SPI error with non-volatile functions
On 13/11/2019 06:13, Andres Freund wrote: > Hi, > > On 2019-11-13 05:09:31 +, Tom Mercha wrote: >> I've been using SPI to execute some queries and this time I've tried to >> issue CREATE TABLE commands through SPI. I've been getting the message >> "ERROR: CREATE TABLE AS is not allowed in a non-volatile function". > > Any chance you're specifying read_only = true to > SPI_execute()/execute_plan()/...? > > Greetings, > > Andres Freund > Dear Andres That's exactly what's up! Everything is working as intended now. So sorry this was a bit silly of me, I didn't understand the message as a reference to that configuration. Thanks so much. Best regards Tom
Re: [PATCH] Do not use StdRdOptions in Access Methods
On Wed, Nov 13, 2019 at 10:52:52AM +0900, Amit Langote wrote: > Thanks for chiming in about that. I guess that means that we don't > need those macros, except GET_STRING_RELOPTION_LEN() that's used in > allocateReloptStruct(), which can be moved to reloptions.c. Is that > correct? I have been looking on the net to see if there are any traces of code using those macros, but could not find any. The last trace of a macro use is in 8ebe1e3, which just relies on GET_STRING_RELOPTION_LEN. So it looks rather convincing now to just remove this code. Attached is a patch for that. There could be an argument for keeping GET_STRING_RELOPTION actually which is still useful to get a string value in an option set using the stored offset, and we have the recently-added dummy_index_am in this category. Any thoughts? -- Michael diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h index db42aa35e0..e24fdd1975 100644 --- a/src/include/access/reloptions.h +++ b/src/include/access/reloptions.h @@ -148,124 +148,6 @@ typedef struct int offset; /* offset of field in result struct */ } relopt_parse_elt; - -/* - * These macros exist for the convenience of amoptions writers (but consider - * using fillRelOptions, which is a lot simpler). Beware of multiple - * evaluation of arguments! - * - * The last argument in the HANDLE_*_RELOPTION macros allows the caller to - * determine whether the option was set (true), or its value acquired from - * defaults (false); it can be passed as (char *) NULL if the caller does not - * need this information. - * - * optname is the option name (a string), var is the variable - * on which the value should be stored (e.g. StdRdOptions->fillfactor), and - * option is a relopt_value pointer. 
- * - * The normal way to use this is to loop on the relopt_value array returned by - * parseRelOptions: - * for (i = 0; options[i].gen->name; i++) - * { - * if (HAVE_RELOPTION("fillfactor", options[i]) - * { - * HANDLE_INT_RELOPTION("fillfactor", rdopts->fillfactor, options[i], &isset); - * continue; - * } - * if (HAVE_RELOPTION("default_row_acl", options[i]) - * { - * ... - * } - * ... - * if (validate) - * ereport(ERROR, - * (errmsg("unknown option"))); - * } - * - * Note that this is more or less the same that fillRelOptions does, so only - * use this if you need to do something non-standard within some option's - * code block. - */ -#define HAVE_RELOPTION(optname, option) \ - (strncmp(option.gen->name, optname, option.gen->namelen + 1) == 0) - -#define HANDLE_INT_RELOPTION(optname, var, option, wasset) \ - do { \ - if (option.isset) \ - var = option.values.int_val; \ - else \ - var = ((relopt_int *) option.gen)->default_val; \ - (wasset) != NULL ? *(wasset) = option.isset : (dummyret)NULL; \ - } while (0) - -#define HANDLE_BOOL_RELOPTION(optname, var, option, wasset) \ - do { \ - if (option.isset) \ - var = option.values.bool_val; \ - else \ - var = ((relopt_bool *) option.gen)->default_val; \ - (wasset) != NULL ? *(wasset) = option.isset : (dummyret) NULL; \ - } while (0) - -#define HANDLE_REAL_RELOPTION(optname, var, option, wasset) \ - do { \ - if (option.isset) \ - var = option.values.real_val; \ - else \ - var = ((relopt_real *) option.gen)->default_val; \ - (wasset) != NULL ? *(wasset) = option.isset : (dummyret) NULL; \ - } while (0) - -/* - * Note that this assumes that the variable is already allocated at the tail of - * reloptions structure (StdRdOptions or equivalent). - * - * "base" is a pointer to the reloptions structure, and "offset" is an integer - * variable that must be initialized to sizeof(reloptions structure). 
This - * struct must have been allocated with enough space to hold any string option - * present, including terminating \0 for every option. SET_VARSIZE() must be - * called on the struct with this offset as the second argument, after all the - * string options have been processed. - */ -#define HANDLE_STRING_RELOPTION(optname, var, option, base, offset, wasset) \ - do { \ - relopt_string *optstring = (relopt_string *) option.gen;\ - char *string_val; \ - if (option.isset) \ - string_val = option.values.string_val;\ - else if (!optstring->default_isnull) \ - string_val = optstring->default_val;\ - else \ - string_val = NULL; \ - (wasset) != NULL ? *(wasset) = option.isset : (dummyret) NULL; \ - if (string_val == NULL) \ - var = 0; \ - else \ - { \ - strcpy(((char *)(base)) + (offset), string_val); \ - var = (offset); \ - (offset) += strlen(string_val) + 1; \ - } \ - } while (0) - -/* - * For use during amoptions: get the strlen of a string option - * (either default or the user defined value) - */ -#define GET_STRING_RELOPTION
Re: SPI error with non-volatile functions
Hi, On 2019-11-13 05:09:31 +, Tom Mercha wrote: > I've been using SPI to execute some queries and this time I've tried to > issue CREATE TABLE commands through SPI. I've been getting the message > "ERROR: CREATE TABLE AS is not allowed in a non-volatile function". Any chance you're specifying read_only = true to SPI_execute()/execute_plan()/...? Greetings, Andres Freund
SPI error with non-volatile functions
Dear Hackers I've been using SPI to execute some queries and this time I've tried to issue CREATE TABLE commands through SPI. I've been getting the message "ERROR: CREATE TABLE AS is not allowed in a non-volatile function". I'm a bit confused because my functions are set as volatile when I got that result. I was sure I'd be able to issue CREATE TABLE through SPI because a possible return value of SPI_execute is SPI_OK_UTILITY if a utility command such as CREATE TABLE was executed. Maybe the caveat is the following. I am actually invoking my function through CREATE LANGUAGE's inline handler using an anonymous do block. So maybe I need to take some additional considerations for this reason? The following is what my functions look like: CREATE FUNCTION myLanguage.myLanguage_function_call_handler() RETURNS language_handler AS 'MODULE_PATHNAME','myLanguage_function_call_handler' LANGUAGE C VOLATILE; CREATE FUNCTION myLanguage.myLanguage_inline_function_handler(internal) RETURNS void AS 'MODULE_PATHNAME','myLanguage_inline_function_handler' LANGUAGE C VOLATILE; CREATE LANGUAGE myLanguage HANDLER myLanguage.myLanguage_function_call_handler INLINE myLanguage.myLanguage_inline_function_handler; COMMENT ON LANGUAGE myLanguage IS 'My Language'; Have I correctly approached the issue? Maybe there is a workaround? Best regards Tom
Re: [PATCH][DOC] Fix for PREPARE TRANSACTION doc and postgres_fdw message.
On Tue, Nov 12, 2019 at 09:35:03AM +0900, Michael Paquier wrote: > Indeed, thanks for looking. I thought that the callback was called > after checking for max_prepared_transaction, but that's not the case. > So let's add at least a test case. Any objections? Okay, done. -- Michael
Re: cost based vacuum (parallel)
On Tue, 12 Nov 2019 at 20:22, Masahiko Sawada wrote: > > On Tue, 12 Nov 2019 at 19:08, Amit Kapila wrote: > > > > On Tue, Nov 12, 2019 at 3:03 PM Dilip Kumar wrote: > > > > > > On Tue, Nov 12, 2019 at 10:47 AM Dilip Kumar > > > wrote: > > > > > > > > On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila > > > > wrote: > > > > > > > > > > On Mon, Nov 11, 2019 at 12:59 PM Dilip Kumar > > > > > wrote: > > > > > > > > > > > > On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar > > > > > > wrote: > > > > > > > > > > > > > > On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > Yeah, I think it is difficult to get the exact balance, but we > > > > > > > > can try > > > > > > > > to be as close as possible. We can try to play with the > > > > > > > > threshold and > > > > > > > > another possibility is to try to sleep in proportion to the > > > > > > > > amount of > > > > > > > > I/O done by the worker. > > > > > > > I have done another experiment where I have done another 2 > > > > > > > changes on > > > > > > > top op patch3 > > > > > > > a) Only reduce the local balance from the total shared balance > > > > > > > whenever it's applying delay > > > > > > > b) Compute the delay based on the local balance. > > > > > > > > > > > > > > patch4: > > > > > > > worker 0 delay=84.13 total I/O=17931 hit=17891 miss=0 dirty=2 > > > > > > > worker 1 delay=89.23 total I/O=17931 hit=17891 miss=0 dirty=2 > > > > > > > worker 2 delay=88.68 total I/O=17931 hit=17891 miss=0 dirty=2 > > > > > > > worker 3 delay=80.79 total I/O=16378 hit=4318 miss=0 dirty=603 > > > > > > > > > > > > > > I think with this approach the delay is divided among the worker > > > > > > > quite > > > > > > > well compared to other approaches > > > > > > > > > > > > > > > > > > > > .. > > > > > > I have tested the same with some other workload(test file attached). 
> > > > > > I can see the same behaviour with this workload as well that with > > > > > > the > > > > > > patch 4 the distribution of the delay is better compared to other > > > > > > patches i.e. worker with more I/O have more delay and with equal IO > > > > > > have almost equal delay. Only thing is that the total delay with > > > > > > the patch 4 is slightly less compared to other patches. > > > > > > > > > > > > > > > > I see one problem with the formula you have used in the patch, maybe > > > > > that is causing the value of total delay to go down. > > > > > > > > > > - if (new_balance >= VacuumCostLimit) > > > > > + VacuumCostBalanceLocal += VacuumCostBalance; > > > > > + if ((new_balance >= VacuumCostLimit) && > > > > > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker))) > > > > > > > > > > As per discussion, the second part of the condition should be > > > > > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker". I think > > > > > you can once change this and try again. Also, please try with the > > > > > different values of threshold (0.3, 0.5, 0.7, etc.). > > > > > > > > > I have modified the patch4 and ran with different values. But, I > > > > don't see much difference in the values with the patch4. In fact, I > > > > removed the condition for the local balancing check completely still > > > > the delays are the same, I think this is because with patch4 workers > > > > are only reducing their own balance and also delaying as much as their > > > > local balance. So maybe the second condition will not have much > > > > impact. > > > > > > > > Yeah, but I suspect the condition (when the local balance exceeds a > > certain threshold, then only try to perform delay) you mentioned can > > have an impact in some other scenarios. So, it is better to retain > > the same. I feel the overall results look sane and the approach seems > > reasonable to me.
> > > I have revised the patch4 so that it doesn't depend upon the fixed > > number of workers, instead I have dynamically updated the worker > > count. > > > > > > Thanks. Sawada-San, by any chance, can you try some of the tests done > > by Dilip or some similar tests just to rule out any sort of > > machine-specific dependency? > > Sure. I'll try it tomorrow. I've done some tests while changing shared buffer size, delays and number of workers. The overall results have a similar tendency to the results shared by Dilip and look reasonable to me. * test.sh shared_buffers = '4GB'; max_parallel_maintenance_workers = 6; vacuum_cost_delay = 1; worker 0 delay=89.315000 total io=17931 hit=17891 miss=0 dirty=2 worker 1 delay=88.86 total io=17931 hit=17891 miss=0 dirty=2 worker 2 delay=89.29 total io=17931 hit=17891 miss=0 dirty=2 worker 3 delay=81.805000 total io=16378 hit=4318 miss=0 dirty=603 shared_buffers = '1GB'; max_parallel_maintenance_workers = 6; vacuum_cost_delay = 1; worker 0 delay=89.21 total io=17931 hit=17891 miss=0 dirty=2 worker 1 delay=89.325000 total io=17931 hit=17891 miss=0 dirty=2 worker 2 delay=88.870
Re: [HACKERS] Block level parallel vacuum
On Wed, Nov 13, 2019 at 9:12 AM Dilip Kumar wrote: > > On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada > wrote: > > > > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar wrote: > > > > > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada > > > wrote: > > > > > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar wrote: > > > > > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada > > > > > wrote: > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached > > > > > > the > > > > > > updated version patch that also incorporated some comments I got so > > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and > > > > > > also > > > > > > test the total delay time. > > > > > > > > > > > While reviewing the 0002, I got one doubt related to how we are > > > > > dividing the maintainance_work_mem > > > > > > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int > > > > > nindexes) > > > > > +{ > > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */ > > > > > + lvshared->maintenance_work_mem_worker = > > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : > > > > > maintenance_work_mem; > > > > > +} > > > > > Is it fair to just consider the number of indexes which use > > > > > maintenance_work_mem? Or we need to consider the number of worker as > > > > > well. My point is suppose there are 10 indexes which will use the > > > > > maintenance_work_mem but we are launching just 2 workers then what is > > > > > the point in dividing the maintenance_work_mem by 10. > > > > > > > > > > IMHO the calculation should be like this > > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? > > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers) : > > > > > maintenance_work_mem; > > > > > > > > > > Am I missing something? > > > > > > > > No, I think you're right. 
On the other hand I think that dividing it > > > > by the number of indexes that will use the mantenance_work_mem makes > > > > sense when parallel degree > the number of such indexes. Suppose the > > > > table has 2 indexes and there are 10 workers then we should divide the > > > > maintenance_work_mem by 2 rather than 10 because it's possible that at > > > > most 2 indexes that uses the maintenance_work_mem are processed in > > > > parallel at a time. > > > > > > > Right, thats the reason I suggested divide with Min(nindexes_mwm, > > > nworkers). > > > > Thanks! I'll fix it in the next version patch. > > > One more comment. > > +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel, > + int nindexes, IndexBulkDeleteResult **stats, > + LVParallelState *lps) > +{ > + > > + if (ParallelVacuumIsActive(lps)) > + { > > + > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, > + stats, lps); > + > + } > + > + for (idx = 0; idx < nindexes; idx++) > + { > + /* > + * Skip indexes that we have already vacuumed during parallel index > + * vacuuming. > + */ > + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx)) > + continue; > + > + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, > + vacrelstats->old_live_tuples); > + } > +} > > In this function, if ParallelVacuumIsActive, we perform the parallel > vacuum for all the index for which parallel vacuum is supported and > once that is over we finish vacuuming remaining indexes for which > parallel vacuum is not supported. But, my question is that inside > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers > to finish their job then only we start with the sequential vacuuming > shouldn't we start that immediately as soon as the leader > participation is over in the parallel vacuum? > + /* + * Since parallel workers cannot access data in temporary tables, parallel + * vacuum is not allowed for temporary relation. 
+ */ + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0) + { + ereport(WARNING, + (errmsg("skipping vacuum on \"%s\" --- cannot vacuum temporary tables in parallel", + RelationGetRelationName(onerel)))); + relation_close(onerel, lmode); + PopActiveSnapshot(); + CommitTransactionCommand(); + /* It's OK to proceed with ANALYZE on this table */ + return true; + } + If we cannot support parallel vacuum for temporary tables, shouldn't we fall back to a normal vacuum instead of skipping the table? I think it's not fair that if the user has requested a system-wide parallel vacuum, all the temporary tables are skipped and not vacuumed at all, so the user then needs to perform a normal vacuum on those tables again. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
RE: libpq debug log
Hello, Thank you for your review. I have updated the patch; please find it attached. > > 2019-04-04 02:39:51.488 UTC > Query 59 "SELECT > pg_catalog.set_config('search_path', '', false)" > > The "59" there seems quite odd, though. Could you explain this in more detail? "59" is the length of the protocol message contents (it does not include the first byte). This ordering is based on the message format: https://www.postgresql.org/docs/current/protocol-message-formats.html > * What is this in pg_conn? I don't understand why this array is of size > MAXPGPATH. > + PGFrontendLogMsgEntry frontend_entry[MAXPGPATH]; > This seems to make pg_conn much larger than we'd like. Sure. I have removed this and changed the code to use a list. > Minor review points: I accept all these review points. Regards, Aya Iwata
Re: [HACKERS] Block level parallel vacuum
On Wed, Nov 13, 2019 at 8:34 AM Masahiko Sawada wrote: > > On Wed, 13 Nov 2019 at 11:38, Amit Kapila wrote: > > > > > It might be that we need to do it the way > > I originally proposed the different values of amparallelvacuumoptions > > or maybe some variant of it where the default value can clearly say > > that IndexAm doesn't support a parallel vacuum. > > Okay. After more thoughts on your original proposal, what I get > confused on your proposal is that there are two types of flags that > enable and disable options. Looking at 2, 3 and 4, it looks like all > options are disabled by default and setting these flags means to > enable them. On the other hand looking at 1, it looks like these > options are enabled by default and setting the flag means to disable > it. 0 makes sense to me. So how about having 0, 2, 3 and 4? > Yeah, 0,2,3 and 4 sounds reasonable to me. Earlier, Dilip also got confused with option 1. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
RE: New SQL counter statistics view (pg_stat_sql)
From: Thomas Munro Sent: Monday, 4 November 2019 1:43 PM > No comment on the patch but I noticed that the documentation changes don't > build. Please make sure you can "make docs" successfully, having installed > the documentation tools[1]. Thanks for the feedback. An updated patch which fixes the docs issue is attached. Kind Regards. Peter Smith --- Fujitsu Australia
Re: [HACKERS] Block level parallel vacuum
On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada wrote: > > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar wrote: > > > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada > > wrote: > > > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar wrote: > > > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada > > > > wrote: > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > > > updated version patch that also incorporated some comments I got so > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > > > test the total delay time. > > > > > > > > > While reviewing the 0002, I got one doubt related to how we are > > > > dividing the maintainance_work_mem > > > > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int > > > > nindexes) > > > > +{ > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */ > > > > + lvshared->maintenance_work_mem_worker = > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : > > > > maintenance_work_mem; > > > > +} > > > > Is it fair to just consider the number of indexes which use > > > > maintenance_work_mem? Or we need to consider the number of worker as > > > > well. My point is suppose there are 10 indexes which will use the > > > > maintenance_work_mem but we are launching just 2 workers then what is > > > > the point in dividing the maintenance_work_mem by 10. > > > > > > > > IMHO the calculation should be like this > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers) : > > > > maintenance_work_mem; > > > > > > > > Am I missing something? > > > > > > No, I think you're right. On the other hand I think that dividing it > > > by the number of indexes that will use the mantenance_work_mem makes > > > sense when parallel degree > the number of such indexes. 
Suppose the > > > table has 2 indexes and there are 10 workers then we should divide the > > > maintenance_work_mem by 2 rather than 10 because it's possible that at > > > most 2 indexes that use the maintenance_work_mem are processed in > > > parallel at a time. > > > > > Right, that's the reason I suggested divide with Min(nindexes_mwm, nworkers). > > Thanks! I'll fix it in the next version patch. > One more comment. +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel, + int nindexes, IndexBulkDeleteResult **stats, + LVParallelState *lps) +{ + + if (ParallelVacuumIsActive(lps)) + { + + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, + stats, lps); + + } + + for (idx = 0; idx < nindexes; idx++) + { + /* + * Skip indexes that we have already vacuumed during parallel index + * vacuuming. + */ + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx)) + continue; + + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, + vacrelstats->old_live_tuples); + } +} In this function, if ParallelVacuumIsActive, we perform the parallel vacuum for all the indexes for which parallel vacuum is supported and once that is over we finish vacuuming the remaining indexes for which parallel vacuum is not supported. But, my question is that inside lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers to finish their job and only then start the sequential vacuuming. Shouldn't we start that immediately, as soon as the leader's participation in the parallel vacuum is over? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
RE: Proposal: Add more compile-time asserts to expose inconsistencies.
From: Andres Freund Sent: Wednesday, 13 November 2019 6:01 AM >Peter Smith: > > Is there a reason to not just make StaticAssertExpr and StaticAssertStmt be > the same? I don't want to proliferate variants that users have to understand > if there's no compelling > need. Nor do I think we really need two different fallback implementations > for static asserts. > > As far as I can tell we should be able to use the prototype based approach in > all the cases where we currently use the "negative bit-field width" approach? I also thought that the new "prototype negative array-dimension" based approach (i.e. StaticAssertDecl) looked like an improvement over the existing "negative bit-field width" approach (i.e. StaticAssertStmt), because it seems to work for more scenarios (e.g. file scope). But I did not refactor existing code to use the new way because I was fearful that there might be some subtle reason why StaticAssertStmt was deliberately made that way (e.g. as do/while), and the last thing I wanted to do was break working code. > Should then also update > * Otherwise we fall back on a kluge that assumes the compiler will complain > * about a negative width for a struct bit-field. This will not include a > * helpful error message, but it beats not getting an error at all. Kind Regards. Peter Smith --- Fujitsu Australia
Invisible PROMPT2
Hello hackers, From the advanced bikeshedding department: I'd like my psql transcripts to have the usual alignment, but be easier to copy and paste later without having weird prompt stuff in the middle. How about a prompt format directive %w that means "whitespace of the same width as %/"? Then you can set your PROMPT2 to '%w ' and it becomes invisible:

pgdu=# create table foo (
         i int,
         j int
       );
CREATE TABLE
pgdu=#
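With the proposed directive, a ~/.psqlrc along these lines would keep continuation lines aligned but paste-friendly (a sketch; %w is the directive proposed in this message, so it only takes effect with the patch applied):

```
\set PROMPT1 '%/%R%# '
\set PROMPT2 '%w '
```

Continuation lines are then prefixed with plain whitespace of the same width as the database name, so a copied transcript contains no prompt residue in the middle of multi-line statements.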
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Mon, Nov 11, 2019 at 05:37:30PM +0900, Michael Paquier wrote: > On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote: > > The thing is that pg_stat_statements assigns a 0 queryid in the > > post_parse_analyze_hook to recognize utility statements and avoid > > tracking instrumentation twice in case of utility statements, and then > > compute a queryid base on a hash of the query text. Maybe we could > > instead fully reserve queryid "2" for utility statements (so forcing > > queryid "1" for standard queries if jumbling returns 0 *or* 2 instead > > of only 0), and use "2" as the identifier for utility statement > > instead of "0"? > > Hmm. Not sure. At this stage it would be nice to gather more input > on the matter, and FWIW, I don't like much the assumption that a query > ID of 0 is perhaps a utility statement, or perhaps nothing depending > on the state of a backend entry, or even perhaps something else > depending how on how modules make use and define such query IDs. I thought each extension would export a function to compute the query id, and you would all that function with the pg_stat_activity.query string. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: libpq sslpassword parameter and callback function
On Sun, Nov 10, 2019 at 05:47:24PM +0800, Craig Ringer wrote: > Yep, that was a trivial change I rolled into it. > > FWIW, this is related to two other patches: the patch to allow passwordless fdw > connections with explicit superuser approval, and the patch to allow sslkey/ > sslpassword to be set as user mapping options in postgres_fdw . Together all > three patches make it possible to use SSL client certificates to manage > authentication in postgres_fdw user mappings. Oh, nice, greatly needed! -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
Re: Does 'instead of delete' trigger support modification of OLD
On Mon, Nov 11, 2019 at 07:00:22PM +0300, Liudmila Mantrova wrote: > > > On 8 Nov 2019, at 0:26, Bruce Momjian wrote: > > > > First, notice "only", which was missing from the later sentence: > > > >For INSERT and UPDATE > >operations [only], the trigger may modify the > >NEW row before returning it. > > > > which I have now added with my applied patch to all supported releases. > > > > Hi Bruce, > > I happened to browse recent documentation-related commits and I didn't see > this patch in REL_12_STABLE. Judging by the commit message, it should be > applied there too. Wow, not sure how that happened, fixed. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
Re: [HACKERS] Block level parallel vacuum
On Wed, 13 Nov 2019 at 11:38, Amit Kapila wrote: > > On Wed, Nov 13, 2019 at 6:53 AM Masahiko Sawada > wrote: > > > > On Tue, 12 Nov 2019 at 22:33, Amit Kapila wrote: > > > > > > > > > Hmm, I think we should define these flags in the most simple way. > > > Your previous proposal sounds okay to me. > > > > Okay. As you mentioned before, my previous proposal won't work for > > existing index AMs that don't set amparallelvacuumoptions. > > > > You mean to say it won't work because it has to set multiple flags > which means that if IndexAm user doesn't set the value of > amparallelvacuumoptions then it won't work? Yes. In my previous proposal every index AMs need to set two flags. > > > But since we > > have amcanparallelvacuum which is false by default I think we don't > > need to worry about backward compatibility problem. The existing index > > AM will use neither parallel bulk-deletion nor parallel cleanup by > > default. When it wants to support parallel vacuum they will set > > amparallelvacuumoptions as well as amcanparallelvacuum. > > > > Hmm, I was not thinking of multiple variables rather only one > variable. The default value should indicate that IndexAm doesn't > support a parallel vacuum. Yes. > It might be that we need to do it the way > I originally proposed the different values of amparallelvacuumoptions > or maybe some variant of it where the default value can clearly say > that IndexAm doesn't support a parallel vacuum. Okay. After more thoughts on your original proposal, what I get confused on your proposal is that there are two types of flags that enable and disable options. Looking at 2, 3 and 4, it looks like all options are disabled by default and setting these flags means to enable them. On the other hand looking at 1, it looks like these options are enabled by default and setting the flag means to disable it. 0 makes sense to me. So how about having 0, 2, 3 and 4? 
Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Block level parallel vacuum
On Tue, Nov 12, 2019 at 7:03 PM Amit Kapila wrote: > > On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada > wrote: > > > > On Tue, 12 Nov 2019 at 20:11, Amit Kapila wrote: > > > > > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada > > > wrote: > > > > > > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar wrote: > > > > > > > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila > > > > > wrote: > > > > > > > > > > > > Yeah, maybe something like amparallelvacuumoptions. The options > > > > > > can be: > > > > > > > > > > > > VACUUM_OPTION_NO_PARALLEL 0 # vacuum (neither bulkdelete nor > > > > > > vacuumcleanup) can't be performed in parallel > > > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP 1 # vacuumcleanup cannot be > > > > > > performed in parallel (hash index will set this flag) > > > > > > > > > > Maybe we don't want this option? because if 3 or 4 is not set then we > > > > > will not do the cleanup in parallel right? > > > > > > > > > > > Yeah, but it is better to be explicit about this. > > > > VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing? > > > > I am not sure if that is required. > > > I think brin indexes > > will use this flag. > > > > Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and > it should work. IIUC, VACUUM_OPTION_PARALLEL_CLEANUP means no parallel bulk delete and always parallel cleanup? I am not sure whether this is the best way because for the cleanup option we are being explicit for each option i.e PARALLEL_CLEANUP, NO_PARALLEL_CLEANUP, etc, then why not the same for the bulk delete. I mean why don't we keep both PARALLEL_BULKDEL and NO_PARALLEL_BULKDEL? > > > It will end up with > > (VACUUM_OPTION_NO_PARALLEL_CLEANUP | > > VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to > > VACUUM_OPTION_NO_PARALLEL, though. 
> > > > > > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 2 # bulkdelete can be done in > > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set > > > > > > this > > > > > > flag) > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 3 # vacuumcleanup can be done > > > > > > in > > > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash, > > > > > > gin, gist, spgist, bloom will set this flag) > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP 4 # vacuumcleanup can be done in > > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > > > > and bloom will set this flag) > > > > > > > > > > > > Does something like this make sense? > > > > > > > > 3 and 4 confused me because 4 also looks conditional. How about having > > > > two flags instead: one for doing parallel cleanup when not performed > > > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing > > > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)? > > > > > > > > > > Hmm, this is exactly what I intend to say with 3 and 4. I am not sure > > > what makes you think 4 is conditional. > > > > Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets > > 4 it doesn't need to set 3 because 4 means always doing cleanup in > > parallel. > > > > Yeah, that makes sense. They can just set 4. > > > > > > > > That way, we > > > > can have flags as follows and index AM chooses two flags, one from the > > > > first two flags for bulk deletion and another from next three flags > > > > for cleanup. > > > > > > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0 > > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 > > > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2 > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3 > > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4 > > > > > > > > > > This also looks reasonable, but if there is an index that doesn't want > > > to support a parallel vacuum, it needs to set multiple flags. > > > > Right. 
It would be better to use uint16 as two uint8. I mean that if > > first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if > > next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags > > could be followings: > > > > VACUUM_OPTION_PARALLEL_BULKDEL 0x0001 > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100 > > VACUUM_OPTION_PARALLEL_CLEANUP 0x0200 > > > > Hmm, I think we should define these flags in the most simple way. > Your previous proposal sounds okay to me. > > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [proposal] recovery_target "latest"
At Fri, 8 Nov 2019 16:08:47 +0300, Grigory Smolkin wrote in > While working on it, I stumbled upon something strange: > > why DisownLatch(&XLogCtl->recoveryWakeupLatch) is called before > ReadRecord(xlogreader, LastRec, PANIC, false) ? > Isn't it possible that this latch is accessed in WaitForWALToBecomeAvailable() if > a streaming standby gets promoted? The DisownLatch is just for the sake of tidiness and can be placed anywhere after the ShutdownWalRcv() call, but the current place (just before "StandbyMode = false") seems natural. The ReadRecord call doesn't launch another wal receiver because we cleared StandbyMode just before. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: ssl passphrase callback
On Sun, Nov 10, 2019 at 01:01:17PM -0600, Magnus Hagander wrote: > On Wed, Nov 6, 2019 at 7:24 PM Bruce Momjian wrote: > We had this > discussion in relation to archive_command years ago, and decided on a > shell command as the best API. > > I don't recall that from back then, but that was a long time ago. > > But it's interesting that you mention it, given the number of people I have > been discussing that with recently, under the topic of changing it from a > shell > command into a shared library API (with there being a shell command as one > possible implementation of course). > > One of the main reasons there being to be easily able to transfer more state > and give results other than just an exit code, no need to deal with parameter > escaping etc. Which probably wouldn't matter as much to an SSL passphrase > command, but still. I get the callback-is-easier issue with shared objects, but are we expecting to pass in more information here than we do for archive_command? I would think not. What I am saying is that if we don't think passing things in works, we should fix all these external commands, or something. I don't see why ssl_passphrase_command is different, except that it is new. Or is it related to _securely_ passing something? Also, why was this patch posted without any discussion of these issues? Shouldn't we ideally discuss the API first? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [HACKERS] Block level parallel vacuum
On Wed, Nov 13, 2019 at 6:53 AM Masahiko Sawada wrote: > > On Tue, 12 Nov 2019 at 22:33, Amit Kapila wrote: > > > > > > Hmm, I think we should define these flags in the most simple way. > > Your previous proposal sounds okay to me. > > Okay. As you mentioned before, my previous proposal won't work for > existing index AMs that don't set amparallelvacuumoptions. > You mean to say it won't work because it has to set multiple flags which means that if IndexAm user doesn't set the value of amparallelvacuumoptions then it won't work? > But since we > have amcanparallelvacuum which is false by default I think we don't > need to worry about backward compatibility problem. The existing index > AM will use neither parallel bulk-deletion nor parallel cleanup by > default. When it wants to support parallel vacuum they will set > amparallelvacuumoptions as well as amcanparallelvacuum. > Hmm, I was not thinking of multiple variables rather only one variable. The default value should indicate that IndexAm doesn't support a parallel vacuum. It might be that we need to do it the way I originally proposed the different values of amparallelvacuumoptions or maybe some variant of it where the default value can clearly say that IndexAm doesn't support a parallel vacuum. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: 'Invalid lp' during heap_xlog_delete
It's been tedious to get it exactly right but I think I got it. FYI, I was delayed because today we had yet another customer hit this: 'redo max offset' error. The system crashed as a number of autovacuums and a checkpoint happened, and then the REDO failure.

Two tiny code changes:

bufmgr.c:BufferSync():
    pg_usleep(1000);    /* at beginning of function */

smgr.c:smgrtruncate(): add the following just after CacheInvalidateSmgr():
    if (forknum == MAIN_FORKNUM && nblocks == 0)
    {
        pg_usleep(4000);
        { char *cp = NULL; *cp = 13; }
    }

Now for the heavily commented SQL repro. It will require that you execute a checkpoint in another session when instructed by the repro.sql script. You have 4 seconds to do that. The repro script explains exactly what must happen.

---
create table t (c char());
alter table t alter column c set storage plain;

-- Make sure there actually is an allocated page 0 and it is empty.
-- REDO Delete would ignore a non-existent page:
--     XLogReadBufferForRedoExtended: return BLK_NOTFOUND;
-- Hopefully two row deletes don't trigger autovacuum and truncate the empty page.
insert into t values ('1'), ('2');
delete from t;

-- Checkpoint the empty page to disk.
checkpoint;

-- This insert should be before the next checkpoint 'start'. I don't want to replay it.
-- And, yes, there needs to be another checkpoint completed to skip its replay and start
-- with the replay of the delete below.
insert into t values ('1'), ('2');

-- Checkpoint needs to start in another session. However, I need to stall the checkpoint
-- to prevent it from writing the dirty page to disk until I get to the vacuum below.
select 'Please start checkpoint in another session';
select pg_sleep(4);

-- Below is the problematic delete.
-- It succeeds now (online) because the dirty page has two rows on it.
-- However, with respect to crash recovery there are 3 possible scenarios depending on timing.
-- 1) The ongoing checkpoint might write the page with the two rows on it before
--    the deletes. This leads to BLK_NEEDS_REDO for the deletes. It works
--    because the page read from disk has the rows on it.
-- 2) The ongoing checkpoint might write the page just after the deletes.
--    In that case BLK_DONE will happen (the LSN check) and there'll be no problem.
-- 3) The checkpoint can fail to write the dirty page because a vacuum can call
--    smgrtruncate->DropRelFileNodeBuffers() which invalidates the dirty page.
--    If smgrtruncate safely completes the physical truncation then BLK_NOTFOUND
--    happens and we skip the redo of the delete. So the skipped dirty write is OK.
--    The problem happens if we crash after the 2nd checkpoint completes
--    but before the physical truncate 'mdtruncate()'.
delete from t;

-- The vacuum must complete DropRelFileNodeBuffers.
-- The vacuum must sleep for a few seconds to allow the checkpoint to complete
-- such that recovery starts with the Delete REDO.
-- We must crash before mdtruncate() does the physical truncate. If the physical
-- truncate happens, BLK_NOTFOUND will be returned and the Delete REDO skipped.
vacuum t;

> On November 10, 2019 at 11:51 PM Michael Paquier < mich...@paquier.xyz > wrote:
>
> On Fri, Nov 08, 2019 at 06:44:08PM -0800, Daniel Wood wrote:
> > I repro'ed on PG11 and PG10 STABLE but several months old.
> > I looked at 6d05086 but it doesn't address the core issue.
> >
> > DropRelFileNodeBuffers prevents the checkpoint from writing all
> > needed dirty pages for any REDO's that exist BEFORE the truncate.
> > If we crash after a checkpoint but before the physical truncate then
> > the REDO will need to replay the operation against the dirty page
> > that the Drop invalidated.
>
> I am beginning to look at this thread more seriously, and I'd like to
> first try to reproduce that by myself. Could you share the steps you
> used to do that? This includes any manual sleep calls you may have
> added, the timing of the crash, manual checkpoints, debugger
> breakpoints, etc. It may be possible to extract some more generic
> test from that.
> --
> Michael
Re: Add SQL function to show total block numbers in the relation
Size in block numbers is useless for those who don't understand the notion of a block, or the block size. Those who understand the notion can come up with the simple formula (except for the annoying casts). Anyone can find the clue to the base values by searching the documentation on the Web with the keywords "block size" and "relation size", or even "table size". (FWIW, I would even do the same for the new function, if any...) If they need it so frequently, a user-defined function is easily made up. regards. I didn't know about the existence of the user-defined function. I fully understood, thanks. Regards, Yu Kimura
Re: [PATCH] Do not use StdRdOptions in Access Methods
Hi Alvaro, On Wed, Nov 13, 2019 at 6:55 AM Alvaro Herrera wrote: > On 2019-Nov-07, Amit Langote wrote: > > On Tue, Nov 5, 2019 at 9:22 AM Michael Paquier wrote: > > > Please note that I have not switched the old interface > > > to be static to reloptions.c as if you look closely at reloptions.h we > > > allow much more flexibility around HANDLE_INT_RELOPTION to fill and > > > parse the reloptions in custom AM. AM maintainers had better use the > > > new interface, but there could be a point for more customized error > > > messages. > > > > I looked around but don't understand why these macros need to be > > exposed. I read this comment: > > > > * Note that this is more or less the same that fillRelOptions does, so > > only > > * use this if you need to do something non-standard within some option's > > * code block. > > > > but don't see how an AM author might be able to do something > > non-standard with this interface. > > > > Maybe Alvaro knows this better. > > I think the idea was that you could have external code doing things in a > different way for some reason, ie. not use a bytea varlena struct that > could be filled by fillRelOptions but instead ... do something else. > That's why those macros are exposed. Now, this idea doesn't seem to be > aged very well. Maybe exposing that stuff is unnecessary. Thanks for chiming in about that. I guess that means that we don't need those macros, except GET_STRING_RELOPTION_LEN() that's used in allocateReloptStruct(), which can be moved to reloptions.c. Is that correct? Thanks, Amit
Re: [PATCH][BUG_FIX] Potential null pointer dereferencing.
At Tue, 12 Nov 2019 14:03:53 +, Ranier Vilela wrote in > Hi, > The condition is: > 74. if (TupIsNull(slot)) is true > 85. if (TupIsNull(resultTupleSlot)) is true too. See the definition of TupIsNull. It checks whether the tuple slot behind a valid pointer is EMPTY as well. And node->ps.ps_ResultTupleSlot cannot be NULL there since ExecInitUnique initializes it. The sequence is obvious, so even an Assert is not needed there, I think. > If resultTupleSlot cannot be NULL, why test > (TupIsNull(resultTupleSlot))? It checks whether there is no stored "previous" tuple, which is used to detect the next value. If it is EMPTY (not NULL), it is the first tuple from the outerPlan, as described in the comment just above. > When these two conditions occur, ExecClearTuple is called, but it doesn't check for a > NULL arg. > > There are at least 2 more possible cases involving ExecClearTuple: > nodeFunctionscan.c and nodeWindowAgg.c regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Why overhead of SPI is so large?
At Tue, 12 Nov 2019 11:27:24 +0300, Konstantin Knizhnik wrote in > > > On 11.11.2019 20:22, Tom Lane wrote: > > > > None of those statements are true, in my experience. > > > > In general, this patch seems like it's learned nothing from our > > experiences with the late and unlamented exec_simple_check_node() > > (cf commit 00418c612). Having a piece of plpgsql that has to know > > about all possible "simple expression" node types is just a major > > maintenance headache; but there is no short-cut solution that is > > really going to be acceptable. Either you're unsafe, or you fail > > to optimize cases that users will complain about, or you write > > and maintain a lot of code. > > > > I'm also pretty displeased with the is-it-in-the-pg_catalog-schema > > tests. Those are expensive, requiring additional catalog lookups, > > and they prove just about nothing. There are extensions that shove > > stuff into pg_catalog (look no further than contrib/adminpack), or > > users could do it, so you really still are relying on the whole world > > to get immutability right. If we're going to not trust immutability > > markings on user-defined objects, I'd be inclined to do it by > > checking that the object's OID is less than FirstGenbkiObjectId. > > > > But maybe we should just forget that bit of paranoia and rely solely > > on contain_mutable_functions(). That gets rid of the new maintenance > > requirement, and I think arguably it's OK. An "immutable" function > > whose result depends on changes that were just made by the calling > > function is pretty obviously mislabeled, so people won't have much of > > a leg to stand on if they complain about that. Pavel's argument > > upthread that people sometimes intentionally mislabel functions as > > immutable isn't really relevant: in his example, at least, the user > > is intentionally trying to get the function to be evaluated early. 
> > So whether it sees the effects of changes just made by the calling > > function doesn't seem like it would matter to such a user. > > > > regards, tom lane > > In my opinion contain_mutable_functions() is the best solution. > But if it is not acceptable, I will rewrite the patch in white-list > fashion. I agree with just relying on contain_mutable_functions(), for the same reasons as Tom. > I do not understand the argument about the expensive > is-it-in-the-pg_catalog-schema test. > IsCatalogNamespace is doing just a simple comparison without any > catalog lookups: > > bool > IsCatalogNamespace(Oid namespaceId) > { >     return namespaceId == PG_CATALOG_NAMESPACE; > } > > I may agree that it "proves nothing" if an extension can put stuff in > pg_catalog. > I can replace it with a comparison against FirstGenbkiObjectId. > But all this makes sense only if using contain_mutable_functions() is > not acceptable. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] Block level parallel vacuum
On Tue, 12 Nov 2019 at 22:33, Amit Kapila wrote: > > On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada > wrote: > > > > On Tue, 12 Nov 2019 at 20:11, Amit Kapila wrote: > > > > > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada > > > wrote: > > > > > > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar wrote: > > > > > > > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila > > > > > wrote: > > > > > > > > > > > > Yeah, maybe something like amparallelvacuumoptions. The options > > > > > > can be: > > > > > > > > > > > > VACUUM_OPTION_NO_PARALLEL 0 # vacuum (neither bulkdelete nor > > > > > > vacuumcleanup) can't be performed in parallel > > > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP 1 # vacuumcleanup cannot be > > > > > > performed in parallel (hash index will set this flag) > > > > > > > > > > Maybe we don't want this option? because if 3 or 4 is not set then we > > > > > will not do the cleanup in parallel right? > > > > > > > > > > > Yeah, but it is better to be explicit about this. > > > > VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing? > > > > I am not sure if that is required. > > > I think brin indexes > > will use this flag. > > > > Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and > it should work. > > > It will end up with > > (VACUUM_OPTION_NO_PARALLEL_CLEANUP | > > VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to > > VACUUM_OPTION_NO_PARALLEL, though. 
> > > > > > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 2 # bulkdelete can be done in > > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set > > > > > > this > > > > > > flag) > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 3 # vacuumcleanup can be done > > > > > > in > > > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash, > > > > > > gin, gist, spgist, bloom will set this flag) > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP 4 # vacuumcleanup can be done in > > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > > > > and bloom will set this flag) > > > > > > > > > > > > Does something like this make sense? > > > > > > > > 3 and 4 confused me because 4 also looks conditional. How about having > > > > two flags instead: one for doing parallel cleanup when not performed > > > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing > > > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)? > > > > > > > > > > Hmm, this is exactly what I intend to say with 3 and 4. I am not sure > > > what makes you think 4 is conditional. > > > > Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets > > 4 it doesn't need to set 3 because 4 means always doing cleanup in > > parallel. > > > > Yeah, that makes sense. They can just set 4. Okay, > > > > > > > > That way, we > > > > can have flags as follows and index AM chooses two flags, one from the > > > > first two flags for bulk deletion and another from next three flags > > > > for cleanup. > > > > > > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0 > > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 > > > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2 > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3 > > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4 > > > > > > > > > > This also looks reasonable, but if there is an index that doesn't want > > > to support a parallel vacuum, it needs to set multiple flags. > > > > Right. 
It would be better to use uint16 as two uint8. I mean that if > > first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if > > next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags > > could be followings: > > > > VACUUM_OPTION_PARALLEL_BULKDEL 0x0001 > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100 > > VACUUM_OPTION_PARALLEL_CLEANUP 0x0200 > > > > Hmm, I think we should define these flags in the most simple way. > Your previous proposal sounds okay to me. Okay. As you mentioned before, my previous proposal won't work for existing index AMs that don't set amparallelvacuumoptions. But since we have amcanparallelvacuum, which is false by default, I think we don't need to worry about a backward compatibility problem. The existing index AM will use neither parallel bulk-deletion nor parallel cleanup by default. When it wants to support parallel vacuum it will set amparallelvacuumoptions as well as amcanparallelvacuum. I'll try to use my previous proposal and check it. If something goes wrong we can go back to your proposal or another. -- Masahiko Sawada  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PHJ file leak.
At Wed, 13 Nov 2019 09:48:19 +1300, Thomas Munro wrote in > On Tue, Nov 12, 2019 at 5:03 PM Thomas Munro wrote: > > On Tue, Nov 12, 2019 at 4:23 PM Thomas Munro wrote: > > > On Tue, Nov 12, 2019 at 4:20 PM Kyotaro Horiguchi > > > wrote: > > > > The previous patch would be wrong. The root cause is an open batch, so > > > > the right thing to be done at scan end is > > > > ExecHashTableDetachBatch. And the real issue here seems to be in > > > > ExecutePlan, not in PHJ. > > > > > > You are right. Here is the email I just wrote that says the same > > > thing, but with less efficiency: > > > > And yeah, your Make_parallel_shutdown_on_broken_channel.patch seems > > like the real fix here. It's not optional to run that at > > end-of-query, though you might get that impression from various > > comments, and it's also not OK to call it before the end of the query, > > though you might get that impression from what the code actually does. > > Here's the version I'd like to commit in a day or two, once the dust > has settled on the minor release. Instead of adding yet another copy > of that code, I just moved it out of the loop; this way there is no > way to miss it. I think the comment could also be better, but I'll > wait for the concurrent discussions about the meaning of > ExecShutdownNode() to fix that in master. The patch's shape looks better. Thanks. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: checking my understanding of TupleDesc
Hi, On 2019-11-12 18:20:56 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2019-11-12 17:39:20 -0500, Tom Lane wrote: > >> There's a semi-exception, which is that the planner might decide that we > >> can skip a projection step for the output of a table scan node, in which > >> case dropped columns would be included in its output. But that would only > >> be true if there are upper plan nodes that are doing some projections of > >> their own. The final query output will definitely not have them. > > > I *think* we don't even do that, because build_physical_tlist() bails > > out if there's a dropped (or missing) column. > > Ah, right. Probably because we need to insist on every column of an > execution-time tupdesc having a valid atttypid ... although I wonder, > is that really necessary? Yea, the stated reasoning is ExecTypeFromTL(): * * Exception: if there are any dropped or missing columns, we punt and return * NIL. Ideally we would like to handle these cases too. However this * creates problems for ExecTypeFromTL, which may be asked to build a tupdesc * for a tlist that includes vars of no-longer-existent types. In theory we * could dig out the required info from the pg_attribute entries of the * relation, but that data is not readily available to ExecTypeFromTL. * For now, we don't apply the physical-tlist optimization when there are * dropped cols. I think the main problem is that we don't even have a convenient way to identify that a targetlist expression is actually a dropped column, and treat that differently. If we were to expand physical tlists to cover dropped and missing columns, we'd need to be able to add error checks to at least ExecInitExprRec, and to printtup_prepare_info(). I wonder if we could get away with making build_physical_tlist() returning a TargetEntry for a Const instead of a Var for the dropped columns? That'd contain enough information for tuple deforming to work on higher query levels? 
Or perhaps we ought to invent a DroppedVar node, that includes the type information? That'd make it trivial to error out when such an expression is actually evaluated, and allow to detect such columns. We already put Const nodes in some places like that IIRC... Greetings, Andres Freund
Re: make pg_attribute_noreturn() work for msvc?
Hi, On 2019-11-12 18:15:28 -0500, Tom Lane wrote: > Andres Freund writes: > > It's worthwhile to note - I forgot this - that noreturn actually has > > been standardized in C11 and C++11. For C11 the keyword is _Noreturn, > > with a convenience macro 'noreturn' defined in stdnoreturn.h. > > > For C++11, the syntax is (please don't get an aneurysm...): > > [[ noreturn ]] void funcname(params)... > > (yes, the [[]] are actually part of the syntax, not some BNF like thing) > > Egad. I'd *want* to hide that under a macro :-( Yea, it's quite ugly. I think the only saving grace is that C++ made that the generic syntax for various annotations / attributes. Everywhere, not just for function properties. So there's [[noreturn]], [[fallthrough]], [[nodiscard]], [[maybe_unused]] etc, and that there is explicit namespacing for vendor extensions by using [[vendorname::attname]], e.g. the actually existing [[gnu::always_inline]]. There's probably not that many other forms of syntax one can add to all the various places, without running into syntax limitations, or various vendor extensions... But still. > > While it looks tempting to just use 'noreturn', and backfill it if the > > current environment doesn't support it, I think that's a bit too > > dangerous, because it will tend to break other code like > > __attribute__((noreturn)) and _declspec(noreturn). As there's plenty > > other software using either or both of these, I don't think it's worth > > going there. > > Agreed, defining noreturn is too dangerous, it'll have to be > pg_noreturn. Or maybe use _Noreturn? But that feels ugly too. Yea, I don't like that all that much. We'd have to define it in C++ mode, and it's in the explicit standard reserved namespace... > Anyway, I still like the idea of merging the void keyword in with > that. Hm. Any other opinions? Greetings, Andres Freund
Re: checking my understanding of TupleDesc
Andres Freund writes: > On 2019-11-12 17:39:20 -0500, Tom Lane wrote: >> There's a semi-exception, which is that the planner might decide that we >> can skip a projection step for the output of a table scan node, in which >> case dropped columns would be included in its output. But that would only >> be true if there are upper plan nodes that are doing some projections of >> their own. The final query output will definitely not have them. > I *think* we don't even do that, because build_physical_tlist() bails > out if there's a dropped (or missing) column. Ah, right. Probably because we need to insist on every column of an execution-time tupdesc having a valid atttypid ... although I wonder, is that really necessary? regards, tom lane
Re: make pg_attribute_noreturn() work for msvc?
Andres Freund writes: > It's worthwhile to note - I forgot this - that noreturn actually has > been standardized in C11 and C++11. For C11 the keyword is _Noreturn, > with a convenience macro 'noreturn' defined in stdnoreturn.h. > For C++11, the syntax is (please don't get an aneurysm...): > [[ noreturn ]] void funcname(params)... > (yes, the [[]] are actually part of the syntax, not some BNF like thing) Egad. I'd *want* to hide that under a macro :-( > While it looks tempting to just use 'noreturn', and backfill it if the > current environment doesn't support it, I think that's a bit too > dangerous, because it will tend to break other code like > __attribute__((noreturn)) and _declspec(noreturn). As there's plenty > other software using either or both of these, I don't think it's worth > going there. Agreed, defining noreturn is too dangerous, it'll have to be pg_noreturn. Or maybe use _Noreturn? But that feels ugly too. Anyway, I still like the idea of merging the void keyword in with that. regards, tom lane
Re: checking my understanding of TupleDesc
Hi, On 2019-11-12 17:39:20 -0500, Tom Lane wrote: > > and under what other > > circumstances one would only encounter 'cleaned up' TupleDescs with > > no dropped attributes, and contiguous numbers for the real ones? > > I don't believe we ever include dropped columns in a projection result, > so generally speaking, the output of a query plan node wouldn't have them. > > There's a semi-exception, which is that the planner might decide that we > can skip a projection step for the output of a table scan node, in which > case dropped columns would be included in its output. But that would only > be true if there are upper plan nodes that are doing some projections of > their own. The final query output will definitely not have them. I *think* we don't even do that, because build_physical_tlist() bails out if there's a dropped (or missing) column. Or are you thinking of something else? Greetings, Andres Freund
Re: make pg_attribute_noreturn() work for msvc?
Hi, On 2019-11-12 17:22:05 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2019-11-12 15:58:07 -0500, Tom Lane wrote: > > I'm a bit confused as to why pg_upgrade.h doesn't use 'extern' for > > function declarations? Not that it's really related, except for the > > 'extern' otherwise hiding the effect of pgindent not liking 'void > > pg_noreturn'… > > There are various headers where people have tended to not use "extern". > I always disliked that, thinking it was not per project style, but > never bothered to force the issue. If we went around and inserted > extern in these places, it wouldn't bother me any. > > I don't see a reason not to go for 'pg_noreturn void'? > > That seems kind of ugly from here. Not sure why, but at least to > my mind that's a surprising ordering. Oh, to me it seemed a quite reasonable order. It think it feels that way to me because we put properties like 'static', 'extern', 'inline' etc also before the return type (and it's similar for variable declarations too). It's maybe also worthwhile to note that emacs parses 'pg_noreturn void' correctly, but gets confused by 'void pg_noreturn'. It's just syntax highlighting though, so whatever. > >> One idea is to merge it with the "void" result type that such > >> a function would presumably have, along the lines of > >> #define pg_noreturnvoid __declspec(noreturn) > > > Yea, that'd be an alternative. But since not necessary, I'd not go > > there? > > I kind of liked that idea, too bad you don't. I don't actively dislike it. It just seemed a bit more magic than necessary. One need not understand what pg_noreturn does - not that it's hard to infer from the name - to know the return type of the function. > One argument for it is that then there'd be exactly one right way to > do it, not two. Also, if we find out that there's some compiler > that's pickier about where to place the annotation, we'd have a > central place to handle it. The former seems like a good argument to me. 
I'm not quite sure I think the second is likely. It's worthwhile to note - I forgot this - that noreturn actually has been standardized in C11 and C++11. For C11 the keyword is _Noreturn, with a convenience macro 'noreturn' defined in stdnoreturn.h. For C++11, the syntax is (please don't get an aneurysm...): [[ noreturn ]] void funcname(params)... (yes, the [[]] are actually part of the syntax, not some BNF like thing) I *think* the standard prescribes _Noreturn to be before the return type (it's defined in the same rule as inline), but I have some difficulty parsing the standard language. Gcc at least accepts inline only before the return type, but _Noreturn in both places. Certainly all the standard examples place it before the type. While it looks tempting to just use 'noreturn', and backfill it if the current environment doesn't support it, I think that's a bit too dangerous, because it will tend to break other code like __attribute__((noreturn)) and _declspec(noreturn). As there's plenty other software using either or both of these, I don't think it's worth going there. Greetings, Andres Freund
Re: checking my understanding of TupleDesc
Chapman Flack writes: > On 09/29/19 20:13, Chapman Flack wrote: >> From looking around the code, I've made these tentative observations >> about TupleDescs: >> >> 1. If the TupleDesc was obtained straight from the relcache for some >> relation, then all of its attributes should have nonzero attrelid >> identifying that relation, but in (every? nearly every?) other case, >> the attributes found in a TupleDesc will have a dummy attrelid of zero. I'm not sure about every vs. nearly every, but otherwise this seems accurate. Generally attrelid is meaningful in a pg_attribute catalog entry, but not in TupleDescs in memory. It appears valid in relcache entry tupdescs only because they are built straight from pg_attribute. >> 2. The attributes in a TupleDesc will (always?) have consecutive attnum >> corresponding to their positions in the TupleDesc (and therefore >> redundant). Correct. > And one more: > 3. One could encounter a TupleDesc with one or more 'attisdropped' > attributes, which do have their original attnums (corresponding > to their positions in the TupleDesc and therefore redundant), > so the attnums of nondropped attributes may be discontiguous. Right. > Is it simple to say under what circumstances a TupleDesc possibly > with dropped members could be encountered, Any tupdesc that's describing the rowtype of a table with dropped columns would look like that. > and under what other > circumstances one would only encounter 'cleaned up' TupleDescs with > no dropped attributes, and contiguous numbers for the real ones? I don't believe we ever include dropped columns in a projection result, so generally speaking, the output of a query plan node wouldn't have them. There's a semi-exception, which is that the planner might decide that we can skip a projection step for the output of a table scan node, in which case dropped columns would be included in its output. But that would only be true if there are upper plan nodes that are doing some projections of their own. 
The final query output will definitely not have them. regards, tom lane
Re: make pg_attribute_noreturn() work for msvc?
Andres Freund writes: > On 2019-11-12 15:58:07 -0500, Tom Lane wrote: >> I guess my big question about that is whether pgindent will make a >> hash of it. > If one writes 'pg_noreturn void', rather than 'void pg_noreturn', then > there's only one place where pgindent changes something in a somewhat > weird way. For tablesync.c, it indents the pg_noreturn for > finish_sync_worker(). But only due to being on a separate newline, which > seems unnecessary… I think that it might be like that because some previous version of pgindent changed it to that. That's probably why we never adopted this style generally in the first place. > With 'void pg_noreturn', a few prototypes in headers get indented more > than pretty, e.g. in pg_upgrade.h it turns > void pg_noreturn pg_fatal(const char *fmt,...) pg_attribute_printf(1, 2); > into > void pg_noreturn pg_fatal(const char *fmt,...) > pg_attribute_printf(1, 2); > I'm a bit confused as to why pg_upgrade.h doesn't use 'extern' for > function declarations? Not that it's really related, except for the > 'extern' otherwise hiding the effect of pgindent not liking 'void > pg_noreturn'… There are various headers where people have tended to not use "extern". I always disliked that, thinking it was not per project style, but never bothered to force the issue. If we went around and inserted extern in these places, it wouldn't bother me any. > I don't see a reason not to go for 'pg_noreturn void'? That seems kind of ugly from here. Not sure why, but at least to my mind that's a surprising ordering. >> One idea is to merge it with the "void" result type that such >> a function would presumably have, along the lines of >> #define pg_noreturn void __declspec(noreturn) > Yea, that'd be an alternative. But since not necessary, I'd not go > there? I kind of liked that idea, too bad you don't. One argument for it is that then there'd be exactly one right way to do it, not two. 
Also, if we find out that there's some compiler that's pickier about where to place the annotation, we'd have a central place to handle it. regards, tom lane
Re: JIT performance bug/regression & JIT EXPLAIN
Hi, On 2019-11-12 13:42:10 -0800, Maciek Sakrejda wrote: > On Mon, Oct 28, 2019 at 5:02 PM Andres Freund wrote: > > What I dislike about that is that it basically again is introducing > "again"? Am I missing some history here? I'd love to read up on this > if there are mistakes to learn from. I think I was mostly referring to mistakes we've made for the json etc key names. By e.g. having expressions as "Function Call", "Table Function Call", "Filter", "TID Cond", ... a tool that wants to interpret the output needs awareness of all of these different names, rather than knowing that everything with a sub-group "Expression" has to be an expression. I.e. instead of

    "Plan": {
      "Node Type": "Seq Scan",
      "Parallel Aware": false,
      "Relation Name": "pg_class",
      "Schema": "pg_catalog",
      "Alias": "pg_class",
      "Startup Cost": 0.00,
      "Total Cost": 17.82,
      "Plan Rows": 385,
      "Plan Width": 68,
      "Output": ["relname", "tableoid"],
      "Filter": "(pg_class.relname <> 'foo'::name)"
    }

we ought to have gone for

    "Plan": {
      "Node Type": "Seq Scan",
      "Parallel Aware": false,
      "Relation Name": "pg_class",
      "Schema": "pg_catalog",
      "Alias": "pg_class",
      "Startup Cost": 0.00,
      "Total Cost": 17.82,
      "Plan Rows": 385,
      "Plan Width": 68,
      "Output": ["relname", "tableoid"],
      "Filter": {"Expression": {"text": "(pg_class.relname <> 'foo'::name)"}}
    }

or something like that. Which'd then make it obvious how to add information about JIT to each expression. Whereas the proposal of the separate key name perpetuates the messiness... > > something that requires either pattern matching on key names (i.e. a key > > of '(.*) JIT' is one that has information about JIT, and the associated > > expression is in key $1), or knowing all the potential keys an > > expression could be in. > > That still seems less awkward than having to handle a Filter field > that's either scalar or a group. Yea, it's a sucky option :( > Most current EXPLAIN options just add > additional fields to the structured plan instead of modifying it, no?
> If that output is better enough, though, maybe we should just always > make Filter a group and go with the breaking change? If tooling > authors need to treat this case specially anyway, might as well evolve > the format. Yea, maybe that's the right thing to do. Would be nice to have some more input... Greetings, Andres Freund
2019-11-14 Press Release Draft
Hi, Attached is a draft of the press release for the update release going out on 2019-11-14. Please provide feedback, particularly on the technical accuracy of the statements. Thanks! Jonathan

2019-11-14 Cumulative Update Release

The PostgreSQL Global Development Group has released an update to all supported versions of our database system, including 12.1, 11.6, 10.11, 9.6.16, 9.5.20, and 9.4.25. This release fixes over 60 bugs reported over the last three months.

PostgreSQL 9.4 EOL Approaching --

PostgreSQL 9.4 will stop receiving fixes on February 13, 2020, which is the next planned cumulative update release. Please see our [versioning policy](https://www.postgresql.org/support/versioning/) for more information.

Bug Fixes and Improvements --

This update also fixes over 50 bugs that were reported in the last several months. Some of these issues affect only version 12, while others affect all supported versions. Some of these fixes include:

* Fix crash that occurs when `ALTER TABLE` adds a column without a default value along with other changes that require a table rewrite.
* Several fixes for `REINDEX CONCURRENTLY`.
* Fix for `VACUUM` that would cause it to fail under a specific case involving a still-running transaction.
* Fix for a memory leak that could occur when `VACUUM` runs on a GiST index.
* Fix for an error that occurred when running `CLUSTER` on an expression index.
* Fix failure for `SET CONSTRAINTS ... DEFERRED` on partitioned tables.
* Several fixes for the creation and dropping of indexes on partitioned tables.
* Fix for partition-wise joins that could lead to planner failures.
* Ensure that offset expressions in WINDOW clauses are processed when a query's expressions are manipulated.
* Fix misbehavior of `bitshiftright()` where it failed to zero out padding space in the last byte if the bit string length is not a multiple of 8. For how to correct your data, please see the "Updating" section.
* Ensure that `position()` returns 1 when searching for an empty string, as per the SQL standard.
* Fix for a parallel query failure when it is unable to request a background worker.
* Fix crash triggered by a case involving a `BEFORE UPDATE` trigger.
* Display the correct error when a query tries to access a TOAST table.
* Allow encoding conversion to succeed on strings with output up to 1GB. Previously there was a hard limit of 0.25GB on the input string.
* Ensure that temporary WAL and history files are removed at the end of archive recovery.
* Avoid failure in archive recovery if `recovery_min_apply_delay` is enabled.
* Ignore `restore_command`, `recovery_end_command`, and `recovery_min_apply_delay` settings during crash recovery.
* Several fixes for logical replication, including a failure when the publisher & subscriber had different REPLICA IDENTITY columns set.
* Correctly timestamp replication messages for logical decoding, which in the broken case would lead to `pg_stat_subscription.last_msg_send_time` being set to `NULL`.
* Several fixes for libpq, including one that improves PostgreSQL 12 compatibility.
* Several `pg_upgrade` fixes.
* Fix how a parallel restore handles foreign key constraints on partitioned tables to ensure they are not created too soon.
* `pg_dump` now outputs similarly named triggers and RLS policies in order based on table name, instead of OID.
* Fix `pg_rewind` to not update the contents of `pg_control` when using the `--dry-run` option.

This update also contains tzdata release 2019c for DST law changes in Fiji and Norfolk Island. Historical corrections for Alberta, Austria, Belgium, British Columbia, Cambodia, Hong Kong, Indiana (Perry County), Kaliningrad, Kentucky, Michigan, Norfolk Island, South Korea, and Turkey.

For the full list of changes available, please review the [release notes](https://www.postgresql.org/docs/current/release.html).

Updating --

All PostgreSQL update releases are cumulative.
As with other minor releases, users are not required to dump and reload their database or use `pg_upgrade` in order to apply this update release; you may simply shut down PostgreSQL and update its binaries. Users who have skipped one or more update releases may need to run additional, post-update steps; please see the release notes for earlier versions for details.

If you have inconsistent data as a result of saving the output of `bitshiftright()` in a table, it's possible to fix it with a query similar to:

UPDATE mytab SET bitcol = ~(~bitcol) WHERE bitcol != ~(~bitcol);

**NOTE**: PostgreSQL 9.4 will stop receiving fixes on February 13, 2020. Please see our [versioning policy](https://www.postgresql.org/support/versioning/) for more information.

Links --

* [Download](https://www.postgresql.org/download/)
* [Release Notes](https://www.postgresql.org/docs/current/release.html)
* [Security Page](https://www.postgresql.org/support/security/)
* [Versioning Policy](https://www.postgresql.org/support/versioning/)
Re: make pg_attribute_noreturn() work for msvc?
On 2019-11-12 15:58:07 -0500, Tom Lane wrote: > Andres Freund writes: > > So perhaps we ought to rename pg_attribute_noreturn() to pg_noreturn(), > > add a __declspec(noreturn) version, and move the existing uses to it. > > > I'm inclined to also drop the parentheses at the same time (i.e > > pg_noreturn rather than pg_noreturn()) - it seems easier to mentally > > parse the code that way. > > I guess my big question about that is whether pgindent will make a > hash of it. If one writes 'pg_noreturn void', rather than 'void pg_noreturn', then there's only one place where pgindent changes something in a somewhat weird way. For tablesync.c, it indents the pg_noreturn for finish_sync_worker(). But only due to being on a separate newline, which seems unnecessary… With 'void pg_noreturn', a few prototypes in headers get indented more than pretty, e.g. in pg_upgrade.h it turns

void pg_noreturn pg_fatal(const char *fmt,...) pg_attribute_printf(1, 2);

into

void		pg_noreturn pg_fatal(const char *fmt,...)
			pg_attribute_printf(1, 2);

I'm a bit confused as to why pg_upgrade.h doesn't use 'extern' for function declarations? Not that it's really related, except for the 'extern' otherwise hiding the effect of pgindent not liking 'void pg_noreturn'… I don't see a reason not to go for 'pg_noreturn void'? > One idea is to merge it with the "void" result type that such > a function would presumably have, along the lines of > > #define pg_noreturn void __declspec(noreturn) > ... > extern pg_noreturn proc_exit(int); > and if necessary, we could strongarm pgindent into believing > that pg_noreturn is a typedef. Yea, that'd be an alternative. But since not necessary, I'd not go there? Greetings, Andres Freund

>From ed3bea2b054fdd0f74b599b50bc8ea457a42fc63 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Tue, 12 Nov 2019 13:55:39 -0800
Subject: [PATCH v1] Change pg_attribute_noreturn() to pg_noreturn, move to before return type.
---
 contrib/cube/cubedata.h                      |  2 +-
 contrib/dblink/dblink.c                      |  6 ++
 contrib/pgcrypto/px.h                        |  2 +-
 contrib/seg/segdata.h                        |  2 +-
 src/backend/postmaster/autovacuum.c          |  4 ++--
 src/backend/postmaster/pgarch.c              |  2 +-
 src/backend/postmaster/pgstat.c              |  2 +-
 src/backend/postmaster/postmaster.c          |  4 ++--
 src/backend/postmaster/syslogger.c           |  2 +-
 src/backend/replication/logical/tablesync.c  |  3 +--
 src/backend/replication/walsender.c          |  2 +-
 src/backend/utils/adt/json.c                 |  4 ++--
 src/backend/utils/adt/ri_triggers.c          |  8
 src/backend/utils/fmgr/dfmgr.c               |  4 ++--
 src/backend/utils/misc/guc.c                 |  3 +--
 src/bin/pg_dump/pg_backup_utils.h            |  2 +-
 src/bin/pg_upgrade/pg_upgrade.h              |  2 +-
 src/bin/pgbench/pgbench.h                    | 12 ++--
 src/include/bootstrap/bootstrap.h            |  4 ++--
 src/include/c.h                              | 10 +-
 src/include/mb/pg_wchar.h                    |  6 +++---
 src/include/parser/parse_relation.h          |  6 +++---
 src/include/parser/scanner.h                 |  2 +-
 src/include/pgstat.h                         |  2 +-
 src/include/postmaster/autovacuum.h          |  4 ++--
 src/include/postmaster/bgworker_internals.h  |  2 +-
 src/include/postmaster/bgwriter.h            |  4 ++--
 src/include/postmaster/pgarch.h              |  2 +-
 src/include/postmaster/postmaster.h          |  4 ++--
 src/include/postmaster/startup.h             |  2 +-
 src/include/postmaster/syslogger.h           |  2 +-
 src/include/postmaster/walwriter.h           |  2 +-
 src/include/replication/walreceiver.h        |  2 +-
 src/include/replication/walsender_private.h  |  2 +-
 src/include/storage/ipc.h                    |  2 +-
 src/include/storage/lock.h                   |  2 +-
 src/include/tcop/tcopprot.h                  | 10 +-
 src/include/utils/datetime.h                 |  4 ++--
 src/include/utils/elog.h                     |  8
 src/include/utils/help_config.h              |  2 +-
 src/interfaces/ecpg/preproc/preproc_extern.h |  2 +-
 src/pl/plpgsql/src/plpgsql.h                 |  2 +-
 src/test/modules/test_shm_mq/test_shm_mq.h   |  2 +-
 src/test/modules/worker_spi/worker_spi.c     |  2 +-
 src/timezone/zic.c                           |  4 ++--
 45 files changed, 79 insertions(+), 83 deletions(-)

diff --git a/contrib/cube/cubedata.h b/contrib/cube/cubedata.h
index dbe7d4f7429..178fb46f217 100644
--- a/contrib/cube/cubedata.h
+++ b/contrib/cube/cubedata.h
@@ -61,7 +61,7 @@ typedef struct NDBOX

 /* in cubescan.l */
 extern int	cube_yylex(void);
-extern void cube_yyerror(NDBOX **result, const char *message) pg_attribute_noreturn();
+extern pg_noreturn void cube_yyerror(NDBOX **result, const char *message);
 extern void cube_scanner_init(const char *str);
Re: [PATCH] Do not use StdRdOptions in Access Methods
On 2019-Nov-07, Amit Langote wrote: > On Tue, Nov 5, 2019 at 9:22 AM Michael Paquier wrote: > > Please note that I have not switched the old interface > > to be static to reloptions.c as if you look closely at reloptions.h we > > allow much more flexibility around HANDLE_INT_RELOPTION to fill and > > parse the reloptions in custom AM. AM maintainers had better use the > > new interface, but there could be a point for more customized error > > messages. > > I looked around but don't understand why these macros need to be > exposed. I read this comment: > > * Note that this is more or less the same that fillRelOptions does, so only > * use this if you need to do something non-standard within some option's > * code block. > > but don't see how an AM author might be able to do something > non-standard with this interface. > > Maybe Alvaro knows this better. I think the idea was that you could have external code doing things in a different way for some reason, i.e. not use a bytea varlena struct that could be filled by fillRelOptions but instead ... do something else. That's why those macros are exposed. Now, this idea doesn't seem to have aged very well. Maybe exposing that stuff is unnecessary. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: auxiliary processes in pg_stat_ssl
On 2019-Nov-04, Stephen Frost wrote: > Based on what we claim in our docs, it does look like 'client_port IS > NOT NULL' should work. I do think we might want to update the docs to > make it a bit more explicit, what we say now is: > > TCP port number that the client is using for communication with this > backend, or -1 if a Unix socket is used > > We don't explain there that NULL means the backend doesn't have an > external connection even though plenty of those entries show up in every > instance of PG. Perhaps we should add this: > > If this field is null, it indicates that this is an internal process > such as autovacuum. > > Which is what we say for 'client_addr'. Seems sensible. Done. Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: auxiliary processes in pg_stat_ssl
On 2019-Nov-04, Euler Taveira wrote: > Yep, it is pointless. BackendType that open connections to server are: > autovacuum worker, client backend, background worker, wal sender. I > also notice that pg_stat_gssapi is in the same boat as pg_stat_ssl and > we should constraint the rows to backend types that open connections. > I'm attaching a patch to list only connections in those system views. Thanks! I pushed this. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: JIT performance bug/regression & JIT EXPLAIN
On Mon, Oct 28, 2019 at 5:02 PM Andres Freund wrote: > What I dislike about that is that it basically again is introducing "again"? Am I missing some history here? I'd love to read up on this if there are mistakes to learn from. > something that requires either pattern matching on key names (i.e. a key > of '(.*) JIT' is one that has information about JIT, and the associated > expression is in key $1), or knowing all the potential keys an > expression could be in. That still seems less awkward than having to handle a Filter field that's either scalar or a group. Most current EXPLAIN options just add additional fields to the structured plan instead of modifying it, no? If that output is better enough, though, maybe we should just always make Filter a group and go with the breaking change? If tooling authors need to treat this case specially anyway, might as well evolve the format. > Another alternative would be to just remove the 'Output' line when a > node doesn't project - it can't really carry meaning in those cases > anyway? ¯\_(ツ)_/¯ For what it's worth, I certainly wouldn't miss it.
Re: Missing dependency tracking for TableFunc nodes
On 2019-11-12 15:32:14 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2019-11-12 10:19:30 -0500, Tom Lane wrote: > >> I could imagine annotations that say "this field contains a function OID" > >> or "this list contains collation OIDs" and then the find_expr_references > >> logic could be derived from that. > > > I want to attach some annotations to types, rather than fields. I > > e.g. introduced a Location typedef, annotated as being ignored for > > equality purposes, instead of annotating each 'int location'. Wonder if > > we should also do something like that for your hypothetical "function > > OID" etc. above - seems like it also might give the human reader of code > > a hint. > > Hm. We could certainly do "typedef Oid FunctionOid;", > "typedef List CollationOidList;" etc, but I think it'd get pretty > tedious pretty quickly --- just for this one purpose, you'd need > a couple of typedefs for every system catalog that contains > query-referenceable OIDs. Maybe that's better than comment-style > annotations, but I'm not convinced. I'm not saying that we should exclusively do so, just that it's worthwhile for a few frequent cases. > One issue there that I'm not sure how to resolve with autogenerated > code, much less automated checking, is that quite a few cases in > find_expr_references know that we don't need to record a dependency on > an OID stored in the node because there's an indirect dependency on > something else. For example, in FuncExpr we needn't log > funcresulttype because the funcid is enough dependency, and we needn't > log either funccollid or inputcollid because those are derived from > the input expressions or the function result type. (And giving up > those optimizations would be pretty costly, 4x more dependency checks > in this example alone.) Yea, that one is hard. I suspect the best way to address that is to have explicit code for a few cases that are worth optimizing (like e.g.
FuncExpr), and for the rest use the generic logic. I'd so far just written the specialized cases into the code using the "generated metadata" - but we also could have an annotation that instructs to instead call some function, but I doubt that's worthwhile. > For sure I don't want both "CollationOid" and "RedundantCollationOid" > typedefs Indeed. > so it seems like annotation is the solution for this I'm not even sure annotations are going to be the easiest way to implement some of the more complicated edge cases. Might be easier to just open-code those, and fall back to generic logic for the rest. We'll have to see, I think. Greetings, Andres Freund
Re: Collation versioning
On Wed, Nov 13, 2019 at 3:27 AM Julien Rouhaud wrote: > On Sun, Nov 10, 2019 at 10:08 AM Thomas Munro wrote: > > That's because the 0003 patch only calls recordDependencyOnVersion() > > for simple attribute references. When > > recordDependencyOnSingleRelExpr() is called by index_create() to > > analyse ii_Expressions and ii_Predicate, it's going to have to be > > smart enough to detect collation references and record the versions. > > There is also some more code that ignores pinned collations hiding in > > there. > > > > This leads to the difficult question of how you recognise a real > > dependency on a collation's version in an expression. I have some > > vague ideas but haven't seriously looked into it yet. (The same > > question comes up for check constraint -> collation dependencies.) > > Oh good point. A simple and exhaustive way to deal with that would be > to teach recordMultipleDependencies() to override isObjectPinned() and > retrieve the collation version if the referenced object is a collation > and it's neither the C nor the POSIX collation. It means that we could also > remove the extra "version" argument and get rid of > recordDependencyOnVersion to simply call recordMultipleDependencies in > create_index for direct column references having a collation. > > Would it be ok to add this kind of knowledge in > recordMultipleDependencies() or is it too hacky? That doesn't seem like the right place; that's a raw data insertion function, though... I guess it does already have enough brains to skip pinned objects. Hmm. > > I think create_index() will need to perform recursive analysis on > > composite types to look for text attributes, when they appear as > > simple attributes, and then add direct dependencies index -> collation > > to capture the versions. Then we might need to do the same for > > composite types hiding inside ii_Expressions and ii_Predicate (once we > > figure out what that really means). > > Isn't that actually a bug?
For instance such an index will have a 0 > indcollation in pg_index, and according to pg_index documentation: > > " this contains the OID of the collation to use for the index, or zero > if the column is not of a collatable data type." > > You can't use a COLLATE expression on such data type, but it still has > a collation used. I don't think it's a bug. The information is available, but you have to follow the graph to get it. Considering that the composite type could be something like CREATE TYPE my_type AS (fr_name text COLLATE "fr_CA", en_name text COLLATE "en_CA"), there is no single collation you could put into pg_index.indcollation anyway. As for pg_depend, it's currently enough for the index to depend on the type, and the type to depend on the collation, because the purpose of dependencies is to control dropping and dumping order, but for our new purpose we also need to create direct dependencies index -> "fr_CA", index -> "en_CA" so we can record the current versions. > > 3. Test t/002_pg_dump.pl in src/bin/pg_upgrade fails. > > Apparently neither "make check" nor "make world" run this test :( > This was broken due to collversion support in pg_dump, I have fixed it > locally. make check-world
Re: make pg_attribute_noreturn() work for msvc?
Andres Freund writes: > So perhaps we ought to rename pg_attribute_noreturn() to pg_noreturn(), > add a __declspec(noreturn) version, and move the existing uses to it. > I'm inclined to also drop the parentheses at the same time (i.e > pg_noreturn rather than pg_noreturn()) - it seems easier to mentally > parse the code that way. I guess my big question about that is whether pgindent will make a hash of it. One idea is to merge it with the "void" result type that such a function would presumably have, along the lines of #define pg_noreturn void __declspec(noreturn) ... extern pg_noreturn proc_exit(int); and if necessary, we could strongarm pgindent into believing that pg_noreturn is a typedef. regards, tom lane
Re: PHJ file leak.
On Tue, Nov 12, 2019 at 5:03 PM Thomas Munro wrote: > On Tue, Nov 12, 2019 at 4:23 PM Thomas Munro wrote: > > On Tue, Nov 12, 2019 at 4:20 PM Kyotaro Horiguchi > > wrote: > > > The previous patch would be wrong. The root cause is an open batch, so > > > the right thing to be done at scan end is > > > ExecHashTableDetachBatch. And the real issue here seems to be in > > > ExecutePlan, not in PHJ. > > > > You are right. Here is the email I just wrote that says the same > > thing, but with less efficiency: > > And yeah, your Make_parallel_shutdown_on_broken_channel.patch seems > like the real fix here. It's not optional to run that at > end-of-query, though you might get that impression from various > comments, and it's also not OK to call it before the end of the query, > though you might get that impression from what the code actually does. Here's the version I'd like to commit in a day or two, once the dust has settled on the minor release. Instead of adding yet another copy of that code, I just moved it out of the loop; this way there is no way to miss it. I think the comment could also be better, but I'll wait for the concurrent discussions about the meaning of ExecShutdownNode() to fix that in master. 0001-Make-sure-we-call-ExecShutdownNode-if-appropriate.patch Description: Binary data
Re: Missing dependency tracking for TableFunc nodes
Andres Freund writes: > On 2019-11-12 10:19:30 -0500, Tom Lane wrote: >> I could imagine annotations that say "this field contains a function OID" >> or "this list contains collation OIDs" and then the find_expr_references >> logic could be derived from that. > I want to attach some annotations to types, rather than fields. I > e.g. introduced a Location typedef, annotated as being ignored for > equality purposes, instead of annotating each 'int location'. Wonder if > we should also do something like that for your hypothetical "function > OID" etc. above - seems like it also might give the human reader of code > a hint. Hm. We could certainly do "typedef Oid FunctionOid;", "typedef List CollationOidList;" etc, but I think it'd get pretty tedious pretty quickly --- just for this one purpose, you'd need a couple of typedefs for every system catalog that contains query-referenceable OIDs. Maybe that's better than comment-style annotations, but I'm not convinced. > Wonder if there's any way to write an assertion check that verifies we > have the necessary dependencies. But the only idea I have - basically > record all the syscache lookups while parse analysing an expression, and > then check that all the necessary dependencies exist - seems too > complicated to be worthwhile. Yeah, it's problematic. One issue there that I'm not sure how to resolve with autogenerated code, much less automated checking, is that quite a few cases in find_expr_references know that we don't need to record a dependency on an OID stored in the node because there's an indirect dependency on something else. For example, in FuncExpr we needn't log funcresulttype because the funcid is enough dependency, and we needn't log either funccollid or inputcollid because those are derived from the input expressions or the function result type. (And giving up those optimizations would be pretty costly, 4x more dependency checks in this example alone.)
For sure I don't want both "CollationOid" and "RedundantCollationOid" typedefs, so it seems like annotation is the solution for this, but I see no reasonable way to automatically verify such annotations. Still, just writing down the annotations would be a way to expose such assumptions for manual checking, which we don't really have now. regards, tom lane
make pg_attribute_noreturn() work for msvc?
Hi, At the bottom of https://www.postgresql.org/message-id/20191112192716.emrqs2afuefunw6v%40alap3.anarazel.de I mused about the somewhat odd coding pattern at the end of WalSndShutdown():

/*
 * Handle a client's connection abort in an orderly manner.
 */
static void
WalSndShutdown(void)
{
	/*
	 * Reset whereToSendOutput to prevent ereport from attempting to send any
	 * more messages to the standby.
	 */
	if (whereToSendOutput == DestRemote)
		whereToSendOutput = DestNone;

	proc_exit(0);
	abort();					/* keep the compiler quiet */
}

namely that we proc_exit() and then abort(). This case seems to be purely historical baggage (from when this was inside other functions, before being centralized), so we can likely just remove the abort(). But even back then, one would have hoped that proc_exit() being annotated with pg_attribute_noreturn() should have told the compiler enough. But it turns out, we don't actually implement that for MSVC. Which does explain at least some cases where we had to add "keep compiler quiet" type code specifically for MSVC. As it turns out, MSVC has its own annotation for functions that don't return, __declspec(noreturn). But it unfortunately needs to be placed somewhere other than where we, so far, placed pg_attribute_noreturn(), namely after the function name / parameters. Instead it needs to be before the function name. But as it turns out, GCC et al's __attribute__((noreturn)) actually can also be placed there, and seemingly for a long time: https://godbolt.org/z/8v5aFS So perhaps we ought to rename pg_attribute_noreturn() to pg_noreturn(), add a __declspec(noreturn) version, and move the existing uses to it. I'm inclined to also drop the parentheses at the same time (i.e. pg_noreturn rather than pg_noreturn()) - it seems easier to mentally parse the code that way. I actually find the placement closer to the return type easier to understand, so I'd find this mildly beneficial even without the msvc angle. Greetings, Andres Freund
Re: Extension development
Hi Yonatan, You can follow this blog for creating your own extension in PostgreSQL: https://www.highgo.ca/2019/10/01/a-guide-to-create-user-defined-extension-modules-to-postgres/ -- Ahsan On Tue, Nov 12, 2019 at 11:54 AM Yonatan Misgan wrote: > I have developed my own PostgreSQL extension for learning purposes and it is > working correctly, but I want to know which components of the database my > extension's components communicate with. For example, I have C code, a makefile, > an SQL script, and a control file; after compiling with the makefile, with which > components of the database does each of my extension's components communicate? > Thanks for your response. > > Regards, > > Yonathan Misgan > Assistant Lecturer, @ Debre Tabor University > Faculty of Technology > Department of Computer Science > Studying MSc in Computer Science (in Data and Web Engineering) > @ Addis Ababa University > E-mail: yona...@dtu.edu.et > yonathanmisga...@gmail.com > Tel: (+251)-911180185 (mob) -- Highgo Software (Canada/China/Pakistan) URL : http://www.highgo.ca ADDR: 10318 WHALLEY BLVD, Surrey, BC EMAIL: mailto: ahsan.h...@highgo.ca
Re: Missing dependency tracking for TableFunc nodes
Hi, On 2019-11-12 10:19:30 -0500, Tom Lane wrote: > I think that the long-term answer, if Andres gets somewhere with his > project to autogenerate code like this, is that we'd rely on annotating > the struct declarations to tell us what to do. In the case at hand, > I could imagine annotations that say "this field contains a function OID" > or "this list contains collation OIDs" and then the find_expr_references > logic could be derived from that. Now, that's not perfect either, because > it's always possible to forget to annotate something. But it'd be a lot > easier, because there'd be tons of nearby examples of doing it right. Yea, I think that'd be going in the right direction. I've a few annotations for other purposes in my local version of the patch (e.g. to ignore fields for comparison), and adding further ones for purposes like this ought to be easy. I want to attach some annotations to types, rather than fields. I e.g. introduced a Location typedef, annotated as being ignored for equality purposes, instead of annotating each 'int location'. Wonder if we should also do something like that for your hypothetical "function OID" etc. above - seems like it also might give the human reader of code a hint. On 2019-11-11 16:41:41 -0500, Tom Lane wrote: > I happened to notice that find_expr_references_walker has not > been taught anything about TableFunc nodes, which means it will > miss the type and collation OIDs embedded in such a node. > Would it be a good idea to move find_expr_references_walker to > nodeFuncs.c, in hopes of making it more visible to people adding > new node types? Can't hurt, at least. Reducing the number of files that need to be touched fairly mechanically when adding a node type / node type field strikes me as a good idea. Wonder if there's any way to write an assertion check that verifies we have the necessary dependencies.
But the only idea I have - basically record all the syscache lookups while parse analysing an expression, and then check that all the necessary dependencies exist - seems too complicated to be worthwhile. > We could decouple it from the specific use-case > of recordDependencyOnExpr by having it call a callback function > for each identified OID; although maybe there's no point in that, > since I'm not sure there are any other use-cases. I could see it being useful for a few other purposes, e.g. it seems *marginally* possible we could share *some* code with extract_query_dependencies(). But I think I'd only go there if we actually convert something else to it. Greetings, Andres Freund
Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays
Hi, On 2019-11-12 14:17:42 +0900, Michael Paquier wrote: > On Tue, Oct 22, 2019 at 04:13:03PM +0530, Amit Kapila wrote: > > Hmm, but then what is your suggestion for existing code that uses {0}. > > If we reject this patch and leave the current code as it is, there is > > always a risk of some people using {0} and others using memset which > > will lead to further deviation in the code. Now, maybe if we change > > the existing code to always use memset where we use {0}, then we can > > kind of enforce such a rule for future patch authors. > > Well, we could have a shot at reducing the footprint of {0} then where > we can. I am seeing less than a dozen in contrib/, and a bit more > than thirty in src/backend/. -many. I think this serves zero positive purpose, except to make it harder to analyze code-flow. I think it's not worth going around to convert code to use {0} style initializers in most cases, but when authors write it, we shouldn't remove it either. Greetings, Andres Freund
Re: Coding in WalSndWaitForWal
Hi, On 2019-11-11 13:53:40 -0300, Alvaro Herrera wrote: > On 2019-Nov-11, Amit Kapila wrote: > > > On Mon, Nov 11, 2019 at 7:53 AM Michael Paquier wrote: > > > > So your suggestion would be to call GetFlushRecPtr() before the first > > > check on RecentFlushPtr and before entering the loop? > > > > No. What I meant was to keep the current code as-is and have an > > additional check on RecentFlushPtr before entering the loop. > > I noticed that the "return" at the bottom of the function does a > SetLatch(), but the other returns do not. Isn't that a bug? I don't think it is - we never reset the latch in that case. I don't see what we'd gain from setting it explicitly, other than unnecessarily performing more work? > /* >* Fast path to avoid acquiring the spinlock in case we already know we >* have enough WAL available. This is particularly interesting if we're >* far behind. >*/ > if (RecentFlushPtr != InvalidXLogRecPtr && > loc <= RecentFlushPtr) > + { > + SetLatch(MyLatch); > return RecentFlushPtr; > + } I.e. let's not do this. > /* Get a more recent flush pointer. */ > if (!RecoveryInProgress()) > RecentFlushPtr = GetFlushRecPtr(); > else > RecentFlushPtr = GetXLogReplayRecPtr(NULL); > > + if (loc <= RecentFlushPtr) > + { > + SetLatch(MyLatch); > + return RecentFlushPtr; > + } Hm. I'm doubtful this is a good idea - it essentially means we'd not check for interrupts, protocol replies, etc. for an unbounded amount of time. Whereas the existing fast-path does so for a bounded - although not necessarily short - amount of time. It seems to me it'd be better to just remove the "get a more recent flush pointer" block - it doesn't seem to currently serve a meaningful purpose. 
> for (;;) > { > long sleeptime; > > /* Clear any already-pending wakeups */ > ResetLatch(MyLatch); > > @@ -2267,15 +2276,14 @@ WalSndLoop(WalSndSendDataCallback send_data) > > /* Sleep until something happens or we time out */ > (void) WaitLatchOrSocket(MyLatch, wakeEvents, > > MyProcPort->sock, sleeptime, > > WAIT_EVENT_WAL_SENDER_MAIN); > } > } > - return; > } Having dug into history, the reason this exists is that there used to be the following below the return: - -send_failure: - -/* - * Get here on send failure. Clean up and exit. - * - * Reset whereToSendOutput to prevent ereport from attempting to send any - * more messages to the standby. - */ -if (whereToSendOutput == DestRemote) -whereToSendOutput = DestNone; - -proc_exit(0); -abort();/* keep the compiler quiet */ but when 5a991ef8692ed (Allow logical decoding via the walsender interface) moved the shutdown code into its own function, WalSndShutdown(), we left the returns in place. We still have the curious proc_exit(0); abort();/* keep the compiler quiet */ pattern in WalSndShutdown() - wouldn't the right approach instead be to combine proc_exit() with pg_attribute_noreturn(), rather than keeping the abort()? Greetings, Andres Freund
Re: [Patch] Optimize dropping of relation buffers using dlist
On Tue, Nov 12, 2019 at 10:49:49AM +, k.jami...@fujitsu.com wrote: On Thurs, November 7, 2019 1:27 AM (GMT+9), Robert Haas wrote: On Tue, Nov 5, 2019 at 10:34 AM Tomas Vondra wrote: > 2) This adds another hashtable maintenance to BufferAlloc etc. but > you've only done tests / benchmark for the case this optimizes. I > think we need to see a benchmark for workload that allocates and > invalidates lot of buffers. A pgbench with a workload that fits into > RAM but not into shared buffers would be interesting. Yeah, it seems pretty hard to believe that this won't be bad for some workloads. Not only do you have the overhead of the hash table operations, but you also have locking overhead around that. A whole new set of LWLocks where you have to take and release one of them every time you allocate or invalidate a buffer seems likely to cause a pretty substantial contention problem. I'm sorry for the late reply. Thank you Tomas and Robert for checking this patch. Attached is the v3 of the patch. - I moved the unnecessary items from buf_internals.h to cached_buf.c since most of those items are only used in that file. - Fixed the bug of v2. Seems to pass both RT and TAP tests now. Thanks for the advice on benchmark test. Please refer below for test and results. [Machine spec] CPU: 16, Number of cores per socket: 8 RHEL6.5, Memory: 240GB scale: 3125 (about 46GB DB size) shared_buffers = 8GB [workload that fits into RAM but not into shared buffers] pgbench -i -s 3125 cachetest pgbench -c 16 -j 8 -T 600 cachetest [Patched] scaling factor: 3125 query mode: simple number of clients: 16 number of threads: 8 duration: 600 s number of transactions actually processed: 8815123 latency average = 1.089 ms tps = 14691.436343 (including connections establishing) tps = 14691.482714 (excluding connections establishing) [Master/Unpatched] ... 
number of transactions actually processed: 8852327 latency average = 1.084 ms tps = 14753.814648 (including connections establishing) tps = 14753.861589 (excluding connections establishing) My patch caused a little overhead of about 0.42-0.46%, which I think is small. Kindly let me know your opinions/comments about the patch or tests, etc. Now try measuring that with a read-only workload, with prepared statements. I've tried that on a machine with 16 cores, doing

# 16 clients
pgbench -n -S -j 16 -c 16 -M prepared -T 60 test
# 1 client
pgbench -n -S -c 1 -M prepared -T 60 test

and the average from 30 runs of each looks like this:

  # clients   master    patched    %
  ---------   -------   -------    -----
  1           29690     27833      93.7%
  16          300935    283383     94.1%

That's quite a significant regression, considering it's optimizing an operation that is expected to be pretty rare (people are generally not dropping objects as often as they query them). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Proposal: Add more compile-time asserts to expose inconsistencies.
Hi Peter, Peter, :) On 2019-10-28 00:30:11 +, Smith, Peter wrote: > From: Andres Freund Sent: Sunday, 27 October 2019 12:03 > PM > > My proposal for this really was just to use this as a fallback for when > > static assert isn't available. Which in turn I was just suggesting because > > Tom wanted a fallback. > > The patch is updated to use "extern" technique only when "_Static_assert" is > unavailable. Cool. > /* > * forkname_to_number - look up fork number by name > diff --git a/src/include/c.h b/src/include/c.h > index d752cc0..3e24ff4 100644 > --- a/src/include/c.h > +++ b/src/include/c.h > @@ -838,11 +838,16 @@ extern void ExceptionalCondition(const char > *conditionName, > do { _Static_assert(condition, errmessage); } while(0) > #define StaticAssertExpr(condition, errmessage) \ > ((void) ({ StaticAssertStmt(condition, errmessage); true; })) > +/* StaticAssertDecl is suitable for use at file scope. */ > +#define StaticAssertDecl(condition, errmessage) \ > + _Static_assert(condition, errmessage) > #else/* > !HAVE__STATIC_ASSERT */ > #define StaticAssertStmt(condition, errmessage) \ > ((void) sizeof(struct { int static_assert_failure : (condition) ? 1 : > -1; })) > #define StaticAssertExpr(condition, errmessage) \ > StaticAssertStmt(condition, errmessage) > +#define StaticAssertDecl(condition, errmessage) \ > + extern void static_assert_func(int static_assert_failure[(condition) ? > 1 : -1]) > #endif /* > HAVE__STATIC_ASSERT */ > #else/* C++ */ > #if defined(__cpp_static_assert) && __cpp_static_assert >= 200410 > @@ -858,7 +863,6 @@ extern void ExceptionalCondition(const char > *conditionName, > #endif > #endif /* C++ > */ Peter Smith: Is there a reason to not just make StaticAssertExpr and StaticAssertStmt be the same? I don't want to proliferate variants that users have to understand if there's no compelling need. Nor do I think do we really need two different fallback implementation for static asserts. 
As far as I can tell we should be able to use the prototype based approach in all the cases where we currently use the "negative bit-field width" approach? We should then also update * Otherwise we fall back on a kluge that assumes the compiler will complain * about a negative width for a struct bit-field. This will not include a * helpful error message, but it beats not getting an error at all. Peter Eisentraut: Looking at the cplusplus variant, I'm somewhat surprised to see that you made both fallback and plain version unconditionally use GCC style compound expressions: commit a2c8e5cfdb9d82ae6d4bb8f37a4dc7cbeca63ec1 Author: Peter Eisentraut Date: 2016-08-30 12:00:00 -0400 Add support for static assertions in C++ ... +#if defined(__cpp_static_assert) && __cpp_static_assert >= 200410 +#define StaticAssertStmt(condition, errmessage) \ +static_assert(condition, errmessage) +#define StaticAssertExpr(condition, errmessage) \ +StaticAssertStmt(condition, errmessage) +#else +#define StaticAssertStmt(condition, errmessage) \ +do { struct static_assert_struct { int static_assert_failure : (condition) ? 1 : -1; }; } while(0) +#define StaticAssertExpr(condition, errmessage) \ +({ StaticAssertStmt(condition, errmessage); }) +#endif Was that intentional? The C version intentionally uses compound expressions only for the _Static_assert case, where configure tests for the compound expression support? As far as I can tell this'll not allow using our headers e.g. with msvc in C++ mode if somebody introduces a static assertion in a header - which seems like a likely and good outcome with the changes proposed here? Btw, it looks to me like msvc supports using the C++ static_assert() even in C mode: https://godbolt.org/z/b_dxDW Greetings, Andres Freund
Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?
On 2019-11-12 16:25:06 +0100, Daniel Gustafsson wrote: > > On 11 Nov 2019, at 09:32, Michael Paquier wrote: > > > This part has resulted in 75c1921, and we could just change > > DecodeMultiInsert() so as if there is no tuple data then we'd just > > leave. However, I don't feel completely comfortable with that either > > as it would be nice to still check that for normal relations we > > *always* have a FPW available. > > XLH_INSERT_CONTAINS_NEW_TUPLE will only be set in case of catalog relations > IIUC (that is, non logically decoded relations), so it seems to me that we can > have a fastpath out of DecodeMultiInsert() which inspects that flag without > problems. Is this proposal along the lines of what you were thinking? Maybe I'm missing something, but it's the opposite, no? bool need_tuple_data = RelationIsLogicallyLogged(relation); ... if (need_tuple_data) xlrec->flags |= XLH_INSERT_CONTAINS_NEW_TUPLE; or am I misunderstanding what you mean? > @@ -1600,10 +1598,16 @@ recordDependencyOnExpr(const ObjectAddress *depender, > /* Remove any duplicates */ > eliminate_duplicate_dependencies(context.addrs); > > - /* And record 'em */ > - recordMultipleDependencies(depender, > -context.addrs->refs, > context.addrs->numrefs, > -behavior); > + /* > + * And record 'em. If we know we only have a single dependency then > + * avoid the extra cost of setting up a multi insert. > + */ > + if (context.addrs->numrefs == 1) > + recordDependencyOn(depender, &context.addrs->refs[0], behavior); > + else > + recordMultipleDependencies(depender, > + > context.addrs->refs, context.addrs->numrefs, > +behavior); I'm not sure this is actually a worthwhile complexity to keep. Hard to believe that setting up a multi-insert is going to have a significant enough overhead to matter here? And if it does, is there a chance we can hide this repeated block somewhere within recordMultipleDependencies() or such? I don't like the repetitiveness here. 
Seems recordMultipleDependencies() could easily optimize the case of there being exactly one dependency to insert? > +/* > + * InsertPgAttributeTuples > + * Construct and insert multiple tuples in pg_attribute. > + * > + * This is a variant of InsertPgAttributeTuple() which dynamically allocates > + * space for multiple tuples. Having two so similar functions is a kludge, > but > + * for now it's a TODO to make it less terrible. > + */ > +void > +InsertPgAttributeTuples(Relation pg_attribute_rel, > + FormData_pg_attribute > *new_attributes, > + int natts, > + CatalogIndexState indstate) > +{ > + Datum values[Natts_pg_attribute]; > + boolnulls[Natts_pg_attribute]; > + HeapTuple tup; > + int i; > + TupleTableSlot **slot; > + > + /* > + * The slots are dropped in CatalogMultiInsertWithInfo(). TODO: natts > + * number of slots is not a reasonable assumption as a wide relation > + * would be detrimental, figuring a good number is left as a TODO. > + */ > + slot = palloc(sizeof(TupleTableSlot *) * natts); Hm. Looking at SELECT avg(pg_column_size(pa)) FROM pg_attribute pa; yielding ~144 bytes, we can probably cap this at 128 or such, without losing efficiency. Or just use #define MAX_BUFFERED_BYTES 65535 from copy.c or such (MAX_BUFFERED_BYTES / sizeof(FormData_pg_attribute)). > + /* This is a tad tedious, but way cleaner than what we used to do... 
*/ I don't like comments that refer to "what we used to" because there's no way for anybody to make sense of that, because it's basically a dangling reference :) > + memset(values, 0, sizeof(values)); > + memset(nulls, false, sizeof(nulls)); > + /* start out with empty permissions and empty options */ > + nulls[Anum_pg_attribute_attacl - 1] = true; > + nulls[Anum_pg_attribute_attoptions - 1] = true; > + nulls[Anum_pg_attribute_attfdwoptions - 1] = true; > + nulls[Anum_pg_attribute_attmissingval - 1] = true; > + > + /* attcacheoff is always -1 in storage */ > + values[Anum_pg_attribute_attcacheoff - 1] = Int32GetDatum(-1); > + > + for (i = 0; i < natts; i++) > + { > + values[Anum_pg_attribute_attrelid - 1] = > ObjectIdGetDatum(new_attributes[i].attrelid); > + values[Anum_pg_attribute_attname - 1] = > NameGetDatum(&new_attributes[i].attname); > + values[Anum_pg_attribute_atttypid - 1] = > ObjectIdGetDatum(new_attributes[i]
Re: SQL/JSON: JSON_TABLE
Hi, please can you rebase 0001-SQL-JSON-functions-v40.patch? I have a problem applying it. Pavel
Re: segmentation fault when cassert enabled
On Mon, 28 Oct 2019 16:47:02 +0900 (JST) Kyotaro Horiguchi wrote: > At Fri, 25 Oct 2019 12:28:38 -0400, Tom Lane wrote in > > Jehan-Guillaume de Rorthais writes: > > > When investigating for the bug reported in thread "logical replication - > > > negative bitmapset member not allowed", I found a way to seg fault > > > postgresql only when cassert is enabled. > > > ... > > > I hadn't time to dig further yet. However, I don't understand why this > > > crash is triggered when cassert is enabled. > > > > Most likely, it's not so much assertions that provoke the crash as > > CLOBBER_FREED_MEMORY, ie the actual problem here is use of already-freed > > memory. > > Agreed. > > By the way I didn't get a crash by Jehan's script with the > --enable-cassert build of the master HEAD of a few days ago. I am now working with HEAD and I can confirm I am able to make it crash 99% of the time using my script. It feels like a race condition between cache invalidation and record processing from worker.c. Make sure you have enough write activity during the test. > FWIW I sometimes got SEGVish crashes or mysterious misbehavior when > some structs were changed and I didn't do "make clean". Rarely I > needed "make distclean". (Yeah, I didn't usually turn on > --enable-depend..) I'm paranoid, I always do: * make distclean * git reset; git clean -df * ./configure && make install Regards,
Re: Built-in connection pooler
Hi On 12.11.2019 10:50, ideriha.take...@fujitsu.com wrote: Hi. From: Konstantin Knizhnik [mailto:k.knizh...@postgrespro.ru] New version of builtin connection pooler fixing handling messages of extended protocol. Here are things I've noticed. 1. Is adding guc to postgresql.conf.sample useful for users? Good catch: I will add it. 2. When proxy_port is a bit large (perhaps more than 2^15), connection failed though regular "port" is fine with number more than 2^15. $ bin/psql -p 32768 2019-11-12 16:11:25.460 JST [5617] LOG: Message size 84 2019-11-12 16:11:25.461 JST [5617] WARNING: could not setup local connect to server 2019-11-12 16:11:25.461 JST [5617] DETAIL: invalid port number: "-22768" 2019-11-12 16:11:25.461 JST [5617] LOG: Handshake response will be sent to the client later when backed is assigned psql: error: could not connect to server: invalid port number: "-22768" Hmmm, ProxyPortNumber is used exactly in the same way as PostPortNumber. I was able to connect to the specified port: knizhnik@knizhnik:~/dtm-data$ psql postgres -p 42768 psql (13devel) Type "help" for help. postgres=# \q knizhnik@knizhnik:~/dtm-data$ psql postgres -h 127.0.0.1 -p 42768 psql (13devel) Type "help" for help. postgres=# \q 3. When proxy_port is 6543 and connection_proxies is 2, running "make installcheck" twice without restarting server failed. This is because of remaining backend. == dropping database "regression" == ERROR: database "regression" is being accessed by other users DETAIL: There is 1 other session using the database. command failed: "/usr/local/pgsql-connection-proxy-performance/bin/psql" -X -c "DROP DATABASE IF EXISTS \"regression\"" "postgres" Yes, this is a known limitation. Frankly speaking I do not consider it as a problem: it is not possible to drop a database while there are active sessions accessing it. And definitely the proxy has such sessions. You can specify idle_pool_worker_timeout to shutdown pooler workers after some idle time. 
In this case, if you make a large enough pause between test iterations, then workers will be terminated and it will be possible to drop the database. 4. When running "make installcheck-world" with various connection-proxies, it results in a different number of errors. With connection_proxies = 2, the test never ends. With connection_proxies = 20, 23 tests failed. More connection_proxies, the number of failed tests decreased. Please notice, that each proxy maintains its own connection pool. Default number of pooled backends is 10 (session_pool_size). If you specify too large a number of proxies then the number of spawned backends = session_pool_size * connection_proxies can be too large (for the specified number of max_connections). Please notice the difference between number of proxies and number of pooler backends. Usually one proxy process is enough to serve all workers. Only in case of MPP systems with large number of cores and especially with SSL connections, proxy can become a bottleneck. In this case you can configure several proxies. But having more than 1-4 proxies seems to be a bad idea. But in case of check-world the problem is not related with number of proxies. It takes place even with connection_proxies = 1 There was one bug with handling clients terminated inside transaction. It is fixed in the attached patch. But there is still a problem with passing isolation tests under connection proxy: they are using the pg_isolation_test_session_is_blocked function which checks if backends with specified PIDs are blocked. But since, when using the connection proxy, a session is no longer bound to a particular backend, this check may not work as expected and the test is blocked. I do not know how it can be fixed and not sure if it has to be fixed at all. 
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/contrib/spi/refint.c b/contrib/spi/refint.c index adf0490..5c2095f 100644 --- a/contrib/spi/refint.c +++ b/contrib/spi/refint.c @@ -11,6 +11,7 @@ #include "commands/trigger.h" #include "executor/spi.h" +#include "storage/proc.h" #include "utils/builtins.h" #include "utils/rel.h" @@ -93,6 +94,8 @@ check_primary_key(PG_FUNCTION_ARGS) else tuple = trigdata->tg_newtuple; + MyProc->is_tainted = true; + trigger = trigdata->tg_trigger; nargs = trigger->tgnargs; args = trigger->tgargs; @@ -284,6 +287,8 @@ check_foreign_key(PG_FUNCTION_ARGS) /* internal error */ elog(ERROR, "check_foreign_key: cannot process INSERT events"); + MyProc->is_tainted = true; + /* Have to check tg_trigtuple - tuple being deleted */ trigtuple = trigdata->tg_trigtuple; diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index c91e3e1..df0bcaf 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -719,6 +719,169 @@ include_dir 'conf.d' + + max_sessions (integer) + + ma
[PATCH][BUG FIX] Potential uninitialized vars used.
Hi, The variable TargetEntry *tle has several code paths where it can be used uninitialized. Can anyone check this bug fix? --- \dll\postgresql-12.0\a\backend\parser\parse_expr.c Mon Sep 30 17:06:55 2019 +++ parse_expr.cTue Nov 12 12:43:07 2019 @@ -349,6 +349,7 @@ errmsg("DEFAULT is not allowed in this context"), parser_errposition(pstate, ((SetToDefault *) expr)->location))); + result = NULL; /* keep compiler quiet */ break; /* @@ -1637,11 +1638,13 @@ pstate->p_multiassign_exprs = lappend(pstate->p_multiassign_exprs, tle); } - else + else { ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("source for a multiple-column UPDATE item must be a sub-SELECT or ROW() expression"), parser_errposition(pstate, exprLocation(maref->source; +return NULL; +} } else { @@ -1653,6 +1656,10 @@ Assert(pstate->p_multiassign_exprs != NIL); tle = (TargetEntry *) llast(pstate->p_multiassign_exprs); } +if (tle == NULL) { + elog(ERROR, "unexpected expr type in multiassign list"); + return NULL;/* keep compiler quiet */ +} /* * Emit the appropriate output expression for the current column parse_expr.c.patch Description: parse_expr.c.patch
Re: Option to dump foreign data in pg_dump
> On 12 Nov 2019, at 15:21, Luis Carril wrote: > >> The nitpicks have been addressed. However, it seems that the new file >> containing the test FDW seems missing from the new version of the patch. Did >> you forget to git add the file? > Yes, I forgot, thanks for noticing. New patch attached again. The patch applies, compiles and tests clean. The debate whether we want to allow dumping of foreign data at all will continue but I am marking the patch as ready for committer as I believe it is ready for input on that level. cheers ./daniel
Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?
> On 11 Nov 2019, at 09:32, Michael Paquier wrote: > This part has resulted in 75c1921, and we could just change > DecodeMultiInsert() so as if there is no tuple data then we'd just > leave. However, I don't feel completely comfortable with that either > as it would be nice to still check that for normal relations we > *always* have a FPW available. XLH_INSERT_CONTAINS_NEW_TUPLE will only be set in case of catalog relations IIUC (that is, non logically decoded relations), so it seems to me that we can have a fastpath out of DecodeMultiInsert() which inspects that flag without problems. Is this proposal along the lines of what you were thinking? The patch is now generating an error in the regression tests as well, on your recently added CREATE INDEX test from 68ac9cf2499236996f3d4bf31f7f16d5bd3c77af. By using the ObjectAddresses API the dependencies are deduped before being recorded, removing the duplicate entry from that test output. AFAICT there is nothing benefiting us from having duplicate dependencies, so this seems an improvement (albeit tiny and not very important), or am I missing something? Is there a use for duplicate dependencies? Attached version contains the above two fixes, as well as a multi_insert for dependencies in CREATE EXTENSION which I had missed to git add in previous versions. cheers ./daniel catalog_multi_insert-v5.patch Description: Binary data
Re: documentation inconsistent re: alignment
Chapman Flack writes: > On 10/20/19 14:47, Tom Lane wrote: >> Probably the statement in CREATE TYPE is too strong. There are, I >> believe, still machines in the buildfarm where maxalign is just 4. > So just closing the circle on this, the low-down seems to be that > the alignments called s, i, and d (by pg_type), and int2, int4, and > double (by CREATE TYPE) refer to the machine values configure picks > for ALIGNOF_SHORT, ALIGNOF_INT, and ALIGNOF_DOUBLE, respectively. Right. > And while configure also defines an ALIGNOF_LONG, and there are > LONGALIGN macros in c.h that use it, that one isn't a choice when > creating a type, presumably because it's never been usefully different > on any interesting platform? The problem with "long int" is that it's 32 bits on some platforms and 64 bits on others, so it's not terribly useful as a basis for a user-visible SQL type. That's why it's not accounted for in the typalign options. I think ALIGNOF_LONG is just there for completeness' sake --- it doesn't look to me like we actually use that, or LONGALIGN, anyplace. regards, tom lane
Re: Missing dependency tracking for TableFunc nodes
Mark Dilger writes: > I played with this a bit, making the change I proposed, and got lots of > warnings from the compiler. I don't know how many of these would need > to be suppressed by adding a no-op for them at the end of the switch vs. > how many need to be handled, but the attached patch implements the idea. > I admit adding all these extra cases to the end is verbose Yeah, that's why it's not done that way ... > The change as written is much too verbose to be acceptable, but given > how many places in the code could really use this sort of treatment, I > wonder if there is a way to reorganize the NodeTag enum into multiple > enums, one for each logical subtype (such as executor nodes, plan nodes, > etc) and then have switches over enums of the given subtype, with the > compiler helping detect tags of same subtype that are unhandled in the > switch. The problem here is that the set of nodes of interest can vary depending on what you're doing. As a case in point, find_expr_references has to cover both expression nodes and some things that aren't expression nodes but can represent dependencies of a plan tree. I think that the long-term answer, if Andres gets somewhere with his project to autogenerate code like this, is that we'd rely on annotating the struct declarations to tell us what to do. In the case at hand, I could imagine annotations that say "this field contains a function OID" or "this list contains collation OIDs" and then the find_expr_references logic could be derived from that. Now, that's not perfect either, because it's always possible to forget to annotate something. But it'd be a lot easier, because there'd be tons of nearby examples of doing it right. regards, tom lane
Re: Option to dump foreign data in pg_dump
On 2019-Nov-12, Luis Carril wrote: > But, not all foreign tables are necessarily in a remote server like > the ones referenced by the postgres_fdw. > In FDWs like swarm64da, cstore, citus or timescaledb, the foreign > tables are part of your database, and one could expect that a dump of > the database includes data from these FDWs. BTW these are not FDWs in the "foreign" sense at all; they're just abusing the FDW system in order to be able to store data in some different way. The right thing to do IMO is to port these systems to be users of the new storage abstraction (table AM). If we do that, what value is there to the feature being proposed here? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Collation versioning
On Sun, Nov 10, 2019 at 10:08 AM Thomas Munro wrote: > > Some more thoughts: > > 1. If you create an index on an expression that includes a COLLATE or > a partial index that has one in the WHERE clause, you get bogus > warnings: > > postgres=# create table t (v text); > CREATE TABLE > postgres=# create index on t(v) where v > 'hello' collate "en_NZ"; > WARNING: index "t_v_idx3" depends on collation "en_NZ" version "", > but the current version is "2.28" > DETAIL: The index may be corrupted due to changes in sort order. > HINT: REINDEX to avoid the risk of corruption. > CREATE INDEX > > postgres=# create index on t((case when v < 'x' collate "en_NZ" then > 'foo' else 'bar' end)); > WARNING: index "t_case_idx" depends on collation "en_NZ" version "", > but the current version is "2.28" > DETAIL: The index may be corrupted due to changes in sort order. > HINT: REINDEX to avoid the risk of corruption. > CREATE INDEX > > That's because the 0003 patch only calls recordDependencyOnVersion() > for simple attribute references. When > recordDependencyOnSingleRelExpr() is called by index_create() to > analyse ii_Expressions and ii_Predicate, it's going to have to be > smart enough to detect collation references and record the versions. > There is also some more code that ignores pinned collations hiding in > there. > > This leads to the difficult question of how you recognise a real > dependency on a collation's version in an expression. I have some > vague ideas but haven't seriously looked into it yet. (The same > question comes up for check constraint -> collation dependencies.) Oh good point. A simple and exhaustive way to deal with that would be to teach recordMultipleDependencies() to override isObjectPinned() and retrieve the collation version if the referenced object is a collation and it's neither the C nor the POSIX collation. 
It means that we could also remove the extra "version" argument and get rid of recordDependencyOnVersion to simply call recordMultipleDependencies in create_index for direct column references having a collation. Would it be ok to add this kind of knowledge in recordMultipleDependencies() or is it too hacky? > 2. If you create a composite type with a text attribute (with or > without an explicit collation), and then create an index on a column > of that type, we don't record the dependency. > > postgres=# create type my_type as (x text collate "en_NZ"); > CREATE TYPE > postgres=# create table t (v my_type); > CREATE TABLE > postgres=# create index on t(v); > CREATE INDEX > postgres=# select * from pg_depend where refobjversion != ''; > classid | objid | objsubid | refclassid | refobjid | refobjsubid | > refobjversion | deptype > -+---+--++--+-+---+- > (0 rows) > > I think create_index() will need to perform recursive analysis on > composite types to look for text attributes, when they appear as > simple attributes, and then add direct dependencies index -> collation > to capture the versions. Then we might need to do the same for > composite types hiding inside ii_Expressions and ii_Predicate (once we > figure out what that really means). Isn't that actually a bug? For instance such an index will have a 0 indcollation in pg_index, and according to pg_index documentation: " this contains the OID of the collation to use for the index, or zero if the column is not of a collatable data type." You can't use a COLLATE expression on such data type, but it still has a collation used. > 3. Test t/002_pg_dump.pl in src/bin/pg_upgrade fails. Apparently neither "make check" nor "make world" run this test :( This was broken due to collversion support in pg_dump; I have fixed it locally. > 4. In the warning message we should show get_collation_name() instead > of the OID. +1, I also fixed it locally.
Re: Option to dump foreign data in pg_dump
The nitpicks have been addressed. However, it seems that the new file containing the test FDW is missing from the new version of the patch. Did you forget to git add the file? Yes, I forgot, thanks for noticing. New patch attached again. Cheers Luis M Carril diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index 4bcd4bdaef..319851b936 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -767,6 +767,34 @@ PostgreSQL documentation + + --include-foreign-data=foreignserver + + +Dump the data for any foreign table with a foreign server +matching foreignserver +pattern. Multiple foreign servers can be selected by writing multiple --include-foreign-data. +Also, the foreignserver parameter is +interpreted as a pattern according to the same rules used by +psql's \d commands (see ), +so multiple foreign servers can also be selected by writing wildcard characters +in the pattern. When using wildcards, be careful to quote the pattern +if needed to prevent the shell from expanding the wildcards; see +. +The only exception is that an empty pattern is disallowed. + + + + + When --include-foreign-data is specified, pg_dump + does not check if the foreign table is also writeable. Therefore, there is no guarantee that + the results of a foreign table dump can be successfully restored by themselves into a clean database. 
+ + + + + --inserts diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index bf69adc2f4..31465dec79 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -120,6 +120,8 @@ static SimpleStringList table_exclude_patterns = {NULL, NULL}; static SimpleOidList table_exclude_oids = {NULL, NULL}; static SimpleStringList tabledata_exclude_patterns = {NULL, NULL}; static SimpleOidList tabledata_exclude_oids = {NULL, NULL}; +static SimpleStringList foreign_servers_include_patterns = {NULL, NULL}; +static SimpleOidList foreign_servers_include_oids = {NULL, NULL}; char g_opaque_type[10]; /* name for the opaque type */ @@ -156,6 +158,9 @@ static void expand_schema_name_patterns(Archive *fout, SimpleStringList *patterns, SimpleOidList *oids, bool strict_names); +static void expand_foreign_server_name_patterns(Archive *fout, + SimpleStringList *patterns, + SimpleOidList *oids); static void expand_table_name_patterns(Archive *fout, SimpleStringList *patterns, SimpleOidList *oids, @@ -388,6 +393,7 @@ main(int argc, char **argv) {"no-sync", no_argument, NULL, 7}, {"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1}, {"rows-per-insert", required_argument, NULL, 10}, + {"include-foreign-data", required_argument, NULL, 11}, {NULL, 0, NULL, 0} }; @@ -604,6 +610,15 @@ main(int argc, char **argv) dopt.dump_inserts = (int) rowsPerInsert; break; + case 11:/* include foreign data */ +if (strcmp(optarg, "") == 0) +{ + pg_log_error("empty string is not a valid pattern in --include-foreign-data"); + exit_nicely(1); +} +simple_string_list_append(&foreign_servers_include_patterns, optarg); +break; + default: fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); exit_nicely(1); @@ -645,6 +660,12 @@ main(int argc, char **argv) exit_nicely(1); } + if (dopt.schemaOnly && foreign_servers_include_patterns.head != NULL) + { + pg_log_error("options -s/--schema-only and --include-foreign-data cannot be used together"); + exit_nicely(1); + } + 
if (dopt.dataOnly && dopt.outputClean) { pg_log_error("options -c/--clean and -a/--data-only cannot be used together"); @@ -812,6 +833,9 @@ main(int argc, char **argv) &tabledata_exclude_oids, false); + expand_foreign_server_name_patterns(fout, &foreign_servers_include_patterns, + &foreign_servers_include_oids); + /* non-matching exclusion patterns aren't an error */ /* @@ -1035,6 +1059,9 @@ help(const char *progname) printf(_(" --use-set-session-authorization\n" " use SET SESSION AUTHORIZATION commands instead of\n" " ALTER OWNER commands to set ownership\n")); + printf(_(" --include-foreign-data=PATTERN\n" + " include data of foreign tables with the named\n" + " foreign servers in dump\n")); printf(_("\nConnection options:\n")); printf(_(" -d, --dbname=DBNAME database to dump\n")); @@ -1333,6 +1360,53 @@ expand_schema_name_patterns(Archive *fout, destroyPQExpBuffer(query); } +/* + * Find the OIDs of all foreign servers matching the given list of patterns, + * and append them to the given OID
RE: [PATCH][BUG_FIX] Potential null pointer dereferencing.
Hi, The condition is: at line 74, if (TupIsNull(slot)) is true, then at line 85, if (TupIsNull(resultTupleSlot)) is true too. If resultTupleSlot can never be NULL, why test if (TupIsNull(resultTupleSlot))? When these two conditions occur, ExecClearTuple is called, but without checking for a NULL arg. There are at least 2 more possible cases involving ExecClearTuple: nodeFunctionscan.c and nodeWindowAgg.c. Best regards, Ranier Vilela From: Daniel Gustafsson Sent: Tuesday, November 12, 2019 13:43 To: Ranier Vilela Cc: PostgreSQL Hackers Subject: Re: [PATCH][BUG_FIX] Potential null pointer dereferencing. > On 12 Nov 2019, at 14:07, Ranier Vilela wrote: > ExecClearTuple doesn't check for a NULL pointer arg, and according > to TupIsNull the slot can be NULL. I assume you are referring to the TupIsNull(resultTupleSlot) check a few lines down in the same loop? If resultTupleSlot was indeed NULL and not empty, the subsequent call to ExecCopySlot would be a NULL pointer dereference too. I might be missing something obvious, but in which case can resultTupleSlot be NULL when calling ExecUnique? cheers ./daniel
Re: [PATCH][BUG_FIX] Potential null pointer dereferencing.
> On 12 Nov 2019, at 14:07, Ranier Vilela wrote: > ExecClearTuple doesn't check for a NULL pointer arg, and according > to TupIsNull the slot can be NULL. I assume you are referring to the TupIsNull(resultTupleSlot) check a few lines down in the same loop? If resultTupleSlot was indeed NULL and not empty, the subsequent call to ExecCopySlot would be a NULL pointer dereference too. I might be missing something obvious, but in which case can resultTupleSlot be NULL when calling ExecUnique? cheers ./daniel
Re: [HACKERS] Block level parallel vacuum
On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada wrote: > > On Tue, 12 Nov 2019 at 20:11, Amit Kapila wrote: > > > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada > > wrote: > > > > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar wrote: > > > > > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila > > > > wrote: > > > > > > > > > > Yeah, maybe something like amparallelvacuumoptions. The options can > > > > > be: > > > > > > > > > > VACUUM_OPTION_NO_PARALLEL 0 # vacuum (neither bulkdelete nor > > > > > vacuumcleanup) can't be performed in parallel > > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP 1 # vacuumcleanup cannot be > > > > > performed in parallel (hash index will set this flag) > > > > > > > > Maybe we don't want this option? because if 3 or 4 is not set then we > > > > will not do the cleanup in parallel right? > > > > > > > > Yeah, but it is better to be explicit about this. > > VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing? > I am not sure if that is required. > I think brin indexes > will use this flag. > Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and it should work. > It will end up with > (VACUUM_OPTION_NO_PARALLEL_CLEANUP | > VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to > VACUUM_OPTION_NO_PARALLEL, though. > > > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 2 # bulkdelete can be done in > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > > > > flag) > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 3 # vacuumcleanup can be done in > > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash, > > > > > gin, gist, spgist, bloom will set this flag) > > > > > VACUUM_OPTION_PARALLEL_CLEANUP 4 # vacuumcleanup can be done in > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > > > and bloom will set this flag) > > > > > > > > > > Does something like this make sense? > > > > > > 3 and 4 confused me because 4 also looks conditional. 
How about having > > > two flags instead: one for doing parallel cleanup when not performed > > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing > > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)? > > > > > > > Hmm, this is exactly what I intend to say with 3 and 4. I am not sure > > what makes you think 4 is conditional. > > Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets > 4 it doesn't need to set 3 because 4 means always doing cleanup in > parallel. > Yeah, that makes sense. They can just set 4. > > > > > That way, we > > > can have flags as follows and index AM chooses two flags, one from the > > > first two flags for bulk deletion and another from next three flags > > > for cleanup. > > > > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0 > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 > > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2 > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3 > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4 > > > > > > > This also looks reasonable, but if there is an index that doesn't want > > to support a parallel vacuum, it needs to set multiple flags. > > Right. It would be better to use uint16 as two uint8. I mean that if > first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if > next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags > could be followings: > > VACUUM_OPTION_PARALLEL_BULKDEL 0x0001 > VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100 > VACUUM_OPTION_PARALLEL_CLEANUP 0x0200 > Hmm, I think we should define these flags in the most simple way. Your previous proposal sounds okay to me. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
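The bit-flag encoding discussed above might look like the following standalone sketch. The flag names and shifts follow the proposal in this thread; the per-AM option sets and the helper are illustrative guesses, not committed definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Flag encoding from the thread: an index AM ORs together one
 * bulk-delete flag and one cleanup flag. Proposed values, not from
 * any committed PostgreSQL header. */
#define VACUUM_OPTION_PARALLEL_NO_BULKDEL   (1 << 0)
#define VACUUM_OPTION_PARALLEL_BULKDEL      (1 << 1)
#define VACUUM_OPTION_PARALLEL_NO_CLEANUP   (1 << 2)
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP (1 << 3)
#define VACUUM_OPTION_PARALLEL_CLEANUP      (1 << 4)

/* What a btree-like AM might advertise: parallel bulk deletion, and
 * parallel cleanup only when no bulk deletion ran. */
static const uint8_t btree_options =
    VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;

/* Hash would opt out of parallel cleanup entirely. */
static const uint8_t hash_options =
    VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_NO_CLEANUP;

/* How vacuum could interpret the cleanup half of the options. */
static int
can_parallel_cleanup(uint8_t opts, int bulkdel_already_done)
{
    if (opts & VACUUM_OPTION_PARALLEL_CLEANUP)
        return 1;
    if ((opts & VACUUM_OPTION_PARALLEL_COND_CLEANUP) && !bulkdel_already_done)
        return 1;
    return 0;
}
```

This also makes Amit's point concrete: an AM wanting no parallelism at all would have to set both NO_BULKDEL and NO_CLEANUP, which is the multi-flag cost of the fully orthogonal encoding.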
Re: Option to dump foreign data in pg_dump
> On 12 Nov 2019, at 12:12, Luis Carril wrote: >a new version of the patch with the tests from Daniel (thanks!) and the > nitpicks. The nitpicks have been addressed. However, it seems that the new file containing the test FDW seems missing from the new version of the patch. Did you forget to git add the file? >> I don't feel good about this feature. >> pg_dump should not dump any data that are not part of the database >> being dumped. >> >> If you restore such a dump, the data will be inserted into the foreign table, >> right? Unless someone emptied the remote table first, this will add >> duplicated data to that table. >> I think that is an unpleasant surprise. I'd expect that if I drop a database >> and restore it from a dump, it should be as it was before. This change would >> break that assumption. >> >> What are the use cases of a dump with foreign table data? >> >> Unless I misunderstood something there, -1. > > This feature is opt-in so if the user makes dumps of a remote server > explicitly by other means, then the user would not need to use these option. > But, not all foreign tables are necessarily in a remote server like the ones > referenced by the postgres_fdw. > In FDWs like swarm64da, cstore, citus or timescaledb, the foreign tables are > part of your database, and one could expect that a dump of the database > includes data from these FDWs. Right, given the deliberate opt-in which is required I don't see much risk of unpleasant user surprises. There are no doubt foot-guns available with this feature, as has been discussed upthread, but the current proposal is IMHO minimizing them. cheers ./daniel
Re: MarkBufferDirtyHint() and LSN update
Kyotaro Horiguchi wrote: > At Mon, 11 Nov 2019 10:03:14 +0100, Antonin Houska wrote > in > > Michael Paquier wrote: > > > Does something like the attached patch make sense? Reviews are > > > welcome. > > > > This looks good to me. > > I have a question. > > The current code assumes that !BM_DIRTY means that the function is > dirtying the page. But if !BM_JUST_DIRTIED, the function actually is > going to re-dirty the page even if BM_DIRTY. It makes sense to me. I can imagine the following: 1. FlushBuffer() cleared BM_JUST_DIRTIED, wrote the page to disk but hasn't yet cleared BM_DIRTY. 2. Another backend changed a hint bit in shared memory and called MarkBufferDirtyHint(). Thus FlushBuffer() missed the current hint bit change, so we need to dirty the page again. > If this is correct, the trigger for stats update is not !BM_DIRTY but > !BM_JUST_DIRTIED, or the fact that we passed the line of > XLogSaveBufferForHint() could be the trigger, regardless of whether the > LSN is valid or not. I agree. -- Antonin Houska Web: https://www.cybertec-postgresql.com
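The interleaving described in steps 1 and 2 can be modeled with a toy bitmask sketch. The flag bits and function bodies below are stand-ins for the real buffer manager (the actual bits live in buf_internals.h), kept only to show why !BM_JUST_DIRTIED, not !BM_DIRTY, is the reliable signal that a page is being (re-)dirtied:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in buffer state bits; not the real values. */
#define BM_DIRTY        (1u << 0)
#define BM_JUST_DIRTIED (1u << 1)

/* FlushBuffer clears BM_JUST_DIRTIED before writing the page out. */
static uint32_t
flush_buffer_start(uint32_t state)
{
    return state & ~BM_JUST_DIRTIED;
}

/* A concurrent hint-bit update re-dirties the page even though
 * BM_DIRTY may still be set -- the case Horiguchi-san points out. */
static uint32_t
mark_buffer_dirty_hint(uint32_t state)
{
    return state | BM_DIRTY | BM_JUST_DIRTIED;
}

/* FlushBuffer only drops BM_DIRTY if nobody re-dirtied the page
 * while the write was in flight. */
static uint32_t
flush_buffer_finish(uint32_t state)
{
    if (!(state & BM_JUST_DIRTIED))
        state &= ~BM_DIRTY;
    return state;
}
```

Running the two-backend interleaving through these helpers shows the hint-bit change is not lost: the page stays dirty because BM_JUST_DIRTIED was set again between the start and finish of the flush.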
[PATCH][BUG_FIX] Potential null pointer dereferencing.
Hi, ExecClearTuple doesn't check for a NULL pointer arg, and according to TupIsNull the slot can be NULL. Can anyone check this bug fix? --- \dll\postgresql-12.0\a\backend\executor\nodeUnique.c Mon Sep 30 17:06:55 2019 +++ nodeUnique.c Tue Nov 12 09:54:34 2019 @@ -74,7 +74,8 @@ if (TupIsNull(slot)) { /* end of subplan, so we're done */ - ExecClearTuple(resultTupleSlot); + if (!TupIsNull(resultTupleSlot)) + ExecClearTuple(resultTupleSlot); return NULL; } nodeUnique.c.patch Description: nodeUnique.c.patch
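For readers following along, here is a minimal standalone sketch of the semantics under discussion. The struct and macro are simplified stand-ins for PostgreSQL's TupleTableSlot and TupIsNull(); the point is that TupIsNull() tolerates a NULL pointer while a clear function that dereferences its argument does not:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for TupleTableSlot. */
typedef struct TupleTableSlot
{
    bool tts_isempty;
} TupleTableSlot;

/* Mirrors the TupIsNull() macro shape: true for a NULL pointer *or*
 * an empty slot -- the two cases the patch discussion hinges on. */
#define TupIsNull(slot) ((slot) == NULL || (slot)->tts_isempty)

/* Stand-in for ExecClearTuple(): dereferences unconditionally,
 * so passing NULL here would crash. */
static void
exec_clear_tuple(TupleTableSlot *slot)
{
    slot->tts_isempty = true;
}
```

Daniel's counterpoint, in these terms: in ExecUnique the result slot pointer is set up at node initialization, so only the "empty" half of TupIsNull() should ever be true there, and the added guard only matters if that invariant is already broken.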
Re: MarkBufferDirtyHint() and LSN update
At Mon, 11 Nov 2019 10:03:14 +0100, Antonin Houska wrote in > Michael Paquier wrote: > > Does something like the attached patch make sense? Reviews are > > welcome. > > This looks good to me. I have a question. The current code assumes that !BM_DIRTY means that the function is dirtying the page. But if !BM_JUST_DIRTIED, the function actually is going to re-dirty the page even if BM_DIRTY. If this is correct, the trigger for stats update is not !BM_DIRTY but !BM_JUST_DIRTIED, or the fact that we passed the line of XLogSaveBufferForHint() could be the trigger, regardless of whether the LSN is valid or not. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] Block level parallel vacuum
On Tue, 12 Nov 2019 at 20:29, Dilip Kumar wrote: > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada > wrote: > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar wrote: > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada > > > wrote: > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > > updated version patch that also incorporated some comments I got so > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > > test the total delay time. > > > > > > > While reviewing the 0002, I got one doubt related to how we are > > > dividing the maintenance_work_mem > > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int > > > nindexes) > > > +{ > > > + /* Compute the new maintenance_work_mem value for index vacuuming */ > > > + lvshared->maintenance_work_mem_worker = > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : > > > maintenance_work_mem; > > > +} > > > Is it fair to just consider the number of indexes which use > > > maintenance_work_mem? Or we need to consider the number of workers as > > > well. My point is suppose there are 10 indexes which will use the > > > maintenance_work_mem but we are launching just 2 workers then what is > > > the point in dividing the maintenance_work_mem by 10. > > > > > > IMHO the calculation should be like this > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? > > > maintenance_work_mem / Min(nindexes_mwm, nworkers) : > > > maintenance_work_mem; > > > > > > Am I missing something? > > > > No, I think you're right. On the other hand I think that dividing it > > by the number of indexes that will use the maintenance_work_mem makes > > sense when parallel degree > the number of such indexes. 
Suppose the > > table has 2 indexes and there are 10 workers then we should divide the > > maintenance_work_mem by 2 rather than 10 because it's possible that at > > most 2 indexes that use the maintenance_work_mem are processed in > > parallel at a time. > > > Right, that's the reason I suggested dividing by Min(nindexes_mwm, nworkers). Thanks! I'll fix it in the next version patch. -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
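The agreed-upon formula can be sketched as a standalone function. Sizes below are in kB and purely illustrative; the function mirrors the calculation quoted in the thread, not any committed code:

```c
#include <assert.h>

#define Min(a, b) ((a) < (b) ? (a) : (b))

/*
 * Split maintenance_work_mem across however many mwm-using indexes can
 * actually be vacuumed at once, i.e. Min(nindexes_mwm, nworkers).
 * If no index uses maintenance_work_mem, each worker keeps the full
 * budget, as in the original snippet.
 */
static int
maintenance_work_mem_worker(int maintenance_work_mem,
                            int nindexes_mwm, int nworkers)
{
    if (nindexes_mwm <= 0)
        return maintenance_work_mem;
    return maintenance_work_mem / Min(nindexes_mwm, nworkers);
}
```

This captures both directions of the discussion: 10 mwm-using indexes but only 2 workers divides by 2 (Dilip's case), and 2 such indexes with 10 workers also divides by 2 (Sawada-san's case).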
Re: [HACKERS] Block level parallel vacuum
On Tue, 12 Nov 2019 at 20:11, Amit Kapila wrote: > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada > wrote: > > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar wrote: > > > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila > > > wrote: > > > > > > > > Yeah, maybe something like amparallelvacuumoptions. The options can be: > > > > > > > > VACUUM_OPTION_NO_PARALLEL 0 # vacuum (neither bulkdelete nor > > > > vacuumcleanup) can't be performed in parallel > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP 1 # vacuumcleanup cannot be > > > > performed in parallel (hash index will set this flag) > > > > > > Maybe we don't want this option? because if 3 or 4 is not set then we > > > will not do the cleanup in parallel right? > > > > > Yeah, but it is better to be explicit about this. VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing? I think brin indexes will use this flag. It will end up with (VACUUM_OPTION_NO_PARALLEL_CLEANUP | VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to VACUUM_OPTION_NO_PARALLEL, though. > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 2 # bulkdelete can be done in > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > > > flag) > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 3 # vacuumcleanup can be done in > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash, > > > > gin, gist, spgist, bloom will set this flag) > > > > VACUUM_OPTION_PARALLEL_CLEANUP 4 # vacuumcleanup can be done in > > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > > and bloom will set this flag) > > > > > > > > Does something like this make sense? > > > > 3 and 4 confused me because 4 also looks conditional. How about having > > two flags instead: one for doing parallel cleanup when not performed > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)? > > > > Hmm, this is exactly what I intend to say with 3 and 4. 
I am not sure > what makes you think 4 is conditional. Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets 4 it doesn't need to set 3 because 4 means always doing cleanup in parallel. > > > That way, we > > can have flags as follows and index AM chooses two flags, one from the > > first two flags for bulk deletion and another from next three flags > > for cleanup. > > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0 > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2 > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3 > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4 > > > > This also looks reasonable, but if there is an index that doesn't want > to support a parallel vacuum, it needs to set multiple flags. Right. It would be better to use uint16 as two uint8. I mean that if first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags could be followings: VACUUM_OPTION_PARALLEL_BULKDEL 0x0001 VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100 VACUUM_OPTION_PARALLEL_CLEANUP 0x0200 -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: [BUG FIX] Uninitialized var fargtypes used.
Hi, Sorry for the error in the patch. --- \dll\postgresql-12.0\a\backend\commands\event_trigger.c Mon Sep 30 17:06:55 2019 +++ event_trigger.c Tue Nov 12 08:34:30 2019 @@ -171,7 +171,7 @@ HeapTuple tuple; Oid funcoid; Oid funcrettype; - Oid fargtypes[1]; /* dummy */ + Oid fargtypes[1] = {InvalidOid}; /* dummy */ Oid evtowner = GetUserId(); ListCell *lc; List *tags = NULL; From: Michael Paquier Sent: Tuesday, November 12, 2019 03:31 To: Ranier Vilela Cc: pgsql-hackers@lists.postgresql.org Subject: Re: [BUG FIX] Uninitialized var fargtypes used. On Mon, Nov 11, 2019 at 06:28:47PM +, Ranier Vilela wrote: > Can anyone check this bug fix? > > +++ event_trigger.c Mon Nov 11 13:52:35 2019 > @@ -171,7 +171,7 @@ > HeapTuple tuple; > Oid funcoid; > Oid funcrettype; > - Oid fargtypes[1]; /* dummy */ > + Oid fargtypes[1] = {InvalidOid, InvalidOid}; > /* dummy */ > Oid evtowner = GetUserId(); Yeah, it would be better to fix this initialization. -- Michael event_trigger.c.patch Description: event_trigger.c.patch
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Nov 12, 2019 at 4:12 PM Alexey Kondratov wrote: > > On 04.11.2019 13:05, Kuntal Ghosh wrote: > > On Mon, Nov 4, 2019 at 3:32 PM Dilip Kumar wrote: > >> So your result shows that with "streaming on", performance is > >> degrading? By any chance did you try to see where is the bottleneck? > >> > > Right. But, as we increase the logical_decoding_work_mem, the > > performance improves. I've not analyzed the bottleneck yet. I'm > > looking into the same. > > My guess is that 64 kB is just too small a value. In the table schema used > for tests every row takes at least 24 bytes for storing column values. > Thus, with this logical_decoding_work_mem value the limit should be hit > after about 2500+ rows, or about 400 times during a transaction of 1M > rows. > > It is just too frequent, while ReorderBufferStreamTXN includes a whole > bunch of logic, e.g. it always starts internal transaction: > > /* > * Decoding needs access to syscaches et al., which in turn use > * heavyweight locks and such. Thus we need to have enough state around to > * keep track of those. The easiest way is to simply use a transaction > * internally. That also allows us to easily enforce that nothing writes > * to the database by checking for xid assignments. ... > */ > > Also it issues separate stream_start/stop messages around each streamed > transaction chunk. So if streaming starts and stops too frequently it > adds additional overhead and may even interfere with the current in-progress > transaction. > Yeah, I've also found the same. With stream_start/stop messages, it writes 1 byte of checksum and 4 bytes for the number of sub-transactions, which increases the write amplification significantly. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
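Alexey's estimate checks out on a back-of-envelope basis. A standalone sketch with the numbers from the message (24 bytes per decoded row, 64 kB of logical_decoding_work_mem):

```c
#include <assert.h>

/* Numbers quoted in the thread; illustrative, not measured. */
static const int work_mem_bytes = 64 * 1024; /* logical_decoding_work_mem */
static const int bytes_per_row  = 24;        /* minimum per-row footprint */

/* Rows decoded before the memory limit trips once. */
static int
rows_per_spill(void)
{
    return work_mem_bytes / bytes_per_row;
}

/* How many times the limit trips over a whole transaction. */
static int
spills_per_txn(int txn_rows)
{
    return txn_rows / rows_per_spill();
}
```

So the limit trips after roughly 2,700 rows, and a 1M-row transaction would start and stop streaming several hundred times, which is consistent with the "about 400 times" figure and with the overhead of ReorderBufferStreamTXN starting an internal transaction each time.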
Re: [HACKERS] Block level parallel vacuum
On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada wrote: > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar wrote: > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada > > wrote: > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > updated version patch that also incorporated some comments I got so > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > test the total delay time. > > > > > While reviewing the 0002, I got one doubt related to how we are > > dividing the maintenance_work_mem > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) > > +{ > > + /* Compute the new maintenance_work_mem value for index vacuuming */ > > + lvshared->maintenance_work_mem_worker = > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : > > maintenance_work_mem; > > +} > > Is it fair to just consider the number of indexes which use > > maintenance_work_mem? Or we need to consider the number of workers as > > well. My point is suppose there are 10 indexes which will use the > > maintenance_work_mem but we are launching just 2 workers then what is > > the point in dividing the maintenance_work_mem by 10. > > > > IMHO the calculation should be like this > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? > > maintenance_work_mem / Min(nindexes_mwm, nworkers) : > > maintenance_work_mem; > > > > Am I missing something? > > No, I think you're right. On the other hand I think that dividing it > by the number of indexes that will use the maintenance_work_mem makes > sense when parallel degree > the number of such indexes. Suppose the > table has 2 indexes and there are 10 workers then we should divide the > maintenance_work_mem by 2 rather than 10 because it's possible that at > most 2 indexes that use the maintenance_work_mem are processed in > parallel at a time. > Right, that's the reason I suggested dividing by Min(nindexes_mwm, nworkers). 
-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: cost based vacuum (parallel)
On Tue, 12 Nov 2019 at 19:08, Amit Kapila wrote: > > On Tue, Nov 12, 2019 at 3:03 PM Dilip Kumar wrote: > > > > On Tue, Nov 12, 2019 at 10:47 AM Dilip Kumar wrote: > > > > > > On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila > > > wrote: > > > > > > > > On Mon, Nov 11, 2019 at 12:59 PM Dilip Kumar > > > > wrote: > > > > > > > > > > On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar > > > > > wrote: > > > > > > > > > > > > On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Yeah, I think it is difficult to get the exact balance, but we > > > > > > > can try > > > > > > > to be as close as possible. We can try to play with the > > > > > > > threshold and > > > > > > > another possibility is to try to sleep in proportion to the > > > > > > > amount of > > > > > > > I/O done by the worker. > > > > > > I have done another experiment where I have done another 2 changes > > > > > > on > > > > > > top op patch3 > > > > > > a) Only reduce the local balance from the total shared balance > > > > > > whenever it's applying delay > > > > > > b) Compute the delay based on the local balance. > > > > > > > > > > > > patch4: > > > > > > worker 0 delay=84.13 total I/O=17931 hit=17891 miss=0 dirty=2 > > > > > > worker 1 delay=89.23 total I/O=17931 hit=17891 miss=0 dirty=2 > > > > > > worker 2 delay=88.68 total I/O=17931 hit=17891 miss=0 dirty=2 > > > > > > worker 3 delay=80.79 total I/O=16378 hit=4318 miss=0 dirty=603 > > > > > > > > > > > > I think with this approach the delay is divided among the worker > > > > > > quite > > > > > > well compared to other approaches > > > > > > > > > > > > > > > > > .. > > > > > I have tested the same with some other workload(test file attached). > > > > > I can see the same behaviour with this workload as well that with the > > > > > patch 4 the distribution of the delay is better compared to other > > > > > patches i.e. 
workers with more I/O have more delay and those with equal I/O > > > > > have almost equal delay. Only thing is that the total delay with > > > > > the patch 4 is slightly less compared to other patches. > > > > > > > > > > > > > I see one problem with the formula you have used in the patch, maybe > > > > that is causing the value of total delay to go down. > > > > > > > > - if (new_balance >= VacuumCostLimit) > > > > + VacuumCostBalanceLocal += VacuumCostBalance; > > > > + if ((new_balance >= VacuumCostLimit) && > > > > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker))) > > > > > > > > As per discussion, the second part of the condition should be > > > > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker". I think > > > > you can once change this and try again. Also, please try with the > > > > different values of threshold (0.3, 0.5, 0.7, etc.). > > > > > > > I have modified the patch4 and ran with different values. But, I > > > don't see much difference in the values with the patch4. In fact, I > > > removed the condition for the local balancing check completely and still > > > the delays are the same; I think this is because with patch4 workers > > > are only reducing their own balance and also delaying as much as their > > > local balance. So maybe the second condition will not have much > > > impact. > > > > > Yeah, but I suspect the condition (when the local balance exceeds a > certain threshold, then only try to perform delay) you mentioned can > have an impact in some other scenarios. So, it is better to retain > the same. I feel the overall results look sane and the approach seems > reasonable to me. > > > > > > I have revised the patch4 so that it doesn't depend upon a fixed > > number of workers, instead I have dynamically updated the worker > > count. > > > > Thanks. Sawada-San, by any chance, can you try some of the tests done > by Dilip or some similar tests just to rule out any sort of > machine-specific dependency? Sure. I'll try it tomorrow. 
-- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
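The patch4 behaviour discussed in this thread can be sketched roughly as follows. This is a reading of the proposal, not the actual patch: the constants are illustrative stand-ins for the vacuum_cost_limit / vacuum_cost_delay GUCs, and the 0.5 threshold and 4x delay cap mirror the values mentioned in the discussion:

```c
#include <assert.h>

/* Illustrative stand-ins for the vacuum_cost_* GUCs. */
#define VACUUM_COST_LIMIT    200
#define VACUUM_COST_DELAY_MS 20.0

/*
 * A worker sleeps only when the shared balance exceeds the limit AND
 * its own local balance exceeds half its per-worker share of the limit
 * (the corrected "0.5 * VacuumCostLimit / nworker" condition).
 */
static int
should_delay(int shared_balance, int local_balance, int nworkers)
{
    return shared_balance >= VACUUM_COST_LIMIT &&
           local_balance > 0.5 * VACUUM_COST_LIMIT / nworkers;
}

/*
 * Sleep time proportional to the worker's own I/O (its local balance),
 * capped at 4x the base delay.
 */
static double
delay_ms(int local_balance)
{
    double msec = VACUUM_COST_DELAY_MS * local_balance / VACUUM_COST_LIMIT;
    double cap = VACUUM_COST_DELAY_MS * 4;
    return msec > cap ? cap : msec;
}
```

Because each worker sleeps in proportion to its local balance and subtracts only its own contribution from the shared balance, workers that do more I/O accumulate proportionally more delay, which matches the per-worker delay numbers Dilip reported for patch4.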
Re: Option to dump foreign data in pg_dump
Hello, attached is a new version of the patch with the tests from Daniel (thanks!) and the nitpicks addressed. I don't feel good about this feature. pg_dump should not dump any data that are not part of the database being dumped. If you restore such a dump, the data will be inserted into the foreign table, right? Unless someone emptied the remote table first, this will add duplicated data to that table. I think that is an unpleasant surprise. I'd expect that if I drop a database and restore it from a dump, it should be as it was before. This change would break that assumption. What are the use cases of a dump with foreign table data? Unless I misunderstood something there, -1. This feature is opt-in so if the user makes dumps of a remote server explicitly by other means, then the user would not need to use this option. But, not all foreign tables are necessarily in a remote server like the ones referenced by the postgres_fdw. In FDWs like swarm64da, cstore, citus or timescaledb, the foreign tables are part of your database, and one could expect that a dump of the database includes data from these FDWs. Cheers Luis M Carril diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index 4bcd4bdaef..319851b936 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -767,6 +767,34 @@ PostgreSQL documentation + + --include-foreign-data=foreignserver + + +Dump the data for any foreign table with a foreign server +matching foreignserver +pattern. Multiple foreign servers can be selected by writing multiple --include-foreign-data. +Also, the foreignserver parameter is +interpreted as a pattern according to the same rules used by +psql's \d commands (see ), +so multiple foreign servers can also be selected by writing wildcard characters +in the pattern. When using wildcards, be careful to quote the pattern +if needed to prevent the shell from expanding the wildcards; see +. +The only exception is that an empty pattern is disallowed. 
+
+
+
+
+        When --include-foreign-data is specified,
+        pg_dump does not check if the foreign table is
+        also writeable. Therefore, there is no guarantee that the results of a
+        foreign table dump can be successfully restored by themselves into a
+        clean database.
+
+
+
+
+
       --inserts

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index bf69adc2f4..31465dec79 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -120,6 +120,8 @@ static SimpleStringList table_exclude_patterns = {NULL, NULL};
 static SimpleOidList table_exclude_oids = {NULL, NULL};
 static SimpleStringList tabledata_exclude_patterns = {NULL, NULL};
 static SimpleOidList tabledata_exclude_oids = {NULL, NULL};
+static SimpleStringList foreign_servers_include_patterns = {NULL, NULL};
+static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
 
 char	   g_opaque_type[10];	/* name for the opaque type */
 
@@ -156,6 +158,9 @@ static void expand_schema_name_patterns(Archive *fout,
 										SimpleStringList *patterns,
 										SimpleOidList *oids,
 										bool strict_names);
+static void expand_foreign_server_name_patterns(Archive *fout,
+												SimpleStringList *patterns,
+												SimpleOidList *oids);
 static void expand_table_name_patterns(Archive *fout,
 									   SimpleStringList *patterns,
 									   SimpleOidList *oids,
@@ -388,6 +393,7 @@ main(int argc, char **argv)
 		{"no-sync", no_argument, NULL, 7},
 		{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
+		{"include-foreign-data", required_argument, NULL, 11},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -604,6 +610,15 @@ main(int argc, char **argv)
 				dopt.dump_inserts = (int) rowsPerInsert;
 				break;
 
+			case 11:			/* include foreign data */
+				if (strcmp(optarg, "") == 0)
+				{
+					pg_log_error("empty string is not a valid pattern in --include-foreign-data");
+					exit_nicely(1);
+				}
+				simple_string_list_append(&foreign_servers_include_patterns, optarg);
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						progname);
 				exit_nicely(1);
@@ -645,6 +660,12 @@ main(int argc, char **argv)
 		exit_nicely(1);
 	}
 
+	if (dopt.schemaOnly && foreign_servers_include_patterns.head != NULL)
+	{
+		pg_log_error("options -s/--schema-only and --include-foreign-data cannot be used together");
+		exit_nicely(1);
+	}
+
 	if (dopt.dataOnly && dopt.outputClean)
 	{
 		pg_log_error("options -c/--clean and -a/--data-only cannot be used together");
@@ -812,6 +833,9 @@ main(int argc, char **argv)
 									   &tabledata_exclude_oids,
 									   false);
 
+	expand_foreign_server_name_patterns(fout, &foreign_servers_include_patterns,
+										&foreign_server
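[Editor's note] The new option collects its arguments with pg_dump's SimpleStringList and later tests `head != NULL` to see whether any pattern was given. A minimal standalone reimplementation of that idiom may help readers outside the pg_dump tree; the real version lives in src/fe_utils/simple_list.c and carries a few more fields, so this sketch keeps only the parts the patch relies on:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Minimal reimplementation of the SimpleStringList idiom used by
 * --include-foreign-data above: a singly linked list of strings with
 * head/tail pointers, where "head != NULL" means "at least one pattern
 * was supplied".  Illustrative only -- not pg_dump's actual code.
 */
typedef struct SimpleStringListCell
{
    struct SimpleStringListCell *next;
    char        val[];          /* pattern text, flexible array member */
} SimpleStringListCell;

typedef struct SimpleStringList
{
    SimpleStringListCell *head;
    SimpleStringListCell *tail;
} SimpleStringList;

static void
simple_string_list_append(SimpleStringList *list, const char *val)
{
    SimpleStringListCell *cell;

    /* allocate the cell and the string in one block */
    cell = malloc(offsetof(SimpleStringListCell, val) + strlen(val) + 1);
    cell->next = NULL;
    strcpy(cell->val, val);

    if (list->tail)
        list->tail->next = cell;
    else
        list->head = cell;
    list->tail = cell;
}
```

Each `--include-foreign-data=PATTERN` on the command line appends one cell, and the `-s/--schema-only` conflict check above is just an emptiness test on the list head.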
Re: [HACKERS] Block level parallel vacuum
On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada wrote:
>
> On Tue, 12 Nov 2019 at 18:26, Dilip Kumar wrote:
> >
> > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila wrote:
> > >
> > > Yeah, maybe something like amparallelvacuumoptions. The options can be:
> > >
> > > VACUUM_OPTION_NO_PARALLEL  0 # vacuum (neither bulkdelete nor
> > > vacuumcleanup) can't be performed in parallel
> > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > performed in parallel (hash index will set this flag)
> >
> > Maybe we don't want this option? because if 3 or 4 is not set then we
> > will not do the cleanup in parallel right?
>
> Yeah, but it is better to be explicit about this.
>
> > > VACUUM_OPTION_PARALLEL_BULKDEL  2 # bulkdelete can be done in
> > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > flag)
> > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > gin, gist, spgist, bloom will set this flag)
> > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > and bloom will set this flag)
> > >
> > > Does something like this make sense?
>
> 3 and 4 confused me because 4 also looks conditional. How about having
> two flags instead: one for doing parallel cleanup when not performed
> yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?

Hmm, this is exactly what I intend to say with 3 and 4. I am not sure
what makes you think 4 is conditional.

> That way, we
> can have flags as follows and index AM chooses two flags, one from the
> first two flags for bulk deletion and another from next three flags
> for cleanup.
>
> VACUUM_OPTION_PARALLEL_NO_BULKDEL    1 << 0
> VACUUM_OPTION_PARALLEL_BULKDEL       1 << 1
> VACUUM_OPTION_PARALLEL_NO_CLEANUP    1 << 2
> VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 3
> VACUUM_OPTION_PARALLEL_CLEANUP       1 << 4
>

This also looks reasonable, but if there is an index that doesn't want
to support a parallel vacuum, it needs to set multiple flags.

> > Yeah, something like that seems better to me.
>
> > > If we all agree on this, then I
> > > think we can summarize the part of the discussion related to this API
> > > and get feedback from a broader audience.
> >
> > Make sense.
>
> +1
>

Okay, then I will write a separate email for this topic.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
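[Editor's note] The bitmask variant quoted above can be sketched in a few lines of C. The flag names follow the proposal in this thread; the values, helper functions, and per-AM choices below are illustrative only, not committed PostgreSQL API:

```c
#include <stdint.h>

/* Flag scheme from the quoted proposal: an index AM picks one flag from
 * the bulk-deletion group (bits 0-1) and one from the cleanup group
 * (bits 2-4).  Values are from the thread; everything else is a sketch. */
#define VACUUM_OPTION_PARALLEL_NO_BULKDEL    (1 << 0)
#define VACUUM_OPTION_PARALLEL_BULKDEL       (1 << 1)
#define VACUUM_OPTION_PARALLEL_NO_CLEANUP    (1 << 2)
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP  (1 << 3)
#define VACUUM_OPTION_PARALLEL_CLEANUP       (1 << 4)

/* A btree-like AM: parallel bulkdelete, and parallel cleanup only when
 * bulkdelete has not already been performed. */
static uint8_t
btree_like_vacuum_options(void)
{
    return VACUUM_OPTION_PARALLEL_BULKDEL |
           VACUUM_OPTION_PARALLEL_COND_CLEANUP;
}

/* An AM opting out of parallel vacuum entirely must set one flag from
 * each group -- the "set multiple flags" cost noted in the reply above. */
static uint8_t
no_parallel_vacuum_options(void)
{
    return VACUUM_OPTION_PARALLEL_NO_BULKDEL |
           VACUUM_OPTION_PARALLEL_NO_CLEANUP;
}
```

The two helpers make the trade-off concrete: the bitmask is more expressive than the 0-4 enumeration, at the price of two flags instead of one for the "no parallel vacuum" case.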
Re: Attempt to consolidate reading of XLOG page
Michael Paquier wrote:

> On Mon, Nov 11, 2019 at 04:25:56PM +0100, Antonin Houska wrote:
> >> On Fri, Oct 04, 2019 at 12:11:11PM +0200, Antonin Houska wrote:
> >> Your patch removes all the three optional lseek() calls which can
> >> happen in a segment. Am I missing something but isn't that plain
> >> wrong? You could reuse the error context for that as well if an error
> >> happens as what's needed is basically the segment name and the LSN
> >> offset.
> >
> > Explicit call of lseek() is not used because XLogRead() uses pg_pread()
> > now. Nevertheless I found out that in the last version of the patch I
> > set ws_off to 0 for a newly opened segment. This was wrong, fixed now.
>
> Missed that part, thanks. This was actually not obvious after an
> initial lookup of the patch. Wouldn't it make sense to split that
> part in a separate patch that we could review and get committed first
> then? It would have the advantage to make the rest easier to review
> and follow. And using pread is actually better for performance
> compared to read+lseek. Now there is also the argument that we don't
> always seek into an opened WAL segment, and that a plain read() is
> actually better than pread() in some cases.

ok, the next version uses explicit lseek(). Maybe the fact that XLOG is
mostly read sequentially (i.e. without frequent seeks) is the reason
pread() hasn't been adopted so far.

The new version reflects your other suggestions too, except the one about
not renaming "XLOG" -> "WAL" (actually you mentioned that earlier in the
thread). I recall that when working on the preliminary patch (709d003fbd),
Alvaro suggested "WAL" for some structures because these are new. The rule
seemed to be that "XLOG..." should be left for the existing symbols, while
the new ones should be "WAL...":

https://www.postgresql.org/message-id/20190917221521.GA15733%40alvherre.pgsql

So I decided to rename the new symbols and to remove the related comment.
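[Editor's note] The seek-only-when-needed pattern the new version returns to can be shown standalone. The helper name and test harness below are ours, not part of the patch; the point is that a tracked file position lets sequential WAL reads skip the lseek() entirely:

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Illustration of the read pattern discussed above: call lseek() only when
 * the tracked file position does not match the requested offset.  Since
 * WAL is mostly read sequentially, the seek is skipped most of the time,
 * which is why explicit lseek() is not obviously worse than pread() here.
 * 'read_at' is an illustrative helper, not patch code.
 */
static ssize_t
read_at(int fd, char *buf, size_t count, off_t offset, off_t *curr_off)
{
    ssize_t     n;

    if (*curr_off != offset)
    {
        /* Position differs: one explicit seek, then remember where we are */
        if (lseek(fd, offset, SEEK_SET) < 0)
            return -1;
        *curr_off = offset;
    }

    n = read(fd, buf, count);
    if (n > 0)
        *curr_off += n;         /* sequential follow-up reads match again */
    return n;
}
```

pread() folds the seek into the read syscall, which wins when reads are random (as in walsender serving multiple historic positions), while a cached-offset read() costs the same one syscall in the sequential case.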
-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

>From 34c0ae891de7dae3b84de33795c5d0521ccc0a88 Mon Sep 17 00:00:00 2001
From: Antonin Houska
Date: Tue, 12 Nov 2019 11:51:14 +0100
Subject: [PATCH 1/2] Use only xlogreader.c:XLogRead()

The implementations in xlogutils.c and walsender.c are just renamed now,
to be removed by the following diff.
---
 src/backend/access/transam/xlogreader.c | 121
 src/backend/access/transam/xlogutils.c  |  94 ++--
 src/backend/replication/walsender.c     | 143 +++-
 src/bin/pg_waldump/pg_waldump.c         |  63 ++-
 src/include/access/xlogreader.h         |  37 ++
 src/include/access/xlogutils.h          |   1 +
 6 files changed, 441 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 7f24f0cb95..006c6298c9 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -17,6 +17,8 @@
  */
 #include "postgres.h"
 
+#include <unistd.h>
+
 #include "access/transam.h"
 #include "access/xlog_internal.h"
 #include "access/xlogreader.h"
@@ -27,6 +29,7 @@
 
 #ifndef FRONTEND
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #endif
 
@@ -1015,6 +1018,124 @@ out:
 
 #endif							/* FRONTEND */
 
+/*
+ * Read 'count' bytes from WAL fetched from timeline 'tli' into 'buf',
+ * starting at location 'startptr'. 'seg' is the last segment used,
+ * 'openSegment' is a callback to open the next segment and 'segcxt' is
+ * additional segment info that does not fit into 'seg'.
+ *
+ * 'errinfo' should point to XLogReadError structure which will receive error
+ * details in case the read fails.
+ *
+ * Returns true if succeeded, false if failed.
+ *
+ * XXX probably this should be improved to suck data directly from the
+ * WAL buffers when possible.
+ */
+bool
+XLogRead(char *buf, XLogRecPtr startptr, Size count, TimeLineID tli,
+		 WALOpenSegment *seg, WALSegmentContext *segcxt,
+		 WALSegmentOpen openSegment, WALReadError *errinfo)
+{
+	char	   *p;
+	XLogRecPtr	recptr;
+	Size		nbytes;
+
+	p = buf;
+	recptr = startptr;
+	nbytes = count;
+
+	while (nbytes > 0)
+	{
+		uint32		startoff;
+		int			segbytes;
+		int			readbytes;
+
+		startoff = XLogSegmentOffset(recptr, segcxt->ws_segsize);
+
+		if (seg->ws_file < 0 ||
+			!XLByteInSeg(recptr, seg->ws_segno, segcxt->ws_segsize) ||
+			tli != seg->ws_tli)
+		{
+			XLogSegNo	nextSegNo;
+
+			/* Switch to another logfile segment */
+			if (seg->ws_file >= 0)
+				close(seg->ws_file);
+
+			XLByteToSeg(recptr, nextSegNo, segcxt->ws_segsize);
+
+			/* Open the next segment in the caller's way. */
+			openSegment(nextSegNo, segcxt, &tli, &seg->ws_file);
+
+			/* Update the current segment info. */
+			seg->ws_tli = tli;
+			seg->ws_segno = nextSegNo;
+			seg->ws_off = 0;
+		}
+
+		/* Need to seek in the file? */
+		if (seg->ws_off != startoff)
+		{
+			/*
+			 * Update ws_off unconditionally, it will be useful fo