Re: [HACKERS] Measuring replay lag
On 1 March 2017 at 10:47, Thomas Munro wrote:

>>> I added a fourth case 'overwhelm.png' which you might find interesting. It's essentially like one 'burst' followed by a 100% idle primary. The primary stops sending new WAL around 50 seconds in and then there is no autovacuum, nothing happening at all. The standby is still replaying its backlog of WAL, but is sending back replies only every 10 seconds (because no WAL is arriving, so no other reason to send replies except the status message timeout, which could be lowered). So we see some big steps, and then we finally see it flat-line around 60 seconds because there is still no new WAL, so we keep showing the last measured lag. If new WAL is flushed it will pop back to 0ish, but until then its last known measurement is ~14 seconds, which I don't think is technically wrong.
>>
>> If I understand what you're saying, "14 secs" would not be seen as the correct answer by our users when the delay is now zero.
>>
>> Solving that is where the keepalives need to come into play. If no new WAL, send a keepalive and track the lag on that.
>
> Hmm. Currently it works strictly with measurements of real WAL write, flush and apply times. I rather like the simplicity of that definition of the lag numbers, and the fact that they move only as a result of genuinely measured WAL activity. A keepalive message is never written, flushed or applied, so if we had special cases here to show constant 0 or measure keepalive round-trip time when we hit the end of known WAL or something like that, the reported lag times for those three operations wouldn't be true. In any real database cluster there is real WAL being generated all the time, so after a big backlog is finally processed by a standby the "14 secs" won't linger for very long, and during the time when you see that, it really is the last true measured lag time.
> I do see why a new user trying this feature for the first time might expect it to show a lag of 0 just as soon as sent LSN = write/flush/apply LSN or something like that, but after some reflection I suspect that it isn't useful information, and it would be smoke and mirrors rather than real data.

Perhaps I am misunderstanding the way it works. If the last time WAL was generated the lag was 14 secs, then nothing occurs for 2 hours afterwards AND all changes have been successfully applied, then it should not continue to show 14 secs for the next 2 hours.

IMHO the lag time should drop to zero in a reasonable time and stay at zero for those 2 hours, because there is no current lag.

If we want to show historical lag data, I'm supportive of the idea, but we must report an accurate current value both when the system is busy and when the system is quiet.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
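For readers following along: the lag numbers under discussion were ultimately exposed as interval columns on pg_stat_replication, so the behaviour described above (a lingering last-measured value during idle periods) can be observed per standby with a query like this sketch:

```sql
-- One row per connected standby; the three lag columns correspond to the
-- write, flush and apply measurements discussed in this thread.
SELECT application_name,
       write_lag,    -- elapsed time until the standby wrote the WAL
       flush_lag,    -- elapsed time until the standby flushed it
       replay_lag    -- elapsed time until the standby applied it
FROM pg_stat_replication;
```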
Re: [HACKERS] Allow pg_dumpall to work without pg_authid
On 5 March 2017 at 18:00, Simon Riggs wrote:

> I'm looking to commit the patch version I posted, so I would like your comments that it does continue to solve the problems you raised.

Thanks Simon, for confirming. Yes, the updated patch does solve the problem.

- robins
Re: [HACKERS] Measuring replay lag
On 1 March 2017 at 10:47, Thomas Munro wrote:

> On Fri, Feb 24, 2017 at 9:05 AM, Simon Riggs wrote:
>> On 21 February 2017 at 21:38, Thomas Munro wrote:
>>> However, I think a call like LagTrackerWrite(SendRqstPtr, GetCurrentTimestamp()) needs to go into XLogSendLogical, to mirror what happens in XLogSendPhysical. I'm not sure about that.
>>
>> Me neither, but I think we need this for both physical and logical.
>>
>> Same use-case graphs for both, I think. There might be issues with the way LSNs work for logical.
>
> This seems to be problematic. Logical peers report LSN changes for all three operations (write, flush, commit) only on commit. I suppose that might work OK for synchronous replication, but it makes it a bit difficult to get lag measurements that don't look really strange and sawtoothy when you have long transactions, and overlapping transactions might interfere with the measurements in odd ways. I wonder if the way LSNs are reported by logical rep would need to be changed first. I need to study this some more and would be grateful for ideas from any of the logical rep people.

I have no doubt there are problems with the nature of logical replication that affect this. Those things are not the problem of this patch, but that doesn't push everything away. What we want from this patch is something that works for both, as much as that is possible.

With that in mind, this patch should be able to provide sensible lag measurements for a simple case like logical replication of a standard pgbench run. If that highlights problems with this patch then we can fix them here.

Thanks

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] dropping partitioned tables without CASCADE
On 27 February 2017 at 02:38, Amit Langote wrote:

> On 2017/02/26 5:30, Simon Riggs wrote:
>> On 23 February 2017 at 16:33, Simon Riggs wrote:
>>> I'll be happy to review
>>
>> Patch looks OK so far, but fails on a partition that has partitions, probably because of the way we test relkind in the call to StoreCatalogInheritance1().
>
> Thanks for the review.
>
> I could not reproduce the failure you are seeing; could you perhaps share the failing test case?

I used a slight modification of the case mentioned in the docs. I confirm this fails repeatably for me on current HEAD.

CREATE TABLE cities (
    city_id    bigserial not null,
    name       text not null,
    population bigint
) PARTITION BY LIST (left(lower(name), 1));

CREATE TABLE cities_ab PARTITION OF cities (
    CONSTRAINT city_id_nonzero CHECK (city_id != 0)
) FOR VALUES IN ('a', 'b') PARTITION BY RANGE (population);

drop table cities;
ERROR:  cannot drop table cities because other objects depend on it
DETAIL:  table cities_ab depends on table cities
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

I notice also that \d+ does not show which partitions have subpartitions. I'm worried that these things illustrate something about the catalog representation that we may need to improve, but I don't have anything concrete to say on that at present.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
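Until \d+ learns to show this, the information is recoverable from the catalogs directly; a sketch (relying on partitioned tables having relkind 'p' and partitions being recorded in pg_inherits, as in the current patch set):

```sql
-- List the partitions of 'cities' and flag which ones are themselves
-- partitioned (i.e. have subpartitions).
SELECT c.relname            AS partition,
       c.relkind = 'p'      AS has_subpartitions
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'cities'::regclass;
```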
Re: [HACKERS] Block level parallel vacuum WIP
On Sun, Mar 5, 2017 at 12:14 PM, David Steele wrote:

> On 3/4/17 9:08 PM, Masahiko Sawada wrote:
>> On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas wrote:
>>> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada wrote:
>>>> Yes, it's taking time to update the logic and measurement but it's coming along. Also I'm working on changing deadlock detection. Will post a new patch and measurement results.
>>>
>>> I think that we should push this patch out to v11. I think there are too many issues here to address in the limited time we have remaining this cycle, and I believe that if we try to get them all solved in the next few weeks we're likely to end up getting backed into some choices by time pressure that we may later regret bitterly. Since I created the deadlock issues that this patch is facing, I'm willing to try to help solve them, but I think it's going to require considerable and delicate surgery, and I don't think doing that under time pressure is a good idea.
>>>
>>> From a fairness point of view, a patch that's not in reviewable shape on March 1st should really be pushed out, and we're several days past that.
>>
>> Agreed. There is surely some room to discuss the design yet, and it will take a long time. It's good to push this out to CF2017-07. Thank you for the comment.
>
> I have marked this patch "Returned with Feedback." Of course you are welcome to submit this patch to the 2017-07 CF, or whenever you feel it is ready.

Thank you!

Regards,

-- 
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Allow pg_dumpall to work without pg_authid
On 28 February 2017 at 17:49, Simon Riggs wrote:

> I've edited the stated reason for the patch on the CF app, so it's clearer as to why this might be acceptable.

Robins, I'm looking to commit the patch version I posted, so I would like your comments that it does continue to solve the problems you raised.

Thanks

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Block level parallel vacuum WIP
On 3/4/17 9:08 PM, Masahiko Sawada wrote:

> On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas wrote:
>> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada wrote:
>>> Yes, it's taking time to update the logic and measurement but it's coming along. Also I'm working on changing deadlock detection. Will post a new patch and measurement results.
>>
>> I think that we should push this patch out to v11. I think there are too many issues here to address in the limited time we have remaining this cycle, and I believe that if we try to get them all solved in the next few weeks we're likely to end up getting backed into some choices by time pressure that we may later regret bitterly. Since I created the deadlock issues that this patch is facing, I'm willing to try to help solve them, but I think it's going to require considerable and delicate surgery, and I don't think doing that under time pressure is a good idea.
>>
>> From a fairness point of view, a patch that's not in reviewable shape on March 1st should really be pushed out, and we're several days past that.
>
> Agreed. There is surely some room to discuss the design yet, and it will take a long time. It's good to push this out to CF2017-07. Thank you for the comment.

I have marked this patch "Returned with Feedback." Of course you are welcome to submit this patch to the 2017-07 CF, or whenever you feel it is ready.

-- 
-David
da...@pgmasters.net
Re: [HACKERS] DROP SUBSCRIPTION and ROLLBACK
On Sat, Mar 4, 2017 at 1:46 PM, Peter Eisentraut wrote:

> On 3/3/17 13:58, Petr Jelinek wrote:
>> On 23/02/17 08:24, Masahiko Sawada wrote:
>>> Attached updated version patches. Please review these.
>>
>> This version looks good to me, I'd only change the
>>
>>> +    PreventTransactionChain(isTopLevel, "CREATE SUBSCRIPTION CREATE SLOT");
>>
>> to "CREATE SUBSCRIPTION ... CREATE SLOT" as that's afaik how we do it for other commands (and same with DROP).
>
> I have committed fixes for these issues. Thanks!
>
> I didn't like the syntax change in DROP SUBSCRIPTION, so I have just fixed the parsing of the existing syntax. We can discuss syntax changes separately.

Understood.

Regards,

-- 
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Block level parallel vacuum WIP
On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas wrote:

> On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada wrote:
>> Yes, it's taking time to update the logic and measurement but it's coming along. Also I'm working on changing deadlock detection. Will post a new patch and measurement results.
>
> I think that we should push this patch out to v11. I think there are too many issues here to address in the limited time we have remaining this cycle, and I believe that if we try to get them all solved in the next few weeks we're likely to end up getting backed into some choices by time pressure that we may later regret bitterly. Since I created the deadlock issues that this patch is facing, I'm willing to try to help solve them, but I think it's going to require considerable and delicate surgery, and I don't think doing that under time pressure is a good idea.
>
> From a fairness point of view, a patch that's not in reviewable shape on March 1st should really be pushed out, and we're several days past that.

Agreed. There is surely some room to discuss the design yet, and it will take a long time. It's good to push this out to CF2017-07. Thank you for the comment.

Regards,

-- 
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] mat views stats
On Wed, Mar 1, 2017 at 8:39 PM, Michael Paquier wrote:

> On Thu, Mar 2, 2017 at 7:20 AM, Jim Mlodgenski wrote:
>> On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas wrote:
>>> On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby wrote:
>>>> Certainly easier, but I don't think it'd be better. Matviews really aren't the same thing as tables. Off-hand (without reviewing the patch), update and delete counts certainly wouldn't make any sense. "Insert" counts might, in as much as it's how many rows have been added by refreshes. You'd want a refresh count too.
>>>
>>> Regular REFRESH truncates the view and repopulates it, but REFRESH CONCURRENTLY does inserts, updates, and deletes as needed to adjust the contents, so the counters that make sense for regular tables are also sensible here.
>>
>> After digging into things further, just making refresh report the stats for what it is basically doing simplifies and solves it, and it is something we can back-patch if that's the consensus. See the attached patch.
>
> This is unhappy:
>
> $ git diff master --check
> src/backend/commands/matview.c:155: indent with spaces.
> +    uint64 processed = 0;
>
> +    /*
> +     * Send the stats to mimic what we are essentially doing.
> +     * A truncate and insert
> +     */
> This sentence is unfinished.
>
> There is also no need to report the number of inserts if WITH NO DATA is used.
Here is the cleaned up patch:

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index a18c917..94a69dd 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -30,6 +30,7 @@
 #include "executor/spi.h"
 #include "miscadmin.h"
 #include "parser/parse_relation.h"
+#include "pgstat.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/lmgr.h"
 #include "storage/smgr.h"
@@ -59,7 +60,7 @@ static void transientrel_startup(DestReceiver *self, int operation, TupleDesc ty
 static bool transientrel_receive(TupleTableSlot *slot, DestReceiver *self);
 static void transientrel_shutdown(DestReceiver *self);
 static void transientrel_destroy(DestReceiver *self);
-static void refresh_matview_datafill(DestReceiver *dest, Query *query,
+static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query,
 						 const char *queryString);
 static char *make_temptable_name_n(char *tempname, int n);
@@ -151,6 +152,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	uint64		processed = 0;
 	ObjectAddress address;

 	/* Determine strength of lock needed. */
@@ -322,7 +324,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	/* Generate the data, if wanted. */
 	if (!stmt->skipData)
-		refresh_matview_datafill(dest, dataQuery, queryString);
+		processed = refresh_matview_datafill(dest, dataQuery, queryString);

 	heap_close(matviewRel, NoLock);
@@ -345,8 +347,18 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 		Assert(matview_maintenance_depth == old_depth);
 	}
 	else
+	{
 		refresh_by_heap_swap(matviewOid, OIDNewHeap, relpersistence);

+		/*
+		 * Send the stats to mimic what we are essentially doing.  Swapping
+		 * the heap is equivalent to truncating the relation and inserting
+		 * the new data.
+		 */
+		pgstat_count_truncate(matviewRel);
+		if (!stmt->skipData)
+			pgstat_count_heap_insert(matviewRel, processed);
+	}
+
 	/* Roll back any GUC changes */
 	AtEOXact_GUC(false, save_nestlevel);
@@ -361,7 +373,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 /*
  * refresh_matview_datafill
  */
-static void
+static uint64
 refresh_matview_datafill(DestReceiver *dest, Query *query,
 						 const char *queryString)
 {
@@ -369,6 +381,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	PlannedStmt *plan;
 	QueryDesc  *queryDesc;
 	Query	   *copied_query;
+	uint64		processed;

 	/* Lock and rewrite, using a copy to preserve the original query. */
 	copied_query = copyObject(query);
@@ -406,6 +419,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	/* run the plan */
 	ExecutorRun(queryDesc, ForwardScanDirection, 0L);

+	processed = queryDesc->estate->es_processed;
+
 	/* and clean up */
 	ExecutorFinish(queryDesc);
 	ExecutorEnd(queryDesc);
@@ -413,6 +428,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	FreeQueryDesc(queryDesc);
 	PopActiveSnapshot();
+
+	return processed;
 }

 DestReceiver *
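With a patch along these lines applied, the effect is observable from the cumulative statistics views; a hedged sketch of how one might check it (matview name is made up, and the relevant counters are the standard pg_stat_user_tables columns):

```sql
-- After a refresh, the matview's rows should be reported as if the
-- relation had been truncated and re-inserted.
CREATE MATERIALIZED VIEW mv AS SELECT g FROM generate_series(1, 1000) g;
REFRESH MATERIALIZED VIEW mv;

SELECT relname, n_tup_ins, n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'mv';
```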
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sat, Mar 4, 2017 at 6:00 AM, Stephen Frost wrote:

>> It is, but I was using that with index size, not table size. I can change it to be table size, based on what you said. But the workMem-related cap, which probably won't end up being applied all that often in practice, *should* still do something with projected index size, since that really is what we're sorting, which could be very different (e.g. with partial indexes).
>
> Isn't that always going to be very different, unless you're creating a single index across every column in the table..? Or perhaps I've misunderstood what you're comparing as being 'very different' in your last sentence.

I mean: though a primary key index or similar is smaller than the table by maybe 5X, they are still generally within an order of magnitude. Given that the number of workers is determined at logarithmic intervals, it may not actually matter that much whether the scaling is based on heap size (input size) or index size (output size), at a very high level. Despite a 5X difference. I'm referring to the initial determination of the number of workers to be used, based on the scan the parallel CREATE INDEX has to do.

So, I'm happy to go along with Robert's suggestion for V9, and have this number determined based on heap input size rather than index output size. It's good to be consistent with what we do for parallel seq scan (care about input size), and it probably won't change things by much anyway. This is generally the number that the cost model will end up going with, in practice.

However, we then need to consider that since maintenance_work_mem is doled out as maintenance_work_mem/nworkers slices for parallel CREATE INDEX, there is a sensitivity to how much memory is left per worker as workers are added. This clearly needs to be based on projected/estimated index size (output size), since that is what is being sorted, and because partial indexes imply that the size of the index could be *vastly* less than heap input size with still-sensible use of the feature. This will be applied as a cap on the first number.

So, I agree with Robert that we should actually use heap size for the main, initial determination of # of workers to use, but we still need to estimate the size of the final index [1], to let the cost model cap the initial determination when maintenance_work_mem is just too low. (This cap will rarely be applied in practice, as I said.)

[1] https://wiki.postgresql.org/wiki/Parallel_External_Sort#bt_estimated_nblocks.28.29_function_in_pageinspect

-- 
Peter Geoghegan
Re: [HACKERS] Re: check failure with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY
I wrote:
> Andrew Dunstan writes:
>> On 03/03/2017 02:24 PM, Andrew Dunstan wrote:
>>> I have been setting up a buildfarm member with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY, settings which Alvaro suggested to me. I got core dumps with these stack traces. The platform is Amazon Linux.
>>
>> I have replicated this on a couple of other platforms (Fedora, FreeBSD) and back to 9.5. The same failure doesn't happen with buildfarm runs on earlier branches, although possibly they don't have the same set of tests.
>
> Well, the problem in rebuild_relation() seems pretty blatant:

I fixed that, and the basic regression tests no longer crash outright with these settings, but I do see half a dozen errors that all seem to be in RLS-related tests. They all look like something is trying to access an already-closed relcache entry, much like the problem in rebuild_relation(). But I have no time to look closer for the next several days. Stephen, I think this is your turf anyway.

			regards, tom lane
Re: [HACKERS] Re: check failure with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY
Andrew Dunstan writes:
> On 03/03/2017 02:24 PM, Andrew Dunstan wrote:
>> I have been setting up a buildfarm member with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY, settings which Alvaro suggested to me. I got core dumps with these stack traces. The platform is Amazon Linux.
>
> I have replicated this on a couple of other platforms (Fedora, FreeBSD) and back to 9.5. The same failure doesn't happen with buildfarm runs on earlier branches, although possibly they don't have the same set of tests.

Well, the problem in rebuild_relation() seems pretty blatant:

    /* Close relcache entry, but keep lock until transaction commit */
    heap_close(OldHeap, NoLock);

    /* Create the transient table that will receive the re-ordered data */
    OIDNewHeap = make_new_heap(tableOid, tableSpace,
                               OldHeap->rd_rel->relpersistence,
                               ^^^
                               AccessExclusiveLock);

There are two such references after the heap_close. I don't know that those are the only bugs, but this reference is certainly the proximate cause of the crash I'm seeing. Will push a fix in a little bit.

			regards, tom lane
Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements
Hi,

On 2017-03-04 11:02:14 -0500, Tom Lane wrote:
> But speaking of ambiguity: isn't it possible for $n symbols to appear in pg_stat_statements already?

Indeed.

> I think it is, both from extended-protocol client queries and from SPI commands, which would mean that the proposal as it stands is not fixing the ambiguity problem at all. So yes, we need another idea.

I think using $ to signal a parameter is still a good idea, as people kinda have to know that one anyway. Maybe one of "$:n", "$-n"?

- Andres
Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements
On Sat, Mar 4, 2017 at 8:02 AM, Tom Lane wrote:

>> Perhaps there could be a choice of behaviors. Even if we all agreed that parameter notation was better in theory, there's something to be said for maintaining backward compatibility, or having an option to do so.
>
> Meh ... we've generally regretted it when we "solved" a backwards compatibility problem by introducing a GUC that changes query semantics. I'm inclined to think we should either do it or not.

In my opinion, we expose query id (and dbid, and userid) as the canonical identifier for each pg_stat_statements entry, and have done so for some time. That's the stable API -- not query text. I'm aware of cases where query text was used as an identifier, but that ended up being hashed anyway. Query text is just for human consumption.

I'd be in favor of a change that makes it easier to copy and paste a query, to run EXPLAIN and so on. Lukas probably realizes that there are no guarantees that the query text that appears in pg_stat_statements will even appear as normalized in all cases. The "sticky entry" stuff is intended to maximize the chances of that happening, but it's still generally quite possible that it won't (e.g. pg_stat_statements never swaps constants in a query like "SELECT 5, pg_stat_statements_reset()"). This means that we cannot really say that this buys us a machine-readable query text format, at least not without adding some fairly messy caveats.

-- 
Peter Geoghegan
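The copy-and-paste convenience being argued for can be sketched as follows. Assuming the proposal's $n normalization, a query text pulled from pg_stat_statements could be handed to PREPARE and explained without first substituting the placeholders (the table and query shown are hypothetical):

```sql
-- Pull the normalized text of the most expensive statement.
SELECT queryid, query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 1;

-- Suppose the query text came back as:
--   SELECT * FROM pgbench_accounts WHERE aid = $1
-- With $n notation it can be re-used directly:
PREPARE sample_stmt(int) AS
    SELECT * FROM pgbench_accounts WHERE aid = $1;
EXPLAIN EXECUTE sample_stmt(42);
```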
Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)
On 3/4/17 12:39, Tomas Vondra wrote:
>> Can we have a test case for page_checksum(), or is that too difficult to get running deterministically?
>
> I'm not sure it can be made deterministic. Certainly not on heap pages, because then it'd be susceptible to xmin/xmax changes, but we have other bits that change even on index pages (say pd_lsn).
>
> So I'm afraid that's not going to fly.

Then just run it and throw away the result. See sql/page.sql for some examples.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
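That style of test could look like the following sketch, in the spirit of the existing sql/page.sql: exercise the function but reduce its unstable output to a stable boolean. (The page_checksum(bytea, int) signature is assumed from the patch under discussion, not from a released pageinspect.)

```sql
-- Exercise page_checksum() while discarding the nondeterministic value.
CREATE TABLE test_page (a int);
INSERT INTO test_page VALUES (1);
SELECT page_checksum(get_raw_page('test_page', 0), 0) IS NOT NULL
       AS checksum_computed;
```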
Re: [HACKERS] Change in "policy" on dump ordering?
On 3/1/17 08:36, Peter Eisentraut wrote:
> On 2/22/17 18:24, Jim Nasby wrote:
>>> Yes, by that logic matview refresh should always be last.
>>
>> Patches for head attached.
>>
>> RLS was the first item added after DO_REFRESH_MATVIEW, which was added in 9.5. So if we want to treat this as a bug, they'd need to be patched as well, which is a simple matter of swapping 33 and 34.
>
> I wonder whether we should emphasize this change by assigning DO_REFRESH_MATVIEW a higher number, like 100?

Since there wasn't any interest in that idea, I have committed Jim's patch as is.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)
On 03/04/2017 02:08 PM, Peter Eisentraut wrote:
> On 3/3/17 09:03, Tomas Vondra wrote:
>> Damn. In my defense, the patch was originally created for an older PostgreSQL version (to investigate an issue on a production system), which used that approach to building values. Should have noticed it, though.
>>
>> Attached is v2, fixing both issues.
>
> Can we have a test case for page_checksum(), or is that too difficult to get running deterministically?

I'm not sure it can be made deterministic. Certainly not on heap pages, because then it'd be susceptible to xmin/xmax changes, but we have other bits that change even on index pages (say pd_lsn).

So I'm afraid that's not going to fly.

> Also, perhaps page_checksum() doesn't need to be superuser-only, but I can see arguments for keeping it that way for consistency.

Yes, I'll change that. It won't have much impact in practice, because all the functions providing the page data (get_raw_page etc.) do require superuser. But you can still input the page as a bytea directly, and it's good practice not to add unnecessary superuser checks.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On March 4, 2017 1:16:56 AM PST, Robert Haas wrote:

> Maybe. But it looks to me like this patch is going to have considerably more than its share of user-visible warts, and I'm not very excited about that. I feel like what we ought to be doing is keeping the index OID the same and changing the relfilenode to point to a newly-created one, and I attribute our failure to make that design work thus far to insufficiently aggressive hacking.

We literally spent years and a lot of emails waiting for that to happen. Users now hack up solutions like this in userspace, obviously a bad solution. I agree that it'd be nicer not to have this, but not having the feature at all is a lot worse than this wart.

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
[HACKERS] Parallel seq. plan is not coming against inheritance or partition table
Hi All,

From the following git commit onwards, a parallel seq scan is never getting selected for inheritance or partitioned tables.

commit 51ee6f3160d2e1515ed6197594bda67eb99dc2cc
Author: Robert Haas
Date:   Wed Feb 15 13:37:24 2017 -0500

    Replace min_parallel_relation_size with two new GUCs.

Steps to reproduce:
===================

create table t1 (a integer);
create table t1_1 (check (a >= 1 and a < 100)) inherits (t1);
create table t1_2 (check (a >= 100 and a < 200)) inherits (t1);
insert into t1_1 select generate_series(1, 90);
insert into t1_2 select generate_series(100, 190);
analyze t1;
analyze t1_1;
analyze t1_2;
explain analyze select * from t1 where a < 5 OR a > 195;

EXPLAIN ANALYZE output:

1) Prior to the "Replace min_parallel_relation_size with two new GUCs" commit:

postgres=# explain analyze select * from t1 where a < 5 OR a > 195;
                                   QUERY PLAN
---------------------------------------------------------------------------
 Gather  (cost=1000.00..25094.71 rows=48787 width=4) (actual time=0.431..184.264 rows=4 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Append  (cost=0.00..19216.01 rows=20328 width=4) (actual time=0.025..167.465 rows=1 loops=3)
         ->  Parallel Seq Scan on t1  (cost=0.00..0.00 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=3)
               Filter: ((a < 5) OR (a > 195))
         ->  Parallel Seq Scan on t1_1  (cost=0.00..9608.00 rows=20252 width=4) (actual time=0.023..76.644 rows=1 loops=3)
               Filter: ((a < 5) OR (a > 195))
               Rows Removed by Filter: 283334
         ->  Parallel Seq Scan on t1_2  (cost=0.00..9608.01 rows=75 width=4) (actual time=89.505..89.505 rows=0 loops=3)
               Filter: ((a < 5) OR (a > 195))
               Rows Removed by Filter: 30
 Planning time: 0.343 ms
 Execution time: 188.624 ms
(14 rows)

2) From the "Replace min_parallel_relation_size with two new GUCs" commit onwards:

postgres=# explain analyze select * from t1 where a < 5 OR a > 195;
                                   QUERY PLAN
---------------------------------------------------------------------------
 Append  (cost=0.00..34966.01 rows=50546 width=4) (actual time=0.021..375.747 rows=4 loops=1)
   ->  Seq Scan on t1  (cost=0.00..0.00 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1)
         Filter: ((a < 5) OR (a > 195))
   ->  Seq Scan on t1_1  (cost=0.00..17483.00 rows=50365 width=4) (actual time=0.016..198.393 rows=4 loops=1)
         Filter: ((a < 5) OR (a > 195))
         Rows Removed by Filter: 850001
   ->  Seq Scan on t1_2  (cost=0.00..17483.01 rows=180 width=4) (actual time=173.310..173.310 rows=0 loops=1)
         Filter: ((a < 5) OR (a > 195))
         Rows Removed by Filter: 91
 Planning time: 0.812 ms
 Execution time: 377.831 ms
(11 rows)

RCA:

From the "Replace min_parallel_relation_size with two new GUCs" commit onwards, we are not assigning parallel workers to a child rel with zero heap pages. This means we won't be able to create a partial append path, as that requires all the child rels within an Append node to have a partial path. Please check the following code snippet from set_append_rel_pathlist():

    /* Same idea, but for a partial plan. */
    if (childrel->partial_pathlist != NIL)
        partial_subpaths = accumulate_append_subpath(partial_subpaths,
                                    linitial(childrel->partial_pathlist));
    else
        partial_subpaths_valid = false;
    ...

    /*
     * Consider an append of partial unordered, unparameterized partial paths.
     */
    if (partial_subpaths_valid)
    {
        ...
        /* Generate a partial append path. */
        appendpath = create_append_path(rel, partial_subpaths, NULL,
                                        parallel_workers);
        add_partial_path(rel, (Path *) appendpath);
    }

In short, no Gather path would be generated if a baserel having an Append node contains any child rel without a partial path. This means that, just because one child rel with zero heap pages didn't get parallel workers, other child rels that were good enough to go for a Parallel Seq Scan had to go for a normal seq scan, which could be costlier.

Fix: Attached is the patch that fixes this issue.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

assign_par_workers_for_empty_childRel_v1.patch
Description: invalid/octet-stream
Re: [HACKERS] GSoC 2017
Robert Haas writes:
> On Thu, Mar 2, 2017 at 3:45 AM, Jim Nasby wrote:
>> On 2/27/17 4:52 PM, Thomas Munro wrote:
>>> By the way, that page claims that PostgreSQL runs on Irix and Tru64,
>>> which hasn't been true for a few years.
>> There could be a GSoC project to add support for those back in... ;P
> ... Whether it's also useful to try to support running the system on
> unobtainable operating systems is less clear to me.

I seriously doubt that we'd take patches to run on non-mainstream OSes without a concomitant promise to support buildfarm animals running such OSes for the foreseeable future. Without that we don't know if the patches still work even a week after they're committed. We killed the above-mentioned OSes mainly for lack of any such animals, IIRC.

regards, tom lane
Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements
Robert Haas writes:
> On Wed, Mar 1, 2017 at 8:21 PM, Peter Eisentraut wrote:
>> On 2/28/17 20:01, Lukas Fittl wrote:
>>> I'd like to propose changing the replacement character from ? to instead
>>> be a parameter (like $1).
>> Hmm, I think this could confuse people into thinking that the queries
>> displayed were in fact prepared queries.
> Perhaps there could be a choice of behaviors. Even if we all agreed
> that parameter notation was better in theory, there's something to be
> said for maintaining backward compatibility, or having an option to do
> so.

Meh ... we've generally regretted it when we "solved" a backwards compatibility problem by introducing a GUC that changes query semantics. I'm inclined to think we should either do it or not. My own vote would probably be for "not", because I haven't seen a case made why it's important to be able to automatically distinguish a constant-substitution marker from a "?" operator.

On the other hand, it seems like arguing for backwards compatibility here is a bit contradictory, because that would only matter if you think there *are* people trying to automatically parse the output of pg_stat_statements in that much detail. And if there are, they would likely appreciate it becoming less ambiguous.

But speaking of ambiguity: isn't it possible for $n symbols to appear in pg_stat_statements already? I think it is, both from extended-protocol client queries and from SPI commands, which would mean that the proposal as it stands is not fixing the ambiguity problem at all. So yes, we need another idea.

regards, tom lane
Re: [HACKERS] Patch to implement pg_current_logfile() function
"Karl O. Pinc" writes:
> On Sat, 4 Mar 2017 14:26:54 +0530 Robert Haas wrote:
>> On Fri, Mar 3, 2017 at 7:21 PM, Karl O. Pinc wrote:
>>> But if the code does not exit the loop then before calling
>>> elog(ERROR), shouldn't it also call FreeFile(fd)?
>> Hmm. Normally error recovery cleans up opened files, but since
>> logfile_open() directly calls fopen(), that's not going to work here.
>> So that does seem to be a problem.
> Should something different be done to open the file or is it
> enough to call FreeFile(fd) to clean up properly?

I would say that in at least 99.44% of cases, if you think you need to do some cleanup action immediately before an elog(ERROR), that means You're Doing It Wrong. That can only be a safe coding pattern in a segment of code in which no called function does, *or ever will*, throw elog(ERROR) itself. Straight-line code with no reason ever to call anything else might qualify, but otherwise you're taking too much risk of current or future breakage. You need some mechanism that will ensure cleanup after any elog call anywhere, and the backend environment offers lots of such mechanisms.

Without having actually looked at this patch, I would say that if it added a direct call of fopen() to backend-side code, that was already the wrong thing. Almost always, AllocateFile() would be a better choice, not only because it's tied into transaction abort, but also because it knows how to release virtual FDs in event of ENFILE/EMFILE errors. If there is some convincing reason why you shouldn't use AllocateFile(), then a safe cleanup pattern would be to have the fclose() in a PG_CATCH stanza.

(FWIW, I don't particularly agree with Michael's objection to "break" after elog(ERROR). Our traditional coding style is to write such things anyway, so as to avoid drawing complaints from compilers that don't know that elog(ERROR) doesn't return. You could argue that that's an obsolete reason, but I don't think I want to see it done some places and not others. Consistency of coding style is a good thing.)

regards, tom lane
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Sat, 4 Mar 2017 14:26:54 +0530 Robert Haas wrote:
> On Fri, Mar 3, 2017 at 7:21 PM, Karl O. Pinc wrote:
>> But if the code does not exit the loop then before calling
>> elog(ERROR), shouldn't it also call FreeFile(fd)?
>
> Hmm. Normally error recovery cleans up opened files, but since
> logfile_open() directly calls fopen(), that's not going to work here.
> So that does seem to be a problem.

Should something different be done to open the file, or is it enough to call FreeFile(fd) to clean up properly?

Karl
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Re: [HACKERS] wait events for disk I/O
On Mon, Feb 20, 2017 at 4:04 PM, Rushabh Lathia wrote:
> My colleague Rahila reported a compilation issue with the patch. The
> issue only appeared when doing a clean build on the branch.
>
> Fixed the same in the latest version of the patch.

Few assorted comments:

1.
+ WriteBuffile
+ Waiting during buffile read operation.

Operation name and definition are not matching.

2.
+FilePrefetch(File file, off_t offset, int amount, uint32 wait_event_info)
 {
 #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
 	int		returnCode;
@@ -1527,8 +1527,10 @@ FilePrefetch(File file, off_t offset, int amount)
 	if (returnCode < 0)
 		return returnCode;

+	pgstat_report_wait_start(wait_event_info);
 	returnCode = posix_fadvise(VfdCache[file].fd, offset, amount,
 							   POSIX_FADV_WILLNEED);
+	pgstat_report_wait_end();

AFAIK, this call is non-blocking and will just initiate a read, so do you think we should record a wait event for such a call?

3.
-	written = FileWrite(src->vfd, waldata_start, len);
+	written = FileWrite(src->vfd, waldata_start, len,
+						WAIT_EVENT_WRITE_REWRITE_DATA_BLOCK);
 	if (written != len)
 		ereport(ERROR,
 				(errcode_for_file_access(),
@@ -957,7 +960,7 @@ logical_end_heap_rewrite(RewriteState state)
 	hash_seq_init(_status, state->rs_logical_mappings);
 	while ((src = (RewriteMappingFile *) hash_seq_search(_status)) != NULL)
 	{
-		if (FileSync(src->vfd) != 0)
+		if (FileSync(src->vfd, WAIT_EVENT_SYNC_REWRITE_DATA_BLOCK) != 0)

Do we want to consider recording wait events for both write and sync? It seems to me OS-level writes are relatively cheap and sync calls are expensive, so shouldn't we just log the sync calls? I see similar usage at multiple places in the patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Re: check failure with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY
On 03/03/2017 02:24 PM, Andrew Dunstan wrote:
> I have been setting up a buildfarm member with -DRELCACHE_FORCE_RELEASE
> -DCLOBBER_FREED_MEMORY, settings which Alvaro suggested to me. I got core
> dumps with these stack traces. The platform is Amazon Linux.

I have replicated this on a couple of other platforms (Fedora, FreeBSD) and back to 9.5. The same failure doesn't happen with buildfarm runs on earlier branches, although possibly they don't have the same set of tests.

cheers

andrew

--
Andrew Dunstan
https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional
Hi Robert,

On 3/4/17 1:58 AM, Robert Haas wrote:
> On Wed, Mar 1, 2017 at 9:07 AM, David Steele wrote:
>> On 2/28/17 10:22 PM, Robert Haas wrote:
>>> On Tue, Feb 28, 2017 at 6:22 AM, David Steele wrote:
>> I'm not sure that's the case. It seems like it should lock just as
>> multiple backends would now. One process would succeed and the others
>> would error. Maybe I'm missing something?
>
> Hm, any errors happening in the workers would be reported to the
> leader, meaning that even if one worker succeeded to run
> pg_start_backup() it would be reported as an error at the end to the
> client, no? By marking the exclusive function restricted we get sure
> that it is just the leader that fails or succeeds.

Good point, and it strengthens the argument beyond, "it just seems right."

>>> I think the argument should be based on whether or not the function
>>> depends on backend-private state that will not be synchronized.
>>> That's the definition of what makes something parallel-restricted or
>>> not.
>>
>> Absolutely. Yesterday was a long day so I may have (perhaps) become a
>> bit flippant.
>>
>>> It looks like pg_start_backup() and pg_stop_backup() depend on the
>>> backend-private global variable nonexclusive_backup_running, so they
>>> should be parallel-restricted.
>>
>> Agreed.
>
> How about a separately-committable patch that just does that, and then
> a main patch that applies on top of it?

Yes, that makes sense. Attached are two patches as requested:

01 - Just marks the pg_stop_backup() variants as parallel restricted.
02 - Adds the wait_for_archive param to pg_stop_backup().

These apply cleanly on 272adf4.
Thanks,
--
-David
da...@pgmasters.net

diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 438378d..b17a236 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */

 /*							mmddN */
-#define CATALOG_VERSION_NO	201703032
+#define CATALOG_VERSION_NO	201703041

 #endif
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 0c8b5c6..ec4aedb 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3141,9 +3141,9 @@ DATA(insert OID = 2096 ( pg_terminate_backend	PGNSP PGUID 12 1 0 0 0 f f f f t
 DESCR("terminate a server process");
 DATA(insert OID = 2172 ( pg_start_backup	PGNSP PGUID 12 1 0 0 0 f f f f t f v r 3 0 3220 "25 16 16" _null_ _null_ _null_ _null_ _null_ pg_start_backup _null_ _null_ _null_ ));
 DESCR("prepare for taking an online backup");
-DATA(insert OID = 2173 ( pg_stop_backup	PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 3220 "" _null_ _null_ _null_ _null_ _null_ pg_stop_backup _null_ _null_ _null_ ));
+DATA(insert OID = 2173 ( pg_stop_backup	PGNSP PGUID 12 1 0 0 0 f f f f t f v r 0 0 3220 "" _null_ _null_ _null_ _null_ _null_ pg_stop_backup _null_ _null_ _null_ ));
 DESCR("finish taking an online backup");
-DATA(insert OID = 2739 ( pg_stop_backup	PGNSP PGUID 12 1 1 0 0 f f f f t t v s 1 0 2249 "16" "{16,3220,25,25}" "{i,o,o,o}" "{exclusive,lsn,labelfile,spcmapfile}" _null_ _null_ pg_stop_backup_v2 _null_ _null_ _null_ ));
+DATA(insert OID = 2739 ( pg_stop_backup	PGNSP PGUID 12 1 1 0 0 f f f f t t v r 1 0 2249 "16" "{16,3220,25,25}" "{i,o,o,o}" "{exclusive,lsn,labelfile,spcmapfile}" _null_ _null_ pg_stop_backup_v2 _null_ _null_ _null_ ));
 DESCR("finish taking an online backup");
 DATA(insert OID = 3813 ( pg_is_in_backup	PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ pg_is_in_backup _null_ _null_ _null_ ));
 DESCR("true if server is in online backup");
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9e084ad..ba6f030 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18098,7 +18098,7 @@ SELECT set_config('log_statement_stats', 'off', false);

-pg_stop_backup(exclusive boolean)
+pg_stop_backup(exclusive boolean , wait_for_archive boolean )
 setof record
 Finish performing exclusive or non-exclusive on-line backup
 (restricted to superusers by default, but other users can be granted
 EXECUTE to run the function)
@@ -18182,7 +18182,13 @@ postgres=# select pg_start_backup('label_goes_here');
 pg_start_backup. In a non-exclusive backup, the contents of
 the backup_label and tablespace_map are returned in the result of the
 function, and should be written to files in the
-backup (and not in the data directory).
+backup (and not in the data directory). There is an optional second
+parameter of type boolean. If false, the pg_stop_backup
+will return immediately after the backup is completed
Re: [HACKERS] Cost model for parallel CREATE INDEX
Peter,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Sat, Mar 4, 2017 at 12:50 AM, Robert Haas wrote:
>> If the result of compute_parallel_workers() based on
>> min_parallel_table_scan_size is smaller, then use that value instead.
>> I must be confused, because I actually thought that was the exact
>> algorithm you were describing, and it sounded good to me.
>
> It is, but I was using that with index size, not table size. I can
> change it to be table size, based on what you said. But the workMem
> related cap, which probably won't end up being applied all that often
> in practice, *should* still do something with projected index size,
> since that really is what we're sorting, which could be very different
> (e.g. with partial indexes).

Isn't that always going to be very different, unless you're creating a single index across every column in the table..? Or perhaps I've misunderstood what you're comparing as being 'very different' in your last sentence.

Thanks!

Stephen

signature.asc
Description: Digital signature
Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans
Alright, for the next version of this patch I'll look into standard deviation (an implementation of Welford's algorithm already exists in pg_stat_statements).

On 3/4/17 14:18, Peter Eisentraut wrote:
> The other problem is that this measures execution time, which can vary
> for reasons other than plan. I would have expected that the cost
> numbers are tracked somehow.

I've already thought of tracking specific parts of the explanation, like the cost numbers, instead of the whole string. I'll think of something, but if anybody has any bright ideas in the meantime, I'd gladly listen to them.

> There is also the issue of generic vs specific plans, which this
> approach might be papering over.

Would you be so kind as to elaborate a little on this? I'm not sure I understand it correctly. This patch only tracks specific plans, yes. The initial idea was that there might be some edge cases that are not apparent when looking at generalized plans or queries.

kind regards
Julian
Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans
Hi Julian,

On 3/4/17 8:41 AM, Julian Markwort wrote:
>> Any improvements would be greatly welcome by the admin
>> community, I'm sure.
> That's good to hear - on the other hand, I really appreciate the opinion
> of admins on this idea!
>> However, this is a rather large change which could be destabilizing to
>> the many users of this extension.
> I'm fully aware of that, which is why there are already several switches
> in place so you can keep using the existing functionality without
> compromises or added complexity.
> At the same time, I'm always open to suggestions regarding the reduction
> of complexity and probably more importantly the reduction of disk IO.
>> I recommend moving this patch to the 2017-07 CF.
> Since I do not have very much time for this at the moment I'll be
> looking forward to discussions on this patch in the next commitfest!

Since some concerns were raised about the implementation, I have instead marked this "Returned with Feedback". I encourage you to continue developing the patch with the community and resubmit into the appropriate CF when it is ready.

--
-David
da...@pgmasters.net
Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)
On 3/3/17 09:03, Tomas Vondra wrote:
> Damn. In my defense, the patch was originally created for an older
> PostgreSQL version (to investigate an issue on a production system), which
> used that approach to building values. Should have noticed it, though.
>
> Attached is v2, fixing both issues.

Can we have a test case for page_checksum(), or is that too difficult to get running deterministically?

Also, perhaps page_checksum() doesn't need to be superuser-only, but I can see arguments for keeping it that way for consistency.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans
> Any improvements would be greatly welcome by the admin
> community, I'm sure.

That's good to hear - on the other hand, I really appreciate the opinion of admins on this idea!

> However, this is a rather large change which could be destabilizing to
> the many users of this extension.

I'm fully aware of that, which is why there are already several switches in place so you can keep using the existing functionality without compromises or added complexity. At the same time, I'm always open to suggestions regarding the reduction of complexity and probably more importantly the reduction of disk IO.

> I recommend moving this patch to the 2017-07 CF.

Since I do not have very much time for this at the moment I'll be looking forward to discussions on this patch in the next commitfest!

kind regards
Julian
Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)
On 3/4/17 8:33 AM, Peter Eisentraut wrote:
> On 3/3/17 16:16, David Steele wrote:
>> While this looks like it could be a really significant performance
>> improvement, I think the above demonstrates that it needs a lot of work.
>> I know this is not new to the 2017-03 CF but it doesn't seem enough
>> progress has been made since posting to allow it to be committed in time
>> for v10.
>>
>> I recommend moving this patch to the 2017-07 CF.
>
> I think the patch that was in 2017-01 was given some feedback that put
> the fundamental approach in question, which the author appeared to agree
> with. So I don't know why this patch appeared in this CF at all.

Then it sounds like it should be marked RWF. Haribabu can resubmit when there's a new candidate patch.

--
-David
da...@pgmasters.net
Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)
On 3/3/17 16:16, David Steele wrote:
> While this looks like it could be a really significant performance
> improvement, I think the above demonstrates that it needs a lot of work.
> I know this is not new to the 2017-03 CF but it doesn't seem enough
> progress has been made since posting to allow it to be committed in time
> for v10.
>
> I recommend moving this patch to the 2017-07 CF.

I think the patch that was in 2017-01 was given some feedback that put the fundamental approach in question, which the author appeared to agree with. So I don't know why this patch appeared in this CF at all.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans
On 1/25/17 12:43, Simon Riggs wrote:
> On 25 January 2017 at 17:34, Julian Markwort wrote:
>> Analogous to this, a bad_plan is saved, when the time has been exceeded by a
>> factor greater than 1.1 .
> ...and the plan differs?
>
> Probably best to use some stat math to calculate deviation, rather than
> a fixed %.

Yeah, it seems to me too that this needs somewhat deeper analysis. I don't see offhand why a 10% deviation in execution time would be a reasonable threshold for "good" or "bad". A deviation approach like you allude to would be better.

The other problem is that this measures execution time, which can vary for reasons other than plan. I would have expected that the cost numbers are tracked somehow.

There is also the issue of generic vs specific plans, which this approach might be papering over.

Needs more thought.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pg_dump, pg_dumpall and data durability
On Sat, Mar 4, 2017 at 3:08 PM, Robert Haas wrote:
> On Thu, Mar 2, 2017 at 5:02 AM, Michael Paquier wrote:
>> On Thu, Mar 2, 2017 at 2:26 AM, David Steele wrote:
>>> This patch is in need of a committer. Any takers?
>>> I didn't see a lot of enthusiasm from committers on the thread
>>
>> Stephen at least has showed interest.
>>
>>> so if nobody picks it up by the end of the CF I'm going to mark the patch
>>> RWF.
>>
>> Yes, that makes sense. If no committer is willing to have a look at
>> code-level and/or has room for this patch then it is as good as
>> doomed. Except for bug fixes, I have a couple of other patches that
>> are piling up so they would likely get the same treatment. There is
>> nothing really we can do about that.
>
> Before we reach the question of whether committers have time to look
> at this, we should first consider the question of whether it has
> achieved consensus. I'll try to summarize the votes:
>
> Tom Lane: premise pretty dubious
> Robert Haas: do we even want this?
> Peter Eisentraut: I had voiced a similar concern [to Robert's] previously
> Albe Laurenz: I think we should have that
> Andres Freund: [Tom's counterproposal won't work]
> Robert Haas: [Andres has a good point, still nervous] I'm not sure
> there's any better alternative to what's being proposed, though.
> Fujii Masao: One idea is to provide the utility or extension which
> fsync's the specified files and directories
>
> Here's an attempt to translate those words into numerical votes. As
> per my usual practice, I will count the patch author as +1:
>
> Michael Paquier: +1
> Tom Lane: -1
> Peter Eisentraut: -1
> Albe Laurenz: +1
> Andres Freund: +1
> Robert Haas: +0.5
> Fujii Masao: -0.5
>
> So, my interpretation is that, out of 7 votes, we have -2.5 and +3.5.
> Perhaps that is a consensus for proceeding, but if so it's a pretty
> marginal one. I think the next step for this patch is to consider why
> we shouldn't, in lieu of what's proposed here, add a pg_fsync utility
> that fsyncs a file or recursively fsyncs a directory, ship that, and
> let people use it on their pg_dump files and/or base backups if they
> wish. I am not altogether convinced that's a better option, but I am
> also not altogether convinced that it's worse. Also, if anyone else
> wishes to vote, or if anyone to whom I've attributed a vote wishes to
> assign a numerical value to their vote other than the one I've
> assigned, please feel free.

That tally is not quite exact, as it misses this message:
https://www.postgresql.org/message-id/20170123050248.go18...@tamriel.snowman.net
If I interpret that message correctly, Stephen Frost can be counted with either +1 or +0.5.

--
Michael
Re: [HACKERS] WAL Consistency checking for hash indexes
On Sat, Mar 4, 2017 at 2:29 PM, Robert Haas wrote:
> On Fri, Mar 3, 2017 at 9:44 AM, Amit Kapila wrote:
>> On Tue, Feb 28, 2017 at 11:06 AM, Kuntal Ghosh wrote:
>>> Hello everyone,
>>>
>>> I've attached a patch which implements WAL consistency checking for
>>> hash indexes. This feature is going to be useful for developing and
>>> testing of WAL logging for hash index.
>>
>> I think it is better if you base your patch on (Microvacuum support
>> for hash index - https://commitfest.postgresql.org/13/835/).
>
> I'd rather have this based on top of the WAL logging patch, and have
> any subsequent patches that tinker with the WAL logging include the
> necessary consistency checking changes also.

Fair point. Since the other patch was proposed before this one, I thought it might be better to tackle the changes related to it in this patch. However, having the microvacuum patch (or any other patch) consider what it needs to mask sounds sensible.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] GUC for cleanup indexes threshold.
On Sat, Mar 4, 2017 at 8:55 AM, Peter Geoghegan wrote:
> On Fri, Mar 3, 2017 at 6:58 PM, Amit Kapila wrote:
>> You are right that we don't want the number of unclaimed-by-FSM
>> recyclable pages to grow forever, but I think that won't happen with
>> this patch. As soon as there are more deletions (in heap), in the
>> next vacuum cycle, the pages will be reclaimed by lazy_vacuum_index().
>
> Right.
>
>>> (Thinks about it some more...)
>>>
>>> Unfortunately, I just saw a whole new problem with this patch:
>>> _bt_page_recyclable() is the one place in the B-Tree AM where we stash
>>> an XID.
>>
>> Can you be more specific about which code exactly you are referring to here?
>
> I meant that we stash an XID when a B-Tree page is deleted, used to
> determine when it's finally safe to recycle the page by comparing
> it to RecentGlobalXmin (recycling will happen during the next VACUUM).
> We can't immediately recycle a page, because an existing index scan
> might land on the page while following a stale pointer.
>
> _bt_page_recyclable(), which checks if recycling is safe (no possible
> index scan can land on a dead page), is a pretty small function. I'm
> concerned about what happens when
> pg_class.relfrozenxid/pg_database.datfrozenxid are advanced past
> opaque->btpo.xact.

Are you talking about the pg_class.relfrozenxid of the index or the table? IIUC, for indexes it will be InvalidTransactionId, and for tables I think it will be updated with or without the patch.

> While I can't see this explained anywhere, I'm
> pretty sure that that's supposed to be impossible, which this patch
> changes.

What makes you think that the patch will allow pg_class.relfrozenxid to be advanced past opaque->btpo.xact, which was previously not possible?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Print correct startup cost for the group aggregate.
On Thu, Mar 2, 2017 at 6:48 PM, Ashutosh Bapat wrote:
> On Thu, Mar 2, 2017 at 6:06 PM, Rushabh Lathia wrote:
>> While reading through cost_agg() I found that the startup cost for the
>> group aggregate is not correctly assigned. Due to this, the explain plan
>> does not print the correct startup cost.
>>
>> Without patch:
>>
>> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
>> filler like '%foo%' group by aid;
>>                               QUERY PLAN
>> ---------------------------------------------------------------------------
>>  GroupAggregate  (cost=81634.33..85102.04 rows=198155 width=12)
>>    Group Key: aid
>>    ->  Sort  (cost=81634.33..82129.72 rows=198155 width=8)
>>          Sort Key: aid
>>          ->  Seq Scan on pgbench_accounts  (cost=0.00..61487.89 rows=198155 width=8)
>>                Filter: (filler ~~ '%foo%'::text)
>> (6 rows)
>>
>> With patch:
>>
>> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
>> filler like '%foo%' group by aid;
>>                               QUERY PLAN
>> ---------------------------------------------------------------------------
>>  GroupAggregate  (cost=82129.72..85102.04 rows=198155 width=12)
>>    Group Key: aid
>>    ->  Sort  (cost=81634.33..82129.72 rows=198155 width=8)
>>          Sort Key: aid
>>          ->  Seq Scan on pgbench_accounts  (cost=0.00..61487.89 rows=198155 width=8)
>>                Filter: (filler ~~ '%foo%'::text)
>> (6 rows)
>
> The reason why startup_cost = input_startup_cost and not
> input_total_cost for aggregation by sorting is that we don't need the
> whole input before the Group/Agg plan can produce the first row. But I
> think setting startup_cost = input_startup_cost is also not exactly
> correct. Before the plan can produce one row, it has to transit through
> all the rows belonging to the group to which the first row belongs. On
> average it has to scan (total number of rows)/(number of groups)
> rows before producing the first aggregated row. startup_cost will be
> input_startup_cost + cost to scan (total number of rows)/(number of
> groups) rows + cost of transiting over those many rows. Total cost =
> startup_cost + cost of scanning and transiting through the remaining
> number of input rows.

While that idea has some merit, I think it's inconsistent with current practice. cost_seqscan(), for example, doesn't include the cost of reading the first page in the startup cost, even though that certainly must be done before returning the first row. I think there have been previous discussions of switching over to the practice you are advocating here, but my impression (without researching) is that the current practice is more like what Rushabh did.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Thu, Mar 2, 2017 at 11:48 AM, Andres Freund wrote:
> On 2017-03-01 19:25:23 -0600, Jim Nasby wrote:
>> On 2/28/17 11:21 AM, Andreas Karlsson wrote:
>>> The only downside I can see to this approach is that we will no longer
>>> be able to reindex catalog tables concurrently, but in return it should
>>> be easier to confirm that this approach can be made to work.
>>
>> Another downside is any stored regclass fields will become invalid.
>> Admittedly that's a pretty unusual use case, but it'd be nice if there
>> was at least a way to let users fix things during the rename phase
>> (perhaps via an event trigger).
>
> I'm fairly confident that we don't want to invoke event triggers inside
> the CIC code... I'm also fairly confident that between index oids
> stored somewhere being invalidated - what'd be a realistic use case of
> that - and not having reindex concurrently, just about everyone will
> choose the former.

Maybe. But it looks to me like this patch is going to have considerably
more than its share of user-visible warts, and I'm not very excited about
that. I feel like what we ought to be doing is keeping the index OID the
same and changing the relfilenode to point to a newly-created one, and I
attribute our failure to make that design work thus far to insufficiently
aggressive hacking.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] WAL Consistency checking for hash indexes
On Fri, Mar 3, 2017 at 9:44 AM, Amit Kapila wrote:
> On Tue, Feb 28, 2017 at 11:06 AM, Kuntal Ghosh wrote:
>> Hello everyone,
>>
>> I've attached a patch which implements WAL consistency checking for
>> hash indexes. This feature is going to be useful for developing and
>> testing of WAL logging for hash indexes.
>
> I think it is better if you base your patch on (Microvacuum support
> for hash index - https://commitfest.postgresql.org/13/835/).

I'd rather have this based on top of the WAL logging patch, and have any
subsequent patches that tinker with the WAL logging include the necessary
consistency checking changes also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sat, Mar 4, 2017 at 12:50 AM, Robert Haas wrote:
> If you think parallelism isn't worthwhile unless the sort was going to
> be external anyway,

I don't -- that's just when it starts to look like a safe bet that
parallelism is worthwhile. There are quite a few cases where an external
sort is faster than an internal sort these days, actually.

> then it seems like the obvious thing to do is
> divide the projected size of the sort by maintenance_work_mem, round
> down, and cap the number of workers to the result.

I'm sorry, I don't follow.

> If the result of
> compute_parallel_workers() based on min_parallel_table_scan_size is
> smaller, then use that value instead. I must be confused, because I
> actually thought that was the exact algorithm you were describing, and
> it sounded good to me.

It is, but I was using that with index size, not table size. I can change
it to be table size, based on what you said. But the workMem-related cap,
which probably won't end up being applied all that often in practice,
*should* still do something with projected index size, since that really
is what we're sorting, which could be very different (e.g. with partial
indexes).

-- 
Peter Geoghegan
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Fri, Mar 3, 2017 at 7:21 PM, Karl O. Pinc wrote:
> But if the code does not exit the loop then
> before calling elog(ERROR), shouldn't it
> also call FreeFile(fd)?

Hmm. Normally error recovery cleans up opened files, but since
logfile_open() directly calls fopen(), that's not going to work here. So
that does seem to be a problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Fri, Mar 3, 2017 at 11:54 AM, Michael Paquier wrote:
> On Fri, Mar 3, 2017 at 3:18 PM, Robert Haas wrote:
>> Hopefully I haven't broken anything; please let me know if you
>> encounter any issues.
>
> Reading what has just been committed...
>
> +    /*
> +     * No space found, file content is corrupted. Return NULL to the
> +     * caller and inform him on the situation.
> +     */
> +    elog(ERROR,
> +         "missing space character in \"%s\"", LOG_METAINFO_DATAFILE);
> +    break;
>
> There is no need to issue a break after an elog(ERROR).

True, but it's not wrong, either. We do it all the time.

git grep -A2 elog.*ERROR /break

The fact that the comment doesn't match the code, though, is wrong. Oops.

> +     * No newlinei found, file content is corrupted. Return NULL to
> s/newlinei/newline/

That's also a problem, and that comment also refers to returning, which
we don't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sat, Mar 4, 2017 at 2:17 PM, Peter Geoghegan wrote:
> On Sat, Mar 4, 2017 at 12:43 AM, Robert Haas wrote:
>> Oh. But then I don't see why you need min_parallel_anything. That's
>> just based on an estimate of the amount of data per worker vs.
>> maintenance_work_mem, isn't it?
>
> Yes -- and it's generally a pretty good estimate.
>
> I don't really know what minimum amount of memory to insist workers
> have, which is why I provisionally chose one of those GUCs as the
> threshold.
>
> Any better ideas?

I don't understand how min_parallel_anything is telling you anything
about memory. It has, in general, nothing to do with that. If you think
parallelism isn't worthwhile unless the sort was going to be external
anyway, then it seems like the obvious thing to do is divide the
projected size of the sort by maintenance_work_mem, round down, and cap
the number of workers to the result. If the result of
compute_parallel_workers() based on min_parallel_table_scan_size is
smaller, then use that value instead. I must be confused, because I
actually thought that was the exact algorithm you were describing, and it
sounded good to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Block level parallel vacuum WIP
On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada wrote:
> Yes, it's taking time to update the logic and measurement, but it's
> coming along. Also I'm working on changing the deadlock detection. Will
> post a new patch and measurement results.

I think that we should push this patch out to v11. I think there are too
many issues here to address in the limited time we have remaining this
cycle, and I believe that if we try to get them all solved in the next
few weeks we're likely to end up getting backed into some choices by time
pressure that we may later regret bitterly. Since I created the deadlock
issues that this patch is facing, I'm willing to try to help solve them,
but I think it's going to require considerable and delicate surgery, and
I don't think doing that under time pressure is a good idea.

From a fairness point of view, a patch that's not in reviewable shape on
March 1st should really be pushed out, and we're several days past that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sat, Mar 4, 2017 at 12:43 AM, Robert Haas wrote:
> Oh. But then I don't see why you need min_parallel_anything. That's
> just based on an estimate of the amount of data per worker vs.
> maintenance_work_mem, isn't it?

Yes -- and it's generally a pretty good estimate.

I don't really know what minimum amount of memory to insist workers have,
which is why I provisionally chose one of those GUCs as the threshold.

Any better ideas?

-- 
Peter Geoghegan
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sat, Mar 4, 2017 at 2:01 PM, Peter Geoghegan wrote:
> On Sat, Mar 4, 2017 at 12:23 AM, Robert Haas wrote:
>>> I guess that the workMem scaling threshold thing could be
>>> min_parallel_index_scan_size, rather than min_parallel_relation_size
>>> (which we now call min_parallel_table_scan_size)?
>>
>> No, it should be based on min_parallel_table_scan_size, because that
>> is the size of the parallel heap scan that will be done as input to
>> the sort.
>
> I'm talking about the extra thing we do to prevent parallelism from
> being used when per-worker workMem is excessively low. That has much
> more to do with projected index size than current heap size.

Oh. But then I don't see why you need min_parallel_anything. That's just
based on an estimate of the amount of data per worker vs.
maintenance_work_mem, isn't it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Sat, Mar 4, 2017 at 12:23 AM, Robert Haas wrote:
>> I guess that the workMem scaling threshold thing could be
>> min_parallel_index_scan_size, rather than min_parallel_relation_size
>> (which we now call min_parallel_table_scan_size)?
>
> No, it should be based on min_parallel_table_scan_size, because that
> is the size of the parallel heap scan that will be done as input to
> the sort.

I'm talking about the extra thing we do to prevent parallelism from being
used when per-worker workMem is excessively low. That has much more to do
with projected index size than current heap size.

I agree with everything else you've said, I think.

-- 
Peter Geoghegan
Re: [HACKERS] I propose killing PL/Tcl's "modules" infrastructure
On Thu, Mar 2, 2017 at 11:29 PM, Tom Lane wrote:
> After thinking about that for awhile, it seemed like the most useful
> thing to do is to set up an errcontext callback that will be active
> throughout execution of the start_proc. That will cover both setup
> failures like the above, and errors occurring within the start_proc,
> which could be equally confusing if you think they apply to the
> function you initially tried to call. v2 patch attached that does it
> like that.

+1 for that approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Cost model for parallel CREATE INDEX
On Thu, Mar 2, 2017 at 10:38 PM, Peter Geoghegan wrote:
> I'm glad. This justifies the lack of much of any "veto" on the
> logarithmic scaling. The only things that can do that are
> max_parallel_workers_maintenance, the storage parameter
> parallel_workers (maybe this isn't a storage parameter in V9), and
> insufficient maintenance_work_mem per worker (as judged by
> min_parallel_relation_size being greater than workMem per worker).
>
> I guess that the workMem scaling threshold thing could be
> min_parallel_index_scan_size, rather than min_parallel_relation_size
> (which we now call min_parallel_table_scan_size)?

No, it should be based on min_parallel_table_scan_size, because that is
the size of the parallel heap scan that will be done as input to the
sort.

>> I think it's totally counter-intuitive that any hypothetical index
>> storage parameter would affect the degree of parallelism involved in
>> creating the index and also the degree of parallelism involved in
>> scanning it. Whether or not other systems do such crazy things seems
>> to me to be beside the point. I think if CREATE INDEX allows an
>> explicit specification of the degree of parallelism (a decision I
>> would favor) it should have a syntactically separate place for unsaved
>> build options vs. persistent storage parameters.
>
> I can see both sides of it.
>
> On the one hand, it's weird that you might have query performance
> adversely affected by what you thought was a storage parameter that
> only affected the index build. On the other hand, it's useful that you
> retain that as a parameter, because you may want to periodically
> REINDEX, or have a way of ensuring that pg_restore does go on to use
> parallelism, since it generally won't otherwise. (As mentioned
> already, pg_restore does not trust the cost model due to issues with
> the availability of statistics.)
If you make the changes I'm proposing above, this parenthetical issue
goes away, because the only statistic you need is the table size, which
is what it is. As to the rest, I think a bare REINDEX should just use the
cost model as if it were CREATE INDEX, and if you want to override that
behavior, you can do that by explicit syntax. I see very little utility
for a setting that fixes the number of workers to be used for future
reindexes: there won't be many of them, and it's kinda confusing. But
even if we decide to have that, I see no justification at all for
conflating it with the number of workers to be used for a scan, which is
something else altogether.

> To be clear, I don't have any strong feelings on all this. I just
> think it's worth pointing out that there are reasons to not do what
> you suggest, that you might want to consider if you haven't already.

I have considered them. I also acknowledge that other people may view the
situation differently than I do. I'm just telling you my opinion on the
topic.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company