Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
Kevin Grittner wrote:

> I think a large part of what is at issue here stems from a bad
> name for the new bool field I added to the RelationData structure
> -- instead of rd_isscannable it should probably be called
> something like rd_ispopulated. The current name led to some
> fuzzy thinking on my part when it was referenced, and both the
> name and a couple ill-considered uses of it probably contributed
> to concerns about how it was generated and used.
>
> I will post a draft patch today to see whether concerns abate.
> Basically, the name change should help make clear that this is
> not intended to be the only way to determine whether a matview is
> scannable. Second, there is at least one (and probably more)
> direct test of this field which should use a function for a
> scannability test. For 9.3, that will just wrap a test of this
> bool, but it makes clear what the longer-term intent is, and helps
> ensure that things don't get missed when patches are written in
> later releases. Third, some comments need to be corrected and
> added.
>
> Hopefully it can help get us all onto the same page. If not, it
> should at least better focus the discussion.

Attached is a first cut at drawing a bright line between the notion of whether a matview has been *populated* and whether it is *scannable*. In 9.3 one is true if and only if the other is, but I plan on doing work such that this will no longer be true in the next release, and not making a clear distinction now has been confusing everyone, including me.

This patch is light on functional changes, and heavier on name and comment changes. It does fix the lack of locking for the user-visible pg_relation_is_scannable() function, and it does correct the performance regression in pg_dump, but those should be the only user-visible changes. Internally I also tried to correct a modularity violation complained of by Tom.
Vacuum still needs to be taught not to truncate away the first page of a matview, but that seemed like it was better left for a separate patch, and if we decide not to use the current technique for identifying a non-populated matview, it would be one more thing to undo.

It passed `make check-world` and `make installcheck-world`, and a dump and load of the regression database works. I got sidetracked with some support issues, so I didn't have time to dig around for other weak comments, but I think I'm close on all other changes.

Comments?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
*************** cluster_rel(Oid tableOid, Oid indexOid, bool recheck, bool verbose,
*** 381,393 ****
  	check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
  
  	/*
! 	 * Quietly ignore the request if the a materialized view is not scannable.
! 	 * No harm is done because there is nothing no data to deal with, and we
! 	 * don't want to throw an error if this is part of a multi-relation
! 	 * request -- for example, CLUSTER was run on the entire database.
  	 */
  	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
! 		!OldHeap->rd_isscannable)
  	{
  		relation_close(OldHeap, AccessExclusiveLock);
  		return;
--- 381,394 ----
  	check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
  
  	/*
! 	 * Quietly ignore the request if this is a materialized view which has not
! 	 * been populated from its query. No harm is done because there is no data
! 	 * to deal with, and we don't want to throw an error if this is part of a
! 	 * multi-relation request -- for example, CLUSTER was run on the entire
! 	 * database.
  	 */
  	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
! 		!OldHeap->rd_ispopulated)
  	{
  		relation_close(OldHeap, AccessExclusiveLock);
  		return;
*************** copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
*** 923,929 ****
  
  	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW)
  		/* Make sure the heap looks good even if no rows are written. */
! 		SetRelationIsScannable(NewHeap);
  
  	/*
  	 * Scan through the OldHeap, either in OldIndex order or sequentially;
--- 924,930 ----
  
  	if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW)
  		/* Make sure the heap looks good even if no rows are written. */
! 		SetMatViewToPopulated(NewHeap);
  
  	/*
  	 * Scan through the OldHeap, either in OldIndex order or sequentially;
*** a/src/backend/commands/createas.c
--- b/src/backend/commands/createas.c
*************** intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
*** 417,423 ****
  
  	if (into->relkind == RELKIND_MATVIEW && !into->skipData)
  		/* Make sure the heap looks good even if no rows are written. */
! 		SetRelationIsScannable(intoRelationDesc);
  
  	/*
  	 * Check INSERT permission on the constructed table.
--- 417,423 ----
  
  	if (into->relkind == RELKIND_MATVIEW && !into->skipData)
  		/* Make sure the heap looks good even if
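[Editor's note] To make the populated-vs-scannable split concrete, here is a self-contained sketch of the idea behind the patch. The type and names below (RelationSketch, SKETCH_RELKIND_MATVIEW, relation_is_scannable_93) are simplified stand-ins invented for illustration, not PostgreSQL's actual relcache API:

```c
#include <stdbool.h>

/*
 * Illustrative stand-in for the relevant parts of RelationData; the real
 * structure and the real RELKIND_MATVIEW / rd_ispopulated names live in the
 * PostgreSQL headers, and this sketch deliberately simplifies them.
 */
typedef struct RelationSketch
{
    char relkind;        /* 'm' for a materialized view, 'r' for a table */
    bool rd_ispopulated; /* has the matview ever been populated from its query? */
} RelationSketch;

#define SKETCH_RELKIND_MATVIEW 'm'

/*
 * In 9.3 "scannable" and "populated" coincide, but routing every check
 * through a function like this -- rather than testing the flag directly --
 * leaves room for later releases to add more tests, e.g. a user-declared
 * freshness requirement.
 */
bool
relation_is_scannable_93(const RelationSketch *rel)
{
    if (rel->relkind != SKETCH_RELKIND_MATVIEW)
        return true;             /* ordinary relations are always scannable */
    return rel->rd_ispopulated;  /* for 9.3, the only test that matters */
}
```

The point of the wrapper is exactly the one argued in the message: callers express intent ("is it scannable?") while the definition of scannability stays in one place.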
Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
Kevin Grittner wrote:

> I think I need to review the whole thread again to make sure I
> wasn't too quick to concede the point.

On a fresh reading of this, I think a large part of what is at issue here stems from a bad name for the new bool field I added to the RelationData structure -- instead of rd_isscannable it should probably be called something like rd_ispopulated. The current name led to some fuzzy thinking on my part when it was referenced, and both the name and a couple ill-considered uses of it probably contributed to concerns about how it was generated and used.

I will post a draft patch today to see whether concerns abate. Basically, the name change should help make clear that this is not intended to be the only way to determine whether a matview is scannable. Second, there is at least one (and probably more) direct test of this field which should use a function for a scannability test. For 9.3, that will just wrap a test of this bool, but it makes clear what the longer-term intent is, and helps ensure that things don't get missed when patches are written in later releases. Third, some comments need to be corrected and added.

Hopefully it can help get us all onto the same page. If not, it should at least better focus the discussion.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
Noah Misch wrote:

> The SQL commands I cited as responsible for creating or removing
> the fork all make a new relfilenode anyway. Thus, "add" actually
> means creating the fork with the new relfilenode, and "remove"
> actually means omitting the fork from the new relfilenode. The
> association between relfilenodes and relations is, of course,
> transactional.

The same argument applies to the currently-committed code. The goal is not to change a matview between zero length and non-zero length once the heap exists, but to have this state change only when REFRESH replaces the heap in the style of CLUSTER, TRUNCATE, ALTER TABLE with heap rewrite, or the new VACUUM FULL.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
Tom Lane wrote:

> I realize that there's no other (easy) way to make unlogged
> matviews reset to an invalid state on crash, but that doesn't
> make this design choice less of a disaster. It boxes us into
> something that's entirely unable to support transitions between
> scannable and unscannable states by any means short of a complete
> rewrite of the matview contents; which seems fundamentally
> incompatible with any sort of incremental update scenario.

That assertion makes absolutely no sense. Once we design (in a future release cycle) a way for users to declare how current a view must be to be usable, there is no reason the pg_relation_is_scannable() function cannot be modified to use those mechanisms. I continue to believe that it is a bad idea to ever allow a matview to be scanned when it is not a materialization of its query, but that does *not* mean that all matviews that are a materialization of the query would need to be considered scannable. My working assumption was that the isscannable field in relcache would be the first of many tests once we get there; if the matview actually does represent some materialization of data, the function would then proceed to check whether it is current enough to use.

It does mean that you could not start incremental maintenance on a matview without first generating a base by running the query to create initial data (as part of CREATE or REFRESH), but that seems like a *good* thing to me. To do otherwise would make no more sense than trying to recover a database using just WAL files without a base backup to apply them to.

> And I remain of the opinion that it's going to box us into not
> being able to fix the problems because of pg_upgrade on-disk-
> compatibility issues.

This argument also seems bogus to me. Since it is a valid heap either way, the *worst case* would be recommending that after upgrading users take some special action on any materialized views which were not scannable; and I doubt that we would need to do that. If you see some risk I'm missing, please elaborate.

> We will be far better off to drop unlogged matviews until we can
> come up with a better design. If that's so far off that no one
> can see it happening, well, that's tough. Leaving the door open
> for incremental maintenance is more important.

I've been looking at what is needed for incremental maintenance, and I'm not seeing the problem. Since you can't incrementally update a view which has never been populated, the only way in which it could create a problem in terms of transactional behavior, I think, is that this would make it harder not to hold an AccessExclusiveLock when transitioning between not having materialized data and having it (or vice versa), and I'm dubious we can avoid such a lock anyway. I don't see where it creates any problems for performing incremental updates once we are in a state which can allow them.

> This really needs to be catalog state, not filesystem state.

That may be true, but the arguments in this post are so off-base that I'm wondering whether it really is. When I read some earlier posts I was convinced, but now I think I need to review the whole thread again to make sure I wasn't too quick to concede the point.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
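[Editor's note] The "first of many tests" idea can be sketched in miniature. Everything below -- the struct, its fields, the staleness parameter, and the function name -- is hypothetical, since no such freshness mechanism exists in any release discussed here:

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-matview state; nothing like this exists in pg_class yet. */
typedef struct MatViewState
{
    bool   populated;     /* has a base materialization ever been created? */
    time_t last_refresh;  /* when the matview was last refreshed */
} MatViewState;

/*
 * A future pg_relation_is_scannable() could chain tests: a matview with no
 * base data is never scannable, and beyond that a user-declared maximum
 * staleness (negative meaning "no requirement") decides.
 */
bool
matview_is_scannable(const MatViewState *mv, time_t now, double max_staleness_secs)
{
    if (!mv->populated)
        return false;   /* no base materialization to judge at all */
    if (max_staleness_secs < 0)
        return true;    /* populated and no freshness requirement declared */
    return difftime(now, mv->last_refresh) <= max_staleness_secs;
}
```

Note how this matches the message's argument: the populated check stays the first gate, and later tests layer behind it without any change to callers.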
Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
Tom Lane wrote:

> Noah Misch writes:
> > A slight variation on the committed approach would be to add a
> > "_scannable" relation fork.
>
> Not very transaction-safe, I think (consider crash midway through a
> transaction that adds or removes the fork), and in any case orders of
> magnitude more expensive than looking at a pg_class field. This really
> needs to be catalog state, not filesystem state.

We could revive the pg_class_nt patch proposed a decade ago, perhaps ...

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
Noah Misch writes:

> On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote:
>> No. This is an absolute disaster. It's taking something we have always
>> considered to be an irrelevant implementation detail and making it into
>> user-visible DDL state, despite the fact that it doesn't begin to satisfy
>> basic transactional behaviors. We *need* to get rid of that aspect of
>> things. If you must have scannability state in version 0, okay, but
>> it has to be a catalog property not this.
>
> In large part, this ended up outside the catalogs due to key limitations of
> the startup process.

Yeah, I realize that there's no other (easy) way to make unlogged matviews reset to an invalid state on crash, but that doesn't make this design choice less of a disaster. It boxes us into something that's entirely unable to support transitions between scannable and unscannable states by any means short of a complete rewrite of the matview contents; which seems fundamentally incompatible with any sort of incremental update scenario. And I remain of the opinion that it's going to box us into not being able to fix the problems because of pg_upgrade on-disk-compatibility issues.

We will be far better off to drop unlogged matviews until we can come up with a better design. If that's so far off that no one can see it happening, well, that's tough. Leaving the door open for incremental maintenance is more important.

> A slight variation on the committed approach would be to add a "_scannable"
> relation fork.

Not very transaction-safe, I think (consider crash midway through a transaction that adds or removes the fork), and in any case orders of magnitude more expensive than looking at a pg_class field. This really needs to be catalog state, not filesystem state.

			regards, tom lane
[HACKERS] matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD)
Subject updated to account for the wider topics now appearing.

On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote:
> What I'd actually rather see us spending time on right now is making
> some provision for incremental updates, which I will boldly propose
> could be supported by user-written triggers on the underlying tables
> if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
> matviews, and allowed them to operate on a matview's contents just like
> it was a table. Now admittedly that would foreclose allowing matviews
> to be updatable in the updatable-view sense, but that's a feature I
> would readily give up if it meant users could build incremental update
> mechanisms this year and not two years down the road.

I can't see taking MVs in the direction of allowing arbitrary DML; that's what tables are for. Users wishing to do that should keep using current methods:

  CREATE VIEW mv_impl AS SELECT ...;
  CREATE TABLE mv AS SELECT * FROM mv_impl;

  -- REFRESH analogue; fancier approaches exist
  BEGIN;
  TRUNCATE mv;
  INSERT INTO mv SELECT * FROM mv_impl;
  COMMIT;

If anything, I'd help these users by introducing mechanisms for obtaining a TRUNCATE;INSERT with the bells and whistles of REFRESH MATERIALIZED VIEW. Namely, bulk index rebuilds, skipping WAL, and frozen tuples.

> > ... Making sure that
> > the heap has at least one page if data has been generated seems
> > like a not-entirely-unreasonable way to do that, although there
> > remains at least one vacuum bug to fix if we keep it, in addition
> > to Tom's concerns.
>
> No. This is an absolute disaster. It's taking something we have always
> considered to be an irrelevant implementation detail and making it into
> user-visible DDL state, despite the fact that it doesn't begin to satisfy
> basic transactional behaviors. We *need* to get rid of that aspect of
> things. If you must have scannability state in version 0, okay, but
> it has to be a catalog property not this.
In large part, this ended up outside the catalogs due to key limitations of the startup process. This isn't the first time we've arranged an unusual dance for this reason, and it probably won't be the last. Sure, we could take out unlogged MVs to evade the problem, but re-adding them will mean either restoring relfilenode-based bookkeeping or attacking the startup process limitation directly. There exist fundamental challenges to a direct attack, like the potential inconsistency of system catalogs themselves. We could teach pg_relation_is_scannable() that unlogged MVs are always non-scannable during recovery, then somehow update system catalogs in all databases at the end of recovery. Not a project for 9.3, and I wouldn't be surprised to see it mushroom in complexity. The currently-committed approach is a good one given the applicable constraints.

A slight variation on the committed approach would be to add a "_scannable" relation fork. The fork would either be absent (not scannable, if an MV) or empty (possibly scannable). Create it in CREATE MATERIALIZED VIEW sans WITH NO DATA and in REFRESH MATERIALIZED VIEW. Remove it in TRUNCATE. When the startup process reinitializes an unlogged relation, it would also remove any _scannable fork. This has a few advantages over the current approach: VACUUM won't need a special case, and pg_upgrade will be in a better position to blow away all traces if we introduce a better approach. The disadvantage is an extra inode per healthy MV. (Though it does not lead to a 9.3 solution, I'll note that an always-present relation metapage would help here.)

Note that I said "possibly scannable" -- the relfilenode-based indicator (whether the committed approach or something else) doesn't need to remain the only input to the question of scannability.
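[Editor's note] The fork-based variation can be modeled with ordinary files, purely to illustrate the proposed life cycle. The real design would go through PostgreSQL's relation-fork (smgr) machinery, not raw files; all names and the path convention below are invented:

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Toy model of the proposed "_scannable" fork: an empty marker file whose
 * mere presence means "possibly scannable".  Absence means "not scannable".
 */

/* CREATE MATERIALIZED VIEW with data, or REFRESH: create the empty fork. */
void
scannable_fork_set(const char *path)
{
    FILE *f = fopen(path, "w");   /* contents never matter, only existence */
    if (f)
        fclose(f);
}

/* TRUNCATE, or startup reinitialization of an unlogged relation. */
void
scannable_fork_clear(const char *path)
{
    unlink(path);
}

/* The scannability indicator reduces to a cheap existence check. */
bool
scannable_fork_exists(const char *path)
{
    return access(path, F_OK) == 0;
}
```

This also illustrates Tom's objection downthread: a crash between the transaction's catalog changes and the file operation leaves the marker out of sync, which is why the proposal ties fork creation to the new relfilenode rather than to in-place file twiddling.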
If 9.5 introduces the concept of age-based scannability expiration, the applicable timestamp could go in pg_class, and pg_relation_is_scannable() could check both that and the relfilenode-based indicator.

Concerning the original $SUBJECT, I would look at fixing the performance problem by having pg_relation_is_scannable() use an algorithm like this:

1. Grab the pg_class entry from syscache. If it's not found, return NULL.
2. If it's not a matview, return false.
3. Lock the matview and try to open a relcache entry. Return NULL on failure.
4. Return the scannability as reported by the relcache.

This boils down to the CASE statement you noted upthread, except putting the fast-exit logic in the function rather than in its caller(s).

nm

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com
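[Editor's note] The four steps above can be sketched as a self-contained function, with NULL pointers standing in for lookup failures. None of these names correspond to the real syscache or relcache calls (SearchSysCache1(), relation_open(), and so on); this only demonstrates the control flow:

```c
#include <stdbool.h>
#include <stddef.h>

/* SQL functions can return NULL, so model a tri-state result. */
typedef enum { SCAN_NULL, SCAN_FALSE, SCAN_TRUE } ScanResult;

/* Invented stand-ins for a syscache hit and an opened relcache entry. */
typedef struct PgClassSketch  { char relkind; } PgClassSketch;
typedef struct RelcacheSketch { bool ispopulated; } RelcacheSketch;

#define SKETCH_RELKIND_MATVIEW 'm'

/* The proposed algorithm, one numbered step per test. */
ScanResult
relation_is_scannable_sketch(const PgClassSketch *pg_class_entry,
                             const RelcacheSketch *relcache_entry)
{
    /* 1. No pg_class entry: the relation does not exist; return NULL. */
    if (pg_class_entry == NULL)
        return SCAN_NULL;

    /* 2. Not a matview: return false. */
    if (pg_class_entry->relkind != SKETCH_RELKIND_MATVIEW)
        return SCAN_FALSE;

    /* 3. Lock the matview and open a relcache entry; on failure, NULL. */
    if (relcache_entry == NULL)
        return SCAN_NULL;

    /* 4. Report the scannability as recorded in the relcache. */
    return relcache_entry->ispopulated ? SCAN_TRUE : SCAN_FALSE;
}
```

The fast exits at steps 1 and 2 are what address the pg_dump regression: the expensive lock-and-open work only happens for relations that are actually matviews.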