Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@ymail.com wrote: Noah Misch n...@leadboat.com wrote: Also, could you explain the use of RelationCacheInvalidateEntry() in ExecRefreshMatView()? CacheInvalidateRelcacheByRelid() followed by CommandCounterIncrement() is the typical pattern; this is novel. I suspect, though, neither is necessary now that the relcache does not maintain populated status based on a fork size reading. Yeah, that was part of the attempt to support unlogged materialized views while also not returning bogus results if the view had not been populated, using heap file size. I agree that this line can just come out. If there are no objections real soon now, I will remove it in master and the 9.3 branch before the release candidate. Done. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Apologies, but this sub-thread got lost when I changed email accounts. I found it in a final review to make sure nothing had fallen through the cracks. Noah Misch n...@leadboat.com wrote: On Thu, Jan 24, 2013 at 01:09:28PM -0500, Noah Misch wrote: There's no documented support for table constraints on MVs, but UNIQUE constraints are permitted: [local] test=# alter materialized view mymv add unique (c); ALTER MATERIALIZED VIEW Fix pushed. Also, could you explain the use of RelationCacheInvalidateEntry() in ExecRefreshMatView()? CacheInvalidateRelcacheByRelid() followed by CommandCounterIncrement() is the typical pattern; this is novel. I suspect, though, neither is necessary now that the relcache does not maintain populated status based on a fork size reading. Yeah, that was part of the attempt to support unlogged materialized views while also not returning bogus results if the view had not been populated, using heap file size. I agree that this line can just come out. If there are no objections real soon now, I will remove it in master and the 9.3 branch before the release candidate. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
While doing some post-commit review of the 9.3 materialized view feature, I noticed a few loose ends: On Thu, Jan 24, 2013 at 01:09:28PM -0500, Noah Misch wrote: Note that [...] ALTER TABLE ... RENAME CONSTRAINT [is] currently supported for MVs by ALTER TABLE but not by ALTER MATERIALIZED VIEW. There's no documented support for table constraints on MVs, but UNIQUE constraints are permitted: [local] test=# alter materialized view mymv add unique (c); ALTER MATERIALIZED VIEW [local] test=# alter materialized view mymv add check (c 0); ERROR: mymv is not a table [local] test=# alter materialized view mymv add primary key (c); ERROR: mymv is not a table or foreign table The above points still apply. Also, could you explain the use of RelationCacheInvalidateEntry() in ExecRefreshMatView()? CacheInvalidateRelcacheByRelid() followed by CommandCounterIncrement() is the typical pattern; this is novel. I suspect, though, neither is necessary now that the relcache does not maintain populated status based on a fork size reading. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Wed, Mar 6, 2013 at 09:16:59AM -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote: FWIW, my opinion is that doing anything like this in the planner is going to be enormously expensive. As we already said: no MVs = zero overhead = no problem. Well, in the first place that statement is false on its face: we'll still spend cycles looking for relevant MVs, or at least maintaining a complexly-indexed cache that helps us find out that there are none in a reasonable amount of time. In the second place, even if it were approximately true it wouldn't help the people who were using MVs. It costs in the cases where time savings are possible and not otherwise. And that is just complete nonsense: matching costs whether you find a match or not. Could we have a little less Pollyanna-ish optimism and a bit more realism about the likely cost of such a feature? Should we add this to the TODO list as a possibility? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Tue, Mar 5, 2013 at 08:50:39AM +, Simon Riggs wrote: Its not a different feature, its what most people expect a feature called MV to deliver. That's not a matter of opinion, its simply how every other database works currently - Oracle, Teradata, SQLServer at least. The fact that we don't allow MVs to automatically optimize Good points. queries is acceptable, as long as that is clearly marked in some way to avoid confusion, and I don't mean buried on p5 of the docs. What we have here is a partial implementation that can be improved upon over next few releases. I hope anyone isn't going to claim that Materialized Views have been implemented in the release notes in this release, because unqualified that would be seriously misleading and might even stifle further funding to improve things to the level already implemented elsewhere. Just to reiterate, I fully support the committing of this partial feature into Postgres in this release because it will be a long haul to complete the full feature and what we have here is a reasonable stepping stone to get there. Transactionally up-yo-date MVs can be used like indexes in the planner. The idea that this is impossible because of the permutations involved is somewhat ridiculous; there is much published work on optimising that and some obvious optimisations. Clearly that varies according to the number of MVs and the number of tables they touch, not the overall complexity of the query. The overhead is probably same or less as partial indexes, which we currently think is acceptable. In any case, if you don't wish that overhead, don't use MVs. While you are right that automatically using materialized views is like the optimizer choosing partial indexes, we actually already have auto-selection of row-level materialized views with expression indexes and index-only scans. When you do the insert or update, the indexed function is called and the value stored in the index. If you later query the function call, we can pull the value right from the index. This, of course, is a very crude definition of materialized view, but it seems relevant. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Thu, Mar 7, 2013 at 12:14 PM, David E. Wheeler da...@justatheory.com wrote: On Mar 7, 2013, at 7:55 AM, Kevin Grittner kgri...@ymail.com wrote: If the answer to both those questions is “yes,” I think the term should remain “table,” with a few mentions that the term includes materialized views (and excludes foreign tables). And if the answers are not exactly and yes? I still tend to think that the term should remain “table,” with brief mentions at the top of pages when the term should be assumed to represent tables and matviews, and otherwise required disambiguations. This ship has already sailed. There are plenty of places where operations apply to a subset of the relation types that exist today, and we either list them out or refer to relations generically. Changing that would require widespread changes to both the documentation and existing error message text. We cannot decide that table now means table or materialized view any more than we can decide that it means table or foreign table, as was proposed around the time those changes went in. Yeah, it's more work, and it's a little annoying, but it's also clear. Nothing else is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 6 March 2013 14:16, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote: FWIW, my opinion is that doing anything like this in the planner is going to be enormously expensive. As we already said: no MVs = zero overhead = no problem. Well, in the first place that statement is false on its face: we'll still spend cycles looking for relevant MVs, or at least maintaining a complexly-indexed cache that helps us find out that there are none in a reasonable amount of time. In the second place, even if it were approximately true it wouldn't help the people who were using MVs. We can store info in the relcache, and reduce such a lookup to a simple if test in the planner. Populating the cache would be easy enough, approx same overhead as deriving list of constraints for the relcache. If you were using MVs, there are further heuristics to apply. MVs come in various shapes, so we can assess whether they use aggregates, joins, filters etc and use that for a general match against a query. I don't see the need for complex assessments in every case. It costs in the cases where time savings are possible and not otherwise. And that is just complete nonsense: matching costs whether you find a match or not. Could we have a little less Pollyanna-ish optimism and a bit more realism about the likely cost of such a feature? It's not a trivial feature; this is a lot of work. But it can be done efficiently, without significant effect on other workloads. If that really were to be true, then enable_lookaside = off can be the default, just as we have for another costly planning feature, constraint_exclusion. Matview lookaside is the next-best-action for further work on the planner, AFAICS. Correctly optimised query parallelism is harder, IMHO. What I'm hearing at the moment is please don't make any changes in my area or don't climb the North face. Considering the rather high bar to being able to do this effectively, I do understand your interest in not having your/our time wasted by casual attempts to approach the problem, but I don't want to slam the door on a serious attempt (2 year project, 1+ man year effort). -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
David E. Wheeler da...@justatheory.com wrote: Kevin Grittner kgri...@ymail.com wrote: I also think that something should be done about the documentation for indexes. Right now that always refers to a table. It would clearly be awkward to change that to table or materialized view everywhere. I wonder if most of thosse should be changed to relation with a few mentions that the relation could be a table or a materialized view, or whether some less intrusive change would be better. Opinions welcome. Isn’t a materialized view really just a table that gets updated periodically? Not exactly. It is a relation which has some characteristics of a view (including an entry in pg_rewrite exactly like that for a view) and some characteristics of a table (including a heap and optional indexes). Whether it looks more like a table or more like a view depends on how you tilt your head. You could just as easily say that it is really just a view which periodically caches its results on disk. They really are their own thing. As future releases add more subtle freshness concepts, incremental updates, and query rewrite that unique identity will become even more conspicuous, I think. And isn’t a non-matierialized view also thought of as a “relation”? Yes. Tables, views, and materialized views are all relations. If the answer to both those questions is “yes,” I think the term should remain “table,” with a few mentions that the term includes materialized views (and excludes foreign tables). And if the answers are not exactly and yes? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Mar 7, 2013, at 7:55 AM, Kevin Grittner kgri...@ymail.com wrote: If the answer to both those questions is “yes,” I think the term should remain “table,” with a few mentions that the term includes materialized views (and excludes foreign tables). And if the answers are not exactly and yes? I still tend to think that the term should remain “table,” with brief mentions at the top of pages when the term should be assumed to represent tables and matviews, and otherwise required disambiguations. Trying to make the least possible work for you here. :-) David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
2013/3/5 Kevin Grittner kgri...@ymail.com: Perhaps it would be worth looking for anything in the patch that you think might be painting us into a corner where it would be hard to do all the other cool things. While it's late enough in the process that changing anything like that which you find would be painful, it might be a lot more painful later if we release without doing something about it. My hope, of course, is that you won't find any such thing. With this patch I've tried to provide a minimal framework onto which these other things can be bolted. I've tried hard not to do anything which would make it hard to extend, but new eyes may see something I missed. (Without having looked at the patch, or even the documentation :-/.) I think that something that might prove useful is the following: Keeping in mind the possibility of storing something in the matview’s heap that doesn’t correspond to what a SELECT * FROM matview would yield (i.e., the “logical content”). The transformation could be performed by an INSTEAD rule (similar to how a view is expanded to its definition, a reference to a matview would expand to its heap content transformed to the “logical content”). (Note that I don’t have any reason to believe that the current implementation would make this more difficult than it should be.) ridiculously long rationale for the previous (All the following requires making matviews (inter- and intra-) transactionally up-to-date w.r.t. their base tables at the moment of querying them. I don’t deal with approximate results, however useful that might be.) I think that the possibility of optimizing COUNT(*) (see mail by Greg Stark in this thread with “the queue of updates with transacion information that are periodically flattened into the aggregate”) can be generalized to generic aggregation that way. The idea would be that a transaction that adds (or deletes or updates) a row in a base table causes a “delta” row version in the matview. Selecting from the matview then merges these deltas into one value (for each row that is logically present in the matview). Every once in a while (or rather quite often, if the base tables change often), a VACUUM-like clean-up operations must be run to merge all rows that are “old enough” (i.e., whose transactions are not in flight anymore). Example of trivial aggregation matview weight_per_kind defined as: SELECT kind, SUM(weight) FROM fruit GROUP BY kind; The matview would then physically contain rows such as: xmin, xmax, kind, weight 1000, 0, 'banana', 123 1000, 0, 'apple', 1 1001, 0, 'banana', 2 1002, 0, 'banana', -3 Which means: * tx 1000 probably performed a clean-up operation and merged a bunch of banana rows together to yield 123; it also inserted an apple of weight 1. * tx 1001 inserted a banana of weight 2. Any clean-up operation coming by could not merge the 2 into the first row, as long as tx 1000 is in flight. Otherwise, it would yield 125; physically this would mean adding a 125 row, marking the 123 and 2 rows as deleted, and then waiting for VACUUM to remove them). * tx 1002 deleted a banana with weight 3. The result of a SELECT * FROM weight_per_kind; would actually execute SELECT kind, SUM_merge(weight) FROM heap_of_weight_per_kind GROUP BY kind; This would result, for tx 1001 (assuming tx 1000 committed and our snapshot can see it), in: kind, weight 'banana', 125 'apple', 1 (The -3 is not taken into account, because it is not visible to tx 1001.) The operator to use at the location of SUM_merge is something that merges multiple aggregation results (plus results that represent some kind of “negation”) together. For SUM, it would be SUM itself and the negation would be numerical negation. There might also be some kind of “difference” concept used for UPDATE: When updating a weight from 4 to 3, the difference would be -1. Those additional properties could optionally be added to the definition of each aggregation function; It must be done for each function that you want to use in such a way. Other aggregation functions such as AVG would require storing the SUM + number of rows in the matview (otherwise two AVGs could not be merged); again a difference between the heap and the logical content. Functions such as MIN and MAX are more difficult to fit in this framework: I can only see how it would work if row deletion were not allowed (which might still be a valid use-case); luckily, I think MIN and MAX are not the typical things for which you would want to use matviews, because quick computation can typically be done directly using the base tables. This whole thing would result in incremental updates of aggregation-matviews that don’t require physical serialization of the transactions that update the base tables and that query the matview, which other models (that depend on correspondence between the heap and logical content of matviews) would probably require. And that’s where I stop rambling because nobody gets this far anyway,
Re: [HACKERS] Materialized views WIP patch
On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote: FWIW, my opinion is that doing anything like this in the planner is going to be enormously expensive. Index matching is already pretty expensive, and that has the saving grace that we only do it once per base relation. Your sketch above implies trying to match to MVs once per considered join relation, which will be combinatorially worse. Even with a lot of sweat over reducing the cost of the matching, it will hurt. As we already said: no MVs = zero overhead = no problem. It costs in the cases where time savings are possible and not otherwise. The type of queries and their typical run times are different to the OLTP case, so any additional planning time is likely to be acceptable. I'm sure we can have a deep disussion about how to optimise the planner for this; I'm pretty sure there are reasonable answers to the not-small difficulties along that road. Most importantly, I want to make sure we don't swing the door shut on improvements here, especially if you (Tom) are not personally convinced enough to do the work yourself. Making transactional MVs work would be in part justified by the possible existence of automatic lookaside planning, so yes, the two things are not linked but certainly closely related. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Simon Riggs si...@2ndquadrant.com writes: On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote: FWIW, my opinion is that doing anything like this in the planner is going to be enormously expensive. As we already said: no MVs = zero overhead = no problem. Well, in the first place that statement is false on its face: we'll still spend cycles looking for relevant MVs, or at least maintaining a complexly-indexed cache that helps us find out that there are none in a reasonable amount of time. In the second place, even if it were approximately true it wouldn't help the people who were using MVs. It costs in the cases where time savings are possible and not otherwise. And that is just complete nonsense: matching costs whether you find a match or not. Could we have a little less Pollyanna-ish optimism and a bit more realism about the likely cost of such a feature? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Tatsuo Ishii is...@postgresql.org wrote: Was the remaining work on docs done? I would like to test MVs and am waiting for the docs completed. I think they are done. If you notice anything missing or in need of clarification please let me know. At this point missing docs would be a bug in need of a fix. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin, I haven't seen a reply to this. Were you able to give my notes below any consideration? On 2/15/13 12:44 PM, Peter Eisentraut wrote: On 1/25/13 1:00 AM, Kevin Grittner wrote: New patch rebased, fixes issues raised by Thom Brown, and addresses some of your points. This patch doesn't apply anymore, so I just took a superficial look. I think the intended functionality and the interfaces look pretty good. Documentation looks complete, tests are there. I have a couple of notes: * What you call WITH [NO] DATA, Oracle calls BUILD IMMEDIATE/DEFERRED. It might be better to use that as well then. * You use fields named relkind in the parse nodes, but they don't actually contain relkind values, which is confusing. I'd just name the field is_matview or something. * More generally, I wouldn't be so fond of combining the parse handling of CREATE TABLE AS and CREATE MATERIALIZED VIEW. They are similar, but then again so are a lot of other things. * Some of the terminology is inconsistent. A materialized view is sometimes called valid, populated, or built, with approximately the same meaning. Personally, I would settle on built, as per above, but it should be one term only. * I find the name of the relisvalid column a bit confusing. Especially because it only applies to materialized views, and there is already a meaning of valid for indexes. (Recall that indexes are also stored in pg_class, but they are concerned about indisvalid.) I would name it something like relmvbuilt. Btw., half of the patch seems to consist of updating places referring to relkind. Is something wrong with the meaning of relkind that this sort of thing is required? Maybe these places should be operating in terms of features, not accessing relkind directly. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Tue, Mar 5, 2013 at 9:08 PM, Robert Haas robertmh...@gmail.com wrote: All that having been said, it's hard for me to imagine that anyone really cares about any of this until we have an incremental update feature, which right now we don't. Actually, I'm betting that's going to be significantly harder than automatic-query-rewrite, when all is said and done. Automatic-query-rewrite, if and when we get it, will not be easy and will require a bunch of work from someone with a good understanding of the planner, but it strikes me as the sort of thing that might work out to one large project and then it's done. Whereas, incremental update sounds to me like a series of projects over a series of releases targeting various special cases, where we can always point to some improvements vs. release N-1 but we're never actually done and able to move on to the next thing. As a roadmap goes, I think that's OK. Even a reasonably simplistic and partial implementation of incremental update will benefit a lot of users. But in terms of relative difficulty, it's not at all obvious to me that that's the easier part of the project. While true that's true for a lot of Postgres features. The only ones that are one-shot projects are buried deep in the internals. Anything with UI implications inevitably has limitations and then other people come along and and work on removing or extending those features. I do agree with Tom though -- the most frequently asked for materialized view in the past has always been select count(*) from tab. People assume we already do this and are surprised when we don't. The cookie cutter solution for it is basically exactly what a incrementally updated materialized view solution would look like (with the queue of updates with transacion information that are periodically flattened into the aggregate). Rewriting this might be a bit tricky and require heuristics to determine just how much work to expend trying to match materialized views, this type of view would be where most of the win would be. I also can't see implementing query rewriting for non-transactionally-accurate materialized views. If people want a snapshot of the data that may be out of date that's great. I can tons of use cases for that. But then surely they won't be surprised to have to query the snapshot explicitly. If can't see going to all this trouble to implement transactions and snapshots and wal logging and so on and then silently rewriting queries to produce data that is not up to date. I think users would be surprised to find bog-standard SQL occasionally producing incorrect results. That said, there are cases where snapshots might be up to date even though we don't implement any incremental updates. If the underlying data is read-only or hasn't received any update commits since the snapshot was taken then it might still be useful. There are tons of ETL applications where you load the data once and then build MVs for it and never touch the underlying data again. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@ymail.com wrote: Tatsuo Ishii is...@postgresql.org wrote: Was the remaining work on docs done? I would like to test MVs and am waiting for the docs completed. I think they are done. If you notice anything missing or in need of clarification please let me know. At this point missing docs would be a bug in need of a fix. I decided to take another pass through to try to spot anyplace I might have missed. I found that I had missed documenting the new pg_matviews system view, so I have just pushed that. I also think that something should be done about the documentation for indexes. Right now that always refers to a table. It would clearly be awkward to change that to table or materialized view everywhere. I wonder if most of thosse should be changed to relation with a few mentions that the relation could be a table or a materialized view, or whether some less intrusive change would be better. Opinions welcome. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Mar 6, 2013, at 1:51 PM, Kevin Grittner kgri...@ymail.com wrote: I also think that something should be done about the documentation for indexes. Right now that always refers to a table. It would clearly be awkward to change that to table or materialized view everywhere. I wonder if most of thosse should be changed to relation with a few mentions that the relation could be a table or a materialized view, or whether some less intrusive change would be better. Opinions welcome. Isn’t a materialized view really just a table that gets updated periodically? And isn’t a non-matierialized view also thought of as a “relation”? If the answer to both those questions is “yes,” I think the term should remain “table,” with a few mentions that the term includes materialized views (and excludes foreign tables). Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@ymail.com wrote: Tatsuo Ishii is...@postgresql.org wrote: Was the remaining work on docs done? I would like to test MVs and am waiting for the docs completed. I think they are done. If you notice anything missing or in need of clarification please let me know. At this point missing docs would be a bug in need of a fix. Ok. I decided to take another pass through to try to spot anyplace I might have missed. I found that I had missed documenting the new pg_matviews system view, so I have just pushed that. I also think that something should be done about the documentation for indexes. Right now that always refers to a table. It would clearly be awkward to change that to table or materialized view everywhere. I wonder if most of thosse should be changed to relation with a few mentions that the relation could be a table or a materialized view, or whether some less intrusive change would be better. Opinions welcome. You might want to add small description about MVs to Tutorial documentation 3.2 Views. Here is the current description of views in the doc. - 3.2. Views Refer back to the queries in Section 2.6. Suppose the combined listing of weather records and city location is of particular interest to your application, but you do not want to type the query each time you need it. You can create a view over the query, which gives a name to the query that you can refer to like an ordinary table: CREATE VIEW myview AS SELECT city, temp_lo, temp_hi, prcp, date, location FROM weather, cities WHERE city = name; SELECT * FROM myview; Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate the details of the structure of your tables, which might change as your application evolves, behind consistent interfaces. Views can be used in almost any place a real table can be used. Building views upon other views is not uncommon. - -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 3 March 2013 23:39, Tom Lane t...@sss.pgh.pa.us wrote: Nicolas Barbier nicolas.barb...@gmail.com writes: 2013/3/3 Kevin Grittner kgri...@ymail.com: Nicolas Barbier nicolas.barb...@gmail.com wrote: I think that automatically using materialized views even when the query doesn’t mention them directly, is akin to automatically using indexes without having to mention them in the query. Oh, I understand that concept perfectly well, I just wonder how often it is useful in practice. There's a much more fundamental reason why this will never happen, which is that the query planner is not licensed to decide that you only want an approximate and not an exact answer to your query. If MVs were guaranteed always up-to-date, maybe we could think about automatic use of them --- but that's a far different feature from what Kevin has here. Its not a different feature, its what most people expect a feature called MV to deliver. That's not a matter of opinion, its simply how every other database works currently - Oracle, Teradata, SQLServer at least. The fact that we don't allow MVs to automatically optimize queries is acceptable, as long as that is clearly marked in some way to avoid confusion, and I don't mean buried on p5 of the docs. What we have here is a partial implementation that can be improved upon over next few releases. I hope anyone isn't going to claim that Materialized Views have been implemented in the release notes in this release, because unqualified that would be seriously misleading and might even stifle further funding to improve things to the level already implemented elsewhere. Just to reiterate, I fully support the committing of this partial feature into Postgres in this release because it will be a long haul to complete the full feature and what we have here is a reasonable stepping stone to get there. Transactionally up-yo-date MVs can be used like indexes in the planner. The idea that this is impossible because of the permutations involved is somewhat ridiculous; there is much published work on optimising that and some obvious optimisations. Clearly that varies according to the number of MVs and the number of tables they touch, not the overall complexity of the query. The overhead is probably same or less as partial indexes, which we currently think is acceptable. In any case, if you don't wish that overhead, don't use MVs. Non-transactionally up-to-date MVs could also be used like indexes in the planner, if we gave the planner the licence it (clearly) lacks. If using MV makes a two-hour query return in 1 minute, then using an MV that is 15 minutes out of date is likely to be a win. The licence is some kind of user parameter/option that specifies how stale an answer a query can return. For many queries that involve averages and sums, a stale or perhaps an approximate answer would hardly differ anyway. So I think there is room somewhere there for a staleness time specification by the user, allowing approximation. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Simon Riggs si...@2ndquadrant.com wrote: On 3 March 2013 23:39, Tom Lane t...@sss.pgh.pa.us wrote: Nicolas Barbier nicolas.barb...@gmail.com writes: 2013/3/3 Kevin Grittner kgri...@ymail.com: Nicolas Barbier nicolas.barb...@gmail.com wrote: I think that automatically using materialized views even when the query doesn’t mention them directly, is akin to automatically using indexes without having to mention them in the query. Oh, I understand that concept perfectly well, I just wonder how often it is useful in practice. There's a much more fundamental reason why this will never happen, which is that the query planner is not licensed to decide that you only want an approximate and not an exact answer to your query. If MVs were guaranteed always up-to-date, maybe we could think about automatic use of them --- but that's a far different feature from what Kevin has here. Its not a different feature, its what most people expect a feature called MV to deliver. That's not a matter of opinion, its simply how every other database works currently - Oracle, Teradata, SQLServer at least. The fact that we don't allow MVs to automatically optimize queries is acceptable, as long as that is clearly marked in some way to avoid confusion, and I don't mean buried on p5 of the docs. What we have here is a partial implementation that can be improved upon over next few releases. I hope anyone isn't going to claim that Materialized Views have been implemented in the release notes in this release, because unqualified that would be seriously misleading and might even stifle further funding to improve things to the level already implemented elsewhere. Just to reiterate, I fully support the committing of this partial feature into Postgres in this release because it will be a long haul to complete the full feature and what we have here is a reasonable stepping stone to get there. Transactionally up-yo-date MVs can be used like indexes in the planner. The idea that this is impossible because of the permutations involved is somewhat ridiculous; there is much published work on optimising that and some obvious optimisations. Clearly that varies according to the number of MVs and the number of tables they touch, not the overall complexity of the query. The overhead is probably same or less as partial indexes, which we currently think is acceptable. In any case, if you don't wish that overhead, don't use MVs. Non-transactionally up-to-date MVs could also be used like indexes in the planner, if we gave the planner the licence it (clearly) lacks. If using MV makes a two-hour query return in 1 minute, then using an MV that is 15 minutes out of date is likely to be a win. The licence is some kind of user parameter/option that specifies how stale an answer a query can return. For many queries that involve averages and sums, a stale or perhaps an approximate answer would hardly differ anyway. So I think there is room somewhere there for a staleness time specification by the user, allowing approximation. I don't think I disagree with any of what Simon says other than his feelings about the planning cost. Imagine that there are ten MVs that might apply to a complex query, but some of them are mutually exclusive, so there are a large number of permutations of MVs which could be used to replace parts of the original query. And maybe some of base tables have indexes which could reduce execution cost which aren't present in some or all of the MVs. And each MV has a number of indexes. The combinatorial explosion of possible plans would make it hard to constrain plan time without resorting to some crude rules about what to choose. That's not an unsolvable problem, but I see it as a pretty big problem. I have no doubt that someone could take a big data warehouse and add one or two MVs and show a dramatic improvement in the run time of a query. Almost as big as if the query were rewritten to usee the MV directly. It would make for a very nice presentation, and as long as they are used sparingly this could be a useful tool for a data warehouse environment which is playing with alternative ways to optimize slow queries which pass a lot of data. In other environments, I feel that it gets a lot harder to show a big win. The good news is that it sounds like we agree on the ideal long-term feature set. I'm just a lot more excited, based on the use-cases I've seen, about the addition of incremental updates than substituting MVs into query plans which reference the underlying tables. Perhaps that indicates a chance to the final feature set sooner, through everyone scratching their own itches. :-) And we both seem to feel that some system for managing acceptable levels of MV freshness is a necessary feature in order to go very much further. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list
Re: [HACKERS] Materialized views WIP patch
On 5 March 2013 12:15, Kevin Grittner kgri...@ymail.com wrote: I don't think I disagree with any of what Simon says other than his feelings about the planning cost. Imagine that there are ten MVs that might apply to a complex query, but some of them are mutually exclusive, so there are a large number of permutations of MVs which could be used to replace parts of the original query. And maybe some of base tables have indexes which could reduce execution cost which aren't present in some or all of the MVs. And each MV has a number of indexes. The combinatorial explosion of possible plans would make it hard to constrain plan time without resorting to some crude rules about what to choose. That's not an unsolvable problem, but I see it as a pretty big problem. If we are proposing that we shouldn't try to optimise because its not usefully solvable, then I would disagree. If we are saying that more plans are possible with MVs, then I'd say, yes there *could* be - that's the one of the purposes. That represents more options for optimisation and we should be happy, not sad about that. Yes, we would need some thought to how those potential optimisations can be applied without additional planning cost, but I see that as a long term task, not a problem. The question is will the potential for additional planning time actually materialise into a planning problem? (See below). I have no doubt that someone could take a big data warehouse and add one or two MVs and show a dramatic improvement in the run time of a query. Almost as big as if the query were rewritten to usee the MV directly. It would make for a very nice presentation, and as long as they are used sparingly this could be a useful tool for a data warehouse environment which is playing with alternative ways to optimize slow queries which pass a lot of data. In other environments, I feel that it gets a lot harder to show a big win. Are there realistically going to be more options to consider? In practice, no, because in most cases we won't be considering both MVs and indexes. Splatting MVs on randomly won't show any more improvement than splatting indexes on randomly helps. Specific optimisations help in specific cases only. And of course, adding extra data structures that have no value certainly does increase planning time. Presumably we'd need some way of seeing how frequently MVs were picked, so we could drop unused MVs just like we can indexes. * Indexes are great at speeding up various kinds of search queries. If you don't have any queries like that, they help very little. * MVs help in specific and restricted use cases, but can otherwise be thought of as a new kind of index structure. MVs help with joins and aggregations, so if you don't do much of that, you'll see no benefit. That knowledge also allows us to develop heuristics for sane optimisation. If the MV has a GROUP BY in it, and a query doesn't, then it probably won't help much to improve query times. If it involves a join you aren't using, then that won't help either. My first guess would be that we don't even bother looking for MV plans unless it has an aggregated result, or a large scale join. We do something similar when we look for plans that use indexes when we have appropriate quals - no quals, no indexes. As a result, I don't see MVs increasing planning times for requests that would make little or no use of them. There will be more planning time on queries that could make use of them and that is good because we really care about that. Sure, a badly written planner might cost more time than it saves. All of this work requires investment from someone with the time and/or experience to make a good go at it. I'm not pushing Tom towards it, nor anyone else, but I do want to see the door kept open for someone to do this when possible (i.e. not GSoC). -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Simon, Kevin, all: Actually, there was already an attempt at automated MV query planning as a prior university project. We could mine that for ideas. Hmmm. I thought it was on pgfoundry, but it's not. Does anyone have access to ACM databases etc. so they could search for this? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner kgri...@ymail.com wrote: I don't think I disagree with any of what Simon says other than his feelings about the planning cost. Imagine that there are ten MVs that might apply to a complex query, but some of them are mutually exclusive, so there are a large number of permutations of MVs which could be used to replace parts of the original query. And maybe some of base tables have indexes which could reduce execution cost which aren't present in some or all of the MVs. And each MV has a number of indexes. The combinatorial explosion of possible plans would make it hard to constrain plan time without resorting to some crude rules about what to choose. That's not an unsolvable problem, but I see it as a pretty big problem. I'm not sure I agree. Suppose you have a query like SELECT * FROM a INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE some stuff. The query planner will construct paths for scans on a, b, and c. Then it will construct joinrels for (a b), (a c), (b c), and eventually (a b c) and calculate a set of promising paths for each of them. If there is a materialized view available for one of those joinrels, all we really need to do is add the possible paths for scanning that materialized view to the joinrel. They'll either be better than the existing paths, or they won't. If they're better, the paths that otherwise would have gotten chosen will get kicked out. If they're worse, the materialized-view paths will get kicked out. Either way, we don't have any more paths than we would have otherwise - so no combinatorial explosion. There is one case where we might end up with more paths than we had before. Suppose there's a materialized view on the query SELECT a.x, a.y, a.z, b.t FROM a INNER JOIN b ON a.x = b.x ORDER BY a.z and the users enters just that query. Suppose further that the materialized view has an index on column z, but table a does not. In that case, the best paths for the joinrel (a b) not involving the materialized view will be an unordered path, but we could scan the materialized view using the index and so will have a path that is ordered by a.z to add to the joinrel. This path will stick around even if it's more expensive than the unordered path because we know it avoids a final sort. So in that case we do have more paths, but they are potentially useful paths, so I don't see that as a problem. It seems to me that the tricky part of this is not that it might add a lot more paths (because I don't think it will, and if it does I think they're useful paths), but that figuring out whether or not a materialized view matches any particular joinrel might be expensive. I think we're going to need a pretty accurate heuristic for quickly discarding materialized views that can't possibly be relevant to the query as a whole, or to particular joinrels. There are a couple of possible ways to approach that. The most manual of these is probably to have a command like ALTER TABLE x {ENABLE | DISABLE } REWRITE MATERIALIZED y, where the user has to explicitly ask for materialized views to be considered, or else they aren't. That sort of fine-grained control might have other benefits as well. I think a more automated solution is also possible, if we want it. For a materialized view to match a query, all of the baserels in the materialized view must also be present in the query. (Actually, there are situations where this isn't true; e.g. the materialized view has an extra table, but it's joined in a way that could be pruned away by the join removal code, but I think we could ignore such somewhat-pathological cases at least initially.) It seems to me that if we could figure out a very-cheap way to throw away all of the materialized views that don't pass that basic test, we'd be reasonably close to a workable solution. A database with tons of materialized views defined on the same set of target relations might still have some planning time problems, but so might a database with tons of indexes. In that regard, what I was thinking about is to use something sort of like a Bloom filter. Essentially, produce a fingerprint for each materialized view. For the sake of argument, let's say that the fingerprint is a 64-bit integer, although it could be a bit string of any length. To construct the fingerprint, hash the OID of each relation involved in the view onto a bit position between 0 and 63. Set the bits for all relations involved in the materialized view. Then, construct a fingerprint for the query in the same way. Any materialized view where (matview_fingerprint query_fingerprint) != matview_fingerprint needn't be considered; moreover, for a given joinrel, any materialized view where matview_fingerprint != joinrel_fingerprint (computed using the same scheme) needn't be considered. Of course, a matching fingerprint doesn't mean that the materialized view matches, or even necessarily that it involves the correct
Re: [HACKERS] Materialized views WIP patch
Josh Berkus j...@agliodbs.com wrote: There is no shortage of literature on the topic, although any papers from the ACM could certainly be of interest due to the general high quality of papers published there. Adding anything like this to 9.3 is clearly out of the question, though, so I really don't want to spend time researching this now, or encouraging others to do so until after we have a 9.3 release candidate. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Robert Haas robertmh...@gmail.com wrote: All that having been said, it's hard for me to imagine that anyone really cares about any of this until we have an incremental update feature, which right now we don't. These are actually independent of one another, as long as we nail down how we're determining freshness -- which is probably needed for either. Someone who's immersed in tuning long-running DW queries might be interested in this before incremental update. (They might load the data once per month, so refreshing the MVs as a step in that process might be cheaper than incrementally maintaining them.) Someone could base freshness on pg_relation_is_scannable() and start working on automatic query rewrite right now, if they wanted to. Actually, I'm betting that's going to be significantly harder than automatic-query-rewrite, when all is said and done. Automatic-query-rewrite, if and when we get it, will not be easy and will require a bunch of work from someone with a good understanding of the planner, but it strikes me as the sort of thing that might work out to one large project and then it's done. I still think we're going to hit the wall on planning time under certain circumstances and need to tweak that over the course of several releases, but now is not the time to get into the details of why I think that. We've spent way too much time on it already for the point we're at in the 9.3 cycle. I've kept my concerns hand-wavy on purpose, and am trying hard to resist the temptation to spend a lot of time demonstrating the problems. Whereas, incremental update sounds to me like a series of projects over a series of releases targeting various special cases, where we can always point to some improvements vs. release N-1 but we're never actually done Exactly. I predict that we will eventually have some special sort of trigger for maintaining MVs based on base table changes to handle the ones that are just too expensive (in developer time or run time) to fully automate. But there is a lot of low-hanging fruit for automation. Even a reasonably simplistic and partial implementation of incremental update will benefit a lot of users. Agreed. But in terms of relative difficulty, it's not at all obvious to me that that's the easier part of the project. I totally agree that getting something working to use MVs in place of underlying tables is not all that different or more difficult than using partial indexes. I just predict that we'll get a lot of complaints about cases where it results in worse performance and we'll need to deal with those issues. I don't seem that as being brain surgery; just a messy matter of trying to get this pretty theory to work well in the real world -- probably using a bunch of not-so-elegant heuristics. And in the end, the best you can hope for is performance not noticeably worse than you would get if you modified your query to explicitly use the MV(s) -- you're just saving yourself the rewrite. Well, OK, there is the point that, (like indexes) if you run the query which hits the base tables with different parameters, and a new plan is generated each time, it might pick different MVs or exclude them as is most efficient for the given parameters. That's the Holy Grail of all this. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
2013/3/5 Robert Haas robertmh...@gmail.com: All that having been said, it's hard for me to imagine that anyone really cares about any of this until we have an incremental update feature, which right now we don't. Actually, I'm betting that's going to be significantly harder than automatic-query-rewrite, when all is said and done. I agree. E.g., things such as keeping a matview consistent relative to changes applied to the base tables during the same transaction, might be mightily difficult to implement in a performant way. OTOH, matviews that can only be used for optimization if their base tables were not changed “too recently” (e.g., by transactions that are still in flight, including the current transaction), are probably kind of useful in themselves as long as those base tables are not updated all the time. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Robert Haas robertmh...@gmail.com writes: On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner kgri...@ymail.com wrote: I don't think I disagree with any of what Simon says other than his feelings about the planning cost. I'm not sure I agree. Suppose you have a query like SELECT * FROM a INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE some stuff. The query planner will construct paths for scans on a, b, and c. Then it will construct joinrels for (a b), (a c), (b c), and eventually (a b c) and calculate a set of promising paths for each of them. If there is a materialized view available for one of those joinrels, all we really need to do is add the possible paths for scanning that materialized view to the joinrel. That only works to the extent that a materialized view can be described by a path. My impression is that most of the use-cases for MVs will involve aggregates or similar data reduction operators, and we don't currently implement anything about aggregates at the Path level. Arguably it would be useful to do so; in particular, we could get rid of the currently hard-wired mechanism for choosing between sorted and hashed aggregation, and perhaps there'd be a less grotty way to deal with index-optimized MIN/MAX aggregates. But there's a great deal to do to make that happen, and up to now I haven't seen any indication that it would do much except add overhead. FWIW, my opinion is that doing anything like this in the planner is going to be enormously expensive. Index matching is already pretty expensive, and that has the saving grace that we only do it once per base relation. Your sketch above implies trying to match to MVs once per considered join relation, which will be combinatorially worse. Even with a lot of sweat over reducing the cost of the matching, it will hurt. All that having been said, it's hard for me to imagine that anyone really cares about any of this until we have an incremental update feature, which right now we don't. Agreed. Even if we're willing to have an approximate results are OK GUC (which frankly strikes me as a horrid idea), people would certainly not be willing to turn it on without some guarantee as to how stale the results could be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
2013/3/5 Kevin Grittner kgri...@ymail.com: Exactly. I predict that we will eventually have some special sort of trigger for maintaining MVs based on base table changes to handle the ones that are just too expensive (in developer time or run time) to fully automate. But there is a lot of low-hanging fruit for automation. I think it would be totally OK to restrict the possible definitions for matviews that can be maintained fully incrementally to something like: SELECT attributes and aggregations FROM trivial joins WHERE trivial condition GROUP BY attributes; Those definitions are the most useful for optimizing the things that matviews are good at (joins and aggregation). Nicolas PS. Sorry for having fired off this discussion that obviously doesn’t really relate to the current patch. -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Nicolas Barbier nicolas.barb...@gmail.com wrote: PS. Sorry for having fired off this discussion that obviously doesn’t really relate to the current patch. I know it's hard to resist. While I think there will be a number of people for whom the current patch will be a convenience and will therefore use it, it is hard to look at what's there and *not* go if only it also... Perhaps it would be worth looking for anything in the patch that you think might be painting us into a corner where it would be hard to do all the other cool things. While it's late enough in the process that changing anything like that which you find would be painful, it might be a lot more painful later if we release without doing something about it. My hope, of course, is that you won't find any such thing. With this patch I've tried to provide a minimal framework onto which these other things can be bolted. I've tried hard not to do anything which would make it hard to extend, but new eyes may see something I missed. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 03/05/2013 01:09 PM, Kevin Grittner wrote: Josh Berkus j...@agliodbs.com wrote: There is no shortage of literature on the topic, although any papers from the ACM could certainly be of interest due to the general high quality of papers published there. Adding anything like this to 9.3 is clearly out of the question, though, so I really don't want to spend time researching this now, or encouraging others to do so until after we have a 9.3 release candidate. Good point. Just FYI: once we start work on 9.4, some university team got planner stuff for matviews working with postgres, using version 8.0 or something. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@ymail.com wrote: REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA Given the short time, I left out the [, ...]. If people think that is important to get into this release, a follow-on patch might be possible. Barring objections, I will use the above and push tomorrow. I'm still working on docs, and the changes related to the syntax change are still only lightly tested, but as far as I know, all is complete except for the docs. I'm still working on those and expect to have them completed late today. I'm posting this patch to allow a chance for final review of the code changes before I push. Was the remaining work on docs done? I would like to test MVs and am waiting for the docs completed. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 03-03-2013 21:27, Josh Berkus wrote: I think it would be worth talking about when someone wants to implement it. I'd imagine it would require setting a GUC, though, which would be off by default for obvious reasosn. -1. Why not adding another storage_parameter, say auto_refresh=on? Also, that's another feature. And of limited utility, as mentioned. For a first release, that is fine as is. Let's not complicate a feature that has been widely discussed in this development cycle. -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 2 March 2013 15:06, Kevin Grittner kgri...@ymail.com wrote: [ ... ] led to this thought: REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA [Sorry to join this discussion so late] FWIW I had a quick look at other DBs to see if there were any other precedents out there. Oracle was the only one I could find with anything similar. They use the same creation syntax: CREATE MATERIALIZED VIEW name [options] AS SELECT ... and they use ALTER for everything else, such as refreshing the MV: ALTER MATERIALIZED VIEW name REFRESH [options]; AFAICT the nearest thing they have to TRUNCATE/DISCARD is: ALTER MATERIALIZED VIEW name CONSIDER FRESH; They do also support updateable materialized views with standard DML, but it doesn't look as though they allow TRUNCATE to operate directly on a MV (although it can be made to propagate from a base table to a MV, in which case allowing TRUNCATE on the MV itself with a different meaning would likely be confusing). Oracle's MVs have lots of options, all of which hang off the 2 basic CREATE and ALTER commands. There's a certain appeal to that, rather than inventing or overloading a bunch of other commands as more options get added. The proposed REFRESH command is OK for today's options, but I think it might be overly limiting in the future. Of course, since this isn't in the SQL standard, we are free to use any syntax we like. We don't have to follow Oracle, but having a common syntax might make some people's lives easier, and I haven't seen a convincing argument as to why any alternative syntax is better. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Dean Rasheed dean.a.rash...@gmail.com wrote: Kevin Grittner kgri...@ymail.com wrote: [ ... ] led to this thought: REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA [Sorry to join this discussion so late] FWIW I had a quick look at other DBs to see if there were any other precedents out there. Oracle was the only one I could find with anything similar. They use the same creation syntax: CREATE MATERIALIZED VIEW name [options] AS SELECT ... It is a pretty obvious choice when you look at other SQL statements. and they use ALTER for everything else, such as refreshing the MV: ALTER MATERIALIZED VIEW name REFRESH [options]; No, that is for specifiying when and under what conditions an automatic refresh is done. To do an immediate action which is equivalent to what I have for the REFRESH statement, they use a REFRESH() function. That seemed too incompatible with how we've done everything else in PostgreSQL -- I felt that a statement would make more sense. Consider REINDEX, CLUSTER, and VACUUM FULL for example. AFAICT the nearest thing they have to TRUNCATE/DISCARD is: ALTER MATERIALIZED VIEW name CONSIDER FRESH; No, that doesn't rebuild or discard data -- if the MV is out-of-date and therefore unscannable according to the how the MV has been set up, this overrides that indication and allows scanning in spite of that. They do also support updateable materialized views with standard DML, but it doesn't look as though they allow TRUNCATE to operate directly on a MV (although it can be made to propagate from a base table to a MV, in which case allowing TRUNCATE on the MV itself with a different meaning would likely be confusing). They allow DML on the MV in order to update it. The default REFRESH() function executes a TRUNCATE statement followed by INSERT / SELECT using the MV's query. Oracle's MVs have lots of options, all of which hang off the 2 basic CREATE and ALTER commands. There's a certain appeal to that, rather than inventing or overloading a bunch of other commands as more options get added. The proposed REFRESH command is OK for today's options, but I think it might be overly limiting in the future. For what ALTER MATERIALIZED VIEW in Oracle does, I think it makes sense to use ALTER. I don't think this feature should use functions for REFRESH. Why Oracle can get away with functions for it is that they allow DML on an MV, which seems to me to compromise the integrity of the feature, at least as default behavior. I see us supporting automatic incremental updates of progressively more complex queries, and we may at some point want to add a trigger-based maintenance option; but the functionality available with a trigger-based approach is almost entirely availaable in PostgreSQL today without this feature. Rewriting queries using expressions which match the MV's query to pull from the MV instead of the underlying tables is the exception. While that is a sexy feature, and I'm sure one can construct examples where it helps performance, it seems to me unlikely to be very generally useful. I suspect that it exists mostly so that people who want to write an RFP to pick a particular product can include that as a requirement. In other words, I think the main benefit of automatic rewrite using an MV is marketing, not technical or performance. That's important, too; but let's focus first on getting what is most useful. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 3 March 2013 13:12, Kevin Grittner kgri...@ymail.com wrote: Dean Rasheed dean.a.rash...@gmail.com wrote: Kevin Grittner kgri...@ymail.com wrote: [ ... ] led to this thought: REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA [Sorry to join this discussion so late] FWIW I had a quick look at other DBs to see if there were any other precedents out there. Oracle was the only one I could find with anything similar. They use the same creation syntax: CREATE MATERIALIZED VIEW name [options] AS SELECT ... It is a pretty obvious choice when you look at other SQL statements. and they use ALTER for everything else, such as refreshing the MV: ALTER MATERIALIZED VIEW name REFRESH [options]; No, that is for specifiying when and under what conditions an automatic refresh is done. To do an immediate action which is equivalent to what I have for the REFRESH statement, they use a REFRESH() function. That seemed too incompatible with how we've done everything else in PostgreSQL -- I felt that a statement would make more sense. Consider REINDEX, CLUSTER, and VACUUM FULL for example. AFAICT the nearest thing they have to TRUNCATE/DISCARD is: ALTER MATERIALIZED VIEW name CONSIDER FRESH; No, that doesn't rebuild or discard data -- if the MV is out-of-date and therefore unscannable according to the how the MV has been set up, this overrides that indication and allows scanning in spite of that. Ah, OK I see. I misunderstood what the Oracle docs were saying. ALTER only changes the MV's definition, whereas their REFRESH() function and your REFRESH statement updates the data in the MV. That makes much more sense. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
2013/3/3 Kevin Grittner kgri...@ymail.com: Rewriting queries using expressions which match the MV's query to pull from the MV instead of the underlying tables is the exception. While that is a sexy feature, and I'm sure one can construct examples where it helps performance, it seems to me unlikely to be very generally useful. I suspect that it exists mostly so that people who want to write an RFP to pick a particular product can include that as a requirement. In other words, I think the main benefit of automatic rewrite using an MV is marketing, not technical or performance. I think that automatically using materialized views even when the query doesn’t mention them directly, is akin to automatically using indexes without having to mention them in the query. That way, queries can be written the natural way, and “creating materialized views” is an optimization that can be applied by a DBA without having to update the application queries to use them. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Nicolas Barbier nicolas.barb...@gmail.com wrote: 2013/3/3 Kevin Grittner kgri...@ymail.com: Rewriting queries using expressions which match the MV's query to pull from the MV instead of the underlying tables is the exception. While that is a sexy feature, and I'm sure one can construct examples where it helps performance, it seems to me unlikely to be very generally useful. I suspect that it exists mostly so that people who want to write an RFP to pick a particular product can include that as a requirement. In other words, I think the main benefit of automatic rewrite using an MV is marketing, not technical or performance. I think that automatically using materialized views even when the query doesn’t mention them directly, is akin to automatically using indexes without having to mention them in the query. That way, queries can be written the natural way, and “creating materialized views” is an optimization that can be applied by a DBA without having to update the application queries to use them. Oh, I understand that concept perfectly well, I just wonder how often it is useful in practice. The cost of planning with indexes tends to go up dramatically the planner needs to evaluate all possible combinations of access paths. We've devoted quite a bit of energy keeping that from being something like the factorial of the number of indexes. If you now need to find all materialized views which could substitute for parts of a query, and look at all permutations of how those could be used, and which indexes can be used for each of those combinations, you have planning time which can explode to extreme levels. Where the number of database objects are small and their sizes are large (like some data warehouse situations), you could come out ahead; and if I wanted to showcase the capability you describe that's what I would use. With a large number of database objects with only a few tens of millions of rows per table, I doubt you will come out ahead. Granted, you could say the same thing about indexes, and they are very often useful. I'm saying that I expect the usefulness of the technique you describe is generally very low, but not zero. Except for marketing, where it's a flashy feature. I would be interested in seeing information to show where it works well, though. There is probably something to be learned by looking at the details of the environment and workload of such a site. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Sat, Mar 2, 2013 at 3:06 PM, Kevin Grittner kgri...@ymail.com wrote: (1) Any DML against the MV would need to be limited to some context fired by the underlying changes. If we allow changes to the MV outside of that without it being part of some updateable MV feature (reversing the direction of flow of changes), the MV could not be trusted at all. If you're going to do that, just use a table. Oh! I misunderstood what you were suggesting. I think we were talking at cross-purposes. You're imagining a user issues truncate against the underlying table(s) and the code that handles updating the materialized view will need to issue a truncate against the MV to update it. I was imagining that you wanted to be able to issue DML against the MV just as one can against an updateable view. That DML should propagate to the underlying table(s) through various magic. It's a pretty theoretical fear now but one day it may be important to avoid confusion between these two. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
2013/3/3 Kevin Grittner kgri...@ymail.com: Nicolas Barbier nicolas.barb...@gmail.com wrote: I think that automatically using materialized views even when the query doesn’t mention them directly, is akin to automatically using indexes without having to mention them in the query. That way, queries can be written the natural way, and “creating materialized views” is an optimization that can be applied by a DBA without having to update the application queries to use them. Oh, I understand that concept perfectly well, I just wonder how often it is useful in practice. The cost of planning with indexes tends to go up dramatically the planner needs to evaluate all possible combinations of access paths. We've devoted quite a bit of energy keeping that from being something like the factorial of the number of indexes. If you now need to find all materialized views which could substitute for parts of a query, and look at all permutations of how those could be used, and which indexes can be used for each of those combinations, you have planning time which can explode to extreme levels. I guess that a basic version of such a feature would do something like this: (1) Check for each matview whether it simply consists of an optional bunch of joins + optional aggregation + optional general filter conditions (to support something akin to a partial index). If not, the optimization doesn’t take this matview into account. This step can be done beforehand. (2) Check for each (sub)query in the query-to-optimize whether it does the following (at a smart point in the optimization phase). If any of these conditions are not met, don’t use this matview: - Joins at least the tables that are joined in the matview. - Contains join conditions and general filter conditions that are at least as strict. - Doesn’t refer elsewhere to any attributes that the matview doesn’t contain. (3) Always replace the corresponding query parts with the matview (i.e., assume that the cost will always be lower than performing the original query). (4) If multiple matviews fit in step 3, try them all (and use the one that yields the lower total cost). (5) Always replace any aggregation with the corresponding aggregation-results (if they exist) from the matview. That doesn’t sound as if it would make planning time explode that much (except, because of step 4, when there are many matviews that contain overlapping sets of joined tables, and a query joins over the union of those sets, *and* the replaceable-by-matviews parts are in subqueries). It could even decrease it significantly (e.g., in the case where a bunch of joins would be replaced with a scan of a matview). Also, I suppose that once such functionality exists, application writers would be more inclined to write “heavy” queries that do lots of aggregation even in an OLTP environment, of the kind that is these days typically only done in OLAP environments. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Nicolas Barbier nicolas.barb...@gmail.com writes: 2013/3/3 Kevin Grittner kgri...@ymail.com: Nicolas Barbier nicolas.barb...@gmail.com wrote: I think that automatically using materialized views even when the query doesnât mention them directly, is akin to automatically using indexes without having to mention them in the query. Oh, I understand that concept perfectly well, I just wonder how often it is useful in practice. There's a much more fundamental reason why this will never happen, which is that the query planner is not licensed to decide that you only want an approximate and not an exact answer to your query. If MVs were guaranteed always up-to-date, maybe we could think about automatic use of them --- but that's a far different feature from what Kevin has here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
There's a much more fundamental reason why this will never happen, which is that the query planner is not licensed to decide that you only want an approximate and not an exact answer to your query. I think it would be worth talking about when someone wants to implement it. I'd imagine it would require setting a GUC, though, which would be off by default for obvious reasosn. If MVs were guaranteed always up-to-date, maybe we could think about automatic use of them --- but that's a far different feature from what Kevin has here. And of limited utility, as mentioned. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@ymail.com wrote: I'm still working on docs, and the changes related to the syntax change are still only lightly tested, but as far as I know, all is complete except for the docs. I'm still working on those and expect to have them completed late today. I'm posting this patch to allow a chance for final review of the code changes before I push. Pushed. I'll be keeping an eye on the buildfarm. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 03/04/2013 08:27 AM, Josh Berkus wrote: There's a much more fundamental reason why this will never happen, which is that the query planner is not licensed to decide that you only want an approximate and not an exact answer to your query. I think it would be worth talking about when someone wants to implement it. I'd imagine it would require setting a GUC, though, which would be off by default for obvious reasosn. I'm not a fan of this, even with a GUC. Imagine doing remote debugging by email/phone. There are enough things to check already (does your application REALLY commit that transaction?) without also having to deal with settings that can cause a potentially out of date view of the data to be used without it being visible in the query its self. I hate to even say it, but this is where a per-query [redacted] would be good, so we could say in the query text that this query may use matviews that are not perfectly up to date. At this point it's all hand-waving anyway, since no feature to allow the planner to automatically rewrite a subtree of a query to use a matview instead exists. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Fri, Mar 1, 2013 at 3:01 PM, Ants Aasma a...@cybertec.at wrote: . To give another example of potential future update semantics, if we were to allow users manually maintaining materialized view contents using DML commands, one would expect TRUNCATE to mean make this matview empty, not make this matview unavailable. Wouldn't that just be a regular table then though? How is that a materialized view? If anything someone might expect truncate to delete any rows from the source table that appear in the view. But I think it's likely that even if materialized views were updateable truncate wouldn't be one of the updateable operations. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Greg Stark st...@mit.edu wrote: Ants Aasma a...@cybertec.at wrote: To give another example of potential future update semantics, if we were to allow users manually maintaining materialized view contents using DML commands, one would expect TRUNCATE to mean make this matview empty, not make this matview unavailable. Wouldn't that just be a regular table then though? How is that a materialized view? If anything someone might expect truncate to delete any rows from the source table that appear in the view. But I think it's likely that even if materialized views were updateable truncate wouldn't be one of the updateable operations. Yeah, the only way it would make sense to truncate MV contents from a user-written maintenance trigger (assuming we might have such a thing some day) would be if you decided that the change to the underlying data was so severe that you effectively needed to REFRESH, and there would probably be a better way to go about dealing with that. Two other reasons that this might not be a problem are: (1) Any DML against the MV would need to be limited to some context fired by the underlying changes. If we allow changes to the MV outside of that without it being part of some updateable MV feature (reversing the direction of flow of changes), the MV could not be trusted at all. If you're going to do that, just use a table. (2) I can think of a couple not-too-horrible syntax tricks we could use to escape from the corner we're worried about painting ourselves into. All of that said, some combination of Heikki's previous suggestion that maybe REFRESH could be used and my noticing that both TRUNCATE and REFRESH create a new heap for an MV, it's just a question of whether we then run the MV's query to fill it with data, led to this thought: REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA This sort of mirrors the CREATE MATERIALIZED VIEW style (which was based on CREATE TABLE AS) and WITH NO DATA puts the MV into the unscannable state either way. I can change the parser to make this literally just the new spelling of TRUNCATE MATERIALIZED VIEW without dashing my hopes of pushing the patch tomorrow. (The patch has been ready to go for weeks other than this syntax issue and documentation which needs to refer to whatever syntax is chosen.) Barring objections, I will use the above and push tomorrow. The only issues which have been raised which will not be addressed at that point are: (1) The suggestion that ALTER MATERIALIZED VIEW name ALTER column support changing the collation isn't something I can see how to do without complication and risk beyond what is appropriate at this point in the release cycle. It will be left off, at least for now. To get a new collation, you will need to drop the MV and re-create it with a query which specifies the collation for the result column. (2) The sepgsql changes are still waiting for a decision from security focused folks. I have two patches for that contrib module ready based on my best reading of things -- one which uses table security labels and one which instead uses new matview labels. When we get a call on which is preferred, I suspect that one patch or the other will be good as-is or with minimal change. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Thu, Feb 28, 2013 at 7:52 PM, Kevin Grittner kgri...@ymail.com wrote: Barring a sudden confluence of opinion, I will go with TRUNCATE for the initial spelling. I tend to favor that spelling for several reasons. One was the size of the patch needed to add the opposite of REFRESH to the backend code: FWIW, I found Andres's point about closing the door on updatable views quite convincing. If at any point we want to add updatable materialized views, it seems like a bad inconsistency to have TRUNCATE mean something completely different from DELETE. While update capability for materialized views might not be a common use case, I don't think it's fair to completely shut the door on it to have easier implementation and shorter syntax. Especially as the shorter syntax is semantically inconsistent - normal truncate removes the data, materialized view just makes the data inaccessible until the next refresh. Sorry for weighing in late, but it seemed to me that this point didn't get enough consideration. Ants Aasma -- Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Ants Aasma a...@cybertec.at wrote: Kevin Grittner kgri...@ymail.com wrote: Barring a sudden confluence of opinion, I will go with TRUNCATE for the initial spelling. I tend to favor that spelling for several reasons. One was the size of the patch needed to add the opposite of REFRESH to the backend code: FWIW, I found Andres's point about closing the door on updatable views quite convincing. If at any point we want to add updatable materialized views, it seems like a bad inconsistency to have TRUNCATE mean something completely different from DELETE. While update capability for materialized views might not be a common use case, I don't think it's fair to completely shut the door on it to have easier implementation and shorter syntax. Especially as the shorter syntax is semantically inconsistent - normal truncate removes the data, materialized view just makes the data inaccessible until the next refresh. Sorry for weighing in late, but it seemed to me that this point didn't get enough consideration. Personally, I don't understand why anyone would want updateable materialized views. That's probably because 99% of the cases where I've seen that someone wanted them, they wanted them updated to match the underlying data using some technique that didn't require the modification or commit of the underlying data to carry the overhead of maintaining the MV. In other words, they do not want the MV to be up-to-date for performance reasons. That is a big part of the reason for *wanting* to use an MV. How do you make an asynchronously-maintained view updateable? In addtion, at least 80% of the cases I've seen where people want an MV it is summary information, which does not tie a single MV row to a single underlying row. If someone updates an aggregate number like an average, I see no reasonable way to map that to the underlying data in a meaningful way. I see the contract of a materialized view as providing a table-backed relation which shows the result set of a query as of some point in time. Perhaps it is a failure of imagination, but I don't see where modifying that relation directly is compatible with that contract. Can you describe a meaningful use cases for an udpateable materialized view? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Fri, Mar 1, 2013 at 4:18 PM, Kevin Grittner kgri...@ymail.com wrote: Personally, I don't understand why anyone would want updateable materialized views. That's probably because 99% of the cases where I've seen that someone wanted them, they wanted them updated to match the underlying data using some technique that didn't require the modification or commit of the underlying data to carry the overhead of maintaining the MV. In other words, they do not want the MV to be up-to-date for performance reasons. That is a big part of the reason for *wanting* to use an MV. How do you make an asynchronously-maintained view updateable? In addtion, at least 80% of the cases I've seen where people want an MV it is summary information, which does not tie a single MV row to a single underlying row. If someone updates an aggregate number like an average, I see no reasonable way to map that to the underlying data in a meaningful way. I see the contract of a materialized view as providing a table-backed relation which shows the result set of a query as of some point in time. Perhaps it is a failure of imagination, but I don't see where modifying that relation directly is compatible with that contract. Can you describe a meaningful use cases for an udpateable materialized view? I actually agree that overwhelming majority of users don't need or want updateable materialized views. My point was that we can't at this point rule out that people will think of a good use for this. I don't have any real use cases for this, but I can imagine a few situations where updateable materialized views wouldn't be nonsensical. One case would be if the underlying data is bulkloaded and is subsetted into smaller materialized views for processing using off-the-shelf tools that expect tables. One might want to propagate changes from those applications to the base data. The other case would be the theoretical future where materialized views can be incrementally and transactionally maintained, in that case being able to express modifications on the views could actually make sense. I understand that the examples are completely hypothetical and could be solved by using regular tables. I just have feeling that will regret conflating TRUNCATE semantics for slight implementation and notation convenience. To give another example of potential future update semantics, if we were to allow users manually maintaining materialized view contents using DML commands, one would expect TRUNCATE to mean make this matview empty, not make this matview unavailable. Ants Aasma -- Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier michael.paqu...@gmail.com wrote: it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html DISCARD would be better. Well, personally, I'm in favor of either TRUNCATE or ALTER MATERIALIZED VIEW ... DISCARD. I think it's a dangerous precedent to suppose that we're going to start using DISCARD for things that have nothing to do with the existing meanings of DISCARD. Number one, I think it's confusing. Number two, it's currently possible to determine whether something is DDL, DML, or other by looking at the first word of the command. If we throw that out the window we may cause performance issues for connection pooling software that tries to be clever like that. Mind you, I'm not aware of any connection pooling software that actually does this today, because I think pgpool includes a full parser and pgbouncer includes no parser at all, but it still seems like a possibly-useful trick. And number three, the possibility of grammar conflicts with things we might want to do in the future seems unnecessarily high. If we use ALTER MATERIALIZED VIEW ... WHATEVER, we only have to worry about grammar conflicts with other ALTER MATERIALIZED VIEW commands; if we reuse DISCARD or RESET, we've got potential conflicts with completely unrelated syntax. That consideration alone would be sufficient reason for me to choose to stick everything under ALTER MATERIALIZED VIEW, no matter how poor a semantic fit it seems otherwise. Of course, if we stick with TRUNCATE, this becomes a non-issue. All that having been said, I'm not in favor of pushing this patch out to 9.4 because we can't agree on minor syntax details. In the absence of consensus, my feeling is that Kevin should exercise committer's prerogative and commit this in the way that seems best to him. If a clear contrary consensus subsequently emerges, we can always change it. It is not as if the particular choice of SQL syntax is hard to whack around. Once we release it we're stuck with it, but certainly between now and beta we can change it whenever we like. I'd rather have the core part of the feature committed and tinker with the syntax than wait longer to land the patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Robert Haas robertmh...@gmail.com writes: On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier michael.paqu...@gmail.com wrote: it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html DISCARD would be better. Well, personally, I'm in favor of either TRUNCATE or ALTER MATERIALIZED VIEW ... DISCARD. I think it's a dangerous precedent to suppose that we're going to start using DISCARD for things that have nothing to do with the existing meanings of DISCARD. Yeah, there's actually a serious problem with choosing DISCARD: surely we are not going to include trash all MVs in the behavior of DISCARD ALL. So unless you would like to say that DISCARD ALL doesn't mean what it appears to mean, we can't make MV reset be one of the sub-flavors of DISCARD. So that seems to leave us with either TRUNCATE or an ALTER sub-syntax. Personally I'd prefer the latter but it's surely debatable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 28.02.2013 16:55, Robert Haas wrote: On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier michael.paqu...@gmail.com wrote: it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html DISCARD would be better. Well, personally, I'm in favor of either TRUNCATE or ALTER MATERIALIZED VIEW ... DISCARD. I think it's a dangerous precedent to suppose that we're going to start using DISCARD for things that have nothing to do with the existing meanings of DISCARD. Number one, I think it's confusing. Number two, it's currently possible to determine whether something is DDL, DML, or other by looking at the first word of the command. If we throw that out the window we may cause performance issues for connection pooling software that tries to be clever like that. FWIW, I totally agree with that. From that point of view, the best thing would be to tack this onto the REFRESH command, perhaps something like: REFRESH matview INVALIDATE; REFRESH matview UNREFRESH; REFRESH matview DISCARD; It's a bit weird that the command is called REFRESH, if the effect is the exact opposite of refreshing it. And we usually do have two separate commands for doing something and undoing the same; CREATE - DROP, PREPARE - DEALLOCATE, LISTEN - UNLISTEN, and so forth. I think we're being too hung up on avoiding new (unreserved) keywords. Yes, the grammar is large because of so many keywords, but surely there's some better solution to that than adopt syntax that sucks. Let's invent a new keyword (INVALIDATE? UNREFRESH?), and deal with the grammar bloat separately. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Thu, Feb 28, 2013 at 11:00 AM, Tom Lane t...@sss.pgh.pa.us wrote: Well, personally, I'm in favor of either TRUNCATE or ALTER MATERIALIZED VIEW ... DISCARD. I think it's a dangerous precedent to suppose that we're going to start using DISCARD for things that have nothing to do with the existing meanings of DISCARD. Yeah, there's actually a serious problem with choosing DISCARD: surely we are not going to include trash all MVs in the behavior of DISCARD ALL. So unless you would like to say that DISCARD ALL doesn't mean what it appears to mean, we can't make MV reset be one of the sub-flavors of DISCARD. Good point. So that seems to leave us with either TRUNCATE or an ALTER sub-syntax. Personally I'd prefer the latter but it's surely debatable. I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Heikki Linnakangas hlinnakan...@vmware.com wrote: On 28.02.2013 16:55, Robert Haas wrote: Well, personally, I'm in favor of either TRUNCATE or ALTER MATERIALIZED VIEW ... DISCARD. I think it's a dangerous precedent to suppose that we're going to start using DISCARD for things that have nothing to do with the existing meanings of DISCARD. Number one, I think it's confusing. Number two, it's currently possible to determine whether something is DDL, DML, or other by looking at the first word of the command. If we throw that out the window we may cause performance issues for connection pooling software that tries to be clever like that. FWIW, I totally agree with that. From that point of view, the best thing would be to tack this onto the REFRESH command, perhaps something like: REFRESH matview INVALIDATE; REFRESH matview UNREFRESH; REFRESH matview DISCARD; It's a bit weird that the command is called REFRESH, if the effect is the exact opposite of refreshing it. And we usually do have two separate commands for doing something and undoing the same; CREATE - DROP, PREPARE - DEALLOCATE, LISTEN - UNLISTEN, and so forth. I think we're being too hung up on avoiding new (unreserved) keywords. Yes, the grammar is large because of so many keywords, but surely there's some better solution to that than adopt syntax that sucks. Let's invent a new keyword (INVALIDATE? UNREFRESH?), and deal with the grammar bloat separately. I'm OK with any grammar that we can reach consensus on, but that seems elusive and I don't want to hold up getting the meat of the patch committed while we haggle out this syntax detail. Votes have shifted frequently, but as I make out the latest opinion of people making a statement that looks like an explicit vote, dividing the value of a split vote between the choices, I get: Kevin Grittner: TRUNCATE Stephen Frost: TRUNCATE Peter Eisentraut: ALTER Andres Freund: RESET or DISCARD Josh Berkus: RESET Greg Stark: TRUNCATE Michael Paquier: DISCARD Robert Haas: TRUNCATE or ALTER Tom Lane: TRUNCATE or ALTER Heikki Linnakangas: REFRESH? Vote totals: TRUNCATE: 4.0 ALTER: 2.0 DISCARD: 1.5 RESET: 1.5 REFRESH 1.0? Barring a sudden confluence of opinion, I will go with TRUNCATE for the initial spelling. I tend to favor that spelling for several reasons. One was the size of the patch needed to add the opposite of REFRESH to the backend code: diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 2a55e02..eb7a14f 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -1234,11 +1234,12 @@ truncate_check_rel(Relation rel) { AclResult aclresult; - /* Only allow truncate on regular tables */ - if (rel-rd_rel-relkind != RELKIND_RELATION) + /* Only allow truncate on regular tables and materialized views. */ + if (rel-rd_rel-relkind != RELKIND_RELATION + rel-rd_rel-relkind != RELKIND_MATVIEW) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), - errmsg(\%s\ is not a table, + errmsg(\%s\ is not a table or materialized view, RelationGetRelationName(rel; /* Permissions checks */ That's it. That takes it from no way to release the space held by the current contents of a materialized view and render it unscannable until its rule's query is used to populate it again to working. Now, there are docs and psql support needed on top of that, and the regression tests use the verb, and bikeshedding led to a minor tweak of TRUNCATE so that you could specify TRUNCATE MATERIALIZED VIEW -- but to get working backend code, the above is sufficient. That strikes me a prima facie evidence that it's not a horribly off-target verb. In terms of things to consider in choosing a verb are that, while complete absence of data in the MV's backing table is the only thing which will render it unscannable in this initial version, there is clear demand to eventually track information on how up-to-date data is, and treat the MV as unscannable for less extreme conditions, such as the asynchronous application of changes from backing tables having fallen behind some configurable threshold. In essence, people want the error instead of scanning the relation if the generated data is not within a range of time that is fresh enough, and truncating the backing relation (or creating the MV WITH NO DATA) is a special case of that since the data is not representing the results of the MV's query as of *any* known point in time. (As previously discussed, that is distinct from an empty relation which is known to represent a fresh view of the results of that query.) If we pick a new pair of verbs, the connotations I think we should be looking for include the ability to deal with at least these: * Make the MV represent the result set of its query as of this moment. * Declare the MV to be too stale for the contents of its backing
Re: [HACKERS] Materialized views WIP patch
2013-02-19 Kevin Grittner wrote: [matview-system_views-v2.diff] I assumed the patches matview-v4.patch and matview-system_views-v2.diff were to be applied together. They do apply correctly but during tests, the test rules ... FAILED. Perhaps it is solved already but I thought I'd mention it in case it is overlooked. Thanks, Erik Rijkers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Sat, Feb 23, 2013 at 1:00 AM, Josh Berkus j...@agliodbs.com wrote: I prefer RESET, especially since it could eventually support RESET ALL MATERIALIZED VIEWS if that turns out to be useful. How does the parser like that? Isn't reset currently only used for GUCs? I think that makes for a strange crossover. Really, I'm sorry for bringing this up. I really don't think truncate is a bad way to spell it. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Sat, Feb 23, 2013 at 9:55 PM, Greg Stark st...@mit.edu wrote: On Sat, Feb 23, 2013 at 1:00 AM, Josh Berkus j...@agliodbs.com wrote: I prefer RESET, especially since it could eventually support RESET ALL MATERIALIZED VIEWS if that turns out to be useful. How does the parser like that? Isn't reset currently only used for GUCs? I think that makes for a strange crossover. it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html DISCARD would be better. My 2c. -- Michael
Re: [HACKERS] Materialized views WIP patch
Andres Freund and...@2ndquadrant.com writes: On 2013-02-21 14:11:10 -0800, Kevin Grittner wrote: DISABLE MATERIALIZED VIEW mv; -- ALTER clause for constraints DISCARD MATERIALIZED VIEW DATA mv; -- session state RELEASE MATERIALIZED VIEW DATA mv; -- savepoint RESET MATERIALIZED VIEW DATA mv; -- run-time parameter I think any of these could work. I'm personally most inclined toward DISABLE MATERIALIZED VIEW. It seems to convey the semantics better, especially if you leave out DATA as an additonal word. I vote for RESET or DISCARD. DISABLE sounds more like you disable automatic refreshes or somesuch. Yeah, I don't much like DISABLE either. I'm also concerned about overloading RESET this way --- that statement has complicated-enough syntax already, not to mention way too many shades of meaning. So that leaves me voting for DISCARD M.V. DATA, which seems pretty precise. It's a bit verbose, but since when has SQL been succinct? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
That feels completely wrong to me. For one thing, I can't think of any ALTER commands to populate or remove data. What did you think of the idea of something like DISCARD MATERIALIZED VIEW DATA as a new statment? Or maybe RESET MATERIALIZED VIEW? I prefer RESET, especially since it could eventually support RESET ALL MATERIALIZED VIEWS if that turns out to be useful. How does the parser like that? BTW, to contradict Peter E., for my part I would NOT want matview resets to be logged as DDL. I would only want matview definitition changes to be so logged. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Greg Stark st...@mit.edu writes: The way I was thinking about it, whatever the command is named, you might be able to tell the database to drop the storage associated with the view but that would make the view invalid until it was refreshed. It wouldn't make it appear to be empty. Actually, that seems like a pretty key point to me. TRUNCATE TABLE results in a table that is perfectly valid, you just deleted all the rows that used to be in it. Throwing away an MV's contents should not result in an MV that is considered valid. That being the case, lumping them as being the same operation feels like the wrong thing, and so we should choose a different name for the MV operation. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 2013-02-21 04:14:09 +, Greg Stark wrote: On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut pete...@gmx.net wrote: More generally, I would consider the invalidation of a materialized view a DDL command, whereas truncating a table is a DML command. That's not entirely true. From the database's point of view, TRUNCATE is in many ways actually DDL. I actually don't really dislike using TRUNCATE for this command. I was more asking about whether this meant people were thinking of the view as a thing where you could control the data in it by hand and could have the view be empty rather than just not valid. It also might get confusing when we get materialized views that are auto-updateable. I am not suggesting to forward TRUNCATE to the internal storage in that case but giving an error so its an easy to find distinction to a normal table seems like a good idea. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Wed, Feb 20, 2013 at 11:14 PM, Greg Stark st...@mit.edu wrote: On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut pete...@gmx.net wrote: More generally, I would consider the invalidation of a materialized view a DDL command, whereas truncating a table is a DML command. That's not entirely true. From the database's point of view, TRUNCATE is in many ways actually DDL. I actually don't really dislike using TRUNCATE for this command. Me neither. I'm astonished that we're seriously considering adding new keywords for this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 2/20/13 5:03 PM, Kevin Grittner wrote: Peter Eisentraut pete...@gmx.net wrote: On 2/20/13 2:30 PM, Kevin Grittner wrote: Are there TRUNCATE triggers on materialized views? No. Nor SELECT, INSERT, UPDATE, or DELETE triggers. You can't create a trigger of any type on a materialized view. I don't think that would interfere with event triggers, though. More generally, I would consider the invalidation of a materialized view a DDL command, whereas truncating a table is a DML command. The force of that assertion is somewhat undercut by the fact that the ExecuteTruncate() function does exactly what needs to be done to discard the data in a materialized view and make it appear as non-scannable. Even if we dress it up with different syntax, it's not clear that we wouldn't build a TruncateStmt in the parser and pass it through exactly the same execution path. We would just need to look at the relkind to generate a different command tag. This is a fall-out of the implementation, and that's fine (although I'd personally still be in favor of putting that state in the catalog, not into the block count on disk, effectively), but I'm talking about the external interfaces we present. This has various implications with triggers, logging, permissions. I think it's not good to mix those two. Could you give a more concrete example of where you see a problem? * Logging: You can set things to log DDL commands only. I would want a MV invalidation to be logged. * Permissions: There is a TRUNCATE permission, would that apply here? There is no refresh permission. * Triggers: There are TRUNCATE triggers, but they don't apply here. * Triggers: I don't know how event triggers work, but I'd like materialized view events to be grouped together somehow. * Don't know the opinion of sepgsql on all this. I think what this all comes down to, as I've mentioned before, is that the opposite of this proposed truncate operation is the refresh operation, and that is a DDL command under ALTER MATERIALIZED VIEW. Both of these fundamental operations -- truncate/refresh, invalidate/validate, empty/refill, whatever -- should be grouped together somehow, as far as syntax, as well logging, permissions, trigger handling, and so on are concerned. You don't need a new command or key word for this. How about ALTER MATERIALIZED VIEW DISCARD? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 2/20/13 11:14 PM, Greg Stark wrote: That's not entirely true. From the database's point of view, TRUNCATE is in many ways actually DDL. Whether something is DDL or DML or a read operation (query) is not an implementation detail, it's a user-exposed category. Since TRUNCATE is logically equivalent to DELETE, it's a DML operation, as far as the user is concerned. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Peter Eisentraut pete...@gmx.net wrote: On 2/20/13 11:14 PM, Greg Stark wrote: That's not entirely true. From the database's point of view, TRUNCATE is in many ways actually DDL. Whether something is DDL or DML or a read operation (query) is not an implementation detail, it's a user-exposed category. Since TRUNCATE is logically equivalent to DELETE, it's a DML operation, as far as the user is concerned. Not really. It doesn't follow the same MVCC behavior as DML. This is user-visible, documented behavior. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: The way I was thinking about it, whatever the command is named, you might be able to tell the database to drop the storage associated with the view but that would make the view invalid until it was refreshed. It wouldn't make it appear to be empty. Actually, that seems like a pretty key point to me. TRUNCATE TABLE results in a table that is perfectly valid, you just deleted all the rows that used to be in it. Throwing away an MV's contents should not result in an MV that is considered valid. It doesn't. That was one of the more contentious points in the earlier bikeshedding phases. Some felt that throwing away the contents was a form of making the MV out of date and as such didn't require any special handling. Others, including myself, felt that data not present was a distinct state from generated zero rows and that attempting to scan a materialized view for which data had not been generated must result in an error. The latter property has been maintained from the beginning -- or at least that has been the intent. test=# CREATE TABLE t (id int NOT NULL PRIMARY KEY, type text NOT NULL, amt numeric NOT NULL); CREATE TABLE test=# CREATE MATERIALIZED VIEW tm AS SELECT type, sum(amt) AS totamt FROM t GROUP BY type WITH NO DATA; SELECT 0 test=# SELECT pg_relation_is_scannable('tm'::regclass); pg_relation_is_scannable -- f (1 row) test=# SELECT * FROM tm; ERROR: materialized view tm has not been populated HINT: Use the REFRESH MATERIALIZED VIEW command. test=# REFRESH MATERIALIZED VIEW tm; REFRESH MATERIALIZED VIEW test=# SELECT pg_relation_is_scannable('tm'::regclass); pg_relation_is_scannable -- t (1 row) test=# TRUNCATE MATERIALIZED VIEW tm; TRUNCATE TABLE test=# SELECT * FROM tm; ERROR: materialized view tm has not been populated HINT: Use the REFRESH MATERIALIZED VIEW command. test=# SELECT pg_relation_is_scannable('tm'::regclass); pg_relation_is_scannable -- f (1 row) That being the case, lumping them as being the same operation feels like the wrong thing, and so we should choose a different name for the MV operation. There is currently no truncation of MV data without rendering the MV unscannable. Do you still feel it needs a different command name? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Greg Stark st...@mit.edu wrote: I actually don't really dislike using TRUNCATE for this command. I was more asking about whether this meant people were thinking of the view as a thing where you could control the data in it by hand and could have the view be empty rather than just not valid. You can either populate the MV in the CREATE command or by REFRESH, and it will be scannable. If it is created WITH NO DATA or TRUNCATEd it is not scannable, generating an error on an attempt to reference it. test=# select * from tm; type | totamt --+ y | 12 z | 24 x | 5 (3 rows) test=# truncate tm; TRUNCATE TABLE test=# select * from tm; ERROR: materialized view tm has not been populated HINT: Use the REFRESH MATERIALIZED VIEW command. The way I was thinking about it, whatever the command is named, you might be able to tell the database to drop the storage associated with the view but that would make the view invalid until it was refreshed. It wouldn't make it appear to be empty. I think we're on the same page after all. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Peter Eisentraut pete...@gmx.net wrote: I'll need more time to ponder your other points, but... You don't need a new command or key word for this. How about ALTER MATERIALIZED VIEW DISCARD? I don't like this because we don't have ALTER TABLE REINDEX. But the fact that DISCARD is a keyword does open up some interesting syntax possibilities without adding more keywords. Maybe: DISCARD MATERIALIZED VIEW DATA tm; Or something like that. Paint buckets are over there by the bikeshed. Have at it. :-) -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 21.02.2013 16:38, Kevin Grittner wrote: Tom Lanet...@sss.pgh.pa.us wrote: Greg Starkst...@mit.edu writes: The way I was thinking about it, whatever the command is named, you might be able to tell the database to drop the storage associated with the view but that would make the view invalid until it was refreshed. It wouldn't make it appear to be empty. Actually, that seems like a pretty key point to me. TRUNCATE TABLE results in a table that is perfectly valid, you just deleted all the rows that used to be in it. Throwing away an MV's contents should not result in an MV that is considered valid. It doesn't. That was one of the more contentious points in the earlier bikeshedding phases. Some felt that throwing away the contents was a form of making the MV out of date and as such didn't require any special handling. Others, including myself, felt that data not present was a distinct state from generated zero rows and that attempting to scan a materialized view for which data had not been generated must result in an error. The latter property has been maintained from the beginning -- or at least that has been the intent. Yeah, data not present is clearly different from 0 rows. I'm surprised there isn't an explicit boolean column somewhere for that, but I guess you can use the size of the heap for that too, as long as you're careful to not truncate it to 0 blocks when it's empty but scannable. There's at least one bug left in that area: postgres=# create table t (id int4); CREATE TABLE postgres=# create materialized view tm as select * from t where id 0;SELECT 0 postgres=# select * from tm; id (0 rows) postgres=# create index i_tm on tm(id);CREATE INDEX postgres=# cluster tm using i_tm; CLUSTER postgres=# select * from tm; ERROR: materialized view tm has not been populated HINT: Use the REFRESH MATERIALIZED VIEW command. Clustering a materialized view invalidates it. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Andres Freund and...@2ndquadrant.com wrote: giving an error so its an easy to find distinction to a normal table seems like a good idea. I'm not sure I understood your concerns entirely, but wonder whether this helps?: test=# \d List of relations Schema | Name | Type | Owner +---+---+- public | bb | materialized view | kgrittn public | t | table | kgrittn public | tm | materialized view | kgrittn public | tmm | materialized view | kgrittn public | tv | view | kgrittn public | tvmm | materialized view | kgrittn public | tvv | view | kgrittn public | tvvm | materialized view | kgrittn public | tvvmv | view | kgrittn (9 rows) test=# truncate table tm; ERROR: tm is not a table test=# truncate materialized view t; ERROR: t is not a materialized view test=# truncate materialized view tm; TRUNCATE TABLE test=# truncate table t; TRUNCATE TABLE Well, maybe those command tags could use a tweak. Then there's this, if you don't specify an object type: test=# truncate t, tm; TRUNCATE TABLE -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@ymail.com writes: Tom Lane t...@sss.pgh.pa.us wrote: That being the case, lumping them as being the same operation feels like the wrong thing, and so we should choose a different name for the MV operation. There is currently no truncation of MV data without rendering the MV unscannable. Do you still feel it needs a different command name? You didn't say anything that changed my opinion: it doesn't feel like a TRUNCATE to me. It's not changing the object to a different but entirely valid state, which is what TRUNCATE does. Peter claimed upthread that REFRESH is a subcommand of ALTER MATERIALIZE VIEW and that this operation should be another one. That sounds pretty reasonable from here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Heikki Linnakangas hlinnakan...@vmware.com wrote: On 21.02.2013 16:38, Kevin Grittner wrote: Tom Lanet...@sss.pgh.pa.us wrote: Greg Starkst...@mit.edu writes: The way I was thinking about it, whatever the command is named, you might be able to tell the database to drop the storage associated with the view but that would make the view invalid until it was refreshed. It wouldn't make it appear to be empty. Actually, that seems like a pretty key point to me. TRUNCATE TABLE results in a table that is perfectly valid, you just deleted all the rows that used to be in it. Throwing away an MV's contents should not result in an MV that is considered valid. It doesn't. That was one of the more contentious points in the earlier bikeshedding phases. Some felt that throwing away the contents was a form of making the MV out of date and as such didn't require any special handling. Others, including myself, felt that data not present was a distinct state from generated zero rows and that attempting to scan a materialized view for which data had not been generated must result in an error. The latter property has been maintained from the beginning -- or at least that has been the intent. Yeah, data not present is clearly different from 0 rows. I'm surprised there isn't an explicit boolean column somewhere for that, There was, in earlier versions of the patch: pg_class.relisvald. The problem is that we needed some way to determine from the heap that it was invalid to support UNLOGGED MVs. Several people were offended by my attempt to use relisvald as the primary indicator and transfer the information from the heap state to pg_class and relcache. There were some pretty big technical challenges to that. So I caved on that one and went with the pg_relation_is_scannable() function based on the heap as reported by relcache. That being one of the newer parts of the patch, it is probably not as solid as the parts which haven't changed much in the last three months. but I guess you can use the size of the heap for that too, as long as you're careful to not truncate it to 0 blocks when it's empty but scannable. There's at least one bug left in that area: postgres=# create table t (id int4); CREATE TABLE postgres=# create materialized view tm as select * from t where id 0;SELECT 0 postgres=# select * from tm; id (0 rows) postgres=# create index i_tm on tm(id);CREATE INDEX postgres=# cluster tm using i_tm; CLUSTER postgres=# select * from tm; ERROR: materialized view tm has not been populated HINT: Use the REFRESH MATERIALIZED VIEW command. Clustering a materialized view invalidates it. Good spot. That should be easy enough to fix. Thanks. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 2013-02-21 07:10:09 -0800, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: giving an error so its an easy to find distinction to a normal table seems like a good idea. I'm not sure I understood your concerns entirely, but wonder whether this helps?: To explain it a bit: I assume that at some point matviews will get (auto-)updateable, just as normal views recently got. In that case application programmers might not be aware anymore that something is a view either because they just don't know or because a table got converted into a matview after the code was written. Because of the potential wish for transparency (with security views as a potential user) at least normal views might get the capability to be TRUNCATEd directly, so it might be that matviews do as well. test=# \d List of relations Schema | Name | Type | Owner +---+---+- public | bb | materialized view | kgrittn public | t | table | kgrittn public | tm | materialized view | kgrittn public | tmm | materialized view | kgrittn public | tv | view | kgrittn public | tvmm | materialized view | kgrittn public | tvv | view | kgrittn public | tvvm | materialized view | kgrittn public | tvvmv | view | kgrittn (9 rows) test=# truncate table tm; ERROR: tm is not a table test=# truncate materialized view t; ERROR: t is not a materialized view test=# truncate materialized view tm; TRUNCATE TABLE test=# truncate table t; TRUNCATE TABLE Thats not bad. But what if we allow TRUNCATE on views someday (possibly only if a truncate trigger is defined). For consistency we might also want that on matvies. Having a difference between TRUNCATE view; and TRUNCATE MATERIALIZED VIEW; in that case sounds ugly to me. What about DISABLE? DISCARD or DEALLOCATE would also be nice but it seems hard to fit that into existing syntax. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kgri...@ymail.com writes: Tom Lane t...@sss.pgh.pa.us wrote: That being the case, lumping them as being the same operation feels like the wrong thing, and so we should choose a different name for the MV operation. There is currently no truncation of MV data without rendering the MV unscannable. Do you still feel it needs a different command name? You didn't say anything that changed my opinion: it doesn't feel like a TRUNCATE to me. It's not changing the object to a different but entirely valid state, which is what TRUNCATE does. Peter claimed upthread that REFRESH is a subcommand of ALTER MATERIALIZE VIEW It's not, nor do I think it should be. and that this operation should be another one. That sounds pretty reasonable from here. That feels completely wrong to me. For one thing, I can't think of any ALTER commands to populate or remove data. What did you think of the idea of something like DISCARD MATERIALIZED VIEW DATA as a new statment? Or maybe RESET MATERIALIZED VIEW? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Andres Freund and...@2ndquadrant.com wrote: I assume that at some point matviews will get (auto-)updateable, just as normal views recently got. I'm dubious about that. Every use case I've seen for MVs involves aggregation, although they are a generalized feature, so that won't always be true. But if you have a view like: CREATE MATERIALIZED VIEW tm AS SELECT t.type, sum(t.amt) AS totamt FROM t GROUP BY t.type; ... I don't see how that can be updateable. If I add 5 to totamt for some row, what do you do? I expect that 99% of MVs will be updated asynchronously from changes to the underlying data -- what do you do if someone updates a row that no longer exists in the underlying data. This are just seems fraught with peril and out of sync with the usual uses of MVs. What about DISABLE? DISCARD or DEALLOCATE would also be nice but it seems hard to fit that into existing syntax. Thanks for the suggestions. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@ymail.com writes: Tom Lane t...@sss.pgh.pa.us wrote: Peter claimed upthread that REFRESH is a subcommand of ALTER MATERIALIZE VIEW It's not, nor do I think it should be. Oh, never mind then. and that this operation should be another one. That sounds pretty reasonable from here. That feels completely wrong to me. For one thing, I can't think of any ALTER commands to populate or remove data. What did you think of the idea of something like DISCARD MATERIALIZED VIEW DATA as a new statment? Or maybe RESET MATERIALIZED VIEW? I could live with either DISCARD or RESET. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 2/21/13 9:25 AM, Kevin Grittner wrote: Peter Eisentraut pete...@gmx.net wrote: On 2/20/13 11:14 PM, Greg Stark wrote: That's not entirely true. From the database's point of view, TRUNCATE is in many ways actually DDL. Whether something is DDL or DML or a read operation (query) is not an implementation detail, it's a user-exposed category. Since TRUNCATE is logically equivalent to DELETE, it's a DML operation, as far as the user is concerned. Not really. It doesn't follow the same MVCC behavior as DML. This is user-visible, documented behavior. MVCC behavior does not determine whether something is considered DDL or DML. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kgri...@ymail.com writes: What did you think of the idea of something like DISCARD MATERIALIZED VIEW DATA as a new statment? Or maybe RESET MATERIALIZED VIEW? I could live with either DISCARD or RESET. I figured this was worth a pass through the keyword list to look for all imperative verbs suitable for this, which could support the needed syntax without adding a new keyword. Here are the possibilities I came up with, along with a note about why they are keywords already. DISABLE MATERIALIZED VIEW mv; -- ALTER clause for constraints DISCARD MATERIALIZED VIEW DATA mv; -- session state RELEASE MATERIALIZED VIEW DATA mv; -- savepoint RESET MATERIALIZED VIEW DATA mv; -- run-time parameter I think any of these could work. I'm personally most inclined toward DISABLE MATERIALIZED VIEW. It seems to convey the semantics better, especially if you leave out DATA as an additonal word. Given that a materialized view will retain its query, tablespace, indexes, statistics targets, etc. with this operation, and will just not be available for scanning, some of the above seem downright misleading without DATA thrown in. Opinions? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Thu, Feb 21, 2013 at 2:38 PM, Kevin Grittner kgri...@ymail.com wrote: It doesn't. That was one of the more contentious points in the earlier bikeshedding phases. Some felt that throwing away the contents was a form of making the MV out of date and as such didn't require any special handling. Others, including myself, felt that data not present was a distinct state from generated zero rows and that attempting to scan a materialized view for which data had not been generated must result in an error. The latter property has been maintained from the beginning -- or at least that has been the intent. Actually this sounds like exactly what I was saying. I withdraw my concern entirely. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Greg Stark st...@mit.edu wrote: Kevin Grittner kgri...@ymail.com wrote: It doesn't. That was one of the more contentious points in the earlier bikeshedding phases. Some felt that throwing away the contents was a form of making the MV out of date and as such didn't require any special handling. Others, including myself, felt that data not present was a distinct state from generated zero rows and that attempting to scan a materialized view for which data had not been generated must result in an error. The latter property has been maintained from the beginning -- or at least that has been the intent. Actually this sounds like exactly what I was saying. I withdraw my concern entirely. Reviewing your concerns and discussions of freshness in general got me thinking -- while it is clear that not having generated values in the MV based on its query clearly should make the view non-scannable, and will be the only criterion for that in this initial patch; later versions will almost certainly support age-based conditions for whether the MV is scannable. So in the next release the MV may become non-scannable based on the passage of time since DML was run on a source table without the MV having been refreshed or incrementally updated to reflect that DML. Which makes me wonder why DML making the MV non-scannable is such a bad thing in the case of TRUNCATE. Granted there is a difference in that it is run *on the MV* rather than *on a source relation*; but still, I'm having some second thoughts about that being a problem. The problem with TRUNCATE MATERIALIZED VIEW from a logical perspective doesn't seem to me so much that it makes the MV non-scannable, as that it is the only DML which would be allowed directly on an MV -- which is kind of a weird exception. It is pretty much a given that when we can get to implementing it, DML statements will render MVs unscannable under various conditions. Josh Berkus and Greg Stark have been the most explicit about that, but I think most of us who are interested in the feature take it as a given. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 2013-02-21 14:11:10 -0800, Kevin Grittner wrote: Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kgri...@ymail.com writes: What did you think of the idea of something like DISCARD MATERIALIZED VIEW DATA as a new statment? Or maybe RESET MATERIALIZED VIEW? I could live with either DISCARD or RESET. I figured this was worth a pass through the keyword list to look for all imperative verbs suitable for this, which could support the needed syntax without adding a new keyword. Here are the possibilities I came up with, along with a note about why they are keywords already. DISABLE MATERIALIZED VIEW mv; -- ALTER clause for constraints DISCARD MATERIALIZED VIEW DATA mv; -- session state RELEASE MATERIALIZED VIEW DATA mv; -- savepoint RESET MATERIALIZED VIEW DATA mv; -- run-time parameter I think any of these could work. I'm personally most inclined toward DISABLE MATERIALIZED VIEW. It seems to convey the semantics better, especially if you leave out DATA as an additonal word. Given that a materialized view will retain its query, tablespace, indexes, statistics targets, etc. with this operation, and will just not be available for scanning, some of the above seem downright misleading without DATA thrown in. I vote for RESET or DISCARD. DISABLE sounds more like you disable automatic refreshes or somesuch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@ymail.com writes: When I went to do this, I hit a shift/reduce conflict, because with TABLE being optional it couldn't tell whether: TRUNCATE MATERIALIZED VIEW x, y, z; ... was looking for five relations or three. That goes away with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD. Is that OK? Not really. I would much rather see us not bother with this pedantic syntax than introduce an even-partially-reserved word. Having said that, I don't think I believe your analysis of why this doesn't work. The presence or absence of commas ought to make the syntax non-ambiguous, I would think. Maybe you just factored the grammar wrong. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Tue, Feb 19, 2013 at 11:01 AM, Peter Eisentraut pete...@gmx.net wrote: On 2/19/13 8:54 AM, Robert Haas wrote: In the department of crazy ideas, what about having pg_dump NEVER refresh ANY materialized views? It might be useful to have an option for this, but I don't think it should be the default. The default should be that the new database is ready to go. Then again, when would you ever actually use that option? You'd use that option if you'd rather get the database mostly-up as soon as possible, and then worry about the materialized views afterwards. This might be different if there were a command to refresh all materialized views, because you don't want to have to go around and type separate commands 47 times after a restore. Well, it's pretty easy to do: SELECT 'LOAD MATERIALIZED VIEW ' || p.oid::regclass || ';' FROM pg_class WHERE relkind = 'm'; ...but we could also add explicit syntax for it, perhaps along the lines of what we have for CLUSTER and VACUUM. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Robert Haas robertmh...@gmail.com writes: On Tue, Feb 19, 2013 at 11:01 AM, Peter Eisentraut pete...@gmx.net wrote: This might be different if there were a command to refresh all materialized views, because you don't want to have to go around and type separate commands 47 times after a restore. Well, it's pretty easy to do: SELECT 'LOAD MATERIALIZED VIEW ' || p.oid::regclass || ';' FROM pg_class WHERE relkind = 'm'; ...but we could also add explicit syntax for it, perhaps along the lines of what we have for CLUSTER and VACUUM. It's not really that easy, because of the likelihood that MVs have to be refreshed in a specific order. The SELECT you suggest definitely seems too simplistic. A dedicated command could perhaps be built to pay attention to dependencies ... but if we're still coding such things now, it seems a bit late for 9.3. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 2/19/13 5:22 PM, David Fetter wrote: On Tue, Feb 19, 2013 at 11:09:13PM +0100, Erik Rijkers wrote: On Sat, February 16, 2013 02:01, Kevin Grittner wrote: matview-v4.patch.gz Hi, I was wondering if material views should not go into information_schema. I was thinking either .views or .tables. Have you considered this? I'm guessing it'd be .views if anything. Haven't been able to decipher from section 11 of the standard (Schemata) whether the standard has anything to say on the matter. I suppose one should be able to expect that if one finds a view in the information schema, then one should be able to use DROP VIEW to remove it. Which in this case wouldn't work. So I don't think including a materialized view under views or tables is appropriate. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kgri...@ymail.com writes: When I went to do this, I hit a shift/reduce conflict, because with TABLE being optional it couldn't tell whether: TRUNCATE MATERIALIZED VIEW x, y, z; ... was looking for five relations or three. That goes away with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD. Is that OK? Not really. I would much rather see us not bother with this pedantic syntax than introduce an even-partially-reserved word. I'm not sure it's worth it either; but two people requested it and I didn't forsee this shift/reduce conflict, so I took a shot at it. If we can't eliminate the conflict, I'm fine with leaving things as they are in the latest posted patch. Having said that, I don't think I believe your analysis of why this doesn't work. The presence or absence of commas ought to make the syntax non-ambiguous, I would think. Maybe you just factored the grammar wrong. Well, it wouldn't be the first time you've seen a better way to do something in flex than I was able to see. Taking just the gram.y part of the change which implemented this, and omitting the change in reservedness of MATERIALIZED, I have: diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 820cb41..1d393c5 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -394,6 +394,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type ival opt_column event cursor_options opt_hold opt_set_data %type objtype reindex_type drop_type comment_type security_label_type + trunc_type %type node fetch_args limit_clause select_limit_value offset_clause select_offset_value @@ -5172,9 +5173,10 @@ attrs: '.' attr_name */ TruncateStmt: - TRUNCATE opt_table relation_expr_list opt_restart_seqs opt_drop_behavior + TRUNCATE trunc_type relation_expr_list opt_restart_seqs opt_drop_behavior { TruncateStmt *n = makeNode(TruncateStmt); + n-objtype = $2; n-relations = $3; n-restart_seqs = $4; n-behavior = $5; @@ -5182,6 +5184,12 @@ TruncateStmt: } ; +trunc_type: + TABLE { $$ = OBJECT_TABLE; } + | MATERIALIZED VIEW { $$ = OBJECT_MATVIEW; } + | /*EMPTY*/ { $$ = OBJECT_UNSPECIFIED; } + ; + opt_restart_seqs: CONTINUE_P IDENTITY_P { $$ = false; } | RESTART IDENTITY_P { $$ = true; } I'm open to suggestions on a better way. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Peter Eisentraut pete...@gmx.net wrote: I suppose one should be able to expect that if one finds a view in the information schema, then one should be able to use DROP VIEW to remove it. Which in this case wouldn't work. So I don't think including a materialized view under views or tables is appropriate. Right. I think adding pg_matviews covers the stated use-case enough to answer Erik's concern. I'm not going to mess with adding non-standard stuff to the standard views. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@ymail.com writes: Tom Lane t...@sss.pgh.pa.us wrote: Having said that, I don't think I believe your analysis of why this doesn't work. Well, it wouldn't be the first time you've seen a better way to do something in flex than I was able to see. Taking just the gram.y part of the change which implemented this, and omitting the change in reservedness of MATERIALIZED, I have: - TRUNCATE opt_table relation_expr_list opt_restart_seqs opt_drop_behavior + TRUNCATE trunc_type relation_expr_list opt_restart_seqs opt_drop_behavior +trunc_type: + TABLE { $$ = OBJECT_TABLE; } + | MATERIALIZED VIEW { $$ = OBJECT_MATVIEW; } + | /*EMPTY*/ { $$ = OBJECT_UNSPECIFIED; } + ; Yeah, this is a standard gotcha when working with unreserved keywords. You can't factor it like that because then the parser is required to make a shift-reduce decision (on whether to reduce trunc_type to empty) before it can see past the first word. So for instance given TRUNCATE MATERIALIZED ... ^ the parser has to make that decision when it can't see past the word MATERIALIZED and so doesn't know what comes after it. The way to fix it is to not try to use the sub-production but spell it all out: TRUNCATE TABLE relation_expr_list ... | TRUNCATE MATERIALIZED VIEW relation_expr_list ... | TRUNCATE relation_expr_list ... Now the parser doesn't have to make any shift-reduce decision until after it can see past the first identifier. It's a bit tedious but beats making a word more reserved than it has to be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Tom Lane t...@sss.pgh.pa.us wrote: The way to fix it is to not try to use the sub-production but spell it all out: TRUNCATE TABLE relation_expr_list ... | TRUNCATE MATERIALIZED VIEW relation_expr_list ... | TRUNCATE relation_expr_list ... Now the parser doesn't have to make any shift-reduce decision until after it can see past the first identifier. It's a bit tedious but beats making a word more reserved than it has to be. Thanks! Will do. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 2/20/13 6:13 AM, Robert Haas wrote: It might be useful to have an option for this, but I don't think it should be the default. The default should be that the new database is ready to go. Then again, when would you ever actually use that option? You'd use that option if you'd rather get the database mostly-up as soon as possible, and then worry about the materialized views afterwards. Since the proposed materialized views are not available for implicit use in query optimization, the only way an application would make use of them is to access them directly. And if it accesses an unpopulated materialized view, it would fail. So I don't think in the current state a database is mostly-up without the materialized views filled in. I can see the value in having a restore mode that postpones certain nonessential operations, such as creating indexes or certain constraints or even materialized views. But I think the boundaries and expectations for that need to be defined more precisely. For example, a database without constraints might be considered ready for read-only use, without secondary indexes it might be ready for use but slow. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Wed, February 20, 2013 16:28, Kevin Grittner wrote: Peter Eisentraut pete...@gmx.net wrote: I suppose one should be able to expect that if one finds a view in the information schema, then one should be able to use DROP VIEW to remove it. Which in this case wouldn't work. So I don't think including a materialized view under views or tables is appropriate. Right. I think adding pg_matviews covers the stated use-case enough to answer Erik's concern. Absolutely - I agree pg_matviews is much better than adding deviating information_schema stuff. Thank you, Erik Rijkers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Wed, Feb 20, 2013 at 4:26 PM, Kevin Grittner kgri...@ymail.com wrote: Tom Lane t...@sss.pgh.pa.us wrote: The way to fix it is to not try to use the sub-production but spell it all out: TRUNCATE TABLE relation_expr_list ... | TRUNCATE MATERIALIZED VIEW relation_expr_list ... | TRUNCATE relation_expr_list ... Now the parser doesn't have to make any shift-reduce decision until after it can see past the first identifier. It's a bit tedious but beats making a word more reserved than it has to be. Thanks! Will do. Fwiw I think worrying about stuff like this at this point is probably a waste of time. There'll be a period of bike-shedding where people debate what the command should be called so worrying about parser conflicts before there's a consensus is kind pointless. I would like to know what operations you plan to support independently of the command names. I may have missed much earlier in the discussion but then I suspect things have evolved since then. It sounds like you want to support: 1) Selecting from materialized viws 2) Manually refreshing materialized views 3) Manually truncating materialized views And explicitly not support 1) Automatically rewriting queries to select from matching views 2) Incrementally refreshing materialized views 3) Manual DML against data in materialized views (except truncate which is kind of DDL) 4) Keeping track of whether the data in the materialized view is up to date I have to say I find this model a bit odd. It seems the UI you're presenting is that they're basically read-only tables that the database will fill in the data for automatically. My mental model of materialized views is that they're basically views that the database guarantees a different performance characteristic for. I would expect a materialized view to be up to date all the time. If we don't support incremental updates (which seems like a fine thing not to support in a first cut) then I would expect any DML against the table to mark the view invalid and any queries against it to produce an error (or possibly go to the source tables using the view definition but that's probably a bad idea for most use cases). Ie. they should behave like a view at all times and have up to date information or fail entirely. I would expect a command like TRUNCATE MATERIALIZED VIEW to exist but I would expect it to be called something like INVALIDATE rather than TRUNCATE and dropping the storage is a side effect of simply telling the database that it doesn't need to maintain this materialized view. Though I could be convinced truncate is a good name as long as it's documented well. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Greg Stark st...@mit.edu wrote: Kevin Grittner kgri...@ymail.com wrote: Tom Lane t...@sss.pgh.pa.us wrote: The way to fix it is to not try to use the sub-production but spell it all out: TRUNCATE TABLE relation_expr_list ... | TRUNCATE MATERIALIZED VIEW relation_expr_list ... | TRUNCATE relation_expr_list ... Now the parser doesn't have to make any shift-reduce decision until after it can see past the first identifier. It's a bit tedious but beats making a word more reserved than it has to be. Thanks! Will do. Fwiw I think worrying about stuff like this at this point is probably a waste of time. There'll be a period of bike-shedding where people debate what the command should be called so worrying about parser conflicts before there's a consensus is kind pointless. That sort of bikeshedding already happened three months ago. Too late now. I would like to know what operations you plan to support independently of the command names. I may have missed much earlier in the discussion but then I suspect things have evolved since then. It sounds like you want to support: 1) Selecting from materialized viws 2) Manually refreshing materialized views 3) Manually truncating materialized views And explicitly not support 1) Automatically rewriting queries to select from matching views 2) Incrementally refreshing materialized views Those are material for later releases, building on the base of what goes into this release. 3) Manual DML against data in materialized views (except truncate which is kind of DDL) There is quite a lot of DML allowed -- changing tablespace, changing schema, changing name of the MV or of individual columns in it, changing statistics targets, creating indexes, and other operations are supported. 4) Keeping track of whether the data in the materialized view is up to date Only keeping track of whether data has been populated or not, for now. There has been agreement that one or more timestamps relating to freshness will make sense, but these are not in the initial patch. I have to say I find this model a bit odd. It's not a model, it's a starting point. Several people have already said that even this much is useful and they expect to take advantage of it. I'm doing what I can to not paint us into a corner where it's hard to extend to all the features everyone dreams of, but if we waited for that to commit something, it will never happen. I would expect a materialized view to be up to date all the time. I expect that this will eventually be an option, but I expect that is will be a seldom-used one. Most cases that I've seen, people want summary data that is reasonably up-to-date without unduly affecting the performance of incremental changes to the underlying data. I've sketched out the roadmap from this patch to all of these options in a vauge, handwavy fashion, and don't have a lot of interest in taking it farther until we're past 9.3 beta. If we don't support incremental updates (which seems like a fine thing not to support in a first cut) then I would expect any DML against the table to mark the view invalid and any queries against it to produce an error (or possibly go to the source tables using the view definition but that's probably a bad idea for most use cases). Ie. they should behave like a view at all times and have up to date information or fail entirely. That would render them completely useless for the use-cases I've seen. If you want to offer a patch to do that as an option, feel free, but I will strongly argue against that as unconditional behavior. I would expect a command like TRUNCATE MATERIALIZED VIEW to exist but I would expect it to be called something like INVALIDATE rather than TRUNCATE and dropping the storage is a side effect of simply telling the database that it doesn't need to maintain this materialized view. Though I could be convinced truncate is a good name as long as it's documented well. I'm trying to minimize the number of new keywords. The initial patch only added MATERIALIZED. I added REFRESH due to near-universal demand for something other than the LOAD MATERIALIZED VIEW I initially used. Have you seen the statistics Tom gave out on how much the size of the executable bloats with every new keyword? Until now nobody has expressed concern about TRUNCATE MATERIALIZED VIEW, so it would take quite a groundswell of concern at this point to even consider a new keyword for this functionality this late in the game. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
And explicitly not support 1) Automatically rewriting queries to select from matching views 2) Incrementally refreshing materialized views 3) Manual DML against data in materialized views (except truncate which is kind of DDL) 4) Keeping track of whether the data in the materialized view is up to date The idea is to add the above features over the next few versions of Postgres. I have to say I find this model a bit odd. It seems the UI you're presenting is that they're basically read-only tables that the database will fill in the data for automatically. This is what matviews are in other DBMSes. My mental model of materialized views is that they're basically views that the database guarantees a different performance characteristic for. How would we do that, exactly? That would be lovely, but unless you have a way to accomplish it ... I would expect a materialized view to be up to date all the time. Actually, there's a huge use case for asynchronously updated matviews, so we would not want an implementation which ruled them out. Also there's the argument that synchronously updated matviews have little actual performance advantage over regular dynamic views. Or to put it another way: I could use this feature, as it is, in about 8 different projects I'm currently supporting. I personally can't think of a single project where I need synchronously updated matviews, currently. I have in the past, but it's a LOT less frequent that the desire for async, just as the desire for async replication is more common than the desire for syncrep. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 2/19/13 5:47 PM, Kevin Grittner wrote: When I went to do this, I hit a shift/reduce conflict, because with TABLE being optional it couldn't tell whether: TRUNCATE MATERIALIZED VIEW x, y, z; ... was looking for five relations or three. That goes away with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD. Is that OK? Is TRUNCATE even the right command here? For regular tables TRUNCATE is a fast DELETE, which logically empties the table. For materialized views, there is no deleting, so this command (I suppose?) just invalidates the materalized view. That's not the same thing. Are there TRUNCATE triggers on materialized views? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Peter Eisentraut pete...@gmx.net wrote: On 2/19/13 5:47 PM, Kevin Grittner wrote: When I went to do this, I hit a shift/reduce conflict, because with TABLE being optional it couldn't tell whether: TRUNCATE MATERIALIZED VIEW x, y, z; ... was looking for five relations or three. That goes away with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD. Is that OK? Is TRUNCATE even the right command here? For regular tables TRUNCATE is a fast DELETE, which logically empties the table. For materialized views, there is no deleting, so this command (I suppose?) just invalidates the materalized view. That's not the same thing. Hmm. That's what Greg Stark just said, and I replied that nobody else had raised the issue in over three months. With Greg, that's two now. TRUNCATE MATERIALIZED VIEW discards any data which has been loaded into the MV, rendering it unavailable for scanning. Internally, it does do a truncate, exactly as truncate table. The resulting zero-length heap file is what is used to determine whether a materialized view is scannable. When a CREATE WITH DATA or a REFRESH generates zero rows, an empty single page is created to indicate that it is scannable (valid to use in queries) but contains no rows. I agree that INVALIDATE is probably more descriptive, although it seems that there might be some even better word if we bikeshed enough. The question is, is it worth creating a new keyword to call the internal truncate function for materialized views, versus documenting that truncating a materialized view renders it invalid? Again, given the numbers that Tom presented a while back about the space requirements of every new keyword, I don't think this is enough of a gain to justify that. I still squirm a little about having used REFRESH, even though demand for that was overwhelming. Are there TRUNCATE triggers on materialized views? No. Nor SELECT, INSERT, UPDATE, or DELETE triggers. You can't create a trigger of any type on a materialized view. I don't think that would interfere with event triggers, though. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
* Kevin Grittner (kgri...@ymail.com) wrote: Peter Eisentraut pete...@gmx.net wrote: Is TRUNCATE even the right command here? For regular tables TRUNCATE is a fast DELETE, which logically empties the table. For materialized views, there is no deleting, so this command (I suppose?) just invalidates the materalized view. That's not the same thing. Hmm. That's what Greg Stark just said, and I replied that nobody else had raised the issue in over three months. With Greg, that's two now. TRUNCATE MAT VIEW seems like the right command to me. Just my 2c. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Materialized views WIP patch
On 2/20/13 2:30 PM, Kevin Grittner wrote: Are there TRUNCATE triggers on materialized views? No. Nor SELECT, INSERT, UPDATE, or DELETE triggers. You can't create a trigger of any type on a materialized view. I don't think that would interfere with event triggers, though. More generally, I would consider the invalidation of a materialized view a DDL command, whereas truncating a table is a DML command. This has various implications with triggers, logging, permissions. I think it's not good to mix those two. Also note that un-invalidating==refreshing a materialized view is already a DDL command. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Peter Eisentraut pete...@gmx.net wrote: On 2/20/13 2:30 PM, Kevin Grittner wrote: Are there TRUNCATE triggers on materialized views? No. Nor SELECT, INSERT, UPDATE, or DELETE triggers. You can't create a trigger of any type on a materialized view. I don't think that would interfere with event triggers, though. More generally, I would consider the invalidation of a materialized view a DDL command, whereas truncating a table is a DML command. The force of that assertion is somewhat undercut by the fact that the ExecuteTruncate() function does exactly what needs to be done to discard the data in a materialized view and make it appear as non-scannable. Even if we dress it up with different syntax, it's not clear that we wouldn't build a TruncateStmt in the parser and pass it through exactly the same execution path. We would just need to look at the relkind to generate a different command tag. This has various implications with triggers, logging, permissions. I think it's not good to mix those two. Could you give a more concrete example of where you see a problem? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
I would like to know what operations you plan to support independently of the command names. I may have missed much earlier in the discussion but then I suspect things have evolved since then. It sounds like you want to support: 1) Selecting from materialized viws 2) Manually refreshing materialized views 3) Manually truncating materialized views Maybe plus? 4) Automatically dropping materialized views if underlying table(s) are dropped/altered Or this has to be done manually? And explicitly not support 1) Automatically rewriting queries to select from matching views 2) Incrementally refreshing materialized views 3) Manual DML against data in materialized views (except truncate which is kind of DDL) 4) Keeping track of whether the data in the materialized view is up to date I have to say I find this model a bit odd. It seems the UI you're presenting is that they're basically read-only tables that the database will fill in the data for automatically. My mental model of materialized views is that they're basically views that the database guarantees a different performance characteristic for. I would expect a materialized view to be up to date all the time. If we don't support incremental updates (which seems like a fine thing not to support in a first cut) then I would expect any DML against the table to mark the view invalid and any queries against it to produce an error (or possibly go to the source tables using the view definition but that's probably a bad idea for most use cases). Ie. they should behave like a view at all times and have up to date information or fail entirely. I would expect a command like TRUNCATE MATERIALIZED VIEW to exist but I would expect it to be called something like INVALIDATE rather than TRUNCATE and dropping the storage is a side effect of simply telling the database that it doesn't need to maintain this materialized view. Though I could be convinced truncate is a good name as long as it's documented well. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut pete...@gmx.net wrote: More generally, I would consider the invalidation of a materialized view a DDL command, whereas truncating a table is a DML command. That's not entirely true. From the database's point of view, TRUNCATE is in many ways actually DDL. I actually don't really dislike using TRUNCATE for this command. I was more asking about whether this meant people were thinking of the view as a thing where you could control the data in it by hand and could have the view be empty rather than just not valid. The way I was thinking about it, whatever the command is named, you might be able to tell the database to drop the storage associated with the view but that would make the view invalid until it was refreshed. It wouldn't make it appear to be empty. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers