subject:"\[HACKERS\] Materialized views WIP patch"

Re: [HACKERS] Materialized views WIP patch

2013-08-18 Thread Kevin Grittner

Kevin Grittner kgri...@ymail.com wrote:
 Noah Misch n...@leadboat.com wrote:

 Also, could you explain the use of RelationCacheInvalidateEntry()
 in ExecRefreshMatView()?  CacheInvalidateRelcacheByRelid()
 followed by CommandCounterIncrement() is the typical pattern;
 this is novel. I suspect, though, neither is necessary now that
 the relcache does not maintain populated status based on a fork
 size reading.

 Yeah, that was part of the attempt to support unlogged materialized
 views while also not returning bogus results if the view had not
 been populated, using heap file size.  I agree that this line can
 just come out.  If there are no objections real soon now, I will
 remove it in master and the 9.3 branch before the release
 candidate.

Done.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-08-15 Thread Kevin Grittner

Apologies, but this sub-thread got lost when I changed email
accounts.  I found it in a final review to make sure nothing had
fallen through the cracks.

Noah Misch n...@leadboat.com wrote:
 On Thu, Jan 24, 2013 at 01:09:28PM -0500, Noah Misch wrote:

 There's no documented support for table constraints on MVs, but
 UNIQUE constraints are permitted:

 [local] test=# alter materialized view mymv add unique (c);
 ALTER MATERIALIZED VIEW

Fix pushed.

 Also, could you explain the use of RelationCacheInvalidateEntry()
 in ExecRefreshMatView()?  CacheInvalidateRelcacheByRelid()
 followed by CommandCounterIncrement() is the typical pattern;
 this is novel. I suspect, though, neither is necessary now that
 the relcache does not maintain populated status based on a fork
 size reading.

Yeah, that was part of the attempt to support unlogged materialized
views while also not returning bogus results if the view had not
been populated, using heap file size.  I agree that this line can
just come out.  If there are no objections real soon now, I will
remove it in master and the 9.3 branch before the release
candidate.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-07-14 Thread Noah Misch

While doing some post-commit review of the 9.3 materialized view feature, I
noticed a few loose ends:

On Thu, Jan 24, 2013 at 01:09:28PM -0500, Noah Misch wrote:
 Note that [...] ALTER TABLE ... RENAME CONSTRAINT [is]
 currently supported for MVs by ALTER TABLE but not by ALTER MATERIALIZED VIEW.
 
 There's no documented support for table constraints on MVs, but UNIQUE
 constraints are permitted:
 
 [local] test=# alter materialized view mymv add unique (c);
 ALTER MATERIALIZED VIEW
 [local] test=# alter materialized view mymv add check (c  0);
 ERROR:  mymv is not a table
 [local] test=# alter materialized view mymv add primary key (c);
 ERROR:  mymv is not a table or foreign table

The above points still apply.

Also, could you explain the use of RelationCacheInvalidateEntry() in
ExecRefreshMatView()?  CacheInvalidateRelcacheByRelid() followed by
CommandCounterIncrement() is the typical pattern; this is novel.  I suspect,
though, neither is necessary now that the relcache does not maintain populated
status based on a fork size reading.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-08 Thread Bruce Momjian

On Wed, Mar  6, 2013 at 09:16:59AM -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote:
  FWIW, my opinion is that doing anything like this in the planner is
  going to be enormously expensive.
 
  As we already said: no MVs = zero overhead = no problem.
 
 Well, in the first place that statement is false on its face: we'll
 still spend cycles looking for relevant MVs, or at least maintaining a
 complexly-indexed cache that helps us find out that there are none in
 a reasonable amount of time.  In the second place, even if it were
 approximately true it wouldn't help the people who were using MVs.
 
  It costs in
  the cases where time savings are possible and not otherwise.
 
 And that is just complete nonsense: matching costs whether you find a
 match or not.  Could we have a little less Pollyanna-ish optimism and
 a bit more realism about the likely cost of such a feature?

Should we add this to the TODO list as a possibility?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-08 Thread Bruce Momjian

On Tue, Mar  5, 2013 at 08:50:39AM +, Simon Riggs wrote:
 Its not a different feature, its what most people expect a feature
 called MV to deliver. That's not a matter of opinion, its simply how
 every other database works currently - Oracle, Teradata, SQLServer at
 least. The fact that we don't allow MVs to automatically optimize

Good points.

 queries is acceptable, as long as that is clearly marked in some way
 to avoid confusion, and I don't mean buried on p5 of the docs. What we
 have here is a partial implementation that can be improved upon over
 next few releases. I hope anyone isn't going to claim that
 Materialized Views have been implemented in the release notes in
 this release, because unqualified that would be seriously misleading
 and might even stifle further funding to improve things to the level
 already implemented elsewhere. Just to reiterate, I fully support the
 committing of this partial feature into Postgres in this release
 because it will be a long haul to complete the full feature and what
 we have here is a reasonable stepping stone to get there.
 
 Transactionally up-yo-date MVs can be used like indexes in the
 planner. The idea that this is impossible because of the permutations
 involved is somewhat ridiculous; there is much published work on
 optimising that and some obvious optimisations. Clearly that varies
 according to the number of MVs and the number of tables they touch,
 not the overall complexity of the query. The overhead is probably same
 or less as partial indexes, which we currently think is acceptable. In
 any case, if you don't wish that overhead, don't use MVs.

While you are right that automatically using materialized views is like
the optimizer choosing partial indexes, we actually already have
auto-selection of row-level materialized views with expression indexes
and index-only scans.  When you do the insert or update, the indexed
function is called and the value stored in the index.  If you later
query the function call, we can pull the value right from the index. 
This, of course, is a very crude definition of materialized view, but it
seems relevant.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-08 Thread Robert Haas

On Thu, Mar 7, 2013 at 12:14 PM, David E. Wheeler da...@justatheory.com wrote:
 On Mar 7, 2013, at 7:55 AM, Kevin Grittner kgri...@ymail.com wrote:
 If the answer to both those questions is “yes,” I think the term
 should remain “table,” with a few mentions that the term includes
 materialized views (and excludes foreign tables).

 And if the answers are not exactly and yes?

 I still tend to think that the term should remain “table,” with brief 
 mentions at the top of pages when the term should be assumed to represent 
 tables and matviews, and otherwise required disambiguations.

This ship has already sailed.  There are plenty of places where
operations apply to a subset of the relation types that exist today,
and we either list them out or refer to relations generically.
Changing that would require widespread changes to both the
documentation and existing error message text.  We cannot decide that
table now means table or materialized view any more than we can
decide that it means table or foreign table, as was proposed around
the time those changes went in.  Yeah, it's more work, and it's a
little annoying, but it's also clear.  Nothing else is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-07 Thread Simon Riggs

On 6 March 2013 14:16, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote:
 FWIW, my opinion is that doing anything like this in the planner is
 going to be enormously expensive.

 As we already said: no MVs = zero overhead = no problem.

 Well, in the first place that statement is false on its face: we'll
 still spend cycles looking for relevant MVs, or at least maintaining a
 complexly-indexed cache that helps us find out that there are none in
 a reasonable amount of time.  In the second place, even if it were
 approximately true it wouldn't help the people who were using MVs.

We can store info in the relcache, and reduce such a lookup to a
simple if test in the planner. Populating the cache would be easy
enough, approx same overhead as deriving list of constraints for the
relcache.

If you were using MVs, there are further heuristics to apply. MVs come
in various shapes, so we can assess whether they use aggregates,
joins, filters etc and use that for a general match against a query. I
don't see the need for complex assessments in every case.


 It costs in
 the cases where time savings are possible and not otherwise.

 And that is just complete nonsense: matching costs whether you find a
 match or not.  Could we have a little less Pollyanna-ish optimism and
 a bit more realism about the likely cost of such a feature?

It's not a trivial feature; this is a lot of work. But it can be done
efficiently, without significant effect on other workloads. If that
really were to be true, then enable_lookaside = off can be the
default, just as we have for another costly planning feature,
constraint_exclusion.

Matview lookaside is the next-best-action for further work on the
planner, AFAICS. Correctly optimised query parallelism is harder,
IMHO.

What I'm hearing at the moment is please don't make any changes in my
area or don't climb the North face. Considering the rather high bar
to being able to do this effectively, I do understand your interest in
not having your/our time wasted by casual attempts to approach the
problem, but I don't want to slam the door on a serious attempt (2
year project, 1+ man year effort).

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-07 Thread Kevin Grittner

David E. Wheeler da...@justatheory.com wrote:
 Kevin Grittner kgri...@ymail.com wrote:

 I also think that something should be done about the
 documentation for indexes.  Right now that always refers to a
 table.  It would clearly be awkward to change that to table
 or materialized view everywhere.  I wonder if most of thosse
 should be changed to relation with a few mentions that the
 relation could be a table or a materialized view, or whether
 some less intrusive change would be better.  Opinions welcome.

 Isn’t a materialized view really just a table that gets updated
 periodically?

Not exactly.  It is a relation which has some characteristics of a
view (including an entry in pg_rewrite exactly like that for a
view) and some characteristics of a table (including a heap and
optional indexes).  Whether it looks more like a table or more like
a view depends on how you tilt your head.  You could just as easily
say that it is really just a view which periodically caches its
results on disk.  They really are their own thing.  As future
releases add more subtle freshness concepts, incremental updates,
and query rewrite that unique identity will become even more
conspicuous, I think.

 And isn’t a non-matierialized view also thought of as a
 “relation”?

Yes.  Tables, views, and materialized views are all relations.

 If the answer to both those questions is “yes,” I think the term
 should remain “table,” with a few mentions that the term includes
 materialized views (and excludes foreign tables).

And if the answers are not exactly and yes?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-07 Thread David E. Wheeler

On Mar 7, 2013, at 7:55 AM, Kevin Grittner kgri...@ymail.com wrote:

 If the answer to both those questions is “yes,” I think the term
 should remain “table,” with a few mentions that the term includes
 materialized views (and excludes foreign tables).
 
 And if the answers are not exactly and yes?

I still tend to think that the term should remain “table,” with brief mentions 
at the top of pages when the term should be assumed to represent tables and 
matviews, and otherwise required disambiguations.

Trying to make the least possible work for you here. :-)

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-07 Thread Nicolas Barbier

2013/3/5 Kevin Grittner kgri...@ymail.com:

 Perhaps it would be worth looking for anything in the patch that
 you think might be painting us into a corner where it would be hard
 to do all the other cool things.  While it's late enough in the
 process that changing anything like that which you find would be
 painful, it might be a lot more painful later if we release without
 doing something about it.  My hope, of course, is that you won't
 find any such thing.  With this patch I've tried to provide a
 minimal framework onto which these other things can be bolted.
 I've tried hard not to do anything which would make it hard to
 extend, but new eyes may see something I missed.

(Without having looked at the patch, or even the documentation :-/.)

I think that something that might prove useful is the following:
Keeping in mind the possibility of storing something in the matview’s
heap that doesn’t correspond to what a SELECT * FROM matview would
yield (i.e., the “logical content”). The transformation could be
performed by an INSTEAD rule (similar to how a view is expanded to its
definition, a reference to a matview would expand to its heap content
transformed to the “logical content”).

(Note that I don’t have any reason to believe that the current
implementation would make this more difficult than it should be.)

ridiculously long rationale for the previous

(All the following requires making matviews (inter- and intra-)
transactionally up-to-date w.r.t. their base tables at the moment of
querying them. I don’t deal with approximate results, however useful
that might be.)

I think that the possibility of optimizing COUNT(*) (see mail by Greg
Stark in this thread with “the queue of updates with transacion
information that are periodically flattened into the aggregate”) can
be generalized to generic aggregation that way. The idea would be that
a transaction that adds (or deletes or updates) a row in a base table
causes a “delta” row version in the matview. Selecting from the
matview then merges these deltas into one value (for each row that is
logically present in the matview). Every once in a while (or rather
quite often, if the base tables change often), a VACUUM-like clean-up
operations must be run to merge all rows that are “old enough” (i.e.,
whose transactions are not in flight anymore).

Example of trivial aggregation matview weight_per_kind defined as:

SELECT kind, SUM(weight) FROM fruit GROUP BY kind;

The matview would then physically contain rows such as:

xmin, xmax, kind, weight
1000, 0, 'banana', 123
1000, 0, 'apple', 1
1001, 0, 'banana', 2
1002, 0, 'banana', -3

Which means:

* tx 1000 probably performed a clean-up operation and merged a bunch
of banana rows together to yield 123; it also inserted an apple of
weight 1.
* tx 1001 inserted a banana of weight 2. Any clean-up operation coming
by could not merge the 2 into the first row, as long as tx 1000 is in
flight. Otherwise, it would yield 125; physically this would mean
adding a 125 row, marking the 123 and 2 rows as deleted, and then
waiting for VACUUM to remove them).
* tx 1002 deleted a banana with weight 3.

The result of a SELECT * FROM weight_per_kind; would actually execute
SELECT kind, SUM_merge(weight) FROM heap_of_weight_per_kind GROUP BY
kind;

This would result, for tx 1001 (assuming tx 1000 committed and our
snapshot can see it), in:

kind, weight
'banana', 125
'apple', 1

(The -3 is not taken into account, because it is not visible to tx 1001.)

The operator to use at the location of SUM_merge is something that
merges multiple aggregation results (plus results that represent some
kind of “negation”) together. For SUM, it would be SUM itself and the
negation would be numerical negation. There might also be some kind of
“difference” concept used for UPDATE: When updating a weight from 4 to
3, the difference would be -1. Those additional properties could
optionally be added to the definition of each aggregation function; It
must be done for each function that you want to use in such a way.

Other aggregation functions such as AVG would require storing the SUM
+ number of rows in the matview (otherwise two AVGs could not be
merged); again a difference between the heap and the logical content.
Functions such as MIN and MAX are more difficult to fit in this
framework: I can only see how it would work if row deletion were not
allowed (which might still be a valid use-case); luckily, I think MIN
and MAX are not the typical things for which you would want to use
matviews, because quick computation can typically be done directly
using the base tables.

This whole thing would result in incremental updates of
aggregation-matviews that don’t require physical serialization of the
transactions that update the base tables and that query the matview,
which other models (that depend on correspondence between the heap and
logical content of matviews) would probably require.

And that’s where I stop rambling because nobody gets this far anyway,

Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Simon Riggs

On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote:

 FWIW, my opinion is that doing anything like this in the planner is
 going to be enormously expensive.  Index matching is already pretty
 expensive, and that has the saving grace that we only do it once per
 base relation.  Your sketch above implies trying to match to MVs once
 per considered join relation, which will be combinatorially worse.
 Even with a lot of sweat over reducing the cost of the matching, it
 will hurt.

As we already said: no MVs = zero overhead = no problem. It costs in
the cases where time savings are possible and not otherwise. The type
of queries and their typical run times are different to the OLTP case,
so any additional planning time is likely to be acceptable. I'm sure
we can have a deep disussion about how to optimise the planner for
this; I'm pretty sure there are reasonable answers to the not-small
difficulties along that road.

Most importantly, I want to make sure we don't swing the door shut on
improvements here, especially if you (Tom) are not personally
convinced enough to do the work yourself.

Making transactional MVs work would be in part justified by the
possible existence of automatic lookaside planning, so yes, the two
things are not linked but certainly closely related.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Tom Lane

Simon Riggs si...@2ndquadrant.com writes:
 On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote:
 FWIW, my opinion is that doing anything like this in the planner is
 going to be enormously expensive.

 As we already said: no MVs = zero overhead = no problem.

Well, in the first place that statement is false on its face: we'll
still spend cycles looking for relevant MVs, or at least maintaining a
complexly-indexed cache that helps us find out that there are none in
a reasonable amount of time.  In the second place, even if it were
approximately true it wouldn't help the people who were using MVs.

 It costs in
 the cases where time savings are possible and not otherwise.

And that is just complete nonsense: matching costs whether you find a
match or not.  Could we have a little less Pollyanna-ish optimism and
a bit more realism about the likely cost of such a feature?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Kevin Grittner

Tatsuo Ishii is...@postgresql.org wrote:

 Was the remaining work on docs done? I would like to test MVs and
 am waiting for the docs completed.

I think they are done.  If you notice anything missing or in need
of clarification please let me know.  At this point missing docs
would be a bug in need of a fix.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Peter Eisentraut

Kevin,

I haven't seen a reply to this.  Were you able to give my notes below
any consideration?


On 2/15/13 12:44 PM, Peter Eisentraut wrote:
 On 1/25/13 1:00 AM, Kevin Grittner wrote:
 New patch rebased, fixes issues raised by Thom Brown, and addresses
 some of your points.
 
 This patch doesn't apply anymore, so I just took a superficial look.  I
 think the intended functionality and the interfaces look pretty good.
 Documentation looks complete, tests are there.
 
 I have a couple of notes:
 
 * What you call WITH [NO] DATA, Oracle calls BUILD IMMEDIATE/DEFERRED.
 It might be better to use that as well then.
 
 * You use fields named relkind in the parse nodes, but they don't
 actually contain relkind values, which is confusing.  I'd just name the
 field is_matview or something.
 
 * More generally, I wouldn't be so fond of combining the parse handling
 of CREATE TABLE AS and CREATE MATERIALIZED VIEW.  They are similar, but
 then again so are a lot of other things.
 
 * Some of the terminology is inconsistent.  A materialized view is
 sometimes called valid, populated, or built, with approximately the same
 meaning.  Personally, I would settle on built, as per above, but it
 should be one term only.
 
 * I find the name of the relisvalid column a bit confusing.  Especially
 because it only applies to materialized views, and there is already a
 meaning of valid for indexes.  (Recall that indexes are also stored in
 pg_class, but they are concerned about indisvalid.)  I would name it
 something like relmvbuilt.
 
 
 Btw., half of the patch seems to consist of updating places referring to
 relkind.  Is something wrong with the meaning of relkind that this sort
 of thing is required?  Maybe these places should be operating in terms
 of features, not accessing relkind directly.
 
 
 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Greg Stark

On Tue, Mar 5, 2013 at 9:08 PM, Robert Haas robertmh...@gmail.com wrote:
 All that having been said, it's hard for me to imagine that anyone
 really cares about any of this until we have an incremental update
 feature, which right now we don't.  Actually, I'm betting that's going
 to be significantly harder than automatic-query-rewrite, when all is
 said and done.  Automatic-query-rewrite, if and when we get it, will
 not be easy and will require a bunch of work from someone with a good
 understanding of the planner, but it strikes me as the sort of thing
 that might work out to one large project and then it's done.  Whereas,
 incremental update sounds to me like a series of projects over a
 series of releases targeting various special cases, where we can
 always point to some improvements vs. release N-1 but we're never
 actually done and able to move on to the next thing.  As a roadmap
 goes, I think that's OK.  Even a reasonably simplistic and partial
 implementation of incremental update will benefit a lot of users.  But
 in terms of relative difficulty, it's not at all obvious to me that
 that's the easier part of the project.

While true that's true for a lot of Postgres features. The only ones
that are one-shot projects are buried deep in the internals. Anything
with UI implications inevitably has limitations and then other people
come along and and work on removing or extending those features.

I do agree with Tom though -- the most frequently asked for
materialized view in the past has always been select count(*) from
tab. People assume we already do this and are surprised when we
don't. The cookie cutter solution for it is basically exactly what a
incrementally updated materialized view  solution would look like
(with the queue of updates with transacion information that are
periodically flattened into the aggregate). Rewriting this might be a
bit tricky and require heuristics to determine just how much work to
expend trying to match materialized views, this type of view would be
where most of the win would be.

I also can't see implementing query rewriting for
non-transactionally-accurate materialized views. If people want a
snapshot of the data that may be out of date that's great. I can tons
of use cases for that. But then surely they won't be surprised to have
to query the snapshot explicitly. If can't see going to all this
trouble to implement transactions and snapshots and wal logging and so
on and then silently rewriting queries to produce data that is not up
to date. I think users would be surprised to find bog-standard SQL
occasionally producing incorrect results.

That said, there are cases where snapshots might be up to date even
though we don't implement any incremental updates. If the underlying
data is read-only or hasn't received any update commits since the
snapshot was taken then it might still be useful. There are tons of
ETL applications where you load the data once and then build MVs for
it and  never touch the underlying data again.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Kevin Grittner

Kevin Grittner kgri...@ymail.com wrote:
 Tatsuo Ishii is...@postgresql.org wrote:

 Was the remaining work on docs done? I would like to test MVs and
 am waiting for the docs completed.

 I think they are done.  If you notice anything missing or in need
 of clarification please let me know.  At this point missing docs
 would be a bug in need of a fix.

I decided to take another pass through to try to spot anyplace I
might have missed.  I found that I had missed documenting the new
pg_matviews system view, so I have just pushed that.

I also think that something should be done about the documentation
for indexes.  Right now that always refers to a table.  It would
clearly be awkward to change that to table or materialized view
everywhere.  I wonder if most of thosse should be changed to
relation with a few mentions that the relation could be a table
or a materialized view, or whether some less intrusive change would
be better.  Opinions welcome.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread David E. Wheeler

On Mar 6, 2013, at 1:51 PM, Kevin Grittner kgri...@ymail.com wrote:

 I also think that something should be done about the documentation
 for indexes.  Right now that always refers to a table.  It would
 clearly be awkward to change that to table or materialized view
 everywhere.  I wonder if most of thosse should be changed to
 relation with a few mentions that the relation could be a table
 or a materialized view, or whether some less intrusive change would
 be better.  Opinions welcome.

Isn’t a materialized view really just a table that gets updated periodically? 
And isn’t a non-matierialized view also thought of as a “relation”?

If the answer to both those questions is “yes,” I think the term should remain 
“table,” with a few mentions that the term includes materialized views (and 
excludes foreign tables).

Best,

David

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Tatsuo Ishii

 Kevin Grittner kgri...@ymail.com wrote:
 Tatsuo Ishii is...@postgresql.org wrote:

 Was the remaining work on docs done? I would like to test MVs and
 am waiting for the docs completed.

 I think they are done.  If you notice anything missing or in need
 of clarification please let me know.  At this point missing docs
 would be a bug in need of a fix.

Ok.

 I decided to take another pass through to try to spot anyplace I
 might have missed.  I found that I had missed documenting the new
 pg_matviews system view, so I have just pushed that.
 
 I also think that something should be done about the documentation
 for indexes.  Right now that always refers to a table.  It would
 clearly be awkward to change that to table or materialized view
 everywhere.  I wonder if most of thosse should be changed to
 relation with a few mentions that the relation could be a table
 or a materialized view, or whether some less intrusive change would
 be better.  Opinions welcome.

You might want to add small description about MVs to Tutorial
documentation 3.2 Views. Here is the current description of views in
the doc.

-
3.2. Views

Refer back to the queries in Section 2.6. Suppose the combined listing
of weather records and city location is of particular interest to your
application, but you do not want to type the query each time you need
it. You can create a view over the query, which gives a name to the
query that you can refer to like an ordinary table:

CREATE VIEW myview AS
SELECT city, temp_lo, temp_hi, prcp, date, location
FROM weather, cities
WHERE city = name;

SELECT * FROM myview;

Making liberal use of views is a key aspect of good SQL database
design. Views allow you to encapsulate the details of the structure of
your tables, which might change as your application evolves, behind
consistent interfaces.

Views can be used in almost any place a real table can be
used. Building views upon other views is not uncommon.
-
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-05 Thread Simon Riggs

On 3 March 2013 23:39, Tom Lane t...@sss.pgh.pa.us wrote:
 Nicolas Barbier nicolas.barb...@gmail.com writes:
 2013/3/3 Kevin Grittner kgri...@ymail.com:
 Nicolas Barbier nicolas.barb...@gmail.com wrote:
 I think that automatically using materialized views even when the
 query doesn’t mention them directly, is akin to automatically
 using indexes without having to mention them in the query.

 Oh, I understand that concept perfectly well, I just wonder how
 often it is useful in practice.

 There's a much more fundamental reason why this will never happen, which
 is that the query planner is not licensed to decide that you only want
 an approximate and not an exact answer to your query.

 If MVs were guaranteed always up-to-date, maybe we could think about
 automatic use of them --- but that's a far different feature from what
 Kevin has here.

Its not a different feature, its what most people expect a feature
called MV to deliver. That's not a matter of opinion, its simply how
every other database works currently - Oracle, Teradata, SQLServer at
least. The fact that we don't allow MVs to automatically optimize
queries is acceptable, as long as that is clearly marked in some way
to avoid confusion, and I don't mean buried on p5 of the docs. What we
have here is a partial implementation that can be improved upon over
next few releases. I hope anyone isn't going to claim that
Materialized Views have been implemented in the release notes in
this release, because unqualified that would be seriously misleading
and might even stifle further funding to improve things to the level
already implemented elsewhere. Just to reiterate, I fully support the
committing of this partial feature into Postgres in this release
because it will be a long haul to complete the full feature and what
we have here is a reasonable stepping stone to get there.

Transactionally up-yo-date MVs can be used like indexes in the
planner. The idea that this is impossible because of the permutations
involved is somewhat ridiculous; there is much published work on
optimising that and some obvious optimisations. Clearly that varies
according to the number of MVs and the number of tables they touch,
not the overall complexity of the query. The overhead is probably same
or less as partial indexes, which we currently think is acceptable. In
any case, if you don't wish that overhead, don't use MVs.

Non-transactionally up-to-date MVs could also be used like indexes in
the planner, if we gave the planner the licence it (clearly) lacks.
If using MV makes a two-hour query return in 1 minute, then using an
MV that is 15 minutes out of date is likely to be a win. The licence
is some kind of user parameter/option that specifies how stale an
answer a query can return. For many queries that involve averages and
sums, a stale or perhaps an approximate answer would hardly differ
anyway. So I think there is room somewhere there for a staleness
time specification by the user, allowing approximation.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-05 Thread Kevin Grittner

Simon Riggs si...@2ndquadrant.com wrote:
 On 3 March 2013 23:39, Tom Lane t...@sss.pgh.pa.us wrote:
 Nicolas Barbier nicolas.barb...@gmail.com writes:
 2013/3/3 Kevin Grittner kgri...@ymail.com:
 Nicolas Barbier nicolas.barb...@gmail.com wrote:
 I think that automatically using materialized views even when
 the query doesn’t mention them directly, is akin to
 automatically using indexes without having to mention them in
 the query.

 Oh, I understand that concept perfectly well, I just wonder
 how often it is useful in practice.

 There's a much more fundamental reason why this will never
 happen, which is that the query planner is not licensed to
 decide that you only want an approximate and not an exact answer
 to your query.

 If MVs were guaranteed always up-to-date, maybe we could think
 about automatic use of them --- but that's a far different
 feature from what Kevin has here.

 Its not a different feature, its what most people expect a
 feature called MV to deliver. That's not a matter of opinion, its
 simply how every other database works currently - Oracle,
 Teradata, SQLServer at least. The fact that we don't allow MVs to
 automatically optimize queries is acceptable, as long as that is
 clearly marked in some way to avoid confusion, and I don't mean
 buried on p5 of the docs. What we have here is a partial
 implementation that can be improved upon over next few releases.
 I hope anyone isn't going to claim that Materialized Views have
 been implemented in the release notes in this release, because
 unqualified that would be seriously misleading and might even
 stifle further funding to improve things to the level already
 implemented elsewhere. Just to reiterate, I fully support the
 committing of this partial feature into Postgres in this release
 because it will be a long haul to complete the full feature and
 what we have here is a reasonable stepping stone to get there.

 Transactionally up-yo-date MVs can be used like indexes in the
 planner. The idea that this is impossible because of the
 permutations involved is somewhat ridiculous; there is much
 published work on optimising that and some obvious optimisations.
 Clearly that varies according to the number of MVs and the number
 of tables they touch, not the overall complexity of the query.
 The overhead is probably same or less as partial indexes, which
 we currently think is acceptable. In any case, if you don't wish
 that overhead, don't use MVs.

 Non-transactionally up-to-date MVs could also be used like
 indexes in the planner, if we gave the planner the licence it
 (clearly) lacks. If using MV makes a two-hour query return in 1
 minute, then using an MV that is 15 minutes out of date is likely
 to be a win. The licence is some kind of user parameter/option
 that specifies how stale an answer a query can return. For many
 queries that involve averages and sums, a stale or perhaps an
 approximate answer would hardly differ anyway. So I think there
 is room somewhere there for a staleness time specification by
 the user, allowing approximation.

I don't think I disagree with any of what Simon says other than his
feelings about the planning cost.  Imagine that there are ten MVs
that might apply to a complex query, but some of them are mutually
exclusive, so there are a large number of permutations of MVs which
could be used to replace parts of the original query.  And maybe
some of base tables have indexes which could reduce execution cost
which aren't present in some or all of the MVs.  And each MV has a
number of indexes.  The combinatorial explosion of possible plans
would make it hard to constrain plan time without resorting to some
crude rules about what to choose.  That's not an unsolvable
problem, but I see it as a pretty big problem.

I have no doubt that someone could take a big data warehouse and
add one or two MVs and show a dramatic improvement in the run time
of a query.  Almost as big as if the query were rewritten to usee
the MV directly.  It would make for a very nice presentation, and
as long as they are used sparingly this could be a useful tool for
a data warehouse environment which is playing with alternative ways
to optimize slow queries which pass a lot of data.  In other
environments, I feel that it gets a lot harder to show a big win.

The good news is that it sounds like we agree on the ideal
long-term feature set.  I'm just a lot more excited, based on the
use-cases I've seen, about the addition of incremental updates than
substituting MVs into query plans which reference the underlying
tables.  Perhaps that indicates a chance to the final feature set
sooner, through everyone scratching their own itches.  :-)

And we both seem to feel that some system for managing acceptable
levels of MV freshness is a necessary feature in order to go very
much further.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list

Re: [HACKERS] Materialized views WIP patch

2013-03-05 Thread Simon Riggs

On 5 March 2013 12:15, Kevin Grittner kgri...@ymail.com wrote:

 I don't think I disagree with any of what Simon says other than his
 feelings about the planning cost.  Imagine that there are ten MVs
 that might apply to a complex query, but some of them are mutually
 exclusive, so there are a large number of permutations of MVs which
 could be used to replace parts of the original query.  And maybe
 some of base tables have indexes which could reduce execution cost
 which aren't present in some or all of the MVs.  And each MV has a
 number of indexes.  The combinatorial explosion of possible plans
 would make it hard to constrain plan time without resorting to some
 crude rules about what to choose.  That's not an unsolvable
 problem, but I see it as a pretty big problem.

If we are proposing that we shouldn't try to optimise because its not
usefully solvable, then I would disagree.

If we are saying that more plans are possible with MVs, then I'd say,
yes there *could* be - that's the one of the purposes. That represents
more options for optimisation and we should be happy, not sad about
that. Yes, we would need some thought to how those potential
optimisations can be applied without additional planning cost, but I
see that as a long term task, not a problem. The question is will the
potential for additional planning time actually materialise into a
planning problem? (See below).

 I have no doubt that someone could take a big data warehouse and
 add one or two MVs and show a dramatic improvement in the run time
 of a query.  Almost as big as if the query were rewritten to usee
 the MV directly.  It would make for a very nice presentation, and
 as long as they are used sparingly this could be a useful tool for
 a data warehouse environment which is playing with alternative ways
 to optimize slow queries which pass a lot of data.  In other
 environments, I feel that it gets a lot harder to show a big win.

Are there realistically going to be more options to consider? In
practice, no, because in most cases we won't be considering both MVs
and indexes.

Splatting MVs on randomly won't show any more improvement than
splatting indexes on randomly helps. Specific optimisations help in
specific cases only. And of course, adding extra data structures that
have no value certainly does increase planning time. Presumably we'd
need some way of seeing how frequently MVs were picked, so we could
drop unused MVs just like we can indexes.

* Indexes are great at speeding up various kinds of search queries. If
you don't have any queries like that, they help very little.

* MVs help in specific and restricted use cases, but can otherwise be
thought of as a new kind of index structure. MVs help with joins and
aggregations, so if you don't do much of that, you'll see no benefit.

That knowledge also allows us to develop heuristics for sane
optimisation. If the MV has a GROUP BY in it, and a query doesn't,
then it probably won't help much to improve query times. If it
involves a join you aren't using, then that won't help either. My
first guess would be that we don't even bother looking for MV plans
unless it has an aggregated result, or a large scale join. We do
something similar when we look for plans that use indexes when we have
appropriate quals - no quals, no indexes.

As a result, I don't see MVs increasing planning times for requests
that would make little or no use of them. There will be more planning
time on queries that could make use of them and that is good because
we really care about that.

Sure, a badly written planner might cost more time than it saves. All
of this work requires investment from someone with the time and/or
experience to make a good go at it. I'm not pushing Tom towards it,
nor anyone else, but I do want to see the door kept open for someone
to do this when possible (i.e. not GSoC).

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-05 Thread Josh Berkus

Simon, Kevin, all:

Actually, there was already an attempt at automated MV query planning as
a prior university project.  We could mine that for ideas.

Hmmm.  I thought it was on pgfoundry, but it's not.  Does anyone have
access to ACM databases etc. so they could search for this?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-05 Thread Robert Haas

On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner kgri...@ymail.com wrote:
 I don't think I disagree with any of what Simon says other than his
 feelings about the planning cost.  Imagine that there are ten MVs
 that might apply to a complex query, but some of them are mutually
 exclusive, so there are a large number of permutations of MVs which
 could be used to replace parts of the original query.  And maybe
 some of base tables have indexes which could reduce execution cost
 which aren't present in some or all of the MVs.  And each MV has a
 number of indexes.  The combinatorial explosion of possible plans
 would make it hard to constrain plan time without resorting to some
 crude rules about what to choose.  That's not an unsolvable
 problem, but I see it as a pretty big problem.

I'm not sure I agree.  Suppose you have a query like SELECT * FROM a
INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE some
stuff.  The query planner will construct paths for scans on a, b, and
c.  Then it will construct joinrels for (a b), (a c), (b c), and
eventually (a b c) and calculate a set of promising paths for each of
them.  If there is a materialized view available for one of those
joinrels, all we really need to do is add the possible paths for
scanning that materialized view to the joinrel.  They'll either be
better than the existing paths, or they won't.  If they're better, the
paths that otherwise would have gotten chosen will get kicked out.  If
they're worse, the materialized-view paths will get kicked out.
Either way, we don't have any more paths than we would have otherwise
- so no combinatorial explosion.

There is one case where we might end up with more paths than we had
before.  Suppose there's a materialized view on the query SELECT a.x,
a.y, a.z, b.t FROM a INNER JOIN b ON a.x = b.x ORDER BY a.z and the
users enters just that query.  Suppose further that the materialized
view has an index on column z, but table a does not.  In that case,
the best paths for the joinrel (a b) not involving the materialized
view will be an unordered path, but we could scan the materialized
view using the index and so will have a path that is ordered by a.z to
add to the joinrel.  This path will stick around even if it's more
expensive than the unordered path because we know it avoids a final
sort.  So in that case we do have more paths, but they are potentially
useful paths, so I don't see that as a problem.

It seems to me that the tricky part of this is not that it might add a
lot more paths (because I don't think it will, and if it does I think
they're useful paths), but that figuring out whether or not a
materialized view matches any particular joinrel might be expensive.
I think we're going to need a pretty accurate heuristic for quickly
discarding materialized views that can't possibly be relevant to the
query as a whole, or to particular joinrels.  There are a couple of
possible ways to approach that.  The most manual of these is probably
to have a command like ALTER TABLE x {ENABLE | DISABLE } REWRITE
MATERIALIZED y, where the user has to explicitly ask for materialized
views to be considered, or else they aren't.  That sort of
fine-grained control might have other benefits as well.

I think a more automated solution is also possible, if we want it.
For a materialized view to match a query, all of the baserels in the
materialized view must also be present in the query.  (Actually, there
are situations where this isn't true; e.g. the materialized view has
an extra table, but it's joined in a way that could be pruned away by
the join removal code, but I think we could ignore such
somewhat-pathological cases at least initially.)  It seems to me that
if we could figure out a very-cheap way to throw away all of the
materialized views that don't pass that basic test, we'd be reasonably
close to a workable solution.  A database with tons of materialized
views defined on the same set of target relations might still have
some planning time problems, but so might a database with tons of
indexes.

In that regard, what I was thinking about is to use something sort of
like a Bloom filter.  Essentially, produce a fingerprint for each
materialized view.  For the sake of argument, let's say that the
fingerprint is a 64-bit integer, although it could be a bit string of
any length.  To construct the fingerprint, hash the OID of each
relation involved in the view onto a bit position between 0 and 63.
Set the bits for all relations involved in the materialized view.
Then, construct a fingerprint for the query in the same way.  Any
materialized view where (matview_fingerprint  query_fingerprint) !=
matview_fingerprint needn't be considered; moreover, for a given
joinrel, any materialized view where matview_fingerprint !=
joinrel_fingerprint (computed using the same scheme) needn't be
considered.  Of course, a matching fingerprint doesn't mean that the
materialized view matches, or even necessarily that it involves the
correct

Re: [HACKERS] Materialized views WIP patch

2013-03-05 Thread Kevin Grittner

Josh Berkus j...@agliodbs.com wrote:

There is no shortage of literature on the topic, although any
papers from the ACM could certainly be of interest due to the
general high quality of papers published there.  Adding anything
like this to 9.3 is clearly out of the question, though, so I
really don't want to spend time researching this now, or
encouraging others to do so until after we have a 9.3 release
candidate.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-05 Thread Kevin Grittner

Robert Haas robertmh...@gmail.com wrote:

 All that having been said, it's hard for me to imagine that
 anyone really cares about any of this until we have an
 incremental update feature, which right now we don't.

These are actually independent of one another, as long as we nail
down how we're determining freshness -- which is probably needed
for either.  Someone who's immersed in tuning long-running DW
queries might be interested in this before incremental update.
(They might load the data once per month, so refreshing the MVs as
a step in that process might be cheaper than incrementally
maintaining them.)  Someone could base freshness on
pg_relation_is_scannable() and start working on automatic query
rewrite right now, if they wanted to.

 Actually, I'm betting that's going to be significantly harder
 than automatic-query-rewrite, when all is said and done. 
 Automatic-query-rewrite, if and when we get it, will not be easy
 and will require a bunch of work from someone with a good
 understanding of the planner, but it strikes me as the sort of
 thing that might work out to one large project and then it's
 done.

I still think we're going to hit the wall on planning time under
certain circumstances and need to tweak that over the course of
several releases, but now is not the time to get into the details
of why I think that.  We've spent way too much time on it already
for the point we're at in the 9.3 cycle.  I've kept my concerns
hand-wavy on purpose, and am trying hard to resist the temptation
to spend a lot of time demonstrating the problems.

 Whereas, incremental update sounds to me like a series of
 projects over a series of releases targeting various special
 cases, where we can always point to some improvements vs. release
 N-1 but we're never actually done

Exactly.  I predict that we will eventually have some special sort
of trigger for maintaining MVs based on base table changes to
handle the ones that are just too expensive (in developer time or
run time) to fully automate.  But there is a lot of low-hanging
fruit for automation.

 Even a reasonably simplistic and partial implementation of
 incremental update will benefit a lot of users.

Agreed.

 But in terms of relative difficulty, it's not at all obvious to
 me that that's the easier part of the project.

I totally agree that getting something working to use MVs in place
of underlying tables is not all that different or more difficult
than using partial indexes.  I just predict that we'll get a lot of
complaints about cases where it results in worse performance and
we'll need to deal with those issues.  I don't seem that as being
brain surgery; just a messy matter of trying to get this pretty
theory to work well in the real world -- probably using a bunch of
not-so-elegant heuristics.  And in the end, the best you can hope
for is performance not noticeably worse than you would get if you
modified your query to explicitly use the MV(s) -- you're just
saving yourself the rewrite.  Well, OK, there is the point that,
(like indexes) if you run the query which hits the base tables with
different parameters, and a new plan is generated each time, it
might pick different MVs or exclude them as is most efficient for
the given parameters.  That's the Holy Grail of all this.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-05 Thread Nicolas Barbier

2013/3/5 Robert Haas robertmh...@gmail.com:

 All that having been said, it's hard for me to imagine that anyone
 really cares about any of this until we have an incremental update
 feature, which right now we don't.  Actually, I'm betting that's going
 to be significantly harder than automatic-query-rewrite, when all is
 said and done.

I agree.

E.g., things such as keeping a matview consistent relative to changes
applied to the base tables during the same transaction, might be
mightily difficult to implement in a performant way. OTOH, matviews
that can only be used for optimization if their base tables were not
changed “too recently” (e.g., by transactions that are still in
flight, including the current transaction), are probably kind of
useful in themselves as long as those base tables are not updated all
the time.

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-05 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner kgri...@ymail.com wrote:
 I don't think I disagree with any of what Simon says other than his
 feelings about the planning cost.

 I'm not sure I agree.  Suppose you have a query like SELECT * FROM a
 INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE some
 stuff.  The query planner will construct paths for scans on a, b, and
 c.  Then it will construct joinrels for (a b), (a c), (b c), and
 eventually (a b c) and calculate a set of promising paths for each of
 them.  If there is a materialized view available for one of those
 joinrels, all we really need to do is add the possible paths for
 scanning that materialized view to the joinrel.

That only works to the extent that a materialized view can be described
by a path.  My impression is that most of the use-cases for MVs will
involve aggregates or similar data reduction operators, and we don't
currently implement anything about aggregates at the Path level.
Arguably it would be useful to do so; in particular, we could get rid
of the currently hard-wired mechanism for choosing between sorted and
hashed aggregation, and perhaps there'd be a less grotty way to deal
with index-optimized MIN/MAX aggregates.  But there's a great deal to do
to make that happen, and up to now I haven't seen any indication that it
would do much except add overhead.

FWIW, my opinion is that doing anything like this in the planner is
going to be enormously expensive.  Index matching is already pretty
expensive, and that has the saving grace that we only do it once per
base relation.  Your sketch above implies trying to match to MVs once
per considered join relation, which will be combinatorially worse.
Even with a lot of sweat over reducing the cost of the matching, it
will hurt.

 All that having been said, it's hard for me to imagine that anyone
 really cares about any of this until we have an incremental update
 feature, which right now we don't.

Agreed.  Even if we're willing to have an approximate results are OK
GUC (which frankly strikes me as a horrid idea), people would certainly
not be willing to turn it on without some guarantee as to how stale the
results could be.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-05 Thread Nicolas Barbier

2013/3/5 Kevin Grittner kgri...@ymail.com:

 Exactly.  I predict that we will eventually have some special sort
 of trigger for maintaining MVs based on base table changes to
 handle the ones that are just too expensive (in developer time or
 run time) to fully automate.  But there is a lot of low-hanging
 fruit for automation.

I think it would be totally OK to restrict the possible definitions
for matviews that can be maintained fully incrementally to something
like:

SELECT attributes and aggregations FROM trivial joins WHERE trivial
condition GROUP BY attributes;

Those definitions are the most useful for optimizing the things that
matviews are good at (joins and aggregation).

Nicolas

PS. Sorry for having fired off this discussion that obviously doesn’t
really relate to the current patch.

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-05 Thread Kevin Grittner

Nicolas Barbier nicolas.barb...@gmail.com wrote:

 PS. Sorry for having fired off this discussion that obviously
 doesn’t really relate to the current patch.

I know it's hard to resist.  While I think there will be a number
of people for whom the current patch will be a convenience and will
therefore use it, it is hard to look at what's there and *not* go
if only it also...

Perhaps it would be worth looking for anything in the patch that
you think might be painting us into a corner where it would be hard
to do all the other cool things.  While it's late enough in the
process that changing anything like that which you find would be
painful, it might be a lot more painful later if we release without
doing something about it.  My hope, of course, is that you won't
find any such thing.  With this patch I've tried to provide a
minimal framework onto which these other things can be bolted. 
I've tried hard not to do anything which would make it hard to
extend, but new eyes may see something I missed.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-05 Thread Josh Berkus

On 03/05/2013 01:09 PM, Kevin Grittner wrote:
 Josh Berkus j...@agliodbs.com wrote:
 
 There is no shortage of literature on the topic, although any
 papers from the ACM could certainly be of interest due to the
 general high quality of papers published there.  Adding anything
 like this to 9.3 is clearly out of the question, though, so I
 really don't want to spend time researching this now, or
 encouraging others to do so until after we have a 9.3 release
 candidate.

Good point.

Just FYI: once we start work on 9.4, some university team got planner
stuff for matviews working with postgres, using version 8.0 or something.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-05 Thread Tatsuo Ishii

 Kevin Grittner kgri...@ymail.com wrote:
 
 REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA
 
 Given the short time, I left out the [, ...].  If people think
 that is important to get into this release, a follow-on patch might
 be possible.
 
 Barring objections, I will use the above and push tomorrow.
 
 I'm still working on docs, and the changes related to the syntax
 change are still only lightly tested, but as far as I know, all is
 complete except for the docs.  I'm still working on those and
 expect to have them completed late today.  I'm posting this patch
 to allow a chance for final review of the code changes before I
 push.

Was the remaining work on docs done? I would like to test MVs and am
waiting for the docs completed.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-04 Thread Euler Taveira

On 03-03-2013 21:27, Josh Berkus wrote:
 I think it would be worth talking about when someone wants to implement
 it.  I'd imagine it would require setting a GUC, though, which would be
 off by default for obvious reasosn.
 
-1. Why not adding another storage_parameter, say auto_refresh=on? Also,
that's another feature.

 And of limited utility, as mentioned.
 
For a first release, that is fine as is. Let's not complicate a feature that
has been widely discussed in this development cycle.


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-03 Thread Dean Rasheed

On 2 March 2013 15:06, Kevin Grittner kgri...@ymail.com wrote:
 [ ... ] led to
 this thought:

 REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA


[Sorry to join this discussion so late]

FWIW I had a quick look at other DBs to see if there were any other
precedents out there. Oracle was the only one I could find with
anything similar. They use the same creation syntax:

  CREATE MATERIALIZED VIEW name [options] AS SELECT ...

and they use ALTER for everything else, such as refreshing the MV:

  ALTER MATERIALIZED VIEW name REFRESH [options];

AFAICT the nearest thing they have to TRUNCATE/DISCARD is:

  ALTER MATERIALIZED VIEW name CONSIDER FRESH;

They do also support updateable materialized views with standard DML,
but it doesn't look as though they allow TRUNCATE to operate directly
on a MV (although it can be made to propagate from a base table to a
MV, in which case allowing TRUNCATE on the MV itself with a different
meaning would likely be confusing).

Oracle's MVs have lots of options, all of which hang off the 2 basic
CREATE and ALTER commands. There's a certain appeal to that, rather
than inventing or overloading a bunch of other commands as more
options get added. The proposed REFRESH command is OK for today's
options, but I think it might be overly limiting in the future.

Of course, since this isn't in the SQL standard, we are free to use
any syntax we like. We don't have to follow Oracle, but having a
common syntax might make some people's lives easier, and I haven't
seen a convincing argument as to why any alternative syntax is better.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-03 Thread Kevin Grittner

Dean Rasheed dean.a.rash...@gmail.com wrote:
 Kevin Grittner kgri...@ymail.com wrote:

 [ ... ] led to this thought:

 REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA

 [Sorry to join this discussion so late]

 FWIW I had a quick look at other DBs to see if there were any
 other precedents out there. Oracle was the only one I could find
 with anything similar. They use the same creation syntax:

   CREATE MATERIALIZED VIEW name [options] AS SELECT ...

It is a pretty obvious choice when you look at other SQL
statements.

 and they use ALTER for everything else, such as refreshing the
 MV:

   ALTER MATERIALIZED VIEW name REFRESH [options];

No, that is for specifiying when and under what conditions an
automatic refresh is done.  To do an immediate action which is
equivalent to what I have for the REFRESH statement, they use a
REFRESH() function.  That seemed too incompatible with how we've
done everything else in PostgreSQL -- I felt that a statement would
make more sense.  Consider REINDEX, CLUSTER, and VACUUM FULL for
example.

 AFAICT the nearest thing they have to TRUNCATE/DISCARD is:

   ALTER MATERIALIZED VIEW name CONSIDER FRESH;

No, that doesn't rebuild or discard data -- if the MV is
out-of-date and therefore unscannable according to the how the MV
has been set up, this overrides that indication and allows scanning
in spite of that.

 They do also support updateable materialized views with standard
 DML, but it doesn't look as though they allow TRUNCATE to operate
 directly on a MV (although it can be made to propagate from a
 base table to a MV, in which case allowing TRUNCATE on the MV
 itself with a different meaning would likely be confusing).

They allow DML on the MV in order to update it.  The default
REFRESH() function executes a TRUNCATE statement followed by INSERT
/ SELECT using the MV's query.

 Oracle's MVs have lots of options, all of which hang off the 2
 basic CREATE and ALTER commands. There's a certain appeal to
 that, rather than inventing or overloading a bunch of other
 commands as more options get added. The proposed REFRESH command
 is OK for today's options, but I think it might be overly
 limiting in the future.

For what ALTER MATERIALIZED VIEW in Oracle does, I think it makes
sense to use ALTER.  I don't think this feature should use
functions for REFRESH.  Why Oracle can get away with functions for
it is that they allow DML on an MV, which seems to me to compromise
the integrity of the feature, at least as default behavior.

I see us supporting automatic incremental updates of progressively
more complex queries, and we may at some point want to add a
trigger-based maintenance option; but the functionality available
with a trigger-based approach is almost entirely availaable in
PostgreSQL today without this feature.  Rewriting queries using
expressions which match the MV's query to pull from the MV instead
of the underlying tables is the exception.  While that is a sexy
feature, and I'm sure one can construct examples where it helps
performance, it seems to me unlikely to be very generally useful. 
I suspect that it exists mostly so that people who want to write an
RFP to pick a particular product can include that as a requirement.
 In other words, I think the main benefit of automatic rewrite
using an MV is marketing, not technical or performance.  That's
important, too; but let's focus first on getting what is most
useful.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-03 Thread Dean Rasheed

On 3 March 2013 13:12, Kevin Grittner kgri...@ymail.com wrote:
 Dean Rasheed dean.a.rash...@gmail.com wrote:
 Kevin Grittner kgri...@ymail.com wrote:

 [ ... ] led to this thought:

 REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA

 [Sorry to join this discussion so late]

 FWIW I had a quick look at other DBs to see if there were any
 other precedents out there. Oracle was the only one I could find
 with anything similar. They use the same creation syntax:

   CREATE MATERIALIZED VIEW name [options] AS SELECT ...

 It is a pretty obvious choice when you look at other SQL
 statements.

 and they use ALTER for everything else, such as refreshing the
 MV:

   ALTER MATERIALIZED VIEW name REFRESH [options];

 No, that is for specifiying when and under what conditions an
 automatic refresh is done.  To do an immediate action which is
 equivalent to what I have for the REFRESH statement, they use a
 REFRESH() function.  That seemed too incompatible with how we've
 done everything else in PostgreSQL -- I felt that a statement would
 make more sense.  Consider REINDEX, CLUSTER, and VACUUM FULL for
 example.

 AFAICT the nearest thing they have to TRUNCATE/DISCARD is:

   ALTER MATERIALIZED VIEW name CONSIDER FRESH;

 No, that doesn't rebuild or discard data -- if the MV is
 out-of-date and therefore unscannable according to the how the MV
 has been set up, this overrides that indication and allows scanning
 in spite of that.


Ah, OK I see.

I misunderstood what the Oracle docs were saying. ALTER only changes
the MV's definition, whereas their REFRESH() function and your REFRESH
statement updates the data in the MV. That makes much more sense.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-03 Thread Nicolas Barbier

2013/3/3 Kevin Grittner kgri...@ymail.com:

 Rewriting queries using
 expressions which match the MV's query to pull from the MV instead
 of the underlying tables is the exception.  While that is a sexy
 feature, and I'm sure one can construct examples where it helps
 performance, it seems to me unlikely to be very generally useful.
 I suspect that it exists mostly so that people who want to write an
 RFP to pick a particular product can include that as a requirement.
  In other words, I think the main benefit of automatic rewrite
 using an MV is marketing, not technical or performance.

I think that automatically using materialized views even when the
query doesn’t mention them directly, is akin to automatically using
indexes without having to mention them in the query. That way, queries
can be written the natural way, and “creating materialized views” is
an optimization that can be applied by a DBA without having to update
the application queries to use them.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-03 Thread Kevin Grittner

Nicolas Barbier nicolas.barb...@gmail.com wrote:
 2013/3/3 Kevin Grittner kgri...@ymail.com:

 Rewriting queries using expressions which match the MV's query
 to pull from the MV instead of the underlying tables is the
 exception.  While that is a sexy feature, and I'm sure one can
 construct examples where it helps performance, it seems to me
 unlikely to be very generally useful.  I suspect that it exists
 mostly so that people who want to write an RFP to pick a
 particular product can include that as a requirement.  In other
 words, I think the main benefit of automatic rewrite using an MV
 is marketing, not technical or performance.

 I think that automatically using materialized views even when the
 query doesn’t mention them directly, is akin to automatically
 using indexes without having to mention them in the query. That
 way, queries can be written the natural way, and “creating
 materialized views” is an optimization that can be applied by a
 DBA without having to update the application queries to use them.

Oh, I understand that concept perfectly well, I just wonder how
often it is useful in practice.  The cost of planning with indexes
tends to go up dramatically the planner needs to evaluate all
possible combinations of access paths.  We've devoted quite a bit
of energy keeping that from being something like the factorial of
the number of indexes.  If you now need to find all materialized
views which could substitute for parts of a query, and look at all
permutations of how those could be used, and which indexes can be
used for each of those combinations, you have planning time which
can explode to extreme levels.

Where the number of database objects are small and their sizes are
large (like some data warehouse situations), you could come out
ahead; and if I wanted to showcase the capability you describe
that's what I would use.  With a large number of database objects
with only a few tens of millions of rows per table, I doubt you
will come out ahead.

Granted, you could say the same thing about indexes, and they are
very often useful.  I'm saying that I expect the usefulness of the
technique you describe is generally very low, but not zero.  Except
for marketing, where it's a flashy feature.  I would be interested
in seeing information to show where it works well, though.  There
is probably something to be learned by looking at the details of
the environment and workload of such a site.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-03 Thread Greg Stark

On Sat, Mar 2, 2013 at 3:06 PM, Kevin Grittner kgri...@ymail.com wrote:

 (1)  Any DML against the MV would need to be limited to some
 context fired by the underlying changes.  If we allow changes to
 the MV outside of that without it being part of some updateable
 MV feature (reversing the direction of flow of changes), the MV
 could not be trusted at all.  If you're going to do that, just use
 a table.

Oh! I misunderstood what you were suggesting. I think we were talking
at cross-purposes.

You're imagining a user issues truncate against the underlying
table(s) and the code that handles updating the materialized view will
need to issue a truncate against the MV to update it.

I was imagining that you wanted to be able to issue DML against the MV
just as one can against an updateable view. That DML should propagate
to the underlying table(s) through various magic.

It's a pretty theoretical fear now but one day it may be important to
avoid confusion between these two.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-03 Thread Nicolas Barbier

2013/3/3 Kevin Grittner kgri...@ymail.com:

 Nicolas Barbier nicolas.barb...@gmail.com wrote:

 I think that automatically using materialized views even when the
 query doesn’t mention them directly, is akin to automatically
 using indexes without having to mention them in the query. That
 way, queries can be written the natural way, and “creating
 materialized views” is an optimization that can be applied by a
 DBA without having to update the application queries to use them.

 Oh, I understand that concept perfectly well, I just wonder how
 often it is useful in practice.  The cost of planning with indexes
 tends to go up dramatically the planner needs to evaluate all
 possible combinations of access paths.  We've devoted quite a bit
 of energy keeping that from being something like the factorial of
 the number of indexes.  If you now need to find all materialized
 views which could substitute for parts of a query, and look at all
 permutations of how those could be used, and which indexes can be
 used for each of those combinations, you have planning time which
 can explode to extreme levels.

I guess that a basic version of such a feature would do something like this:

(1) Check for each matview whether it simply consists of an optional
bunch of joins + optional aggregation + optional general filter
conditions (to support something akin to a partial index). If not, the
optimization doesn’t take this matview into account. This step can be
done beforehand.
(2) Check for each (sub)query in the query-to-optimize whether it does
the following (at a smart point in the optimization phase). If any of
these conditions are not met, don’t use this matview:
  - Joins at least the tables that are joined in the matview.
  - Contains join conditions and general filter conditions that are at
least as strict.
  - Doesn’t refer elsewhere to any attributes that the matview doesn’t contain.
(3) Always replace the corresponding query parts with the matview
(i.e., assume that the cost will always be lower than performing the
original query).
(4) If multiple matviews fit in step 3, try them all (and use the one
that yields the lower total cost).
(5) Always replace any aggregation with the corresponding
aggregation-results (if they exist) from the matview.

That doesn’t sound as if it would make planning time explode that much
(except, because of step 4, when there are many matviews that contain
overlapping sets of joined tables, and a query joins over the union of
those sets, *and* the replaceable-by-matviews parts are in
subqueries). It could even decrease it significantly (e.g., in the
case where a bunch of joins would be replaced with a scan of a
matview).

Also, I suppose that once such functionality exists, application
writers would be more inclined to write “heavy” queries that do lots
of aggregation even in an OLTP environment, of the kind that is these
days typically only done in OLAP environments.

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-03 Thread Tom Lane

Nicolas Barbier nicolas.barb...@gmail.com writes:
 2013/3/3 Kevin Grittner kgri...@ymail.com:
 Nicolas Barbier nicolas.barb...@gmail.com wrote:
 I think that automatically using materialized views even when the
 query doesnât mention them directly, is akin to automatically
 using indexes without having to mention them in the query.

 Oh, I understand that concept perfectly well, I just wonder how
 often it is useful in practice.

There's a much more fundamental reason why this will never happen, which
is that the query planner is not licensed to decide that you only want
an approximate and not an exact answer to your query.

If MVs were guaranteed always up-to-date, maybe we could think about
automatic use of them --- but that's a far different feature from what
Kevin has here.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-03 Thread Josh Berkus


 There's a much more fundamental reason why this will never happen, which
 is that the query planner is not licensed to decide that you only want
 an approximate and not an exact answer to your query.

I think it would be worth talking about when someone wants to implement
it.  I'd imagine it would require setting a GUC, though, which would be
off by default for obvious reasosn.

 If MVs were guaranteed always up-to-date, maybe we could think about
 automatic use of them --- but that's a far different feature from what
 Kevin has here.

And of limited utility, as mentioned.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-03 Thread Kevin Grittner

Kevin Grittner kgri...@ymail.com wrote:

 I'm still working on docs, and the changes related to the syntax
 change are still only lightly tested, but as far as I know, all is
 complete except for the docs.  I'm still working on those and
 expect to have them completed late today.  I'm posting this patch
 to allow a chance for final review of the code changes before I
 push.

Pushed.

I'll be keeping an eye on the buildfarm.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-03 Thread Craig Ringer

On 03/04/2013 08:27 AM, Josh Berkus wrote:
 There's a much more fundamental reason why this will never happen, which
 is that the query planner is not licensed to decide that you only want
 an approximate and not an exact answer to your query.
 I think it would be worth talking about when someone wants to implement
 it.  I'd imagine it would require setting a GUC, though, which would be
 off by default for obvious reasosn.
I'm not a fan of this, even with a GUC. Imagine doing remote debugging
by email/phone. There are enough things to check already (does your
application REALLY commit that transaction?) without also having to
deal with settings that can cause a potentially out of date view of the
data to be used without it being visible in the query its self.

I hate to even say it, but this is where a per-query [redacted] would be
good, so we could say in the query text that this query may use matviews
that are not perfectly up to date.

At this point it's all hand-waving anyway, since no feature to allow the
planner to automatically rewrite a subtree of a query to use a matview
instead exists.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-02 Thread Greg Stark

On Fri, Mar 1, 2013 at 3:01 PM, Ants Aasma a...@cybertec.at wrote:
 . To give another example of potential future
 update semantics, if we were to allow users manually maintaining
 materialized view contents using DML commands, one would expect
 TRUNCATE to mean make this matview empty, not make this matview
 unavailable.

Wouldn't that just be a regular table then though? How is that a
materialized view?

If anything someone might expect truncate to delete any rows from the
source table that appear in the view. But I think it's likely that
even if materialized views were updateable truncate wouldn't be one of
the updateable operations.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-02 Thread Kevin Grittner

Greg Stark st...@mit.edu wrote:
 Ants Aasma a...@cybertec.at wrote:

 To give another example of potential future update semantics, if
 we were to allow users manually maintaining materialized view
 contents using DML commands, one would expect TRUNCATE to mean
 make this matview empty, not make this matview unavailable.

 Wouldn't that just be a regular table then though? How is that a
 materialized view?

 If anything someone might expect truncate to delete any rows from
 the source table that appear in the view. But I think it's likely
 that even if materialized views were updateable truncate wouldn't
 be one of the updateable operations.

Yeah, the only way it would make sense to truncate MV contents from
a user-written maintenance trigger (assuming we might have such a
thing some day) would be if you decided that the change to the
underlying data was so severe that you effectively needed to
REFRESH, and there would probably be a better way to go about
dealing with that.

Two other reasons that this might not be a problem are:

(1)  Any DML against the MV would need to be limited to some
context fired by the underlying changes.  If we allow changes to
the MV outside of that without it being part of some updateable
MV feature (reversing the direction of flow of changes), the MV
could not be trusted at all.  If you're going to do that, just use
a table.

(2)  I can think of a couple not-too-horrible syntax tricks we
could use to escape from the corner we're worried about painting
ourselves into.

All of that said, some combination of Heikki's previous suggestion
that maybe REFRESH could be used and my noticing that both TRUNCATE
and REFRESH create a new heap for an MV, it's just a question of
whether we then run the MV's query to fill it with data, led to
this thought:

REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA

This sort of mirrors the CREATE MATERIALIZED VIEW style (which was
based on CREATE TABLE AS) and WITH NO DATA puts the MV into the
unscannable state either way.  I can change the parser to make this
literally just the new spelling of TRUNCATE MATERIALIZED VIEW
without dashing my hopes of pushing the patch tomorrow.  (The patch
has been ready to go for weeks other than this syntax issue and
documentation which needs to refer to whatever syntax is chosen.)

Barring objections, I will use the above and push tomorrow.

The only issues which have been raised which will not be addressed
at that point are:

(1)  The suggestion that ALTER MATERIALIZED VIEW name ALTER column
support changing the collation isn't something I can see how to do
without complication and risk beyond what is appropriate at this
point in the release cycle.  It will be left off, at least for now.
 To get a new collation, you will need to drop the MV and re-create
it with a query which specifies the collation for the result
column.

(2)  The sepgsql changes are still waiting for a decision from
security focused folks.  I have two patches for that contrib module
ready based on my best reading of things -- one which uses table
security labels and one which instead uses new matview labels. 
When we get a call on which is preferred, I suspect that one patch
or the other will be good as-is or with minimal change.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-01 Thread Ants Aasma

On Thu, Feb 28, 2013 at 7:52 PM, Kevin Grittner kgri...@ymail.com wrote:
 Barring a sudden confluence of opinion, I will go with TRUNCATE for
 the initial spelling.  I tend to favor that spelling for several
 reasons.  One was the size of the patch needed to add the opposite
 of REFRESH to the backend code:

FWIW, I found Andres's point about closing the door on updatable views
quite convincing. If at any point we want to add updatable
materialized views, it seems like a bad inconsistency to have TRUNCATE
mean something completely different from DELETE. While update
capability for materialized views might not be a common use case, I
don't think it's fair to completely shut the door on it to have easier
implementation and shorter syntax. Especially as the shorter syntax is
semantically inconsistent - normal truncate removes the data,
materialized view just makes the data inaccessible until the next
refresh.

Sorry for weighing in late, but it seemed to me that this point didn't
get enough consideration.

Ants Aasma
-- 
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-01 Thread Kevin Grittner

Ants Aasma a...@cybertec.at wrote:
 Kevin Grittner kgri...@ymail.com wrote:
 Barring a sudden confluence of opinion, I will go with TRUNCATE
 for the initial spelling.  I tend to favor that spelling for
 several reasons.  One was the size of the patch needed to add
 the opposite of REFRESH to the backend code:

 FWIW, I found Andres's point about closing the door on updatable
 views quite convincing. If at any point we want to add updatable
 materialized views, it seems like a bad inconsistency to have
 TRUNCATE mean something completely different from DELETE. While
 update capability for materialized views might not be a common
 use case, I don't think it's fair to completely shut the door on
 it to have easier implementation and shorter syntax. Especially
 as the shorter syntax is semantically inconsistent - normal
 truncate removes the data, materialized view just makes the data
 inaccessible until the next refresh.

 Sorry for weighing in late, but it seemed to me that this point
 didn't get enough consideration.

Personally, I don't understand why anyone would want updateable
materialized views.  That's probably because 99% of the cases where
I've seen that someone wanted them, they wanted them updated to
match the underlying data using some technique that didn't require
the modification or commit of the underlying data to carry the
overhead of maintaining the MV.  In other words, they do not want
the MV to be up-to-date for performance reasons.  That is a big
part of the reason for *wanting* to use an MV.  How do you make an
asynchronously-maintained view updateable?

In addtion, at least 80% of the cases I've seen where people want
an MV it is summary information, which does not tie a single MV row
to a single underlying row.  If someone updates an aggregate number
like an average, I see no reasonable way to map that to the
underlying data in a meaningful way.

I see the contract of a materialized view as providing a
table-backed relation which shows the result set of a query as of
some point in time.  Perhaps it is a failure of imagination, but I
don't see where modifying that relation directly is compatible with
that contract.

Can you describe a meaningful use cases for an udpateable
materialized view?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-03-01 Thread Ants Aasma

On Fri, Mar 1, 2013 at 4:18 PM, Kevin Grittner kgri...@ymail.com wrote:
 Personally, I don't understand why anyone would want updateable
 materialized views.  That's probably because 99% of the cases where
 I've seen that someone wanted them, they wanted them updated to
 match the underlying data using some technique that didn't require
 the modification or commit of the underlying data to carry the
 overhead of maintaining the MV.  In other words, they do not want
 the MV to be up-to-date for performance reasons.  That is a big
 part of the reason for *wanting* to use an MV.  How do you make an
 asynchronously-maintained view updateable?

 In addtion, at least 80% of the cases I've seen where people want
 an MV it is summary information, which does not tie a single MV row
 to a single underlying row.  If someone updates an aggregate number
 like an average, I see no reasonable way to map that to the
 underlying data in a meaningful way.

 I see the contract of a materialized view as providing a
 table-backed relation which shows the result set of a query as of
 some point in time.  Perhaps it is a failure of imagination, but I
 don't see where modifying that relation directly is compatible with
 that contract.

 Can you describe a meaningful use cases for an udpateable
 materialized view?

I actually agree that overwhelming majority of users don't need or
want updateable materialized views. My point was that we can't at this
point rule out that people will think of a good use for this. I don't
have any real use cases for this, but I can imagine a few situations
where updateable materialized views wouldn't be nonsensical.

One case would be if the underlying data is bulkloaded and is
subsetted into smaller materialized views for processing using
off-the-shelf tools that expect tables. One might want to propagate
changes from those applications to the base data.

The other case would be the theoretical future where materialized
views can be incrementally and transactionally maintained, in that
case being able to express modifications on the views could actually
make sense.

I understand that the examples are completely hypothetical and could
be solved by using regular tables. I just have feeling that will
regret conflating TRUNCATE semantics for slight implementation and
notation convenience. To give another example of potential future
update semantics, if we were to allow users manually maintaining
materialized view contents using DML commands, one would expect
TRUNCATE to mean make this matview empty, not make this matview
unavailable.

Ants Aasma
-- 
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-28 Thread Robert Haas

On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html
 DISCARD would be better.

Well, personally, I'm in favor of either TRUNCATE or ALTER
MATERIALIZED VIEW ... DISCARD.  I think it's a dangerous precedent to
suppose that we're going to start using DISCARD for things that have
nothing to do with the existing meanings of DISCARD.  Number one, I
think it's confusing.  Number two, it's currently possible to
determine whether something is DDL, DML, or other by looking at the
first word of the command.  If we throw that out the window we may
cause performance issues for connection pooling software that tries to
be clever like that.  Mind you, I'm not aware of any connection
pooling software that actually does this today, because I think pgpool
includes a full parser and pgbouncer includes no parser at all, but it
still seems like a possibly-useful trick.  And number three, the
possibility of grammar conflicts with things we might want to do in
the future seems unnecessarily high.  If we use ALTER MATERIALIZED
VIEW ... WHATEVER, we only have to worry about grammar conflicts with
other ALTER MATERIALIZED VIEW commands; if we reuse DISCARD or RESET,
we've got potential conflicts with completely unrelated syntax.  That
consideration alone would be sufficient reason for me to choose to
stick everything under ALTER MATERIALIZED VIEW, no matter how poor a
semantic fit it seems otherwise.  Of course, if we stick with
TRUNCATE, this becomes a non-issue.

All that having been said, I'm not in favor of pushing this patch out
to 9.4 because we can't agree on minor syntax details.  In the absence
of consensus, my feeling is that Kevin should exercise committer's
prerogative and commit this in the way that seems best to him.  If a
clear contrary consensus subsequently emerges, we can always change
it.  It is not as if the particular choice of SQL syntax is hard to
whack around.  Once we release it we're stuck with it, but certainly
between now and beta we can change it whenever we like.  I'd rather
have the core part of the feature committed and tinker with the syntax
than wait longer to land the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-28 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html
 DISCARD would be better.

 Well, personally, I'm in favor of either TRUNCATE or ALTER
 MATERIALIZED VIEW ... DISCARD.  I think it's a dangerous precedent to
 suppose that we're going to start using DISCARD for things that have
 nothing to do with the existing meanings of DISCARD.

Yeah, there's actually a serious problem with choosing DISCARD:
surely we are not going to include trash all MVs in the behavior
of DISCARD ALL.  So unless you would like to say that DISCARD ALL
doesn't mean what it appears to mean, we can't make MV reset be
one of the sub-flavors of DISCARD.

So that seems to leave us with either TRUNCATE or an ALTER sub-syntax.
Personally I'd prefer the latter but it's surely debatable.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-28 Thread Heikki Linnakangas


On 28.02.2013 16:55, Robert Haas wrote:

On Sat, Feb 23, 2013 at 8:00 AM, Michael Paquier
michael.paqu...@gmail.com  wrote:

it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html
DISCARD would be better.


Well, personally, I'm in favor of either TRUNCATE or ALTER
MATERIALIZED VIEW ... DISCARD.  I think it's a dangerous precedent to
suppose that we're going to start using DISCARD for things that have
nothing to do with the existing meanings of DISCARD.  Number one, I
think it's confusing.  Number two, it's currently possible to
determine whether something is DDL, DML, or other by looking at the
first word of the command.  If we throw that out the window we may
cause performance issues for connection pooling software that tries to
be clever like that.


FWIW, I totally agree with that. From that point of view, the best thing 
would be to tack this onto the REFRESH command, perhaps something like:


REFRESH matview INVALIDATE;
REFRESH matview UNREFRESH;
REFRESH matview DISCARD;

It's a bit weird that the command is called REFRESH, if the effect is 
the exact opposite of refreshing it. And we usually do have two separate 
commands for doing something and undoing the same; CREATE - DROP, 
PREPARE - DEALLOCATE, LISTEN - UNLISTEN, and so forth.


I think we're being too hung up on avoiding new (unreserved) keywords. 
Yes, the grammar is large because of so many keywords, but surely 
there's some better solution to that than adopt syntax that sucks. Let's 
invent a new keyword (INVALIDATE? UNREFRESH?), and deal with the grammar 
bloat separately.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-28 Thread Robert Haas

On Thu, Feb 28, 2013 at 11:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, personally, I'm in favor of either TRUNCATE or ALTER
 MATERIALIZED VIEW ... DISCARD.  I think it's a dangerous precedent to
 suppose that we're going to start using DISCARD for things that have
 nothing to do with the existing meanings of DISCARD.

 Yeah, there's actually a serious problem with choosing DISCARD:
 surely we are not going to include trash all MVs in the behavior
 of DISCARD ALL.  So unless you would like to say that DISCARD ALL
 doesn't mean what it appears to mean, we can't make MV reset be
 one of the sub-flavors of DISCARD.

Good point.

 So that seems to leave us with either TRUNCATE or an ALTER sub-syntax.
 Personally I'd prefer the latter but it's surely debatable.

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-28 Thread Kevin Grittner

Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 28.02.2013 16:55, Robert Haas wrote:

 Well, personally, I'm in favor of either TRUNCATE or ALTER
 MATERIALIZED VIEW ... DISCARD.  I think it's a dangerous
 precedent to suppose that we're going to start using DISCARD for
 things that have nothing to do with the existing meanings of
 DISCARD.  Number one, I think it's confusing.  Number two, it's
 currently possible to determine whether something is DDL, DML,
 or other by looking at the first word of the command.  If we
 throw that out the window we may cause performance issues for
 connection pooling software that tries to be clever like that.

 FWIW, I totally agree with that. From that point of view, the
 best thing would be to tack this onto the REFRESH command,
 perhaps something like:

 REFRESH matview INVALIDATE;
 REFRESH matview UNREFRESH;
 REFRESH matview DISCARD;

 It's a bit weird that the command is called REFRESH, if the
 effect is the exact opposite of refreshing it. And we usually do
 have two separate commands for doing something and undoing the
 same; CREATE - DROP, PREPARE - DEALLOCATE, LISTEN - UNLISTEN, and
 so forth.

 I think we're being too hung up on avoiding new (unreserved)
 keywords. Yes, the grammar is large because of so many keywords,
 but surely there's some better solution to that than adopt syntax
 that sucks. Let's invent a new keyword (INVALIDATE? UNREFRESH?),
 and deal with the grammar bloat separately.

I'm OK with any grammar that we can reach consensus on, but that
seems elusive and I don't want to hold up getting the meat of the
patch committed while we haggle out this syntax detail.  Votes have
shifted frequently, but as I make out the latest opinion of people
making a statement that looks like an explicit vote, dividing the
value of a split vote between the choices, I get:

Kevin Grittner: TRUNCATE
Stephen Frost: TRUNCATE
Peter Eisentraut: ALTER
Andres Freund: RESET or DISCARD
Josh Berkus: RESET
Greg Stark: TRUNCATE
Michael Paquier: DISCARD
Robert Haas: TRUNCATE or ALTER
Tom Lane: TRUNCATE or ALTER
Heikki Linnakangas: REFRESH?

Vote totals:

TRUNCATE:  4.0
ALTER: 2.0
DISCARD:   1.5
RESET: 1.5
REFRESH    1.0?

Barring a sudden confluence of opinion, I will go with TRUNCATE for
the initial spelling.  I tend to favor that spelling for several
reasons.  One was the size of the patch needed to add the opposite
of REFRESH to the backend code:

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2a55e02..eb7a14f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1234,11 +1234,12 @@ truncate_check_rel(Relation rel)
 {
    AclResult   aclresult;
 
-   /* Only allow truncate on regular tables */
-   if (rel-rd_rel-relkind != RELKIND_RELATION)
+   /* Only allow truncate on regular tables and materialized views. */
+   if (rel-rd_rel-relkind != RELKIND_RELATION 
+   rel-rd_rel-relkind != RELKIND_MATVIEW)
    ereport(ERROR,
    (errcode(ERRCODE_WRONG_OBJECT_TYPE),
-    errmsg(\%s\ is not a table,
+    errmsg(\%s\ is not a table or materialized view,
    RelationGetRelationName(rel;
 
    /* Permissions checks */

That's it.  That takes it from no way to release the space held by
the current contents of a materialized view and render it
unscannable until its rule's query is used to populate it again to
working.  Now, there are docs and psql support needed on top of
that, and the regression tests use the verb, and bikeshedding led
to a minor tweak of TRUNCATE so that you could specify TRUNCATE
MATERIALIZED VIEW -- but to get working backend code, the above is
sufficient.  That strikes me a prima facie evidence that it's not a
horribly off-target verb.

In terms of things to consider in choosing a verb are that, while
complete absence of data in the MV's backing table is the only
thing which will render it unscannable in this initial version,
there is clear demand to eventually track information on how
up-to-date data is, and treat the MV as unscannable for less
extreme conditions, such as the asynchronous application of changes
from backing tables having fallen behind some configurable
threshold.  In essence, people want the error instead of scanning
the relation if the generated data is not within a range of time
that is fresh enough, and truncating the backing relation (or
creating the MV WITH NO DATA) is a special case of that since the
data is not representing the results of the MV's query as of *any*
known point in time.  (As previously discussed, that is distinct
from an empty relation which is known to represent a fresh view of
the results of that query.)

If we pick a new pair of verbs, the connotations I think we should
be looking for include the ability to deal with at least these:

* Make the MV represent the result set of its query as of this
moment.

* Declare the MV to be too stale for the contents of its backing

Re: [HACKERS] Materialized views WIP patch

2013-02-25 Thread Erik Rijkers

2013-02-19  Kevin Grittner wrote:
[matview-system_views-v2.diff]

I assumed the patches matview-v4.patch and matview-system_views-v2.diff
were to be applied together.

They do apply correctly but during tests, the test rules ... FAILED.

Perhaps it is solved already but I thought I'd mention it in case it is 
overlooked.

Thanks,

Erik Rijkers




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-23 Thread Greg Stark

On Sat, Feb 23, 2013 at 1:00 AM, Josh Berkus j...@agliodbs.com wrote:
 I prefer RESET, especially since it could eventually support RESET ALL
 MATERIALIZED VIEWS if that turns out to be useful.  How does the parser
 like that?

Isn't reset currently only used for GUCs? I think that makes for a
strange crossover.

Really, I'm sorry for bringing this up. I really don't think
truncate is a bad way to spell it.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-23 Thread Michael Paquier

On Sat, Feb 23, 2013 at 9:55 PM, Greg Stark st...@mit.edu wrote:

 On Sat, Feb 23, 2013 at 1:00 AM, Josh Berkus j...@agliodbs.com wrote:
  I prefer RESET, especially since it could eventually support RESET ALL
  MATERIALIZED VIEWS if that turns out to be useful.  How does the parser
  like that?

 Isn't reset currently only used for GUCs? I think that makes for a
 strange crossover.

it is. http://www.postgresql.org/docs/9.2/static/sql-reset.html
DISCARD would be better.
My 2c.
-- 
Michael

Re: [HACKERS] Materialized views WIP patch

2013-02-22 Thread Tom Lane

Andres Freund and...@2ndquadrant.com writes:
 On 2013-02-21 14:11:10 -0800, Kevin Grittner wrote:
 DISABLE MATERIALIZED VIEW mv;  -- ALTER clause for constraints
 DISCARD MATERIALIZED VIEW DATA mv;  -- session state
 RELEASE MATERIALIZED VIEW DATA mv;  -- savepoint
 RESET MATERIALIZED VIEW DATA mv;  -- run-time parameter
 
 I think any of these could work.  I'm personally most inclined
 toward DISABLE MATERIALIZED VIEW.  It seems to convey the semantics
 better, especially if you leave out DATA as an additonal word. 

 I vote for RESET or DISCARD. DISABLE sounds more like you disable
 automatic refreshes or somesuch.

Yeah, I don't much like DISABLE either.  I'm also concerned about
overloading RESET this way --- that statement has complicated-enough
syntax already, not to mention way too many shades of meaning.  So that
leaves me voting for DISCARD M.V. DATA, which seems pretty precise.
It's a bit verbose, but since when has SQL been succinct?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-22 Thread Josh Berkus


 That feels completely wrong to me.  For one thing, I can't think of
 any ALTER commands to populate or remove data.  What did you think
 of the idea of something like DISCARD MATERIALIZED VIEW DATA as a
 new statment?  Or maybe RESET MATERIALIZED VIEW?

I prefer RESET, especially since it could eventually support RESET ALL
MATERIALIZED VIEWS if that turns out to be useful.  How does the parser
like that?

BTW, to contradict Peter E., for my part I would NOT want matview resets
to be logged as DDL.  I would only want matview definitition changes to
be so logged.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Tom Lane

Greg Stark st...@mit.edu writes:
 The way I was thinking about it, whatever the command is named, you
 might be able to tell the database to drop the storage associated with
 the view but that would make the view invalid until it was refreshed.
 It wouldn't make it appear to be empty.

Actually, that seems like a pretty key point to me.  TRUNCATE TABLE
results in a table that is perfectly valid, you just deleted all the
rows that used to be in it.  Throwing away an MV's contents should
not result in an MV that is considered valid.  That being the case,
lumping them as being the same operation feels like the wrong thing,
and so we should choose a different name for the MV operation.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Andres Freund

On 2013-02-21 04:14:09 +, Greg Stark wrote:
 On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut pete...@gmx.net wrote:
  More generally, I would consider the invalidation of a materialized view
  a DDL command, whereas truncating a table is a DML command.
 
 That's not entirely true. From the database's point of view, TRUNCATE
 is in many ways actually DDL.
 
 I actually don't really dislike using TRUNCATE for this command. I
 was more asking about whether this meant people were thinking of the
 view as a thing where you could control the data in it by hand and
 could have the view be empty rather than just not valid.

It also might get confusing when we get materialized views that are
auto-updateable. I am not suggesting to forward TRUNCATE to the internal
storage in that case but giving an error so its an easy to find
distinction to a normal table seems like a good idea.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Robert Haas

On Wed, Feb 20, 2013 at 11:14 PM, Greg Stark st...@mit.edu wrote:
 On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut pete...@gmx.net wrote:
 More generally, I would consider the invalidation of a materialized view
 a DDL command, whereas truncating a table is a DML command.

 That's not entirely true. From the database's point of view, TRUNCATE
 is in many ways actually DDL.

 I actually don't really dislike using TRUNCATE for this command.

Me neither.  I'm astonished that we're seriously considering adding
new keywords for this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Peter Eisentraut

On 2/20/13 5:03 PM, Kevin Grittner wrote:
 Peter Eisentraut pete...@gmx.net wrote:
 On 2/20/13 2:30 PM, Kevin Grittner wrote:
 Are there TRUNCATE triggers on materialized views?
 No.  Nor SELECT, INSERT, UPDATE, or DELETE triggers.  You can't
 create a trigger of any type on a materialized view.  I don't
 think that would interfere with event triggers, though.

 More generally, I would consider the invalidation of a
 materialized view a DDL command, whereas truncating a table is a
 DML command.
 
 The force of that assertion is somewhat undercut by the fact that
 the ExecuteTruncate() function does exactly what needs to be done
 to discard the data in a materialized view and make it appear as
 non-scannable.  Even if we dress it up with different syntax, it's
 not clear that we wouldn't build a TruncateStmt in the parser and
 pass it through exactly the same execution path.  We would just
 need to look at the relkind to generate a different command tag.

This is a fall-out of the implementation, and that's fine (although I'd
personally still be in favor of putting that state in the catalog, not
into the block count on disk, effectively), but I'm talking about the
external interfaces we present.

 This has various implications with triggers, logging,
 permissions.  I think it's not good to mix those two.
 
 Could you give a more concrete example of where you see a problem?

* Logging: You can set things to log DDL commands only.  I would want a
MV invalidation to be logged.

* Permissions: There is a TRUNCATE permission, would that apply here?
There is no refresh permission.

* Triggers: There are TRUNCATE triggers, but they don't apply here.

* Triggers: I don't know how event triggers work, but I'd like
materialized view events to be grouped together somehow.

* Don't know the opinion of sepgsql on all this.

I think what this all comes down to, as I've mentioned before, is that
the opposite of this proposed truncate operation is the refresh
operation, and that is a DDL command under ALTER MATERIALIZED VIEW.
Both of these fundamental operations -- truncate/refresh,
invalidate/validate, empty/refill, whatever -- should be grouped
together somehow, as far as syntax, as well logging, permissions,
trigger handling, and so on are concerned.  You don't need a new command
or key word for this.  How about ALTER MATERIALIZED VIEW DISCARD?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Peter Eisentraut

On 2/20/13 11:14 PM, Greg Stark wrote:
 That's not entirely true. From the database's point of view, TRUNCATE
 is in many ways actually DDL.

Whether something is DDL or DML or a read operation (query) is not an
implementation detail, it's a user-exposed category.  Since TRUNCATE is
logically equivalent to DELETE, it's a DML operation, as far as the user
is concerned.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Kevin Grittner

Peter Eisentraut pete...@gmx.net wrote:
 On 2/20/13 11:14 PM, Greg Stark wrote:
 That's not entirely true. From the database's point of view,
 TRUNCATE is in many ways actually DDL.

 Whether something is DDL or DML or a read operation (query) is
 not an implementation detail, it's a user-exposed category. 
 Since TRUNCATE is logically equivalent to DELETE, it's a DML
 operation, as far as the user is concerned.

Not really.  It doesn't follow the same MVCC behavior as DML.  This
is user-visible, documented behavior.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Kevin Grittner

Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark st...@mit.edu writes:
 The way I was thinking about it, whatever the command is named, you
 might be able to tell the database to drop the storage associated with
 the view but that would make the view invalid until it was refreshed.
 It wouldn't make it appear to be empty.

 Actually, that seems like a pretty key point to me.  TRUNCATE TABLE
 results in a table that is perfectly valid, you just deleted all the
 rows that used to be in it.  Throwing away an MV's contents should
 not result in an MV that is considered valid.

It doesn't.  That was one of the more contentious points in the
earlier bikeshedding phases.  Some felt that throwing away the
contents was a form of making the MV out of date and as such
didn't require any special handling.  Others, including myself,
felt that data not present was a distinct state from generated
zero rows and that attempting to scan a materialized view for
which data had not been generated must result in an error.  The
latter property has been maintained from the beginning -- or at
least that has been the intent.

test=# CREATE TABLE t (id int NOT NULL PRIMARY KEY, type text NOT NULL, amt 
numeric NOT NULL);
CREATE TABLE
test=# CREATE MATERIALIZED VIEW tm AS SELECT type, sum(amt) AS totamt FROM t 
GROUP BY type WITH NO DATA;
SELECT 0
test=# SELECT pg_relation_is_scannable('tm'::regclass);
 pg_relation_is_scannable
--
 f
(1 row)

test=# SELECT * FROM tm;
ERROR:  materialized view tm has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.
test=# REFRESH MATERIALIZED VIEW tm;
REFRESH MATERIALIZED VIEW
test=# SELECT pg_relation_is_scannable('tm'::regclass);
 pg_relation_is_scannable
--
 t
(1 row)

test=# TRUNCATE MATERIALIZED VIEW tm;
TRUNCATE TABLE
test=# SELECT * FROM tm;
ERROR:  materialized view tm has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.
test=# SELECT pg_relation_is_scannable('tm'::regclass);
 pg_relation_is_scannable
--
 f
(1 row)

 That being the case, lumping them as being the same operation
 feels like the wrong thing, and so we should choose a different
 name for the MV operation.

There is currently no truncation of MV data without rendering the
MV unscannable.  Do you still feel it needs a different command
name?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Kevin Grittner

Greg Stark st...@mit.edu wrote:

 I actually don't really dislike using TRUNCATE for this
 command.  I was more asking about whether this meant people were
 thinking of the view as a thing where you could control the data
 in it by hand and could have the view be empty rather than just
 not valid.

You can either populate the MV in the CREATE command or by REFRESH,
and it will be scannable.  If it is created WITH NO DATA or
TRUNCATEd it is not scannable, generating an error on an attempt to
reference it.

test=# select * from tm;
 type | totamt
--+
 y    | 12
 z    | 24
 x    |  5
(3 rows)

test=# truncate tm;
TRUNCATE TABLE
test=# select * from tm;
ERROR:  materialized view tm has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.

 The way I was thinking about it, whatever the command is named,
 you might be able to tell the database to drop the storage
 associated with the view but that would make the view invalid
 until it was refreshed.  It wouldn't make it appear to be empty.

I think we're on the same page after all.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Kevin Grittner

Peter Eisentraut pete...@gmx.net wrote:

I'll need more time to ponder your other points, but...

 You don't need a new command or key word for this.  How about
 ALTER MATERIALIZED VIEW DISCARD?

I don't like this because we don't have ALTER TABLE REINDEX.  But
the fact that DISCARD is a keyword does open up some interesting
syntax possibilities without adding more keywords.  Maybe:

DISCARD MATERIALIZED VIEW DATA tm;

Or something like that.  Paint buckets are over there by the
bikeshed.  Have at it.  :-)
 
-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Heikki Linnakangas


On 21.02.2013 16:38, Kevin Grittner wrote:

Tom Lanet...@sss.pgh.pa.us  wrote:

Greg Starkst...@mit.edu  writes:

The way I was thinking about it, whatever the command is named, you
might be able to tell the database to drop the storage associated with
the view but that would make the view invalid until it was refreshed.
It wouldn't make it appear to be empty.


Actually, that seems like a pretty key point to me.  TRUNCATE TABLE
results in a table that is perfectly valid, you just deleted all the
rows that used to be in it.  Throwing away an MV's contents should
not result in an MV that is considered valid.


It doesn't.  That was one of the more contentious points in the
earlier bikeshedding phases.  Some felt that throwing away the
contents was a form of making the MV out of date and as such
didn't require any special handling.  Others, including myself,
felt that data not present was a distinct state from generated
zero rows and that attempting to scan a materialized view for
which data had not been generated must result in an error.  The
latter property has been maintained from the beginning -- or at
least that has been the intent.


Yeah, data not present is clearly different from 0 rows. I'm 
surprised there isn't an explicit boolean column somewhere for that, but 
I guess you can use the size of the heap for that too, as long as you're 
careful to not truncate it to 0 blocks when it's empty but scannable.


There's at least one bug left in that area:

postgres=# create table t (id int4);
CREATE TABLE
postgres=# create materialized view tm as select * from t where id  
0;SELECT 0

postgres=# select * from tm;
 id

(0 rows)

postgres=# create index i_tm on tm(id);CREATE INDEX
postgres=# cluster tm using i_tm;
CLUSTER
postgres=# select * from tm;
ERROR:  materialized view tm has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.

Clustering a materialized view invalidates it.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Kevin Grittner

Andres Freund and...@2ndquadrant.com wrote:
 giving an error so its an easy to find distinction to a normal
 table seems like a good idea.

I'm not sure I understood your concerns entirely, but wonder
whether this helps?:

test=# \d
  List of relations
 Schema | Name  |   Type    |  Owner 
+---+---+-
 public | bb    | materialized view | kgrittn
 public | t | table | kgrittn
 public | tm    | materialized view | kgrittn
 public | tmm   | materialized view | kgrittn
 public | tv    | view  | kgrittn
 public | tvmm  | materialized view | kgrittn
 public | tvv   | view  | kgrittn
 public | tvvm  | materialized view | kgrittn
 public | tvvmv | view  | kgrittn
(9 rows)

test=# truncate table tm;
ERROR:  tm is not a table
test=# truncate materialized view t;
ERROR:  t is not a materialized view
test=# truncate materialized view tm;
TRUNCATE TABLE
test=# truncate table t;
TRUNCATE TABLE

Well, maybe those command tags could use a tweak.

Then there's this, if you don't specify an object type:

test=# truncate t, tm;
TRUNCATE TABLE

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Tom Lane

Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 That being the case, lumping them as being the same operation
 feels like the wrong thing, and so we should choose a different
 name for the MV operation.

 There is currently no truncation of MV data without rendering the
 MV unscannable.  Do you still feel it needs a different command
 name?

You didn't say anything that changed my opinion: it doesn't feel like
a TRUNCATE to me.  It's not changing the object to a different but
entirely valid state, which is what TRUNCATE does.

Peter claimed upthread that REFRESH is a subcommand of ALTER MATERIALIZE
VIEW and that this operation should be another one.  That sounds pretty
reasonable from here.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Kevin Grittner

Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 21.02.2013 16:38, Kevin Grittner wrote:
 Tom Lanet...@sss.pgh.pa.us  wrote:
 Greg Starkst...@mit.edu  writes:
 The way I was thinking about it, whatever the command is named, you
 might be able to tell the database to drop the storage associated with
 the view but that would make the view invalid until it was refreshed.
 It wouldn't make it appear to be empty.

 Actually, that seems like a pretty key point to me.  TRUNCATE TABLE
 results in a table that is perfectly valid, you just deleted all the
 rows that used to be in it.  Throwing away an MV's contents should
 not result in an MV that is considered valid.

 It doesn't.  That was one of the more contentious points in the
 earlier bikeshedding phases.  Some felt that throwing away the
 contents was a form of making the MV out of date and as such
 didn't require any special handling.  Others, including myself,
 felt that data not present was a distinct state from generated
 zero rows and that attempting to scan a materialized view for
 which data had not been generated must result in an error.  The
 latter property has been maintained from the beginning -- or at
 least that has been the intent.

 Yeah, data not present is clearly different from 0 rows.
 I'm surprised there isn't an explicit boolean column somewhere
 for that,

There was, in earlier versions of the patch: pg_class.relisvald. 
The problem is that we needed some way to determine from the heap
that it was invalid to support UNLOGGED MVs.  Several people were
offended by my attempt to use relisvald as the primary indicator
and transfer the information from the heap state to pg_class and
relcache.  There were some pretty big technical challenges to that.
So I caved on that one and went with the pg_relation_is_scannable()
function based on the heap as reported by relcache.  That being one
of the newer parts of the patch, it is probably not as solid as the
parts which haven't changed much in the last three months.

 but I guess you can use the size of the heap for that too, as
 long as you're careful to not truncate it to 0 blocks when it's
 empty but scannable.

 There's at least one bug left in that area:

 postgres=# create table t (id int4);
 CREATE TABLE
 postgres=# create materialized view tm as select * from t where id  0;SELECT
 0
 postgres=# select * from tm;
 id
 
 (0 rows)

 postgres=# create index i_tm on tm(id);CREATE INDEX
 postgres=# cluster tm using i_tm;
 CLUSTER
 postgres=# select * from tm;
 ERROR:  materialized view tm has not been populated
 HINT:  Use the REFRESH MATERIALIZED VIEW command.

 Clustering a materialized view invalidates it.

Good spot.  That should be easy enough to fix.

Thanks.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Andres Freund

On 2013-02-21 07:10:09 -0800, Kevin Grittner wrote:
 Andres Freund and...@2ndquadrant.com wrote:
  giving an error so its an easy to find distinction to a normal
  table seems like a good idea.
 
 I'm not sure I understood your concerns entirely, but wonder
 whether this helps?:

To explain it a bit:

I assume that at some point matviews will get (auto-)updateable, just as
normal views recently got. In that case application programmers might
not be aware anymore that something is a view either because they just
don't know or because a table got converted into a matview after the
code was written.

Because of the potential wish for transparency (with security views as a
potential user) at least normal views might get the capability to be
TRUNCATEd directly, so it might be that matviews do as well.

 test=# \d
   List of relations
  Schema | Name  |   Type    |  Owner 
 +---+---+-
  public | bb    | materialized view | kgrittn
  public | t | table | kgrittn
  public | tm    | materialized view | kgrittn
  public | tmm   | materialized view | kgrittn
  public | tv    | view  | kgrittn
  public | tvmm  | materialized view | kgrittn
  public | tvv   | view  | kgrittn
  public | tvvm  | materialized view | kgrittn
  public | tvvmv | view  | kgrittn
 (9 rows)
 
 test=# truncate table tm;
 ERROR:  tm is not a table
 test=# truncate materialized view t;
 ERROR:  t is not a materialized view
 test=# truncate materialized view tm;
 TRUNCATE TABLE
 test=# truncate table t;
 TRUNCATE TABLE

Thats not bad.

But what if we allow TRUNCATE on views someday (possibly only if a
truncate trigger is defined). For consistency we might also want that on
matvies. Having a difference between TRUNCATE view; and TRUNCATE
MATERIALIZED VIEW; in that case sounds ugly to me.

What about DISABLE? DISCARD or DEALLOCATE would also be nice but it
seems hard to fit that into existing syntax.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Kevin Grittner

Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 That being the case, lumping them as being the same
 operation feels like the wrong thing, and so we should choose a
 different name for the MV operation.

 There is currently no truncation of MV data without rendering
 the MV unscannable.  Do you still feel it needs a different
 command name?

 You didn't say anything that changed my opinion: it doesn't feel
 like a TRUNCATE to me.  It's not changing the object to a
 different but entirely valid state, which is what TRUNCATE does.

 Peter claimed upthread that REFRESH is a subcommand of ALTER
 MATERIALIZE VIEW

It's not, nor do I think it should be.

 and that this operation should be another one.  That sounds
 pretty reasonable from here.

That feels completely wrong to me.  For one thing, I can't think of
any ALTER commands to populate or remove data.  What did you think
of the idea of something like DISCARD MATERIALIZED VIEW DATA as a
new statment?  Or maybe RESET MATERIALIZED VIEW?

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Kevin Grittner

Andres Freund and...@2ndquadrant.com wrote:

 I assume that at some point matviews will get (auto-)updateable,
 just as normal views recently got.

I'm dubious about that.  Every use case I've seen for MVs involves
aggregation, although they are a generalized feature, so that won't
always be true.  But if you have a view like:

CREATE MATERIALIZED VIEW tm AS
 SELECT t.type,
    sum(t.amt) AS totamt
   FROM t
  GROUP BY t.type;

... I don't see how that can be updateable.  If I add 5 to totamt
for some row, what do you do?  I expect that 99% of MVs will be
updated asynchronously from changes to the underlying data -- what
do you do if someone updates a row that no longer exists in the
underlying data.  This are just seems fraught with peril and out of
sync with the usual uses of MVs.

 What about DISABLE? DISCARD or DEALLOCATE would also be nice but
 it seems hard to fit that into existing syntax.

Thanks for the suggestions.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Tom Lane

Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Peter claimed upthread that REFRESH is a subcommand of ALTER
 MATERIALIZE VIEW

 It's not, nor do I think it should be.

Oh, never mind then.

 and that this operation should be another one.  That sounds
 pretty reasonable from here.

 That feels completely wrong to me.  For one thing, I can't think of
 any ALTER commands to populate or remove data.  What did you think
 of the idea of something like DISCARD MATERIALIZED VIEW DATA as a
 new statment?  Or maybe RESET MATERIALIZED VIEW?

I could live with either DISCARD or RESET.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Peter Eisentraut

On 2/21/13 9:25 AM, Kevin Grittner wrote:
 Peter Eisentraut pete...@gmx.net wrote:
 On 2/20/13 11:14 PM, Greg Stark wrote:
 That's not entirely true. From the database's point of view,
 TRUNCATE is in many ways actually DDL.

 Whether something is DDL or DML or a read operation (query) is
 not an implementation detail, it's a user-exposed category. 
 Since TRUNCATE is logically equivalent to DELETE, it's a DML
 operation, as far as the user is concerned.
 
 Not really.  It doesn't follow the same MVCC behavior as DML.  This
 is user-visible, documented behavior.

MVCC behavior does not determine whether something is considered DDL or DML.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Kevin Grittner

Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:

 What did you think of the idea of something like DISCARD
 MATERIALIZED VIEW DATA as a new statment?  Or maybe RESET
 MATERIALIZED VIEW?

 I could live with either DISCARD or RESET.

I figured this was worth a pass through the keyword list to look
for all imperative verbs suitable for this, which could support the
needed syntax without adding a new keyword.  Here are the
possibilities I came up with, along with a note about why they are
keywords already.

DISABLE MATERIALIZED VIEW mv;  -- ALTER clause for constraints
DISCARD MATERIALIZED VIEW DATA mv;  -- session state
RELEASE MATERIALIZED VIEW DATA mv;  -- savepoint
RESET MATERIALIZED VIEW DATA mv;  -- run-time parameter

I think any of these could work.  I'm personally most inclined
toward DISABLE MATERIALIZED VIEW.  It seems to convey the semantics
better, especially if you leave out DATA as an additonal word. 
Given that a materialized view will retain its query, tablespace,
indexes, statistics targets, etc. with this operation, and will
just not be available for scanning, some of the above seem
downright misleading without DATA thrown in.

Opinions?

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Greg Stark

On Thu, Feb 21, 2013 at 2:38 PM, Kevin Grittner kgri...@ymail.com wrote:
 It doesn't.  That was one of the more contentious points in the
 earlier bikeshedding phases.  Some felt that throwing away the
 contents was a form of making the MV out of date and as such
 didn't require any special handling.  Others, including myself,
 felt that data not present was a distinct state from generated
 zero rows and that attempting to scan a materialized view for
 which data had not been generated must result in an error.  The
 latter property has been maintained from the beginning -- or at
 least that has been the intent.

Actually this sounds like exactly what I was saying. I withdraw my
concern entirely.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Kevin Grittner

Greg Stark st...@mit.edu wrote:
 Kevin Grittner kgri...@ymail.com wrote:
 It doesn't.  That was one of the more contentious points in the
 earlier bikeshedding phases.  Some felt that throwing away the
 contents was a form of making the MV out of date and as such
 didn't require any special handling.  Others, including myself,
 felt that data not present was a distinct state from
 generated zero rows and that attempting to scan a materialized
 view for which data had not been generated must result in an
 error.  The latter property has been maintained from the
 beginning -- or at least that has been the intent.

 Actually this sounds like exactly what I was saying. I withdraw
 my concern entirely.

Reviewing your concerns and discussions of freshness in general
got me thinking -- while it is clear that not having generated
values in the MV based on its query clearly should make the view
non-scannable, and will be the only criterion for that in this
initial patch; later versions will almost certainly support
age-based conditions for whether the MV is scannable.  So in the
next release the MV may become non-scannable based on the passage
of time since DML was run on a source table without the MV having
been refreshed or incrementally updated to reflect that DML.  Which
makes me wonder why DML making the MV non-scannable is such a bad
thing in the case of TRUNCATE.  Granted there is a difference in
that it is run *on the MV* rather than *on a source relation*; but
still, I'm having some second thoughts about that being a problem.

The problem with TRUNCATE MATERIALIZED VIEW from a logical
perspective doesn't seem to me so much that it makes the MV
non-scannable, as that it is the only DML which would be allowed
directly on an MV -- which is kind of a weird exception.  It is
pretty much a given that when we can get to implementing it, DML
statements will render MVs unscannable under various conditions. 
Josh Berkus and Greg Stark have been the most explicit about that,
but I think most of us who are interested in the feature take it as
a given.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-21 Thread Andres Freund

On 2013-02-21 14:11:10 -0800, Kevin Grittner wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
  Kevin Grittner kgri...@ymail.com writes:
 
  What did you think of the idea of something like DISCARD
  MATERIALIZED VIEW DATA as a new statment?  Or maybe RESET
  MATERIALIZED VIEW?
 
  I could live with either DISCARD or RESET.
 
 I figured this was worth a pass through the keyword list to look
 for all imperative verbs suitable for this, which could support the
 needed syntax without adding a new keyword.  Here are the
 possibilities I came up with, along with a note about why they are
 keywords already.
 
 DISABLE MATERIALIZED VIEW mv;  -- ALTER clause for constraints
 DISCARD MATERIALIZED VIEW DATA mv;  -- session state
 RELEASE MATERIALIZED VIEW DATA mv;  -- savepoint
 RESET MATERIALIZED VIEW DATA mv;  -- run-time parameter
 
 I think any of these could work.  I'm personally most inclined
 toward DISABLE MATERIALIZED VIEW.  It seems to convey the semantics
 better, especially if you leave out DATA as an additonal word. 

 Given that a materialized view will retain its query, tablespace,
 indexes, statistics targets, etc. with this operation, and will
 just not be available for scanning, some of the above seem
 downright misleading without DATA thrown in.

I vote for RESET or DISCARD. DISABLE sounds more like you disable
automatic refreshes or somesuch.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Tom Lane

Kevin Grittner kgri...@ymail.com writes:
 When I went to do this, I hit a shift/reduce conflict, because with
 TABLE being optional it couldn't tell whether:

 TRUNCATE MATERIALIZED VIEW x, y, z;

 ... was looking for five relations or three.  That goes away with
 MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that OK?

Not really.  I would much rather see us not bother with this pedantic
syntax than introduce an even-partially-reserved word.

Having said that, I don't think I believe your analysis of why this
doesn't work.  The presence or absence of commas ought to make the
syntax non-ambiguous, I would think.  Maybe you just factored the
grammar wrong.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Robert Haas

On Tue, Feb 19, 2013 at 11:01 AM, Peter Eisentraut pete...@gmx.net wrote:
 On 2/19/13 8:54 AM, Robert Haas wrote:
 In the department of crazy ideas, what about having pg_dump NEVER
 refresh ANY materialized views?

 It might be useful to have an option for this, but I don't think it
 should be the default.  The default should be that the new database is
 ready to go.

 Then again, when would you ever actually use that option?

You'd use that option if you'd rather get the database mostly-up as
soon as possible, and then worry about the materialized views
afterwards.

 This might be different if there were a command to refresh all
 materialized views, because you don't want to have to go around and type
 separate commands 47 times after a restore.

Well, it's pretty easy to do:

SELECT 'LOAD MATERIALIZED VIEW ' || p.oid::regclass || ';' FROM
pg_class WHERE relkind = 'm';

...but we could also add explicit syntax for it, perhaps along the
lines of what we have for CLUSTER and VACUUM.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Tue, Feb 19, 2013 at 11:01 AM, Peter Eisentraut pete...@gmx.net wrote:
 This might be different if there were a command to refresh all
 materialized views, because you don't want to have to go around and type
 separate commands 47 times after a restore.

 Well, it's pretty easy to do:

 SELECT 'LOAD MATERIALIZED VIEW ' || p.oid::regclass || ';' FROM
 pg_class WHERE relkind = 'm';

 ...but we could also add explicit syntax for it, perhaps along the
 lines of what we have for CLUSTER and VACUUM.

It's not really that easy, because of the likelihood that MVs have to be
refreshed in a specific order.  The SELECT you suggest definitely seems
too simplistic.  A dedicated command could perhaps be built to pay
attention to dependencies ... but if we're still coding such things now,
it seems a bit late for 9.3.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Peter Eisentraut

On 2/19/13 5:22 PM, David Fetter wrote:
 On Tue, Feb 19, 2013 at 11:09:13PM +0100, Erik Rijkers wrote:
 On Sat, February 16, 2013 02:01, Kevin Grittner wrote:
 matview-v4.patch.gz

 Hi,

 I was wondering if material views should not go into information_schema.  I 
 was thinking either
 .views or .tables.  Have you considered this?
 
 I'm guessing it'd be .views if anything.  Haven't been able to
 decipher from section 11 of the standard (Schemata) whether the
 standard has anything to say on the matter.

I suppose one should be able to expect that if one finds a view in the
information schema, then one should be able to use DROP VIEW to remove
it.  Which in this case wouldn't work.  So I don't think including a
materialized view under views or tables is appropriate.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Kevin Grittner

Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:
 When I went to do this, I hit a shift/reduce conflict, because
 with TABLE being optional it couldn't tell whether:

 TRUNCATE MATERIALIZED VIEW x, y, z;

 ... was looking for five relations or three.  That goes away
 with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that
 OK?

 Not really.  I would much rather see us not bother with this
 pedantic syntax than introduce an even-partially-reserved word.

I'm not sure it's worth it either; but two people requested it and
I didn't forsee this shift/reduce conflict, so I took a shot at it.
 If we can't eliminate the conflict, I'm fine with leaving things
as they are in the latest posted patch.

 Having said that, I don't think I believe your analysis of why
 this doesn't work.  The presence or absence of commas ought to
 make the syntax non-ambiguous, I would think.  Maybe you just
 factored the grammar wrong.

Well, it wouldn't be the first time you've seen a better way to do
something in flex than I was able to see.  Taking just the gram.y
part of the change which implemented this, and omitting the change
in reservedness of MATERIALIZED, I have:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 820cb41..1d393c5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -394,6 +394,7 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);

 %type ival   opt_column event cursor_options opt_hold opt_set_data
 %type objtype    reindex_type drop_type comment_type security_label_type
+   trunc_type

 %type node   fetch_args limit_clause select_limit_value
    offset_clause select_offset_value
@@ -5172,9 +5173,10 @@ attrs:   '.' attr_name
  */

 TruncateStmt:
-   TRUNCATE opt_table relation_expr_list opt_restart_seqs 
opt_drop_behavior
+   TRUNCATE trunc_type relation_expr_list opt_restart_seqs 
opt_drop_behavior
    {
    TruncateStmt *n = makeNode(TruncateStmt);
+   n-objtype = $2;
    n-relations = $3;
    n-restart_seqs = $4;
    n-behavior = $5;
@@ -5182,6 +5184,12 @@ TruncateStmt:
    }
    ;

+trunc_type:
+   TABLE   { $$ = OBJECT_TABLE; }
+   | MATERIALIZED VIEW { $$ = OBJECT_MATVIEW; }
+   | /*EMPTY*/ { $$ = OBJECT_UNSPECIFIED; }
+   ;
+
 opt_restart_seqs:
    CONTINUE_P IDENTITY_P   { $$ = false; }
    | RESTART IDENTITY_P    { $$ = true; }

I'm open to suggestions on a better way.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Kevin Grittner

Peter Eisentraut pete...@gmx.net wrote:

 I suppose one should be able to expect that if one finds a view
 in the information schema, then one should be able to use DROP
 VIEW to remove it.  Which in this case wouldn't work.  So I don't
 think including a materialized view under views or tables is
 appropriate.

Right.  I think adding pg_matviews covers the stated use-case
enough to answer Erik's concern.  I'm not going to mess with adding
non-standard stuff to the standard views.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Tom Lane

Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Having said that, I don't think I believe your analysis of why
 this doesn't work.

 Well, it wouldn't be the first time you've seen a better way to do
 something in flex than I was able to see.  Taking just the gram.y
 part of the change which implemented this, and omitting the change
 in reservedness of MATERIALIZED, I have:

 -   TRUNCATE opt_table relation_expr_list opt_restart_seqs 
 opt_drop_behavior
 +   TRUNCATE trunc_type relation_expr_list opt_restart_seqs 
 opt_drop_behavior

 +trunc_type:
 +   TABLE   { $$ = OBJECT_TABLE; }
 +   | MATERIALIZED VIEW { $$ = OBJECT_MATVIEW; }
 +   | /*EMPTY*/ { $$ = OBJECT_UNSPECIFIED; }
 +   ;

Yeah, this is a standard gotcha when working with unreserved keywords.
You can't factor it like that because then the parser is required to
make a shift-reduce decision (on whether to reduce trunc_type to empty)
before it can see past the first word.  So for instance given

TRUNCATE MATERIALIZED ...
^

the parser has to make that decision when it can't see past the word
MATERIALIZED and so doesn't know what comes after it.

The way to fix it is to not try to use the sub-production but spell it
all out:

  TRUNCATE TABLE relation_expr_list ...
| TRUNCATE MATERIALIZED VIEW relation_expr_list ...
| TRUNCATE relation_expr_list ...

Now the parser doesn't have to make any shift-reduce decision until
after it can see past the first identifier.  It's a bit tedious
but beats making a word more reserved than it has to be.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Kevin Grittner

Tom Lane t...@sss.pgh.pa.us wrote:


 The way to fix it is to not try to use the sub-production but spell it
 all out:
 
       TRUNCATE TABLE relation_expr_list ...
     | TRUNCATE MATERIALIZED VIEW relation_expr_list ...
     | TRUNCATE relation_expr_list ...
 
 Now the parser doesn't have to make any shift-reduce decision until
 after it can see past the first identifier.  It's a bit tedious
 but beats making a word more reserved than it has to be.

Thanks!  Will do.


--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Peter Eisentraut

On 2/20/13 6:13 AM, Robert Haas wrote:
 It might be useful to have an option for this, but I don't think it
  should be the default.  The default should be that the new database is
  ready to go.
 
  Then again, when would you ever actually use that option?
 You'd use that option if you'd rather get the database mostly-up as
 soon as possible, and then worry about the materialized views
 afterwards.

Since the proposed materialized views are not available for implicit use
in query optimization, the only way an application would make use of
them is to access them directly.  And if it accesses an unpopulated
materialized view, it would fail.  So I don't think in the current state
a database is mostly-up without the materialized views filled in.

I can see the value in having a restore mode that postpones certain
nonessential operations, such as creating indexes or certain constraints
or even materialized views.  But I think the boundaries and expectations
for that need to be defined more precisely.  For example, a database
without constraints might be considered ready for read-only use,
without secondary indexes it might be ready for use but slow.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Erik Rijkers

On Wed, February 20, 2013 16:28, Kevin Grittner wrote:
 Peter Eisentraut pete...@gmx.net wrote:

 I suppose one should be able to expect that if one finds a view
 in the information schema, then one should be able to use DROP
 VIEW to remove it.  Which in this case wouldn't work.  So I don't
 think including a materialized view under views or tables is
 appropriate.

 Right.  I think adding pg_matviews covers the stated use-case
 enough to answer Erik's concern. 

Absolutely - I agree pg_matviews is much better than adding deviating 
information_schema stuff.

Thank you,

Erik Rijkers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Greg Stark

On Wed, Feb 20, 2013 at 4:26 PM, Kevin Grittner kgri...@ymail.com wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:


 The way to fix it is to not try to use the sub-production but spell it
 all out:

   TRUNCATE TABLE relation_expr_list ...
 | TRUNCATE MATERIALIZED VIEW relation_expr_list ...
 | TRUNCATE relation_expr_list ...

 Now the parser doesn't have to make any shift-reduce decision until
 after it can see past the first identifier.  It's a bit tedious
 but beats making a word more reserved than it has to be.

 Thanks!  Will do.

Fwiw I think worrying about stuff like this at this point is probably
a waste of time. There'll be a period of bike-shedding where people
debate what the command should be called so worrying about parser
conflicts before there's a consensus is kind pointless.

I would like to know what operations you plan to support independently
of the command names. I may have missed much earlier in the discussion
but then I suspect things have evolved since then.

It sounds like you want to support:

1) Selecting from materialized viws
2) Manually refreshing materialized views
3) Manually truncating materialized views

And explicitly not support

1) Automatically rewriting queries to select from matching views
2) Incrementally refreshing materialized views
3) Manual DML against data in materialized views (except truncate
which is kind of DDL)
4) Keeping track of whether the data in the materialized view is up to date

I have to say I find this model a bit odd. It seems the UI you're
presenting is that they're basically read-only tables that the
database will fill in the data for automatically. My mental model of
materialized views is that they're basically views that the database
guarantees a different performance characteristic for.

I would expect a materialized view to be up to date all the time. If
we don't support incremental updates (which seems like a fine thing
not to support in a first cut) then I would expect any DML against the
table to mark the view invalid and any queries against it to produce
an error (or possibly go to the source tables using the view
definition but that's probably a bad idea for most use cases). Ie.
they should behave like a view at all times and have up to date
information or fail entirely.

I would expect a command like TRUNCATE MATERIALIZED VIEW to exist but
I would expect it to be called something like INVALIDATE rather than
TRUNCATE and dropping the storage is a side effect of simply telling
the database that it doesn't need to maintain this materialized view.
Though I could be convinced truncate is a good name as long as it's
documented well.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Kevin Grittner

Greg Stark st...@mit.edu wrote:
 Kevin Grittner kgri...@ymail.com wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:

 The way to fix it is to not try to use the sub-production but
 spell it all out:

   TRUNCATE TABLE relation_expr_list ...
 | TRUNCATE MATERIALIZED VIEW relation_expr_list ...
 | TRUNCATE relation_expr_list ...

 Now the parser doesn't have to make any shift-reduce decision
 until after it can see past the first identifier.  It's a bit
 tedious but beats making a word more reserved than it has to
 be.

 Thanks!  Will do.

 Fwiw I think worrying about stuff like this at this point is
 probably a waste of time. There'll be a period of bike-shedding
 where people debate what the command should be called so worrying
 about parser conflicts before there's a consensus is kind
 pointless.

That sort of bikeshedding already happened three months ago.  Too
late now.

 I would like to know what operations you plan to support
 independently of the command names. I may have missed much
 earlier in the discussion but then I suspect things have evolved
 since then.

 It sounds like you want to support:

 1) Selecting from materialized viws
 2) Manually refreshing materialized views
 3) Manually truncating materialized views

 And explicitly not support

 1) Automatically rewriting queries to select from matching views
 2) Incrementally refreshing materialized views

Those are material for later releases, building on the base of what
goes into this release.

 3) Manual DML against data in materialized views (except truncate
 which is kind of DDL)

There is quite a lot of DML allowed -- changing tablespace,
changing schema, changing name of the MV or of individual columns
in it, changing statistics targets, creating indexes, and other
operations are supported.

 4) Keeping track of whether the data in the materialized view is
 up to date

Only keeping track of whether data has been populated or not, for
now.  There has been agreement that one or more timestamps relating
to freshness will make sense, but these are not in the initial
patch.

 I have to say I find this model a bit odd.

It's not a model, it's a starting point.  Several people have
already said that even this much is useful and they expect to take
advantage of it.  I'm doing what I can to not paint us into a
corner where it's hard to extend to all the features everyone
dreams of, but if we waited for that to commit something, it will
never happen.

 I would expect a materialized view to be up to date all the time.

I expect that this will eventually be an option, but I expect that
is will be a seldom-used one.  Most cases that I've seen, people
want summary data that is reasonably up-to-date without unduly
affecting the performance of incremental changes to the underlying
data.  I've sketched out the roadmap from this patch to all of
these options in a vauge, handwavy fashion, and don't have a lot of
interest in taking it farther until we're past 9.3 beta.

 If we don't support incremental updates (which seems like a fine
 thing not to support in a first cut) then I would expect any DML
 against the table to mark the view invalid and any queries
 against it to produce an error (or possibly go to the source
 tables using the view definition but that's probably a bad idea
 for most use cases). Ie. they should behave like a view at all
 times and have up to date information or fail entirely.

That would render them completely useless for the use-cases I've
seen.  If you want to offer a patch to do that as an option, feel
free, but I will strongly argue against that as unconditional
behavior.

 I would expect a command like TRUNCATE MATERIALIZED VIEW to exist
 but I would expect it to be called something like INVALIDATE
 rather than TRUNCATE and dropping the storage is a side effect of
 simply telling the database that it doesn't need to maintain this
 materialized view. Though I could be convinced truncate is a
 good name as long as it's documented well.

I'm trying to minimize the number of new keywords.  The initial
patch only added MATERIALIZED.  I added REFRESH due to
near-universal demand for something other than the LOAD
MATERIALIZED VIEW I initially used.  Have you seen the statistics
Tom gave out on how much the size of the executable bloats with
every new keyword?  Until now nobody has expressed concern about
TRUNCATE MATERIALIZED VIEW, so it would take quite a groundswell of
concern at this point to even consider a new keyword for this
functionality this late in the game.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Josh Berkus


 And explicitly not support
 
 1) Automatically rewriting queries to select from matching views
 2) Incrementally refreshing materialized views
 3) Manual DML against data in materialized views (except truncate
 which is kind of DDL)
 4) Keeping track of whether the data in the materialized view is up to date

The idea is to add the above features over the next few versions of
Postgres.

 I have to say I find this model a bit odd. It seems the UI you're
 presenting is that they're basically read-only tables that the
 database will fill in the data for automatically. 

This is what matviews are in other DBMSes.

 My mental model of
 materialized views is that they're basically views that the database
 guarantees a different performance characteristic for.

How would we do that, exactly?  That would be lovely, but unless you
have a way to accomplish it ...

 I would expect a materialized view to be up to date all the time. 

Actually, there's a huge use case for asynchronously updated matviews,
so we would not want an implementation which ruled them out.  Also
there's the argument that synchronously updated matviews have little
actual performance advantage over regular dynamic views.

Or to put it another way: I could use this feature, as it is, in about 8
different projects I'm currently supporting.  I personally can't think
of a single project where I need synchronously updated matviews,
currently.  I have in the past, but it's a LOT less frequent that the
desire for async, just as the desire for async replication is more
common than the desire for syncrep.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Peter Eisentraut

On 2/19/13 5:47 PM, Kevin Grittner wrote:
 When I went to do this, I hit a shift/reduce conflict, because with
 TABLE being optional it couldn't tell whether:
 
 TRUNCATE MATERIALIZED VIEW x, y, z;
 
 ... was looking for five relations or three.  That goes away with
 MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that OK?

Is TRUNCATE even the right command here?  For regular tables TRUNCATE is
a fast DELETE, which logically empties the table.  For materialized
views, there is no deleting, so this command (I suppose?) just
invalidates the materalized view.  That's not the same thing.

Are there TRUNCATE triggers on materialized views?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Kevin Grittner

Peter Eisentraut pete...@gmx.net wrote:
 On 2/19/13 5:47 PM, Kevin Grittner wrote:
 When I went to do this, I hit a shift/reduce conflict, because
 with TABLE being optional it couldn't tell whether:

 TRUNCATE MATERIALIZED VIEW x, y, z;

 ... was looking for five relations or three.  That goes away
 with MATERIALIZED escalated to TYPE_FUNC_NAME_KEYWORD.  Is that
 OK?

 Is TRUNCATE even the right command here?  For regular tables
 TRUNCATE is a fast DELETE, which logically empties the table. 
 For materialized views, there is no deleting, so this command (I
 suppose?) just invalidates the materalized view.  That's not the
 same thing.

Hmm.  That's what Greg Stark just said, and I replied that nobody
else had raised the issue in over three months.  With Greg, that's
two now.

TRUNCATE MATERIALIZED VIEW discards any data which has been loaded
into the MV, rendering it unavailable for scanning.  Internally, it
does do a truncate, exactly as truncate table.  The resulting
zero-length heap file is what is used to determine whether a
materialized view is scannable.  When a CREATE WITH DATA or a
REFRESH generates zero rows, an empty single page is created to
indicate that it is scannable (valid to use in queries) but
contains no rows.

I agree that INVALIDATE is probably more descriptive, although it
seems that there might be some even better word if we bikeshed
enough.  The question is, is it worth creating a new keyword to
call the internal truncate function for materialized views, versus
documenting that truncating a materialized view renders it invalid?
Again, given the numbers that Tom presented a while back about the
space requirements of every new keyword, I don't think this is
enough of a gain to justify that.  I still squirm a little about
having used REFRESH, even though demand for that was overwhelming.

 Are there TRUNCATE triggers on materialized views?

No.  Nor SELECT, INSERT, UPDATE, or DELETE triggers.  You can't
create a trigger of any type on a materialized view.  I don't think
that would interfere with event triggers, though.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Stephen Frost

* Kevin Grittner (kgri...@ymail.com) wrote:
 Peter Eisentraut pete...@gmx.net wrote:
  Is TRUNCATE even the right command here?  For regular tables
  TRUNCATE is a fast DELETE, which logically empties the table. 
  For materialized views, there is no deleting, so this command (I
  suppose?) just invalidates the materalized view.  That's not the
  same thing.
 
 Hmm.  That's what Greg Stark just said, and I replied that nobody
 else had raised the issue in over three months.  With Greg, that's
 two now.

TRUNCATE MAT VIEW seems like the right command to me.  Just my 2c.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Peter Eisentraut

On 2/20/13 2:30 PM, Kevin Grittner wrote:
 Are there TRUNCATE triggers on materialized views?
 No.  Nor SELECT, INSERT, UPDATE, or DELETE triggers.  You can't
 create a trigger of any type on a materialized view.  I don't think
 that would interfere with event triggers, though.

More generally, I would consider the invalidation of a materialized view
a DDL command, whereas truncating a table is a DML command.  This has
various implications with triggers, logging, permissions.  I think it's
not good to mix those two.

Also note that un-invalidating==refreshing a materialized view is
already a DDL command.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Kevin Grittner

Peter Eisentraut pete...@gmx.net wrote:
 On 2/20/13 2:30 PM, Kevin Grittner wrote:
 Are there TRUNCATE triggers on materialized views?
 No.  Nor SELECT, INSERT, UPDATE, or DELETE triggers.  You can't
 create a trigger of any type on a materialized view.  I don't
 think that would interfere with event triggers, though.

 More generally, I would consider the invalidation of a
 materialized view a DDL command, whereas truncating a table is a
 DML command.

The force of that assertion is somewhat undercut by the fact that
the ExecuteTruncate() function does exactly what needs to be done
to discard the data in a materialized view and make it appear as
non-scannable.  Even if we dress it up with different syntax, it's
not clear that we wouldn't build a TruncateStmt in the parser and
pass it through exactly the same execution path.  We would just
need to look at the relkind to generate a different command tag.

 This has various implications with triggers, logging,
 permissions.  I think it's not good to mix those two.

Could you give a more concrete example of where you see a problem?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Tatsuo Ishii

 I would like to know what operations you plan to support independently
 of the command names. I may have missed much earlier in the discussion
 but then I suspect things have evolved since then.
 
 It sounds like you want to support:
 
 1) Selecting from materialized viws
 2) Manually refreshing materialized views
 3) Manually truncating materialized views

Maybe plus?

4) Automatically dropping materialized views if underlying table(s)
are dropped/altered

Or this has to be done manually?

 And explicitly not support
 
 1) Automatically rewriting queries to select from matching views
 2) Incrementally refreshing materialized views
 3) Manual DML against data in materialized views (except truncate
 which is kind of DDL)
 4) Keeping track of whether the data in the materialized view is up to date
 
 I have to say I find this model a bit odd. It seems the UI you're
 presenting is that they're basically read-only tables that the
 database will fill in the data for automatically. My mental model of
 materialized views is that they're basically views that the database
 guarantees a different performance characteristic for.
 
 I would expect a materialized view to be up to date all the time. If
 we don't support incremental updates (which seems like a fine thing
 not to support in a first cut) then I would expect any DML against the
 table to mark the view invalid and any queries against it to produce
 an error (or possibly go to the source tables using the view
 definition but that's probably a bad idea for most use cases). Ie.
 they should behave like a view at all times and have up to date
 information or fail entirely.
 
 I would expect a command like TRUNCATE MATERIALIZED VIEW to exist but
 I would expect it to be called something like INVALIDATE rather than
 TRUNCATE and dropping the storage is a side effect of simply telling
 the database that it doesn't need to maintain this materialized view.
 Though I could be convinced truncate is a good name as long as it's
 documented well.
 
 -- 
 greg
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Materialized views WIP patch

2013-02-20 Thread Greg Stark

On Wed, Feb 20, 2013 at 9:25 PM, Peter Eisentraut pete...@gmx.net wrote:
 More generally, I would consider the invalidation of a materialized view
 a DDL command, whereas truncating a table is a DML command.

That's not entirely true. From the database's point of view, TRUNCATE
is in many ways actually DDL.

I actually don't really dislike using TRUNCATE for this command. I
was more asking about whether this meant people were thinking of the
view as a thing where you could control the data in it by hand and
could have the view be empty rather than just not valid.

The way I was thinking about it, whatever the command is named, you
might be able to tell the database to drop the storage associated with
the view but that would make the view invalid until it was refreshed.
It wouldn't make it appear to be empty.




-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

1 2 3 >

1 - 100 of 204 matches

Mail list logo