Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-09 Thread Robert Haas
2010/4/9 pavelbaros baro...@seznam.cz:
 Implementation:  could be divided to few steps:

 1) create materialized view
 - modify grammar (parser): CREATE MATERIALIZED VIEW mv_name AS SELECT ...
 - change executor, so that it will create physical table defined by select
 statement

This basically needs to work the same was as CREATE TABLE ... AS
SELECT ... - save that it should also stuff the rewritten query
someplace, so that it can be re-executed.  I think one of the
important design questions here is figuring out exactly where that
someplace should be.

I also suspect that we want to block any write access to the relation
except for view refreshes.  IOW, INSERT, UPDATE, and DELETE on the
underlying relation should be rejected (though perhaps rewrite rules
redirecting such operations to other tables could be allowed).

 2) change rewriter
 - usually, view is relation with defined rule and when rewriting, rule is
 fired and relation (view) is replaced by definition of view. If relation do
 not have rule, planner and executor behave to it as physical table
 (relation). In case of materialized view we want to rewrite select statement
 only in case when we refreshing MV. In other cases rewriter should skip
 rewriting and pick up physical relation. Exclude situation when other
 rewrite rules which are not related to MV definition are specified.

 3) create command that takes snapshot (refresh MV)
 - modify grammar: ALTER MATERIALIZED VIEW mv_name REFRESH;
 - taking snapshot (refreshing) is similar to command SELECT INTO ... and I
 decided to follow the way it works. After parsing query and before
 transformation is MANUALLY created tree representation of SELECT * INTO
 ... with flag IntoClause-isrefresh set true, indicating it is refreshing
 materialized view. Everithing acts as it would be regular SELECT INTO ...
 except functions OpenIntoRel() and CloseIntoRel(). In function OpenIntoRel
 is created temp table (without catalog) and set as destination for result of
 select. In function CloseIntoRel executor swap relfilenode's of temp table
 and original table and finally delete temp table. Behavior of CloseIntoRel
 function is inspired by CLUSTER statement.

I'll have to read the code before I can comment on the rest of this in detail.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-09 Thread Greg Smith

pavelbaros wrote:
I am also waiting for approval for my repository named 
materialized_view on git.postgresql.org, so I could publish 
completed parts. 


Presuming that you're going to wander there and get assigned what looks 
like an official repo name for this project is a bit...optimistic.  I 
would recommend that you publish to something like github instead (you 
can fork http://github.com/postgres/postgres ), and if the work looks 
good enough that it gets picked up by the community maybe you migrate it 
onto the main site eventually.  git.postgresql.org is really not setup 
to be general hosting space for everyone who has a PostgreSQL related 
project; almost every repo on there belongs to someone who has already 
been a steady project contributor for a number of years.


(Switching to boilerplate mode for a paragraph...) You have picked a 
PostgreSQL feature that is dramatically more difficult than it appears 
to be, and I wouldn't expect you'll actually finish even a fraction of 
your goals in a summer of work.  You're at least in plentiful 
company--most students do the same.  As a rule, if you see a feature on 
our TODO list that looks really useful and fun to work on, it's only 
still there because people have tried multiple times to build it 
completely but not managed to do so because it's harder than it 
appears.  This is certainly the case with materialized views.


You've outlined a reasonable way to build a prototype that does a 
limited implementation here.  The issue is what it will take to extend 
that into being production quality for the real-world uses of 
materialized views.  How useful your prototype is depends on how well it 
implements a subset of that in a way that will get used by the final design.


The main hidden complexity in this particular project relates to 
handling view refreshes.  The non-obvious problem is that when the view 
updates, you need something like a SQL MERGE to really handle that in a 
robust way that doesn't conflict with concurrent access to queries 
against the materialized view.  And work on MERGE support is itself 
blocked behind the fact that PostgreSQL doesn't have a good way to lock 
access to a key value that doesn't exist yet--what other databases call 
key range locking.  See the notes for Add SQL-standard 
MERGE/REPLACE/UPSERT command at http://wiki.postgresql.org/wiki/Todo 
for more information.


You can work around that to build a prototype by grabbing a full table 
lock on the materialized view when updating it, but that's not a 
production quality solution.  Solving that little detail is actually 
more work than the entire project you've outlined.  Your suggested 
implementation--In function CloseIntoRel executor swap relfilenode's of 
temp table and original table and finally delete temp table--is where 
the full table lock is going to end up at.  The exact use cases that 
need materialized views cannot handle a CLUSTER-style table recreation 
each time that needs an exclusive lock to switchover, so that whole part 
of your design is going to be a prototype that doesn't work at all like 
what needs to get built to make this feature committable.  It's also not 
a reasonable assumption that you have enough disk space to hold a second 
copy of the MV in a production system.


Once there's a good way to merge updates, how to efficiently generate 
them against the sort of large data sets that need materalized views--so 
you just write out the updates rather than a whole new copy--is itself a 
large project with a significant quantity of academic research to absorb 
before starting.  Dan Colish at Portland State has been playing around 
with prototypes for the specific problem of finding a good algorithm for 
view refreshing that is compatible with PostgreSQL's execution model.  
He's already recognized the table lock issue here and for the moment is 
ignoring that part.  I don't have a good feel yet for how long the 
targeted update code will take to mature, but based on what I do know I 
suspect that little detail is also a larger effort than the entire scope 
you're envisioning.  There's a reason why the MIT Press compendium 
Materialized Views: Techniques, Implementations, and Applications is 
over 600 pages long--I hope you've already started digging through that 
material.


Now, with all that said, that doesn't mean there's not a useful project 
for you buried in this mess.  The first two steps in your plan:


1) create materialized view
2) change rewriter

Include building a prototype grammer, doing an initial executor 
implementation, and getting some sort of rewriter working.  That is 
potentially good groundwork to lay here.  I would suggest that you 
completely drop your step 3:


3) create command that takes snapshot (refresh MV)

Because you cannot built that in a way that will be useful (and by that 
I mean committable quality) until there's a better way to handle updates 
than writing a whole new table and grabbing a full 

Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-09 Thread Robert Haas
2010/4/9 Greg Smith g...@2ndquadrant.com:
 The main hidden complexity in this particular project relates to handling
 view refreshes.  The non-obvious problem is that when the view updates, you
 need something like a SQL MERGE to really handle that in a robust way that
 doesn't conflict with concurrent access to queries against the materialized
 view.  And work on MERGE support is itself blocked behind the fact that
 PostgreSQL doesn't have a good way to lock access to a key value that
 doesn't exist yet--what other databases call key range locking.  See the
 notes for Add SQL-standard MERGE/REPLACE/UPSERT command at
 http://wiki.postgresql.org/wiki/Todo for more information.

 You can work around that to build a prototype by grabbing a full table lock
 on the materialized view when updating it, but that's not a production
 quality solution.  Solving that little detail is actually more work than the

Hmm... I am not sure you're right about this.  It's not obvious to me
that a brief full-table lock wouldn't be acceptable for an initial
implementation.  Obviously it wouldn't be suitable for every use case
but since we're talking about manually refreshed views that was bound
to be true anyway.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


<    1   2