On Fri, Sep 20, 2013 at 11:23 AM, Stephen Frost <sfr...@snowman.net> wrote: > Perhaps I'm assuming things are farther along than they are.. I was > assumed that 'incremental mat view' updates were actuallly 'partial'- > iow, that it keeps track of the rows in the mat view which are > impacted by the set of rows in the base relations and then runs specific > queries to pull out and operate on the base-rel set-of-rows to update > the specific mat-view-rows. If it's running the whole query again then > it's certainly more likely to get the same results that the user did, > but it's not a guarantee and that's only a happy coincidence while we're > still doing the whole query approach (which I hope we'd eventually > progress beyond..).
It seems odd to me that you have such strong opinions about what is and is not acceptable here given that you don't seem to familiar with the current state of this work. I will attempt to summarize my understanding. In 9.3, we can refresh a materialized view by taking an access exclusive lock on the relation, rerunning the query, and replacing the contents wholesale. In master, there is a new option to perform the refresh concurrently, which is documented here: http://www.postgresql.org/docs/devel/static/sql-refreshmaterializedview.html It reruns the query in its entirety and then figures out what inserts, updates, and deletes are needed to make the matview match the output of the query (see refresh_by_match_merge). This is an improvement over what is available in 9.3, because even though you still have to rerun the full query, you don't have to lock out access to the table in order to apply the changes. However, currently, it sometimes fails to perform updates that are needed to make the contents of the view match the query output, because it relies on a notion of equality other than exact equality. Kevin is proposing to fix this problem via this patch. Now, the next project Kevin's going to work on, and that he was working on when he discovered this problem, is incremental maintenance: that is, allowing us to update the view *without* needing to rerun the entire query. This record comparison operator will be just as important in that context. The *only* strategy refreshing a materialized view that *doesn't* need this operator is the only we have in 9.3: through out all the old data and replace it with the result of re-executing the query. If you want anything smarter than that, you MUST compare old and new rows for equality so that you can update only those rows that have been changed. And if you compare them *any strategy other than the one Kevin is proposing*, namely binary equality, then you may sometimes decide that a row has not been changed when it really has, and then you won't update the row, and then incremental maintenance will be enabled to produce *wrong answers*. So to me this has come to seem pretty much open and shut. We all know that materialized views are not going to always match the data returned by the underlying query. Perhaps the canonical example is SELECT random(). As you pointed out upthread, any query that is nondeterministic is a potential problem for materialized views. When you write a query that can return different output based on the order in which input rows are scanned, or based on any sort of external state such as a random-number generator, each refresh of a materialized view based on that query may produce different answers. There's not a lot we can do about that, except tell people to avoid using such queries in materialized views that they expect to be stable. However, what we're talking about here is a completely separate problem. Even if the query is 100% deterministic, without this patch, the materialized view can get out of sync with the query results. Granted, most of the ways in which it can get out of sync are fairly subtle: failing to preserve case in a data type where comparisons are text-insensitive; gaining or loosing an all-zeroes null bitmap on an array type; adding or removing trailing zeroes after the decimal point in a numeric. If the materialized view sometimes said "1" when the query was returning "0", we'd presumably all say "that's a bug, let's fix it". But what we're actually talking about is that the query returns "0.00" and the view still says zero. So we're doing a lot of soul-searching about whether that's unequal enough to justify updating the row. Personally, though, there's not a lot of doubt in my mind. If I create a table and I put 0 into a column of that table and then create a materialized view and that 0 ends up in the materialized view, and then I update the 0 to 0.00 and refresh the view, I expect that change to propagate through to the materialized view. It works that way if I select from a regular, non-materialized view; and it also works that way if I select from the table itself. The idea that materialized views should somehow be exempt from reflecting changes to the underlying data in certain corner cases seems odd and indefensible to me, and I can't understand why anyone's arguing that we should do that. If we're going to avoid that, we need this operator. We can argue about how it should be named and whether it should be documented, and we can have all those arguments and still fix the problem. But if we decide we're not going to add this operator, then that seems to be basically saying that we don't want to allow materialized views to accurately reflect the results of the underlying queries. And I think that would be an extremely poor decision. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers