Stephen Frost <sfr...@snowman.net> wrote:
> Robert Haas (robertmh...@gmail.com) wrote:
>> Now, the next project Kevin's going to work on, and that he was
>> working on when he discovered this problem, is incremental
>> maintenance: that is, allowing us to update the view *without*
>> needing to rerun the entire query. This record comparison
>> operator will be just as important in that context.
>
> You state this but I don't see where you justify this claim..

Unless we can tell whether there are any differences between two
versions of a row, we can't accurately generate the delta to drive
the incremental maintenance.

The initial thread discussing how incremental maintenance would be
done is here:

http://www.postgresql.org/message-id/flat/1368561126.64093.yahoomail...@web162904.mail.bf1.yahoo.com#1368561126.64093.yahoomail...@web162904.mail.bf1.yahoo.com

The thread on the initial patch for REFRESH MATERIALIZED VIEW
CONCURRENTLY started with:

| Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY
| for 9.4 CF1. The goal of this patch is to allow a refresh
| without interfering with concurrent reads, using transactional
| semantics.
|
| It is my hope to get this committed during this CF to allow me to
| focus on incremental maintenance for the rest of the release cycle.

There was much discussion, testing, and revision then, which is
here:

http://www.postgresql.org/message-id/flat/1371225929.28496.yahoomail...@web162905.mail.bf1.yahoo.com#1371225929.28496.yahoomail...@web162905.mail.bf1.yahoo.com

I think pretty much every concern raised was addressed except for a
lingering doubt expressed by Noah over whether IS NOT DISTINCT FROM
semantics were really the right basis for matching rows. Based on
that feedback, I spent a lot of time looking at why that might or
might not be correct, and decided that I had been wrong to base the
behavior on that, for the reasons Robert expressed so clearly a
couple messages back. Hence this patch.
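As a rough illustration of why equality-based matching can miss a
change, here is a small Python analogy. It is not PostgreSQL code:
the function names and sample rows are invented for this sketch,
Decimal stands in for a type whose equality operator treats values
with different representations as equal, and repr() stands in for
comparing the stored image of a row.

```python
from decimal import Decimal


def delta_by_equality(old_rows, new_rows):
    """Keys whose rows differ according to the type's equality operator."""
    common = old_rows.keys() & new_rows.keys()
    return sorted(k for k in common if old_rows[k] != new_rows[k])


def delta_by_image(old_rows, new_rows):
    """Keys whose rows differ in representation (repr as a stand-in)."""
    common = old_rows.keys() & new_rows.keys()
    return sorted(k for k in common if repr(old_rows[k]) != repr(new_rows[k]))


# Hypothetical matview contents before and after the base data changed.
old = {1: (Decimal("1.0"),), 2: (Decimal("5"),)}
new = {1: (Decimal("1.00"),), 2: (Decimal("6"),)}

equality_delta = delta_by_equality(old, new)  # row 1 is treated as unchanged
image_delta = delta_by_image(old, new)        # row 1 is reported as changed

print(equality_delta)
print(image_delta)
```

With the equality-based delta, row 1 would keep showing 1.0 after a
refresh even though the query now produces 1.00; the image-based
delta picks up both rows.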
I had made the mistake of using an operator which used a
column-by-column comparison based on the equality operator for the
default opclass for comparing two values of each respective column
type. I had been challenged on that in the review process, and was
responding to it with the fix contained in this patch.

The first post on this thread starts with:

| Attached is a patch for a bit of infrastructure I believe to be
| necessary for correct behavior of REFRESH MATERIALIZED VIEW
| CONCURRENTLY as well as incremental maintenance of matviews.

I think it is fairly obvious that REFRESH should REgenerate a FRESH
copy of the data, versus incremental maintenance -- which attempts
to keep the matview up-to-date without regenerating the full set of
data. Whenever there is logical replication (and materialized
views are, conceptually, one form of that -- within the database) I
feel it is important to be able to correct any possible "drift".
With matviews, I see the way to do that as the REFRESH command, and
I feel that it is important to be able to do that in a way that can
run concurrently with readers of the matview -- without blocking
them or being blocked by them.

Discussion of incremental maintenance really belongs on a different
thread. Since I have gone to the trouble to read a lot of papers
on the topic, and select one that I think is a good basis for our
implementation, I hope everyone will frame discussion in terms of
either:

 - how best to implement the techniques from that paper, or
 - why some other paper presents a better technique.

I think it would be madness to approach implementing incremental
maintenance based on an ad hoc set of thoughts rather than a
peer-reviewed paper. I know how tempting it is to start from zero
and think, "if we just do X we can cover this sort of query." I
had a few thoughts like that before I read all those papers and
discovered that there were many subtle issues to cover.
We will have plenty of time to get creative with alternatives when
we get past the query types specifically addressed in the paper and
begin to move into, for example, CTEs and window functions. I
certainly expect robust discussion around those areas, or even some
of the infrastructure for capturing deltas and triggering the
incremental maintenance. I really didn't expect to have to burn so
much time and energy arguing over whether a REFRESH should leave
the matview accurately containing the results of the matview's
query.

>> We can argue about how it should be named

After looking at the existing suggestions and thinking about it I'm
leaning toward these operators (based on a star in front of the
usual default comparison operators):

  *=
  *<>
  *>
  *>=
  *<
  *<=

>> and whether it should be documented

I thought we had a consensus to document both the existing record
comparison operators and these new ones, and I'm fine with that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers