On Tue, Sep 24, 2013 at 1:04 PM, Stephen Frost <sfr...@snowman.net> wrote: > * Robert Haas (robertmh...@gmail.com) wrote: >> On Tue, Sep 24, 2013 at 12:00 PM, Stephen Frost <sfr...@snowman.net> wrote: >> > It wouldn't address my concerns anyway, which are around these binary >> > operators and the update-in-place approach being risky and setting us up >> > for problems down the road. >> >> I think that if you want to hold up a bug fix to a feature that's >> already committed, you need to be more specific than to say that there >> might be problems down the road. > > Committed isn't released and I simply can't agree that introducing new > operators is a 'bug fix'. Had it gone out as-is, I can't see us > back-patching a bunch of new operators to fix it.
You're perfectly welcome to argue that we should rip REFRESH MATERIALIZED VIEW CONCURRENTLY back out, or change the implementation of it. However, that's not the topic of this thread. Had it gone out as is, we would have either had to come up with some more complex fix that could make things right without catalog changes, or we would have had to document that it doesn't work correctly. Fortunately we are not faced with that conundrum. >> Note that, after we change the data in the underlying table, the >> output of the query changes. But REFRESH MATERIALIZED VIEW >> CONCURRENTLY doesn't fix it. However, REFRESH MATERIALIZED VIEW >> without CONCURRENTLY does fix it. That's a bug, because if there are >> two ways of refreshing a materialized view they should both produce >> the same answer. Shouldn't they? > > The same queries run over time without changes to the underlying data > really should return the same data, shoudln't they? Not necessarily. There are and always have been plenty of cases where that isn't true. > Is it a bug that they don't? No. > In general, I agree that they should produce the same results, as should > incrementally maintained views when they happen. I'm not convinced that > choosing whatever the 'new' value is to represent the value in the > matview for the equal-but-not-binary-identical will always be the > correct answer though. Well, you have yet to provide an example of where it isn't the right behavior. I initially thought that perhaps we needed a type-specific concept of exact equality, so that two things would be exactly equal if they would both be dumped the same way by pg_dump, even if the internal representations were different. But on further thought I'm no longer convinced that's a good idea. For example, consider the compact-numeric format that I introduced a few releases back. The changes are backward compatible, so you can run pg_upgrade on a cluster containing the old format and everything works just fine. But your table will be larger than you really want it to be until you rewrite it. Any materialized view that was built by selecting those numeric values will *also* be larger than you want it to be until you rewrite it. Fortunately, you can shrink the table by using a rewriting variant of ALTER TABLE. After you do that, you will perhaps also want to shrink the materialized view. And in fact, REFRESH will do that for you, but currently, REFRESH CONCURRENTLY won't. It seems to me that it would be nicer if it did. Now I admit that's an arguable point. We could certainly define an intermediate notion of equality that is more equal than whatever = does, but not as equal as exact binary equality. However, such a new notion would have no precedence in our existing code, whereas the use of binary equality does. Going to a lot of work to build up this intermediate notion of equality does not seem like a good idea to me; I think a general rule of good programming is not to invent more ways to do things than are clearly necessary, and requiring every core and extension type to define an exact-equality operator just to support this feature seems like excessive in the extreme. Moreover, I think what to include in that intermediate notion of equality would be kind of hard to decide, because there are a lot of subtly different questions that one can ask, and we'd inevitably have to answer the question "how equal does it have to be to be equal enough?". Numerics and arrays have multiple ways of referencing what is intended to be the exact same value, but those can be disambiguated by passing them to a function that cares, like pg_column_size(). Then, on a user-visible level, numerics allow variation in the number of trailing zeroes after the decimal point. Floats have extra digits that aren't shown except with extra_float_digits, so that the value can be less than totally equal even if the text representation is the same. citext intentionally ignores case. In every one of those cases, it's possible that the user of materialized views might say "you know, if it's equal enough that the = operator says it's the same, then I really don't mind if the materialized view maintenance gets skipped". But it's also possible that they might say, "you know, those values are not really the same, and the materialized view really ought to reflect the latest data". I think the conservative (and therefore correct) approach is to decide that we're always going to update if we detect any difference at all. The only real argument for ignoring any differences is that it will mean more changes to the materialized view. And that is true enough, but in practical scenarios such updates will be rare. I would argue that if someone goes through and add spends a lot of time adding or removing trailing zeroes to the base tables on which a materialized view is based, the likelihood is that those trailing zeros do matter and the change ought to reflect through to the materialized views as well. It is not impossible that someone could upgrade to a version of PG that supports a more compact or otherwise-enhanced version of some data type, rewrites the table for some reason, and is then surprised to see a change storm on the associated materialized view as well. But I don't think it's very likely - we have no such changes planned. And even if does happen at some point down the road, I can't see justifying the effort of building a third notion of equality as effort well-spent to avoid it, especially because it's equally possible that the user *would* want the changes to propagate through to the materialized view. >> The fact that users can write >> queries that don't always give the same answer is not a reason why >> it's OK for REFRESH CONCURRENTLY to misbehave on queries that do. > > This really is, imv, agreeing to hold a higher standard for matviews > than we do for what matviews are *based* on- which is queries. I don't think it is, and your explanation of why you think it is does not make any sense to me. If a view produced an answer that the underlying query *could not have produced* then we would clearly consider that a bug. But, you're arguing that a materialized view on which REFRESH CONCURRENTLY should be allowed to produce just such answers. It is obviously true, and unavoidable, that if the query can produce more than one result set depending on the query plan or other factors, then the materialized view contents can match only one of those possible result sets. But you are arguing that it should be allowed to produce some OTHER result set, that a direct execution of the query could *never* have produced, and I can't see how that can be right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers