* Kevin Grittner (kgri...@ymail.com) wrote: > I'm trying to figure out what situation you think we're in. > Seriously, if you could apply the patch and show one example that > demonstrates what you see to be a problem, that would be great.
Here's an example to illustrate what I'm talking about when it comes down to "you can't claim that you'll produce exactly what the query will always, with these types:" =# table citext_table; id | name ----+------- 1 | one 3 | three 5 | 4 | Three 2 | Three (5 rows) =# create MATERIALIZED VIEW citext_matview AS select name from citext_table where id > 0 group by name; SELECT 3 =# table citext_matview; name ------- one three (3 rows) =# set enable_seqscan = false; SET =# select name from citext_table where id > 0 group by name; name ------- one Three (3 rows) Yes, the 'set enable_seqscan = false;' is a bit of a cheat- but I hope you agree that it *shouldn't* change the *result* of a query. There's certainly other cases where a different plan could be picked and result in the same kind of difference between the matview and running the query. Build a matview w/ max(id) as id and a unique index on 'id' and you'll still get the same issue. The problem is that the seqscan will pick up on 'three' first while the index scan will find 'Three' first. This is all with the patch from http://www.postgresql.org/message-id/1379024847.48294.yahoomail...@web162904.mail.bf1.yahoo.com applied. I simply don't believe it's possible to be consistent in this unless we declare the type's equality to be insufficient for us and, everywhere that we call a type's equality operator, *also* check which of the equal values involved in the comparison is "bigger" or "lower" (based on binary comparison) and actually pick one, consistently, all the time. This is why I'm complaining that we're trying to paper over a difference that just isn't that simple. I understand why you're trying to- it's definitely annoying, but this isn't so much a solution as it is a hack that doesn't actually work in all cases. I don't have a silver bullet for this but I don't like saying "let's implement these binary operators and make things *look* consistent in most cases, and then fail subtly in complex situations." I agree that users may complain if the underlying relation ends up not having *any* entries with the value that's in the matview and a refresh doesn't update it. I expect users will *also* complain if we implement these operators and then they write their own queries using this awesome new 'binary' comparison operator to compare their equal-to-citext strings- and discover that the binary operators say things that *look* the same are actually *different* (thinking encoding issues, overlong encodings, etc). That'd be a subtle and painful bug (technically in their code, not ours, but still) to find, and then what do you do? Argueing against citext (as I've had to do in the past..) would probably be more difficult for some if they saw these operators and figured they could use them. As I mentioned in the thread w/ Robert, when looking at further optimizations down the road, we're going to be in a situation of looking at only a subset of the records and then considering what to do when supporting MAX() efficiently leads us to running 'greater(existing,new)' which returns false, but q and r are binary different. This argument is saying "always replace what's there" but every other, existing, value in the table might be represented by the 'existing' representation. Thanks, Stephen
signature.asc
Description: Digital signature