Bruce Momjian <br...@momjian.us> wrote: > I get your point about COUNT(*) really counting rows, not values, > but why doesn't GROUP BY then skip nulls? > > WITH null_test (col1, col2) AS > ( > SELECT 1, null > UNION ALL > SELECT null, null > ) > SELECT COUNT(*), col2 FROM null_test group by col2 > UNION ALL > SELECT COUNT(col1), col2 FROM null_test group by col2; > > count | col2 > -------+------ > 2 | > 1 | > (2 rows) I'm having trouble seeing why this result is confusing. You asked for counts with a GROUP BY clause. In such a case, NULLs are a group, since you might be interested in how many *are* null. Then you did a count of all rows and a count of rows where another value wasn't NULL. You got the only reasonable result, IMO. (Well, unless you argue that a row with no known values should be an error in the first place, which if I remember correctly is what E.F. Codd argued for; but that is one point on which the standards committee didn't go with Codd's position.) > Looks like GROUP BY is selecting the NULL rows, then COUNT is > processing them based on its rules. I would tend to view it that COUNT is processing the rows it was asked to process in each case, and GROUP BY is grouping them as requested. > I think the original complaint is that NULL != NULL in a WHERE > clause, but GROUP BY is able to group them together just fine. Whoa! I think I know what you meant, but that is a dangerously misleading misstatement. It is not true that NULL = NULL, but neither is it true that NULL != NULL. In fact, you also don't get TRUE from NOT NULL = NULL. If you don't know either value, you can't know that they are equal, and you can't know that they are unequal. The results of such comparisons are UNKNOWN. That doesn't mean you would always find the count of rows where the value is NULL uninteresting; hence the IS NOT DISTINCT FROM concept is effectively used for grouping. Performing logical or set operations on data sets with missing values is an inherently tricky business, but I think that overall SQL has made reasonable choices on how to do that; my biggest gripe is that there is no standard way to distinguish between UNKNOWN and NOT APPLICABLE. The fuzziest areas seem to me to be related to that deficiency. As long as NULL is not abused for such things as "known to be zero" in an accounting record (which is subtly but significantly different from "not applicable"), NULL is almost always (IMO) better than some "magic value". If you have ever converted data from a database where names were split into multiple fields, and NULL was not allowed for middle name, you will probably agree. -Kevin
-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs