On Mon, Jun 6, 2022 at 10:42 PM Chinmay Kanchi <cgkan...@gmail.com> wrote: > The simulated index in this case is outrageously fast, up to ~150x on the > GROUP BY.
Couldn't you make a similar argument in favor of adding a B-Tree index on "country"? This probably won't be effective in practice, but the reasons for this have little to do with how a B-Tree index represents TIDs. A GIN index can compress TIDs much more effectively, but the same issues apply there. The main reason why it won't work with a B-Tree is that indexes in Postgres are not transactionally consistent structures, in general. Whereas your cities_rb table is transactionally consistent (or perhaps just simulates a transactionally consistent index). Maybe it could work in cases where an index-only scan could be used, which is roughly comparable to having a transactionally consistent index. But that depends on having the visibility map set most or all heap pages all-visible. GIN indexes don't support index-only scans, and I don't see that changing. So it's possible that just adding TID compression to B-Tree indexes would significantly speedup this kind of query, just by making low cardinality indexes much smaller. Though that's a hard project, for many subtle reasons. This really amounts to building a bitmap index, of the kind that are typically used for data warehousing, which is something that has been discussed plenty of times on this list. GIN indexes were really built for things like full-text search, not for data warehousing. B-Tree deduplication makes B-Tree indexes a lot smaller, but it tends to "saturate" at about 3.5x smaller (relative to the same index with deduplication disabled) once there are about 10 or so distinct keys per row (the exception is indexes that happen to have huge keys, which aren't very interesting). There are many B-Tree indexes (with typical sized keys) that are similar in size to an "equivalent" GIN index -- the ability to compress TIDs isn't very valuable when you don't have that many TIDs per key anyway. It's different when you have many TIDs per key, of course. GIN indexes alone don't "saturate" at the same point -- there is often a big size difference between low cardinality and ultra low cardinality data. There are bound to be cases where not having that level of space efficiency matters, especially with B-Tree index-only scans that scan a significant fraction of the entire index, or even the entire index. -- Peter Geoghegan