BRIN indexes seem to work perfectly well for our purposes, and they are so tiny compared to B-Tree. Selecting min/max values is very expensive though.
In my case the table is ~2.5TB (530M records), while the whole BRIN index is 16MB. I think it'd be totally fine to scan all BRIN pages, it'd be way better than doing table scan. On Wed, Mar 29, 2023 at 1:47 PM Francisco Olarte <fola...@peoplecall.com> wrote: > On Wed, 29 Mar 2023 at 22:07, Vladimir Sitnikov > <sitnikov.vladi...@gmail.com> wrote: > > > Is it correct that BRIN indexes don't support MIN/MAX operations ? > > In theory, it should be possible to implement min/max scan support for > BRIN, however it is not implemented yet. > > > > Just in case, min/max query would require to read all BRIN pages, and > then it would require to read the corresponding pages in table. > > > For instance, imagine the table has N pages. Then BRIN would have N/128 > pages with the default pages_per_range=128, so your max(..) query would > take N/128 + 128 pages to read. In theory it would be sequential, however, > under concurrent load it might not be that sequential for the disk. > > I think BRIN would require N/128 RANGES, not pages, and if I am not > mistaken it fits several ranges in an index page. It talks of summary > tuples, and I suspect a summary tuple for say, an integer, is not > gonna be longer, than 128 bytes, in which case you could fit 64 of > them in a 4k page. > > Also, if you account for possible concurrent load disturbing your > index+partial scan, you also have to account for the ( more likely ) > disruption on the full scan. > > I.e., I have this table > apc | apc_cdrs_p2022_12 | table | > postgres | permanent | heap | 860 MB | > N/128 pages implies N/128 bytes, so index would be 6.7Mb in your numbers, > but > apc | apc_cdrs_p2022_12_cuando_idx | index > | postgres | apc_cdrs_p2022_12 | permanent | brin > | 64 kB | > apc | apc_cdrs_p2022_12_raw_id_idx | index > | postgres | apc_cdrs_p2022_12 | permanent | brin > | 64 kB | > 1st one is on a timestamp column, second on an integer. And several > empty partitions hace 48kB indexes, so it seems data is just 16k for > the 860 ranges. That could be about 20 bytes/range which more or less > fits to a couple of values. > > In my experience, BRIN are ridiculously small. I use them on that > particular table because both cuando and raw_id correlate with > insertion order and I normally only read several megabytes ranges > indexed on them, so they work very well in limiting the scan range to > nearly what I need. > > > For instance, 80GiB table would be like 10’000’000 pages, so the default > BRIN would take about 78’000 pages (625MiB), so the min/max scan would read > 626 MiB > > If pages per range is increased to ~3162, then index size would be ~3162 > pages (25MiB), and each index entry would cover 25MiB range. Then the query > would have to read ~50MiB to fetch min/max. It is not clear if that is > really practical though. > > If you assume your index fits 64 tuples per page your index read drops > to about 10Mb, plus the 1Mb range. > > Also, I suspect you will have to read all unsummarized ranges ( > probably before the summarized ones, as unsummarized can discard > summarizeds, but not the other way ). > > Francisco Olarte. > -- Andrey Klochkov