On Thu, Jul 4, 2019 at 10:38 AM Peter Geoghegan <p...@bowt.ie> wrote:
> I tried this on my own "UK land registry" test data [1], which was
> originally used for the v12 nbtree work. My test case has a low
> cardinality, multi-column text index. I chose this test case because
> it was convenient for me.
>
> On v12/master, the index is 1100MB. Whereas with your patch, it ends
> up being 196MB -- over 5.5x smaller!

I also see a huge and consistent space saving for TPC-H. All 9 indexes are significantly smaller. The lineitem orderkey index is "just" 1/3 smaller, which is the smallest improvement among TPC-H indexes in my index bloat test case. The two largest indexes after the initial bulk load are *much* smaller: the lineitem parts supplier index is ~2.7x smaller, while the lineitem ship date index is a massive ~4.2x smaller. Also, the orders customer key index is ~2.8x smaller, and the order date index is ~2.43x smaller. Note that the test involved retail insertions, not CREATE INDEX.

I haven't seen any regression in the size of any index so far, including when the number of internal pages is all that we measure. Actually, there seem to be cases where there is a noticeably larger reduction in internal pages than in leaf pages, probably because of interactions with suffix truncation.

This result is very impressive. We'll need to revisit what the right trade-off is for the compression scheme, which Heikki had some thoughts on when we left off 3 years ago, but that should be a lot easier now. I am very encouraged by the fact that this relatively simple approach already works quite nicely.

It's also great to see that bulk insertions with lots of compression are very clearly faster with this latest revision of your patch, unlike earlier versions from 2016 that made those cases slower (though I haven't tested indexes that don't really use compression). I think that this is because you now do the compression lazily, at the point where it looks like we may need to split the page. Previous versions of the patch had to perform compression eagerly, just like GIN, which is not really appropriate for nbtree.

--
Peter Geoghegan