On 28 January 2016 at 14:06, Anastasia Lubennikova < a.lubennik...@postgrespro.ru> wrote:
> > 31.08.2015 10:41, Anastasia Lubennikova: > > Hi, hackers! > I'm going to begin work on effective storage of duplicate keys in B-tree > index. > The main idea is to implement posting lists and posting trees for B-tree > index pages as it's already done for GIN. > > In a nutshell, effective storing of duplicates in GIN is organised as > follows. > Index stores single index tuple for each unique key. That index tuple > points to posting list which contains pointers to heap tuples (TIDs). If > too many rows having the same key, multiple pages are allocated for the > TIDs and these constitute so called posting tree. > You can find wonderful detailed descriptions in gin readme > <https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README> > and articles <http://www.cybertec.at/gin-just-an-index-type/>. > It also makes possible to apply compression algorithm to posting list/tree > and significantly decrease index size. Read more in presentation (part 1) > <http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf>. > > Now new B-tree index tuple must be inserted for each table row that we > index. > It can possibly cause page split. Because of MVCC even unique index could > contain duplicates. > Storing duplicates in posting list/tree helps to avoid superfluous splits. > > > I'd like to share the progress of my work. So here is a WIP patch. > It provides effective duplicate handling using posting lists the same way > as GIN does it. > > Layout of the tuples on the page is changed in the following way: > before: > TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key, TID > (ip_blkid, ip_posid) + key > with patch: > TID (N item pointers, posting list offset) + key, TID (ip_blkid, > ip_posid), TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid) > > It seems that backward compatibility works well without any changes. But I > haven't tested it properly yet. > > Here are some test results. They are obtained by test functions > test_btbuild and test_ginbuild, which you can find in attached sql file. > i - number of distinct values in the index. So i=1 means that all rows > have the same key, and i=10000000 means that all keys are different. > The other columns contain the index size (MB). > > i B-tree Old B-tree New GIN > 1 214,234375 87,7109375 10,2109375 > 10 214,234375 87,7109375 10,71875 > 100 214,234375 87,4375 15,640625 > 1000 214,234375 86,2578125 31,296875 > 10000 214,234375 78,421875 104,3046875 > 100000 214,234375 65,359375 49,078125 > 1000000 214,234375 90,140625 106,8203125 > 10000000 214,234375 214,234375 534,0625 > You can note that the last row contains the same index sizes for B-tree, > which is quite logical - there is no compression if all the keys are > distinct. > Other cases looks really nice to me. > Next thing to say is that I haven't implemented posting list compression > yet. So there is still potential to decrease size of compressed btree. > > I'm almost sure, there are still some tiny bugs and missed functions, but > on the whole, the patch is ready for testing. > I'd like to get a feedback about the patch testing on some real datasets. > Any bug reports and suggestions are welcome. > > Here is a couple of useful queries to inspect the data inside the index > pages: > create extension pageinspect; > select * from bt_metap('idx'); > select bt.* from generate_series(1,1) as n, lateral bt_page_stats('idx', > n) as bt; > select n, bt.* from generate_series(1,1) as n, lateral > bt_page_items('idx', n) as bt; > > And at last, the list of items I'm going to complete in the near future: > 1. Add storage_parameter 'enable_compression' for btree access method > which specifies whether the index handles duplicates. default is 'off' > 2. Bring back microvacuum functionality for compressed indexes. > 3. Improve insertion speed. Insertions became significantly slower with > compressed btree, which is obviously not what we do want. > 4. Clean the code and comments, add related documentation. > This doesn't apply cleanly against current git head. Have you caught up past commit 65c5fcd35? Thom