Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
On Tue, Aug 6, 2024 at 5:42 PM Matthias van de Meent wrote: > Attached is version 16 now. I ran this with my old land registry benchmark, used for validating the space utilization impact of nbtree deduplication (among other things). This isn't obviously the best benchmark for this sort of thing, but I seem to recall that you'd used it yourself at some point. To validate work in this area, likely including this patch. So I decided to start there. To be clear, this test involves bulk loading of an unlogged table (the land registry table). The following composite index is created on the table before we insert any rows, so most of the cycles here are in index maintenance including _bt_search descents: CREATE INDEX composite ON land2 USING btree (county COLLATE "C", city COLLATE "C", locality COLLATE "C"); I wasn't able to see much of an improvement with this patch applied. It went from ~00:51.598 to ~00:51.053. That's a little disappointing, given that this is supposed to be a sympathetic case for the patch. Can you suggest something else? (Granted, I understand that this patch has some complicated relationship with other patches of yours, which I don't understand currently.) I'm a bit worried about side-effects for this assertion: @@ -485,7 +489,7 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel, Assert(insertstate->bounds_valid); Assert(insertstate->low >= P_FIRSTDATAKEY(opaque)); Assert(insertstate->low <= insertstate->stricthigh); - Assert(_bt_compare(rel, itup_key, page, offset) < 0); + Assert(_bt_compare(rel, itup_key, page, offset, &sprefix) < 0); break; } More generally, it's not entirely clear how the code in _bt_check_unique is supposed to work with the patch. Why should it be safe to do what you're doing with the prefix there? It's not like we're doing a binary search here -- it's more like a linear search. > - Move from 1-indexed AttributeNumber to 0-indexed ints for prefixes, > and use "prefix" as naming scheme, rather than cmpcol. A lack of > prefix, previously indicated with a cmpcol value of 1, is now a prefix > value of 0. Found a likely-related bug in the changes you made to amcheck, which I was able to fix myself like so: diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c index c7dc6725a..15be61777 100644 --- a/contrib/amcheck/verify_nbtree.c +++ b/contrib/amcheck/verify_nbtree.c @@ -3187,7 +3187,7 @@ bt_rootdescend(BtreeCheckState *state, IndexTuple itup) insertstate.buf = lbuf; /* Get matching tuple on leaf page */ -offnum = _bt_binsrch_insert(state->rel, &insertstate, 1); +offnum = _bt_binsrch_insert(state->rel, &insertstate, 0); /* Compare first >= matching item on leaf page, if any */ -- Peter Geoghegan
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
On Fri, 1 Mar 2024 at 14:48, Matthias van de Meent wrote: > Attached is version 15 of this patch, with the above issues fixed. > It's also rebased on top of 655dc310 of this morning, so that should > keep good for some time again. Attached is version 16 now. Relevant changes from previous patch versions: - Move from 1-indexed AttributeNumber to 0-indexed ints for prefixes, and use "prefix" as naming scheme, rather than cmpcol. A lack of prefix, previously indicated with a cmpcol value of 1, is now a prefix value of 0. - Adjusted README - Improved the efficiency of the insertion path in some cases where we've previously compared the page's highkey. As always, why we need this: Currently, btrees are quite inefficient when they have many key attributes but low attribute cardinality in the prefix, e.g. an index on ("", "", "", uuid). This is not just inefficient use of disk space with the high repetition of duplicate prefix values in pages, but it is also a computational overhead when we're descending the tree in e.g. _bt_first() or btinsert(): The code we use to search a page currently compares the full key to the full searchkey, for a complexity of O(n_equal_attributes + 1) for every tuple on the page, for O(log(page_ntups) * (n_equal_attributes + 1)) attribute compares every page during descent. This patch fixes that part of the computational complexity by applying dynamic prefix compression, thus reducing the average computational complexity in random index lookups to O(3 * (n_equal_attributes) + log(page_ntups)) per page (assuming at least 4 non-hikey tuples on each page). In practice, this makes indexes with 3+ attributes and prefixes with low selectivity (such as the example above) much more viable computationally, as we have to spend much less effort on comparing known index attributes during descent. Note that this does _not_ reuse prefix bounds across pages - it re-establishes the left- and right prefixes every page during the binary search. See the README modified in the patch for specific implementation details and considerations. This patch synergizes with the highkey optimization used in [0]: When combined, the number of attribute compare function calls could be further reduced to O(2 * (n_equal_atts) + log(page_ntups)), a reduction by n_equal_atts every page, which in certain wide index types could be over 25% of all attribute compare function calls on the page after dynamic prefix truncation. However, both are separately useful and reduce the amount of work done on most pages. Kind regards, Matthias van de Meent Neon (https://neon.tech) [0] https://www.postgresql.org/message-id/flat/CAEze2WijWhCQy_nZVP4Ye5Hwj=YW=3rqv+hbmjgcohtryqm...@mail.gmail.com v16-0001-btree-Implement-dynamic-prefix-truncation.patch Description: Binary data
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
On Wed, 24 Jan 2024 at 13:02, Matthias van de Meent wrote: > > 1. > > Commit message refers to a non-existing reference '(see [0])'. > > Noted, I'll update that. > > > 2. > > +When we do a binary search on a sorted set (such as a BTree), we know that > > a > > +tuple will be smaller than its left neighbour, and larger than its right > > +neighbour. > > > > I think this should be 'larger than left neighbour and smaller than > > right neighbour' instead of the other way around. > > Noted, will be fixed, too. Attached is version 15 of this patch, with the above issues fixed. It's also rebased on top of 655dc310 of this morning, so that should keep good for some time again. Kind regards, Matthias van de Meent Neon (https://neon.tech) v15-0001-btree-Implement-dynamic-prefix-compression.patch Description: Binary data
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
On Fri, 19 Jan 2024 at 05:55, Dilip Kumar wrote: > > On Wed, Nov 1, 2023 at 2:42 AM Matthias van de Meent > wrote: > > > > Hi, > > > > Currently, nbtree code compares each and every column of an index > > tuple during the binary search on the index page. With large indexes > > that have many duplicate prefix column values (e.g. an index on (bool, > > bool, uuid) ) that means a lot of wasted time getting to the right > > column. > > > > The attached patch improves on that by doing per-page dynamic prefix > > truncation: If we know that on both the right and left side there are > > index tuples where the first two attributes are equal to the scan key, > > we skip comparing those attributes at the current index tuple and > > start with comparing attribute 3, saving two attribute compares. We > > gain performance whenever comparing prefixing attributes is expensive > > and when there are many tuples with a shared prefix - in unique > > indexes this doesn't gain much, but we also don't lose much in this > > case. > > > > This patch was originally suggested at [0], but it was mentioned that > > they could be pulled out into it's own thread. Earlier, the > > performance gains were not clearly there for just this patch, but > > after further benchmarking this patch stands on its own for > > performance: it sees no obvious degradation of performance, while > > gaining 0-5% across various normal indexes on the cc-complete sample > > dataset, with the current worst-case index shape getting a 60%+ > > improved performance on INSERTs in the tests at [0]. > > +1 for the idea, I have some initial comments while reading through the patch. Thank you for the review. > 1. > Commit message refers to a non-existing reference '(see [0])'. Noted, I'll update that. > 2. > +When we do a binary search on a sorted set (such as a BTree), we know that a > +tuple will be smaller than its left neighbour, and larger than its right > +neighbour. > > I think this should be 'larger than left neighbour and smaller than > right neighbour' instead of the other way around. Noted, will be fixed, too. > 3. > +With the above optimizations, dynamic prefix truncation improves the worst > +case complexity of indexing from O(tree_height * natts * log(tups_per_page)) > +to O(tree_height * (3*natts + log(tups_per_page))) > > Where do the 3*natts come from? Is it related to setting up the > dynamic prefix at each level? Yes: We need to establish prefixes for both a tuple that's ahead of the to-be-compared tuple, and one that's after the to-be-compared tuple. Assuming homogenous random distribution of scan key accesses across the page (not always the case, but IMO a reasonable starting point) this would average to 3 unprefixed compares before you have established both a higher and a lower prefix. > 4. > + /* > + * All tuple attributes are equal to the scan key, only later attributes > + * could potentially not equal the scan key. > + */ > + *compareattr = ntupatts + 1; > > Can you elaborate on this more? If all tuple attributes are equal to > the scan key then what do those 'later attributes' mean? In inner pages, tuples may not have all key attributes, as some may have been truncated away in page splits. So, tuples that have at least the same prefix as this (potentially truncated) tuple will need to be compared starting at the first missing attribute of this tuple, i.e. ntupatts + 1. Kind regards, Matthias van de Meent
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
On Wed, Nov 1, 2023 at 2:42 AM Matthias van de Meent wrote: > > Hi, > > Currently, nbtree code compares each and every column of an index > tuple during the binary search on the index page. With large indexes > that have many duplicate prefix column values (e.g. an index on (bool, > bool, uuid) ) that means a lot of wasted time getting to the right > column. > > The attached patch improves on that by doing per-page dynamic prefix > truncation: If we know that on both the right and left side there are > index tuples where the first two attributes are equal to the scan key, > we skip comparing those attributes at the current index tuple and > start with comparing attribute 3, saving two attribute compares. We > gain performance whenever comparing prefixing attributes is expensive > and when there are many tuples with a shared prefix - in unique > indexes this doesn't gain much, but we also don't lose much in this > case. > > This patch was originally suggested at [0], but it was mentioned that > they could be pulled out into it's own thread. Earlier, the > performance gains were not clearly there for just this patch, but > after further benchmarking this patch stands on its own for > performance: it sees no obvious degradation of performance, while > gaining 0-5% across various normal indexes on the cc-complete sample > dataset, with the current worst-case index shape getting a 60%+ > improved performance on INSERTs in the tests at [0]. +1 for the idea, I have some initial comments while reading through the patch. 1. Commit message refers to a non-existing reference '(see [0])'. 2. +When we do a binary search on a sorted set (such as a BTree), we know that a +tuple will be smaller than its left neighbour, and larger than its right +neighbour. I think this should be 'larger than left neighbour and smaller than right neighbour' instead of the other way around. 3. +With the above optimizations, dynamic prefix truncation improves the worst +case complexity of indexing from O(tree_height * natts * log(tups_per_page)) +to O(tree_height * (3*natts + log(tups_per_page))) Where do the 3*natts come from? Is it related to setting up the dynamic prefix at each level? 4. + /* + * All tuple attributes are equal to the scan key, only later attributes + * could potentially not equal the scan key. + */ + *compareattr = ntupatts + 1; Can you elaborate on this more? If all tuple attributes are equal to the scan key then what do those 'later attributes' mean? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
st 1. 11. 2023 v 11:32 odesílatel Matthias van de Meent < boekewurm+postg...@gmail.com> napsal: > On Wed, 1 Nov 2023 at 07:47, Pavel Stehule > wrote: > > > > Hi > > > > út 31. 10. 2023 v 22:12 odesílatel Matthias van de Meent < > boekewurm+postg...@gmail.com> napsal: > >> This patch was originally suggested at [0], but it was mentioned that > >> they could be pulled out into it's own thread. Earlier, the > >> performance gains were not clearly there for just this patch, but > >> after further benchmarking this patch stands on its own for > >> performance: it sees no obvious degradation of performance, while > >> gaining 0-5% across various normal indexes on the cc-complete sample > >> dataset, with the current worst-case index shape getting a 60%+ > >> improved performance on INSERTs in the tests at [0]. > > > > > > +1 > > Thanks for showing interest. > > > This can be nice functionality. I had a customer with a very slow index > scan - the main problem was a long common prefix like prg010203. > > I'll have to note that this patch doesn't cover cases where e.g. text > attributes have large shared prefixes, but are still unique: the > dynamic prefix compression in this patch is only implemented at the > tuple attribute level; it doesn't implement type aware dynamic prefix > compression inside the attributes. So, a unique index on a column of > int32 formatted like '%0100i' would not materially benefit from this > patch. > > While would certainly be possible to add some type-level prefix > truncation in the framework of this patch, adding that would require > significant code churn in btree compare operators, because we'd need > an additional return argument to contain a numerical "shared prefix", > and that is not something I was planning to implement in this patch. > Thanks for the explanation. Pavel > Kind regards, > > Matthias van de Meent > Neon (https://neon.tech) >
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
On Wed, 1 Nov 2023 at 07:47, Pavel Stehule wrote: > > Hi > > út 31. 10. 2023 v 22:12 odesílatel Matthias van de Meent > napsal: >> This patch was originally suggested at [0], but it was mentioned that >> they could be pulled out into it's own thread. Earlier, the >> performance gains were not clearly there for just this patch, but >> after further benchmarking this patch stands on its own for >> performance: it sees no obvious degradation of performance, while >> gaining 0-5% across various normal indexes on the cc-complete sample >> dataset, with the current worst-case index shape getting a 60%+ >> improved performance on INSERTs in the tests at [0]. > > > +1 Thanks for showing interest. > This can be nice functionality. I had a customer with a very slow index scan > - the main problem was a long common prefix like prg010203. I'll have to note that this patch doesn't cover cases where e.g. text attributes have large shared prefixes, but are still unique: the dynamic prefix compression in this patch is only implemented at the tuple attribute level; it doesn't implement type aware dynamic prefix compression inside the attributes. So, a unique index on a column of int32 formatted like '%0100i' would not materially benefit from this patch. While would certainly be possible to add some type-level prefix truncation in the framework of this patch, adding that would require significant code churn in btree compare operators, because we'd need an additional return argument to contain a numerical "shared prefix", and that is not something I was planning to implement in this patch. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
Hi út 31. 10. 2023 v 22:12 odesílatel Matthias van de Meent < boekewurm+postg...@gmail.com> napsal: > Hi, > > Currently, nbtree code compares each and every column of an index > tuple during the binary search on the index page. With large indexes > that have many duplicate prefix column values (e.g. an index on (bool, > bool, uuid) ) that means a lot of wasted time getting to the right > column. > > The attached patch improves on that by doing per-page dynamic prefix > truncation: If we know that on both the right and left side there are > index tuples where the first two attributes are equal to the scan key, > we skip comparing those attributes at the current index tuple and > start with comparing attribute 3, saving two attribute compares. We > gain performance whenever comparing prefixing attributes is expensive > and when there are many tuples with a shared prefix - in unique > indexes this doesn't gain much, but we also don't lose much in this > case. > > This patch was originally suggested at [0], but it was mentioned that > they could be pulled out into it's own thread. Earlier, the > performance gains were not clearly there for just this patch, but > after further benchmarking this patch stands on its own for > performance: it sees no obvious degradation of performance, while > gaining 0-5% across various normal indexes on the cc-complete sample > dataset, with the current worst-case index shape getting a 60%+ > improved performance on INSERTs in the tests at [0]. > +1 This can be nice functionality. I had a customer with a very slow index scan - the main problem was a long common prefix like prg010203. Regards Pavel > Kind regards, > > Matthias van de Meent > Neon (https://neon.tech) > > PS. Best served with the downlink right separator/HIKEY optimization > (separate patch to be submitted soon), and specialization over at [0]. > > [0] > https://www.postgresql.org/message-id/CAEze2WiqOONRQTUT1p_ZV19nyMA69UU2s0e2dp+jSBM=j8s...@mail.gmail.com >
btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
Hi, Currently, nbtree code compares each and every column of an index tuple during the binary search on the index page. With large indexes that have many duplicate prefix column values (e.g. an index on (bool, bool, uuid) ) that means a lot of wasted time getting to the right column. The attached patch improves on that by doing per-page dynamic prefix truncation: If we know that on both the right and left side there are index tuples where the first two attributes are equal to the scan key, we skip comparing those attributes at the current index tuple and start with comparing attribute 3, saving two attribute compares. We gain performance whenever comparing prefixing attributes is expensive and when there are many tuples with a shared prefix - in unique indexes this doesn't gain much, but we also don't lose much in this case. This patch was originally suggested at [0], but it was mentioned that they could be pulled out into it's own thread. Earlier, the performance gains were not clearly there for just this patch, but after further benchmarking this patch stands on its own for performance: it sees no obvious degradation of performance, while gaining 0-5% across various normal indexes on the cc-complete sample dataset, with the current worst-case index shape getting a 60%+ improved performance on INSERTs in the tests at [0]. Kind regards, Matthias van de Meent Neon (https://neon.tech) PS. Best served with the downlink right separator/HIKEY optimization (separate patch to be submitted soon), and specialization over at [0]. [0] https://www.postgresql.org/message-id/CAEze2WiqOONRQTUT1p_ZV19nyMA69UU2s0e2dp+jSBM=j8s...@mail.gmail.com v14-0001-btree-Implement-dynamic-prefix-compression.patch Description: Binary data