Masahiko Sawada wrote on 2021-07-27 07:06:
On Mon, Jul 26, 2021 at 11:01 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:

I'll experiment with the proposed ideas including this idea in more
scenarios and share the results tomorrow.


I've done some benchmarks of the proposed data structures. This trial
adds scenarios where dead tuples are concentrated in a particular
range of table blocks (tests 5-8), in addition to the scenarios from
the previous trial. I've also run each scenario at two table sizes: in
the first run the maximum block number of the table is 1,000,000
(i.e., an 8GB table), and in the second run it's 10,000,000 (an 80GB
table), so we can see how performance and memory consumption change
with a large-scale table. In the tables below, 'size' is memory
consumption, 'attach' and 'shuffled' are the execution times for
loading the dead tuples and for lookups in shuffled order, and the
_x10 columns are the same measurements for the 10x larger table.
Here are the results:

* Test 1
select prepare(
1000000, -- max block
10, -- # of dead tuples per page
1, -- dead tuples interval within a page
1, -- # of consecutive pages having dead tuples
20 -- page interval
);
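
To make the parameters concrete, here is my reading of how such a
generator would enumerate the dead TIDs; a hypothetical sketch, not
the actual test harness (record_dead_tuple() is a made-up stand-in
for loading one TID into the structure under test):

  /*
   * Hypothetical sketch of how prepare()'s parameters map to dead
   * TIDs; record_dead_tuple() stands in for loading one TID into
   * the structure under test.
   */
  static void
  generate_dead_tuples(BlockNumber max_block, int ntuples_per_page,
                       int tuple_interval, BlockNumber consecutive_pages,
                       BlockNumber page_interval)
  {
      for (BlockNumber blkno = 0; blkno < max_block; blkno += page_interval)
      {
          for (BlockNumber i = 0; i < consecutive_pages && blkno + i < max_block; i++)
          {
              OffsetNumber offno = FirstOffsetNumber;

              for (int n = 0; n < ntuples_per_page; n++, offno += tuple_interval)
                  record_dead_tuple(blkno + i, offno);
          }
      }
  }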

 name   | size     | attach | shuffled | size_x10  | attach_x10 | shuffled_x10
--------+----------+--------+----------+-----------+------------+--------------
 array  | 57.23 MB |  0.040 |   98.613 | 572.21 MB |      0.387 |     1521.981
 intset | 46.88 MB |  0.114 |   75.944 | 468.67 MB |      0.961 |      997.760
 radix  | 40.26 MB |  0.102 |   18.427 | 336.64 MB |      0.797 |      266.146
 rtbm   | 64.02 MB |  0.234 |   22.443 | 512.02 MB |      2.230 |      275.143
 svtm   | 27.28 MB |  0.060 |   13.568 | 274.07 MB |      0.476 |      211.073
 tbm    | 96.01 MB |  0.273 |   10.347 | 768.01 MB |      2.882 |      128.103

* Test 2
select prepare(
1000000, -- max block
10, -- # of dead tuples per page
1, -- dead tuples interval within a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

 name   | size     | attach | shuffled | size_x10  | attach_x10 | shuffled_x10
--------+----------+--------+----------+-----------+------------+--------------
 array  | 57.23 MB |  0.041 |    4.757 | 572.21 MB |      0.344 |       71.228
 intset | 46.88 MB |  0.127 |    3.762 | 468.67 MB |      1.093 |       49.573
 radix  |  9.95 MB |  0.048 |    0.679 |  82.57 MB |      0.371 |       16.211
 rtbm   | 34.02 MB |  0.179 |    0.534 | 288.02 MB |      2.092 |        8.693
 svtm   |  5.78 MB |  0.043 |    0.239 |  54.60 MB |      0.342 |        7.759
 tbm    | 96.01 MB |  0.274 |    0.521 | 768.01 MB |      2.685 |        6.360

* Test 3
select prepare(
1000000, -- max block
2, -- # of dead tuples per page
100, -- dead tuples interval within a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

 name   | size     | attach | shuffled | size_x10  | attach_x10 | shuffled_x10
--------+----------+--------+----------+-----------+------------+--------------
 array  | 11.45 MB |  0.009 |   57.698 | 114.45 MB |      0.076 |     1045.639
 intset | 15.63 MB |  0.031 |   46.083 | 156.23 MB |      0.243 |      848.525
 radix  | 40.26 MB |  0.063 |   13.755 | 336.64 MB |      0.501 |      223.413
 rtbm   | 36.02 MB |  0.123 |   11.527 | 320.02 MB |      1.843 |      180.977
 svtm   |  9.28 MB |  0.053 |    9.631 |  92.59 MB |      0.438 |      212.626
 tbm    | 96.01 MB |  0.228 |   10.381 | 768.01 MB |      2.258 |      126.630

* Test 4
select prepare(
1000000, -- max block
100, -- # of dead tuples per page
1, -- dead tuples interval within a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

 name   | size      | attach | shuffled | size_x10   | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+------------+--------------
 array  | 572.21 MB |  0.367 |   78.047 | 5722.05 MB |      3.942 |     1154.776
 intset |  93.74 MB |  0.777 |   45.146 |  937.34 MB |      7.716 |      643.708
 radix  |  40.26 MB |  0.203 |    9.015 |  336.64 MB |      1.775 |      133.294
 rtbm   |  36.02 MB |  0.369 |    5.639 |  320.02 MB |      3.823 |       88.832
 svtm   |   7.28 MB |  0.294 |    3.891 |   73.60 MB |      2.690 |      103.744
 tbm    |  96.01 MB |  0.534 |    5.223 |  768.01 MB |      5.679 |       60.632


* Test 5
select prepare(
1000000, -- max block
150, -- # of dead tuples per page
1, -- dead tuples interval within a page
10000, -- # of consecutive pages having dead tuples
20000 -- page interval
);

Every 20,000 pages there is a run of 10,000 consecutive pages, each
having 150 dead tuples.

 name   | size      | attach | shuffled | size_x10   | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+------------+--------------
 array  | 429.16 MB |  0.274 |   75.664 | 4291.54 MB |      3.067 |     1259.501
 intset |  46.88 MB |  0.559 |   36.449 |  468.67 MB |      4.565 |      517.445
 radix  |  20.26 MB |  0.166 |    8.466 |  196.90 MB |      1.273 |      166.587
 rtbm   |  18.02 MB |  0.242 |    8.491 |  160.02 MB |      2.407 |      171.725
 svtm   |   3.66 MB |  0.243 |    3.635 |   37.10 MB |      2.022 |       86.165
 tbm    |  48.01 MB |  0.344 |    9.763 |  384.01 MB |      3.327 |      151.824

* Test 6
select prepare(
1000000, -- max block
10, -- # of dead tuples per page
1, -- dead tuples interval within a page
10000, -- # of consecutive pages having dead tuples
20000 -- page interval
);

Every 20,000 pages there is a run of 10,000 consecutive pages, each
having 10 dead tuples.

 name   | size     | attach | shuffled | size_x10  | attach_x10 | shuffled_x10
--------+----------+--------+----------+-----------+------------+--------------
 array  | 28.62 MB |  0.022 |    2.791 | 286.11 MB |      0.170 |       46.920
 intset | 23.45 MB |  0.061 |    2.156 | 234.34 MB |      0.501 |       32.577
 radix  |  5.04 MB |  0.026 |    0.433 |  48.57 MB |      0.191 |       11.060
 rtbm   | 17.02 MB |  0.074 |    0.533 | 144.02 MB |      0.954 |       11.502
 svtm   |  3.16 MB |  0.023 |    0.206 |  27.60 MB |      0.175 |        4.886
 tbm    | 48.01 MB |  0.132 |    0.656 | 384.01 MB |      1.284 |       10.231

* Test 7
select prepare(
1000000, -- max block
150, -- # of dead tuples per page
1, -- dead tuples interval within a page
1000, -- # of consecutive pages having dead tuples
999000 -- page interval
);

Only the first 1,000 blocks and the last 1,000 blocks of the table
have dead tuples, 150 per page.

 name   | size    | attach | shuffled | size_x10 | attach_x10 | shuffled_x10
--------+---------+--------+----------+----------+------------+--------------
 array  | 1.72 MB |  0.002 |    7.507 | 17.17 MB |      0.011 |       76.510
 intset | 0.20 MB |  0.003 |    6.742 |  1.89 MB |      0.022 |       52.122
 radix  | 0.20 MB |  0.001 |    1.023 |  1.07 MB |      0.007 |       12.023
 rtbm   | 0.15 MB |  0.001 |    2.637 |  0.65 MB |      0.009 |       34.528
 svtm   | 0.52 MB |  0.002 |    0.721 |  0.61 MB |      0.010 |        6.434
 tbm    | 0.20 MB |  0.002 |    2.733 |  1.51 MB |      0.015 |       38.538

* Test 8
select prepare(
1000000, -- max block
100, -- # of dead tuples per page
1, -- dead tuples interval within a page
50, -- # of consecutive pages having dead tuples
100 -- page interval
);

Every 100 pages there is a run of 50 consecutive pages, each having 100 dead tuples.

 name   | size      | attach | shuffled | size_x10   | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+------------+--------------
 array  | 286.11 MB |  0.184 |   67.233 | 2861.03 MB |      1.743 |      979.070
 intset |  46.88 MB |  0.389 |   35.176 |  468.67 MB |      3.698 |      505.322
 radix  |  21.82 MB |  0.116 |    6.160 |  186.86 MB |      0.891 |      117.730
 rtbm   |  18.02 MB |  0.182 |    5.909 |  160.02 MB |      1.870 |      112.550
 svtm   |   4.28 MB |  0.152 |    3.213 |   37.60 MB |      1.383 |       79.073
 tbm    |  48.01 MB |  0.265 |    6.673 |  384.01 MB |      2.586 |      101.327

Overall, 'svtm' is faster and consumes less memory. The 'radix' tree
also shows good performance and memory usage.

From these results, svtm is the best of the proposed data structures
for the dead tuple storage used during lazy vacuum, in terms of both
performance and memory usage. I think it can also support iteration,
by extracting the offsets of the dead tuples for each block while
iterating over the chunks.
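
For example, such an iteration interface could look like the rough
sketch below; every name in it is hypothetical (svtm has no such API
yet):

  /* Rough sketch of a per-block iteration API over svtm; all names
   * here are hypothetical. */
  SVTMIter   *iter = svtm_begin_iterate(svtm);
  BlockNumber blkno;
  OffsetNumber offsets[MaxHeapTuplesPerPage];
  int         noffsets;

  while (svtm_iterate_next(iter, &blkno, offsets, &noffsets))
  {
      /* offsets[0 .. noffsets - 1] are the dead tuple offsets on blkno */
      process_block(blkno, offsets, noffsets);  /* hypothetical consumer */
  }
  svtm_end_iterate(iter);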

Apart from performance and memory usage, we also need to consider the
reusability of the code. When I started this thread, I thought the
best data structure would be one optimized specifically for vacuum's
dead tuple storage. However, a data structure that is also usable in
general could serve other purposes as well. Moreover, if it's too
tightly optimized for the current TID system (32-bit block number,
16-bit offset number, the maximum block/offset numbers, etc.), it may
become a blocker for future changes.

In that sense, the radix tree also seems good, since it could be used
in GiST vacuum as a replacement for intset, or as a replacement for
the shared-buffer hash table, as discussed before. Are there any
other use cases? On the other hand, I'm concerned that a radix tree
would be over-engineered as vacuum's dead tuple storage, since that
storage is static data and requires only lookup operations. So if we
want to use a radix tree as the dead tuple storage, I'd like to see
further use cases.

I can certainly evolve svtm into a transparent intset replacement.
Using the same trick as radix_to_key, it will store TIDs efficiently:

  /* Pack a TID into one integer: offset number in the low bits, block
   * number above it (cast to uint64 so the shift can't overflow 32 bits). */
  shift = pg_ceil_log2_32(MaxHeapTuplesPerPage);
  tid_i = ItemPointerGetOffsetNumber(tid);
  tid_i |= (uint64) ItemPointerGetBlockNumber(tid) << shift;
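
For completeness, decoding such a key back into a TID goes the other
way; a minimal sketch of my own, not part of any patch:

  BlockNumber  blkno = (BlockNumber) (tid_i >> shift);
  OffsetNumber offno = (OffsetNumber) (tid_i & (((uint64) 1 << shift) - 1));

  ItemPointerSet(&tid, blkno, offno);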

I will do that this evening.

regards
Yura Sokolov aka funny_falcon

