Masahiko Sawada wrote on 2021-07-27 07:06:
On Mon, Jul 26, 2021 at 11:01 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:

I'll experiment with the proposed ideas including this idea in more
scenarios and share the results tomorrow.


I've done some benchmarks of the proposed data structures. This trial
adds scenarios where dead tuples are concentrated in a particular
range of table blocks (tests 5-8), in addition to the scenarios from
the previous trial. I've also run each scenario at two table sizes: in
the first run the maximum block number of the table is 1,000,000
(i.e., an 8GB table), and in the second run it's 10,000,000 (an 80GB
table), so we can see how performance and memory consumption change
with a large-scale table. In the tables below, 'size' is memory
consumption, 'attach' and 'shuffled' are the execution times for
loading the dead tuples and for lookups in shuffled order, and the
_x10 columns are the same measurements for the 10x larger table.
Here are the results:

* Test 1
select prepare(
1000000, -- max block
10, -- # of dead tuples per page
1, -- dead tuples interval within a page
1, -- # of consecutive pages having dead tuples
20 -- page interval
);
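
To make the parameters concrete, here is my reading of how such a
generator would enumerate the dead TIDs; a hypothetical sketch, not
the actual test harness (record_dead_tuple() is a made-up stand-in
for loading one TID into the structure under test):

  /*
   * Hypothetical sketch of how prepare()'s parameters map to dead
   * TIDs; record_dead_tuple() stands in for loading one TID into
   * the structure under test.
   */
  static void
  generate_dead_tuples(BlockNumber max_block, int ntuples_per_page,
                       int tuple_interval, BlockNumber consecutive_pages,
                       BlockNumber page_interval)
  {
      for (BlockNumber blkno = 0; blkno < max_block; blkno += page_interval)
      {
          for (BlockNumber i = 0; i < consecutive_pages && blkno + i < max_block; i++)
          {
              OffsetNumber offno = FirstOffsetNumber;

              for (int n = 0; n < ntuples_per_page; n++, offno += tuple_interval)
                  record_dead_tuple(blkno + i, offno);
          }
      }
  }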

 name   | size     | attach | shuffled | size_x10  | attach_x10 | shuffled_x10
--------+----------+--------+----------+-----------+------------+--------------
 array  | 57.23 MB |  0.040 |   98.613 | 572.21 MB |      0.387 |     1521.981
 intset | 46.88 MB |  0.114 |   75.944 | 468.67 MB |      0.961 |      997.760
 radix  | 40.26 MB |  0.102 |   18.427 | 336.64 MB |      0.797 |      266.146
 rtbm   | 64.02 MB |  0.234 |   22.443 | 512.02 MB |      2.230 |      275.143
 svtm   | 27.28 MB |  0.060 |   13.568 | 274.07 MB |      0.476 |      211.073
 tbm    | 96.01 MB |  0.273 |   10.347 | 768.01 MB |      2.882 |      128.103

* Test 2
select prepare(
1000000, -- max block
10, -- # of dead tuples per page
1, -- dead tuples interval within a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

 name   | size     | attach | shuffled | size_x10  | attach_x10 | shuffled_x10
--------+----------+--------+----------+-----------+------------+--------------
 array  | 57.23 MB |  0.041 |    4.757 | 572.21 MB |      0.344 |       71.228
 intset | 46.88 MB |  0.127 |    3.762 | 468.67 MB |      1.093 |       49.573
 radix  |  9.95 MB |  0.048 |    0.679 |  82.57 MB |      0.371 |       16.211
 rtbm   | 34.02 MB |  0.179 |    0.534 | 288.02 MB |      2.092 |        8.693
 svtm   |  5.78 MB |  0.043 |    0.239 |  54.60 MB |      0.342 |        7.759
 tbm    | 96.01 MB |  0.274 |    0.521 | 768.01 MB |      2.685 |        6.360

* Test 3
select prepare(
1000000, -- max block
2, -- # of dead tuples per page
100, -- dead tuples interval within a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

 name   | size     | attach | shuffled | size_x10  | attach_x10 | shuffled_x10
--------+----------+--------+----------+-----------+------------+--------------
 array  | 11.45 MB |  0.009 |   57.698 | 114.45 MB |      0.076 |     1045.639
 intset | 15.63 MB |  0.031 |   46.083 | 156.23 MB |      0.243 |      848.525
 radix  | 40.26 MB |  0.063 |   13.755 | 336.64 MB |      0.501 |      223.413
 rtbm   | 36.02 MB |  0.123 |   11.527 | 320.02 MB |      1.843 |      180.977
 svtm   |  9.28 MB |  0.053 |    9.631 |  92.59 MB |      0.438 |      212.626
 tbm    | 96.01 MB |  0.228 |   10.381 | 768.01 MB |      2.258 |      126.630

* Test 4
select prepare(
1000000, -- max block
100, -- # of dead tuples per page
1, -- dead tuples interval within a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

 name   | size      | attach | shuffled | size_x10   | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+------------+--------------
 array  | 572.21 MB |  0.367 |   78.047 | 5722.05 MB |      3.942 |     1154.776
 intset |  93.74 MB |  0.777 |   45.146 |  937.34 MB |      7.716 |      643.708
 radix  |  40.26 MB |  0.203 |    9.015 |  336.64 MB |      1.775 |      133.294
 rtbm   |  36.02 MB |  0.369 |    5.639 |  320.02 MB |      3.823 |       88.832
 svtm   |   7.28 MB |  0.294 |    3.891 |   73.60 MB |      2.690 |      103.744
 tbm    |  96.01 MB |  0.534 |    5.223 |  768.01 MB |      5.679 |       60.632


* Test 5
select prepare(
1000000, -- max block
150, -- # of dead tuples per page
1, -- dead tuples interval within a page
10000, -- # of consecutive pages having dead tuples
20000 -- page interval
);

Every 20,000 pages there is a run of 10,000 consecutive pages, each
having 150 dead tuples.

 name   | size      | attach | shuffled | size_x10   | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+------------+--------------
 array  | 429.16 MB |  0.274 |   75.664 | 4291.54 MB |      3.067 |     1259.501
 intset |  46.88 MB |  0.559 |   36.449 |  468.67 MB |      4.565 |      517.445
 radix  |  20.26 MB |  0.166 |    8.466 |  196.90 MB |      1.273 |      166.587
 rtbm   |  18.02 MB |  0.242 |    8.491 |  160.02 MB |      2.407 |      171.725
 svtm   |   3.66 MB |  0.243 |    3.635 |   37.10 MB |      2.022 |       86.165
 tbm    |  48.01 MB |  0.344 |    9.763 |  384.01 MB |      3.327 |      151.824

* Test 6
select prepare(
1000000, -- max block
10, -- # of dead tuples per page
1, -- dead tuples interval within a page
10000, -- # of consecutive pages having dead tuples
20000 -- page interval
);

Every 20,000 pages there is a run of 10,000 consecutive pages, each
having 10 dead tuples.

 name   | size     | attach | shuffled | size_x10  | attach_x10 | shuffled_x10
--------+----------+--------+----------+-----------+------------+--------------
 array  | 28.62 MB |  0.022 |    2.791 | 286.11 MB |      0.170 |       46.920
 intset | 23.45 MB |  0.061 |    2.156 | 234.34 MB |      0.501 |       32.577
 radix  |  5.04 MB |  0.026 |    0.433 |  48.57 MB |      0.191 |       11.060
 rtbm   | 17.02 MB |  0.074 |    0.533 | 144.02 MB |      0.954 |       11.502
 svtm   |  3.16 MB |  0.023 |    0.206 |  27.60 MB |      0.175 |        4.886
 tbm    | 48.01 MB |  0.132 |    0.656 | 384.01 MB |      1.284 |       10.231

* Test 7
select prepare(
1000000, -- max block
150, -- # of dead tuples per page
1, -- dead tuples interval within a page
1000, -- # of consecutive pages having dead tuples
999000 -- page interval
);

Only the first 1,000 blocks and the last 1,000 blocks of the table
have dead tuples, 150 per page.

 name   | size    | attach | shuffled | size_x10 | attach_x10 | shuffled_x10
--------+---------+--------+----------+----------+------------+--------------
 array  | 1.72 MB |  0.002 |    7.507 | 17.17 MB |      0.011 |       76.510
 intset | 0.20 MB |  0.003 |    6.742 |  1.89 MB |      0.022 |       52.122
 radix  | 0.20 MB |  0.001 |    1.023 |  1.07 MB |      0.007 |       12.023
 rtbm   | 0.15 MB |  0.001 |    2.637 |  0.65 MB |      0.009 |       34.528
 svtm   | 0.52 MB |  0.002 |    0.721 |  0.61 MB |      0.010 |        6.434
 tbm    | 0.20 MB |  0.002 |    2.733 |  1.51 MB |      0.015 |       38.538

* Test 8
select prepare(
1000000, -- max block
100, -- # of dead tuples per page
1, -- dead tuples interval within a page
50, -- # of consecutive pages having dead tuples
100 -- page interval
);

Every 100 pages there is a run of 50 consecutive pages, each having 100 dead tuples.

 name   | size      | attach | shuffled | size_x10   | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+------------+--------------
 array  | 286.11 MB |  0.184 |   67.233 | 2861.03 MB |      1.743 |      979.070
 intset |  46.88 MB |  0.389 |   35.176 |  468.67 MB |      3.698 |      505.322
 radix  |  21.82 MB |  0.116 |    6.160 |  186.86 MB |      0.891 |      117.730
 rtbm   |  18.02 MB |  0.182 |    5.909 |  160.02 MB |      1.870 |      112.550
 svtm   |   4.28 MB |  0.152 |    3.213 |   37.60 MB |      1.383 |       79.073
 tbm    |  48.01 MB |  0.265 |    6.673 |  384.01 MB |      2.586 |      101.327

Overall, 'svtm' is faster and consumes less memory. The 'radix' tree
also shows good performance and memory usage.

From these results, svtm is the best of the proposed data structures
for the dead tuple storage used during lazy vacuum, in terms of both
performance and memory usage. I think it can also support iteration,
by extracting the offsets of the dead tuples for each block while
iterating over the chunks.
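
For example, such an iteration interface could look like the rough
sketch below; every name in it is hypothetical (svtm has no such API
yet):

  /* Rough sketch of a per-block iteration API over svtm; all names
   * here are hypothetical. */
  SVTMIter   *iter = svtm_begin_iterate(svtm);
  BlockNumber blkno;
  OffsetNumber offsets[MaxHeapTuplesPerPage];
  int         noffsets;

  while (svtm_iterate_next(iter, &blkno, offsets, &noffsets))
  {
      /* offsets[0 .. noffsets - 1] are the dead tuple offsets on blkno */
      process_block(blkno, offsets, noffsets);  /* hypothetical consumer */
  }
  svtm_end_iterate(iter);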

Apart from performance and memory usage, we also need to consider the
reusability of the code. When I started this thread, I thought the
best data structure would be one optimized specifically for vacuum's
dead tuple storage. However, a data structure that is also usable in
general could serve other purposes as well. Moreover, if it's too
tightly optimized for the current TID system (32-bit block number,
16-bit offset number, the maximum block/offset numbers, etc.), it may
become a blocker for future changes.

In that sense, the radix tree also seems good, since it could be used
in GiST vacuum as a replacement for intset, or as a replacement for
the shared-buffer hash table, as discussed before. Are there any
other use cases? On the other hand, I'm concerned that a radix tree
would be over-engineered as vacuum's dead tuple storage, since that
storage is static data and requires only lookup operations. So if we
want to use a radix tree as the dead tuple storage, I'd like to see
further use cases.

I can certainly evolve svtm into a transparent intset replacement.
Using the same trick as radix_to_key, it will store TIDs efficiently:

  /* Pack a TID into one integer: offset number in the low bits, block
   * number above it (cast to uint64 so the shift can't overflow 32 bits). */
  shift = pg_ceil_log2_32(MaxHeapTuplesPerPage);
  tid_i = ItemPointerGetOffsetNumber(tid);
  tid_i |= (uint64) ItemPointerGetBlockNumber(tid) << shift;
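
For completeness, decoding such a key back into a TID goes the other
way; a minimal sketch of my own, not part of any patch:

  BlockNumber  blkno = (BlockNumber) (tid_i >> shift);
  OffsetNumber offno = (OffsetNumber) (tid_i & (((uint64) 1 << shift) - 1));

  ItemPointerSet(&tid, blkno, offno);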

I will do that this evening.

regards
Yura Sokolov aka funny_falcon

