Re: 回复: An implementation of multi-key sort
On Sun, Jul 7, 2024 at 2:32 AM Konstantin Knizhnik wrote: > If mksort really provides an advantage only when there are a lot of > duplicates (for prefix keys?), and for a small fraction of duplicates there > is even some (small) regression, > then IMHO taking into account the planner's information about the estimated > number of distinct values seems to be really important. I don't think we can rely on the planner's n_distinct estimates for this at all. That information tends to be massively unreliable when we have it at all. If we rely on it for good performance, it will be easy to find cases where it's wrong and performance is bad. -- Robert Haas EDB: http://www.enterprisedb.com
Re: 回复: An implementation of multi-key sort
BTW I forgot to report that I intended to test this on 32-bit ARM too, because that sometimes triggers "funny" behavior, but the build fails like this:

In file included from tuplesort.c:630:
mk_qsort_tuple.c: In function ‘mkqs_compare_datum_by_shortcut’:
mk_qsort_tuple.c:167:23: warning: implicit declaration of function ‘ApplySignedSortComparator’; did you mean ‘ApplyUnsignedSortComparator’? [-Wimplicit-function-declaration]
  167 |     ret = ApplySignedSortComparator(tuple1->datum1,
      |           ^
      |           ApplyUnsignedSortComparator
mk_qsort_tuple.c: In function ‘mkqs_compare_tuple’:
mk_qsort_tuple.c:376:23: warning: implicit declaration of function ‘qsort_tuple_signed_compare’; did you mean ‘qsort_tuple_unsigned_compare’? [-Wimplicit-function-declaration]
  376 |     ret = qsort_tuple_signed_compare(a, b, state);
      |           ^~
      |           qsort_tuple_unsigned_compare
/usr/bin/ld: utils/sort/tuplesort.o: in function `mkqs_compare_datum_by_shortcut':
/home/debian/postgres/src/backend/utils/sort/mk_qsort_tuple.c:167: undefined reference to `ApplySignedSortComparator'
/usr/bin/ld: /home/debian/postgres/src/backend/utils/sort/mk_qsort_tuple.c:167: undefined reference to `ApplySignedSortComparator'
/usr/bin/ld: /home/debian/postgres/src/backend/utils/sort/mk_qsort_tuple.c:167: undefined reference to `ApplySignedSortComparator'
/usr/bin/ld: utils/sort/tuplesort.o: in function `mkqs_compare_tuple':
/home/debian/postgres/src/backend/utils/sort/mk_qsort_tuple.c:376: undefined reference to `qsort_tuple_signed_compare'
/usr/bin/ld: utils/sort/tuplesort.o: in function `mkqs_compare_datum_by_shortcut':
/home/debian/postgres/src/backend/utils/sort/mk_qsort_tuple.c:167: undefined reference to `ApplySignedSortComparator'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:67: postgres] Error 1
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make: *** [GNUmakefile:11: all-src-recurse] Error 2

I haven't investigated why it fails.
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: 回复: An implementation of multi-key sort
On 7/7/24 08:32, Konstantin Knizhnik wrote: > > On 04/07/2024 3:45 pm, Yao Wang wrote: >> Generally, the benefit of mksort is mainly from duplicated values and >> sort >> keys: the more duplicated values and sort keys there are, the bigger benefit it >> gets. > ... >> 1. Use distinct stats info of table to enable mksort >> >> It's kind of a heuristic: in the optimizer, check >> Form_pg_statistic->stadistinct >> of a table via pg_statistics. Enable mksort only when it is less than a >> threshold. >> >> The hacked code works, which needs modifications to a couple of optimizer >> interfaces. In addition, a complete solution should consider types and >> distinct values of all columns, which might be too complex, and the >> benefit >> seems not so big. > > > If mksort really provides an advantage only when there are a lot of > duplicates (for prefix keys?), and for a small fraction of duplicates there > is even some (small) regression, > then IMHO taking into account the planner's information about the estimated > number of distinct values seems to be really important. What was the > problem with accessing these statistics, and why does it require modifying > optimizer interfaces? There is a `get_variable_numdistinct` function > which is defined and used only in selfuncs.c > Yeah, I've been wondering about that too. But I'm also a bit unsure whether using these known-unreliable statistics (especially with filters and multiple columns) would actually fix the regressions. > Information about value distribution seems to be quite useful for > choosing the optimal sort algorithm, not only for multi-key sort > optimization. For example, if we know the min/max values of the sort key and the range is > small, we can use an O(N) algorithm for sorting. It can also help to > estimate when TOP-N search is preferable. > This assumes the information is accurate / reliable, and I'm far from sure about that. > Right now Postgres creates a special path for incremental sort. I am not > sure whether we also need a separate path for mk-sort. 
> But IMHO if we need to change some optimizer interfaces to be able to > take statistics into account and choose the preferred sort algorithm at > planning time, then it should be done. > If mksort can speed up a sort by more than two times (for a large number of > duplicates), it would be nice to take it into account when choosing the optimal > plan. > I did commit the incremental sort patch, and TBH I'm not convinced I'd do that again. It's a great optimization when it works (and it seems to work in plenty of cases), but we've also had a number of reports about significant regressions, where the incremental sort costing is quite off. Granted, it's often about cases where we already had issues and incremental sort just "exacerbates" that (say, with LIMIT queries), but that's kinda the point I'm trying to make - stats are inherently incomplete / simplified, and some plans are more sensitive to that. Which is why I'm wondering if we might make the decision based on some information collected at runtime. > Also in this case we do not need an extra GUC for explicitly enabling > mksort. There are too many optimizer parameters, and adding one more > will make tuning more complex. So I prefer that the decision is taken by the > optimizer itself based on the available information, especially if > the criteria seem to be obvious. > The GUC is very useful for testing, so let's keep it for now. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: 回复: An implementation of multi-key sort
On 7/4/24 14:45, Yao Wang wrote: > Hi John, > > Thanks for your kind message. I talked to Heikki before getting Tomas's > response, and he said "no promise but I will take a look". That's why I > added his email. I have updated the CF entry and added Tomas as reviewer. > > Hi Tomas, > > Again, a big thank-you. The report and script are really, really > helpful. And your ideas are very valuable. > > Firstly, the expectation of mksort performance: > > 1. When mksort works well, it should be faster than qsort because it saves > the cost of comparing duplicated values every time. > 2. When all values are distinct at a particular column, the comparison > will finish immediately, and mksort will actually fall back to qsort. In that > case, mksort should be equal to or a bit slower than qsort because it needs > to maintain more complex state. > > Generally, the benefit of mksort is mainly from duplicated values and sort > keys: the more duplicated values and sort keys there are, the bigger benefit it > gets. > > Analysis of the report in your previous mail > > > 1. It seems the script uses $count to specify the duplicated values: > > number of repetitions for each value (ndistinct = nrows/count) > > However, it is not always correct. For type text, the script generates > values like this: > > expr="md5(((i / $count) + random())::text)" > > But md5() generates totally random values regardless of $count. Some cases > of timestamptz have the same problem. > > With all-distinct values, the sort will finish at the first depth and actually fall back to qsort. > You're right, thanks for noticing / fixing this. > 2. Even for the types with correct duplicate settings, the duplicate ratio > is very small: e.g. say $nrows = 1 and $count = 100, only 1% of duplicated > rows can go to depth 2, and only 0.01% of them can go to depth 3. So it still > works on nearly all distinct values. 
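The md5()/random() point above is easy to demonstrate. Here is a small Python sketch of the two expressions (illustrative only, not the actual bash script; it assumes `i / $count` in the SQL is integer division, which `i // count` mirrors):

```python
import hashlib
import random

def text_value_with_random(i, count):
    """Mirrors the script's md5(((i / $count) + random())::text):
    adding random() makes every generated value distinct, so the
    effective ndistinct equals nrows regardless of count."""
    return hashlib.md5(str(i // count + random.random()).encode()).hexdigest()

def text_value_grouped(i, count):
    """Dropping random() yields ndistinct == nrows / count as intended."""
    return hashlib.md5(str(i // count).encode()).hexdigest()
```

With nrows = 1000 and count = 100, the first expression produces ~1000 distinct values while the second produces exactly 10.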
> True, but that's why the scripts test with much larger data sets too, with more comparisons needing to look at other columns. It'd be possible to construct data sets that are likely to benefit more from mksort - I'm not against doing that, but then there's the question of which data sets are more representative of what users actually do. I'd say random data sets like the ones I used are fairly common - it's fine to not improve them, but we should not regress them. > 3. Qsort of PG17 uses a kind of specialization for the tuple comparator, i.e. it > uses specialized functions for different types, e.g. qsort_tuple_unsigned() > for unsigned int. The specialized comparators avoid all type-related checks > and are much faster than the regular comparator. That is why we saw 200% or more > regression for those cases. > OK, I'm not familiar enough with this code to have an opinion. > > Code optimizations I did for mk qsort > - > > 1. Adapted specialization for the tuple comparator. > 2. Use a kind of "hybrid" sort: when we actually adopt bubble sort due to > limited sort items, use bubble sort to check datums from the specified depth. > 3. Other optimizations such as the pre-ordered check. > > > Analysis of the new report > -- > > I also made some modifications to your script regarding the data type issues, > plus an output of the distinct value count/distinct ratio, and an indicator > for improvement/regression. I attached the new script and a report on a > data set with 100,000 rows and 2, 5, 8 columns. > OK, but I think a report for a single data set size is not sufficient to evaluate a patch like this, it can easily miss various caching effects etc. The results I shared a couple minutes ago are from 1000 to 10M rows, and it's a much more complete view. > 1. Generally, the result matches the expectation: "When mksort works well, it > should be faster than qsort; when mksort falls back to qsort, it should be equal > or a bit slower than qsort." 
The challenge is how to know in advance if mksort is likely to work well. > 2. For all values of "sequential" (except text type), mksort is a bit slower > than qsort because no actual sort is performed due to the "pre-ordered" > check. OK > 3. For int and bigint type, mksort became faster and faster when > there were more and more duplicated values and sort keys. Improvement of > the best cases is about 58% (line 333) and 57% (line 711). I find it hard to interpret the text-only report, but I suppose these are essentially the "green" patches in the PDF report I attached to my earlier message. And indeed, there are nice improvements, but only with cases with very many duplicates, and the price for that is 10-20% regressions in the other cases. That does not seem like a great trade off to me. > 4. For timestamptz type, mksort is a bit slower than qsort because the > distinct ratio is always 1 for almost all cases. I think more benefit is > available by increasing the duplicated values. Yeah, this was a
Re: 回复: An implementation of multi-key sort
On 04/07/2024 3:45 pm, Yao Wang wrote: Generally, the benefit of mksort is mainly from duplicated values and sort keys: the more duplicated values and sort keys there are, the bigger benefit it gets. ... 1. Use distinct stats info of table to enable mksort It's kind of a heuristic: in the optimizer, check Form_pg_statistic->stadistinct of a table via pg_statistics. Enable mksort only when it is less than a threshold. The hacked code works, which needs modifications to a couple of optimizer interfaces. In addition, a complete solution should consider types and distinct values of all columns, which might be too complex, and the benefit seems not so big. If mksort really provides an advantage only when there are a lot of duplicates (for prefix keys?), and for a small fraction of duplicates there is even some (small) regression, then IMHO taking into account the planner's information about the estimated number of distinct values seems to be really important. What was the problem with accessing these statistics, and why does it require modifying optimizer interfaces? There is a `get_variable_numdistinct` function which is defined and used only in selfuncs.c Information about value distribution seems to be quite useful for choosing the optimal sort algorithm, not only for multi-key sort optimization. For example, if we know the min/max values of the sort key and the range is small, we can use an O(N) algorithm for sorting. It can also help to estimate when TOP-N search is preferable. Right now Postgres creates a special path for incremental sort. I am not sure whether we also need a separate path for mk-sort. But IMHO if we need to change some optimizer interfaces to be able to take statistics into account and choose the preferred sort algorithm at planning time, then it should be done. If mksort can speed up a sort by more than two times (for a large number of duplicates), it will be nice to take it into account when choosing the optimal plan. Also in this case we do not need an extra GUC for explicit enabling of mksort. 
There are too many optimizer parameters, and adding one more will make tuning more complex. So I prefer that the decision is taken by the optimizer itself based on the available information, especially if the criteria seem to be obvious. Best regards, Konstantin
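The "O(N) algorithm when the min/max of the sort key is small" idea mentioned above is essentially counting sort. A minimal sketch (illustrative only, not part of the patch):

```python
def counting_sort(values, lo, hi):
    """Sort integers known to lie in [lo, hi] in O(N + range) time with
    no comparisons at all -- viable only when the range is small, which
    is exactly why the planner would need min/max information."""
    counts = [0] * (hi - lo + 1)
    for v in values:
        counts[v - lo] += 1          # one pass to tally each value
    out = []
    for offset, c in enumerate(counts):
        out.extend([lo + offset] * c)  # emit values in key order
    return out
```

The catch, as discussed above, is that this is only safe when the statistics (here, the min/max bounds) are actually reliable.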
Re: 回复: An implementation of multi-key sort
On Fri, Jun 14, 2024 at 6:20 PM Yao Wang wrote: > > hi Tomas, > > So many thanks for your kind response and detailed report. I am working > on locating issues based on your report/script and optimizing code, and > will update later. Hi, This is an interesting proof-of-concept! Given the above, I've set this CF entry to "waiting on author". Also, I see you've added Heikki as a reviewer. I'm not sure how others think, but I consider a "reviewer" in the CF app to be someone who has volunteered to be responsible for helping move this patch forward. If there is a name in the reviewer column, it may discourage others from doing review. It can also happen that people ping reviewers to ask "There's been no review for X months -- are you planning on looking at this?", and it's not great if that message is a surprise. Note that we prefer not to top-post in emails since it makes our web archive more difficult to read. Thanks, John
Re: 回复: An implementation of multi-key sort
On 6/14/24 13:20, Yao Wang wrote: > hi Tomas, > > So many thanks for your kind response and detailed report. I am working > on locating issues based on your report/script and optimizing code, and > will update later. > > Could you please also send me the script to generate report pdf > from the test results (explain*.log)? I can try to make one by myself, > but I'd like to get a report exactly the same as yours. It's really > helpful. > I don't have a script for that. I simply load the results into a spreadsheet, do a pivot table to "aggregate and reshuffle" it a bit, and then add a heatmap. I use google sheets for this, but any other spreadsheet should handle this too, I think. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: 回复: An implementation of multi-key sort
hi Tomas, So many thanks for your kind response and detailed report. I am working on locating issues based on your report/script and optimizing code, and will update later. Could you please also send me the script to generate the report pdf from the test results (explain*.log)? I can try to make one by myself, but I'd like to get a report exactly the same as yours. It's really helpful. Thanks in advance. Yao Wang On Mon, Jun 10, 2024 at 5:09 AM Tomas Vondra wrote: > > Hello Yao, > > I was interested in the patch, considering the promise of significant > speedups of sorting, so I took a quick look and did some basic perf > testing today. Unfortunately, my benchmarks don't really confirm any > performance benefits, so I haven't looked at the code very much and only > have some very basic feedback: > > 1) The new GUC is missing from the .sample config, triggering a failure > of "make check-world". Fixed by 0002. > > 2) There's a place mixing tabs/spaces in indentation. Fixed by 0003. > > 3) I tried running pgindent, mostly to see how that would affect the > comments, and for most it's probably fine, but a couple are mangled > (usually those with a numbered list of items). Might need some changes > to use formatting that's not reformatted like this. The changes from > pgindent are in 0004, but this is not a fix - it just shows the changes > after running pgindent. > > Now, regarding the performance tests - I decided to do the usual black > box testing, i.e. generate tables with varying numbers of columns, data > types, different data distribution (random, correlated, ...) and so on. > And then run simple ORDER BY queries on that, measuring timing with and > without mk-sort, and checking the effect. > > So I wrote a simple bash script (attached) that does exactly that - it > generates a table with 1k - 10M rows, fills it with data (with some > basic simple data distributions), and then runs the queries. 
> > The raw results are too large to attach, I'm only attaching a PDF > showing the summary with a "speedup heatmap" - it's a pivot with the > parameters on the left, and then the GUC and number of columns on top. > So the first group of columns is with enable_mk_sort=off, the second > group with enable_mk_sort=on, and finally the heatmap with relative > timing (enable_mk_sort=on / enable_mk_sort=off). > > So values <100% mean it got faster (green color - good), and values > >100% mean it got slower (red - bad). And the thing is - pretty much > everything is red, often in the 200%-300% range, meaning it got 2x-3x > slower. There's only very few combinations where it got faster. That > does not seem very promising ... but maybe I did something wrong? > > After seeing this, I took a look at your example again, which showed > some nice speedups. But it seems very dependent on the order of keys in > the ORDER BY clause. For example consider this: > > set enable_mk_sort = off; > explain (analyze, timing off) > select * from t1 order by c6, c5, c4, c3, c2, c1; > > QUERY PLAN > --- > Sort (cost=72328.81..73578.81 rows=49 width=76) >(actual rows=49 loops=1) >Sort Key: c6, c5, c4, c3, c2, c1 >Sort Method: quicksort Memory: 59163kB >-> Seq Scan on t1 (cost=0.00..24999.99 rows=49 width=76) >(actual rows=49 loops=1) > Planning Time: 0.054 ms > Execution Time: 1095.183 ms > (6 rows) > > set enable_mk_sort = on; > explain (analyze, timing off) > select * from t1 order by c6, c5, c4, c3, c2, c1; > > QUERY PLAN > --- > Sort (cost=72328.81..73578.81 rows=49 width=76) >(actual rows=49 loops=1) >Sort Key: c6, c5, c4, c3, c2, c1 >Sort Method: multi-key quick sort Memory: 59163kB >-> Seq Scan on t1 (cost=0.00..24999.99 rows=49 width=76) >(actual rows=49 loops=1) > Planning Time: 0.130 ms > Execution Time: 633.635 ms > (6 rows) > > Which seems great, but let's reverse the sort keys: > > set enable_mk_sort = off; > explain (analyze, timing off) > select * from t1 order by c1, c2, c3, c4, 
c5, c6; > > QUERY PLAN > --- > > Sort (cost=72328.81..73578.81 rows=49 width=76) >(actual rows=49 loops=1) >Sort Key: c1, c2, c3, c4, c5, c6 >Sort Method: quicksort Memory: 59163kB >-> Seq Scan on t1 (cost=0.00..24999.99 rows=49 width=76) >(actual rows=49 loops=1) > Planning Time: 0.146 ms > Execution Time: 170.085 ms > (6 rows) > > set enable_mk_sort = on; > explain (analyze, timing off) > select * from t1 order by c1, c2, c3, c4, c5, c6; > > QUERY PLAN > --- > Sort (cost=72328.81..73578.81
Re: 回复: An implementation of multi-key sort
To be accurate, "multi-key sort" includes both "multi-key quick sort" and "multi-key heap sort". This patch includes code changes related only to "multi-key quick sort", which is used to replace the standard quick sort for tuplesort. The "multi-key heap sort" is an implementation of a multi-key heap and should be treated as a separate task. We need to clarify the naming to avoid confusion. I updated the code, which is related only to function/var renaming and relevant comments, plus some minor assertion changes. Please see the attachment. Thanks, Yao Wang On Fri, May 31, 2024 at 8:09 PM Yao Wang wrote: > > I added two optimizations to mksort which exist on qsort_tuple(): > > 1. When selecting the pivot, always pick the item in the middle of the array, > not at random. Theoretically it has the same effect as the old approach, but > it can eliminate some unstable perf test results, plus a small perf benefit by > removing the random value generator. > 2. Always check whether the array is ordered already, and return > immediately if it is. The pre-ordered check requires extra cost and > impacts perf numbers on some data sets, but can improve perf > significantly on other data sets. > > By now, mksort has perf results equal to or better than qsort on all data > sets I ever used. > > I also updated the test case. Please see the v3 code as attachment. > > Perf test results: > > Data set 1 (with mass duplicate values): > - > > create table t1 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 varchar(100)); > insert into t1 values (generate_series(1,49), 0, 0, 0, 0, > 'aaabbb'); > update t1 set c2 = c1 % 100, c3 = c1 % 50, c4 = c1 % 10, c5 = c1 % 3; > update t1 set c6 = 'aaabbb' > || (c1 % 5)::text; > > Query 1: > > explain analyze select c1 from t1 order by c6, c5, c4, c3, c2, c1; > > Disable Mksort > > 3021.636 ms > 3014.669 ms > 3033.588 ms > > Enable Mksort > > 1688.590 ms > 1686.956 ms > 1688.567 ms > > The improvement is 78.9%, which is reduced from the previous version > (129%). 
The most cost should be the pre-ordered check. > > Query 2: > > create index idx_t1_mk on t1 (c6, c5, c4, c3, c2, c1); > > Disable Mksort > > 1674.648 ms > 1680.608 ms > 1681.373 ms > > Enable Mksort > > 1143.341 ms > 1143.462 ms > 1143.894 ms > > The improvement is ~47%, which is also reduced a bit (52%). > > Data set 2 (with distinct values): > -- > > create table t2 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 varchar(100)); > insert into t2 values (generate_series(1,49), 0, 0, 0, 0, ''); > update t2 set c2 = 90 - c1, c3 = 91 - c1, c4 = 92 - c1, c5 > = 93 - c1; > update t2 set c6 = 'aaabbb' > || (94 - c1)::text; > > Query 1: > > explain analyze select c1 from t2 order by c6, c5, c4, c3, c2, c1; > > Disable Mksort > > 12199.963 ms > 12197.068 ms > 12191.657 ms > > Enable Mksort > > 9538.219 ms > 9571.681 ms > 9536.335 ms > > The improvement is 27.9%, which is much better than the old approach (-6.2%). > > Query 2 (the data is pre-ordered): > > explain analyze select c1 from t2 order by c6 desc, c5, c4, c3, c2, c1; > > Enable Mksort > > 768.191 ms > 768.079 ms > 767.026 ms > > Disable Mksort > > 768.757 ms > 766.166 ms > 766.149 ms > > They are almost the same since no actual sort was performed, and much > better than the old approach (-1198.1%). > > > Thanks, > > Yao Wang > > On Fri, May 24, 2024 at 8:50 PM Yao Wang wrote: > > > > When all leading keys are different, mksort will finish the entire sort at > > the > > first sort key and never touch other keys. For the case, mksort falls back > > to > > kind of qsort actually. 
> > > > I created another data set with distinct values in all sort keys: > > > > create table t2 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 varchar(100)); > > insert into t2 values (generate_series(1,49), 0, 0, 0, 0, ''); > > update t2 set c2 = 90 - c1, c3 = 91 - c1, c4 = 92 - c1, c5 > > = 93 - c1; > > update t2 set c6 = 'aaabbb' > > || (94 - c1)::text; > > explain analyze select c1 from t2 order by c6, c5, c4, c3, c2, c1; > > > > Results: > > > > MKsort: > > 12374.427 ms > > 12528.068 ms > > 12554.718 ms > > > > qsort: > > 12251.422 ms > > 12279.938 ms >
Re: 回复: An implementation of multi-key sort
I added two optimizations to mksort which exist on qsort_tuple(): 1. When selecting the pivot, always pick the item in the middle of the array, not at random. Theoretically it has the same effect as the old approach, but it can eliminate some unstable perf test results, plus a small perf benefit by removing the random value generator. 2. Always check whether the array is ordered already, and return immediately if it is. The pre-ordered check requires extra cost and impacts perf numbers on some data sets, but can improve perf significantly on other data sets. By now, mksort has perf results equal to or better than qsort on all data sets I ever used. I also updated the test case. Please see the v3 code as attachment. Perf test results: Data set 1 (with mass duplicate values): - create table t1 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 varchar(100)); insert into t1 values (generate_series(1,49), 0, 0, 0, 0, 'aaabbb'); update t1 set c2 = c1 % 100, c3 = c1 % 50, c4 = c1 % 10, c5 = c1 % 3; update t1 set c6 = 'aaabbb' || (c1 % 5)::text; Query 1: explain analyze select c1 from t1 order by c6, c5, c4, c3, c2, c1; Disable Mksort 3021.636 ms 3014.669 ms 3033.588 ms Enable Mksort 1688.590 ms 1686.956 ms 1688.567 ms The improvement is 78.9%, which is reduced from the previous version (129%). Most of the extra cost should be the pre-ordered check. Query 2: create index idx_t1_mk on t1 (c6, c5, c4, c3, c2, c1); Disable Mksort 1674.648 ms 1680.608 ms 1681.373 ms Enable Mksort 1143.341 ms 1143.462 ms 1143.894 ms The improvement is ~47%, which is also reduced a bit (52%). 
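The two optimizations described at the top of this message can be sketched as follows (illustrative Python, not the patch's C code; the function names are made up):

```python
def sort_with_precheck(tuples):
    """Optimization 2: one linear pre-ordered pass; if the input is
    already sorted, return immediately and skip the sort entirely.
    This is the extra cost visible on unsorted data sets."""
    if all(a <= b for a, b in zip(tuples, tuples[1:])):
        return list(tuples)
    return quicksort(tuples)

def quicksort(tuples):
    """Optimization 1: pick the middle element as the pivot rather than
    a random one, avoiding the RNG cost and unstable perf results."""
    if len(tuples) <= 1:
        return list(tuples)
    pivot = tuples[len(tuples) // 2]
    lt = [t for t in tuples if t < pivot]
    eq = [t for t in tuples if t == pivot]
    gt = [t for t in tuples if t > pivot]
    return quicksort(lt) + eq + quicksort(gt)
```

On pre-ordered input the pre-check makes the whole "sort" a single O(N) pass, which matches the near-identical timings reported for Query 2 of data set 2 below.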
Data set 2 (with distinct values): -- create table t2 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 varchar(100)); insert into t2 values (generate_series(1,49), 0, 0, 0, 0, ''); update t2 set c2 = 90 - c1, c3 = 91 - c1, c4 = 92 - c1, c5 = 93 - c1; update t2 set c6 = 'aaabbb' || (94 - c1)::text; Query 1: explain analyze select c1 from t2 order by c6, c5, c4, c3, c2, c1; Disable Mksort 12199.963 ms 12197.068 ms 12191.657 ms Enable Mksort 9538.219 ms 9571.681 ms 9536.335 ms The improvement is 27.9%, which is much better than the old approach (-6.2%). Query 2 (the data is pre-ordered): explain analyze select c1 from t2 order by c6 desc, c5, c4, c3, c2, c1; Enable Mksort 768.191 ms 768.079 ms 767.026 ms Disable Mksort 768.757 ms 766.166 ms 766.149 ms They are almost the same since no actual sort was performed, and much better than the old approach (-1198.1%). Thanks, Yao Wang On Fri, May 24, 2024 at 8:50 PM Yao Wang wrote: > > When all leading keys are different, mksort will finish the entire sort at the > first sort key and never touch other keys. For the case, mksort falls back to > kind of qsort actually. > > I created another data set with distinct values in all sort keys: > > create table t2 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 varchar(100)); > insert into t2 values (generate_series(1,49), 0, 0, 0, 0, ''); > update t2 set c2 = 90 - c1, c3 = 91 - c1, c4 = 92 - c1, c5 > = 93 - c1; > update t2 set c6 = 'aaabbb' > || (94 - c1)::text; > explain analyze select c1 from t2 order by c6, c5, c4, c3, c2, c1; > > Results: > > MKsort: > 12374.427 ms > 12528.068 ms > 12554.718 ms > > qsort: > 12251.422 ms > 12279.938 ms > 12280.254 ms > > MKsort is a bit slower than qsort, which can be explained by extra > checks of MKsort. 
> > Yao Wang > > On Fri, May 24, 2024 at 8:36 PM Wang Yao wrote: > > > > > > > > Get Outlook for Android > > > > From: Heikki Linnakangas > > Sent: Thursday, May 23, 2024 8:47:29 PM > > To: Wang Yao ; PostgreSQL Hackers > > > > Cc: inte...@outlook.com > > Subject: Re: 回复: An implementation of multi-key sort > > > > On 23/05/2024 15:39, Wang Yao wrote: > > > No obvious perf regression is expected because PG will follow original > > > qsort code path when mksort is disabled. For the case, the only extra > > > cost is the check in tuplesort_sort_memtuples() to enter mksort code path. > > > > And what about the case the mksort is enabled, but it's not effective > > because all leading keys are different? > > > > -- > > Heikki Linnakangas > > Neon (https://neon.tech) > > -- This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally
Re: 回复: An implementation of multi-key sort
When all leading keys are different, mksort will finish the entire sort at the first sort key and never touch other keys. In that case, mksort actually falls back to a kind of qsort. I created another data set with distinct values in all sort keys: create table t2 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 varchar(100)); insert into t2 values (generate_series(1,49), 0, 0, 0, 0, ''); update t2 set c2 = 90 - c1, c3 = 91 - c1, c4 = 92 - c1, c5 = 93 - c1; update t2 set c6 = 'aaabbb' || (94 - c1)::text; explain analyze select c1 from t2 order by c6, c5, c4, c3, c2, c1; Results: MKsort: 12374.427 ms 12528.068 ms 12554.718 ms qsort: 12251.422 ms 12279.938 ms 12280.254 ms MKsort is a bit slower than qsort, which can be explained by the extra checks of MKsort. Yao Wang On Fri, May 24, 2024 at 8:36 PM Wang Yao wrote: > > > > Get Outlook for Android > > From: Heikki Linnakangas > Sent: Thursday, May 23, 2024 8:47:29 PM > To: Wang Yao ; PostgreSQL Hackers > > Cc: inte...@outlook.com > Subject: Re: 回复: An implementation of multi-key sort > > On 23/05/2024 15:39, Wang Yao wrote: > > No obvious perf regression is expected because PG will follow original > > qsort code path when mksort is disabled. For the case, the only extra > > cost is the check in tuplesort_sort_memtuples() to enter mksort code path. > > And what about the case the mksort is enabled, but it's not effective > > because all leading keys are different? > > -- > Heikki Linnakangas > Neon (https://neon.tech)
Re: 回复: An implementation of multi-key sort
On 23/05/2024 15:39, Wang Yao wrote: No obvious perf regression is expected because PG will follow the original qsort code path when mksort is disabled. In that case, the only extra cost is the check in tuplesort_sort_memtuples() to enter the mksort code path. And what about the case where mksort is enabled but not effective, because all leading keys are different? -- Heikki Linnakangas Neon (https://neon.tech)
回复: An implementation of multi-key sort
No obvious perf regression is expected because PG will follow the original qsort code path when mksort is disabled. In that case, the only extra cost is the check in tuplesort_sort_memtuples() to enter the mksort code path. This is also confirmed by today's experiment: Mksort disabled: 2949.287 ms 2955.258 ms 2947.262 ms No mksort code: 2947.094 ms 2946.419 ms 2953.215 ms Almost the same. I also updated the code with small enhancements. Please see the latest code as attachment. Thanks, Yao Wang From: Heikki Linnakangas Sent: May 22, 2024 23:29 To: Wang Yao ; PostgreSQL Hackers Cc: inte...@outlook.com Subject: Re: An implementation of multi-key sort On 22/05/2024 15:48, Wang Yao wrote: > Comparing to classic quick sort, it can get significant performance > improvement once multiple keys are available. A rough test shows it got > ~129% improvement than qsort for ORDER BY on 6 keys, and ~52% for CREATE > INDEX on the same data set. (See more details in section "Performance > Test") Impressive. Did you test the performance of the cases where MK-sort doesn't help, to check if there is a performance regression? -- Heikki Linnakangas Neon (https://neon.tech) v2-Implement-multi-key-sort.patch Description: v2-Implement-multi-key-sort.patch
Re: An implementation of multi-key sort
On 22/05/2024 15:48, Wang Yao wrote:
> Compared to classic quicksort, it can achieve significant performance
> improvement once multiple keys are available. A rough test shows ~129%
> improvement over qsort for ORDER BY on 6 keys, and ~52% for CREATE INDEX
> on the same data set. (See more details in section "Performance Test")

Impressive. Did you test the performance of the cases where MK-sort doesn't help, to check if there is a performance regression?

--
Heikki Linnakangas
Neon (https://neon.tech)
An implementation of multi-key sort
Hi hackers,

I'd like to submit an implementation of multi-key sort for review. Please see the code in the attachment. Thanks in advance for your response.

Overview
--------

MKsort (multi-key sort) is an alternative to the standard qsort algorithm which has better performance for particular sort scenarios, i.e. when the data set has multiple keys to be sorted. The implementation is based on the paper:

Jon L. Bentley and Robert Sedgewick, "Fast Algorithms for Sorting and Searching Strings", Jan 1997 [1]

The patch applies mksort only to tuple sort. Theoretically it could be applied to general-purpose sort scenarios whenever multiple sort keys are available, but that is relatively difficult in practice because some kind of uniform interface would be needed to manipulate the keys. So I limited the usage of mksort to sorting SortTuple.

Compared to classic quicksort, it can achieve significant performance improvement once multiple keys are available. A rough test shows ~129% improvement over qsort for ORDER BY on 6 keys, and ~52% for CREATE INDEX on the same data set. (See more details in section "Performance Test")

Author: Yao Wang
Co-author: Hongxu Ma

Scope
-----

The interface of mksort is pretty simple: in tuplesort_sort_memtuples(), mksort_tuple() is invoked instead of qsort_tuple() when mksort is applicable. The major logic in mksort_tuple() applies the mksort algorithm to SortTuple, and a callback mechanism is used to handle sort-variant-specific issues, e.g. comparing different datums, just as qsort_tuple() does. It also handles the complexity of abbreviated keys.

A small difference from the classic mksort algorithm: for IndexTuple, when all columns are equal, an additional comparison based on ItemPointer is performed to determine the order. This keeps the result consistent with the existing qsort.
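For readers unfamiliar with the paper, the core of Bentley-Sedgewick multi-key quicksort, stripped of all tuplesort machinery, can be sketched roughly as below. This is a minimal illustration only; Row, NKEYS and mk_qsort are made-up names, not the patch's actual mksort_tuple() code:

```c
#include <stddef.h>

/* A row with NKEYS integer sort keys (made-up types for illustration). */
#define NKEYS 3

typedef struct
{
	int		key[NKEYS];
} Row;

static void
swap_rows(Row *a, Row *b)
{
	Row		tmp = *a;

	*a = *b;
	*b = tmp;
}

/*
 * Three-way partition on key number "depth".  The trick that helps on
 * data sets with many duplicate prefixes: the "equal" partition never
 * compares key "depth" again -- it recurses directly on the next key.
 */
static void
mk_qsort(Row *a, size_t n, int depth)
{
	size_t	lt,
			i,
			gt;
	int		pivot;

	if (n <= 1 || depth >= NKEYS)
		return;

	pivot = a[n / 2].key[depth];
	lt = 0;
	i = 0;
	gt = n;
	while (i < gt)
	{
		int		v = a[i].key[depth];

		if (v < pivot)
			swap_rows(&a[lt++], &a[i++]);
		else if (v > pivot)
			swap_rows(&a[i], &a[--gt]);
		else
			i++;
	}
	mk_qsort(a, lt, depth);					/* < pivot: same key */
	mk_qsort(a + lt, gt - lt, depth + 1);	/* = pivot: next key */
	mk_qsort(a + gt, n - gt, depth);		/* > pivot: same key */
}
```

The payoff is the middle recursion: once a run of rows is known to be equal on key "depth", that key is never examined again, whereas a classic qsort comparator keeps re-comparing the already-equal prefix columns on every comparison.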
I did consider implementing mksort with a template mechanism like qsort's (see sort_template.h), but it seems unnecessary because all the concrete tuple types that need to be handled are derived from SortTuple. Using callbacks to isolate type-specific features is good enough.

Note that not all tuple types are supported by mksort. Please see the comments inside tuplesort_sort_memtuples().

Test Cases
----------

The changes to test cases include:

* Generally, mksort should generate results exactly the same as qsort's. However some test cases don't, because the SQL doesn't specify an order on all columns: e.g. "select c1, c2 from t1 order by c1" can generate different results between mksort/qsort when c1 values are equal. The solution is to order by c2 as well ("select c1, c2 from t1 order by c1, c2"). (e.g. geometry)

* Some cases were updated to display the new sort method "multi-key sort" in explain output. (e.g. incremental_sort)

* regress/tuplesort was updated with new cases to cover some scenarios of mksort.
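The callback-versus-template choice discussed above can be illustrated with a toy sketch (hypothetical names throughout, not the patch's actual interface): one generic sort body takes a function pointer for the variant-specific comparison, whereas the template approach would instantiate a specialized sort per tuple type.

```c
#include <stddef.h>

/* Toy stand-in for SortTuple; names are hypothetical. */
typedef struct
{
	int		key;		/* stand-in for datum1 */
	void   *tuple;		/* stand-in for the variant-specific tuple */
} ToySortTuple;

/* Variant-specific comparator, analogous to tuplesort's per-variant
 * comparison callbacks. */
typedef int (*ToyCompare) (const ToySortTuple *a, const ToySortTuple *b);

static int
toy_compare_int(const ToySortTuple *a, const ToySortTuple *b)
{
	return (a->key > b->key) - (a->key < b->key);
}

/*
 * One generic sort body serves every tuple variant via the callback.
 * A simple insertion sort keeps the sketch short; the point is the
 * single code path, not the sort algorithm itself.
 */
static void
toy_sort(ToySortTuple *tuples, size_t n, ToyCompare cmp)
{
	for (size_t i = 1; i < n; i++)
	{
		ToySortTuple t = tuples[i];
		size_t	j = i;

		while (j > 0 && cmp(&tuples[j - 1], &t) > 0)
		{
			tuples[j] = tuples[j - 1];
			j--;
		}
		tuples[j] = t;
	}
}
```

The template approach trades this indirect call for per-type specialized code; since every variant here is a SortTuple, the single indirect-call path keeps the code smaller at little cost.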
Performance Test
----------------

The script I used to configure the build:

CFLAGS="-O3 -fargument-noalias-global -fno-omit-frame-pointer -g" ./configure --prefix=$PGHOME --with-pgport=5432 --with-perl --with-openssl --with-python --with-pam --with-blocksize=16 --with-wal-blocksize=16 --with-perl --enable-tap-tests --with-gssapi --with-ldap

I used this script for a rough test of ORDER BY:

\timing on
create table t1 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 varchar(100));
insert into t1 values (generate_series(1,49), 0, 0, 0, 0, 'aaabbb');
update t1 set c2 = c1 % 100, c3 = c1 % 50, c4 = c1 % 10, c5 = c1 % 3;
update t1 set c6 = 'aaabbb' || (c1 % 5)::text;
-- Use a large work_mem to ensure the entire sort happens in memory
set work_mem='1GB';
-- switch between qsort/mksort
set enable_mk_sort=off;
explain analyze select c1 from t1 order by c6, c5, c4, c3, c2, c1;

Results:

mksort:
1341.283 ms (00:01.341)
1379.098 ms (00:01.379)
1369.868 ms (00:01.370)

qsort:
3137.277 ms (00:03.137)
3147.771 ms (00:03.148)
3131.887 ms (00:03.132)

The perf improvement is ~129%.

Another perf test for CREATE INDEX:

create index idx_t1_mk on t3 (c6, c5, c4, c3, c2, c1);

Results:

mksort:
1147.207 ms (00:01.147)
1200.501 ms (00:01.201)
1235.657 ms (00:01.236)

qsort:
1852.957 ms (00:01.853)
1824.209 ms (00:01.824)
1808.781 ms (00:01.809)

The perf improvement is ~52%.

Another test uses one of the TPC-H queries:

set work_mem='1GB';
-- query rewritten from TPCH-Q1; there are 6001215 rows in lineitem
explain analyze select l_returnflag,l_linestatus,l_quantity,l_shipmode from lineitem where l_shipdate <= date'1998-12-01' - interval '65 days' order by l_returnflag,l_linestatus,l_quantity,l_shipmode;

Results:

qsort:
14582.626 ms
14524.188 ms
14524.111 ms

mksort:
11390.891 ms
11647.065 ms
11546.791 ms

The perf improvement is ~25.8%.

[1] https://www.cs.tufts.edu/~nr/cs257/archive/bob-sedgewick/fast-strings.pdf
[2