On Sat, Sep 16, 2017 at 2:46 AM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote:
> On Thu, Sep 14, 2017 at 2:48 AM, Alexander Korotkov < > a.korot...@postgrespro.ru> wrote: > >> Patch rebased to current master is attached. I'm going to improve my >> testing script and post new results. >> > > New benchmarking script and results are attached. There new dataset > parameter is introduced: skew factor. Skew factor defines skew in > distribution of groups sizes. > My idea of generating is just usage of power function where power is from > 0 to 1. Following formula is used to get group number for particular item > number i. > [((i / number_of_indexes) ^ power) * number_of_groups] > For example, power = 1/6 gives following distribution of groups sizes: > group number group size > 0 2 > 1 63 > 2 665 > 3 3367 > 4 11529 > 5 31031 > 6 70993 > 7 144495 > 8 269297 > 9 468558 > > For convenience, instead of power itself, I use skew factor where power = > 1.0 / (1.0 + skew). Therefore, with skew = 0.0, distribution of groups > sizes is uniform. Larger skew gives more skewed distribution (and that > seems to be quite intuitive). For, negative skew, group sizes are mirrored > as for corresponding positive skew. For example, skew factor = -5.0 gives > following groups sizes distribution: > group number group size > 0 468558 > 1 269297 > 2 144495 > 3 70993 > 4 31031 > 5 11529 > 6 3367 > 7 665 > 8 63 > 9 2 > > Results shows that between 2172 test cases, in 2113 incremental sort gives > speedup while in 59 it causes slowdown. The following 4 test cases show > most significant slowdown (>10% of time). > > Table GroupedCols GroupCount Skew PreorderedFrac > FullSortMedian IncSortMedian TimeChangePercent > int4|int4|numeric 1 100 -10 0 > 1.5688240528 2.0607631207 31.36 > text|int8|text|int4 1 1 0 0 > 1.7785198689 <(778)%20519-8689> 2.1816160679 22.66 > int8|int8|int4 1 10 -10 0 > 1.136412859 1.3166360855 15.86 > numeric|text|int4|int8 2 10 -10 1 > 0.4403841496 0.5070910454 15.15 > > As you can see, 3 of this 4 test cases have skewed distribution while one > of them is related to costly location-aware comparison of text. I've no > particular idea of how to cope these slowdowns. Probably, it's OK to have > slowdown in some cases while have speedup in majority of cases (assuming > there is an option to turn off new behavior). Probably, we should teach > optimizer more about skewed distributions of groups, but that doesn't seem > feasible for me. > > Any thoughts? > BTW, replacement selection sort was removed by 8b304b8b. I think it worth to rerun benchmarks after that, because results might be changed. Will do. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company