On Sat, Sep 16, 2017 at 2:46 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Thu, Sep 14, 2017 at 2:48 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> Patch rebased to current master is attached.  I'm going to improve my
>> testing script and post new results.
>>
>
> New benchmarking script and results are attached.  There new dataset
> parameter is introduced: skew factor.  Skew factor defines skew in
> distribution of groups sizes.
> My idea of generating is just usage of power function where power is from
> 0 to 1.  Following formula is used to get group number for particular item
> number i.
> [((i / number_of_indexes) ^ power) * number_of_groups]
> For example, power = 1/6 gives following distribution of groups sizes:
> group number    group size
> 0               2
> 1               63
> 2               665
> 3               3367
> 4               11529
> 5               31031
> 6               70993
> 7               144495
> 8               269297
> 9               468558
>
> For convenience, instead of power itself, I use skew factor where power =
> 1.0 / (1.0 + skew).  Therefore, with skew = 0.0, distribution of groups
> sizes is uniform.  Larger skew gives more skewed distribution (and that
> seems to be quite intuitive).  For, negative skew, group sizes are mirrored
> as for corresponding positive skew.  For example, skew factor = -5.0 gives
> following groups sizes distribution:
> group number    group size
> 0               468558
> 1               269297
> 2               144495
> 3               70993
> 4               31031
> 5               11529
> 6               3367
> 7               665
> 8               63
> 9               2
>
> Results shows that between 2172 test cases, in 2113 incremental sort gives
> speedup while in 59 it causes slowdown.  The following 4 test cases show
> most significant slowdown (>10% of time).
>
> Table                   GroupedCols GroupCount Skew PreorderedFrac
> FullSortMedian IncSortMedian TimeChangePercent
> int4|int4|numeric       1                  100  -10              0
> 1.5688240528  2.0607631207             31.36
> text|int8|text|int4     1                    1    0              0
> 1.7785198689 <(778)%20519-8689>  2.1816160679             22.66
> int8|int8|int4          1                   10  -10              0
>  1.136412859  1.3166360855             15.86
> numeric|text|int4|int8  2                   10  -10              1
> 0.4403841496  0.5070910454             15.15
>
> As you can see, 3 of this 4 test cases have skewed distribution while one
> of them is related to costly location-aware comparison of text.  I've no
> particular idea of how to cope these slowdowns.  Probably, it's OK to have
> slowdown in some cases while have speedup in majority of cases (assuming
> there is an option to turn off new behavior).  Probably, we should teach
> optimizer more about skewed distributions of groups, but that doesn't seem
> feasible for me.
>
> Any thoughts?
>

BTW, replacement selection sort was removed by 8b304b8b.  I think it worth
to rerun benchmarks after that, because results might be changed.  Will do.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Reply via email to