On Wed, Jun 1, 2011 at 3:37 PM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:

> My guess is that the picksplit algorithm performs poorly with that data.
> Unfortunately, I have no idea how to improve that.


The current cube picksplit function has no storage utilization guarantees,
while Guttman's original picksplit has them (if the size of one group
reaches some threshold, then all remaining entries go to the other group).
Also, the current picksplit is a mix of Guttman's linear and quadratic
algorithms: it picks seeds quadratically, but distributes entries linearly.
I see the following ways of solving the picksplit problem for cube:
1) Add storage utilization guarantees to the current picksplit. This may
increase overlaps, but should decrease index size.
2) Add storage utilization guarantees to the current picksplit and replace
the entry distribution algorithm with the quadratic one. Picksplit will take
more time, but it should give more stable and predictable results.
3) I have experimented with my own picksplit algorithm, which showed
pretty good results on the tests I ran. But the current implementation is
dirty and requires more testing.
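
To make option 2 more concrete, here is a rough Python sketch of a
Guttman-style quadratic split with a minimum fill guarantee. It is not the
cube code (which is C) and not my experimental algorithm from option 3. The
names quadratic_split and min_fill, and the representation of a box as a
(lows, highs) pair of tuples, are assumptions made only for illustration.

```python
from itertools import combinations

def union(a, b):
    """Smallest box covering boxes a and b; a box is (lows, highs)."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))

def volume(box):
    v = 1.0
    for lo, hi in zip(box[0], box[1]):
        v *= hi - lo
    return v

def enlargement(cover, box):
    """How much cover must grow in volume to include box."""
    return volume(union(cover, box)) - volume(cover)

def quadratic_split(boxes, min_fill):
    """Split boxes into two lists of indexes, each with >= min_fill entries.

    Hypothetical helper: min_fill plays the role of the storage
    utilization threshold discussed above.
    """
    n = len(boxes)
    assert min_fill >= 1 and n >= 2 * min_fill

    # PickSeeds (quadratic): the pair that wastes the most volume
    # if both entries were put into the same group.
    def waste(pair):
        i, j = pair
        return (volume(union(boxes[i], boxes[j]))
                - volume(boxes[i]) - volume(boxes[j]))
    s1, s2 = max(combinations(range(n), 2), key=waste)

    groups = ([s1], [s2])
    covers = [boxes[s1], boxes[s2]]
    rest = [i for i in range(n) if i not in (s1, s2)]

    while rest:
        # Storage utilization guarantee: if a group needs every remaining
        # entry to reach min_fill, give it all of them and stop.
        starved = [g for g in (0, 1)
                   if len(groups[g]) + len(rest) <= min_fill]
        if starved:
            groups[starved[0]].extend(rest)
            break

        # PickNext (quadratic): take the entry with the strongest
        # preference for one group over the other.
        def preference(i):
            return abs(enlargement(covers[0], boxes[i])
                       - enlargement(covers[1], boxes[i]))
        nxt = max(rest, key=preference)
        rest.remove(nxt)

        d0 = enlargement(covers[0], boxes[nxt])
        d1 = enlargement(covers[1], boxes[nxt])
        if d0 != d1:
            g = 0 if d0 < d1 else 1
        else:
            # Tie: prefer the smaller group.
            g = 0 if len(groups[0]) <= len(groups[1]) else 1
        groups[g].append(nxt)
        covers[g] = union(covers[g], boxes[nxt])

    return groups

# Five adjacent 1-D boxes plus one far outlier.
boxes = [((float(i),), (float(i + 1),)) for i in range(5)]
boxes.append(((100.0,), (101.0,)))
left, right = quadratic_split(boxes, min_fill=2)
```

With min_fill=1 the outlier can end up alone in its group, which is the
kind of poor storage utilization the guarantee is meant to prevent. The
price is that the group forced up to min_fill may overlap more with the
other one.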

 ------
With best regards,
Alexander Korotkov.
