On 2010-11-20 04:46, Robert Haas wrote:
> On Tue, Nov 16, 2010 at 6:07 AM, Alexander Korotkov
> <aekorot...@gmail.com> wrote:
>> On Tue, Nov 16, 2010 at 3:07 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>> But on a broader note, I'm not very certain the sorting algorithm is
>>> sensible. For example, suppose you have 10 segments that are exactly
>>> '0' and 20 segments that are exactly '1'. Maybe I'm misunderstanding,
>>> but it seems like this will result in a 15/15 split when we almost
>>> certainly want a 10/20 split. I think there will be problems in more
>>> complex cases as well. The documentation says about the less-than and
>>> greater-than operators that "These operators do not make a lot of
>>> sense for any practical purpose but sorting."
>> In order to illustrate a real problem, we should think about GiST
>> behavior with a large enough amount of data. For example, I tried to
>> extrapolate this case to 100000 segs where 40% are (0,1) segs and 60%
>> are (1,2) segs. And this case doesn't seem to be a problem to me.
> Well, the problem with just comparing on < is that it takes very
> little account of the upper bounds. I think the cases where a simple
> split would hurt you the most are those where examining the upper
> bound is necessary to get a good split.
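Robert's upper-bound point can be put in numbers. A toy sketch of my own (not code from the seg module; the median cut just stands in for a sort-based picksplit): segments whose lower bounds interleave but whose upper bounds split them cleanly into short and long groups.

```python
# Toy segments as (lower, upper) pairs: four short, four long,
# with interleaved lower bounds.
segs = [(0, 1), (0, 100), (1, 2), (1, 101),
        (2, 3), (2, 102), (3, 4), (3, 103)]

def union(group):
    """Bounding segment (the page key) covering a group."""
    return (min(lo for lo, hi in group), max(hi for lo, hi in group))

def overlap(a, b):
    """Length of the overlap between two bounding segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

# Sort-based split: order by (lower, upper), cut at the median.
s = sorted(segs)
left, right = s[:4], s[4:]
print(union(left), union(right), overlap(union(left), union(right)))
# -> (0, 101) (2, 103) 99: the two page keys almost entirely overlap.

# A split that examines the upper bound: short vs long segments.
short = [x for x in segs if x[1] <= 4]
long_ = [x for x in segs if x[1] > 4]
print(union(short), union(long_), overlap(union(short), union(long_)))
# -> (0, 4) (0, 103) 4: one tight key, far less overlap.
```

Sorting on the lower bound alone gives two pages whose keys overlap over almost the whole range, so most searches must descend into both; splitting on the upper bound leaves one tight page.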
With the current 8K default block size, I'd put my money on the sorting
algorithm for any seg case. The R-tree algorithm's performance is
probably influenced more by large buckets -> low tree depth -> overly
generic keys on non-leaf nodes.
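For the 10-vs-20 case quoted above, the effect is easy to reproduce. Again a sketch of my own, with a median cut standing in for the actual picksplit:

```python
# 10 segments that are exactly '0' and 20 that are exactly '1',
# modeled as degenerate (lower, upper) pairs.
segs = [(0.0, 0.0)] * 10 + [(1.0, 1.0)] * 20

# Sort-based split: order the entries and cut at the midpoint.
ordered = sorted(segs)
mid = len(ordered) // 2
left, right = ordered[:mid], ordered[mid:]
print(len(left), len(right))    # 15 15 -- splits the identical '1' keys

# Value-based split: keep equal keys together.
left2 = [s for s in segs if s == (0.0, 0.0)]
right2 = [s for s in segs if s == (1.0, 1.0)]
print(len(left2), len(right2))  # 10 20 -- the split Robert argues for
```

The median cut puts five of the '1' entries on the '0' page, widening that page's key for no benefit, which is exactly the 15/15-versus-10/20 objection.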
regards,
Yeb Havinga
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers