Teodor Sigaev <teo...@sigaev.ru> writes:
>> The point I'm making is that the scenario your test case exposes is not
>> an infinite loop of picksplits, but an infinite loop of choose calls.

> Thank you, now I see what I missed before. After some brainstorming, it's 
> possible to use '\0' only as end of string marker.  The idea is: when we 
> split 
> allthesame tuple to upper/lower we copy node label to upper tuple instead of 
> set 
> it to '\0'. Node labels of lower tuple will be unchanged but should be 
> ignored 
> for reconstructionValue - and they are ignored because of allTheSame flag.  
> See 
> attached patch.

I don't believe this patch works at all.  In particular, for an allTheSame
node occurring at the root level, omitting the nodeChar from the
reconstructed value is certainly wrong since there was no parent that
could have had a label containing the same character.  More generally, the
patch is assuming that any allTheSame tuple is one that is underneath a
split node; but that's surely wrong (where did such a tuple come from
before the split happened?).

If the root case were the only possible one then we could fix the problem
by adding a test on whether level == 0, but AFAICS, allTheSame tuples can
also get created at lower tree levels.  All you need is a bunch of strings
that are duplicates for more than the maximum prefix length.

I think what we have to do is use a different dummy value for the node
label of a pushed-down allTheSame tuple than we do for end-of-string.
This requires widening the node labels from uint8 to (at least) int16.
However, I think that's essentially free because pass-by-value node
labels are stored as Datums anyway.  In fact, it might even be upward
compatible, at least for cases that aren't broken today.

> I rewrited a patch to fix missed way - allTheSame result of picksplit and 
> tooBig 
> is set. I believe this patch is still needed because it could make a tree 
> more 
> efficient as it was demonstrated for quad tree.

The question here continues to be how sure we are that the case described
in checkAllTheSame's header comment can't actually occur.  We certainly
thought it could when we originally wrote this stuff.  I think possibly
we were wrong, but I'd like to see a clear proof of that before we discuss
dropping the logic that's meant to cope with the situation.

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to