Re: [HACKERS] GSoC 2014 proposal

Alexander Korotkov Wed, 02 Apr 2014 03:23:29 -0700

On Tue, Apr 1, 2014 at 2:23 PM, Heikki Linnakangas
<[email protected]>wrote:


> The BIRCH algorithm as described in the paper describes building a tree in
> memory. If I understood correctly, you're suggesting to use a pre-built
> GiST index instead. Interesting idea!
>
> There are a couple of signifcant differences between the CF tree described
> in the paper and GiST:
>
> 1. In GiST, a leaf item always represents one heap tuple. In the CF tree,
> a leaf item represents a cluster, which consists of one or more tuples. So
> the CF tree doesn't store an entry for every input tuple, which makes it
> possible to keep it in memory.
>
> 2. In the CF tree, "all entries in a leaf node must satisfy a threshold
> requirement, with respect to a threshold value T: the diameter (or radius)
> has to be less than T". GiST imposes no such restrictions. An item can
> legally be placed anywhere in the tree; placing it badly will just lead to
> degraded search performance, but it's still a legal GiST tree.
>
> 3. A GiST index, like any other index in PostgreSQL, holds entries also
> for deleted tuples, until the index is vacuumed. So you cannot just use
> information from a non-leaf node and use it in the result, as the
> information summarized at a non-leaf level includes noise from the dead
> tuples.
>
> Can you elaborate how you are planning to use a GiST index to implement
> BIRCH? You might also want to take a look at SP-GiST; SP-GiST is more
> strict in where in the tree an item can be stored, and lets the operator
> class to specify exactly when a node is split etc.
>

Hmmm, it's likely I've imagined something quite outside of this paper, and
even already suggested it to Ivan... :)
I need a little time to rethink it.

------
With best regards,
Alexander Korotkov.

Re: [HACKERS] GSoC 2014 proposal

Reply via email to