Re: [HACKERS] polyphase merge?

Don Marvick Wed, 04 Feb 2009 18:53:42 -0800

This is what I understand from existing implementation:
- fix the merge fan in F, and number of tapes=F+1
- for each sorted run generated, we have a formula to determine which tape
it belongs to
- the sorted run is added to the end of the tape
- all sorted runs have been added to tape. There is always an empty tape
called output tape
- add dummy runs if necessary, to each tape
- merge the input tapes, write result to output tape, until one of the input
tape is empty
- repeat this process until 1 sorted run remains

I think the main difference is that at each step, the current impl does not
merge the smallest k-runs. It merges from the beginning of each tape. For 1
pass, this does not make any difference. For 2 passes onwards there are some
differences. In some complex query, where the memory is eaten by some other
operators, more passes are needed and the differences become larger. This is
the only improvement that can be achieved. Let me know if this does not make
any sense.

Regarding disk layout, I am open to suggestion whether we should reuse the
tape module or not.

On Wed, Feb 4, 2009 at 11:20 PM, Tom Lane <[email protected]> wrote:

> Don Marvick <[email protected]> writes:
> > Since nobody replied, I would give it a try. I am going to implement the
> > merge pattern described in Knuth Page 365 (5.4.9), essentially it is as
> > follow:
> > - create initial runs using replacement selection (basically this is as
> in
> > the current implementation)
> > - add enough dummy runs of size 0 so that the number of sorted run minus
> one
> > can be divided by k-1 (k is merge fan in)
> > - repetitively merge k smallest runs until only 1 run left
>
> How is this better than, or even different from, what is there now?
>
>                        regards, tom lane
>

Re: [HACKERS] polyphase merge?

Reply via email to