[PERFORM] Feature suggestion : FAST CLUSTER

PFC Tue, 22 May 2007 00:28:47 -0700

Well, CLUSTER is so slow (and it doesn't cluster the TOAST tables associated with the table to be clustered).

        However, when people use CLUSTER they use it to speed up their queries.
        For that the table does not need to be perfectly in-order.

        So, here is a new idea for CLUSTER :

        - choose a chunk size (about 50% of your RAM)
        - set up disk sorts for all indexes
        - seq scan the table :
                - take a chunk of chunk_size
                - sort it (in memory)
                - write it into new table file

                - while we have the data on hand, also send the indexed columns' data into the corresponding disk sorts

        - finish the index disk sorts and rebuild indexes
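
To make the steps above concrete, here is a rough sketch in Python. It is only an illustration of the idea, not how PostgreSQL would implement it: the function name fast_cluster, the row format, and the in-memory lists standing in for the table file and the index disk sorts are all assumptions made for the example.

```python
from itertools import islice


def fast_cluster(rows, chunk_size, cluster_key, index_keys):
    """One pass over `rows`: sort each chunk in memory, append it to the
    new table, and feed (key, position) pairs to one sort per index.

    `index_keys` maps a hypothetical index name to a key function.
    Plain lists stand in for the new table file and the disk sorts.
    """
    new_table = []
    index_runs = {name: [] for name in index_keys}
    it = iter(rows)                                # the seq scan
    while True:
        chunk = list(islice(it, chunk_size))       # take a chunk of chunk_size
        if not chunk:
            break
        chunk.sort(key=cluster_key)                # sort it (in memory)
        for row in chunk:
            pos = len(new_table)
            new_table.append(row)                  # write it into the new table
            for name, key in index_keys.items():
                # while the row is on hand, feed every index sort
                index_runs[name].append((key(row), pos))
    # finish the index sorts; a real system would build the indexes from these
    indexes = {name: sorted(entries) for name, entries in index_runs.items()}
    return new_table, indexes


rows = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c"), (0, "z")]
table, indexes = fast_cluster(
    rows, chunk_size=3,
    cluster_key=lambda r: r[0],
    index_keys={"id": lambda r: r[0]},
)
print(table)
print(indexes["id"])
```

The resulting table is two sorted runs of three rows each rather than one fully sorted table, and the index entries point at the rows' new positions.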

This does not run a complete sort on the table. It would be about as fast as your seq scan disk throughput. Obviously, the end result is not as good as a real CLUSTER, since the table will be made up of several ordered chunks. Therefore, a range lookup on the clustered columns would need at most N seeks (N being the number of chunks), versus 1 for a really clustered table. But it only scans the table once and writes it once, even counting index rebuild.
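
The "at most N seeks" estimate can be checked with a toy model. The sketch below assumes a table that is just a list of keys laid out in sorted chunks, and counts one seek per chunk that holds at least one matching key; the function name range_lookup and this cost model are assumptions for illustration, not a description of the real executor.

```python
from bisect import bisect_left, bisect_right


def range_lookup(keys, chunk_size, lo, hi):
    """Return (matching keys, seeks) for lo <= key <= hi on a table made
    of consecutive chunks, each sorted on its own.

    Inside a sorted chunk the matches are contiguous, so each chunk
    costs at most one seek in this simplified model.
    """
    matches, seeks = [], 0
    for start in range(0, len(keys), chunk_size):
        chunk = keys[start:start + chunk_size]
        i = bisect_left(chunk, lo)       # first key >= lo
        j = bisect_right(chunk, hi)      # one past the last key <= hi
        if i < j:                        # this chunk has matches
            seeks += 1
            matches.extend(chunk[i:j])
    return matches, seeks


chunked = [1, 4, 5, 0, 2, 3]             # two sorted chunks of 3 keys
clustered = sorted(chunked)              # fully clustered: one chunk
print(range_lookup(chunked, 3, 2, 4))
print(range_lookup(clustered, 6, 2, 4))
```

With two chunks the lookup touches both of them, while the fully sorted layout needs a single seek for the same range.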

I would think that, with this approach, if people can CLUSTER a large table in 5 minutes instead of hours, they will actually use it instead of avoiding it. Therefore, even if the resulting table is not as optimal as a fully clustered table, it will still be much better than the non-clustered case.




