Well, CLUSTER is so slow (and it doesn't cluster the toast tables associated with the table to be clustered).
        However, when people use CLUSTER they use it to speed up their queries.
        For that the table does not need to be perfectly in-order.

        So, here is a new idea for CLUSTER :

        - choose a chunk size (about 50% of your RAM)
        - setup disk sorts for all indexes
        - seq scan the table :
                - take a chunk of chunk_size
                - sort it (in memory)
                - write it into new table file
- while we have the data on-hand, also send the indexed columns data into the corresponding disk-sorts

        - finish the index disk sorts and rebuild indexes

This does not run a complete sort on the table. It would be about as fast as your seq scan disk throughput. Obviously, the end result is not as good as a real CLUSTER since the table will be made up of several ordered chunks and a range lookup. Therefore, a range lookup on the clustered columns would need at most N seeks, versus 1 for a really clustered table. But it only scans the table once and writes it once, even counting index rebuild.

I would think that, with this approach, if people can CLUSTER a large table in 5 minutes instead of hours, they will use it, instead of not using it. Therefore, even if the resulting table is not as optimal as a fully clustered table, it will still be much better than the non-clustered case.




---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to