On Tue, 2006-01-17 at 21:43 +0000, Simon Riggs wrote:
> On Tue, 2006-01-17 at 09:52 -0500, Tom Lane wrote:
> > I was thinking along the lines of having multiple temp files per hash
> > bucket.  If you have a tuple that needs to migrate from bucket M to
> > bucket N, you know that it arrived before every tuple that was assigned
> > to bucket N originally, so put such tuples into a separate temp file
> > and process them before the main bucket-N temp file.  This might get a
> > little tricky to manage after multiple hash resizings, but in principle
> > it seems doable.
> You can manage that with file naming. Rows moved from batch N to batch M
> would be renamed N.M, so you'd be able to use file ordering to retrieve
> all files for *.M
> That scheme would work for multiple splits too, so that filenames could
> grow yet retain their sort order and final target batch properties.

This seems to lead to a super-geometric progression in the number of
files required, if we assume that the current batch could be
redistributed to all future batches, each of which could be similarly
redistributed:

batches   files
      1   no files
      2   1 file
      4   7 files
      8   64 files
     16   64,000 files
     32   4 billion files, ish

So it does seem important whether we demand sorted input or not. Or at
least it requires us to provide the executor with a starting point for
the number of batches, so we could manage that.

Best Regards, Simon Riggs

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
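The figures above are rough estimates, but the shape of the growth is easy to model. Here is a minimal sketch (not from the thread): if we assume, as one plausible reading of the N.M naming scheme, that a temp-file name is a strictly increasing chain of at least two batch numbers (the source batch followed by each later batch the rows were redistributed to), then the worst case is the number of subsets of the batch set with at least two members:

```python
def worst_case_files(batches: int) -> int:
    """Worst-case count of distinct N.M[.P...] temp-file names,
    under the chain-of-batch-numbers model described above:
    2^b subsets of b batches, minus the b singletons and the
    empty set (names need a source and at least one target)."""
    return 2 ** batches - batches - 1

for b in (2, 4, 8, 16, 32):
    print(b, worst_case_files(b))
# 2 -> 1, 4 -> 11, 8 -> 247, 16 -> 65519, 32 -> 4294967263
```

This model reproduces the same super-geometric trend (and roughly the 16- and 32-batch figures quoted above); the exact counts depend on when each split happens, which the estimates in the table approximate differently.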