On Thu, Jul 30, 2015 at 12:09 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> True, you need a heap to hold the next tuple from each tape in the merge
> step. To avoid exceeding work_mem, you'd need to push some tuples from the
> in-memory array to the tape to make room for that. In practice, though, the
> memory needed for the merge step's heap is tiny. Even if you merge 1000
> tapes, you only need memory for 1000 tuples in the heap. But yeah, you'll
> need some logic to share/divide the in-memory array between the heap and
> the "in-memory tail" of the last tape.

It's a bit worse than that, because we buffer up a significant chunk of each
tape to avoid randomly seeking between tapes after every tuple read. But I
think in today's era of large memory we don't need anywhere near the entire
work_mem just for buffering to avoid random access. Something simple like a
fixed buffer size per tape, probably much less than 1MB per tape, would
suffice.

I'm a bit confused about where the big win comes from, though. Is what's
going on that the external sort only exceeded memory by a small amount, so
nearly all the tuples are still in memory?
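To make the memory math concrete, here's a minimal sketch of a heap-based
k-way merge. It is not PostgreSQL's actual tuplesort code; tape_read_next(),
emit_tuple(), and the bare int keys are hypothetical stand-ins. The point is
just that the heap holds one entry per tape, so merging 1000 tapes costs
roughly 1000 tuples of heap memory, plus whatever fixed per-tape read buffer
you choose:

    #include <stdbool.h>
    #include <stdlib.h>

    typedef struct
    {
        int key;   /* sort key of the tuple currently at this tape's head */
        int tape;  /* which tape that tuple came from */
    } HeapEntry;

    /* Hypothetical helpers: tape_read_next() refills from a small
     * fixed-size per-tape buffer and returns false when the tape is
     * exhausted. */
    extern bool tape_read_next(int tape, int *key);
    extern void emit_tuple(int key);

    static void
    sift_down(HeapEntry *heap, int n, int i)
    {
        for (;;)
        {
            int smallest = i;
            int l = 2 * i + 1;
            int r = 2 * i + 2;

            if (l < n && heap[l].key < heap[smallest].key)
                smallest = l;
            if (r < n && heap[r].key < heap[smallest].key)
                smallest = r;
            if (smallest == i)
                break;

            HeapEntry tmp = heap[i];
            heap[i] = heap[smallest];
            heap[smallest] = tmp;
            i = smallest;
        }
    }

    void
    merge_tapes(int ntapes)
    {
        /* Heap memory is O(ntapes) entries, independent of data volume. */
        HeapEntry *heap = malloc(ntapes * sizeof(HeapEntry));
        int n = 0;

        /* Prime the heap with the head tuple of each tape. */
        for (int t = 0; t < ntapes; t++)
            if (tape_read_next(t, &heap[n].key))
                heap[n++].tape = t;
        for (int i = n / 2 - 1; i >= 0; i--)
            sift_down(heap, n, i);

        /* Pop the smallest tuple, refill from the same tape, repeat. */
        while (n > 0)
        {
            emit_tuple(heap[0].key);
            if (!tape_read_next(heap[0].tape, &heap[0].key))
                heap[0] = heap[--n];   /* that tape is exhausted */
            sift_down(heap, n, 0);
        }
        free(heap);
    }

With a fixed per-tape buffer, total extra memory on top of the heap is just
ntapes times the buffer size, which is why it needn't come anywhere near
work_mem.

--
greg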