On Sat, Feb 25, 2012 at 4:31 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> I'm not sure about the conclusion, but given this discussion, I'm
>> inclined to mark this Returned with Feedback.
>
> OK, thanks. Does anyone have additional feedback on how tightly we
> wish to manage memory usage? Is trying to make us use as much memory
> as we are allowed to without going over a worthwhile endeavor at all,
> or is it just academic nitpicking?
I'm not sure, either. It strikes me that, in general, it's hard to avoid a little bit of work_mem overrun, since we can't know whether the next tuple will fit until we've read it, and then if it turns out to be big, well, the best thing we can do is free it, but perhaps that's closing the barn door after the horse has gotten out already.

Having recently spent quite a bit of time looking at tuplesort.c as a result of Peter Geoghegan's work and some other concerns, I'm inclined to think that it needs more than minor surgery. That file is peppered with references to Knuth, which serve the dual function of impressing the reader with the idea that the code must be very good (since Knuth is a smart guy) and rendering it almost completely impenetrable (since the design is justified by reference to a textbook most of us probably do not have copies of).

A quick Google search for external sorting algorithms suggests that the typical way of doing an external sort is to read data until you fill your in-memory buffer, quicksort it, and dump it out as a run. Repeat until end-of-data; then, merge the runs (either in a single pass, or if there are too many, in multiple passes). I'm not sure whether that would be better than what we're doing now, but there seem to be enough other people doing it that we might want to try it out. Our current algorithm is to build a heap, bounded in size by work_mem, and dribble tuples in and out of it, but maintaining that heap is pretty expensive; there's a reason people use quicksort rather than heapsort for in-memory sorting.

As a desirable side effect, I think it would mean that we could dispense with retail palloc and pfree altogether. We could just allocate a big chunk of memory, copy tuples into it until it's full, using a pointer to keep track of the next unused byte, and then, after writing the run, reset the allocation pointer back to the beginning of the buffer.
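For illustration, here is a minimal Python sketch of the "quicksort runs, then merge" approach described above. The function name, the tuple-count buffer limit, and keeping the runs in memory are all simplifications for the sketch; a real implementation would bound the buffer by bytes and spill each run to disk (tapes), possibly merging in multiple passes.

```python
import heapq

def external_sort(tuples, work_mem_tuples=4):
    """Sort an iterable with bounded memory: fill a buffer, sort it,
    emit it as a run, repeat until end-of-data, then merge the runs.
    (Sketch only: runs are kept in memory rather than spilled to disk.)"""
    runs = []
    buf = []
    for t in tuples:
        buf.append(t)
        if len(buf) >= work_mem_tuples:
            buf.sort()          # in-memory quicksort of one run
            runs.append(buf)
            buf = []
    if buf:                     # final partial run
        buf.sort()
        runs.append(buf)
    # Single merge pass over all runs; heapq.merge does a k-way merge.
    return list(heapq.merge(*runs))
```

For example, `external_sort([5, 2, 9, 1, 7, 3], work_mem_tuples=3)` forms two sorted runs and merges them into the fully sorted output.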
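By contrast, the current heap-based run formation ("dribbling tuples in and out of a bounded heap") can be sketched like this. The names and the tuple-count bound are again illustrative, not PostgreSQL's; the point is that every tuple passes through sift-up/sift-down heap operations, which is the maintenance cost mentioned above, in exchange for runs that can grow longer than the memory bound.

```python
import heapq

def heap_runs(tuples, work_mem_tuples=4):
    """Form sorted runs via replacement selection: pop the smallest
    tuple, emit it to the current run, and read the next input tuple;
    an input that sorts before the last emitted value must be held
    back for the next run."""
    it = iter(tuples)
    heap = []
    for t in it:                # initial fill, bounded by work_mem
        heap.append(t)
        if len(heap) >= work_mem_tuples:
            break
    heapq.heapify(heap)
    runs, run, pending = [], [], []
    while heap:
        smallest = heapq.heappop(heap)
        run.append(smallest)
        nxt = next(it, None)
        if nxt is not None:
            if nxt >= smallest:
                heapq.heappush(heap, nxt)   # still fits in current run
            else:
                pending.append(nxt)         # held back for next run
        if not heap:                        # current run is exhausted
            runs.append(run)
            run = []
            heap = pending
            heapq.heapify(heap)
            pending = []
    if run:
        runs.append(run)
    return runs
```

On already-ordered input this tends to produce one long run; on reversed input it degrades to runs of roughly work_mem size, and every tuple still pays for heap maintenance either way.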
That would not only avoid the cost of going through palloc/pfree, but also the memory overhead imposed by bookkeeping and power-of-two rounding.

If we do want to stick with the current algorithm, there seem to be some newer techniques for cutting down on the heap maintenance overhead. Heikki's been investigating that a bit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers