On Thu, Jul 30, 2015 at 4:01 PM, Peter Geoghegan <p...@heroku.com> wrote:
> On Thu, Jul 30, 2015 at 12:00 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
>> Hmm. You don't really need to merge the in-memory array into the tape, as
>> you know that all the tuples in the in-memory must come after the tuples
>> already on the tape. You can just return all the tuples from the tape first,
>> and then all the tuples from the array.
>
> It's more complicated than it appears, I think. Tuples may be variable
> sized. WRITETUP() performs a pfree(), and gives us back a variable
> amount of availMem. What if we dumped a single, massive, outlier tuple
> out when a caller passes it and it goes to the root of the heap? We'd
> dump that massive tuple in one go (this would be an incremental
> dumptuples() call, which we still do in the patch), making things
> !LACKMEM() again, but by an unusually comfortable margin. We read in a
> few more regular tuples, but we're done consuming tuples before things
> ever get LACKMEM() again (no more dumping needed, at least with this
> patch applied).
>
> What prevents the tuple at the top of the in-memory heap at the point
> of tuplesort_performsort() (say, one of the ones added to the heap as
> our glut of memory was *partially* consumed) being less than the
> last/greatest tuple on tape? If the answer is "nothing", a merge step
> is clearly required.

It's simple to prove this with the attached rough patch, intended to
be applied on top of Postgres with my patch. It hacks
tuplesort_gettuple_common() to always return tape tuples first, before
returning memtuples only when tape tuples have been totally exhausted.
If you run my cursory regression test suite with this, you'll see
serious regressions. I also attach a regression test diff file from my
development system, to save you the trouble of trying this yourself.
Note how the "count(distinct(s))" numbers get closer to being correct
(lower) as increasing work_mem makes tuplesort approach an internal
sort.

It's possible that we can get away with something cheaper than a merge
step, but my impression right now is that it isn't terribly expensive.
OTOH, if we can make this work with the randomAccess case by being
more clever about merging, that could be worthwhile.
-- 
Peter Geoghegan
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 5371f4f..ce79347 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -2022,7 +2022,6 @@ just_memtuples:
 			 */
 			Assert(state->current < state->memtupcount);
 
-			if (COMPARETUP(state, stup, &state->memtuples[state->current]) <= 0)
 			{
 				/*
 				 * Tape tuple less than or equal to memtuple array current
@@ -2031,18 +2030,6 @@ just_memtuples:
 				state->cached = false;
 				*should_free = true;
 			}
-			else
-			{
-				/*
-				 * Tape tuple greater than memtuple array's current tuple.
-				 *
-				 * Return current memtuple tuple, and cache tape tuple for
-				 * next call, where it may be returned.
-				 */
-				state->cached = true;
-				*should_free = false;
-				*stup = state->memtuples[state->current++];
-			}
 			return true;
 
 		case TSS_FINALMERGE:

Attachment: regression.diffs
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers