> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
> [EMAIL PROTECTED] On Behalf Of PFC
> Sent: Thursday, September 29, 2005 9:10 AM
> To: [EMAIL PROTECTED]
> Cc: Pg Hackers; pgsql-performance@postgresql.org
> Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
>
> Just to add a little anarchy in your nice debate...
>
> Who really needs all the results of a sort on your terabyte table ?
Reports with ORDER BY/GROUP BY, and many other possibilities. 40% of
mainframe CPU cycles are spent sorting, because huge volumes of data take
a great deal of work to categorize meaningfully. Let's suppose that
instead of a terabyte of data (or a petabyte or whatever) we have 10% of
it. That's still a lot of data.

> I guess not many people do a SELECT from such a table and want all
> the results.

What happens when they do? The cases that are already fast are not the
important ones; the cases where things go into the crapper are the ones
that need attention.

> So, this leaves :
> - Really wanting all the results, to fetch using a cursor,
> - CLUSTER type things, where you really want everything in order,
> - Aggregates (Sort->GroupAggregate), which might really need to sort
> the whole table.
> - Complex queries where the whole dataset needs to be examined, in
> order to return a few values
> - Joins (again, the whole table is probably not going to be
> selected)
> - And the ones I forgot.
>
> However,
>
> Most likely you only want to SELECT N rows, in some ordering :
> - the first N (ORDER BY x LIMIT N)
> - last N (ORDER BY x DESC LIMIT N)

For these, the QuickSelect algorithm is what is wanted. For example:

#include <stdlib.h>

typedef double Etype;

extern Etype    RandomSelect(Etype * A, size_t p, size_t r, size_t i);
extern size_t   RandRange(size_t a, size_t b);
extern size_t   RandomPartition(Etype * A, size_t p, size_t r);
extern size_t   Partition(Etype * A, size_t p, size_t r);

/*
** In the following code, every reference to CLR means:
**
**     "Introduction to Algorithms"
**     By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
**     ISBN 0-07-013143-0
*/

/*
** CLR, page 187
** Returns the i-th smallest element (1-based i) of A[p..r].
*/
Etype
RandomSelect(Etype A[], size_t p, size_t r, size_t i)
{
    size_t          q, k;

    if (p == r)
        return A[p];
    q = RandomPartition(A, p, r);
    k = q - p + 1;

    if (i <= k)
        return RandomSelect(A, p, q, i);
    else
        return RandomSelect(A, q + 1, r, i - k);
}

/*
** Pseudo-random integer in [a, b).
*/
size_t
RandRange(size_t a, size_t b)
{
    size_t          c = (size_t) ((double) rand() / ((double) RAND_MAX + 1) * (b - a));

    return c + a;
}

/*
** CLR, page 162
** Swap a randomly chosen element into A[p], then partition around it.
*/
size_t
RandomPartition(Etype A[], size_t p, size_t r)
{
    size_t          i = RandRange(p, r);
    Etype           Temp;

    Temp = A[p];
    A[p] = A[i];
    A[i] = Temp;
    return Partition(A, p, r);
}

/*
** CLR, page 154
** Hoare-style partition around x = A[p]: returns q such that every
** element of A[p..q] is <= every element of A[q+1..r].
*/
size_t
Partition(Etype A[], size_t p, size_t r)
{
    Etype           x, temp;
    size_t          i, j;

    x = A[p];
    i = p - 1;
    j = r + 1;

    for (;;) {
        do {
            j--;
        } while (!(A[j] <= x));
        do {
            i++;
        } while (!(A[i] >= x));
        if (i < j) {
            temp = A[i];
            A[i] = A[j];
            A[j] = temp;
        } else
            return j;
    }
}

> - WHERE x>value ORDER BY x LIMIT N
> - WHERE x<value ORDER BY x DESC LIMIT N
> - and other variants
>
> Or, you are doing a Merge JOIN against some other table ; in that
> case, yes, you might need the whole sorted terabyte table, but most likely
> there are WHERE clauses in the query that restrict the set, and thus,
> maybe we can get some conditions or limit values on the column to sort.

WHERE clause filters are logically applied AFTER the join operations,
according to the SQL standard.

> Also the new, optimized hash join, which is more memory efficient,
> might cover this case.

Only for == (equality) joins. Not every ORDER BY is applied to a join, and
not every join is an equijoin.

> Point is, sometimes, you only need part of the results of your sort.
> And the bigger the sort, the most likely it becomes that you only want
> part of the results.

That is an assumption that will sometimes be true, and sometimes not.
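As a rough sketch of how RandomSelect() covers the plain "ORDER BY x
LIMIT N" case once the keys are in memory (the array contents, the value
of N, and the main() driver below are only illustrative, and the snippet
assumes it is appended to the same file as the routines above):

#include <stdio.h>

static int
EtypeCompare(const void *a, const void *b)
{
    const Etype     x = *(const Etype *) a;
    const Etype     y = *(const Etype *) b;

    return (x > y) - (x < y);
}

int
main(void)
{
    Etype           keys[] = {9.5, 1.25, 7.0, 3.5, 8.25, 0.5, 6.75, 2.0};
    const size_t    n = sizeof keys / sizeof keys[0];
    const size_t    N = 3;      /* the LIMIT */
    size_t          i;

    /*
    ** Selecting the N-th smallest key partitions the array as a side
    ** effect, so afterwards keys[0..N-1] hold the N smallest keys (in no
    ** particular order).
    */
    (void) RandomSelect(keys, 0, n - 1, N);

    /* Sort only the N keys that will actually be returned. */
    qsort(keys, N, sizeof(Etype), EtypeCompare);

    for (i = 0; i < N; i++)
        printf("%g\n", keys[i]);
    return 0;
}

The point of the exercise is that the full sort touches only N keys; the
other n - N keys are merely partitioned past, which takes O(n) time on
average.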
It is not possible to predict usage patterns for a general-purpose
database system.

> So, while we're in the fun hand-waving, new algorithm trying
> mode, why not consider this right from the start ? (I know I'm totally in
> hand-waving mode right now, so slap me if needed).
>
> I'd say your new, fancy sort algorithm needs a few more input values :
>
> - Range of values that must appear in the final result of the sort :
> none, minimum, maximum, both, or even a set of values from the
> other side of the join, hashed, or sorted.

That will already happen (or it certainly ought to).

> - LIMIT information (first N, last N, none)

That will already happen too (or it certainly ought to -- I would be
pretty surprised if it does not).

> - Enhanced Limit information (first/last N values of the second
> column to sort, for each value of the first column) (the infamous "top10
> by category" query)
> - etc.

All of the filters will (at some point) be applied to the data, unless a
formal rule says they cannot be.

> With this, the amount of data that needs to be kept in memory is
> dramatically reduced, from the whole table (even using your compressed
> keys, that's big) to something more manageable which will be closer to the
> size of the final result set which will be returned to the client, and
> avoid a lot of effort.

Sorting the minimal set is a good idea; sometimes there are big savings
there. I would be pretty surprised if a large fraction of data that does
not have to be included were actually processed during the sorts.

> So, this would not be useful in all cases, but when it applies, it
> would be really useful.

No argument there. And if the sort is being reworked, it is a good idea to
check that all of the filtering the language standard allows to happen
before the sort does in fact happen before the sort.
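To make the memory argument concrete for the plain "first N" case: a scan
of an arbitrarily large input only has to hold the N best keys seen so
far, not the whole table. Below is a rough sketch along those lines; the
TopN structure and its names are only illustrative, a real implementation
would carry whole rows rather than bare keys, and the "top10 by category"
variant would keep one such bounded buffer per group.

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    double         *keys;       /* the N smallest keys seen so far, sorted ascending */
    size_t          limit;      /* N */
    size_t          count;      /* how many keys are currently held (<= N) */
} TopN;

static void
TopNInit(TopN * t, size_t limit)
{
    t->keys = malloc(limit * sizeof(double));
    if (t->keys == NULL)
        exit(EXIT_FAILURE);
    t->limit = limit;
    t->count = 0;
}

/*
** Offer one key from the scan; keep it only if it is among the N smallest
** seen so far, dropping the previous worst key if the buffer is full.
*/
static void
TopNOffer(TopN * t, double key)
{
    size_t          i;

    if (t->count == t->limit && key >= t->keys[t->count - 1])
        return;                 /* not better than the current N-th best */
    if (t->count < t->limit)
        t->count++;

    /* Insertion-sort step: slide larger keys up (the old worst falls off). */
    i = t->count - 1;
    while (i > 0 && t->keys[i - 1] > key) {
        t->keys[i] = t->keys[i - 1];
        i--;
    }
    t->keys[i] = key;
}

int
main(void)
{
    TopN            best;
    size_t          i;

    TopNInit(&best, 10);        /* the infamous "top 10" */

    /* Stand-in for a scan of a huge table: a million random keys. */
    for (i = 0; i < 1000000; i++)
        TopNOffer(&best, (double) rand() / RAND_MAX);

    for (i = 0; i < best.count; i++)
        printf("%g\n", best.keys[i]);

    free(best.keys);
    return 0;
}

However large the scanned input gets, memory stays at N keys, and for
small N the per-key cost is effectively constant; only when N itself is
large does the full external-sort machinery come back into play.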