2009/12/21 Greg Stark <gsst...@mit.edu>: > On Mon, Dec 21, 2009 at 5:48 AM, Pavel Stehule <pavel.steh...@gmail.com> > wrote: >> Information about count of rows are not detected in planner time. This >> have to by available in any executor's sort node. So this value will >> be available every time - index scan is problem. Interesting is Greg's >> info about O(N) algorithms. Then we can implement median as classic >> aggregate. >>... >> On second hand - any internal implementation of median will be faster, >> then currently used techniques. > > Some further information now that I'm on a real keyboard. > > The O(n) algorithm for median of which I'm aware is Quickselect. It's > based on Quicksort which makes it unsuitable for a disk-based > algorithm since it would have to read every block many times. Perhaps > there's some way to adapt it or there might be some other O(n) > algorithm which has better i/o patterns. > > But in cases where the tupleset/tuplesort is still in memory it would > be easy to add support for pulling out the nth element and use that in > a magic median() function. It wouldn't be a "classic" aggregate, at > least as I understand it where you also need something like O(1) space > to store the state, because you definitely need access to the whole > tupleset to do the quickselect. >
I can check speed diferences on orafce's median implementation. But orafce isn't correct - it ignores work_mem limit - so it usable only for some external modules or for testing. I'll try simple median implementation based on aggregate API. Then I'll compare speed. > If we don't find a way to optimize this for on-disk tuplestores though > then I wonder whether it's worth it. It would only help in the narrow > case that you had a large enough set to see the difference between > doing quickselect and quicksort but small enough to fit in workmem. I > suppose that case is widening over time though as memory sizes get > bigger and bigger. Thank you I have to do same test and get more info about quickselect Pavel > > -- > greg > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers