Jie Li <jay23j...@gmail.com> writes:
> I'm new to window functions. Recently I ran some simple queries and was
> surprised to find percent_rank is so much slower than rank. Could anybody
> tell me why?
Huh, interesting.  I can reproduce this with toy data, such as

    create table inventory1 (inv_date_sk int, inv_item_sk int);

    insert into inventory1
      select 1, random() * 100000 from generate_series(1, 189000);

    explain analyze
      select inv_date_sk, inv_item_sk,
             percent_rank() over (partition by inv_date_sk order by inv_item_sk)
      from inventory1;

The example is *not* particularly slow if you leave work_mem at the default.
But if you bump up work_mem enough that the WindowAgg's internal tuplestore
fits into memory, it slows down like crazy.  A bit of quality time with
oprofile shows that all the time is going into this memmove() in
tuplestore_trim():

    /*
     * Slide the array down and readjust pointers.  This may look pretty
     * stupid, but we expect that there will usually not be very many
     * tuple-pointers to move, so this isn't that expensive; and it keeps a
     * lot of other logic simple.
     *
     * In fact, in the current usage for merge joins, it's demonstrable that
     * there will always be exactly one non-removed tuple; so optimize that
     * case.
     */
    if (nremove + 1 == state->memtupcount)
        state->memtuples[0] = state->memtuples[nremove];
    else
        memmove(state->memtuples, state->memtuples + nremove,
                (state->memtupcount - nremove) * sizeof(void *));

We're throwing away one tuple at a time as we advance forward through the
tuplestore, and moving 100000+ tuple pointers each time.  Ugh.

This code was all right when written, because (IIRC) the mergejoin case was
actually the only caller.  But it's not all right for WindowAgg's
less-predictable usage patterns.

I thought for a bit about changing things around so that the first-used
tuple slot isn't necessarily state->memtuples[0], but just as the comment
says, that complicates a lot of other logic.  And there isn't any easy place
to reclaim the wasted slots later.

What seems like the best bet is to put in a heuristic to make
tuplestore_trim simply do nothing until nremove reaches some reasonably
large amount, perhaps 10% of the number of stored tuples.  This wastes up to
10% of the allotted memory, but that seems tolerable.

We could complicate things a bit more by remembering that so-and-so many
slots are authorized to be removed, and forcing a trim operation to discard
them if we find ourselves in memory trouble.  I'm not sure that extra
complication is worthwhile, though.

Comments?

            regards, tom lane
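[Editor's illustration, not part of the original message: a minimal
standalone C sketch, not PostgreSQL code, of why the deferred-trim heuristic
described above helps.  The array and counters are invented for the example;
it only models the memmove traffic: sliding the pointer array down after
every single removal does O(N^2) work, while waiting until roughly 10% of
the slots are removable keeps the total close to linear.]

    /*
     * Toy model of the trimming cost (not PostgreSQL code; names made up).
     * NTUPLES mirrors the ~100000 tuple pointers mentioned above; the
     * "eager" pass takes a few seconds because it moves ~5 billion pointers.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NTUPLES 100000

    int
    main(void)
    {
        void      **tuples = calloc(NTUPLES, sizeof(void *));
        long long   moved_eager = 0;
        long long   moved_lazy = 0;
        int         count, pending;

        /* Eager strategy: slide the whole array down after every removal. */
        count = NTUPLES;
        while (count > 1)
        {
            memmove(tuples, tuples + 1, (count - 1) * sizeof(void *));
            moved_eager += count - 1;
            count--;
        }

        /* Lazy strategy: count removals, slide only once ~10% are removable. */
        count = NTUPLES;
        pending = 0;
        while (count - pending > 1)
        {
            pending++;              /* logically remove one more tuple */
            if (pending >= count / 10)
            {
                memmove(tuples, tuples + pending,
                        (count - pending) * sizeof(void *));
                moved_lazy += count - pending;
                count -= pending;
                pending = 0;
            }
        }

        printf("pointers moved, eager trim: %lld\n", moved_eager);
        printf("pointers moved, lazy trim:  %lld\n", moved_lazy);
        free(tuples);
        return 0;
    }

With NTUPLES = 100000 the eager pass moves about N*(N-1)/2, i.e. roughly
five billion pointers, while the lazy pass moves on the order of a million,
which is the same asymmetry the oprofile result above points at.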