On 07/17/2013 08:15 PM, Andrew Gierth wrote: > The spec defines two types of aggregate function classed as "ordered set > function", as follows: > > 1. An "inverse distribution function" taking one argument (which must be > a grouped column or otherwise constant within groups) plus a sorted > group with exactly one column: > > =# SELECT (func(p) WITHIN GROUP (ORDER BY q)) from ... > > The motivating example for this (and the only ones in the spec) are > percentile_cont and percentile_disc, to return a percentile result > from a continuous or discrete distribution. (Thus > percentile_cont(0.5) within group (order by x) is the spec's version > of a median(x) function.)
One question is how this relates to the existing SELECT agg_func(x order by y) ... syntax. Clearly there's some extra functionality here, but the two are very similar conceptually. > 2. A "hypothetical set function" taking N arguments of arbitrary types > (a la VARIADIC "any", rather than a fixed list) plus a sorted group > with N columns of matching types: > > =# SELECT (func(p1,p2,...) WITHIN GROUP (ORDER BY q1,q2,...)) from ... > > (where typeof(p1)==typeof(q1) and so on, at least up to trivial > conversions) > > The motivating example here is to be able to do rank(p1,p2,...) to > return the rank that the specified values would have had if they were > added to the group. Wow, I can't possibly grasp the purpose of this. Maybe a practical example? > We've also had an expression of interest in extending this to allow > percentile_disc(float8[]) and percentile_cont(float8[]) returning > arrays; e.g. percentile_cont(array[0, 0.25, 0.5, 0.75, 1]) to return an > array containing the bounds, median and quartiles in one go. This is an > extension to the spec but it seems sufficiently obviously useful to be > worth supporting. To be specific, I asked for this because it's already something I do using PL/R, although in PL/R it's pretty much limited to floats. Anyway, for anyone who isn't following why we want this: statitical summary reports. For example, I'd love to be able to do a quartile distribution of query execution times without resorting to R. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers