Re: [HACKERS] WITHIN GROUP patch

2014-01-07 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  Initial tests suggest that your version is ~40% slower than ours on
  some workloads.

 Tom I poked at this a bit with perf and oprofile, and concluded that
 Tom probably the difference comes from ordered_set_startup()
 Tom repeating lookups for each group that could be done just once
 Tom per query.

Retesting with your changes shows that the gap is down to 15% but still
present:

work_mem=64MB enable_hashagg=off  (for baseline test)

baseline query (333ms on both versions):
select count(*)
  from (select j from generate_series(1,3) i,
  generate_series(1,10) j group by j) s;

test query:
select count(*)
  from (select percentile_disc(0.5) within group (order by i)
  from generate_series(1,3) i,
   generate_series(1,10) j group by j) s;

On the original patch as supplied: 571ms - 333ms = 238ms
On current master: 607ms - 333ms = 274ms

Furthermore, I can't help noticing that the increased complexity has
now pretty much negated your original arguments for moving so much of
the work out of nodeAgg.c.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2014-01-07 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Retesting with your changes shows that the gap is down to 15% but still
 present:

There's probably some overhead from traversing advance_transition_function
for each row, which your version wouldn't have done.  15% sounds pretty
high for that though, since advance_transition_function doesn't have much
to do when the transition function is non strict and the state value is
pass-by-value (which internal is).  It's possible that we could do
something to micro-optimize that case.

I also wondered if it was possible to omit rechecking AggCheckCallContext
after the first time through in ordered_set_transition().  It seemed
unsafe to perhaps not have that happen at all, since if the point is to
detect misuse then the mere fact that argument 0 isn't null isn't much
comfort.  It strikes me though that now we could test for first call by
looking at fcinfo-flinfo-fn_extra, and be pretty sure that we've already
checked the context if that isn't null.  Whether this would save anything
noticeable isn't clear though; I didn't see AggCheckCallContext high in
the profile.

 Furthermore, I can't help noticing that the increased complexity has
 now pretty much negated your original arguments for moving so much of
 the work out of nodeAgg.c.

The key reason for that was, and remains, not having the behavior
hard-wired in nodeAgg; I believe that this design permits things to be
accomplished in aggregate implementation functions that would not have
been possible with the original patch.  I'm willing to accept some code
growth to have that flexibility.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2014-01-07 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  Furthermore, I can't help noticing that the increased complexity
  has now pretty much negated your original arguments for moving so
  much of the work out of nodeAgg.c.

 Tom The key reason for that was, and remains, not having the
 Tom behavior hard-wired in nodeAgg; I believe that this design
 Tom permits things to be accomplished in aggregate implementation
 Tom functions that would not have been possible with the original
 Tom patch.  I'm willing to accept some code growth to have that
 Tom flexibility.

Do you have an actual use case?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2014-01-07 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom The key reason for that was, and remains, not having the
  Tom behavior hard-wired in nodeAgg; I believe that this design
  Tom permits things to be accomplished in aggregate implementation
  Tom functions that would not have been possible with the original
  Tom patch.  I'm willing to accept some code growth to have that
  Tom flexibility.

 Do you have an actual use case?

Not a concrete example of an aggregate to build, no; but it seemed
plausible to me that a new aggregate might want more control over
the sorting operation than was possible with your patch.  Basically
the only flexibility that was there was to sort a hypothetical row before
or after its peers, right?  Now it's got essentially full control over
the sort parameters.

One idea that comes to mind is that an aggregate that's interested in the
top N rows might wish to flip the sort order, so that the rows it wants
come out of the tuplesort first rather than last --- and/or it might want
to tell the tuplesort about the row limitation, so that the bounded-sort
logic can be used.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2014-01-07 Thread Tom Lane
I wrote:
 There's probably some overhead from traversing advance_transition_function
 for each row, which your version wouldn't have done.  15% sounds pretty
 high for that though, since advance_transition_function doesn't have much
 to do when the transition function is non strict and the state value is
 pass-by-value (which internal is).  It's possible that we could do
 something to micro-optimize that case.

The most obvious micro-optimization that's possible there is to avoid
redoing InitFunctionCallInfoData for each row.  I tried this as in the
attached patch.  It seems to be good for about half a percent overall
savings on your example case.  That's not much, but then again this
helps for *all* aggregates, and many of them are cheaper than ordered
aggregates.  I see about 2% overall savings on
select count(*) from generate_series(1,100);
which seems like a more interesting number.

As against that, the roughly-kilobyte-sized FunctionCallInfoData is
no longer just transient data on the stack, but persists for the lifespan
of the query.  We pay that already for plain FuncExpr and OpExpr nodes,
though.

On balance, I'm inclined to apply this; the performance benefit may be
marginal but it seems like it makes advance_transition_function's API
a tad cleaner by reducing the number of distinct structs it's hacking
on.  Comments?

regards, tom lane

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 827b009..0dafb51 100644
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
*** typedef struct AggStatePerAggData
*** 235,240 
--- 235,248 
  	 */
  
  	Tuplesortstate *sortstate;	/* sort object, if DISTINCT or ORDER BY */
+ 
+ 	/*
+ 	 * This field is a pre-initialized FunctionCallInfo struct used for
+ 	 * calling this aggregate's transfn.  We save a few cycles per row by not
+ 	 * re-initializing the unchanging fields; which isn't much, but it seems
+ 	 * worth the extra space consumption.
+ 	 */
+ 	FunctionCallInfoData transfn_fcinfo;
  }	AggStatePerAggData;
  
  /*
*** static void initialize_aggregates(AggSta
*** 290,297 
  	  AggStatePerGroup pergroup);
  static void advance_transition_function(AggState *aggstate,
  			AggStatePerAgg peraggstate,
! 			AggStatePerGroup pergroupstate,
! 			FunctionCallInfoData *fcinfo);
  static void advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup);
  static void process_ordered_aggregate_single(AggState *aggstate,
   AggStatePerAgg peraggstate,
--- 298,304 
  	  AggStatePerGroup pergroup);
  static void advance_transition_function(AggState *aggstate,
  			AggStatePerAgg peraggstate,
! 			AggStatePerGroup pergroupstate);
  static void advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup);
  static void process_ordered_aggregate_single(AggState *aggstate,
   AggStatePerAgg peraggstate,
*** initialize_aggregates(AggState *aggstate
*** 399,419 
   * Given new input value(s), advance the transition function of an aggregate.
   *
   * The new values (and null flags) have been preloaded into argument positions
!  * 1 and up in fcinfo, so that we needn't copy them again to pass to the
!  * transition function.  No other fields of fcinfo are assumed valid.
   *
   * It doesn't matter which memory context this is called in.
   */
  static void
  advance_transition_function(AggState *aggstate,
  			AggStatePerAgg peraggstate,
! 			AggStatePerGroup pergroupstate,
! 			FunctionCallInfoData *fcinfo)
  {
! 	int			numTransInputs = peraggstate-numTransInputs;
  	MemoryContext oldContext;
  	Datum		newVal;
- 	int			i;
  
  	if (peraggstate-transfn.fn_strict)
  	{
--- 406,425 
   * Given new input value(s), advance the transition function of an aggregate.
   *
   * The new values (and null flags) have been preloaded into argument positions
!  * 1 and up in peraggstate-transfn_fcinfo, so that we needn't copy them again
!  * to pass to the transition function.  We also expect that the static fields
!  * of the fcinfo are already initialized; that was done by ExecInitAgg().
   *
   * It doesn't matter which memory context this is called in.
   */
  static void
  advance_transition_function(AggState *aggstate,
  			AggStatePerAgg peraggstate,
! 			AggStatePerGroup pergroupstate)
  {
! 	FunctionCallInfoData *fcinfo = peraggstate-transfn_fcinfo;
  	MemoryContext oldContext;
  	Datum		newVal;
  
  	if (peraggstate-transfn.fn_strict)
  	{
*** advance_transition_function(AggState *ag
*** 421,426 
--- 427,435 
  		 * For a strict transfn, nothing happens when there's a NULL input; we
  		 * just keep the prior transValue.
  		 */
+ 		int			numTransInputs = peraggstate-numTransInputs;
+ 		int			i;
+ 
  		for (i = 1; i = numTransInputs; i++)
  		{
  			if (fcinfo-argnull[i])
*** advance_transition_function(AggState *ag
*** 467,478 

Re: [HACKERS] WITHIN GROUP patch

2014-01-04 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom I've committed this after significant editorialization --- most
  Tom notably, I pushed control of the sort step into the aggregate
  Tom support functions.

 Initial tests suggest that your version is ~40% slower than ours on
 some workloads.

I poked at this a bit with perf and oprofile, and concluded that probably
the difference comes from ordered_set_startup() repeating lookups for each
group that could be done just once per query.  I'm not sure I believe the
40% figure; on this particular test case, oprofile breaks down the costs
of ordered_set_startup() like this:

  290.0756  postgres advance_transition_function
  3830799.9244  postgres ordered_set_transition
1445  1.0808  postgres ordered_set_startup
  3141879.4990  postgres tuplesort_begin_datum
  4056 10.2632  postgres get_typlenbyvalalign
  1445  3.6564  postgres ordered_set_startup [self]
  734   1.8573  postgres MemoryContextAllocZero
  510   1.2905  postgres RegisterExprContextCallback
  283   0.7161  postgres exprType
  247   0.6250  postgres get_sortgroupclause_tle
  235   0.5946  postgres exprCollation
  920.2328  postgres ReleaseSysCache
  780.1974  postgres SearchSysCache
  710.1797  postgres AggGetAggref
  630.1594  postgres AggCheckCallContext
  580.1468  postgres AllocSetAlloc
  460.1164  postgres PrepareSortSupportFromOrderingOp
  430.1088  postgres AggGetPerAggEContext
  400.1012  postgres get_typlenbyval
  390.0987  postgres palloc0
  350.0886  postgres MemoryContextAlloc
  170.0430  postgres get_sortgroupref_tle
  100.0253  postgres tuplesort_begin_common

The tuplesort_begin_datum calls are needed regardless --- your version
was just doing them inside nodeAgg rather than externally.  However,
get_typlenbyvalalign and some of the other calls here are to fetch
information that doesn't change across groups; probably we could arrange
to cache that info instead of recomputing it each time.  Still, it doesn't
look like that could save more than 20% of ordered_set_startup, which
itself is still only a few percent of the total query time.

Looking at this example makes me wonder if it wouldn't be worthwhile to
provide a way to reset and re-use a tuplesort object, instead of redoing
all the lookup work involved.  Or maybe just find a way to cache the
catalog lookups that are happening inside tuplesort_begin_datum, which are
about 50% of that function's cost it looks like.  We're paying this same
kind of price for repeated tuplesort setup in the existing nodeAgg code,
if we have an aggregate with ORDER BY or DISTINCT in a grouped query with
many groups.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2014-01-04 Thread David Rowley
On Sun, Jan 5, 2014 at 12:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:


 Looking at this example makes me wonder if it wouldn't be worthwhile to
 provide a way to reset and re-use a tuplesort object, instead of redoing
 all the lookup work involved.  Or maybe just find a way to cache the
 catalog lookups that are happening inside tuplesort_begin_datum, which are
 about 50% of that function's cost it looks like.  We're paying this same
 kind of price for repeated tuplesort setup in the existing nodeAgg code,
 if we have an aggregate with ORDER BY or DISTINCT in a grouped query with
 many groups.


This sounds very similar to:
http://www.postgresql.org/message-id/caaphdvrbq348m8dyj-7o4vae5ps9zoq_34rgvaan1qyxl2s...@mail.gmail.com
A reset function was added in the next patch which improved performance in
my test case by about 5 times.

Perhaps they can make use of the same function.

Regards

David Rowley


 regards, tom lane


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] WITHIN GROUP patch

2013-12-28 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom I've committed this after significant editorialization --- most
 Tom notably, I pushed control of the sort step into the aggregate
 Tom support functions.

Initial tests suggest that your version is ~40% slower than ours on
some workloads.

On my system, this query takes ~950ms using our dev branch of the code,
and ~1050ms on git master (using \timing in psql for timings, and taking
the best of many consecutive runs):

select count(*)
  from (select percentile_disc(0.5) within group (order by i)
  from generate_series(1,3) i, generate_series(1,10) j group by j) 
s;

About ~700ms of that is overhead, as tested by running this query with
enable_hashagg=false:

select count(*)
  from (select j
  from generate_series(1,3) i, generate_series(1,10) j group by j) 
s;

So your version is taking 350ms for the percentile calculations
compared to 250ms for ours.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-24 Thread Atri Sharma


Sent from my iPad

 On 24-Dec-2013, at 2:50, Tom Lane t...@sss.pgh.pa.us wrote:
 
 Atri Sharma atri.j...@gmail.com writes:
 Please find attached the latest patch for WITHIN GROUP. This patch is
 after fixing the merge conflicts.
 
 I've committed this after significant editorialization --- most notably,
 I pushed control of the sort step into the aggregate support functions.
 I didn't like the way nodeAgg.c had been hacked up to do it the other way.
 There's a couple hundred lines of code handling that in orderedsetaggs.c,
 which is more or less comparable to the amount of code that didn't get
 added to nodeAgg.c, so I think the argument that the original approach
 avoided code bloat is bogus.
 
 The main reason I pushed all the new aggregates into a single file
 (orderedsetaggs.c) was so they could share a private typedef for the
 transition state value.  It's possible that we should expose that
 struct so that third-party aggregate functions could leverage the
 existing transition-function infrastructure instead of having to
 copy-and-paste it.  I wasn't sure where to put it though --- maybe
 a new include file would be needed.  Anyway I left the point for
 future discussion.
 
regards, tom lane

Thank you!


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-23 Thread Tom Lane
I wrote:
 Or, really, why don't we just do the same thing I'm advocating for
 the plain-ordered-set case?  That is, if there's a single collation
 applying to all the collatable inputs, that's the collation to use
 for the aggregate; otherwise it has no determinate collation, and
 it'll throw an error at runtime if it needs one.

I went and tried to implement this, and realized that it would take some
pretty significant rewriting of parse_collate.c, because of examples
like this:

 rank(a,b) within group (order by c collate foo, d collate bar)

In the current parse_collate logic, it would throw error immediately
upon being told to merge the two explicit-COLLATE results.  We'd
need a way to postpone that error and instead just decide that the
rank aggregate's collation is indeterminate.  While that's perhaps
just a SMOP, it would mean that ordered-set aggregates don't resolve
collation the same way as other functions, which pretty much destroys
the argument for this approach.

What's more, the same problem applies to non-hypothetical ordered-set
aggregates, if they've got more than one sortable input column.

What I'm now thinking we want to do is:

1. Non-hypothetical direct args always contribute to determining the
agg's collation.

2. Hypothetical and aggregated args contribute to the agg's collation
only if the agg is designed so that there is always exactly one
aggregated arg (ie, it's non-variadic with one aggregated arg).
Otherwise we assign their collations per-sort-column and don't merge
them into the aggregate's collation.

This specification ensures that a variadic aggregate doesn't change
behavior depending on how many sort columns there happen to be.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-23 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom What I'm now thinking we want to do is:

 Tom 1. Non-hypothetical direct args always contribute to determining
 Tom the agg's collation.

 Tom 2. Hypothetical and aggregated args contribute to the agg's
 Tom collation only if the agg is designed so that there is always
 Tom exactly one aggregated arg (ie, it's non-variadic with one
 Tom aggregated arg).  Otherwise we assign their collations
 Tom per-sort-column and don't merge them into the aggregate's
 Tom collation.

 Tom This specification ensures that a variadic aggregate doesn't
 Tom change behavior depending on how many sort columns there happen
 Tom to be.

Works for me.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-23 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 Please find attached the latest patch for WITHIN GROUP. This patch is
 after fixing the merge conflicts.

I've committed this after significant editorialization --- most notably,
I pushed control of the sort step into the aggregate support functions.
I didn't like the way nodeAgg.c had been hacked up to do it the other way.
There's a couple hundred lines of code handling that in orderedsetaggs.c,
which is more or less comparable to the amount of code that didn't get
added to nodeAgg.c, so I think the argument that the original approach
avoided code bloat is bogus.

The main reason I pushed all the new aggregates into a single file
(orderedsetaggs.c) was so they could share a private typedef for the
transition state value.  It's possible that we should expose that
struct so that third-party aggregate functions could leverage the
existing transition-function infrastructure instead of having to
copy-and-paste it.  I wasn't sure where to put it though --- maybe
a new include file would be needed.  Anyway I left the point for
future discussion.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-22 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom I eventually decided that we were overthinking this problem.  At
 Tom least for regular ordered-set aggregates, we can just deem that
 Tom the collation of the aggregate is indeterminate unless all the
 Tom inputs (both direct and aggregated) agree on the collation to
 Tom use.  This gives us the right answer for all the standard
 Tom aggregates, which have at most one collatable input, and it's
 Tom very unclear that anything more complicated would be an
 Tom improvement.  I definitely didn't like the hack that was in
 Tom there that'd force a sort column to absorb a possibly-unrelated
 Tom collation.

Yeah, I can go along with that, but see below.

 Tom For hypotheticals, I agree after reading the spec text that
 Tom we're supposed to unify the collations of matching hypothetical
 Tom and aggregated arguments to determine the collation to use for
 Tom sorting that column.

Yeah, the spec seemed clear enough on that.

 Tom I see that the patch just leaves these columns out of the
 Tom determination of the aggregate's result collation.  That's okay
 Tom for the time being at least, since we have no hypotheticals with
 Tom collatable output types, but I wonder if anyone wants to argue
 Tom for some other rule (and if so, what)?

Any alternative seems a bit ad-hoc to me.

The examples I've thought of which would return collatable types are
all ones that would be implemented as plain ordered set functions even
if their logic was in some sense hypothetical. For example you could
envisage a value_prec(x) within group (order by y) that returns the
value of y which sorts immediately before x, but this would just be
declared as value_prec(anyelement) within group (anyelement) rather
than engaging the hypothetical argument stuff. (It's this sort of
thing that suggested pushing down the collation into the sort column
even for non-hypothetical ordered set functions.)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-22 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 The examples I've thought of which would return collatable types are
 all ones that would be implemented as plain ordered set functions even
 if their logic was in some sense hypothetical. For example you could
 envisage a value_prec(x) within group (order by y) that returns the
 value of y which sorts immediately before x, but this would just be
 declared as value_prec(anyelement) within group (anyelement) rather
 than engaging the hypothetical argument stuff. (It's this sort of
 thing that suggested pushing down the collation into the sort column
 even for non-hypothetical ordered set functions.)

Meh.  I see no very good reason that we shouldn't insist that the
collation be set on the sort column rather than the other input.
That is, if the alternatives are

 value_prec(x collate foo) within group (order by y)
 value_prec(x) within group (order by y collate foo)

which one makes it clearer that the ordering is to use collation foo?
The first one is at best unnatural, and at worst action-at-a-distance.
If you think of the sorting as an operation done in advance of
applying value_prec(), which is something the syntax rather encourages
you to think, there's no reason that it should absorb a collation
from a collate clause attached to a higher-level operation.

So I think the spec authors arguably got it wrong for hypotheticals,
and I'm uneager to extrapolate their choice of behavior into
situations where we don't know for certain that the collations of two
arguments must get unified.

More practically, I'm dubious about your assumption that an aggregate
like this needn't be marked hypothetical.  I see that use of
anyelement would be enough to constrain the types to be the same,
but it doesn't ordinarily constrain collations, and I don't think
it should start doing so.  So my reaction to this example is not
that we should hack the behavior for plain ordered-set aggregates,
but that we ought to find a rule that allows result-collation
determination for hypotheticals.  We speculated upthread about
merge the collations normally, but ignore inputs declared ANY
and merge the collations normally, but ignore variadic inputs.
Either of those would get the job done in this example.  I kinda
think we should pick one of these rules and move on.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-22 Thread Tom Lane
I wrote:
 ... So my reaction to this example is not
 that we should hack the behavior for plain ordered-set aggregates,
 but that we ought to find a rule that allows result-collation
 determination for hypotheticals.  We speculated upthread about
 merge the collations normally, but ignore inputs declared ANY
 and merge the collations normally, but ignore variadic inputs.
 Either of those would get the job done in this example.  I kinda
 think we should pick one of these rules and move on.

Or, really, why don't we just do the same thing I'm advocating for
the plain-ordered-set case?  That is, if there's a single collation
applying to all the collatable inputs, that's the collation to use
for the aggregate; otherwise it has no determinate collation, and
it'll throw an error at runtime if it needs one.  We realized long
ago that we can't throw most need-a-collation errors at parse time,
because the parser lacks information about which functions need to
know a collation to use.  This seems to be in the same category.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-21 Thread Tom Lane
[ still hacking away at this patch ]

Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom Not wanting to consider the sort args when there's more than one
  Tom doesn't square with forcing them to be considered when there's
  Tom just one.  It's the same aggregate after all,

 This logic is only applied in the patch to aggregates that _aren't_
 hypothetical.

 (thinking out loud:) It might be more consistent to also add the
 condition that the single sort column not be a variadic arg. And/or
 the condition that it be the same type as the result. Or have a flag
 in pg_aggregate to say this agg returns one of its sorted input
 values, so preserve the collation.

I eventually decided that we were overthinking this problem.  At least
for regular ordered-set aggregates, we can just deem that the collation
of the aggregate is indeterminate unless all the inputs (both direct and
aggregated) agree on the collation to use.  This gives us the right
answer for all the standard aggregates, which have at most one collatable
input, and it's very unclear that anything more complicated would be an
improvement.  I definitely didn't like the hack that was in there that'd
force a sort column to absorb a possibly-unrelated collation.

For hypotheticals, I agree after reading the spec text that we're supposed
to unify the collations of matching hypothetical and aggregated arguments
to determine the collation to use for sorting that column.  I see that
the patch just leaves these columns out of the determination of the
aggregate's result collation.  That's okay for the time being at least,
since we have no hypotheticals with collatable output types, but I wonder
if anyone wants to argue for some other rule (and if so, what)?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-09 Thread Peter Eisentraut
On 11/21/13, 5:04 AM, Atri Sharma wrote:
 Please find attached the latest patch for WITHIN GROUP. This patch is
 after fixing the merge conflicts.

I would like to see more explanations and examples in the documentation.
 You introduce this feature with Ordered set functions compute a single
result from an ordered set of input values.  But string_agg, for
example, does that as well, so it's not clear how this is different.
Between ordered aggregates, window functions, and this new feature, it
can get pretty confusing.  Also, the hypothetical part should perhaps
be explained in more detail.  The tutorial part of the documentation
contains a nice introduction to window function.  I suggest you add
something like that as well.

In func.sgml, please list the functions in alphabetical order.

Also, don't write should when you mean must.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-08 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 There's also the question of ungrouped vars, maybe. Consider these two
 queries:

 select array(select a+sum(x) from (values (0.3),(0.7)) v(a) group by a)
   from generate_series(1,5) g(x);

 select array(select percentile_disc(a) within group (order by x)
from (values (0.3),(0.7)) v(a) group by a)
   from generate_series(1,5) g(x);

 In both cases the aggregation query is the outer one; but while the first
 can return a value, I think the second one has to fail (at least I can't
 see any reasonable way of executing it).

Hm, interesting.  So having decided that the agg has level 1, we need to
reject any level-0 vars in the direct parameters, grouped or not.

We could alternatively decide that the agg has level 0, but that doesn't
seem terribly useful, and I think it's not per spec either.  SQL:2008
section 6.9 set function specification seems pretty clear that
only aggregated arguments should be considered when determining the
semantic level of an aggregate.  OTOH, I don't see any text there
restricting what can be in the non-aggregated arguments, so maybe the
committee thinks this case is sensible?  Or they just missed it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-08 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom We could alternatively decide that the agg has level 0, but that
 Tom doesn't seem terribly useful, and I think it's not per spec
 Tom either.  SQL:2008 section 6.9 set function specification seems
 Tom pretty clear that only aggregated arguments should be considered
 Tom when determining the semantic level of an aggregate.  OTOH, I
 Tom don't see any text there restricting what can be in the
 Tom non-aggregated arguments, so maybe the committee thinks this
 Tom case is sensible?  Or they just missed it.

My bet is that they missed it, because there's another obvious
oversight there; it doesn't define column references in the FILTER
clause applied to an ordered set function as being aggregated column
references, yet it's clear that this must be the case (since they
filter the set of rows that the aggregated column references refer
to).

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-07 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom I believe that the spec requires that the direct arguments of
  Tom an inverse or hypothetical-set aggregate must not contain any
  Tom Vars of the current query level.

 False.

After examining this more closely, ISTM that the direct arguments are
supposed to be processed as if they weren't inside an aggregate call at all.
That being the case, isn't it flat out wrong for check_agg_arguments()
to be examining the agg_ordset list?  It should ignore those expressions
whilst determining the aggregate's semantic level.  As an example, an
upper-level Var in those expressions isn't grounds for deciding that the
aggregate isn't of the current query level.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-07 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom After examining this more closely, ISTM that the direct
 Tom arguments are supposed to be processed as if they weren't inside
 Tom an aggregate call at all.  That being the case, isn't it flat
 Tom out wrong for check_agg_arguments() to be examining the
 Tom agg_ordset list?  It should ignore those expressions whilst
 Tom determining the aggregate's semantic level.  As an example, an
 Tom upper-level Var in those expressions isn't grounds for deciding
 Tom that the aggregate isn't of the current query level.

Hmm... yes, you're probably right; but we'd still have to check somewhere
for improper nesting, no? since not even the direct args are allowed to
contain nested aggregate calls.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-07 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom After examining this more closely, ISTM that the direct
  Tom arguments are supposed to be processed as if they weren't inside
  Tom an aggregate call at all.  That being the case, isn't it flat
  Tom out wrong for check_agg_arguments() to be examining the
  Tom agg_ordset list?  It should ignore those expressions whilst
  Tom determining the aggregate's semantic level.  As an example, an
  Tom upper-level Var in those expressions isn't grounds for deciding
  Tom that the aggregate isn't of the current query level.

 Hmm... yes, you're probably right; but we'd still have to check somewhere
 for improper nesting, no? since not even the direct args are allowed to
 contain nested aggregate calls.

Yeah, I had come to that same conclusion while making the changes;
actually, check_agg_arguments needs to look for aggs but not vars there.

In principle we could probably support aggs there if we really wanted to;
but it would result in evaluation-order dependencies among the aggs of a
single query level, which doesn't seem like something we want to take on
for a feature that's not required by spec.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-07 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  Hmm... yes, you're probably right; but we'd still have to check
  somewhere for improper nesting, no? since not even the direct args
  are allowed to contain nested aggregate calls.

 Tom Yeah, I had come to that same conclusion while making the
 Tom changes; actually, check_agg_arguments needs to look for aggs
 Tom but not vars there.

There's also the question of ungrouped vars, maybe. Consider these two
queries:

select array(select a+sum(x) from (values (0.3),(0.7)) v(a) group by a)
  from generate_series(1,5) g(x);

select array(select percentile_disc(a) within group (order by x)
   from (values (0.3),(0.7)) v(a) group by a)
  from generate_series(1,5) g(x);

In both cases the aggregation query is the outer one; but while the first
can return a value, I think the second one has to fail (at least I can't
see any reasonable way of executing it).

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom Further questions about WITHIN GROUP:

 Tom I believe that the spec requires that the direct arguments of
 Tom an inverse or hypothetical-set aggregate must not contain any
 Tom Vars of the current query level.

False.

The spec requires that the direct arguments must not contain _ungrouped_
columns (see set function specification), but nowhere are grouped
columns forbidden.

 Tom They don't manage to say that in plain English, of course, but
 Tom in the hypothetical set function case the behavior is defined
 Tom in terms of this sub-select:

 Tom   ( SELECT 0, SK1, ..., SKK
 Tom FROM TNAME UNION ALL
 Tom VALUES (1, VE1, ..., VEK) )

TNAME there is not the input table or an alias for it, but rather
the specific subset of rows to which the grouping operation is being
applied (after applying a FILTER if any).

We're in the General Rules here, not the Syntax Rules, so this is
describing _how to compute the result_ rather than a syntactic
transformation of the input.

In any event, going by the docs on the web, Oracle does not forbid
grouped columns there (their wording is This expr must be constant
within each aggregation group.) MSSQL seems to require a literal
constant, but that's obviously not per spec. IBM doesn't seem to
have it in db2 for linux, but some of their other products have it
and include examples of using grouped vars: see

http://pic.dhe.ibm.com/infocenter/ntz/v7r0m3/index.jsp?topic=%2Fcom.ibm.nz.dbu.doc%2Fc_dbuser_hypothetical_set_family_syntax.html

So I'm going to say that the majority opinion is on my side here.

 Tom So that's all fine, but the patch seems a bit confused about it.

The patch treats vars in the direct args exactly as it would treat
them anywhere else where they must be ungrouped.

[snip a bunch of stuff based on false assumptions]

 Tom What I now think we should do about the added pg_aggregate
 Tom columns is to have:

 Tom aggnfixedargs int number of fixed arguments, excluding any
 Tom   hypothetical ones (always 0 for normal aggregates)

 Tom aggkind   char'n' for normal aggregate,
 Tom   'o' for ordered set function,
 Tom   'h' for hypothetical-set function

I don't see any obvious problem with this.

 Tom with the parsing rules that given

 Tom  agg( n fixed arguments ) WITHIN GROUP ( ORDER BY k sort 
specifications )

 Tom 1. WITHIN GROUP is disallowed for normal aggregates.

(This is what the submitted patch does.)

 Tom 2. For an ordered set function, n must equal aggnfixedargs.  We
 Tom treat all n fixed arguments as contributing to the aggregate's
 Tom result collation, but ignore the sort arguments.

That doesn't work for getting a sensible collation out of
percentile_disc applied to a collatable type. (Which admittedly is an
extension to the spec, which allows only numeric and interval, but it
seems to me to be worth having.)

 Tom 3. For a hypothetical-set function, n must equal aggnfixedargs
 Tom plus k, and we match up type and collation info of the last k
 Tom fixed arguments with the corresponding sort arguments.  The
 Tom first n-k fixed arguments contribute to the aggregate's result
 Tom collation, the rest not.

The submitted patch does essentially this but taking the number of
non-variadic args in place of the suggested aggnfixedargs. Presumably
in your version the latter would be derived from the former?

 Tom Reading back over this email, I see I've gone back and forth
 Tom between using the terms direct args and fixed args for the
 Tom evaluate-once stuff to the left of WITHIN GROUP.  I guess I'm
 Tom not really sold on either terminology, but we need something we
 Tom can use consistently in the code and docs.  The spec is no help,
 Tom it has no generic term at all for these args.  Does anybody else
 Tom have a preference, or maybe another suggestion entirely?

We (Atri and I) have been using direct args, but personally I'm not
amazingly happy with it. Documentation for other dbs tends to just call
them arguments, and refer to the WITHIN GROUP expressions as ordering
expressions or similar.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom I believe that the spec requires that the direct arguments of
  Tom an inverse or hypothetical-set aggregate must not contain any
  Tom Vars of the current query level.

 In any event, going by the docs on the web, Oracle does not forbid
 grouped columns there (their wording is This expr must be constant
 within each aggregation group.) MSSQL seems to require a literal
 constant, but that's obviously not per spec. IBM doesn't seem to
 have it in db2 for linux, but some of their other products have it
 and include examples of using grouped vars: see
 http://pic.dhe.ibm.com/infocenter/ntz/v7r0m3/index.jsp?topic=%2Fcom.ibm.nz.dbu.doc%2Fc_dbuser_hypothetical_set_family_syntax.html

OK, that's reasonably convincing.  I think we'll need a HINT or something
to clarify the error message, because it sure looks like those arguments
are used in an aggregate function.

  Tom 2. For an ordered set function, n must equal aggnfixedargs.  We
  Tom treat all n fixed arguments as contributing to the aggregate's
  Tom result collation, but ignore the sort arguments.

 That doesn't work for getting a sensible collation out of
 percentile_disc applied to a collatable type. (Which admittedly is an
 extension to the spec, which allows only numeric and interval, but it
 seems to me to be worth having.)

Meh.  I don't think you can have that and also have the behavior that
multiple ORDER BY items aren't constrained to have the same collation;
at least not without some rule that amounts to a special case for
percentile_disc, which I'd resist.

  Tom 3. For a hypothetical-set function, n must equal aggnfixedargs
  Tom plus k, and we match up type and collation info of the last k
  Tom fixed arguments with the corresponding sort arguments.  The
  Tom first n-k fixed arguments contribute to the aggregate's result
  Tom collation, the rest not.

 The submitted patch does essentially this but taking the number of
 non-variadic args in place of the suggested aggnfixedargs. Presumably
 in your version the latter would be derived from the former?

I'm not on board with using variadic vs non variadic to determine this.
For example, imagine a hypothetical-set function that for some reason
supports only a single sort column; there would be no reason to use
VARIADIC in its definition, and indeed good reason not to.  In any
case, I don't think this behavior should be tied to implementation details
of the representation of the function signature, and IMV variadic is
just that --- particularly for VARIADIC ANY, which is nothing more than
a short-cut for overloading the function name with different numbers of
ANY arguments.  Once we've got a match that involves N direct arguments
and K ordering arguments, the behavior should be determinate without
respect to just how we got that match.

  Tom Reading back over this email, I see I've gone back and forth
  Tom between using the terms direct args and fixed args for the
  Tom evaluate-once stuff to the left of WITHIN GROUP.  I guess I'm
  Tom not really sold on either terminology, but we need something we
  Tom can use consistently in the code and docs.  The spec is no help,
  Tom it has no generic term at all for these args.  Does anybody else
  Tom have a preference, or maybe another suggestion entirely?

 We (Atri and I) have been using direct args, but personally I'm not
 amazingly happy with it. Documentation for other dbs tends to just call
 them arguments, and refer to the WITHIN GROUP expressions as ordering
 expressions or similar.

Well, given that I was mistaken to think there could be no Vars at all
in them, fixed may not be le mot juste.  Unless somebody's got an
alternative to direct, let's go with that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
I don't especially like the syntax you invented for listing arguments in
CREATE AGGREGATE, especially not the WITHIN GROUP (*) special case.
If we stick with that we're going to need to touch a lot more places than
you have, notably regprocedure.  I also fear that this syntax is not
appropriate for identifying aggregates reliably, ie, aggregate argument
lists that look different in this syntax could reduce to identical
pg_proc.proargs lists, and perhaps vice versa.

I think we should just have it list the arguments as they'd appear in
pg_proc, and rely on aggregate properties (to wit, aggkind and
aggndirectargs) to identify ordered-set and hypothetical aggregates.

A slightly different question is what \da ought to print.  I can't
say I think that (VARIADIC any) WITHIN GROUP (*) is going to prove
very helpful to users, but probably just (VARIADIC any) isn't
going to do either, at least not unless we add an aggregate-kind
column to the printout, and maybe not even then.  It might work to
cheat by duplicating the last item if it's variadic:
  (..., VARIADIC any) WITHIN GROUP (VARIADIC any)
while if it's not variadic, we'd have to work out which argument
positions correspond to the ordered-set arguments and put them
in the right places.

Regardless of that, though ... what is the reasoning for inventing
pg_get_aggregate_arguments() rather than just making
pg_get_function_arguments() do the right thing?  The separate function
seems to accomplish little except complicating life for clients, eg in
psql's describe.c.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom Regardless of that, though ... what is the reasoning for
 Tom inventing pg_get_aggregate_arguments() rather than just making
 Tom pg_get_function_arguments() do the right thing?

pg_get_function_arguments()'s interface assumes that the caller is
providing the enclosing parens. Changing it would have meant returning
a result like 'float8) WITHIN GROUP (float8' which I reckoned would
have too much chance of breaking existing callers.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom Regardless of that, though ... what is the reasoning for
  Tom inventing pg_get_aggregate_arguments() rather than just making
  Tom pg_get_function_arguments() do the right thing?

 pg_get_function_arguments()'s interface assumes that the caller is
 providing the enclosing parens. Changing it would have meant returning
 a result like 'float8) WITHIN GROUP (float8' which I reckoned would
 have too much chance of breaking existing callers.

Well, as it stands, existing callers are broken a fortiori; they can't
possibly produce the right output for an ordered-set aggregate, if we
define the right output as looking like that.  Of course, if we define
the right output as being just the arguments according to pg_proc, it's
fine.  But your point about the parens is a good one anyway, because now
that I think about it, what \da has traditionally printed is

 pg_catalog | sum  | bigint   | integer | sum as bigint acro
ss all integer input values

and I see the patch makes it do this

 pg_catalog | sum  | bigint   | (integer)   | sum as bigint acro

which I cannot find to be an improvement, especially since \df does
not look like that.  (As patched, the output of \df ran* is positively
schizophrenic.)

One possibility is to forget all the parens and say that the display
looks like

  type1, type2 WITHIN GROUP type3, type4

Please don't object that that doesn't look exactly like the syntax
for calling the function, because it doesn't anyway --- remember you
also need ORDER BY in the call.

Or perhaps we could abbreviate that --- maybe just WITHIN?  Abbreviation
would be a good thing, especially if we're going to say 'VARIADIC any'
twice in common cases.  OTOH I'm not sure we can make that work as a
declaration syntax without reserving WITHIN.  I think WITHIN GROUP would
work, though I've not tried to see if bison likes it.

Anyway, the long and the short of this is that the SQL committee's bizarre
choice of syntax for calling these functions should not be followed too
slavishly when declaring them.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
I wrote:
 One possibility is to forget all the parens and say that the display
 looks like
   type1, type2 WITHIN GROUP type3, type4
 Please don't object that that doesn't look exactly like the syntax
 for calling the function, because it doesn't anyway --- remember you
 also need ORDER BY in the call.

Actually, now that I think of it, why not use this syntax for declaration
and display purposes:
type1, type2 ORDER BY type3, type4
This has nearly as much relationship to the actual calling syntax as the
WITHIN GROUP proposal does, and it's hugely saner from a semantic
standpoint, because after all the ordering columns are ordering columns,
not grouping columns.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  Please don't object that that doesn't look exactly like the syntax
  for calling the function, because it doesn't anyway --- remember
  you also need ORDER BY in the call.

 Tom Actually, now that I think of it, why not use this syntax for
 Tom declaration and display purposes:

 Tom   type1, type2 ORDER BY type3, type4

 Tom This has nearly as much relationship to the actual calling
 Tom syntax as the WITHIN GROUP proposal does,

But unfortunately it looks exactly like the calling sequence for a
normal aggregate with an order by clause - I really think that is
potentially too much confusion. (It's one thing not to look like
the calling syntax, it's another to look exactly like a valid
calling sequence for doing something _different_.)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread David Johnston
Andrew Gierth wrote
 Tom == Tom Lane lt;

 tgl@.pa

 gt; writes:
 
   Please don't object that that doesn't look exactly like the syntax
   for calling the function, because it doesn't anyway --- remember
   you also need ORDER BY in the call.
 
  Tom Actually, now that I think of it, why not use this syntax for
  Tom declaration and display purposes:
 
  Tom type1, type2 ORDER BY type3, type4
 
  Tom This has nearly as much relationship to the actual calling
  Tom syntax as the WITHIN GROUP proposal does,
 
 But unfortunately it looks exactly like the calling sequence for a
 normal aggregate with an order by clause - I really think that is
 potentially too much confusion. (It's one thing not to look like
 the calling syntax, it's another to look exactly like a valid
 calling sequence for doing something _different_.)
 
 -- 
 Andrew (irc:RhodiumToad)

How about:

type1, type2 GROUP ORDER type3, type4

OR

GROUP type1, type2 ORDER type3, type4

David J.






--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Re-WITHIN-GROUP-patch-tp5773851p5782202.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom Actually, now that I think of it, why not use this syntax for
  Tom declaration and display purposes:
  Tom type1, type2 ORDER BY type3, type4

 But unfortunately it looks exactly like the calling sequence for a
 normal aggregate with an order by clause - I really think that is
 potentially too much confusion.

I thought about that too, but really that ship sailed long ago, and it
went to sea under the SQL committee's captaincy, so it's not our fault.
There are already at least four different standards-blessed ways you can
use ORDER BY in a query, some of them quite nearby (eg window functions);
so the potential for confusion is there no matter what we do.  In this
case, if we describe ordered-set aggregates using WITHIN GROUP rather than
ORDER BY, we might avoid confusion with the normal-aggregate case, but
instead we will have confusion about what the arguments even do.  Is
semantic confusion better than syntactic confusion?

Another thing to think about here is to wonder why the committee chose
anything as verbose as agg(...) WITHIN GROUP (ORDER BY ...) in the
first place.  The words ORDER BY certainly seem pretty unnecessary.
I'm suspicious that they might've been leaving the door open to put other
things into the second set of parens later --- GROUP BY, maybe?  So down
the road, we might regret it if we key off WITHIN GROUP and not ORDER BY.

Having said that, I'm not so dead set on it that I won't take WITHIN GROUP
if that's what more people want.  But we gotta lose the extra parens; they
are just too strange for function declaration/documentation purposes.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  pg_get_function_arguments()'s interface assumes that the caller is
  providing the enclosing parens. Changing it would have meant
  returning a result like 'float8) WITHIN GROUP (float8' which I
  reckoned would have too much chance of breaking existing callers.

 Tom Well, as it stands, existing callers are broken a fortiori;

Not in most cases, because using the output of
pg_get_function_arguments works in all contexts except for
constructing the CREATE AGGREGATE statement (for example, a function
that uses the pg_get_function_arguments output to construct GRANTs
or ALTER OWNERs would still work).

And since constructing a correct CREATE AGGREGATE statement for an
ordered set function requires that the caller know about the
additional options to supply in the body, this does not seem like a
restriction.

 Tom Of course, if we define the right output as being just the
 Tom arguments according to pg_proc, it's fine.

The patch accepts the 'just the arguments according to pg_proc' as
input everywhere except in CREATE AGGREGATE, in addition to the
syntax with explicit WITHIN GROUP.

(With the exception of GRANT, as it happens, where since there is no
existing GRANT ON AGGREGATE variant and the patch doesn't add one,
only the pg_proc arguments form is accepted.)

 Tom But your point about the parens is a good one anyway, because
 Tom now that I think about it, what \da has traditionally printed is

 Tom  pg_catalog | sum  | bigint   | integer | sum as 
bigint acro
 Tom ss all integer input values

 Tom and I see the patch makes it do this

 Tom  pg_catalog | sum  | bigint   | (integer)   | sum as 
bigint acro

 Tom which I cannot find to be an improvement, especially since \df
 Tom does not look like that.  (As patched, the output of \df ran*
 Tom is positively schizophrenic.)

Since I don't particularly trust my own judgement on aesthetics, I used
the feedback I got from others when deciding what to do. Frankly I think
this one needs wider input than just you and me arguing over it.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Josh Berkus
On 12/06/2013 01:30 PM, Andrew Gierth wrote:
 Since I don't particularly trust my own judgement on aesthetics, I used
 the feedback I got from others when deciding what to do. Frankly I think
 this one needs wider input than just you and me arguing over it.

Can someone paste examples of the two syntax alternatives we're talking
about here?  I've lost track.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom Another thing to think about here is to wonder why the committee chose
 Tom anything as verbose as agg(...) WITHIN GROUP (ORDER BY ...) in the
 Tom first place.  The words ORDER BY certainly seem pretty unnecessary.

All of the ordered-set functions that the spec defines are intimately tied
to ordering of values, and they allow things like DESC ordering to affect
the results, so it seems obvious to me that they used (ORDER BY ...) because
what follows is both syntactically and semantically similar to any other use
of ORDER BY. (Similar logic seems to have been used for FILTER (WHERE ...).)
(The name ordered set function also seems quite specific.)

So I don't think there's any greater chance of the spec people adding
a WITHIN GROUP (something_other_than_order_by) construct than of them
adding any other awkward piece of syntax - and if they did, it would
represent an entirely new class of aggregate functions, separate from
ordered set functions and no doubt with its own headaches.

(I realize that expecting sanity from the spec writers is perhaps unwise)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Can someone paste examples of the two syntax alternatives we're talking
 about here?  I've lost track.

I'll leave it to Andrew to describe/defend exactly what his patch is
doing.  The alternative I'm thinking about is that in CREATE AGGREGATE
as well as \da output, the argument list of an ordered-set aggregate
would look like

  type1, type2, ... ORDER BY type3, type4, ...

if the aggregate only permits a fixed number of ordering columns
(as mode() does for example).  If it permits a variable number of
ordering columns, you could have

  type1, type2, ... ORDER BY [ type3, type4, ... ] VARIADIC something

99% of the time the right-hand part would just be VARIADIC ANY
but if an aggregate had need to lock down the ordering column types,
or the leading ordering column types, it could do that.  If it needs
a variable number of direct arguments as well (which in particular
hypothetical set functions do) then you would write

  [ type1, type2, ... ] VARIADIC something ORDER BY VARIADIC something

where we constrain the two somethings to be the same.  (Again, these
would *usually* be ANY, but I can envision that an aggregate might want to
constrain the argument types more than that.)  We have to constrain this
case because the underlying pg_proc representation and parser function
lookup code only allow the last argument to be declared VARIADIC.  So
under the hood, this last case would be represented in pg_proc with
proargs looking like just [ type1, type2, ... ] VARIADIC something,
whereas in the first two cases the proargs representation would contain
all the same entries as above.

We could substitute something else for ORDER BY without doing a lot
of violence to this, if people are really down on that aspect.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
 Josh == Josh Berkus j...@agliodbs.com writes:

  Since I don't particularly trust my own judgement on aesthetics, I
  used the feedback I got from others when deciding what to
  do. Frankly I think this one needs wider input than just you and
  me arguing over it.

 Josh Can someone paste examples of the two syntax alternatives we're
 Josh talking about here?  I've lost track.

OK. The starting point is that this is the calling syntax for ordered
set funcs, set by the spec:

SELECT func(foo) WITHIN GROUP (ORDER BY bar) FROM ...

where foo and bar might be fixed types, or polymorphic or variadic.

So we need to define (with no help from the spec) at least these:

  - what syntax is used in CREATE AGGREGATE to specify the number and
types of parameters for a newly defined func

  - what syntax is used to refer to the function in a
GRANT ... ON FUNCTION ...  statement, or ALTER ... OWNER TO ...

  - what should ::regprocedure casts accept as input and produce as
output

  - what output is shown in \df and \da when listing the function's
argument types

The patch as submitted answers those questions as follows:

CREATE AGGREGATE func(integer) WITHIN GROUP (text) ...

GRANT ... ON FUNCTION func(integer,text) ...
(there is no GRANT ... ON AGGREGATE ... (yet))

ALTER AGGREGATE func(integer) WITHIN GROUP (text) OWNER TO ...
ALTER AGGREGATE func(integer,text) OWNER TO ...
ALTER FUNCTION func(integer,text) OWNER TO ...
(all three of those are equivalent)

regprocedure outputs 'func(integer,text)' and accepts only that as
input

postgres=# \df func
 List of functions
 Schema | Name | Result data type | Argument data types   | Type 
+--+--+---+--
 public | func | text | (integer) WITHIN GROUP (text) | agg
(1 row)

If there's also a func() that isn't an ordered set function, you get
output like this (which provoked tom's schitzophrenic comment):

postgres=# \df ran[a-z]{1,5}
   List of functions
   Schema   |   Name   | Result data type |Argument data types| 
 Type  
+--+--+---+
 pg_catalog | random   | double precision |   | 
normal
 pg_catalog | rangesel | double precision | internal, oid, internal, integer  | 
normal
 pg_catalog | rank | bigint   |   | 
window
 pg_catalog | rank | bigint   | (VARIADIC any) WITHIN GROUP (*) | 
agg
(4 rows)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 The patch as submitted answers those questions as follows:

 CREATE AGGREGATE func(integer) WITHIN GROUP (text) ...

You've glossed over a significant amount of complexity, as shown by your
example that prints WITHIN GROUP (*), a syntax that you've not defined
here.

In the long run we might think it worthwhile to actually store two
separate arglists for ordered-set aggregates; probably, pg_proc.proargs
would just describe the direct arguments and there'd be a second oidvector
in pg_aggregate that would describe the ORDER BY arguments.  This'd let
them be independently VARIADIC, or not.  I'm not proposing we do that
right now, because we don't have any use-cases that aren't sufficiently
handled by the hack of letting a single VARIADIC ANY entry cover both sets
of arguments.  I'd like though that the external syntax not be something
that prevents that from ever happening, and I'm afraid that this (*)
business is cheating enough to be a problem.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom 2. For an ordered set function, n must equal aggnfixedargs.  We
 Tom treat all n fixed arguments as contributing to the aggregate's
 Tom result collation, but ignore the sort arguments.

  That doesn't work for getting a sensible collation out of
  percentile_disc applied to a collatable type. (Which admittedly is
  an extension to the spec, which allows only numeric and interval,
  but it seems to me to be worth having.)

 Tom Meh.  I don't think you can have that and also have the behavior
 Tom that multiple ORDER BY items aren't constrained to have the same
 Tom collation; at least not without some rule that amounts to a
 Tom special case for percentile_disc, which I'd resist.

What the submitted patch does (as discussed in the comment in
parse_collate) is to treat the sort argument as contributing to the
collation only if there is exactly one sort arg.

Consider a construct like:

select max(common_val)
  from (select mode() within group (order by textcol) as common_val
  from ... group by othercol) s;

(the same arguments for percentile_disc also apply to mode() and to
any other ordered set function that returns a value chosen from its
input sorted set)

Having this sort of thing not preserve the collation of textcol (or
fail) would be, IMO, surprising and undesirable.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom Meh.  I don't think you can have that and also have the behavior
  Tom that multiple ORDER BY items aren't constrained to have the same
  Tom collation; at least not without some rule that amounts to a
  Tom special case for percentile_disc, which I'd resist.

 What the submitted patch does (as discussed in the comment in
 parse_collate) is to treat the sort argument as contributing to the
 collation only if there is exactly one sort arg.

Blech :-(

Not wanting to consider the sort args when there's more than one doesn't
square with forcing them to be considered when there's just one.
It's the same aggregate after all, and in general the semantics ought
to be the same whether you give one sort col or three.

We might be able to make this work sanely by considering only argument
positions that were declared something other than ANY (anyelement being
other than that, so this isn't leaving polymorphics in the cold entirely).
This is a bit unlike the normal rules for collation assignment but
it doesn't result in weird semantics changes depending on how many sort
columns you supply.

 Consider a construct like:

 select max(common_val)
   from (select mode() within group (order by textcol) as common_val
   from ... group by othercol) s;

AFAICT none of the SQL-spec aggregates expose the kind of case I'm worried
about, because none of the ones that can take multiple sort columns have a
potentially collatable return type.  I don't think that justifies not
thinking ahead, though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-06 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom Not wanting to consider the sort args when there's more than one
 Tom doesn't square with forcing them to be considered when there's
 Tom just one.  It's the same aggregate after all,

This logic is only applied in the patch to aggregates that _aren't_
hypothetical.

(thinking out loud:) It might be more consistent to also add the
condition that the single sort column not be a variadic arg. And/or
the condition that it be the same type as the result. Or have a flag
in pg_aggregate to say this agg returns one of its sorted input
values, so preserve the collation.

  Consider a construct like:

  select max(common_val)
  from (select mode() within group (order by textcol) as common_val
  from ... group by othercol) s;

 Tom AFAICT none of the SQL-spec aggregates expose the kind of case
 Tom I'm worried about, because none of the ones that can take
 Tom multiple sort columns have a potentially collatable return type.

None of the spec's ordered-set functions expose any collation issue at
all, because they _all_ have non-collatable return types, period.

The problem only arises from the desire to make functions like
percentile_disc and mode applicable to collatable types.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-05 Thread Tom Lane
Further questions about WITHIN GROUP:

I believe that the spec requires that the direct arguments of an inverse
or hypothetical-set aggregate must not contain any Vars of the current
query level.  They don't manage to say that in plain English, of course,
but in the hypothetical set function case the behavior is defined in
terms of this sub-select:

( SELECT 0, SK1, ..., SKK
  FROM TNAME UNION ALL
  VALUES (1, VE1, ..., VEK) )

where SKn are the sort key values, TNAME is an alias for the input table,
and VEn are the direct arguments.  Any input-table Vars the aggregate
might refer to are thus in scope only for the SKn not the VEn.  (This
definition also makes it clear that the VEn are to be evaluated once,
not once per row.)  In the inverse distribution function case, they
resort to claiming that the value of the inverse distribution function
argument can be replaced by a numeric literal, which again makes it clear
that it's supposed to be evaluated just once.

So that's all fine, but the patch seems a bit confused about it.  If you
try to cheat, you get an error message that's less than apropos:

regression=# select cume_dist(f1) within group(order by f1) from text_tbl ;
ERROR:  column text_tbl.f1 must appear in the GROUP BY clause or be used in 
an aggregate function
LINE 1: select cume_dist(f1) within group(order by f1) from text_tbl...
 ^

I'd have hoped for a message along the line of fixed arguments of
cume_dist() must not contain variables, similar to the phrasing we
use when you try to put a variable in LIMIT.

Also, there are some comments and code changes in nodeAgg.c that seem
to be on the wrong page here:
  
+   /*
+* Use the representative input tuple for any references to
+* non-aggregated input columns in the qual and tlist.(If we are not
+* grouping, and there are no input rows at all, we will come here
+* with an empty firstSlot ... but if not grouping, there can't be any
+* references to non-aggregated input columns, so no problem.)
+* We do this before finalizing because for ordered set functions,
+* finalize_aggregates can evaluate arguments referencing the tuple.
+*/
+   econtext-ecxt_outertuple = firstSlot;
+ 

The last two lines of that comment are new, all the rest was moved from
someplace else.  Those last two lines are wrong according to the above
reasoning, and if they were correct the argument made in the pre-existing
part of the comment would be all wet, meaning the code could in fact crash
when trying to evaluate a Var reference in finalize_aggregates.

So I'm inclined to undo this part of the patch (and anything else that's
rearranging logic with an eye to Var references in finalize_aggregates),
and to try to fix parse_agg.c so it gives a more specific error message
for this case.

After looking at this I'm a bit less enthused about the notion of hybrid
aggregates than I was before.  I now see that the spec intends that when
the WITHIN GROUP notation is used, *all* the arguments to the left of
WITHIN GROUP are meant to be fixed evaluate-once arguments.  While we
could possibly define aggregates for which some of those argument
positions are treated as evaluate-once-per-row arguments, I'm realizing
that that'd likely be pretty darn confusing for users.

What I now think we should do about the added pg_aggregate columns is
to have:

aggnfixedargs   int number of fixed arguments, excluding any
hypothetical ones (always 0 for normal aggregates)
aggkind char'n' for normal aggregate, 'o' for ordered set function,
'h' for hypothetical-set function

with the parsing rules that given

 agg( n fixed arguments ) WITHIN GROUP ( ORDER BY k sort specifications )

1. WITHIN GROUP is disallowed for normal aggregates.

2. For an ordered set function, n must equal aggnfixedargs.  We treat all
n fixed arguments as contributing to the aggregate's result collation,
but ignore the sort arguments.

3. For a hypothetical-set function, n must equal aggnfixedargs plus k,
and we match up type and collation info of the last k fixed arguments
with the corresponding sort arguments.  The first n-k fixed arguments
contribute to the aggregate's result collation, the rest not.

Reading back over this email, I see I've gone back and forth between
using the terms direct args and fixed args for the evaluate-once
stuff to the left of WITHIN GROUP.  I guess I'm not really sold on
either terminology, but we need something we can use consistently
in the code and docs.  The spec is no help, it has no generic term
at all for these args.  Does anybody else have a preference, or
maybe another suggestion entirely?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-05 Thread David Johnston
Tom Lane-2 wrote
 Further questions about WITHIN GROUP:
 
 I believe that the spec requires that the direct arguments of an inverse
 or hypothetical-set aggregate must not contain any Vars of the current
 query level.  They don't manage to say that in plain English, of course,
 but in the 
 hypothetical set function
  case the behavior is defined in
 terms of this sub-select:
 
   ( SELECT 0, SK1, ..., SKK
 FROM TNAME UNION ALL
 VALUES (1, VE1, ..., VEK) )
 
 where SKn are the sort key values, TNAME is an alias for the input table,
 and VEn are the direct arguments.  Any input-table Vars the aggregate
 might refer to are thus in scope only for the SKn not the VEn.  (This
 definition also makes it clear that the VEn are to be evaluated once,
 not once per row.)  In the 
 inverse distribution function
  case, they
 resort to claiming that the value of the 
 inverse distribution function
 argument
  can be replaced by a numeric literal, which again makes it clear
 that it's supposed to be evaluated just once.
 
 So that's all fine, but the patch seems a bit confused about it.  If you
 try to cheat, you get an error message that's less than apropos:
 
 regression=# select cume_dist(f1) within group(order by f1) from text_tbl
 ;
 ERROR:  column text_tbl.f1 must appear in the GROUP BY clause or be used
 in an aggregate function
 LINE 1: select cume_dist(f1) within group(order by f1) from text_tbl...
  ^
 
 I'd have hoped for a message along the line of fixed arguments of
 cume_dist() must not contain variables, similar to the phrasing we
 use when you try to put a variable in LIMIT.
 
 Also, there are some comments and code changes in nodeAgg.c that seem
 to be on the wrong page here:
   
 +   /*
 +* Use the representative input tuple for any references to
 +* non-aggregated input columns in the qual and tlist.(If we
 are not
 +* grouping, and there are no input rows at all, we will come here
 +* with an empty firstSlot ... but if not grouping, there can't be
 any
 +* references to non-aggregated input columns, so no problem.)
 +* We do this before finalizing because for ordered set functions,
 +* finalize_aggregates can evaluate arguments referencing the
 tuple.
 +*/
 +   econtext-ecxt_outertuple = firstSlot;
 + 
 
 The last two lines of that comment are new, all the rest was moved from
 someplace else.  Those last two lines are wrong according to the above
 reasoning, and if they were correct the argument made in the pre-existing
 part of the comment would be all wet, meaning the code could in fact crash
 when trying to evaluate a Var reference in finalize_aggregates.
 
 So I'm inclined to undo this part of the patch (and anything else that's
 rearranging logic with an eye to Var references in finalize_aggregates),
 and to try to fix parse_agg.c so it gives a more specific error message
 for this case.

The original design references the spec as allowing a column reference if it
is a group by key:

Select cume_dist(f1) within group ( order by f1 ) from text_tbl group by f1

No example was shown where this would be useful...but nonetheless the
comment and error both presume that such is true.

I would presume the implementation would only supply rows for sorting from
the single group in question so for practical purposes the columns have a
constant value for the entirety of the evaluation and so this makes sense
and acts just like partition by on a window aggregate.

Question, are there any tests that exercise this desired behavior?  That
assuming agreement can be had that the group by behavior is indeed spec or
something custom we want to support.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Re-WITHIN-GROUP-patch-tp5773851p5782041.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-05 Thread Tom Lane
David Johnston pol...@yahoo.com writes:
 The original design references the spec as allowing a column reference if it
 is a group by key:

 Select cume_dist(f1) within group ( order by f1 ) from text_tbl group by f1

 No example was shown where this would be useful...but nonetheless the
 comment and error both presume that such is true.

I can see no support for that position in the spec text, and as you say,
the usefulness is at best really debatable.  Unless somebody can present a
credible use-case, or convince me that this is indeed what the spec says,
I'd be inclined to blow this off in favor of having a more intelligible
error message.  (I think if you really did need such functionality, you
could get it with a sub-select.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-04 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom Well, okay, but you've not said anything that wouldn't be
 Tom handled just as well by some logic that adds a fixed
 Tom integer-constant-zero flag column to the rows going into the
 Tom tuplesort.

Adding such a column unconditionally even for non-hypothetical
functions would break the optimization for sorting a single column
(which is a big deal, something like 3x speed difference, for by-value
types).

Adding it only for hypothetical set functions is making a distinction
in how functions are executed that I don't think is warranted -
imagine for example a function that calculates some measure over a
frequency distribution by adding a known set of boundary values to the
sort; this would not be a hypothetical set function in terms of
argument processing, but it would still benefit from the extra sort
column. I did not want to unnecessarily restrict such possibilities.

  It would still be overloaded in some sense because a non-hypothetical
  ordered set function could still take an arbitrary number of args
  (using variadic any) - there aren't any provided, but there's no
  good reason to disallow user-defined functions doing that - so you'd
  still need a special value like -1 for aggordnargs to handle that.

 Tom Sure.  But a -1 to indicate not applicable doesn't seem like it's
 Tom too much of a stretch.  It's the -2 business that's bothering me.
 Tom Again, that seems unnecessarily non-orthogonal --- who's to say which
 Tom functions would want to constrain the number of direct arguments and
 Tom which wouldn't?  (I wonder whether having this info in the catalogs
 Tom isn't the wrong thing anyhow, as opposed to expecting the functions
 Tom themselves to check the argument count at runtime.)

Not checking the number of arguments to a function until runtime seems
a bit on the perverse side. Having a fixed number of direct args is
the normal case (as seen from the fact that all the non-hypothetical
ordered set functions in the spec and in our patch have fixed argument
counts).

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-04 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom Well, okay, but you've not said anything that wouldn't be
  Tom handled just as well by some logic that adds a fixed
  Tom integer-constant-zero flag column to the rows going into the
  Tom tuplesort.

 Adding such a column unconditionally even for non-hypothetical
 functions would break the optimization for sorting a single column
 (which is a big deal, something like 3x speed difference, for by-value
 types).

Well, sure, but I was only suggesting adding it when the aggregate asks
for it, probably via a new flag column in pg_aggregate.  The question
you're evading is what additional functionality could be had if the
aggregate could demand a different datatype or constant value for the
flag column.

 Adding it only for hypothetical set functions is making a distinction
 in how functions are executed that I don't think is warranted -

That seems like rather a curious argument from someone who's willing to
give up the ability to specify a regular transition value concurrently
with the flag column.

But anyway, what I'm thinking right now is that these questions would all
go away if the aggregate transfunction were receiving the rows and
sticking them into the tuplestore.  It could add whatever columns it felt
like.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-04 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom Well, sure, but I was only suggesting adding it when the
 Tom aggregate asks for it, probably via a new flag column in
 Tom pg_aggregate.

Sure, I was only pointing out the necessity.

 Tom The question you're evading is what additional functionality
 Tom could be had if the aggregate could demand a different datatype
 Tom or constant value for the flag column.

I don't really see a question there to answer - I simply chose to
provide a general mechanism rather than make assumptions about what
future users of the code would desire. I have no specific application
in mind that would require some other type.

  Adding it only for hypothetical set functions is making a
  distinction in how functions are executed that I don't think is
  warranted -

 Tom That seems like rather a curious argument from someone who's
 Tom willing to give up the ability to specify a regular transition
 Tom value concurrently with the flag column.

In the current patch the idea of also specifying a regular transition
value is meaningless since there is no transition function.

 Tom But anyway, what I'm thinking right now is that these questions
 Tom would all go away if the aggregate transfunction were receiving
 Tom the rows and sticking them into the tuplestore.  It could add
 Tom whatever columns it felt like.

True, but this ends up duplicating the sorting functionality of
nodeAgg that we are leveraging off in the first place. I think this
will be somewhat more intrusive and likely slower.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-04 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom But anyway, what I'm thinking right now is that these questions
  Tom would all go away if the aggregate transfunction were receiving
  Tom the rows and sticking them into the tuplestore.  It could add
  Tom whatever columns it felt like.

 True, but this ends up duplicating the sorting functionality of
 nodeAgg that we are leveraging off in the first place. I think this
 will be somewhat more intrusive and likely slower.

Hm, it's just a refactoring of the same code we'd have to have anyway,
so I'm not seeing a reason to assume it'd be slower.  If anything,
this approach would open more opportunities for function-specific
optimizations, which in the long run could be faster.  (I'm not
claiming that any such optimizations would be in the first version.)

In hindsight I wonder if it wasn't a mistake to embed ordered-aggregate
support in nodeAgg.c the way we did.  We could have dumped that
responsibility into some sort of wrapper around specific aggregates,
with an option for some aggregates to skip the wrapper and handle it
themselves.  A trivial, and perhaps not very useful, example is that
non-order-sensitive aggregates like MIN/MAX/COUNT could have been coded
to simply ignore any ordering request.  I can't immediately think of any
examples that are compelling enough to justify such a refactoring now ---
unless it turns out to make WITHIN GROUP easier.

Anyway, I'm going to go off and look at the WITHIN GROUP patch with these
ideas in mind.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-03 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 Please find attached the latest patch for WITHIN GROUP. This patch is
 after fixing the merge conflicts.

I've started to look at this patch now.  I have a couple of immediate
reactions to the catalog changes:

1. I really hate the way you've overloaded the transvalue to do something
that has approximately nothing to do with transition state (and haven't
updated catalogs.sgml to explain that, either).  Seems like it'd be
cleaner to just hardwire a bool column that distinguishes regular and
hypothetical input rows.  And why do you need aggtranssortop for that?
I fail to see the point of sorting on the flag column.

2 I also don't care for the definition of aggordnargs, which is the number
of direct arguments to an ordered set function, except when it isn't.
Rather than overloading it to be both a count and a couple of flags,
I wonder whether we shouldn't expand aggisordsetfunc to be a three-way
aggregate kind field (ordinary agg, ordered set, or hypothetical set
function).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-03 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom 1. I really hate the way you've overloaded the transvalue to do
 Tom something that has approximately nothing to do with transition
 Tom state (and haven't updated catalogs.sgml to explain that,
 Tom either).  Seems like it'd be cleaner to just hardwire a bool
 Tom column that distinguishes regular and hypothetical input rows.

The intention here is that while the provided functions all fit the
spec's idea of how inverse distribution or hypothetical set functions
work, the actual implementation mechanisms are more generally
applicable than that and a user-supplied function could well find
something else useful to do with them. Accordingly, hardcoding stuff
seemed inappropriate.

 Tom And why do you need aggtranssortop for that?  I fail to see the
 Tom point of sorting on the flag column.

It is convenient to the implementation to be able to rely on
encountering the hypothetical row deterministically before (or in some
cases after, as in cume_dist) its peers in the remainder of the sort
order. Removing that sort would make the results of the functions
incorrect.

There should probably be some comments about that. oops.

 Tom 2 I also don't care for the definition of aggordnargs, which is
 Tom the number of direct arguments to an ordered set function,
 Tom except when it isn't.  Rather than overloading it to be both a
 Tom count and a couple of flags, I wonder whether we shouldn't
 Tom expand aggisordsetfunc to be a three-way aggregate kind field
 Tom (ordinary agg, ordered set, or hypothetical set function).

It would still be overloaded in some sense because a non-hypothetical
ordered set function could still take an arbitrary number of args
(using variadic any) - there aren't any provided, but there's no
good reason to disallow user-defined functions doing that - so you'd
still need a special value like -1 for aggordnargs to handle that.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-12-03 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom 1. I really hate the way you've overloaded the transvalue to do
  Tom something that has approximately nothing to do with transition
  Tom state (and haven't updated catalogs.sgml to explain that,
  Tom either).  Seems like it'd be cleaner to just hardwire a bool
  Tom column that distinguishes regular and hypothetical input rows.

 The intention here is that while the provided functions all fit the
 spec's idea of how inverse distribution or hypothetical set functions
 work, the actual implementation mechanisms are more generally
 applicable than that and a user-supplied function could well find
 something else useful to do with them. Accordingly, hardcoding stuff
 seemed inappropriate.

  Tom And why do you need aggtranssortop for that?  I fail to see the
  Tom point of sorting on the flag column.

 It is convenient to the implementation to be able to rely on
 encountering the hypothetical row deterministically before (or in some
 cases after, as in cume_dist) its peers in the remainder of the sort
 order. Removing that sort would make the results of the functions
 incorrect.

Well, okay, but you've not said anything that wouldn't be handled just as
well by some logic that adds a fixed integer-constant-zero flag column to
the rows going into the tuplesort.  The function-specific code could then
inject hypothetical row(s) with other values, eg 1 or -1, to get the
results you describe.  If there's any flexibility improvement from
allowing a different constant value than zero, or a different datatype
than integer, I don't see what it'd be.

On the other side of the coin, repurposing the transition-value catalog
infrastructure to do this totally different thing is confusing.  And what
if someday somebody has a use for a regular transition value along with
this stuff?  What you've done is unorthogonal for no very good reason.

(Actually, I'm wondering whether it wouldn't be better if the tuplesort
object *were* the transition value, and were managed primarily by the
aggregate function's code; in particular expecting the agg's transition
function to insert rows into the tuplesort object.  We could provide
helper functions to largely negate any duplication-of-code objections,
I'd think.  In this view the WITHIN GROUP columns aren't so different from
regular aggregate arguments, but there'd need to be some way for the
aggregate function to get hold of the sorting-semantics decoration on
them.)

  Tom 2 I also don't care for the definition of aggordnargs, which is
  Tom the number of direct arguments to an ordered set function,
  Tom except when it isn't.  Rather than overloading it to be both a
  Tom count and a couple of flags, I wonder whether we shouldn't
  Tom expand aggisordsetfunc to be a three-way aggregate kind field
  Tom (ordinary agg, ordered set, or hypothetical set function).

 It would still be overloaded in some sense because a non-hypothetical
 ordered set function could still take an arbitrary number of args
 (using variadic any) - there aren't any provided, but there's no
 good reason to disallow user-defined functions doing that - so you'd
 still need a special value like -1 for aggordnargs to handle that.

Sure.  But a -1 to indicate not applicable doesn't seem like it's
too much of a stretch.  It's the -2 business that's bothering me.
Again, that seems unnecessarily non-orthogonal --- who's to say which
functions would want to constrain the number of direct arguments and
which wouldn't?  (I wonder whether having this info in the catalogs
isn't the wrong thing anyhow, as opposed to expecting the functions
themselves to check the argument count at runtime.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-11-21 Thread Andrew Gierth
 Vik == Vik Fearing vik.fear...@dalibo.com writes:

 Vik I certainly want it.  I do not have a copy of the SQL standard,
 Vik but I have full faith in the Andrew Gierth's claim that this is
 Vik part of it.

For reference, this is how I believe it matches up against the spec
(I'm working from the 2008 final):

10.9 aggregate function:

  hypothetical set function is intended to be implemented in this
  patch exactly as per spec.

  inverse distribution function: the spec defines two of these,
  PERCENTILE_CONT and PERCENTILE_DISC:

  PERCENTILE_CONT is defined in the spec for numeric types, in which
  case it returns an approximate numeric result, and for interval, in
  which case it returns interval. Our patch defines percentile_cont
  functions for float8 and interval input types, relying on implicit
  casting to float8 to handle other numeric input types.

  As an extension to the spec, we define a percentile_cont function
  that returns an array of percentile values in one call, as suggested
  by some users.

  PERCENTILE_DISC is defined in the spec for the same types as _CONT.
  Our version on the other hand accepts any type which can be sorted,
  and returns the same type. This does mean that our version may return
  an exact numeric type rather than an approximate one, so this is a
  possible slight deviation from the spec.

  i.e. our  percentile_disc(float8) within group (order by bigint)
  returns a bigint, while the spec seems to imply it should return an
  approximate numeric type (i.e. float*).

  Again, we additionally provide an array version which is not in the
  spec.

  mode() is not in the spec, we just threw it in because it was easy.

6.10 window function

  The spec says that hypothetical set function is not allowed as a
  window function.

  The spec does not forbid other ordered set functions in a window
  function call, but we have NOT attempted to implement this (largely
  for the same reasons that DISTINCT and ORDER BY are not implemented
  for aggregates as window functions).

Conformance:  all the relevant features are parts of T612, Advanced
OLAP Operations, which we already list in the docs on the unsupported
list with the comment some forms supported. Maybe that could be
changed now to most forms supported, but that's a subjective call
(and one we didn't really consider doing in this patch).

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-11-21 Thread Vik Fearing
On 11/21/2013 11:04 AM, Atri Sharma wrote:
 Please find attached the latest patch for WITHIN GROUP. This patch is
 after fixing the merge conflicts.

I have spent quite some time on this and the previous versions.  Here is
my review, following the questions on the wiki.

This patch is in context diff format and applies cleanly to today's
master.  It contains several regression tests and for the most part,
good documentation.  I would like to see at least one example of using
each of the two types of function (hypothetical and inverted
distribution) in section 4.2.8.

This patch implements what it says it does.  We don't already have a way
to get these results without this patch that I know of, and I think we
do want it.  I certainly want it.  I do not have a copy of the SQL
standard, but I have full faith in the Andrew Gierth's claim that this
is part of it.  Even if not, implementation details were brought up
during design and agreed upon by this list[1].  I don't see how anything
here could be dangerous.  The custom ordered set functions I made
correctly passed a round-trip through dump/restore.

The code compiles without warning.  All of the clean tests I did worked
as expected, and all of the dirty tests failed elegantly.

I did not find any corner cases, and I looked in as many corners as I
could think of.  I didn't manage to trigger any assertion failures and I
didn't crash it.

I found no noticeable issues with performance, either directly or as
side effects.

I am not the most competent with code review so I'll be relying on
further review by another reviewer or the final committer.  The patch
fixed the project comments/messages style issues I raised in my previous
review.  I found the code comments lacking in some places (like
inversedistribution.c:mode_final for example) but I can't say if the
really is too terse, or if it's just me.  On the other hand, I thought
the explanation blocks in the code comments were adequately descriptive.

There is some room for improvement in future versions.  The query select
mode() within group (order by x) over (partition by y) from ... should
be valid but isn't.  This was explained by Andrew on IRC as being
non-trivial: specifically, we implemented WITHIN GROUP by repurposing
the infrastructure already present for agg(distinct ...) and agg(x order
by y) which are also not yet supported for
aggregate-as-window-function.  I assume then that evolution on one side
will benefit the other.

All in all, I believe this is ready for committer.

[1]
http://www.postgresql.org/message-id/2b8b55b8ba82f83ef4e6070b95fb0cd0%40news-out.riddles.org.uk

-- 
Vik



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-11-17 Thread Peter Eisentraut
On Fri, 2013-11-15 at 00:05 +0530, Atri Sharma wrote:
 Please find the latest version of the patch. This version fixes the
 issues pointed out by the reviewer and the divide by zero bug in
 percent_rank function. This version also adds a regression test for
 the divide by zero case in percent_rank.

This patch doesn't apply.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-11-04 Thread Vik Fearing
On 11/04/2013 08:43 AM, Atri Sharma wrote:
 Please find attached our latest version of the patch. This version
 fixes the issues pointed out by the reviewers.

No, it doesn't.  The documentation still contains formatting and
grammatical errors, and the code comments still do not match the their
surroundings.  (The use of I in the code comments is a point I have
conceded on IRC, but I stand by my other remarks.)

Don't bother submitting a new patch until I've posted my technical
review, but please fix these issues on your local copy.

-- 
Vik



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-10-15 Thread Vik Fearing
On 10/09/2013 04:19 PM, Pavel Stehule wrote:

 I checked a conformance with ANSI SQL - and I didn't find any issue.

 I found so following error message is not too friendly (mainly
 because this functionality will be new)

 postgres=# select dense_rank(3,3,2) within group (order by num
 desc, odd) from test4;
 ERROR:  Incorrect number of arguments for hypothetical set function
 LINE 1: select dense_rank(3,3,2) within group (order by num desc,
 od...
^
 postgres=# select dense_rank(3,3,2) within group (order by num
 desc) from test4;
 ERROR:  Incorrect number of arguments for hypothetical set function
 LINE 1: select dense_rank(3,3,2) within group (order by num desc)
 fr...
^
 postgres=# select dense_rank(3,3) within group (order by num desc)
 from test4;
 ERROR:  Incorrect number of arguments for hypothetical set function
 LINE 1: select dense_rank(3,3) within group (order by num desc)
 from...
^
 postgres=# select dense_rank(3,3) within group (order by num desc,
 num) from test4;
  dense_rank
 
   3
 (1 row)

 Probably some hint should be there?



In addition to Pavel's review, I have finally finished reading the
patch. Here are some notes, mainly on style:

First of all, it no longer compiles on HEAD because commit
4d212bac1752e1bad6f3aa6242061c393ae93a0a stole oid 3968.  I modified
that locally to be able to continue my review.

Some of the error messages do not comply with project style.  That is,
they begin with a capital letter.

Ordered set functions cannot have transition functions
Ordered set functions must have final functions
Invalid argument types for hypothetical set function
Invalid argument types for ordered set function
Incompatible change to aggregate definition
Too many arguments to ordered set function
Ordered set finalfns must not be strict
Cannot have multiple ORDER BY clauses with WITHIN GROUP
Cannot have DISTINCT and WITHIN GROUP together
Incorrect number of arguments for hypothetical set function
Incorrect number of direct arguments to ordered set function %s

And in pg_aggregate.c I found a comment with a similar problem that
doesn't match its surrounding code:
Oidtranssortop = InvalidOid;  /* Can be omitted */

I didn't find any more examples like that, but I did see several block
comments that weren't complete sentences whereas I think they should
be.  Also a lot of the code comments say I and I don't recall seeing
that elsewhere.  I may be wrong, but I would prefer if they were more
neutral.

The documentation has a number of issues.

collateindex.pl complains of duplicated index entries for PERCENTILE
CONTINUOUS and PERCENTILE DISCRETE.  This is because the index markup is
used for both overloaded versions.  This is the same mistake Bruce made
and then corrected in commit 5dcc48c2c76cf4b2b17c8e14fe3e588ae0c8eff3.

if there are multiple equally good result should have an s on the end
in func.sgml.

Table 9-49 has an extra empty column.  That should either be removed, or
filled in with some kind of comment text like other similar tables.

Apart from that, it looks good.  There is some mismatched coding styles
in there but the next run of pgindent should catch them so it's no big deal.

I haven't yet exercised the actual functionality of the new functions,
nor have I tried to create my own.  Andrew alerted me to a division by
zero bug in one of them, so I'll be looking forward to catching that.

So, more review to come.

-- 
Vik



Re: [HACKERS] WITHIN GROUP patch

2013-10-11 Thread Pavel Stehule
2013/10/11 Andrew Gierth and...@tao11.riddles.org.uk

  Pavel == Pavel Stehule pavel.steh...@gmail.com writes:

   I found so following error message is not too friendly (mainly because
   this functionality will be new)
  
   postgres=# select dense_rank(3,3,2) within group (order by num desc,
 odd)
   from test4;
   ERROR:  Incorrect number of arguments for hypothetical set function
   LINE 1: select dense_rank(3,3,2) within group (order by num desc, od...
   ^
  
   Probably some hint should be there?

 Hm, yeah.

 Anyone have ideas for better wording for the original error message,
 or a DETAIL or HINT addition? The key point here is that the number of
 _hypothetical_ arguments and the number of ordered columns must match,
 but we don't currently exclude the possibility that a user-defined
 hypothetical set function might have additional non-hypothetical args.

 The first alternative that springs to mind is:

 ERROR: Incorrect number of arguments for hypothetical set function
 DETAIL: Number of hypothetical arguments (3) must equal number of ordered
 columns (2)

 +1

Pavel

 --
 Andrew (irc:RhodiumToad)



Re: [HACKERS] WITHIN GROUP patch

2013-10-11 Thread Robert Haas
On Thu, Oct 10, 2013 at 10:35 PM, Andrew Gierth
and...@tao11.riddles.org.uk wrote:
 The first alternative that springs to mind is:

 ERROR: Incorrect number of arguments for hypothetical set function
 DETAIL: Number of hypothetical arguments (3) must equal number of ordered 
 columns (2)

I'd suggest trying to collapse that down into a single message; the
DETAIL largely recapitulates the original error message.  Also, I'd
suggest identifying the name of the function, since people may not be
able to identify that easily based just on the fact that it's a
hypothetical set function.

Here's one idea, with two variants depending on the direction of the inequality:

ERROR: function %s has %d arguments but only %d ordering columns
ERROR: function %s has %d ordering columns but only %d arguments

Or else leave out the actual numbers and just state the principle, but
identifying the exact function at fault, e.g.

ERROR: number of arguments to function %s does not match number of
ordering columns

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-10-10 Thread Andrew Gierth
 Pavel == Pavel Stehule pavel.steh...@gmail.com writes:

  I found so following error message is not too friendly (mainly because
  this functionality will be new)
  
  postgres=# select dense_rank(3,3,2) within group (order by num desc, odd)
  from test4;
  ERROR:  Incorrect number of arguments for hypothetical set function
  LINE 1: select dense_rank(3,3,2) within group (order by num desc, od...
  ^
  
  Probably some hint should be there?

Hm, yeah.

Anyone have ideas for better wording for the original error message,
or a DETAIL or HINT addition? The key point here is that the number of
_hypothetical_ arguments and the number of ordered columns must match,
but we don't currently exclude the possibility that a user-defined
hypothetical set function might have additional non-hypothetical args.

The first alternative that springs to mind is:

ERROR: Incorrect number of arguments for hypothetical set function
DETAIL: Number of hypothetical arguments (3) must equal number of ordered 
columns (2)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITHIN GROUP patch

2013-10-09 Thread Pavel Stehule
2013/10/9 Pavel Stehule pavel.steh...@gmail.com

 Hello

 I checked a conformance with ANSI SQL - and I didn't find any issue.

 I found so following error message is not too friendly (mainly because
 this functionality will be new)

 postgres=# select dense_rank(3,3,2) within group (order by num desc, odd)
 from test4;
 ERROR:  Incorrect number of arguments for hypothetical set function
 LINE 1: select dense_rank(3,3,2) within group (order by num desc, od...
^
 postgres=# select dense_rank(3,3,2) within group (order by num desc) from
 test4;
 ERROR:  Incorrect number of arguments for hypothetical set function
 LINE 1: select dense_rank(3,3,2) within group (order by num desc) fr...
^
 postgres=# select dense_rank(3,3) within group (order by num desc) from
 test4;
 ERROR:  Incorrect number of arguments for hypothetical set function
 LINE 1: select dense_rank(3,3) within group (order by num desc) from...
^
 postgres=# select dense_rank(3,3) within group (order by num desc, num)
 from test4;
  dense_rank
 
   3
 (1 row)

 Probably some hint should be there?

 Regards

 Pavel


 2013/10/2 Vik Fearing vik.fear...@dalibo.com

 On 09/30/2013 06:34 PM, Pavel Stehule wrote:
 
  I looked on this patch - it is one from long patches - so I propose to
  divide review to a few parts:
 
  a) a conformance with ANSI SQL
  b) check of new aggregates - semantic, implementation
  c) source code checking - usual patch review
 
  Now I would to work on @a

 I had an unexpected emergency come up, sorry about that.  I plan on
 doing B and C starting on Thursday (October 3).

 I am grateful to have Pavel's help, this is a big patch.

 --
 Vik