[HACKERS] Re: Abbreviated keys for Datum tuplesort (was: Re: B-Tree support function number 3 (strxfrm() optimization))
On Wed, May 13, 2015 at 2:44 PM, Peter Geoghegan wrote: > On Wed, May 13, 2015 at 11:40 AM, Robert Haas wrote: >> Committed. Thanks for the patch and your patience. > > This comment was not updated: > > /* > * The sortKeys variable is used by every case other than the datum and > * hash index cases; it is set by tuplesort_begin_xxx. tupDesc is only > * used by the MinimalTuple and CLUSTER routines, though. > */ > TupleDesc tupDesc; > SortSupport sortKeys; /* array of length nKeys */ Oops. Fixed. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Abbreviated keys for Datum tuplesort (was: Re: B-Tree support function number 3 (strxfrm() optimization))
On Wed, May 13, 2015 at 11:40 AM, Robert Haas wrote: > Committed. Thanks for the patch and your patience. This comment was not updated: /* * The sortKeys variable is used by every case other than the datum and * hash index cases; it is set by tuplesort_begin_xxx. tupDesc is only * used by the MinimalTuple and CLUSTER routines, though. */ TupleDesc tupDesc; SortSupport sortKeys; /* array of length nKeys */ -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Abbreviated keys for Datum tuplesort (was: Re: B-Tree support function number 3 (strxfrm() optimization))
On Fri, Jan 23, 2015 at 4:13 PM, Andrew Gierth wrote: > [pruning the Cc: list and starting a new thread] > > Here's the cleaned-up version of the patch to allow abbreviated keys > when sorting a single Datum. This also removes comments that suggest > that the caller of tuplesort_begin_datum should ever have to care > about the abbreviated key optimization. > > I'll add this to the CF. Committed. Thanks for the patch and your patience. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
On Fri, Apr 3, 2015 at 1:17 PM, Stephen Frost wrote: >> but even I'm not willing to >> expend the amount of ink and emotional energy you have on whether a >> variable that holds +1, 0, or -1 ought to be declared as "int" or >> "int32". Does it matter? Yeah. Is it worth this much argument? No. > > My only comment will be on this very minor aspect (and I'm quite > agnostic as to what is decided, particularly as I haven't read the patch > at all), but, should we consider an enum (generically) for such cases? > If that's truly the extent of possible values, and anything else is an > error, then at least if I was writing DDL to support this, I'd use an > enum, maybe a domain, or a CHECK constraint (though I'd likely feel > better about the enum or domain approach). It isn't the case that an enum can support it. Some comparators will return -5 rather than -1 (as with C-style comparators in general, sometimes comparisons can be implemented using subtraction and things like that). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
* Robert Haas (robertmh...@gmail.com) wrote: > I'm about as much > of a stickler for the details as you will find on this mailing list, > or possibly, in the observable universe, This made me laugh. :) > but even I'm not willing to > expend the amount of ink and emotional energy you have on whether a > variable that holds +1, 0, or -1 ought to be declared as "int" or > "int32". Does it matter? Yeah. Is it worth this much argument? No. My only comment will be on this very minor aspect (and I'm quite agnostic as to what is decided, particularly as I haven't read the patch at all), but, should we consider an enum (generically) for such cases? If that's truly the extent of possible values, and anything else is an error, then at least if I was writing DDL to support this, I'd use an enum, maybe a domain, or a CHECK constraint (though I'd likely feel better about the enum or domain approach). Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
On Fri, Apr 3, 2015 at 1:07 PM, Robert Haas wrote: > I'm about as much > of a stickler for the details as you will find on this mailing list, > or possibly, in the observable universe, but even I'm not willing to > expend the amount of ink and emotional energy you have on whether a > variable that holds +1, 0, or -1 ought to be declared as "int" or > "int32". Does it matter? Yeah. Is it worth this much argument? No. I haven't really spent very much time arguing about this point at all, and intend to spend no more time on it. It's up to you. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
On Thu, Apr 2, 2015 at 7:02 PM, Peter Geoghegan wrote: > On Thu, Apr 2, 2015 at 11:21 PM, Robert Haas wrote: >>> The changes that Andrew >>> took issue with are utterly insignificant. >> >> Great. Then you will be utterly indifferent to which version gets committed. > > *shrug* > > You were the one that taught me to be bureaucratically minded about > keeping code consistent at this fine a level. I think it's odd that > you of all people are opposing me on this point, but whatever. Sure, consistency is important. But sometimes there is more than one thing that you can choose to be consistent with. IIUC, you're complaining because somebody assigned the return value of a function to a variable whose type matches the function's return type, rather than assigning it to a variable of the same mismatching type used in parallel code elsewhere. Which form of consistency to aim for in such cases is fundamentally a judgement call. I'll have another look over the patch and maybe I'll come around to your point of view, but you don't seem very willing to concede the point that intelligent people could disagree over what is most consistent here. I'm about as much of a stickler for the details as you will find on this mailing list, or possibly, in the observable universe, but even I'm not willing to expend the amount of ink and emotional energy you have on whether a variable that holds +1, 0, or -1 ought to be declared as "int" or "int32". Does it matter? Yeah. Is it worth this much argument? No. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
On Thu, Apr 2, 2015 at 11:21 PM, Robert Haas wrote: > On Thu, Apr 2, 2015 at 5:34 PM, Peter Geoghegan wrote: >> I think it's explained by the pre-check for sorted input making the >> number of comparisons exactly n -1. As I pointed out to Tomas, if you >> put a single, solitary unsorted element at the end, the abbreviated >> version is then 8x faster (maybe that was in relation to a slightly >> different case, but the pattern is the same). So this case isn't an >> argument against datum abbreviation, or even abbreviation in general, >> but rather an argument against using strxfrm() in general (which for >> example the GCC docs strongly recommend for sorting lists of strings). >> It's a bad argument, IMV. This sort is already extremely fast. > > OK, I see. Wait. It's also due to the fact that some cases are spuriously aborted, because the cost model for text needs to be fixed, I believe. It's not clear that this is actually such a case from Tomas' remarks, because the smaller scale tested *did* win significantly, and that appeared to be otherwise comparable to the regressed cases. So although what I describe here about perfectly pre-sorted input is something I've seen (and something that we discussed on another thread IIRC), this is not an example of it. Again, this is just an example of why we need to fix the cost model for text. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
On Thu, Apr 2, 2015 at 11:21 PM, Robert Haas wrote: >> The changes that Andrew >> took issue with are utterly insignificant. > > Great. Then you will be utterly indifferent to which version gets committed. *shrug* You were the one that taught me to be bureaucratically minded about keeping code consistent at this fine a level. I think it's odd that you of all people are opposing me on this point, but whatever. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
On Thu, Apr 2, 2015 at 5:34 PM, Peter Geoghegan wrote: > I think it's explained by the pre-check for sorted input making the > number of comparisons exactly n -1. As I pointed out to Tomas, if you > put a single, solitary unsorted element at the end, the abbreviated > version is then 8x faster (maybe that was in relation to a slightly > different case, but the pattern is the same). So this case isn't an > argument against datum abbreviation, or even abbreviation in general, > but rather an argument against using strxfrm() in general (which for > example the GCC docs strongly recommend for sorting lists of strings). > It's a bad argument, IMV. This sort is already extremely fast. OK, I see. > The changes that Andrew > took issue with are utterly insignificant. Great. Then you will be utterly indifferent to which version gets committed. > Also, the changes that Andrew didn't mention are clearly appropriate. > In particular, the comments on the SortKeys variable being used by > every case except the hash case and datum case should still be updated > to reflect that that's only true for the hash case now. On that point, I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
On Thu, Apr 2, 2015 at 8:17 PM, Robert Haas wrote: > On Fri, Feb 20, 2015 at 3:57 PM, Tomas Vondra > wrote: >> I've spent a fair amount of testing this today, and when using the >> simple percentile_disc example mentioned above, I see this pattern: >> >> master patched speedup >>- >> generate_series(1,100) 4.2 0.7 6 >> generate_series(1,200) 9.2 9.8 0.93 >> generate_series(1,300) 14.5 15.3 0.95 >> >> >> so for a small dataset the speedup is very nice, but for larger sets >> there's ~5% slowdown. Is this expected? I think it's explained by the pre-check for sorted input making the number of comparisons exactly n -1. As I pointed out to Tomas, if you put a single, solitary unsorted element at the end, the abbreviated version is then 8x faster (maybe that was in relation to a slightly different case, but the pattern is the same). So this case isn't an argument against datum abbreviation, or even abbreviation in general, but rather an argument against using strxfrm() in general (which for example the GCC docs strongly recommend for sorting lists of strings). It's a bad argument, IMV. This sort is already extremely fast. BTW, Corey Huinker (who I believe you also spoke to during pgConf.US) reported that his problematic CREATE INDEX case was 12x faster with 9.5. That used tapesort. That was also perfectly pre-sorted, but since it was using tapesort and abbreviated there was a huge improvement. > I had a look at this patch today with a view to committing it, but it > seems that nobody's commented on this point, which seems like an > important one. Any thoughts? > > For what it's worth, and without wishing to provoke another flamewar, > I am inclined to use Andrew's version of this patch rather than > Peter's. I have not undertaken an exhaustive comparison, nor do I > intend to. It is the reviewer's responsibility to justify the changes > they think the author needs to make, and that wasn't done here. On > the points of difference Andrew highlighted, I think his version is > fine. The justification is that Andrew's version had arbitrary differences with the existing code, in particular in the datum comparator, which was different to all other cases in the way that Andrew pointed out. Overall, there were only tiny differences between the two versions. I see no reason to not match the existing style. The changes that Andrew took issue with are utterly insignificant. Also, the changes that Andrew didn't mention are clearly appropriate. In particular, the comments on the SortKeys variable being used by every case except the hash case and datum case should still be updated to reflect that that's only true for the hash case now. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
On Fri, Feb 20, 2015 at 3:57 PM, Tomas Vondra wrote: > I've spent a fair amount of testing this today, and when using the > simple percentile_disc example mentioned above, I see this pattern: > > master patched speedup >- > generate_series(1,100) 4.2 0.7 6 > generate_series(1,200) 9.2 9.8 0.93 > generate_series(1,300) 14.5 15.3 0.95 > > > so for a small dataset the speedup is very nice, but for larger sets > there's ~5% slowdown. Is this expected? I had a look at this patch today with a view to committing it, but it seems that nobody's commented on this point, which seems like an important one. Any thoughts? For what it's worth, and without wishing to provoke another flamewar, I am inclined to use Andrew's version of this patch rather than Peter's. I have not undertaken an exhaustive comparison, nor do I intend to. It is the reviewer's responsibility to justify the changes they think the author needs to make, and that wasn't done here. On the points of difference Andrew highlighted, I think his version is fine. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
On Fri, Mar 13, 2015 at 7:51 PM, Andrew Gierth wrote: > Since ApplySortComparator returns int, and "compare" is used to store > the return value of comparetup_datum which is also declared int, this > seems inappropriate even as a "stylistic tweak". Consistency matters. That change, and the other changes to the structure of comparetup_datum() makes it match the other comparetup_* cases more closely. I don't think it's a big deal, and I'm surprised you're calling it out specifically. > Also, your changes to the block comment for SortTuple now hide the fact > that datum1 is potentially the abbreviated value in tuple as well as > single-Datum cases. Here are the versions for comparison (mine is > first): I don't think that I've failed to convey that, even just based on the diff you included. My remarks apply to the new case, datum sorting, where there is a new convention that must consider whether the type is pass-by-value or pass-by-reference. Surely if it's not clear that abbreviation is used for the heap tuple case (I think that's what you mean), then that's a problem with the extant code in the master branch that has nothing to do with this. The whole point of this patch is that virtually every reasonable case for abbreviation is now supported. That uniformity clarifies things. So of course sortSupport (and abbreviation) is supported by every case other than the hash case, where it clearly cannot apply. We just needed to add a note on what the datum sort case does with pass-by-value/past-by-reference-ness with respect to abbreviation, alongside the existing discussion of that, since that becomes a special case. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
> "Peter" == Peter Geoghegan writes: Peter> I attach a slightly tweaked version of Andrew's original. You changed this: static int comparetup_datum(const SortTuple *a, const SortTuple *b, Tuplesortstate *state) { -int compare; +int32 compare; compare = ApplySortComparator(a->datum1, a->isnull1, Since ApplySortComparator returns int, and "compare" is used to store the return value of comparetup_datum which is also declared int, this seems inappropriate even as a "stylistic tweak". Also, your changes to the block comment for SortTuple now hide the fact that datum1 is potentially the abbreviated value in tuple as well as single-Datum cases. Here are the versions for comparison (mine is first): *** *** 136,175 /* * The objects we actually sort are SortTuple structs. These contain * a pointer to the tuple proper (might be a MinimalTuple or IndexTuple), * which is a separate palloc chunk --- we assume it is just one chunk and * can be freed by a simple pfree(). SortTuples also contain the tuple's * first key column in Datum/nullflag format, and an index integer. * * Storing the first key column lets us save heap_getattr or index_getattr * calls during tuple comparisons. We could extract and save all the key * columns not just the first, but this would increase code complexity and * overhead, and wouldn't actually save any comparison cycles in the common * case where the first key determines the comparison result. Note that * for a pass-by-reference datatype, datum1 points into the "tuple" storage. * - * There is one special case: when the sort support infrastructure provides an - * "abbreviated key" representation, where the key is (typically) a pass by - * value proxy for a pass by reference type. In this case, the abbreviated key - * is stored in datum1 in place of the actual first key column. - * * When sorting single Datums, the data value is represented directly by ! * datum1/isnull1 for pass by value types (or null values). If the datatype is ! * pass-by-reference and isnull1 is false, then "tuple" points to a separately ! * palloc'd data value, otherwise "tuple" is NULL. The value of datum1 is then ! * either the same pointer as "tuple", or is an abbreviated key value as ! * described above. Accordingly, "tuple" is always used in preference to ! * datum1 as the authoritative value for pass-by-reference cases. * * While building initial runs, tupindex holds the tuple's run number. During * merge passes, we re-use it to hold the input tape number that each tuple in * the heap was read from, or to hold the index of the next tuple pre-read * from the same tape in the case of pre-read entries. tupindex goes unused * if the sort occurs entirely in memory. */ typedef struct { void *tuple; /* the tuple proper */ Datum datum1; /* value of first key column */ boolisnull1;/* is first key column NULL? */ int tupindex; /* see notes above */ } SortTuple; --- 136,170 /* * The objects we actually sort are SortTuple structs. These contain * a pointer to the tuple proper (might be a MinimalTuple or IndexTuple), * which is a separate palloc chunk --- we assume it is just one chunk and * can be freed by a simple pfree(). SortTuples also contain the tuple's * first key column in Datum/nullflag format, and an index integer. * * Storing the first key column lets us save heap_getattr or index_getattr * calls during tuple comparisons. We could extract and save all the key * columns not just the first, but this would increase code complexity and * overhead, and wouldn't actually save any comparison cycles in the common * case where the first key determines the comparison result. Note that * for a pass-by-reference datatype, datum1 points into the "tuple" storage. * * When sorting single Datums, the data value is represented directly by ! * datum1/isnull1. If the datatype is pass-by-reference and isnull1 is false, ! * then datum1 points to a separately palloc'd data value that is also pointed ! * to by the "tuple" pointer; otherwise "tuple" is NULL. There is one special ! * case: when the sort support infrastructure provides an "abbreviated key" ! * representation, where the key is (typically) a pass by value proxy for a ! * pass by reference type. * * While building initial runs, tupindex holds the tuple's run number. During * merge passes, we re-use it to hold the input tape number that each tuple in * the heap was read from, or to hold the index of the next tuple pre-read * from the same tape in the case of pre-read entries. tupindex goes unused * if the sort occurs entirely in memory. */ typedef struct { void *tuple; /* the tuple proper */ Datum datum1; /* value of
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
On Sun, Jan 25, 2015 at 3:15 AM, Andrew Gierth wrote: > Robert> I think this is a good idea. Do you have a test case that > Robert> shows the benefit? > > The best test case for datum sort performance is to use percentile_disc, > since that has almost no overhead beyond performing the actual sort. I attach a slightly tweaked version of Andrew's original. This revision makes the reverted comments within orderedsetaggs.c consistent with back branches (no need to call abbreviation out as an interesting special case anymore, just as in the back branches, where abbreviation doesn't even exist). Better to keep those consistent for backpatching purposes. Also, I've changed back tuplesort.c header comments to how they were back in November and until recently, to reflect that now it really is the case that only the hash index case doesn't have the "sortKeys" field reliably set. We now need to set "sortKeys" for the datum case, so don't say that we don't...we need to worry about the applicability of the onlyKey optimization for the datum sort case now, too. Other than that, there are a number of minor stylistic tweaks. The datum case does not support pass-by-value abbreviation, which could be useful in theory (e.g., abbreviation of float8 values, which was briefly discussed before). This isn't worth calling out as a special case in the tuplesort header comments IMV; there is now a brief note on this added to tuplesort_begin_datum(). We still support abbreviation for pass-by-value types for non-datumsort cases (there is of course no known example of opclass abbreviation support for a pass-by-value type, so this is only a theoretical deficiency). I've marked this "ready for committer". Thanks -- Peter Geoghegan diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index 9ff0eff..ee13fc0 100644 --- a/src/backend/executor/nodeAgg.c +++ b/src/backend/executor/nodeAgg.c @@ -363,10 +363,6 @@ initialize_aggregates(AggState *aggstate, * We use a plain Datum sorter when there's a single input column; * otherwise sort the full tuple. (See comments for * process_ordered_aggregate_single.) - * - * In the future, we should consider forcing the - * tuplesort_begin_heap() case when the abbreviated key - * optimization can thereby be used, even when numInputs is 1. */ peraggstate->sortstate = (peraggstate->numInputs == 1) ? diff --git a/src/backend/utils/adt/orderedsetaggs.c b/src/backend/utils/adt/orderedsetaggs.c index f9a5f7f..869a83b 100644 --- a/src/backend/utils/adt/orderedsetaggs.c +++ b/src/backend/utils/adt/orderedsetaggs.c @@ -266,13 +266,7 @@ ordered_set_startup(FunctionCallInfo fcinfo, bool use_tuples) osastate->qstate = qstate; osastate->gcontext = gcontext; - /* - * Initialize tuplesort object. - * - * In the future, we should consider forcing the tuplesort_begin_heap() - * case when the abbreviated key optimization can thereby be used, even - * when !use_tuples. - */ + /* Initialize tuplesort object */ if (use_tuples) osastate->sortstate = tuplesort_begin_heap(qstate->tupdesc, qstate->numSortCols, diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c index 8ad5635..92f0355 100644 --- a/src/backend/utils/sort/tuplesort.c +++ b/src/backend/utils/sort/tuplesort.c @@ -336,9 +336,9 @@ struct Tuplesortstate bool markpos_eof; /* saved "eof_reached" */ /* - * The sortKeys variable is used by every case other than the datum and - * hash index cases; it is set by tuplesort_begin_xxx. tupDesc is only - * used by the MinimalTuple and CLUSTER routines, though. + * The sortKeys variable is used by every case other than the hash index + * case; it is set by tuplesort_begin_xxx. tupDesc is only used by the + * MinimalTuple and CLUSTER routines, though. */ TupleDesc tupDesc; SortSupport sortKeys; /* array of length nKeys */ @@ -901,29 +901,41 @@ tuplesort_begin_datum(Oid datumType, Oid sortOperator, Oid sortCollation, state->copytup = copytup_datum; state->writetup = writetup_datum; state->readtup = readtup_datum; + state->abbrevNext = 10; state->datumType = datumType; + /* lookup necessary attributes of the datum type */ + get_typlenbyval(datumType, &typlen, &typbyval); + state->datumTypeLen = typlen; + state->datumTypeByVal = typbyval; + /* Prepare SortSupport data */ - state->onlyKey = (SortSupport) palloc0(sizeof(SortSupportData)); + state->sortKeys = (SortSupport) palloc0(sizeof(SortSupportData)); + + state->sortKeys->ssup_cxt = CurrentMemoryContext; + state->sortKeys->ssup_collation = sortCollation; + state->sortKeys->ssup_nulls_first = nullsFirstFlag; - state->onlyKey->ssup_cxt = CurrentMemoryContext; - state->onlyKey->ssup_collation = sortCollation; - state->onlyKey->ssup_nulls_first = nullsFirstFlag; /* - * Conversion to abbreviated representation infeasible in the Datum case. - * It must be possible to subsequently fetch original datum values wit
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
On 25.1.2015 12:15, Andrew Gierth wrote: > > So given some suitable test data, such as > > create table stuff as select random()::text as randtext > from generate_series(1,100); -- or however many rows > > you can do > > select percentile_disc(0) within group (order by randtext) from stuff; > > or > > select count(distinct randtext) from stuff; > > The performance improvements I saw were pretty much exactly as > expected from the improvement in the ORDER BY and CREATE INDEX cases. I've spent a fair amount of testing this today, and when using the simple percentile_disc example mentioned above, I see this pattern: master patched speedup - generate_series(1,100) 4.2 0.7 6 generate_series(1,200) 9.2 9.8 0.93 generate_series(1,300) 14.5 15.3 0.95 so for a small dataset the speedup is very nice, but for larger sets there's ~5% slowdown. Is this expected? -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Datum tuplesort
> "Robert" == Robert Haas writes: >> Here's the cleaned-up version of the patch to allow abbreviated keys >> when sorting a single Datum. This also removes comments that suggest >> that the caller of tuplesort_begin_datum should ever have to care >> about the abbreviated key optimization. >> >> I'll add this to the CF. Robert> I think this is a good idea. Do you have a test case that Robert> shows the benefit? The best test case for datum sort performance is to use percentile_disc, since that has almost no overhead beyond performing the actual sort. (Unlike, say, count(distinct) or mode(), both of which have to do an additional comparison pass over the sorted data; but count(distinct) is probably the most common use of the datum sort in the wild, so it's useful to try that too.) So given some suitable test data, such as create table stuff as select random()::text as randtext from generate_series(1,100); -- or however many rows you can do select percentile_disc(0) within group (order by randtext) from stuff; or select count(distinct randtext) from stuff; The performance improvements I saw were pretty much exactly as expected from the improvement in the ORDER BY and CREATE INDEX cases. The best test case for checking the correct order of results is to use array_agg(x order by x), for example as follows: select u, u <= lag(u) over () from (select unnest(a) as u from (select array_agg(randtext order by randtext) from stuff) s1) s2; (note that array_agg(x order by x) uses the datum sort, but array_agg(x order by y) uses the ordinary heap tuple sort) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Abbreviated keys for Datum tuplesort (was: Re: B-Tree support function number 3 (strxfrm() optimization))
On Sat, Jan 24, 2015 at 6:19 PM, Robert Haas wrote: > I think this is a good idea. Do you have a test case that shows the benefit? I agree. It seems likely that this will show a similar benefit to other cases already tested, but I'd like to see a test case too. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Abbreviated keys for Datum tuplesort (was: Re: B-Tree support function number 3 (strxfrm() optimization))
On Fri, Jan 23, 2015 at 4:13 PM, Andrew Gierth wrote: > [pruning the Cc: list and starting a new thread] > > Here's the cleaned-up version of the patch to allow abbreviated keys > when sorting a single Datum. This also removes comments that suggest > that the caller of tuplesort_begin_datum should ever have to care > about the abbreviated key optimization. > > I'll add this to the CF. I think this is a good idea. Do you have a test case that shows the benefit? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers