Re: [HACKERS] Using quicksort for every external sort run
On 09/12/15 00:02, Jeff Janes wrote:
> The second one consumes that giant tape run along with 232 small tape
> runs.

In terms of number of comparisons, binary merge works best when the
inputs are of similar length. I'd assume the same goes for n-ary merge,
but I don't know if comparison count is an issue here.
--
Cheers,
  Jeremy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
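As a rough illustration of the comparison counts involved, here is a
two-way merge that counts its comparisons. This is a standalone sketch
over ints, not tuplesort's actual merge code, and `merge_count` is a
name invented for the example:

```c
#include <stddef.h>

/*
 * Merge two sorted runs a[0..na) and b[0..nb) into out[], returning the
 * number of comparisons performed.  The worst case is na + nb - 1, hit
 * when the runs interleave; with very unequal run lengths most of the
 * comparisons are spent walking the long run, which is why balanced run
 * lengths minimise the total over a whole merge tree.
 */
static size_t
merge_count(const int *a, size_t na, const int *b, size_t nb, int *out)
{
	size_t		ia = 0, ib = 0, io = 0, cmps = 0;

	while (ia < na && ib < nb)
	{
		cmps++;
		if (a[ia] <= b[ib])
			out[io++] = a[ia++];
		else
			out[io++] = b[ib++];
	}
	while (ia < na)				/* drain whichever run remains */
		out[io++] = a[ia++];
	while (ib < nb)
		out[io++] = b[ib++];
	return cmps;
}
```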
Re: [HACKERS] [PATCH] Equivalence Class Filters
On 07/12/15 16:44, Simon Riggs wrote:
> There are many optimizations we might adopt, yet planning time is a factor.
> It seems simple enough to ignore more complex optimizations if we have
> already achieved a threshold cost (say 10). Such a test would add nearly
> zero time for the common case. We can apply the optimizations in some kind
> of ordering depending upon the cost, so we are careful to balance the
> cost/benefit of trying certain optimizations.

Given parallelism, why not continue planning after initiating a
cancellable execution, giving a better plan to be used if the execution
runs for long enough?
--
Cheers,
  Jeremy
Re: [HACKERS] Getting sorted data from foreign server
On 08/10/15 17:09, Robert Haas wrote:
> - You consider pushing down ORDER BY if any prefix of the query
> pathkeys satisfy is_foreign_expr(), but that doesn't seem right to me.
> If the user says SELECT * FROM remotetab ORDER BY a, unsafe(a),
> ordering the result set by a doesn't help us much. We've talked a few
> times about an incremental sort capability that would take a stream of
> tuples already ordered by one or more columns and sort each group by
> additional columns, but I don't think we have that currently. Without
> that capability, I don't think there's much benefit in sorting by a
> prefix of the pathkeys. I suspect that if we can't get them all, it's
> not worth doing.

That depends how often the additional columns affect the sorted order,
if the sort method takes advantage of sorted input.
--
Jeremy
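The incremental-sort capability Robert describes can be sketched
outside the executor as follows. This is only an illustration, under
the assumption that the input is already ordered by a leading key; the
`Row` struct and integer keys stand in for tuples and pathkey columns,
and none of this is PostgreSQL code:

```c
#include <stdlib.h>
#include <stddef.h>

typedef struct
{
	int			key1;			/* already sorted on this */
	int			key2;			/* still needs sorting within groups */
} Row;

static int
cmp_key2(const void *pa, const void *pb)
{
	const Row  *a = pa;
	const Row  *b = pb;

	return (a->key2 > b->key2) - (a->key2 < b->key2);
}

/*
 * rows[] must already be sorted by key1; sort each equal-key1 group by
 * key2 in place.  Only one group need be held (and sorted) at a time,
 * which is the memory advantage of an incremental sort.
 */
static void
incremental_sort(Row *rows, size_t n)
{
	size_t		start = 0;

	while (start < n)
	{
		size_t		end = start + 1;

		while (end < n && rows[end].key1 == rows[start].key1)
			end++;
		qsort(rows + start, end - start, sizeof(Row), cmp_key2);
		start = end;
	}
}
```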
Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?
On 27/09/15 21:58, Gavin Wahl wrote:
> Somewhat harder but still possible would be using BRIN indexes to
> accelerate ORDER BY. This would require a sorting algorithm that can
> take advantage of mostly-sorted inputs. You would sort the page ranges
> by their minimum or maximum value, then feed the sorting algorithm in
> that order.

An internal merge sort does well with partially-sorted input.
--
Cheers,
  Jeremy
Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"
On 31/07/15 18:31, Robert Haas wrote:
> On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris wrote:
>> Heapification is O(n) already, whether siftup (existing) or down.
>
> That's not my impression, or what Wikipedia says.  Source?

Measurements done last year:

http://www.postgresql.org/message-id/52f35462.3030...@wizmail.org
  (spreadsheet attachment)

http://www.postgresql.org/message-id/52f40ce9.1070...@wizmail.org
  (measurement procedure and spreadsheet explanation)
--
Cheers,
  Jeremy
[HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"
On 30/07/15 02:05, Peter Geoghegan wrote:
> Since heapification is now a big fraction of the total cost of a sort
> sometimes, even where the heap invariant need not be maintained for
> any length of time afterwards, it might be worth revisiting the patch
> to make that an O(n) rather than a O(log n) operation [3].

O(n log n)?

Heapification is O(n) already, whether siftup (existing) or down.

It might be worthwhile comparing actual times with a quicksort, given
that a sorted array is trivially a well-formed heap (the reverse is not
true) and that quicksort seems to be cache-friendly. Presumably there
will be a crossover N where the constant-factor k reduction from
cache-friendliness loses out to the log n penalty for doing a full
sort; below this it would be useful.

You could then declare the tape buffer to be the leading tranche of
work_mem (and dump it right away) and the heap to start with the
remainder.
--
Cheers,
  Jeremy
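For reference, the bottom-up siftdown heapify being claimed as O(n)
looks like this in outline - a generic sketch over an int min-heap, not
the tuplesort patch itself. The linear bound follows because a node at
height h costs at most h swaps and the number of nodes halves at each
level up, so the total is bounded by n times the convergent sum of
h/2^h:

```c
#include <stddef.h>

/* Sift a[root] down until the min-heap property holds beneath it. */
static void
siftdown(int *a, size_t root, size_t n)
{
	for (;;)
	{
		size_t		child = 2 * root + 1;

		if (child >= n)
			break;				/* no children: done */
		if (child + 1 < n && a[child + 1] < a[child])
			child++;			/* pick the smaller child */
		if (a[root] <= a[child])
			break;				/* heap property already holds here */
		int			tmp = a[root];

		a[root] = a[child];
		a[child] = tmp;
		root = child;
	}
}

/*
 * Bottom-up (Floyd) heapify: sift each internal node down, starting
 * from the last parent and combing back to the root.  Total work is
 * O(n), unlike inserting n elements one at a time, which is O(n log n).
 */
static void
heapify(int *a, size_t n)
{
	if (n < 2)
		return;
	for (size_t i = (n - 2) / 2 + 1; i-- > 0; )
		siftdown(a, i, n);
}
```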
Re: [HACKERS] Asynchronous DRAM Self-Refresh
On 22/05/15 22:30, Josh Berkus wrote:
> At CoreOS Fest, Intel presented about a technology which they used to
> improve write times for the nonrelational data store Etcd. It's called
> Asynchronous DRAM Self-Refresh, or ADR. This is supposedly a feature of
> all of their chips since E5 which allows users to designate a small area
> of memory (16 to 64MB) which is somehow guaranteed to be flushed to disk
> in the event of a power loss (the exact mechanism was not explained).
>
> So my thought was "Hello! wal_buffers?" Theoretically, this feature
> could give us the benefits of asynchronous commit without the penalties
> ... *if* it actually works.
>
> However, since then I've been able to find zero documentation on ADR.
> There's a bunch of stuff in the Intel press releases, but zero I can
> find in their technical docs. Anyone have a clue on this?

http://snia.org/sites/default/files/NVDIMM%20Messaging%20and%20FAQ%20Jan%2020143.pdf
describes it as a (minor) processor feature to flush memory-pipeline
buffers to RAM on power-loss detection. The context there is for a
flash backup on-DIMM for the RAM.

It's unclear whether any dirty data in cpu cache gets written...
Are Intel caches never write-behind?
Re: [HACKERS] Asynchronous DRAM Self-Refresh
On 22/05/15 22:30, Josh Berkus wrote:
> At CoreOS Fest, Intel presented about a technology which they used to
> improve write times for the nonrelational data store Etcd. It's called
> Asynchronous DRAM Self-Refresh, or ADR. This is supposedly a feature of
> all of their chips since E5 which allows users to designate a small area
> of memory (16 to 64MB) which is somehow guaranteed to be flushed to disk
> in the event of a power loss (the exact mechanism was not explained).
>
> So my thought was "Hello! wal_buffers?" Theoretically, this feature
> could give us the benefits of asynchronous commit without the penalties
> ... *if* it actually works.
>
> However, since then I've been able to find zero documentation on ADR.
> There's a bunch of stuff in the Intel press releases, but zero I can
> find in their technical docs. Anyone have a clue on this?

Are you certain disk was mentioned? The wording at
http://www.intel.com/design/intarch/iastorage/xeon.htm
"to preserve critical data in RAM during a power fail" implies not.

I assume there's a battery backup for just that memory region, and the
chip does its own refresh rather than needing a controller; the data
should still be recoverable on next boot.
--
Cheers,
  Jeremy
Re: [HACKERS] Abbreviated keys for text cost model fix
On 03/03/15 03:08, Arthur Silva wrote:
> Does it always perform mergesort instead of quicksort when enabled?

Yes; there seemed no advantage to any additional complexity.

The merge consistently performs fewer comparisons than the quicksort,
on random input - and many fewer if there are any sorted (or
reverse-sorted) sections. However, it also consistently takes slightly
longer (a few percent). I was unable to chase this down but assume it
to be a caching effect. So I don't currently think it should replace
the current sort for all uses.

If the planner can identify cases where there is some pre-sort in the
input (CLUSTER was mentioned?), or maybe a correlated index, it could
be worthwhile. Also useful might be very-expensive-comparison cases,
and distinct-output cases (uniquification is supported at each
submerge stage, so data disappears early).

There's potential for running submerges in parallel using multiple
cores, but I've not worked on that.
--
Cheers,
  Jeremy
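The gain on pre-sorted sections comes from a natural merge starting
from the ascending runs already present in the input rather than from
single elements. The run-detection idea, sketched generically over an
int array (illustration only, not the patch's code):

```c
#include <stddef.h>

/*
 * Count the maximal ascending runs in a[0..n).  A natural merge sort
 * starts from these runs, so fully sorted input is a single run and
 * needs no merge comparisons at all, while mostly-sorted input needs
 * correspondingly few merge passes.
 */
static size_t
count_runs(const int *a, size_t n)
{
	size_t		runs = (n > 0);

	for (size_t i = 1; i < n; i++)
		if (a[i] < a[i - 1])	/* a descent starts a new run */
			runs++;
	return runs;
}
```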
Re: [HACKERS] Abbreviated keys for text cost model fix
On 25/02/15 00:32, Jeremy Harris wrote:
> On 23/02/15 16:40, Tomas Vondra wrote:
>> On 22.2.2015 22:30, Peter Geoghegan wrote:
>>> You should try it with the data fully sorted like this, but with one
>>> tiny difference: The very last tuple is out of order. How does that
>>> look?
>
> If this case is actually important, a merge-sort can take
> significant advantage of the partial order:

Presumably it is not, as nobody commented on the alleged 20 or 30x
speedup.
Re: [HACKERS] Abbreviated keys for text cost model fix
On 25/02/15 00:42, Peter Geoghegan wrote:
> On Tue, Feb 24, 2015 at 4:32 PM, Jeremy Harris wrote:
>> On 23/02/15 16:40, Tomas Vondra wrote:
>>> On 22.2.2015 22:30, Peter Geoghegan wrote:
>>>> You should try it with the data fully sorted like this, but with one
>>>> tiny difference: The very last tuple is out of order. How does that
>>>> look?
>>
>> If this case is actually important, a merge-sort can take
>> significant advantage of the partial order:
>>
>> test=# set enable_intmerge_sort to on;
>
> What is this?

A control associated with my implementation of an internal merge sort,
used to enable or disable it. Not in the mainline source base.
--
Cheers,
  Jeremy
Re: [HACKERS] Abbreviated keys for text cost model fix
On 23/02/15 16:40, Tomas Vondra wrote:
> On 22.2.2015 22:30, Peter Geoghegan wrote:
>> You should try it with the data fully sorted like this, but with one
>> tiny difference: The very last tuple is out of order. How does that
>> look?

If this case is actually important, a merge-sort can take
significant advantage of the partial order:

test=# explain analyze select * from
         (select * from stuff_text_asc order by randtxt offset 1000) foo;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=247054.81..247054.81 rows=1 width=18)
        (actual time=25133.029..25133.029 rows=0 loops=1)
   ->  Sort  (cost=242054.81..247054.81 rows=201 width=18)
             (actual time=25025.931..25088.406 rows=201 loops=1)
         Sort Key: stuff_text_asc.randtxt
         Sort Method: quicksort  Memory: 221213kB  Compares: 95541376
         ->  Seq Scan on stuff_text_asc
             (cost=0.00..32739.01 rows=201 width=18)
             (actual time=0.011..118.390 rows=201 loops=1)
 Planning time: 0.080 ms
 Execution time: 25144.538 ms
(7 rows)

Time: 25145.185 ms

test=# set enable_intmerge_sort to on;
SET
Time: 0.378 ms
test=# explain analyze select * from
         (select * from stuff_text_asc order by randtxt offset 1000) foo;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=247054.81..247054.81 rows=1 width=18)
        (actual time=1051.603..1051.603 rows=0 loops=1)
   ->  Sort  (cost=242054.81..247054.81 rows=201 width=18)
             (actual time=943.304..1006.988 rows=201 loops=1)
         Sort Key: stuff_text_asc.randtxt
         Sort Method: internal merge  Memory: 221213kB  Compares: 202
         ->  Seq Scan on stuff_text_asc
             (cost=0.00..32739.01 rows=201 width=18)
             (actual time=0.009..98.474 rows=201 loops=1)
 Planning time: 0.072 ms
 Execution time: 1063.434 ms
(7 rows)

Time: 1064.113 ms

test=# set enable_intmerge_sort to off;
SET
Time: 0.353 ms

test=# explain analyze select count(distinct randtxt) from stuff_text_asc;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=37739.01..37739.02 rows=1 width=18)
            (actual time=25196.814..25196.815 rows=1 loops=1)
   ->  Seq Scan on stuff_text_asc
       (cost=0.00..32739.01 rows=201 width=18)
       (actual time=0.010..114.995 rows=201 loops=1)
 Planning time: 0.053 ms
 Execution time: 25196.857 ms
(4 rows)

Time: 25197.371 ms

test=# explain analyze select count(*) from
         (select distinct randtxt from stuff_text_asc) as foo;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=277054.83..277054.84 rows=1 width=0)
            (actual time=25521.258..25521.258 rows=1 loops=1)
   ->  Unique  (cost=242054.81..252054.81 rows=201 width=18)
               (actual time=25101.157..25438.622 rows=1999100 loops=1)
         ->  Sort  (cost=242054.81..247054.81 rows=201 width=18)
                   (actual time=25101.156..25184.436 rows=201 loops=1)
               Sort Key: stuff_text_asc.randtxt
               Sort Method: quicksort  Memory: 221213kB  Compares: 95541376
               ->  Seq Scan on stuff_text_asc
                   (cost=0.00..32739.01 rows=201 width=18)
                   (actual time=0.011..116.509 rows=201 loops=1)
 Planning time: 0.088 ms
 Execution time: 25532.947 ms
(8 rows)

Time: 25533.642 ms

test=# set enable_intmerge_sort to on;
SET
Time: 0.401 ms
test=# explain analyze select count(*) from
         (select distinct randtxt from stuff_text_asc) as foo;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=272054.82..272054.83 rows=1 width=0)
            (actual time=1184.289..1184.289 rows=1 loops=1)
   ->  Sort  (cost=242054.81..247054.81 rows=201 width=18)
             (actual time=1037.019..1100.720 rows=1999100 loops=1)
         Sort Key: stuff_text_asc.randtxt
         Sort Method: dedup internal merge  Memory: 221143kB  Compares: 201
         ->  Seq Scan on stuff_text_asc
             (cost=0.00..32739.01 rows=201 width=18)
             (actual time=0.010..106.729 rows=201 loops=1)
 Planning time: 0.086 ms
 Execution time: 1195.891 ms
(7 rows)

Time: 1196.514 ms
--
Cheers,
  Jeremy
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
On 14/11/14 14:54, Tom Lane wrote:
> Jeremy Harris writes:
>> On 14/11/14 00:46, Simon Riggs wrote:
>>> Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1)
>>>   -> Sort (cost= rows=568733 width=175)
>>>           (actual time= rows=20 loops=1)
>>>      Sort Method: top-N heapsort
>>
>> Going off on a tangent, when I was playing with a merge-sort
>> implementation I propagated limit information into the sort
>> node, for a significant win.
>
> I'm not entirely following. The top-N heapsort approach already
> makes use of the limit info.

Having gone back to look, you're right. It was Unique nodes I merged
(the sort handles both bounded output and dedup).
--
Cheers,
  Jeremy
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
On 14/11/14 00:46, Simon Riggs wrote:
> Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1)
>   -> Sort (cost= rows=568733 width=175)
>           (actual time= rows=20 loops=1)
>      Sort Method: top-N heapsort

Going off on a tangent, when I was playing with a merge-sort
implementation I propagated limit information into the sort node, for a
significant win. Eliding the Limit node gave a further slight win.

I wasn't convinced the use-case was common enough to justify replacing
quicksort (despite consistently doing fewer compares, the merge sort
was slightly slower; I never understood why) - but I never asked.

Is there any appetite for supporting alternate sort algorithms?
--
Cheers,
  Jeremy
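For context, the bounded "top-N heapsort" keeps only the limit's worth
of tuples in a max-heap and tests each new tuple against the current
worst. Sketched here over ints - an illustration of the idea, not the
tuplesort implementation, and the function names are invented for the
example:

```c
#include <stddef.h>

/* Restore the max-heap property downward from root. */
static void
topn_maxheap_siftdown(int *h, size_t n, size_t root)
{
	for (;;)
	{
		size_t		child = 2 * root + 1;

		if (child >= n)
			break;
		if (child + 1 < n && h[child + 1] > h[child])
			child++;			/* pick the larger child */
		if (h[root] >= h[child])
			break;
		int			tmp = h[root];

		h[root] = h[child];
		h[child] = tmp;
		root = child;
	}
}

/*
 * Keep the `bound` smallest values seen so far in a max-heap: once the
 * heap is full, each new value is compared against the current worst
 * (the root) and either discarded or swapped in.  Tuples beyond the
 * bound never need to be stored at all.
 */
static void
topn_add(int *heap, size_t *count, size_t bound, int v)
{
	if (*count < bound)
	{
		/* heap not yet full: append and sift the new value up */
		size_t		i = (*count)++;

		heap[i] = v;
		while (i > 0 && heap[(i - 1) / 2] < heap[i])
		{
			int			tmp = heap[i];

			heap[i] = heap[(i - 1) / 2];
			heap[(i - 1) / 2] = tmp;
			i = (i - 1) / 2;
		}
	}
	else if (v < heap[0])
	{
		/* beats the current worst of the top N: replace the root */
		heap[0] = v;
		topn_maxheap_siftdown(heap, bound, 0);
	}
}
```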
Re: [HACKERS] Directory/File Access Permissions for COPY and Generic File Access Functions
On 29/10/14 16:11, Andres Freund wrote:
> I do think checking the link count to be 1 is safe though.

You will fail against certain styles of online backup.
--
Cheers,
  Jeremy
Re: [HACKERS] Minor performance improvement in transition to external sort
On 26/02/14 03:08, Robert Haas wrote:
> On Tue, Feb 25, 2014 at 2:55 PM, Jeremy Harris wrote:
>> On 24/02/14 17:38, Robert Haas wrote:
>>> On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris wrote:
>>>> Run under cachegrind, it takes about N/10 last-level cache misses,
>>>> all for the new item being introduced to the heap. The existing
>>>> code takes none at all.
>>>
>>> Can you explain this further? This seems like an important clue that
>>> could be useful when trying to optimize this code, but I'm a little
>>> unclear which part of the operation has more cache misses with your
>>> changes and why.
>>
>> In the patched version, for the heapify operation the outer loop
>> starts at the last heap-parent tuple and works backward to the start
>> of the tuples array. A copy is taken of the SortTuple being operated
>> on for the inner loop to use. This copy suffers cache misses. (The
>> inner loop operates on elements between the copy source and the end
>> of the array.)
>>
>> In the original, the outer loop runs the array in increasing index
>> order. Again a copy is taken of the SortTuple for the inner loop to
>> use. This copy does not appear to take significant cache misses.
>> (The inner loop operates on elements between the copy source and the
>> start of the array.)
>
> Do you have any theory as to why this happens in one case but not the
> other?

Nope. Only really wild stuff, requiring the cpu memory channel to
recognise the forward scan (despite the inner loop) and not the
backward one - and to ignore prefetch instructions, which I
experimented with.
--
Cheers,
  Jeremy
Re: [HACKERS] Minor performance improvement in transition to external sort
On 24/02/14 17:38, Robert Haas wrote:
> On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris wrote:
>> Run under cachegrind, it takes about N/10 last-level cache misses,
>> all for the new item being introduced to the heap. The existing code
>> takes none at all.
>
> Can you explain this further? This seems like an important clue that
> could be useful when trying to optimize this code, but I'm a little
> unclear which part of the operation has more cache misses with your
> changes and why.

In the patched version, for the heapify operation the outer loop starts
at the last heap-parent tuple and works backward to the start of the
tuples array. A copy is taken of the SortTuple being operated on for
the inner loop to use. This copy suffers cache misses. (The inner loop
operates on elements between the copy source and the end of the array.)

In the original, the outer loop runs the array in increasing index
order. Again a copy is taken of the SortTuple for the inner loop to
use. This copy does not appear to take significant cache misses. (The
inner loop operates on elements between the copy source and the start
of the array.)
--
Cheers,
  Jeremy
Re: [HACKERS] Minor performance improvement in transition to external sort
On 09/02/14 17:11, Jeremy Harris wrote:
> On 06/02/14 18:21, Jeff Janes wrote:
>> Did you try sorting already-sorted, reverse sorted, or pipe-organ
>> shaped data sets?
>>
>> We will also need to test it on strings. I usually use
>> md5(random()::text) to generate strings for such purposes, at least
>> for a first pass.
>
> Attached is version 2 of the patch, which fixes the performance on
> constant-input.

Having beaten on this some more I'm prepared to abandon it.

The wallclock time, for random input, drifts up at larger N (compared
to the existing code) despite the number of comparisons being
consistently less. Run under cachegrind, it takes about N/10 last-level
cache misses, all for the new item being introduced to the heap. The
existing code takes none at all.

It might be worthwhile for a seriously expensive comparison function;
say, more than 50 clocks. For integers and md5-strings it isn't.
--
Cheers,
  Jeremy
Re: [HACKERS] Minor performance improvement in transition to external sort
On 06/02/14 22:12, Jeremy Harris wrote:
>> Did you try sorting already-sorted, reverse sorted, or pipe-organ
>> shaped data sets?

Summary (low numbers better):

  Random ints:          83% compares, level on time
  Sorted ints:          level compares, 70% time
  Reverse-sorted ints:  10% compares, 15% time  (!)
  Constant ints:        200% compares, 360% time  (ouch, and not O(n))
  Pipe-organ ints:      80% compares, 107% time
  Random text:          83% compares, 106% time
--
Cheers,
  Jeremy

[Attachment: siftdown_performance.ods
 (application/vnd.oasis.opendocument.spreadsheet)]
Re: [HACKERS] Minor performance improvement in transition to external sort
On 06/02/14 14:22, Robert Haas wrote:
> On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris wrote:
>> On 05/02/14 21:56, Robert Haas wrote:
>>> On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris wrote:
>>>> The attached patch replaces the existing siftup method for heapify
>>>> with a siftdown method. Tested with random integers it does 18%
>>>> fewer compares and takes 10% less time for the heapify, over the
>>>> work_mem range 1024 to 1048576.
>>>>
>>>> Both algorithms appear to be O(n) (contradicting Wikipedia's claim
>>>> that a siftup heapify is O(n log n)).
>>>
>>> I think Wikipedia is right. Inserting a tuple into a heap is
>>> O(lg n); doing that n times is therefore O(n lg n). Your patch
>>> likewise implements an outer loop which runs O(n) times and an inner
>>> loop which runs at most O(lg n) times, so a naive analysis of that
>>> algorithm also comes out to O(n lg n).
>>
>> The patch implements a siftdown. Measurements attached.
>
> Uh, this spreadsheet is useless. You haven't explained how these
> numbers were generated, or what the columns in the spreadsheet
> actually mean. Ideally, you should include enough information that
> someone else can try to reproduce your results.

I'm sorry I wasn't clear. Test code was groups of the form:

  set work_mem to 1024;
  CREATE TABLE jgh (i integer);
  INSERT INTO jgh (i) SELECT 2 * random() FROM generate_series(1, 2);
  VACUUM ANALYZE jgh;
  explain analyze SELECT * FROM jgh ORDER BY i;
  set enable_siftdown_heapify to off;
  explain analyze SELECT * FROM jgh ORDER BY i;
  set enable_siftdown_heapify to on;
  DROP TABLE jgh;

.. for the tabulated work_mem, and scaled values for the INSERT.

Compares were counted for the heapify stage (only), and timings taken
using pg_rusage_init() before and pg_rusage_show() after it. Times are
in seconds. The baseline values for compares and time used 858ec11858;
the sift-down compares and time are for the patch.

"Baseline K cmps" is compares divided by work_mem (which should be a
scaled version of compares-per-tuple being heapified) pre-patch.
"Siftdown K cmps" is likewise with the patch. "K ratio cmps" is the
ratio (patched compares)/(unpatched compares). Repeat for the time
measurements.
--
Cheers,
  Jeremy
Re: [HACKERS] Minor performance improvement in transition to external sort
On 06/02/14 18:21, Jeff Janes wrote:
> How big of sets were you sorting in each case?

Big enough to go external. The timings and compare-counts given are
purely for the heapify stage, not the total for the sort, so they are
constrained by work_mem rather than by the sort size per se. I'm
limited to working on a small machine, so the work_mem value of 1e+6
is approaching my top limit of sort-time pain unless I rip the heapify
out into a test harness. What range of work_mem is it useful to test
over?

> Was it always just slightly bigger than would fit in work_mem?

I didn't check.

> Did you try sorting already-sorted, reverse sorted, or pipe-organ
> shaped data sets?

Not yet, but I can. What variety of pipe-organ shape is of interest
(I can think of several that might be called that)?

> We will also need to test it on strings. I usually use
> md5(random()::text) to generate strings for such purposes, at least
> for a first pass.

OK.

> The name of the attached patch is a bit unfortunate.

Is there a project standard for this?

> And I'm not sure what you are doing with tupindex, the change there
> doesn't seem to be necessary to the rest of your work, so some
> explanation on that would be helpful.

I'll add commentary.

> The project coding style usually has comments in blocks before loops
> on which they comment, rather than interspersed throughout the
> clauses of the "for" statement.

I'll adjust that.

> Another optimization that is possible is to do only one comparison at
> each level (between the two children) all the way down the heap, and
> then compare the parent to the chain of smallest children in reverse
> order. This can substantially reduce the number of comparisons on
> average, because most tuples sift most of the way down the heap
> (because most of the tuples are at the bottom of the heap).

Sounds interesting; I'll see if I can get that going.

> (Also, I think you should make your siftdown code look more like the
> existing siftdown code.)

A function called by the outer loop? Can do; the existing does that
because the function is called from multiple places, but this will only
be used the once.

Thanks for the review.
--
Jeremy
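Jeff's suggested optimization - one comparison per level on the way
down, then a short correction pass back up - might look like this in
outline. A sketch over an int min-heap, not tuplesort code, with the
name `siftdown_floyd` invented here; it assumes both subtrees below
`root` already satisfy the heap property:

```c
#include <stddef.h>

/*
 * Sift the hole all the way to the bottom using only one comparison
 * per level (between the two children), then sift the displaced value
 * back up.  Since most elements end up near the bottom of the heap,
 * the upward correction is usually short, saving comparisons on
 * average versus the classic two-comparisons-per-level descent.
 */
static void
siftdown_floyd(int *a, size_t root, size_t n)
{
	int			v = a[root];
	size_t		hole = root;

	/* descend, always promoting the smaller child into the hole */
	for (;;)
	{
		size_t		child = 2 * hole + 1;

		if (child >= n)
			break;
		if (child + 1 < n && a[child + 1] < a[child])
			child++;
		a[hole] = a[child];
		hole = child;
	}

	/* sift the saved value back up from the bottom */
	while (hole > root && a[(hole - 1) / 2] > v)
	{
		a[hole] = a[(hole - 1) / 2];
		hole = (hole - 1) / 2;
	}
	a[hole] = v;
}
```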
Re: [HACKERS] Minor performance improvement in transition to external sort
On 05/02/14 21:56, Robert Haas wrote:
> On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris wrote:
>> The attached patch replaces the existing siftup method for heapify
>> with a siftdown method. Tested with random integers it does 18% fewer
>> compares and takes 10% less time for the heapify, over the work_mem
>> range 1024 to 1048576.
>>
>> Both algorithms appear to be O(n) (contradicting Wikipedia's claim
>> that a siftup heapify is O(n log n)).
>
> I think Wikipedia is right. Inserting a tuple into a heap is O(lg n);
> doing that n times is therefore O(n lg n). Your patch likewise
> implements an outer loop which runs O(n) times and an inner loop which
> runs at most O(lg n) times, so a naive analysis of that algorithm also
> comes out to O(n lg n).

The patch implements a siftdown. Measurements attached.
--
Cheers,
  Jeremy

 work_mem  baseline  baseline  siftdown  siftdown  baseline  siftdown  ratio  baseline  siftdown  ratio
            cmps      time      cmps      time      K cmps    K cmps    cmps   K time    K time    time
     1024     27271    0.001      22426    0.001     26.6      21.9      82%  6.76E-07  4.93E-07   73%
     2048     52153    0.001      44557    0.001     25.5      21.8      85%  6.28E-07  4.47E-07   71%
     4096    108660    0.003      89591    0.002     26.5      21.9      82%  6.47E-07  4.63E-07   72%
     8192    217341    0.005     179191    0.004     26.5      21.9      82%  6.57E-07  4.88E-07   74%
    16384    434101    0.011     358680    0.009     26.5      21.9      83%  6.55E-07  5.42E-07   83%
    32768    871110    0.022     717542    0.019     26.6      21.9      82%  6.60E-07  5.74E-07   87%
    65536   1740355    0.043    1434967    0.039     26.6      21.9      82%  6.59E-07  5.96E-07   90%
   131072   3478497    0.087    2869190    0.078     26.5      21.9      82%  6.62E-07  5.98E-07   90%
   262144   6962693    0.173    5740518    0.154     26.6      21.9      82%  6.58E-07  5.89E-07   89%
   524288  13912236    0.345   11476660    0.311     26.5      21.9      82%  6.59E-07  5.93E-07   90%
  1048576  27839260    0.690   22957368    0.622     26.5      21.9      82%  6.58E-07  5.93E-07   90%
[HACKERS] Minor performance improvement in transition to external sort
The attached patch replaces the existing siftup method for heapify with
a siftdown method. Tested with random integers it does 18% fewer
compares and takes 10% less time for the heapify, over the work_mem
range 1024 to 1048576.

Both algorithms appear to be O(n) (contradicting Wikipedia's claim that
a siftup heapify is O(n log n)).
--
Cheers,
  Jeremy

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 8b520c1..2ea7b2d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1211,7 +1211,8 @@ puttuple_common(Tuplesortstate *state, SortTuple *tuple)
 			(void) grow_memtuples(state);
 			Assert(state->memtupcount < state->memtupsize);
 		}
-		state->memtuples[state->memtupcount++] = *tuple;
+		state->memtuples[state->memtupcount] = *tuple;
+		state->memtuples[state->memtupcount++].tupindex = 0;
 
 		/*
 		 * Check if it's time to switch over to a bounded heapsort. We do
@@ -1884,23 +1885,47 @@ inittapes(Tuplesortstate *state)
 	state->tp_tapenum = (int *) palloc0(maxTapes * sizeof(int));
 
 	/*
-	 * Convert the unsorted contents of memtuples[] into a heap. Each tuple is
-	 * marked as belonging to run number zero.
-	 *
-	 * NOTE: we pass false for checkIndex since there's no point in comparing
-	 * indexes in this step, even though we do intend the indexes to be part
-	 * of the sort key...
+	 * Convert the unsorted contents of memtuples[] into a heap. Each tuple was
+	 * marked as belonging to run number zero on input; we don't compare that.
 	 */
 	ntuples = state->memtupcount;
-	state->memtupcount = 0;		/* make the heap empty */
-	for (j = 0; j < ntuples; j++)
 	{
-		/* Must copy source tuple to avoid possible overwrite */
-		SortTuple	stup = state->memtuples[j];
+		SortTuple  *m = state->memtuples;
 
-		tuplesort_heap_insert(state, &stup, 0, false);
+		for (j = (ntuples-2)/2;	/* comb the array from the last heap parent */
+			 j >= 0;			/* to the start */
+			 j--)
+		{
+			int			root = j;
+			int			child,
+						swap = 0;
+			SortTuple	ptup = m[root];
+
+			while ((child = root*2 + 1) <= ntuples-1)
+			{					/* root has at least one child. Check left-child */
+				if (COMPARETUP(state, &ptup, &m[child]) < 0)
+				{				/* [root] < [lchild] */
+					if (++child <= ntuples-1	/* check right-child */
+						&& COMPARETUP(state, &ptup, &m[child]) > 0)
+						swap = child;
+					else
+						break;	/* [root] smallest */
+				}
+				else
+				{
+					swap = child;
+					if (++child <= ntuples-1	/* check right-child */
+						&& COMPARETUP(state, &m[swap], &m[child]) > 0)
+						swap = child;
+				}
+				/* [swap] is smallest of three; move into the parent slot */
+				m[root] = m[swap];
+				root = swap;	/* and repeat with the child subtree */
+			}
+			if (swap)
+				m[swap] = ptup;
+			/* This and all heap nodes after are now well-positioned */
+		}
 	}
-	Assert(state->memtupcount == ntuples);
 
 	state->currentRun = 0;
Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET
On 27/01/14 18:00, Simon Riggs wrote:
> On 27 January 2014 17:44, Pavel Stehule wrote:
>> This topic is interesting - we found very bad performance with
>> hashing large tables with high work_mem. MergeJoin with quicksort
>> was significantly faster.
>
> I've seen this also.
>
>> I didn't deeper research - there is a possibility of virtualization
>> overhead.
>
> I took measurements and the effect was repeatable and happened for all
> sizes of work_mem, but nothing more to add.

FWIW my current list-based internal merge seems to perform worse at
larger work_mem, compared to quicksort. I've been starting to wonder
if the rate of new ram-chip page opens is an issue (along with the
more usually considered cache effects). Any random-access workload
would be affected by this, if it really exists.
--
Cheers,
  Jeremy
[HACKERS] Questionable coding in orderedsetaggs.c
In ordered_set_startup() sorts are initialised in non-randomAccess
mode (tuplesort_begin_heap() and tuplesort_begin_datum(), last
argument). The use of tuplesort_skip_tuples() feels very like a
random access to me.

I think it doesn't fail because the only use (and implementation) is
to skip forwards; if backwards were tried (as the interface permits),
external sorts would fail because multiple tapes are present for
FINALMERGE.

-- 
Cheers,
  Jeremy
Re: [HACKERS] PoC: Duplicate Tuple Elidation during External Sort for DISTINCT
On 22/01/14 03:53, Tom Lane wrote:
> Jon Nelson writes:
>> A rough summary of the patch follows:
>>
>> - a GUC variable enables or disables this capability
>> - in nodeAgg.c, eliding duplicate tuples is enabled if the number of
>>   distinct columns is equal to the number of sort columns (and both
>>   are greater than zero).
>> - in createplan.c, eliding duplicate tuples is enabled if we are
>>   creating a unique plan which involves sorting first
>> - ditto planner.c
>> - all of the remaining changes are in tuplesort.c, which consist of:
>>   + a new macro, DISCARDTUP, and a new structure member, discardtup,
>>     are both defined and operate similar to COMPARETUP, COPYTUP, etc...
>>   + in puttuple_common, when state is TSS_BUILDRUNS, we *may* simply
>>     throw out the new tuple if it compares as identical to the tuple
>>     at the top of the heap. Since we're already performing this
>>     comparison, this is essentially free.
>>   + in mergeonerun, we may discard a tuple if it compares as
>>     identical to the *last written tuple*. This is a comparison that
>>     did not take place before, so it's not free, but it saves a
>>     write I/O.
>>   + We perform the same logic in dumptuples
>
> [ raised eyebrow ... ] And what happens if the planner drops the
> unique step and then the sort doesn't actually go to disk?

I don't think Jon was suggesting that the planner drop the unique step.

-- 
Cheers,
  Jeremy
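[The "essentially free" elision in puttuple_common can be sketched in miniature: an incoming element is compared against the heap top anyway, so an equal result can discard it at no extra cost. Ints stand in for SortTuples here, and the names are mine, not the patch's:

```c
#include <stdbool.h>
#include <stddef.h>

/* Insert v into the min-heap m of current size *n, eliding it if it
 * compares equal to the heap top (the comparison we were making anyway).
 * Returns false if the value was discarded as a duplicate. */
bool
heap_insert_dedup(int *m, size_t *n, int v)
{
	size_t	i;

	if (*n > 0 && v == m[0])
		return false;			/* equal to current minimum: throw it out */

	/* ordinary sift-up insertion */
	i = (*n)++;
	while (i > 0 && m[(i - 1) / 2] > v)
	{
		m[i] = m[(i - 1) / 2];	/* bubble the hole towards the root */
		i = (i - 1) / 2;
	}
	m[i] = v;
	return true;
}
```

Note this only catches duplicates of the current minimum — which is exactly why the patch also has to re-check in mergeonerun/dumptuples against the last written tuple.]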
Re: [HACKERS] PoC: Duplicate Tuple Elidation during External Sort for DISTINCT
On 22/01/14 03:16, Jon Nelson wrote:
> Greetings -hackers: I have worked up a patch to PostgreSQL which
> elides tuples during an external sort. The primary use case is when
> sorted input is being used to feed a DISTINCT operation. The idea is
> to throw out tuples that compare as identical whenever it's
> convenient, predicated on the assumption that even a single I/O is
> more expensive than some number of (potentially extra) comparisons.
> Obviously, this is where a cost model comes in, which has not been
> implemented. This patch is a work-in-progress.

Dedup-in-sort is also done by my WIP internal merge sort, and extended
(in much the same ways as Jon's) to the external merge.

  https://github.com/j47996/pgsql_sorb

I've not done a cost model either, but the dedup capability is exposed
from tuplesort.c to the executor, and downstream Unique nodes are
removed. I've not worked out yet how to eliminate upstream hashagg
nodes, which would be worthwhile from testing results.

-- 
Cheers,
  Jeremy
Re: [HACKERS] PoC: Partial sort
On 31/12/13 01:41, Andreas Karlsson wrote:
> On 12/29/2013 08:24 AM, David Rowley wrote:
>> If it was possible to devise some way to reuse any previous
>> tuplesortstate, perhaps just inventing a reset method which clears
>> out tuples, then we could see performance exceed the standard
>> seqscan -> sort. The code the way it is seems to look up the sort
>> functions from the syscache for each group then allocate some sort
>> space, so quite a bit of time is also spent in palloc0() and pfree().
>>
>> If it was not possible to do this then maybe adding a cost to the
>> number of sort groups would be better so that the optimization is
>> skipped if there are too many sort groups.
>
> It should be possible. I have hacked a quick proof of concept for
> reusing the tuplesort state. Can you try it and see if the
> performance regression is fixed by this?
>
> One thing which has to be fixed with my patch is that we probably
> want to close the tuplesort once we have returned the last tuple
> from ExecSort().
>
> I have attached my patch and the incremental patch on Alexander's patch.

How does this work in combination with randomAccess?

-- 
Thanks,
  Jeremy
Re: [HACKERS] PoC: Partial sort
On 13/01/14 18:01, Alexander Korotkov wrote:
> Thanks. It's included into attached version of patch. As well as
> estimation improvements, more comments and regression tests fix.

Would it be possible to totally separate the two sets of sort-keys,
only giving the non-indexed set to the tuplesort? At present
tuplesort will, when it has a group to sort, make wasted compares on
the indexed set of keys before starting on the non-indexed set.

-- 
Cheers,
  Jeremy
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On 14/01/14 22:23, Dave Chinner wrote:
> On Tue, Jan 14, 2014 at 11:40:38AM -0800, Kevin Grittner wrote:
>> To quantify that, in a production setting we were seeing pauses of
>> up to two minutes with shared_buffers set to 8GB and default dirty
>> page settings for Linux, on a machine with 256GB RAM and 512MB
>
> There's your problem. By default, background writeback doesn't start
> until 10% of memory is dirtied, and on your machine that's 25GB of
> RAM. That's way too high for your workload.
>
> It appears to me that we are seeing large memory machines much more
> commonly in data centers - a couple of years ago 256GB RAM was only
> seen in supercomputers. Hence machines of this size are moving from
> the "tweaking settings for supercomputers is OK" class to the
> "tweaking settings for enterprise servers is not OK" class.
>
> Perhaps what we need to do is deprecate dirty_ratio and
> dirty_background_ratio as the default values and move to the
> byte-based values as the defaults, capping them appropriately -
> e.g. 10/20% of RAM for small machines down to a couple of GB for
> large machines.

Perhaps the kernel needs a dirty-amount control measured in time
units rather than pages (it being up to the kernel to measure the
achievable write rate)...

-- 
Cheers,
  Jeremy
Re: [HACKERS] PoC: Partial sort
On 22/12/13 20:26, Alexander Korotkov wrote:
> On Sat, Dec 14, 2013 at 6:30 PM, Jeremy Harris wrote:
>> On 14/12/13 12:54, Andres Freund wrote:
>>> Is that actually all that beneficial when sorting with a bog
>>> standard qsort() since that doesn't generally benefit from data
>>> being pre-sorted? I think we might need to switch to a different
>>> algorithm to really benefit from mostly pre-sorted input.
>>
>> Eg: http://www.postgresql.org/message-id/5291467e.6070...@wizmail.org
>>
>> Maybe Alexander and I should bash our heads together.
>
> Partial sort patch is mostly optimizer/executor improvement rather
> than improvement of sort algorithm itself.

I finally got as far as understanding Alexander's cleverness, and it
does make the performance advantage (on partially-sorted input) of
the merge sort irrelevant. There's a slight tradeoff possible between
the code complexity of the chunking code front-ending the sorter and
just using the enhanced sorter. The chunking does reduce the peak
memory usage quite nicely too.

The implementation of the chunker does O(n) compares using the keys
of the feed-stream index, to identify the chunk boundaries. Would it
be possible to get this information from the Index Scan?

-- 
Cheers,
  Jeremy
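[A toy model of the chunking, for anyone following along — not the patch itself, and the row layout and names are mine. Rows arrive already ordered by the indexed key (k1, e.g. from an index scan); one extra compare per row finds the group boundaries (the O(n) cost mentioned above), and only the remaining key (k2) is sorted within each group:

```c
#include <stdlib.h>
#include <stddef.h>

typedef struct
{
	int		k1;		/* key already ordered by the feed stream */
	int		k2;		/* key still needing a sort */
} Row;

static int
cmp_k2(const void *a, const void *b)
{
	const Row *ra = a, *rb = b;

	return (ra->k2 > rb->k2) - (ra->k2 < rb->k2);
}

/* Sort each run of equal k1 by k2; input must already be ordered by k1. */
void
partial_sort(Row *rows, size_t n)
{
	size_t	lo = 0;

	for (size_t hi = 1; hi <= n; hi++)
	{
		/* one boundary-detection compare per row */
		if (hi == n || rows[hi].k1 != rows[lo].k1)
		{
			qsort(rows + lo, hi - lo, sizeof(Row), cmp_k2);
			lo = hi;
		}
	}
}
```

Each qsort call only sees one group, which is where the peak-memory saving in the real patch comes from: the sorter never holds more than a group at a time.]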
Re: [HACKERS] PoC: Partial sort
On 14/12/13 12:54, Andres Freund wrote:
> On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:
>> Currently when we need to get ordered result from table we have to
>> choose one of two approaches: get results from index in exact order
>> we need or do sort of tuples. However, it could be useful to mix
>> both methods: get results from index in order which partially meets
>> our requirements and do rest of work from heap.
>>
>>  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual time=0.097..0.099 rows=10 loops=1)
>>    ->  Sort  (cost=69214.06..71714.06 rows=100 width=16) (actual time=0.096..0.097 rows=10 loops=1)
>>          Sort Key: v1, v2
>>          Sort Method: top-N heapsort  Memory: 25kB
>>          ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42 rows=100 width=16) (actual time=0.017..0.066 rows=56 loops=1)
>>  Total runtime: 0.125 ms
>> (6 rows)
>
> Is that actually all that beneficial when sorting with a bog
> standard qsort() since that doesn't generally benefit from data
> being pre-sorted? I think we might need to switch to a different
> algorithm to really benefit from mostly pre-sorted input.

Eg: http://www.postgresql.org/message-id/5291467e.6070...@wizmail.org

Maybe Alexander and I should bash our heads together.

-- 
Cheers,
  Jeremy
Re: [HACKERS] Dynamic Shared Memory stuff
On 20/11/13 19:58, Robert Haas wrote:
> Parallel sort, and then parallel other stuff. Eventually general
> parallel query. I have recently updated
> https://wiki.postgresql.org/wiki/Parallel_Sort and you may find that
> interesting/helpful as a statement of intent.

I've been playing with an internal merge sort which may be of
interest in this area of work. While I've not worked on parallelizing
the merge stages, it does not feel like an impossible task. More to
the immediate point, the input stage and readout stage can do useful
work overlapped with the data source and sink (respectively).

The current implementation uses the same amount of memory as the
quicksort one, and does approximately the same number of compares (on
random input). The overheads are higher than the quicksort
implementation (up to 50% higher CPU time, on a single key of random
integers). Its performance shines on partially- or reverse-sorted
input.

Transition to (the existing) external sort is implemented, as is the
limited random-access facility, and bounded output. It supports
unique-output. The planner interface is extended to return an
indication of dedup guarantee, and the Unique node uses this to elide
itself at planning time. Dedup operations also cover external sorts.

-- 
Cheers,
  Jeremy
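[For illustration, a minimal natural merge sort over an int array shows why pre-sorted input shines: pre-existing ascending runs are detected and merged pairwise, so already-sorted input costs a single O(n) scan. This is only a sketch of the family of algorithm — my list-based version differs, and a production version would also reverse descending runs to capture the reverse-sorted case cheaply:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Merge adjacent sorted runs [lo,mid) and [mid,hi) via scratch space. */
static void
merge_runs(int *a, int *tmp, size_t lo, size_t mid, size_t hi)
{
	size_t	i = lo, j = mid, k = lo;

	while (i < mid && j < hi)
		tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
	while (i < mid)
		tmp[k++] = a[i++];
	while (j < hi)
		tmp[k++] = a[j++];
	memcpy(a + lo, tmp + lo, (hi - lo) * sizeof(int));
}

/* Natural merge sort: repeatedly find the existing ascending runs and
 * merge adjacent pairs of them, until a single run spans the array. */
void
natural_merge_sort(int *a, size_t n)
{
	int	   *tmp = malloc(n * sizeof(int));
	size_t *runs = malloc((n + 1) * sizeof(size_t));
	size_t	nruns;

	if (n < 2 || !tmp || !runs)
		goto done;
	do
	{
		/* collect run boundaries: runs[0] = 0 .. runs[nruns] = n */
		nruns = 0;
		runs[0] = 0;
		for (size_t i = 1; i <= n; i++)
			if (i == n || a[i] < a[i - 1])
				runs[++nruns] = i;
		/* merge adjacent pairs of runs; an odd run carries over */
		for (size_t r = 0; r + 1 < nruns; r += 2)
			merge_runs(a, tmp, runs[r], runs[r + 1], runs[r + 2]);
	} while (nruns > 1);
done:
	free(tmp);
	free(runs);
}
```

On random input this degrades gracefully to an ordinary merge sort's O(n log n) compares, which matches the "approximately the same number of compares" observation above.]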
Re: [HACKERS] Fast insertion indexes: why no developments
On 13/11/13 09:07, Leonardo Francalanci wrote:
> Problem? Having 4 btree indexes on random values (imsi+imei * 2,
> since we have calling and caller) kills the performance in insertion
> after a while.

Surely there's good correlation between IMSI & IMEI, so have a
separate table to translate one to (a group of) the others, and halve
the indexes on your main table?

-- 
Jeremy
Re: [HACKERS] regression tests
On 06/09/13 15:44, Robert Haas wrote:
> On Fri, Sep 6, 2013 at 3:34 AM, Jeremy Harris wrote:
>> I don't see the regression tests running any index-hash operations.
>> What am I missing?
>
> What's an index-hash operation?

Ones that hit tuplesort_begin_index_hash().

-- 
Cheers,
  Jeremy
[HACKERS] regression tests
Hi,

I don't see the regression tests running any index-hash operations.
What am I missing?

-- 
Cheers,
  Jeremy
Re: [HACKERS] how to pass data (tuples) to worker processes?
On 06/08/13 13:59, Robert Haas wrote:
> My thought is to create a queue abstraction that sits on top of the
> dynamic shared memory infrastructure, so that you can set aside a
> portion of your dynamic shared memory segment to use as a ring
> buffer and send messages back and forth using some kind of API along
> these lines:

You may find http://quatermass.co.uk/toolsmith/mplib1/ of use here.

-- 
Cheers,
  Jeremy
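[As a sketch of the sort of API shape being described — my guesses at names and layout, not Robert's design — a single-producer/single-consumer byte ring whose header and buffer would live in the shared segment:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
	size_t	head;		/* next write position, producer-owned */
	size_t	tail;		/* next read position, consumer-owned */
	size_t	size;		/* capacity; must be a power of two */
	char	buf[];		/* lives in the shared memory segment */
} shm_ring;

/* Copy len bytes into the ring; false if insufficient free space. */
bool
ring_put(shm_ring *r, const char *msg, size_t len)
{
	if (r->size - (r->head - r->tail) < len)
		return false;			/* caller must wait for the consumer */
	for (size_t i = 0; i < len; i++)
		r->buf[(r->head + i) & (r->size - 1)] = msg[i];
	r->head += len;				/* real code: write barrier before publish */
	return true;
}

/* Copy len bytes out of the ring; false if not yet fully arrived. */
bool
ring_get(shm_ring *r, char *out, size_t len)
{
	if (r->head - r->tail < len)
		return false;
	for (size_t i = 0; i < len; i++)
		out[i] = r->buf[(r->tail + i) & (r->size - 1)];
	r->tail += len;				/* real code: barrier before freeing space */
	return true;
}
```

With free-running head/tail counters and a power-of-two size, full vs. empty is unambiguous without wasting a slot; a multi-process version additionally needs memory barriers (noted in the comments) and a wakeup mechanism.]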
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 10/05/2012 09:03 PM, Tom Lane wrote:
> Note that allowing subsequent requests to jump the queue would not
> be a good fix for this; if you do that, it's likely the ex-lock will
> never be granted, at least not till the next system idle time.

Offering that option to the admin sounds like a good thing, since (as
Alvaro points out) the build of the replacement index could take
considerable time but be done without the lock. Then the swap could be
done in the first quiet period (but without further admin action),
and the drop started.

One size doesn't fit all. It doesn't need to be the only method.

-- 
Cheers,
  Jeremy
Re: [HACKERS] Memory usage during sorting
On 2012-03-18 15:25, Tom Lane wrote:
> Jeff Janes writes:
>> On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas wrote:
>>> The problem there is that none of the files can be deleted until it
>>> was entirely read, so you end up with all the data on disk twice. I
>>> don't know how often people run their databases so close to the
>>> edge on disk space that this matters, but someone felt that that
>>> extra storage was worth avoiding.
>
> Yeah, that was me, and it came out of actual user complaints ten or
> more years back. (It's actually not 2X growth but more like 4X
> growth according to the comments in logtape.c, though I no longer
> remember the exact reasons why.) We knew when we put in the logtape
> logic that we were trading off speed for space, and we accepted that.

How about a runtime check of disk-free?

-- 
Jeremy
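[The suggested runtime check could be as simple as asking the filesystem, before choosing the space-hungry-but-fast layout, whether the temporary growth would even fit. A sketch using statvfs(2) — the function name and the idea of gating on it are mine:

```c
#include <stdbool.h>
#include <sys/statvfs.h>

/* true if the filesystem holding 'path' has at least 'need' bytes free
 * for unprivileged users; conservative (false) on any error. */
bool
enough_disk_free(const char *path, unsigned long long need)
{
	struct statvfs sv;

	if (statvfs(path, &sv) != 0)
		return false;
	/* f_bavail is in units of the fragment size f_frsize */
	return (unsigned long long) sv.f_bavail * sv.f_frsize >= need;
}
```

The sort could then pick the logtape layout only when, say, twice the expected spill size would not fit — accepting that free space can still change under us between the check and the writes.]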
Re: [HACKERS] random_page_cost vs seq_page_cost
On 2012-01-05 10:04, Benedikt Grundmann wrote:
> I have a question of how to benchmark hardware to determine the
> appropriate ratio of seq_page_cost vs random_page_cost.

It'd be really nice if the DBMS measured actual experienced values...

-- 
Jeremy
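[As a sketch of what "measuring" could look like — file name, sizes and approach all illustrative, not a proposal for actual server code. Note a warm page cache collapses the ratio toward 1, which is precisely why a real implementation would want to sample live query I/O rather than probe a synthetic file:

```c
#define _GNU_SOURCE				/* for pread() */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define PAGE  8192
#define PAGES 512				/* 4 MB probe file */

static double
now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Measured (random read time)/(sequential read time) over a scratch
 * file at 'path'; -1 on error.  Cache effects make this optimistic. */
double
measure_page_cost_ratio(const char *path)
{
	char	buf[PAGE];
	int		fd;
	double	t0, seq, rnd;

	memset(buf, 'x', sizeof(buf));
	fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	for (int i = 0; i < PAGES; i++)
		if (write(fd, buf, PAGE) != PAGE)
			break;

	t0 = now_sec();
	for (int i = 0; i < PAGES; i++)
		pread(fd, buf, PAGE, (off_t) i * PAGE);
	seq = now_sec() - t0;

	srandom(42);
	t0 = now_sec();
	for (int i = 0; i < PAGES; i++)
		pread(fd, buf, PAGE, (off_t) (random() % PAGES) * PAGE);
	rnd = now_sec() - t0;

	close(fd);
	unlink(path);
	return (seq > 0) ? rnd / seq : -1;
}
```

On a cold spinning disk one would expect a ratio well above 1; on SSDs or a warm cache it approaches 1, illustrating why a single hardware-probe number is a poor substitute for continuously observed values.]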
Re: [HACKERS] spinlocks on powerpc
On 2012-01-03 04:44, Robert Haas wrote:
> On read-only workloads, you get spinlock contention, because
> everyone who wants a snapshot has to take the LWLock mutex to
> increment the shared lock count and again (just a moment later) to
> decrement it.

Does the LWLock protect anything but the shared lock count? If not,
then the quickest manipulation is usually along the lines of:

    loop:
        lwarx   r5,0,r3     # load and reserve
        add     r0,r4,r5    # increment word
        stwcx.  r0,0,r3     # store new value if still reserved
        bne-    loop        # loop if lost reservation

(per IBM's software reference manual,
https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600719DF2 )

The same sort of thing generally holds on other instruction sets also.

Also, heavy-contention locks should be placed in cache lines away
from other data (to avoid thrashing the data cache lines when
processors are fighting over the lock cache lines).

-- 
Jeremy
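[For comparison, the same fetch-and-add written portably — on PowerPC, GCC and Clang compile this builtin down to essentially that lwarx/stwcx. loop. The struct and function names are mine, purely a sketch, not the LWLock code:

```c
#include <stdint.h>

/* Keep the hot counter on its own cache line, per the point above about
 * not letting contending CPUs also fight over neighbouring data. */
typedef struct
{
	_Alignas(64) int32_t count;
} SharedCounter;

/* Atomic increment: one load-reserve/store-conditional loop on PPC,
 * a single lock xadd on x86.  Returns the post-increment value. */
int32_t
counter_inc(SharedCounter *c)
{
	return __atomic_add_fetch(&c->count, 1, __ATOMIC_SEQ_CST);
}

int32_t
counter_dec(SharedCounter *c)
{
	return __atomic_sub_fetch(&c->count, 1, __ATOMIC_SEQ_CST);
}
```

Whether __ATOMIC_SEQ_CST (vs. a weaker acquire/release pairing) is needed depends on what the count synchronises with; the weaker orderings are cheaper on PPC precisely because they omit the heavyweight sync instructions.]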