Re: [HACKERS] Using quicksort for every external sort run

2015-12-09 Thread Jeremy Harris
On 09/12/15 00:02, Jeff Janes wrote:
> The second one consumes that giant tape run along with 232 small tape
> runs.

In terms of number of comparisons, binary merge works best when the
inputs are of similar length.  I'd assume the same goes for n-ary
merge, but I don't know if comparison count is an issue here.
-- 
Cheers,
  Jeremy



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-08 Thread Jeremy Harris
On 07/12/15 16:44, Simon Riggs wrote:
> There are many optimizations we might adopt, yet planning time is a factor.
> It seems simple enough to ignore more complex optimizations if we have
> already achieved a threshold cost (say 10). Such a test would add nearly
> zero time for the common case. We can apply the optimizations in some kind
> of ordering depending upon the cost, so we are careful to balance the
> cost/benefit of trying certain optimizations.

Given parallelism, why not continue planning after initiating a
a cancellable execution, giving a better plan to be used if the
excecution runs for long enough?
-- 
Cheers,
  Jeremy




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting sorted data from foreign server

2015-10-08 Thread Jeremy Harris
On 08/10/15 17:09, Robert Haas wrote:
> - You consider pushing down ORDER BY if any prefix of the query
> pathkeys satisfy is_foreign_expr(), but that doesn't seem right to me.
> If the user says SELECT * FROM remotetab ORDER BY a, unsafe(a),
> ordering the result set by a doesn't help us much.  We've talked a few
> times about an incremental sort capability that would take a stream of
> tuples already ordered by one or more columns and sort each group by
> additional columns, but I don't think we have that currently.  Without
> that capability, I don't think there's much benefit in sorting by a
> prefix of the pathkeys.  I suspect that if we can't get them all, it's
> not worth doing.

That depends how often the additional columns affect the sorted
order, if the sort method takes advantage of sorted input.
-- 
Jeremy



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-29 Thread Jeremy Harris
On 27/09/15 21:58, Gavin Wahl wrote:
> Somewhat harder but still possible would be using BRIN indexes to
> accelerate ORDER BY. This would require a sorting algorithm that can take
> advantage of mostly-sorted inputs. You would sort the page ranges by their
> minimum or maximum value, then feed the sorting algorithm in that order.

An internal merge sort does well with partially-sorted input.
-- 
Cheers,
 Jeremy




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-01 Thread Jeremy Harris
On 31/07/15 18:31, Robert Haas wrote:
 On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote:
 Heapification is O(n) already, whether siftup (existing) or down.
 
 That's not my impression, or what Wikipedia says.  Source?

Measurements done last year:

http://www.postgresql.org/message-id/52f35462.3030...@wizmail.org
(spreadsheet attachment)

http://www.postgresql.org/message-id/52f40ce9.1070...@wizmail.org
(measurement procedure and spreadsheet explanation)

-- 
Cheers,
  Jeremy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-31 Thread Jeremy Harris
On 30/07/15 02:05, Peter Geoghegan wrote:
 Since heapification is now a big fraction of the total cost of a sort
 sometimes, even where the heap invariant need not be maintained for
 any length of time afterwards, it might be worth revisiting the patch
 to make that an O(n) rather than a O(log n) operation [3].

 O(n log n) ?

Heapification is O(n) already, whether siftup (existing) or down.

It might be worthwhile comparing actual times with a quicksort, given
that a sorted array is trivially a well-formed heap (the reverse is not
true) and that quicksort seems to be cache-friendly.  Presumably there
will be a crossover N where the cache-friendliness k reduction
loses out to the log n penalty for doing a full sort; below this
it would be useful.

You could then declare the tape buffer to be the leading tranche of
work-mem (and dump it right away) and the heap to start with the
remainder.
-- 
Cheers,
  Jeremy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Asynchronous DRAM Self-Refresh

2015-05-23 Thread Jeremy Harris
On 22/05/15 22:30, Josh Berkus wrote:
 At CoreOS Fest, Intel presented about a technology which they used to
 improve write times for the nonrelational data store Etcd.  It's called
 Asynchronous DRAM Self-Refresh, or ADR.  This is supposedly a feature of
 all of their chips since E5 which allows users to designate a small area
 of memory (16 to 64MB) which is somehow guaranteed to be flushed to disk
 in the event of a power loss (the exact mechanism was not explained).
 
 So my thought was Hello!  wal_buffers?  Theoretically, this feature
 could give us the benefits of aynchronous commit without the penalties
 ... *if* it actually works.
 
 However, since then I've been able to find zero documentation on ADR.
 There's a bunch of stuff in the Intel press releases, but zero I can
 find in their technical docs.  Anyone have a clue on this?
 
Are you certain disk was mentioned?  The wording at

 http://www.intel.com/design/intarch/iastorage/xeon.htm

to preserve critical data in RAM during a power fail
implies not.  I assume there's a battery backup for
just that memory region, and the chip does its own refresh
rather than needing a controller; the data should still
be recoverable on next boot.
-- 
Cheers,
  Jeremy



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Asynchronous DRAM Self-Refresh

2015-05-23 Thread Jeremy Harris
On 22/05/15 22:30, Josh Berkus wrote:
 At CoreOS Fest, Intel presented about a technology which they used to
 improve write times for the nonrelational data store Etcd.  It's called
 Asynchronous DRAM Self-Refresh, or ADR.  This is supposedly a feature of
 all of their chips since E5 which allows users to designate a small area
 of memory (16 to 64MB) which is somehow guaranteed to be flushed to disk
 in the event of a power loss (the exact mechanism was not explained).
 
 So my thought was Hello!  wal_buffers?  Theoretically, this feature
 could give us the benefits of aynchronous commit without the penalties
 ... *if* it actually works.
 
 However, since then I've been able to find zero documentation on ADR.
 There's a bunch of stuff in the Intel press releases, but zero I can
 find in their technical docs.  Anyone have a clue on this?
 

http://snia.org/sites/default/files/NVDIMM%20Messaging%20and%20FAQ%20Jan%2020143.pdf

describes it as a (minor) processor feature to flush
memory-pipeline buffers to RAM on powerloss detection.
The context there is for a flash-backup on-DIMM for the
RAM.

It's unclear whether any dirty data in cpu cache gets written...
Are Intel caches never write-behind?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Abbreviated keys for text cost model fix

2015-03-03 Thread Jeremy Harris
On 03/03/15 03:08, Arthur Silva wrote:
 Does it always perform mergesort instead of quicksort when enabled?

Yes; there seemed no advantage to any additional complexity.
The merge consistently performs fewer comparisons than the
quicksort, on random input - and many fewer if there are
any sorted (or reverse-sorted) sections.  However, it also
consistently takes slightly longer (a few percent).  I was
unable to chase this down but assume it to be a cacheing
effect.  So I don't currently think it should replace the
current sort for all use.

If the planner can identify cases where there is some pre-sort
in the input (CLUSTER was mentioned?), or maybe a correlated
index, it could be worthwhile.

Also useful might be very-expensive comparison cases,
and distinct-output cases (uniqification is supported at
each submerge stage, so data disappears early).

There's potential for running submerges in parallel
using multiple cores, but I've not worked on that.

-- 
Cheers,
  Jeremy



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Abbreviated keys for text cost model fix

2015-02-25 Thread Jeremy Harris
On 25/02/15 00:42, Peter Geoghegan wrote:
 On Tue, Feb 24, 2015 at 4:32 PM, Jeremy Harris j...@wizmail.org wrote:
 On 23/02/15 16:40, Tomas Vondra wrote:
 On 22.2.2015 22:30, Peter Geoghegan wrote:
 You should try it with the data fully sorted like this, but with one
 tiny difference: The very last tuple is out of order. How does that
 look?

 If this case is actually important, a merge-sort can take
 significant advantage of the partial order:
 
 test=# set enable_intmerge_sort to on;
 
 What is this?

Associated with an implementation of an internal merge sort,
a control to disable it.  Not in the mainline sourcebase.
-- 
Cheers,
  Jeremy




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Abbreviated keys for text cost model fix

2015-02-24 Thread Jeremy Harris
On 23/02/15 16:40, Tomas Vondra wrote:
 On 22.2.2015 22:30, Peter Geoghegan wrote:
 You should try it with the data fully sorted like this, but with one 
 tiny difference: The very last tuple is out of order. How does that 
 look?

If this case is actually important, a merge-sort can take
significant advantage of the partial order:


test=# explain analyze select * from (select * from stuff_text_asc order
by randtxt offset 1000) foo;
  QUERY PLAN

---
 Limit  (cost=247054.81..247054.81 rows=1 width=18) (actual
time=25133.029..25133.029 rows=0 loops=1)
   -  Sort  (cost=242054.81..247054.81 rows=201 width=18) (actual
time=25025.931..25088.406 rows=201 loops=1)
 Sort Key: stuff_text_asc.randtxt
 Sort Method: quicksort  Memory: 221213kB  Compares: 95541376
 -  Seq Scan on stuff_text_asc  (cost=0.00..32739.01
rows=201 width=18) (actual time=0.011..118.390 rows=201 loops=1)
 Planning time: 0.080 ms
 Execution time: 25144.538 ms
(7 rows)

Time: 25145.185 ms
test=#
test=#
test=# set enable_intmerge_sort to on;
SET
Time: 0.378 ms
test=# explain analyze select * from (select * from stuff_text_asc order
by randtxt offset 1000) foo;
  QUERY PLAN

--
 Limit  (cost=247054.81..247054.81 rows=1 width=18) (actual
time=1051.603..1051.603 rows=0 loops=1)
   -  Sort  (cost=242054.81..247054.81 rows=201 width=18) (actual
time=943.304..1006.988 rows=201 loops=1)
 Sort Key: stuff_text_asc.randtxt
 Sort Method: internal merge  Memory: 221213kB  Compares: 202
 -  Seq Scan on stuff_text_asc  (cost=0.00..32739.01
rows=201 width=18) (actual time=0.009..98.474 rows=201 loops=1)
 Planning time: 0.072 ms
 Execution time: 1063.434 ms
(7 rows)

Time: 1064.113 ms
test=#
test=# set enable_intmerge_sort to off;
SET
Time: 0.353 ms
test=#
test=#
test=#
test=#
test=#
test=# explain analyze select count(distinct randtxt) from stuff_text_asc;
   QUERY PLAN

-
 Aggregate  (cost=37739.01..37739.02 rows=1 width=18) (actual
time=25196.814..25196.815 rows=1 loops=1)
   -  Seq Scan on stuff_text_asc  (cost=0.00..32739.01 rows=201
width=18) (actual time=0.010..114.995 rows=201 loops=1)
 Planning time: 0.053 ms
 Execution time: 25196.857 ms
(4 rows)

Time: 25197.371 ms
test=#
test=# explain analyze select count(*) from (select distinct randtxt
from stuff_text_asc) as foo;
 QUERY
PLAN
-
 Aggregate  (cost=277054.83..277054.84 rows=1 width=0) (actual
time=25521.258..25521.258 rows=1 loops=1)
   -  Unique  (cost=242054.81..252054.81 rows=201 width=18) (actual
time=25101.157..25438.622 rows=1999100 loops=1)
 -  Sort  (cost=242054.81..247054.81 rows=201 width=18)
(actual time=25101.156..25184.436 rows=201 loops=1)
   Sort Key: stuff_text_asc.randtxt
   Sort Method: quicksort  Memory: 221213kB  Compares: 95541376
   -  Seq Scan on stuff_text_asc  (cost=0.00..32739.01
rows=201 width=18) (actual time=0.011..116.509 rows=201 loops=1)
 Planning time: 0.088 ms
 Execution time: 25532.947 ms
(8 rows)

Time: 25533.642 ms
test=#
test=#
test=# set enable_intmerge_sort to on;
SET
Time: 0.401 ms
test=# explain analyze select count(*) from (select distinct randtxt
from stuff_text_asc) as foo;
  QUERY PLAN

---
 Aggregate  (cost=272054.82..272054.83 rows=1 width=0) (actual
time=1184.289..1184.289 rows=1 loops=1)
   -  Sort  (cost=242054.81..247054.81 rows=201 width=18) (actual
time=1037.019..1100.720 rows=1999100 loops=1)
 Sort Key: stuff_text_asc.randtxt
 Sort Method: dedup internal merge  Memory: 221143kB  Compares:
201
 -  Seq Scan on stuff_text_asc  (cost=0.00..32739.01
rows=201 width=18) (actual time=0.010..106.729 rows=201 loops=1)
 Planning time: 0.086 ms
 Execution time: 1195.891 ms
(7 rows)

Time: 1196.514 ms
test=#


-- 
Cheers,
  Jeremy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Jeremy Harris
On 14/11/14 00:46, Simon Riggs wrote:
 Limit  (cost= rows=20 width=175) (actual time= rows=20 loops=1)
-  Sort  (cost= rows=568733 width=175) (actual time=
 rows=20 loops=1)
  Sort Method: top-N heapsort

Going off on a tangent, when I was playing with a merge-sort
implementation I propagated limit information into the sort
node, for a significant win.  Eliding the Limit node gave
a further slight win.

I wasn't convinced the use-case was common enough to justify
the replacement of quicksort (despite having consistently
fewer compares, the merge sort was slightly slower.  I never
understood why) - but I never asked.  Is there any appetite
for supporting alternate sort algorithms?
-- 
Cheers,
  Jeremy



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Jeremy Harris
On 14/11/14 14:54, Tom Lane wrote:
 Jeremy Harris j...@wizmail.org writes:
 On 14/11/14 00:46, Simon Riggs wrote:
 Limit  (cost= rows=20 width=175) (actual time= rows=20 loops=1)
 -  Sort  (cost= rows=568733 width=175) (actual time=
 rows=20 loops=1)
 Sort Method: top-N heapsort
 
 Going off on a tangent, when I was playing with a merge-sort
 implementation I propagated limit information into the sort
 node, for a significant win.
 
 I'm not entirely following.  The top-N heapsort approach already
 makes use of the limit info.

Having gone back to look, you're right.  It was Uniq nodes I merged
(the sort handles both bounded-output and dedup).
-- 
Cheers,
  Jeremy




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Directory/File Access Permissions for COPY and Generic File Access Functions

2014-10-29 Thread Jeremy Harris
On 29/10/14 16:11, Andres Freund wrote:
  I do think checking the link count to
 be 1 is safe though.

You will fail against certain styles of online-backup.
-- 
Cheers,
  Jeremy



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-26 Thread Jeremy Harris

On 26/02/14 03:08, Robert Haas wrote:

On Tue, Feb 25, 2014 at 2:55 PM, Jeremy Harris j...@wizmail.org wrote:

On 24/02/14 17:38, Robert Haas wrote:


On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris j...@wizmail.org wrote:


Run under cachegrind, it takes about N/10 last-level cache misses,
all for the new item being introduced to the heap.  The existing
code takes none at all.



Can you explain this further?  This seems like an important clue that
could be useful when trying to optimize this code, but I'm a little
unclear which part of the operation has more cache misses with your
changes and why.



In the patched version, for the heapify operation the outer loop
starts at the last heap-parent tuple and works backward to the
start of the tuples array.  A copy is taken of the SortTuple being
operated on for the inner loop to use.  This copy suffers cache misses.

(The inner loop operates on elements between the copy source and the
end of the array).


In the original, the outer loop runs the array in increasing index
order.  Again a copy is taken of the SortTuple for the inner loop
to use.  This copy does not appear to take significant cache misses.

(The inner loop operates on elements between the copy source and
the start of the array).


Do you have any theory as to why this happens in one case but not the other?


Nope.  Only really wild stuff requiring the cpu memory channel
recognising the forward scan (despite the inner loop) and not
the backward one - and ignoring prefetch instructions, which I
experimented with.
--
Cheers,
   Jeremy




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-25 Thread Jeremy Harris

On 24/02/14 17:38, Robert Haas wrote:

On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris j...@wizmail.org wrote:

Run under cachegrind, it takes about N/10 last-level cache misses,
all for the new item being introduced to the heap.  The existing
code takes none at all.


Can you explain this further?  This seems like an important clue that
could be useful when trying to optimize this code, but I'm a little
unclear which part of the operation has more cache misses with your
changes and why.


In the patched version, for the heapify operation the outer loop
starts at the last heap-parent tuple and works backward to the
start of the tuples array.  A copy is taken of the SortTuple being
operated on for the inner loop to use.  This copy suffers cache misses.

(The inner loop operates on elements between the copy source and the
end of the array).


In the original, the outer loop runs the array in increasing index
order.  Again a copy is taken of the SortTuple for the inner loop
to use.  This copy does not appear to take significant cache misses.

(The inner loop operates on elements between the copy source and
the start of the array).

--
Cheers,
   Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-20 Thread Jeremy Harris

On 09/02/14 17:11, Jeremy Harris wrote:

On 06/02/14 18:21, Jeff Janes wrote:

  Did you try sorting already-sorted, reverse
sorted, or pipe-organ shaped data sets?  We will also need to test it on
strings.  I usually use md5(random()::text) to generate strings for such
purposes, at least for a first pass.


Attached is version 2 of the patch, which fixes the performance on
constant-input.


Having beaten on this some more I'm prepared to abandon it.

The wallclock time, for random input, drifts up at larger N
(compared to the existing code) despite the number of comparisons
being consistently less.

Run under cachegrind, it takes about N/10 last-level cache misses,
all for the new item being introduced to the heap.  The existing
code takes none at all.


It might be worthwhile for a seriously expensive comparison function;
say, more than 50 clocks.  For integers and md5-strings it isn't.
--
Cheers,
  Jeremy




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-07 Thread Jeremy Harris

On 06/02/14 22:12, Jeremy Harris wrote:

 Did you try sorting already-sorted, reverse
sorted, or pipe-organ shaped data sets?


Summary (low numbers better):

Random ints: 83% compares, level on time.
Sorted ints: level compares, 70% time.
Reverse-sorted ints: 10% compares, 15% time  (!)
Constant ints:   200% compares, 360% time(ouch, and not O(n))
Pipe-organ ints: 80% compares, 107% time
Random text: 83% compares, 106% time

--
Cheers,
  Jeremy


siftdown_performance.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Jeremy Harris

On 05/02/14 21:56, Robert Haas wrote:

On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris j...@wizmail.org wrote:

The attached patch replaces the existing siftup method for heapify with
a siftdown method. Tested with random integers it does 18% fewer
compares and takes 10% less time for the heapify, over the work_mem
range 1024 to 1048576.

Both algorithms appear to be O(n) (contradicting Wikipedia's claim
that a siftup heapify is O(n log n)).


I think Wikipedia is right.  Inserting an a tuple into a heap is O(lg
n); doing that n times is therefore O(n lg n).  Your patch likewise
implements an outer loop which runs O(n) times and an inner loop which
runs at most O(lg n) times, so a naive analysis of that algorithm also
comes out to O(n lg n).


The patch implements a siftdown.  Measurements attached.
--
Cheers,
  Jeremy


work_mem	baseline		Sift-down			baseline K	siftdown K		K ratio		baseline K	siftdown K		K ratio
	cmps	time	cmps	time		cmps	cmps		cmps		time	time		time
1024	27271	0.001	22426	0.001		26.6	21.9		82%		6.76E-07	4.93E-07		73%
2048	52153	0.001	44557	0.001		25.5	21.8		85%		6.28E-07	4.47E-07		71%
4096	108660	0.003	89591	0.002		26.5	21.9		82%		6.47E-07	4.63E-07		72%
8192	217341	0.005	179191	0.004		26.5	21.9		82%		6.57E-07	4.88E-07		74%
16384	434101	0.011	358680	0.009		26.5	21.9		83%		6.55E-07	5.42E-07		83%
32768	871110	0.022	717542	0.019		26.6	21.9		82%		6.60E-07	5.74E-07		87%
65536	1740355	0.043	1434967	0.039		26.6	21.9		82%		6.59E-07	5.96E-07		90%
131072	3478497	0.087	2869190	0.078		26.5	21.9		82%		6.62E-07	5.98E-07		90%
262144	6962693	0.173	5740518	0.154		26.6	21.9		82%		6.58E-07	5.89E-07		89%
524288	13912236	0.345	11476660	0.311		26.5	21.9		82%		6.59E-07	5.93E-07		90%
1048576	27839260	0.690	22957368	0.622		26.5	21.9		82%		6.58E-07	5.93E-07		90%

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Jeremy Harris

On 06/02/14 18:21, Jeff Janes wrote:

  How big of
sets were you sorting in each case?


Big enough to go external.  The timings and compare-counts given are
purely for the heapify stage not the total for the sort, so are
constrained by the work_mem not by the sort size per se.

I'm limited to working on a small machine, so the work_mem value
of 1e+6 is approaching my top limit of sort-time pain unless I rip the
heapify out into a test harness.  What range of work_mem is it useful
to test over?


 Was it always just slightly bigger
than would fit in work_mem?


I didn't check.


 Did you try sorting already-sorted, reverse
sorted, or pipe-organ shaped data sets?


Not yet, but I can.  What variety of pipe-organ shape is of
interest (I can think of several that might be called that)?


 We will also need to test it on
strings.  I usually use md5(random()::text) to generate strings for such
purposes, at least for a first pass.


OK.



The name of the attached patch is a bit unfortunate.


Is there a project standard for this?


 And I'm not sure what
you are doing with tupindex, the change there doesn't seem to be necessary
to the rest of your work, so some explanation on that would be helpful.


I'll add commentary.


The project coding style usually has comments in blocks before loops on
which they comment, rather than interspersed throughout the clauses of the
for statement.


I'll adjust that.


Another optimization that is possible is to do only one comparison at each
level (between the two children) all the way down the heap, and then
compare the parent to the chain of smallest children in reverse order.
  This can substantially reduce the number of comparisons on average,
because most tuples sift most of the way down the heap (because most of the
tuples are at the bottom of the heap).


Sounds interesting; I'll see if I can get that going.


(Also, I think you should make your siftdown code look more like the
existing siftdown code.)


A function called by the outer loop?  Can do; the existing does that
because the function is called from multiple places but this will only
be used the once.

Thanks for the review.
--
Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Jeremy Harris

On 06/02/14 14:22, Robert Haas wrote:

On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris j...@wizmail.org wrote:

On 05/02/14 21:56, Robert Haas wrote:

On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris j...@wizmail.org wrote:

The attached patch replaces the existing siftup method for heapify with
a siftdown method. Tested with random integers it does 18% fewer
compares and takes 10% less time for the heapify, over the work_mem
range 1024 to 1048576.

Both algorithms appear to be O(n) (contradicting Wikipedia's claim
that a siftup heapify is O(n log n)).



I think Wikipedia is right.  Inserting an a tuple into a heap is O(lg
n); doing that n times is therefore O(n lg n).  Your patch likewise
implements an outer loop which runs O(n) times and an inner loop which
runs at most O(lg n) times, so a naive analysis of that algorithm also
comes out to O(n lg n).


The patch implements a siftdown.  Measurements attached.


Uh, this spreadsheet is useless.  You haven't explained how these
numbers were generated, or what the columns in the spreadsheet
actually mean.  Ideally, you should include enough information that
someone else can try to reproduce your results.



I'm sorry I wasn't clear.

Test code was groups of the form:

set work_mem to 1024;
CREATE TABLE jgh (i integer);
INSERT INTO jgh (i) SELECT 2 * random() FROM generate_series(1, 2);
VACUUM ANALYZE jgh;
explain analyze SELECT * FROM jgh ORDER BY i;
set enable_siftdown_heapify to off;
explain analyze SELECT * FROM jgh ORDER BY i;
set enable_siftdown_heapify to on;
DROP TABLE jgh;

.. for the tabulated work_mem, and scaled values for the INSERT.

Compares were counted for the heapify stage (only), and timings
taken using pg_rusage_init() before and pg_rusage_show() after
it.  Times are in seconds.

Baseline value for compares and time used 858ec11858.
Sift-down compares and time are for the patch.

Baseline K cmps is compares divided by work_mem (which should
be a scaled version of compares-per-tuple being heapified) pre-patch.
Siftdown K cmps is likewise with the patch.

K ratio cmps is the ratio  (patched compares)/(unpatched compares).

Repeat for time measurements.

--
Cheers,
   Jeremy


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Minor performance improvement in transition to external sort

2014-02-04 Thread Jeremy Harris

The attached patch replaces the existing siftup method for heapify with
a siftdown method. Tested with random integers it does 18% fewer
compares and takes 10% less time for the heapify, over the work_mem
range 1024 to 1048576.

Both algorithms appear to be O(n) (contradicting Wikipedia's claim
that a siftup heapify is O(n log n)).

--
Cheers,
   Jeremy
diff --git a/src/backend/utils/sort/tuplesort.c 
b/src/backend/utils/sort/tuplesort.c
index 8b520c1..2ea7b2d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1211,7 +1211,8 @@ puttuple_common(Tuplesortstate *state, SortTuple *tuple)
(void) grow_memtuples(state);
Assert(state-memtupcount  state-memtupsize);
}
-   state-memtuples[state-memtupcount++] = *tuple;
+   state-memtuples[state-memtupcount] = *tuple;
+   state-memtuples[state-memtupcount++].tupindex = 0;
 
/*
 * Check if it's time to switch over to a bounded 
heapsort. We do
@@ -1884,23 +1885,47 @@ inittapes(Tuplesortstate *state)
state-tp_tapenum = (int *) palloc0(maxTapes * sizeof(int));
 
/*
-* Convert the unsorted contents of memtuples[] into a heap. Each tuple 
is
-* marked as belonging to run number zero.
-*
-* NOTE: we pass false for checkIndex since there's no point in 
comparing
-* indexes in this step, even though we do intend the indexes to be part
-* of the sort key...
+* Convert the unsorted contents of memtuples[] into a heap. Each tuple 
was
+* marked as belonging to run number zero on input; we don't compare 
that.
 */
ntuples = state-memtupcount;
-   state-memtupcount = 0; /* make the heap empty */
-   for (j = 0; j  ntuples; j++)
{
-   /* Must copy source tuple to avoid possible overwrite */
-   SortTuple   stup = state-memtuples[j];
+   SortTuple * m = state-memtuples;
 
-   tuplesort_heap_insert(state, stup, 0, false);
+   for (j = (ntuples-2)/2; /* comb the array from the last heap 
parent */
+j = 0;/* to the start */
+j--)
+   {
+   int root = j;
+   int child, swap = 0;
+   SortTuple ptup = m[root];
+
+   while ((child = root*2 + 1) = ntuples-1)
+   {   /* root has at least one child. Check 
left-child */
+   if (COMPARETUP(state, ptup, m[child])  0)
+   {   
/* [root]  [lchild] */
+   if (  ++child = ntuples-1  
/* check right-child */
+   COMPARETUP(state, ptup, 
m[child])  0)
+   swap = child;
+   else
+   break;  
/* [root] smallest */
+   }
+   else
+   {
+   swap = child;
+   if (  ++child = ntuples-1  
/* check right-child */
+   COMPARETUP(state, m[swap], 
m[child])  0)
+   swap = child;
+   }
+   /* [swap] is smallest of three; move into the 
parent slot */
+   m[root] = m[swap];
+   root = swap;/* and repeat 
with the child subtree */
+   }
+   if (swap)
+   m[swap] = ptup;
+   /* This and all heap nodes after are now 
well-positioned */
+   }
}
-   Assert(state-memtupcount == ntuples);
 
state-currentRun = 0;
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-28 Thread Jeremy Harris

On 27/01/14 18:00, Simon Riggs wrote:

On 27 January 2014 17:44, Pavel Stehule pavel.steh...@gmail.com wrote:


This topic is interesting - we found very bad performance with hashing large
tables with high work_mem. MergeJoin with quicksort was significantly
faster.


I've seen this also.


I didn't deeper research - there is a possibility of virtualization
overhead.


I took measurements and the effect was repeatable and happened for all
sizes of work_mem, but nothing more to add.


FWIW my current list-based internal merge seems to perform worse at
larger work-mem, compared to quicksort.   I've been starting to wonder
if the rate of new ram-chip page opens is an issue (along with the
more-usually considered cache effects).  Any random-access workload
would be affected by this, if it really exists.
--
Cheers,
   Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Questionable coding in orderedsetaggs.c

2014-01-25 Thread Jeremy Harris

In ordered_set_startup() sorts are initialised in non-randomAccess mode
(tuplesort_begin_heap() and ~datum(), last argument).

The use of tuplesort_skip_tuples() feels very like a random access to
me.  I think it doesn't fail because the only use (and implementation)
is to skip forwards; if backwards were tried (as the interface permits)
external sorts would fail because multiple tapes are present for
FINALMERGE.
--
Cheers,
   Jeremy


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: Duplicate Tuple Elidation during External Sort for DISTINCT

2014-01-22 Thread Jeremy Harris

On 22/01/14 03:16, Jon Nelson wrote:

Greetings -hackers:

I have worked up a patch to PostgreSQL which elides tuples during an
external sort. The primary use case is when sorted input is being used
to feed a DISTINCT operation. The idea is to throw out tuples that
compare as identical whenever it's convenient, predicated on the
assumption that even a single I/O is more expensive than some number
of (potentially extra) comparisons.  Obviously, this is where a cost
model comes in, which has not been implemented. This patch is a
work-in-progress.



Dedup-in-sort is also done by my WIP internal merge sort, and
extended (in much the same ways as Jon's) to the external merge.

https://github.com/j47996/pgsql_sorb


I've not done a cost model either, but the dedup capability is
exposed from tuplesort.c to the executor, and downstream uniq
nodes removed.

I've not worked out yet how to eliminate upstream hashagg nodes,
which would be worthwhile from testing results.

--
Cheers,
   Jeremy


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: Duplicate Tuple Elidation during External Sort for DISTINCT

2014-01-22 Thread Jeremy Harris

On 22/01/14 03:53, Tom Lane wrote:

Jon Nelson jnelson+pg...@jamponi.net writes:

A rough summary of the patch follows:



- a GUC variable enables or disables this capability
- in nodeAgg.c, eliding duplicate tuples is enabled if the number of
   distinct columns is equal to the number of sort columns (and both are
   greater than zero).
- in createplan.c, eliding duplicate tuples is enabled if we are
   creating a unique plan which involves sorting first
- ditto planner.c
- all of the remaining changes are in tuplesort.c, which consist of:
   + a new macro, DISCARDTUP and a new structure member, discardtup, are
 both defined and operate similar to COMPARETUP, COPYTUP, etc...
   + in puttuple_common, when state is TSS_BUILDRUNS, we *may* simply
 throw out the new tuple if it compares as identical to the tuple at
 the top of the heap. Since we're already performing this comparison,
 this is essentially free.
   + in mergeonerun, we may discard a tuple if it compares as identical
 to the *last written tuple*. This is a comparison that did not take
 place before, so it's not free, but it saves a write I/O.
   + We perform the same logic in dumptuples


[ raised eyebrow ... ]  And what happens if the planner drops the
unique step and then the sort doesn't actually go to disk?


I don't think Jon was suggesting that the planner drop the unique step.
--
Cheers,
   Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: Partial sort

2014-01-18 Thread Jeremy Harris

On 13/01/14 18:01, Alexander Korotkov wrote:

Thanks. It's included into attached version of patch. As wall as estimation
improvements, more comments and regression tests fix.


Would it be possible to totally separate the two sets of sort-keys,
only giving the non-index set to the tuplesort?  At present tuplesort
will, when it has a group to sort, make wasted compares on the
indexed set of keys before starting on the non-indexed set.
--
Cheers,
  Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: Partial sort

2014-01-18 Thread Jeremy Harris

On 31/12/13 01:41, Andreas Karlsson wrote:

On 12/29/2013 08:24 AM, David Rowley wrote:

If it was possible to devise some way to reuse any
previous tuplesortstate perhaps just inventing a reset method which
clears out tuples, then we could see performance exceed the standard
seqscan - sort. The code the way it is seems to lookup the sort
functions from the syscache for each group then allocate some sort
space, so quite a bit of time is also spent in palloc0() and pfree()

If it was not possible to do this then maybe adding a cost to the number
of sort groups would be better so that the optimization is skipped if
there are too many sort groups.


It should be possible. I have hacked a quick proof of concept for
reusing the tuplesort state. Can you try it and see if the performance
regression is fixed by this?

One thing which have to be fixed with my patch is that we probably want
to close the tuplesort once we have returned the last tuple from
ExecSort().

I have attached my patch and the incremental patch on Alexander's patch.


How does this work in combination with randomAccess ?
--
Thanks,
   Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: Partial sort

2014-01-16 Thread Jeremy Harris

On 22/12/13 20:26, Alexander Korotkov wrote:

On Sat, Dec 14, 2013 at 6:30 PM, Jeremy Harris j...@wizmail.org wrote:


On 14/12/13 12:54, Andres Freund wrote:

Is that actually all that beneficial when sorting with a bog standard
qsort() since that doesn't generally benefit from data being pre-sorted?
I think we might need to switch to a different algorithm to really
benefit from mostly pre-sorted input.



Eg:  http://www.postgresql.org/message-id/5291467e.6070...@wizmail.org

Maybe Alexander and I should bash our heads together.



Partial sort patch is mostly optimizer/executor improvement rather than
improvement of sort algorithm itself.


I finally got as far as understanding Alexander's cleverness, and it
does make the performance advantage (on partially-sorted input) of the
merge-sort irrelevant.

There's a slight tradeoff possible between the code complexity of
the chunking code front-ending the sorter and just using the
enhanced sorter.  The chunking does reduce the peak memory usage
quite nicely too.

The implementation of the chunker does O(n) compares using the
keys of the feed-stream index, to identify the chunk boundaries.
Would it be possible to get this information from the Index Scan?
--
Cheers,
   Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jeremy Harris

On 14/01/14 22:23, Dave Chinner wrote:

On Tue, Jan 14, 2014 at 11:40:38AM -0800, Kevin Grittner wrote:

To quantify that, in a production setting we were seeing pauses of
up to two minutes with shared_buffers set to 8GB and default dirty

^

page settings for Linux, on a machine with 256GB RAM and 512MB

   ^
There's your problem.

By default, background writeback doesn't start until 10% of memory
is dirtied, and on your machine that's 25GB of RAM. That's way to
high for your workload.

It appears to me that we are seeing large memory machines much more
commonly in data centers - a couple of years ago 256GB RAM was only
seen in supercomputers. Hence machines of this size are moving from
tweaking settings for supercomputers is OK class to tweaking
settings for enterprise servers is not OK

Perhaps what we need to do is deprecate dirty_ratio and
dirty_background_ratio as the default values as move to the byte
based values as the defaults and cap them appropriately.  e.g.
10/20% of RAM for small machines down to a couple of GB for large
machines


whisper  Perhaps the kernel needs a dirty-amount control measured
in time units rather than pages (it being up to the kernel to
measure the achievable write rate)...
--
Cheers,
   Jeremy


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Jeremy Harris

On 14/12/13 12:54, Andres Freund wrote:

On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:

Currently when we need to get ordered result from table we have to choose
one of two approaches: get results from index in exact order we need or do
sort of tuples. However, it could be useful to mix both methods: get
results from index in order which partially meets our requirements and do
rest of work from heap.



--
  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
time=0.097..0.099 rows=10 loops=1)
-  Sort  (cost=69214.06..71714.06 rows=100 width=16) (actual
time=0.096..0.097 rows=10 loops=1)
  Sort Key: v1, v2
  Sort Method: top-N heapsort  Memory: 25kB
  -  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
rows=100 width=16) (actual time=0.017..0.066 rows=56 loops=1)
  Total runtime: 0.125 ms
(6 rows)


Is that actually all that beneficial when sorting with a bog standard
qsort() since that doesn't generally benefit from data being pre-sorted?
I think we might need to switch to a different algorithm to really
benefit from mostly pre-sorted input.


Eg:  http://www.postgresql.org/message-id/5291467e.6070...@wizmail.org

Maybe Alexander and I should bash our heads together.

--
Cheers,
   Jeremy


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic Shared Memory stuff

2013-11-23 Thread Jeremy Harris

On 20/11/13 19:58, Robert Haas wrote:

Parallel sort, and then parallel other stuff.  Eventually general
parallel query.

I have recently updated https://wiki.postgresql.org/wiki/Parallel_Sort
and you may find that interesting/helpful as a statement of intent.


I've been playing with an internal merge sort which may be of interest
in this area of work.  While I've not worked on parallelizing the merge
stages it does not feel like an impossible task.  More to the immediate
point, the input stage and readout stage can do useful work
overlapped with the data source and sink (respectively).

The current implementation uses the same amount of memory as the
quicksort one, and does approximately the same number of compares
(on random input).  The overheads are higher than the quicksort
implementation (up to 50% higher cpu time, on a single key of
random integers).

Its performance shines on partially- or reverse-sorted input.

Transition to (the existing) external sort is implemented, as is
the limited random-access facility, and bounded output.

It supports unique-output.  The planner interface is extended to
return an indication of dedup guarantee, and the Unique node uses
this to elide itself at planning time.  Dedup operations also
cover external sorts.
--
Cheers,
   Jeremy


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Jeremy Harris

On 13/11/13 09:07, Leonardo Francalanci wrote:

Problem? having 4 btree indexes on random values (imsi+imei * 2,
since we have calling and caller) kills the performance in insertion
after a while.


Surely there's good correlation between IMSI  IMEI, so have a separate
table to translate one to (a group of) the others, and
halve the indexes on your main table?
--
Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] regression tests

2013-09-06 Thread Jeremy Harris

Hi,

   I don't see the regression tests running any index-hash operations.
What am I missing?

--
Cheers,
Jeremy


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regression tests

2013-09-06 Thread Jeremy Harris

On 06/09/13 15:44, Robert Haas wrote:

On Fri, Sep 6, 2013 at 3:34 AM, Jeremy Harris j...@wizmail.org wrote:

I don't see the regression tests running any index-hash operations.
What am I missing?


What's an index-hash operation?



Ones that hit tuplesort_begin_index_hash()
--
Cheers,
   Jeremy


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] how to pass data (tuples) to worker processes?

2013-08-07 Thread Jeremy Harris

On 06/08/13 13:59, Robert Haas wrote:

  My thought is to create a
queue abstraction that sits on top of the dynamic shared memory
infrastructure, so that you can set aside a portion of your dynamic
shared memory segment to use as a ring buffer and send messages back
and forth with using some kind of API along these lines:


You may find http://quatermass.co.uk/toolsmith/mplib1/ of use here.
--
Cheers,
   Jeremy


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-10-06 Thread Jeremy Harris

On 10/05/2012 09:03 PM, Tom Lane wrote:

Note that allowing subsequent requests to jump the queue would not be a
good fix for this; if you do that, it's likely the ex-lock will never be
granted, at least not till the next system idle time.


Offering that option to the admin sounds like a good thing, since
(as Alvaro points out) the build of the replacement index could take
considerable time but be done without the lock.  Then the swap
done in the first quiet period (but without further admin action),
and the drop started.

One size doesn't fit all.  It doesn't need to be the only method.
--
Cheers,
Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory usage during sorting

2012-03-18 Thread Jeremy Harris

On 2012-03-18 15:25, Tom Lane wrote:

Jeff Janesjeff.ja...@gmail.com  writes:

On Wed, Mar 7, 2012 at 11:55 AM, Robert Haasrobertmh...@gmail.com  wrote:
The problem there is that none of the files can be deleted until it
was entirely read, so you end up with all the data on disk twice.  I
don't know how often people run their databases so close to the edge
on disk space that this matters, but someone felt that that extra
storage was worth avoiding.


Yeah, that was me, and it came out of actual user complaints ten or more
years back.  (It's actually not 2X growth but more like 4X growth
according to the comments in logtape.c, though I no longer remember the
exact reasons why.)  We knew when we put in the logtape logic that we
were trading off speed for space, and we accepted that.


How about a runtime check of disk-free?
--
Jeremy


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-05 Thread Jeremy Harris

On 2012-01-05 10:04, Benedikt Grundmann wrote:

I have a question of how to benchmark hardware to determine
the appropriate ratio of seq_page_cost vs random_page_cost.


It'd be really nice if the DBMS measured actual experienced values..

--
Jeremy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on powerpc

2012-01-03 Thread Jeremy Harris

On 2012-01-03 04:44, Robert Haas wrote:

On read-only workloads, you get spinlock contention, because everyone
who wants a snapshot has to take the LWLock mutex to increment the
shared lock count and again (just a moment later) to decrement it.


Does the LWLock protect anything but the shared lock count?  If not
then the usually quickest manipulation is along the lines of:

loop: lwarx r5,0,r3  #load and reserve
add r0,r4,r5 #increment word
stwcx. r0,0,r3  #store new value if still reserved
bne-loop  #loop if lost reservation

(per IBM's software ref manual,
 
https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600719DF2
)

The same sort of thing generally holds on other instruction-sets also.

Also, heavy-contention locks should be placed in cache lines away from other
data (to avoid thrashing the data cache lines when processors are fighting
over the lock cache lines).
--
Jeremy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers