Re: [PERFORM] B-Heaps

2010-06-20 Thread Greg Smith

Robert Haas wrote:

This is drifting a bit off-topic for this thread, but it's not so easy
to figure out from looking at the TODO which things are actually
important.  Performance-related improvements are mixed in with
non-performance related improvements, which are mixed in with things
that are probably not improvements at all.  And even to the extent
that you can identify the stuff that's performance-related, it's far
from obvious which things are most important.  Any thoughts on that?


I don't think it's off-topic at all actually, and as usual I'll be 
happy to argue why.  Reorganizing the TODO so that it's easier for 
newcomers to consume is certainly a worthwhile effort in itself, albeit 
one that's hard to find time for relative to more important things.  My 
point was more that, statistically, *anything* on that list is likely a 
better candidate for useful work than one of the random theoretical 
performance improvements from research that pop up on the lists from 
time to time.  People get excited about these papers and blog posts 
sometimes, but the odds that any of them sits in the critical path, 
representing a solution to a current PostgreSQL bottleneck, are 
dramatically lower than the odds that you'll find one by reading the 
list of *known* issues.  Want to improve PostgreSQL performance?  Spend 
more time reading the TODO, and less looking around elsewhere for 
problems the database may or may not have.


I have a major time sink I'm due to free myself from this week, and the 
idea of providing some guidance for a low-hanging performance fruit 
section of the TODO is a good one I should take a look at.  I have a 
personal list of that sort already that I should probably just make 
public, since the ideas for improving things are not the valuable part I 
should worry about keeping private anyway.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] B-Heaps

2010-06-19 Thread Robert Haas
On Fri, Jun 18, 2010 at 1:53 PM, Greg Smith g...@2ndquadrant.com wrote:
 Matthew Wakeling wrote:

 This sort of thing has been fairly well researched at an academic level,
 but has not been implemented in that many real world situations. I would
 encourage its use in Postgres.

 I guess, but don't forget that work on PostgreSQL is driven by what problems
 people are actually running into.  There's a long list of performance
 improvements sitting in the TODO list waiting for people to find time to
 work on them, ones that we're quite certain are useful.  That anyone is
 going to chase after any of these speculative ideas from academic research
 instead of one of those is unlikely.  Your characterization of the potential
 speed-up here is "Using a proper tree inside the index page would improve
 the CPU usage of the index lookups", which seems quite reasonable.
 Regardless, when I consider "is that something I have any reason to suspect
 is a bottleneck on common workloads?", I don't think of any, and return to
 working on one of the things I already know is instead.

This is drifting a bit off-topic for this thread, but it's not so easy
to figure out from looking at the TODO which things are actually
important.  Performance-related improvements are mixed in with
non-performance related improvements, which are mixed in with things
that are probably not improvements at all.  And even to the extent
that you can identify the stuff that's performance-related, it's far
from obvious which things are most important.  Any thoughts on that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [PERFORM] B-Heaps

2010-06-18 Thread Robert Haas
On Tue, Jun 15, 2010 at 8:23 AM, Matthew Wakeling matt...@flymine.org wrote:
 Absolutely, and I said as much in
 http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php
 but applied to the Postgres B-tree indexes instead of heaps.

This is an interesting idea.  I would guess that you could simulate
this to some degree by compiling PG with a larger block size.  Have
you tried this to see whether/how much/for what kind of workloads it
helps?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [PERFORM] B-Heaps

2010-06-18 Thread Matthew Wakeling

On Fri, 18 Jun 2010, Robert Haas wrote:

On Tue, Jun 15, 2010 at 8:23 AM, Matthew Wakeling matt...@flymine.org wrote:

Absolutely, and I said as much in
http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php
but applied to the Postgres B-tree indexes instead of heaps.


This is an interesting idea.  I would guess that you could simulate
this to some degree by compiling PG with a larger block size.  Have
you tried this to see whether/how much/for what kind of workloads it
helps?


To a degree, that is the case. However, if you follow the thread a bit 
further back, you will find evidence that when the index is in memory, 
increasing the page size actually decreases the performance, because it 
uses more CPU.


To make it clear: 8kB is not an optimal page size for either fully cached 
data or sparsely cached data. For disc access, large pages are 
appropriate, on the order of 256kB. If the page size is much lower than 
that, the time taken to fetch a page doesn't actually decrease much, so 
the aim is to get the maximum amount of work done per fetch without 
slowing fetches down significantly.
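The fetch economics here are easy to put in rough numbers. A sketch assuming a hypothetical disc with an 8 ms access time and 100 MB/s sequential transfer (illustrative figures, not measurements from any real drive):

```python
ACCESS_MS = 8.0          # hypothetical seek + rotational latency, in ms
TRANSFER_MB_S = 100.0    # hypothetical sequential bandwidth

def fetch_ms(page_kb):
    """Total time to fetch one page: fixed access cost plus transfer time."""
    return ACCESS_MS + page_kb / 1024.0 / TRANSFER_MB_S * 1000.0

# An 8 kB fetch costs almost as much as a 256 kB one, even though the
# larger fetch delivers 32 times the data.
print(f"8 kB: {fetch_ms(8):.2f} ms, 256 kB: {fetch_ms(256):.2f} ms")
```

Below roughly that 256kB size the access cost dominates completely, which is the sense in which shrinking the page buys almost nothing on the fetch side.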


Given such a large page size, it would then be appropriate to have a 
better data structure inside the page. Currently, our indexes (at least 
the GiST ones - I haven't looked at the Btree ones) use a simple linear 
array in the index page. Using a proper tree inside the index page would 
improve the CPU usage of the index lookups.
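The CPU saving at stake is essentially linear scan versus logarithmic in-page search. A minimal sketch, using a sorted integer array as a stand-in for a page's entries (real GiST entries are not sorted scalars, so this illustrates the principle rather than any actual page format):

```python
from bisect import bisect_left

PAGE = list(range(0, 8192, 4))   # 2048 sorted entries standing in for a page

def linear_lookup(page, key):
    # O(n): examine entries one by one, as a flat in-page array forces.
    for i, v in enumerate(page):
        if v == key:
            return i
    return -1

def tree_lookup(page, key):
    # O(log n): the work a proper in-page tree (or binary search) does.
    i = bisect_left(page, key)
    return i if i < len(page) and page[i] == key else -1

assert linear_lookup(PAGE, 4096) == tree_lookup(PAGE, 4096)
```

With 2048 entries the linear scan averages ~1024 comparisons per lookup against ~11 for the tree, and the gap grows with page size, which is why larger pages make the in-page structure matter more.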


One detail that would need to be sorted out is the cache eviction policy. 
I don't know if it is best to evict whole 256kB pages, or to evict 8kB 
pages. Probably the former, which would involve quite a bit of change to 
the shared memory cache. I can see the cache efficiency decreasing as a 
result of this, which is the only disadvantage I can see.


This sort of thing has been fairly well researched at an academic level, 
but has not been implemented in that many real world situations. I would 
encourage its use in Postgres.


Matthew

--
Failure is not an option. It comes bundled with your Microsoft product. 
-- Ferenc Mantfeld




Re: [PERFORM] B-Heaps

2010-06-18 Thread Greg Smith

Matthew Wakeling wrote:
This sort of thing has been fairly well researched at an academic 
level, but has not been implemented in that many real world 
situations. I would encourage its use in Postgres.


I guess, but don't forget that work on PostgreSQL is driven by what 
problems people are actually running into.  There's a long list of 
performance improvements sitting in the TODO list waiting for people to 
find time to work on them, ones that we're quite certain are useful.  
That anyone is going to chase after any of these speculative ideas from 
academic research instead of one of those is unlikely.  Your 
characterization of the potential speed-up here is "Using a proper tree 
inside the index page would improve the CPU usage of the index lookups", 
which seems quite reasonable.  Regardless, when I consider "is that 
something I have any reason to suspect is a bottleneck on common 
workloads?", I don't think of any, and return to working on one of the 
things I already know is instead.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] B-Heaps

2010-06-18 Thread Yeb Havinga

Greg Smith wrote:

Matthew Wakeling wrote:
This sort of thing has been fairly well researched at an academic 
level, but has not been implemented in that many real world 
situations. I would encourage its use in Postgres.


I guess, but don't forget that work on PostgreSQL is driven by what 
problems people are actually running into.  There's a long list of 
performance improvements sitting in the TODO list waiting for people 
to find time to work on them, ones that we're quite certain are 
useful.  That anyone is going to chase after any of these speculative 
ideas from academic research instead of one of those is unlikely.  
Your characterization of the potential speed-up here is "Using a 
proper tree inside the index page would improve the CPU usage of the 
index lookups", which seems quite reasonable.  Regardless, when I 
consider "is that something I have any reason to suspect is a 
bottleneck on common workloads?", I don't think of any, and return to 
working on one of the things I already know is instead.



There are two different things concerning GiST indexes:

1) Larger block sizes mean a larger number of entries per GiST page, 
which results in more generic keys on those pages. This in turn results 
in a greater number of hits when the index is queried, so a larger part 
of the index is scanned. NB: this has nothing to do with caching / 
cache sizes; it holds for every I/O model. Tests I performed showed 
performance improvements of over 200%. Since then, implementing a 
speedup has been on my 'want to do' list.


2) There are several approaches to getting the number of entries per 
page down. Two have been suggested in the thread Matthew referred to: 
virtual pages (but how would these be ordered?) and a tree within a 
page. It would be interesting to see whether ideas from Prokop's 
cache-oblivious algorithms match this problem, to find a suitable 
virtual page format.
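Point 1 can be illustrated with one-dimensional range keys standing in for real GiST predicates (bounding boxes and the like): a page's key is the union of its entries' keys, so packing more entries per page makes the key more generic, and a probe then forces far more entries to be examined. A toy sketch, with all sizes and data purely hypothetical:

```python
import random

random.seed(42)
# 4096 unit intervals, lightly shuffled within 32-entry windows, as a
# loose stand-in for how an index clusters similar entries imperfectly.
entries = [(i, i + 1) for i in range(4096)]
for i in range(0, 4096, 32):
    window = entries[i:i + 32]
    random.shuffle(window)
    entries[i:i + 32] = window

def entries_examined(per_page, point):
    """Entries scanned for a point probe: every entry of every page whose
    union key (min lo, max hi) covers the point must be examined."""
    examined = 0
    for start in range(0, len(entries), per_page):
        page = entries[start:start + per_page]
        lo = min(e[0] for e in page)
        hi = max(e[1] for e in page)
        if lo <= point < hi:
            examined += len(page)
    return examined

# Larger pages carry more generic union keys, so the same probe
# examines many more entries for the same single matching interval.
print(entries_examined(64, 2048.5), entries_examined(512, 2048.5))
```

This is the caching-independent effect described above: the extra entries are examined regardless of whether the pages come from RAM or disc.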


regards,
Yeb Havinga




Re: [PERFORM] B-Heaps

2010-06-18 Thread Kevin Grittner
Yeb Havinga yebhavi...@gmail.com wrote:
 
 concerning GiST indexes:
 
 1) Larger block sizes mean a larger number of entries per GiST
 page, which results in more generic keys on those pages. This in
 turn results in a greater number of hits when the index is
 queried, so a larger part of the index is scanned. NB: this has
 nothing to do with caching / cache sizes; it holds for every I/O
 model. Tests I performed showed performance improvements of over
 200%. Since then, implementing a speedup has been on my 'want to
 do' list.
 
As I recall, the better performance in your tests was with *smaller*
GiST pages, right?  (The above didn't seem entirely clear on that.)
 
-Kevin



Re: [PERFORM] B-Heaps

2010-06-18 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 Your characterization of the potential speed-up here is "Using a proper tree 
 inside the index page would improve the CPU usage of the index lookups", 
 which seems quite reasonable.  Regardless, when I consider "is that 
 something I have any reason to suspect is a bottleneck on common 
 workloads?", I don't think of any, and return to working on one of the 
 things I already know is instead.

Note also that this doesn't do a thing for b-tree indexes, which already
have an intelligent within-page structure.  So that instantly makes it
not a mainstream issue.  Perhaps somebody will be motivated to work on
it, but most of us are chasing other things.

regards, tom lane



Re: [PERFORM] B-Heaps

2010-06-18 Thread Yeb Havinga

Kevin Grittner wrote:

Yeb Havinga yebhavi...@gmail.com wrote:

concerning GiST indexes:

1) Larger block sizes mean a larger number of entries per GiST
page, which results in more generic keys on those pages. This in
turn results in a greater number of hits when the index is
queried, so a larger part of the index is scanned. NB: this has
nothing to do with caching / cache sizes; it holds for every I/O
model. Tests I performed showed performance improvements of over
200%. Since then, implementing a speedup has been on my 'want to
do' list.


As I recall, the better performance in your tests was with *smaller*
GiST pages, right?  (The above didn't seem entirely clear on that.)

Yes, making pages smaller made index scanning faster.

-- Yeb




Re: [PERFORM] B-Heaps

2010-06-15 Thread A. Kretschmer
In response to Greg Smith:
 For details about what the database does there, see Inside the 
 PostgreSQL Buffer Cache at http://projects.2ndquadrant.com/talks

Nice paper, btw., thanks for that!


Regards, Andreas
-- 
Andreas Kretschmer
Contact:  Heynitz: 035242/47150,   D1: 0160/7141639 (more: see header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99




Re: [PERFORM] B-Heaps

2010-06-15 Thread Matthew Wakeling

On Mon, 14 Jun 2010, Eliot Gable wrote:

Just curious if this would apply to PostgreSQL:
http://queue.acm.org/detail.cfm?id=1814327


Absolutely, and I said as much in 
http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php
but applied to the Postgres B-tree indexes instead of heaps. It's a pretty 
obvious performance improvement really - the principle is that when you do 
have to fetch a page from a slower medium, you may as well make it count 
for a lot.
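The "make each fetch count" arithmetic is easy to sketch. Assume 8-byte entries and 4 KiB pages, so 512 entries per page; the depths and layout below are illustrative, matching neither Varnish's nor Postgres's actual formats:

```python
import math

ENTRIES_PER_PAGE = 4096 // 8   # 512 eight-byte entries per 4 KiB page

def pages_touched_flat(depth):
    """Distinct pages hit on a root-to-leaf walk of an implicit binary
    heap stored breadth-first (node i's children are 2i and 2i+1)."""
    i, pages = 1, set()
    for _ in range(depth):
        pages.add((i - 1) // ENTRIES_PER_PAGE)
        i *= 2       # descend left; the page count is the same either way
    return len(pages)

# A B-heap instead packs a complete subtree into each page.  With 512
# slots a page holds a depth-9 subtree (2^9 - 1 = 511 nodes), so a
# root-to-leaf walk touches only ceil(depth / 9) pages.
def pages_touched_bheap(depth, levels_per_page=9):
    return math.ceil(depth / levels_per_page)

print(pages_touched_flat(30), pages_touched_bheap(30))  # 21 vs 4
```

Once the lower levels of the flat layout spill out of cache, nearly every step of the walk is a separate fetch, which is exactly the behaviour the blocked layout avoids.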


Lots of research has already been done on this - the paper linked above is 
rather behind the times.


However, AFAIK, Postgres has not implemented this in any of its indexing 
systems.


Matthew

--
An ant doesn't have a lot of processing power available to it. I'm not trying
to be speciesist - I wouldn't want to detract you from such a wonderful
creature, but, well, there isn't a lot there, is there?
   -- Computer Science Lecturer



Re: [PERFORM] B-Heaps

2010-06-15 Thread Yeb Havinga

Greg Smith wrote:

Eliot Gable wrote:
Just curious if this would apply to PostgreSQL: 
http://queue.acm.org/detail.cfm?id=1814327


It's hard to take this seriously at all when it's so ignorant of 
actual research in this area.  Take a look at 
http://www.cc.gatech.edu/~bader/COURSES/UNM/ece637-Fall2003/papers/BFJ01.pdf 
for a second

Interesting paper, thanks for the reference!
PostgreSQL is modeling a much more complicated situation where there 
are many levels of caches, from CPU to disk.  When executing a query, 
the database tries to manage that by estimating the relative costs for 
CPU operations, row operations, sequential disk reads, and random disk 
reads.  Those fundamental operations are then added up to build more 
complicated machinery like sorting.  To minimize query execution cost, 
various query plans are considered, the cost computed for each one, 
and the cheapest one gets executed.  This has to take into account a 
wide variety of subtle tradeoffs related to whether memory should be 
used for things that would otherwise happen on disk.  There are three 
primary ways to search for a row, three main ways to do a join, two 
for how to sort, and they all need to have cost estimates made for 
them that balance CPU time, memory, and disk access.
Do you think that the cache-oblivious algorithm described in the paper 
could speed up index scans that hit the disc, in the Postgres (and 
OS/hardware) multi-level memory case? (So that e.g. random_page_cost 
could go down?)
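The cost estimation Greg describes above can be seen in miniature. A simplified sketch using the default planner cost parameters from postgresql.conf (the real costing in PostgreSQL's costsize.c involves many more terms, so treat these formulas as illustration, not the actual implementation):

```python
# Default planner cost parameters (postgresql.conf defaults).
seq_page_cost = 1.0
random_page_cost = 4.0
cpu_tuple_cost = 0.01
cpu_index_tuple_cost = 0.005

def seq_scan_cost(pages, tuples):
    # Read every page sequentially and process every tuple.
    return pages * seq_page_cost + tuples * cpu_tuple_cost

def index_scan_cost(index_pages, heap_pages, matching_tuples):
    # Random reads of index and heap pages, plus per-tuple CPU work.
    io = (index_pages + heap_pages) * random_page_cost
    cpu = matching_tuples * (cpu_index_tuple_cost + cpu_tuple_cost)
    return io + cpu

# For a 10,000-page, 1,000,000-tuple table, fetching 100 matching rows
# via ~3 index pages and ~100 heap pages beats scanning everything.
print(seq_scan_cost(10_000, 1_000_000))
print(index_scan_cost(3, 100, 100))
```

This is also why the planner flips from an index scan to a sequential scan as the matching fraction grows: the random-I/O and per-tuple terms eventually exceed the flat sequential read cost.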


regards,
Yeb Havinga
