Re: [PERFORM] wal_synch_method = open_sync safe on RHEL 5.5?

2010-06-18 Thread Greg Smith

Mark Mielke wrote:
The conclusion I read was that Linux O_SYNC behaves like O_DSYNC on 
other systems. For WAL, this seems satisfactory?


It would be if it didn't have any bugs or limitations, but it does.  
The one pointed out in the message I linked to suggests that a mix of 
buffered and O_SYNC direct I/O can cause a write error, with the exact 
behavior you get depending on the kernel version.  That's a path better 
left unexplored, as I see it.


The kernels that have made some effort to implement this correctly 
actually expose O_DSYNC on newer Linux systems.  My current opinion is 
that if you only have Linux O_SYNC, don't use it.  The kernels with 
O_DSYNC haven't been around long enough to be proven or disproven as 
effective yet.
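
For reference, a minimal way to check what a given server is actually 
using (on Linux the stock default is normally fdatasync):

    -- Show the WAL sync method currently in effect for this server.
    SHOW wal_sync_method;
    -- To change it, set wal_sync_method in postgresql.conf, e.g.
    --   wal_sync_method = fdatasync
    -- and reload the server.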


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-18 Thread Matthew Wakeling

Dimitri Fontaine wrote:

Well I guess I'd prefer a per-transaction setting


Not possible, as many others have said. As soon as you make an unsafe 
transaction, all the other transactions have nothing to rely on: an unsafe 
transaction can touch the same shared pages, indexes and catalogs that the 
safe transactions depend on, so after a crash their integrity can no longer 
be guaranteed either.


On Thu, 17 Jun 2010, Pierre C wrote:
A per-table (or per-index) setting makes more sense IMHO. For instance "on 
recovery, truncate this table" (this was mentioned before).


That would be much more valuable.

I'd like to point out the costs involved in having a whole separate 
version of Postgres that has all this safety switched off. Package 
managers will not thank anyone for having to distribute another version of 
the system, and woe betide the user who installs the wrong version because 
it runs faster. No, this is much better as a configurable option.


Going back to "on recovery, truncate this table": we already have a 
mechanism for skipping the WAL writes on an entire table - we do that for 
tables that have been created in the current transaction. It would surely 
be a small step to allow this to be configurably permanent on a particular 
table.
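
To make that concrete, a minimal sketch of the existing optimization 
(table and file names invented; the WAL bypass only applies when WAL 
archiving is not enabled):

    BEGIN;
    CREATE TABLE staging_load (id int, payload text);
    -- Because staging_load was created in this same transaction, the
    -- COPY below can skip WAL entirely and just sync the table file
    -- at the end, instead of writing every row to WAL first.
    COPY staging_load FROM '/tmp/staging.csv' WITH CSV;
    COMMIT;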


Moreover, we already have a mechanism for taking a table that has had 
non-logged changes, and turning it into a fully logged table - we do that 
to the above mentioned tables when the transaction commits. I would 
strongly recommend providing an option to ALTER TABLE MAKE SAFE, which may 
involve some more acrobatics if the table is currently in use by multiple 
transactions, but would be valuable.
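
Something along these lines, with purely hypothetical syntax (no such 
command exists today):

    -- Hypothetical: a table created (or altered) to skip WAL logging
    -- for fast loading...
    CREATE TABLE import_buffer (id int, payload text);
    -- ...and then promoted to a normal, crash-safe table, which would
    -- WAL-log or sync its current contents once at that point.
    ALTER TABLE import_buffer MAKE SAFE;  -- hypothetical command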


This would allow users to create temporary tables that can be shared by 
several connections. It would also allow bulk loading a single large table 
in parallel.


With these suggestions, we would still need to WAL-log all the metadata 
changes, but I think in most circumstances that is not going to be a large 
burden on performance.


Matthew

--
Picard: I was just paid a visit from Q.
Riker:  Q! Any idea what he's up to?
Picard: No. He said he wanted to be nice to me.
Riker:  I'll alert the crew.



Re: [PERFORM] B-Heaps

2010-06-18 Thread Robert Haas
On Tue, Jun 15, 2010 at 8:23 AM, Matthew Wakeling matt...@flymine.org wrote:
 Absolutely, and I said in
 http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php
 but applied to the Postgres B-tree indexes instead of heaps.

This is an interesting idea.  I would guess that you could simulate
this to some degree by compiling PG with a larger block size.  Have
you tried this to see whether/how much/for what kind of workloads it
helps?
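
If you want to experiment with that, a quick sketch of what's involved 
(the page size is fixed at build time, so this means a rebuild and a 
fresh initdb):

    -- Report the page size this server was compiled with (8192 bytes
    -- by default).  Changing it means rebuilding from source with
    -- something like: configure --with-blocksize=32  (value in kB).
    SHOW block_size;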

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-18 Thread Pierre C


I'd like to point out the costs involved in having a whole separate  
version


It must be a setting, not a version.

For instance suppose you have a session table for your website and a users  
table.


- Having ACID on the users table is of course a must;
- for the sessions table you can drop the D (durability).

A server crash would force all users to re-login on your website, but if 
your server crashes often enough that your users complain about that, you 
have another problem anyway. Having the sessions table not WAL-logged (i.e. 
faster) would not prevent you from having sessions.user_id REFERENCES 
users( user_id ) ... so mixing safe and unsafe tables would be much more 
powerful than just having unsafe tables.
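
A minimal sketch of that schema (both tables are ordinary WAL-logged 
tables today; the point is which one you would mark as non-durable):

    CREATE TABLE users (
        user_id  serial PRIMARY KEY,
        login    text NOT NULL UNIQUE
    );  -- fully ACID; keep this one WAL-logged

    CREATE TABLE sessions (
        session_key  text PRIMARY KEY,
        user_id      integer NOT NULL REFERENCES users(user_id),
        started_at   timestamptz DEFAULT now()
    );  -- the candidate for dropping the D: rebuilt/emptied on crash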


And I really like the idea of non-WAL-logged indexes, too: since they can 
be rebuilt as needed, the DBA could decide between faster index updates 
with a rebuild on crash, or normal updates and fast recovery.


The same goes for materialized views and the like: they can be rebuilt on 
crash, and the added update speed would be good.


Moreover, we already have a mechanism for taking a table that has had  
non-logged changes, and turning it into a fully logged table - we do  
that to the above mentioned tables when the transaction commits. I would  
strongly recommend providing an option to ALTER TABLE MAKE SAFE, which  
may involve some more acrobatics if the table is currently in use by  
multiple transactions, but would be valuable.


I believe the old discussions called this ALTER TABLE SET PERSISTENCE.

This would allow users to create temporary tables that can be shared  
by several connections. It would also allow bulk loading in parallel of  
a single large table.


This would need to WAL-log the entire table to send it to the slaves if  
replication is enabled, but it's a lot faster than replicating each record.





Re: [PERFORM] requested shared memory size overflows size_t

2010-06-18 Thread Kenneth Marshall
On Fri, Jun 18, 2010 at 12:46:11AM +0100, Tom Wilcox wrote:
 On 17/06/2010 22:41, Greg Smith wrote:
 Tom Wilcox wrote:
 Any suggestions for good monitoring software for linux?

 By monitoring, do you mean for alerting purposes or for graphing purposes? 
 Nagios is the only reasonable choice for the former, while doing at best 
 a mediocre job at the latter.  For the latter, I've found that Munin does a 
 good job of monitoring Linux and PostgreSQL in its out-of-the-box 
 configuration, in terms of providing useful activity graphs.  And you can 
 get it to play nice with Nagios.

 Thanks Greg. I'll check Munin and Nagios out. It is very much for graphing 
 purposes. I would like to be able to perform objective, 
 platform-independent performance comparisons.

 Cheers,
 Tom

Zabbix-1.8+ is also worth taking a look at and it can run off our
favorite database. It allows for some very flexible monitoring and
trending data collection.

Regards,
Ken



Re: [PERFORM] B-Heaps

2010-06-18 Thread Matthew Wakeling

On Fri, 18 Jun 2010, Robert Haas wrote:

On Tue, Jun 15, 2010 at 8:23 AM, Matthew Wakeling matt...@flymine.org wrote:

Absolutely, and I said in
http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php
but applied to the Postgres B-tree indexes instead of heaps.


This is an interesting idea.  I would guess that you could simulate
this to some degree by compiling PG with a larger block size.  Have
you tried this to see whether/how much/for what kind of workloads it
helps?


To a degree, that is the case. However, if you follow the thread a bit 
further back, you will find evidence that when the index is in memory, 
increasing the page size actually decreases the performance, because it 
uses more CPU.


To make it clear - 8kB is not an optimal page size for either fully cached 
data or sparsely cached data. For disc access, large pages are 
appropriate, on the order of 256kB. If the page size is much lower than 
that, then the time taken to fetch it doesn't actually decrease much, and 
we are trying to get the maximum amount of work done per fetch without 
slowing fetches down significantly.
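
To put rough numbers on that (assuming a typical current SATA drive 
with ~8 ms of seek plus rotational latency and ~100 MB/s of sequential 
transfer): fetching 8 kB costs roughly 8.1 ms, while fetching 256 kB 
costs roughly 10.6 ms - the fetch is only about 30% slower while 
delivering 32 times as much data per fetch.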


Given such a large page size, it would then be appropriate to have a 
better data structure inside the page. Currently, our indexes (at least 
the GiST ones - I haven't looked at the Btree ones) use a simple linear 
array in the index page. Using a proper tree inside the index page would 
improve the CPU usage of the index lookups.


One detail that would need to be sorted out is the cache eviction policy. 
I don't know if it is best to evict whole 256kB pages, or to evict 8kB 
pages. Probably the former, which would involve quite a bit of change to 
the shared memory cache. I can see the cache efficiency decreasing as a 
result of this, which is the only disadvantage I can see.


This sort of thing has been fairly well researched at an academic level, 
but has not been implemented in that many real world situations. I would 
encourage its use in Postgres.


Matthew

--
Failure is not an option. It comes bundled with your Microsoft product. 
-- Ferenc Mantfeld




Re: [PERFORM] B-Heaps

2010-06-18 Thread Greg Smith

Matthew Wakeling wrote:
This sort of thing has been fairly well researched at an academic 
level, but has not been implemented in that many real world 
situations. I would encourage its use in Postgres.


I guess, but don't forget that work on PostgreSQL is driven by what 
problems people are actually running into.  There's a long list of 
performance improvements sitting in the TODO list waiting for people to 
find time to work on them, ones that we're quite certain are useful.  
That anyone is going to chase after any of these speculative ideas from 
academic research instead of one of those is unlikely.  Your 
characterization of the potential speed-up here is "Using a proper tree 
inside the index page would improve the CPU usage of the index lookups", 
which seems quite reasonable.  Regardless, when I consider "is that 
something I have any reason to suspect is a bottleneck on common 
workloads?", I don't think of any, and return to working on one of the 
things I already know are bottlenecks instead.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] HashAggregate slower than sort?

2010-06-18 Thread Kevin Grittner
Jatinder Sangha j...@coalition.com wrote:
 
 I have a simple query that when planned either uses hash-
 aggregates or a sort depending on the amount of working memory
 available. The problem is that when it uses the hash-aggregates,
 the query runs 25% slower than when using the sort method.
 
 The table in question contains about 60 columns, many of which are
 boolean, 32-bit integers and some are 64-bit integers. Many fields
 are text - and some of these can be quite long (eg 32Kb).
 
 Obviously, I can re-write the query to use a distinct on (...)
 clause
 
Yeah, that seems prudent, to say the least.
 
 Why is the hash-aggregate slower than the sort?
 
 Is it something to do with the number of columns? ie. When
 sorting, the first few columns defined on the table (id, version)
 make the row unique - but when using the hash-aggregate feature,
 presumably every column needs to be hashed which takes longer
 especially for long text fields?
 
Sounds like a reasonable guess to me.  But since you're apparently
retrieving about 9,000 wide rows in (worst case) 56 ms, it would
seem that your active data set may be fully cached.  If so, you
could try reducing both random_page_cost and seq_page_cost to
something in the 0.005 to 0.1 range and see if it improves the
accuracy of the cost estimates.  Not that you should go back to
using DISTINCT on all 60 columns, including big text columns; but
these cost factors might help other queries pick faster plans.
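
For example (a sketch only - the table and column names are invented to 
match the description above):

    -- Rewrite using DISTINCT ON over just the columns that make a row
    -- unique, so the big text columns never need to be compared:
    SELECT DISTINCT ON (id, version) *
    FROM wide_table
    ORDER BY id, version;

    -- If the active data set really is fully cached, tell the planner
    -- that page fetches are cheap:
    SET random_page_cost = 0.1;
    SET seq_page_cost = 0.1;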
 
-Kevin



Re: [PERFORM] requested shared memory size overflows size_t

2010-06-18 Thread Greg Smith

Kenneth Marshall wrote:

Zabbix-1.8+ is also worth taking a look at and it can run off our
favorite database. It allows for some very flexible monitoring and
trending data collection.
  


Note that while Zabbix is a perfectly reasonable general solution, the 
number of things it monitors out of the box for PostgreSQL 
(http://www.zabbix.com/wiki/howto/monitor/db/postgresql) is only a 
fraction of what Munin shows you.  The main reason I've been suggesting 
Munin lately is that it seems to get all the basics right for new users 
without them having to do anything but activate the PostgreSQL plug-in.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] B-Heaps

2010-06-18 Thread Yeb Havinga

Greg Smith wrote:

Matthew Wakeling wrote:
This sort of thing has been fairly well researched at an academic 
level, but has not been implemented in that many real world 
situations. I would encourage its use in Postgres.


I guess, but don't forget that work on PostgreSQL is driven by what 
problems people are actually running into.  There's a long list of 
performance improvements sitting in the TODO list waiting for people 
to find time to work on them, ones that we're quite certain are 
useful.  That anyone is going to chase after any of these speculative 
ideas from academic research instead of one of those is unlikely.  
Your characterization of the potential speed up here is Using a 
proper tree inside the index page would improve the CPU usage of the 
index lookups, which seems quite reasonable.  Regardless, when I 
consider is that something I have any reason to suspect is a 
bottleneck on common workloads?, I don't think of any, and return to 
working on one of things I already know is instead.



There are two different things here concerning GiST indexes:

1) Larger block sizes, and hence more entries per GiST page, result in 
more generic keys on those pages. This in turn results in a greater 
number of hits when the index is queried, so a larger part of the index 
is scanned. NB: this has nothing to do with caching / cache sizes; it 
holds for every I/O model. Tests I performed showed performance 
improvements of over 200%. Since then, implementing a speedup has been 
on my 'want to do' list.


2) There are several approaches to getting the number of entries per 
page down. Two have been suggested in the thread Matthew referred to: 
virtual pages (but how to order these?) and a tree within a page. It 
would be interesting to see whether ideas from Prokop's cache-oblivious 
algorithms fit this problem, to find a suitable virtual page format.


regards,
Yeb Havinga




Re: [PERFORM] B-Heaps

2010-06-18 Thread Kevin Grittner
Yeb Havinga yebhavi...@gmail.com wrote:
 
 concerning gist indexes:
 
 1) with larger block sizes and hence, larger # entries per gist
 page, results in more generic keys of those pages. This in turn
 results in a greater number of hits, when the index is queried, so
 a larger part of the index is scanned. NB this has nothing to do
 with caching / cache sizes; it holds for every IO model. Tests
 performed by me showed performance improvements of over 200%.
 Since then implementing a speedup has been on my 'want to do
 list'.
 
As I recall, the better performance in your tests was with *smaller*
GiST pages, right?  (The above didn't seem entirely clear on that.)
 
-Kevin



Re: [PERFORM] B-Heaps

2010-06-18 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 Your characterization of the potential speed up here is Using a proper tree 
 inside the index page would improve the CPU usage of the index lookups, 
 which seems quite reasonable.  Regardless, when I consider is that 
 something I have any reason to suspect is a bottleneck on common 
 workloads?, I don't think of any, and return to working on one of 
 things I already know is instead.

Note also that this doesn't do a thing for b-tree indexes, which already
have an intelligent within-page structure.  So that instantly makes it
not a mainstream issue.  Perhaps somebody will be motivated to work on
it, but most of us are chasing other things.

regards, tom lane



Re: [PERFORM] B-Heaps

2010-06-18 Thread Yeb Havinga

Kevin Grittner wrote:

Yeb Havinga yebhavi...@gmail.com wrote:
 
  

concerning gist indexes:

1) with larger block sizes and hence, larger # entries per gist
page, results in more generic keys of those pages. This in turn
results in a greater number of hits, when the index is queried, so
a larger part of the index is scanned. NB this has nothing to do
with caching / cache sizes; it holds for every IO model. Tests
performed by me showed performance improvements of over 200%.
Since then implementing a speedup has been on my 'want to do
list'.

 
As I recall, the better performance in your tests was with *smaller*

GiST pages, right?  (The above didn't seem entirely clear on that.)
  

Yes, making pages smaller made index scanning faster.

-- Yeb




Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-18 Thread Josh Berkus

 It must be a setting, not a version.
 
 For instance suppose you have a session table for your website and a
 users table.
 
 - Having ACID on the users table is of course a must ;
 - for the sessions table you can drop the D

You're trying to solve a different use-case than the one I am.

Your use-case will be solved by global temporary tables.  I suggest that
you give Robert Haas some help and feedback on that.

My use case is people using PostgreSQL as a cache, or relying entirely
on replication for durability.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-18 Thread Josh Berkus
On 6/18/10 2:15 AM, Matthew Wakeling wrote:
 I'd like to point out the costs involved in having a whole separate
 version of Postgres that has all this safety switched off. Package
 managers will not thank anyone for having to distribute another version
 of the system, and woe betide the user who installs the wrong version
 because it runs faster. No, this is much better as a configurable option.

Agreed, although initial alphas of this concept are in fact likely to be
a separate source-code tree.  Eventually, once we have it working well, it
could become an initdb-time option.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [PERFORM] requested shared memory size overflows size_t

2010-06-18 Thread Scott Carey

On Jun 16, 2010, at 1:53 PM, Alvaro Herrera wrote:

 Excerpts from Tom Lane's message of Mon Jun 14 23:57:11 -0400 2010:
 Scott Carey sc...@richrelevance.com writes:
 Great points.  There is one other option that is decent for the WAL:
 If splitting out a volume is not acceptable for the OS and WAL -- 
 absolutely split those two out into their own partitions.  It is most 
 important to make sure that WAL and data are not on the same filesystem, 
 especially if ext3 is involved.
 
 Uh, no, WAL really needs to be on its own *spindle*.  The whole point
 here is to have one disk head sitting on the WAL and not doing anything
 else except writing to that file.
 
 However, there's another point here -- probably what Scott is on about:
 on Linux (at least ext3), an fsync of any file does not limit to
 flushing that file's blocks -- it flushes *ALL* blocks on *ALL* files in
 the filesystem.  This is particularly problematic if you have pgsql_tmp
 in the same filesystem and do lots of disk-based sorts.
 
 So if you have it in the same spindle but on a different filesystem, at
 least you'll avoid that extra fsync work, even if you have to live with
 the extra seeking.

Yes - especially with a battery-backed caching RAID controller, the whole 
"own spindle" thing doesn't really matter: the WAL is written fairly slowly 
and linearly, and any controller worth a damn will batch those writes up 
efficiently.

By FAR, the most important thing is to have WAL on its own file system.  If 
using ext3 in a way that is safe for your data (data=ordered or better), even 
with just one SATA disk, performance will improve a LOT if data and xlog are 
separated into different file systems.  Yes, an extra spindle is better.

However, with a decent RAID card or caching storage, putting all eight 
spindles in one RAID 10 with one partition for xlog and one for data often 
performs better than a mirrored pair for OS/xlog plus six disks for data, 
so long as the file systems are separated.  With a dedicated xlog partition 
and reliable caching storage, you can even mount it with direct I/O to 
avoid polluting the OS page cache.



 
 -- 
 Álvaro Herrera alvhe...@commandprompt.com
 The PostgreSQL Company - Command Prompt, Inc.
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support

