Re: [PERFORM] Rather large LA

2011-09-06 Thread Richard Shaw
Thanks for the advice; it's one option under consideration at the moment. What are 
your thoughts on increasing RAM and shared_buffers?


On 6 Sep 2011, at 20:21, Alan Hodgson wrote:

> On September 6, 2011 12:11:10 PM Richard Shaw wrote:
>> 24 :)
>> 
>> 4 x Intel Xeon-NehalemEX E7540-HexCore [2GHz]
>> 
> 
> Nice box.
> 
> Still I/O-bound, though. SSDs would help a lot, I would think.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Rather large LA

2011-09-06 Thread Richard Shaw

24 :)

4 x Intel Xeon-NehalemEX E7540-HexCore [2GHz] 

On 6 Sep 2011, at 20:07, Alan Hodgson wrote:

> On September 5, 2011 03:36:09 PM you wrote:
>> After Restart
>> 
>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>>  r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
>>  2 34   2332 5819012  75632 25855368    0    0 894200  7  5 85  3  0
>>  4 39   2332 5813344  75628 25833588    0    0  5104   324  5480 27047  3  1 84 11  0
>>  2 47   2332 5815212  75336 25812064    0    0  4356  1664  5627 28695  3  1 84 12  0
>>  2 40   2332 5852452  75340 25817496    0    0  5632   828  5817 28832  3  1 84 11  0
>>  1 45   2332 5835704  75348 25817072    0    0  4960  1004  5111 25782  2  1 88  9  0
>>  2 42   2332 5840320  75356 25811632    0    0  3884   492  5405 27858  3  1 88  8  0
>>  0 47   2332 5826648  75348 25805296    0    0  4432  1268  5888 29556  3  1 83 13  0
>> 
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            3.26    0.00    1.69   25.21    0.00   69.84
>> 
>> Device:         rrqm/s   wrqm/s     r/s    w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
>> sda               0.50    45.00  520.00   2.50  8316.00   380.00    16.64    71.70  118.28   1.92 100.10
>> sda1              0.00     0.00    0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>> sda2              0.50    45.00  520.00   2.50  8316.00   380.00    16.64    71.70  118.28   1.92 100.10
>> sda3              0.00     0.00    0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>> sdb               0.00   196.50    0.00  10.50     0.00  1656.00   157.71     0.01    0.67   0.52   0.55
>> sdb1              0.00   196.50    0.00  10.50     0.00  1656.00   157.71     0.01    0.67   0.52   0.55
>> 
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            3.97    0.00    1.71   20.88    0.00   73.44
> 
> Yeah, 20% I/O wait feels pretty slow, I imagine. 8 cores?




Re: [PERFORM] Rather large LA

2011-09-06 Thread Richard Shaw

/

OS and Postgres are on the same mount point.

On 6 Sep 2011, at 00:31, Scott Marlowe wrote:

> On Mon, Sep 5, 2011 at 4:36 PM, Richard Shaw  wrote:
>> Device:         rrqm/s   wrqm/s     r/s    w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
>> sda               1.00   143.00  523.50 108.00  8364.00  2008.00    16.42     2.78    4.41   1.56  98.35
>> sda1              0.00     0.00    0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>> sda2              1.00   143.00  523.50 108.00  8364.00  2008.00    16.42     2.78    4.41   1.56  98.35
> 
> So what is /dev/sda2 mounted as?




Re: [PERFORM] Rather large LA

2011-09-05 Thread Richard Shaw

vmstat 1 and iostat -x output 

Normal


procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
 3  0   2332 442428  73904 31287344    0    0 894200  7  5 85  3  0
 4  1   2332 428428  73904 31288288    0    0  1440     0  6553 29066  5  2 91  1  0
 4  1   2332 422688  73904 31288688    0    0   856     0  4480 18860  3  1 95  1  0
 0  0   2332 476072  73920 31289444    0    0   544  1452  4478 19103  3  1 95  0  0
 3  0   2332 422288  73920 31290572    0    0  1268   496  5565 23410  5  3 91  1  0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.11    0.01    2.58    2.56    0.00   89.74

Device:         rrqm/s   wrqm/s     r/s    w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               1.00   143.00  523.50 108.00  8364.00  2008.00    16.42     2.78    4.41   1.56  98.35
sda1              0.00     0.00    0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              1.00   143.00  523.50 108.00  8364.00  2008.00    16.42     2.78    4.41   1.56  98.35
sda3              0.00     0.00    0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb1              0.00     0.00    0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.89    0.00    2.94    3.14    0.00   89.04

Device:         rrqm/s   wrqm/s     r/s    w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               1.00     0.00  285.00   0.00  4808.00     0.00    16.87     2.46    8.29   3.02  86.20
sda1              0.00     0.00    0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              1.00     0.00  285.00   0.00  4808.00     0.00    16.87     2.46    8.29   3.02  86.20
sda3              0.00     0.00    0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00   161.50    0.00   6.50     0.00  1344.00   206.77     0.00    0.69   0.15   0.10
sdb1              0.00   161.50    0.00   6.50     0.00  1344.00   206.77     0.00    0.69   0.15   0.10


After Restart

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
 2 34   2332 5819012  75632 25855368    0    0 894200  7  5 85  3  0
 4 39   2332 5813344  75628 25833588    0    0  5104   324  5480 27047  3  1 84 11  0
 2 47   2332 5815212  75336 25812064    0    0  4356  1664  5627 28695  3  1 84 12  0
 2 40   2332 5852452  75340 25817496    0    0  5632   828  5817 28832  3  1 84 11  0
 1 45   2332 5835704  75348 25817072    0    0  4960  1004  5111 25782  2  1 88  9  0
 2 42   2332 5840320  75356 25811632    0    0  3884   492  5405 27858  3  1 88  8  0
 0 47   2332 5826648  75348 25805296    0    0  4432  1268  5888 29556  3  1 83 13  0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.26    0.00    1.69   25.21    0.00   69.84

Device:         rrqm/s   wrqm/s     r/s    w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.50    45.00  520.00   2.50  8316.00   380.00    16.64    71.70  118.28   1.92 100.10
sda1              0.00     0.00    0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.50    45.00  520.00   2.50  8316.00   380.00    16.64    71.70  118.28   1.92 100.10
sda3              0.00     0.00    0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00   196.50    0.00  10.50     0.00  1656.00   157.71     0.01    0.67   0.52   0.55
sdb1              0.00   196.50    0.00  10.50     0.00  1656.00   157.71     0.01    0.67   0.52   0.55

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.97    0.00    1.71   20.88    0.00   73.44

Device:         rrqm/s   wrqm/s     r/s    w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               1.00     0.00  532.00   0.00  8568.00     0.00    16.11    73.73  148.44   1.88 100.05
sda1              0.00     0.00    0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              1.00     0.00  532.00   0.00  8568.00     0.00    16.11    73.73  148.44   1.88 100.05
sda3              0.00     0.00    0.00   0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00   106.50    0.00  11.50     0.00   944.00    82.09     0.00    0.39   0.30   0.35
sdb1              0.00   106.50    0.00  11.50     0.00   944.00    82.09     0.00    0.39   0.30   0.35

Regards

Richard


On 5 Sep 2011, at 21:05, Alan Hodgson wrote:

> On September 5, 2011, Richard Shaw  wrote:
> > Hi Andy,
> > 
> > It's not a new issue, no. It's a legacy system that is in no way ideal, but 
> > it is also not in a position to be overhauled.

Re: [PERFORM] Rather large LA

2011-09-05 Thread Richard Shaw

Hi Andy,

It's not a new issue, no. It's a legacy system that is in no way ideal, but it is 
also not in a position to be overhauled.  Indexes are correct, and tables run up 
to 25 million rows.

On startup, it hits CPU more than IO. I'll provide some additional stats after I 
restart it tonight.

Server logs have been reviewed and, where possible, slow queries have been 
fixed.

Autovacuum has been disabled and set to run manually via cron during a quiet 
period, and fsync has recently been turned off to gauge any real-world 
performance increase; there is battery backup on the RAID card providing some 
level of resilience.
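
For reference, a manual vacuum run like that is typically just a crontab entry 
invoking vacuumdb. A minimal sketch; the 03:00 schedule and log path here are 
assumptions for illustration, not the actual entry (port 6432 is where Postgres 
itself listens, per the config later in the thread):

# /etc/crontab - vacuum/analyze all databases during the nightly quiet period
# (hypothetical schedule and log path)
0 3 * * * postgres vacuumdb --all --analyze --port 6432 >> /var/log/vacuumdb.log 2>&1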

Thanks

Richard


On 5 Sep 2011, at 14:39, Andy Colson wrote:

> On 09/05/2011 05:28 AM, Richard Shaw wrote:
>> 
>> Hi,
>> 
>> I have a database server that's part of a web stack. It experiences 
>> prolonged load average spikes of up to 400+ when the db is restarted and 
>> first accessed by the other parts of the stack, and it has generally poor 
>> performance on even simple select queries.
>> 
> 
> Is the slowness new?  Or has it always been a bit slow?  Have you checked for 
> bloat on your tables/indexes?
> 
> When you start up, does it peg a cpu or sit around doing IO?
> 
> Have you reviewed the server logs?
> 
> 
> autovacuum | off
> 
> Why?  I assume that's a problem.
> 
> fsync  | off
> 
> Seriously?
> 
> 
> -Andy
> 
> 
> 
>> There are 30 DBs in total on the server, coming in at 226GB.  The one that's 
>> used the most is 67GB, and the other 29 come to 159GB.
>> 
>> I'd really appreciate it if you could review my configurations below and 
>> make any suggestions that might help alleviate the performance issues.  I've 
>> been looking further into shared buffers, to the point of installing the 
>> contrib module (pg_buffercache) to check what they're doing; I'm also 
>> considering installing more RAM, as the most used db @ 67GB might appreciate 
>> it, or moving the most used DB onto another set of disks, possibly SSD.
>> 
>> 
>> PostgreSQL 9.0.4
>> Pgbouncer 1.4.1
>> 
>> Linux 2.6.18-238.9.1.el5 #1 SMP Tue Apr 12 18:10:13 EDT 2011 x86_64 x86_64 
>> x86_64 GNU/Linux
>> 
>> CentOS release 5.6 (Final)
>> 
>> 4 x Intel Xeon-NehalemEX E7540-HexCore [2GHz] ( 24 physical cores )
>> 32GB DDR3 RAM
>> 1 x Adaptec 5805 Z  SATA/SAS RAID with battery backup
>> 4 x Seagate Cheetah ST3300657SS 300GB 15K RPM SAS drives in RAID 10
>> 1 x 500GB 7200RPM SATA disk
>> 
>> Postgres and the OS reside on the same ext3 filesystem, whilst query and 
>> archive logging go onto the SATA disk, which is also ext3.
>> 
>> 
>>               name               |                 current_setting
>> ---------------------------------+---------------------------------
>>  version                        | PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
>>  archive_command                | tar jcf /disk1/db-wal/%f.tar.bz2 %p
>>  archive_mode                   | on
>>  autovacuum                     | off
>>  checkpoint_completion_target   | 0.9
>>  checkpoint_segments            | 10
>>  client_min_messages            | notice
>>  effective_cache_size           | 17192MB
>>  external_pid_file              | /var/run/postgresql/9-main.pid
>>  fsync                          | off
>>  full_page_writes               | on
>>  lc_collate                     | en_US.UTF-8
>>  lc_ctype                       | en_US.UTF-8
>>  listen_addresses               |
>>  log_checkpoints                | on
>>  log_destination                | stderr
>>  log_directory                  | /disk1/pg_log
>>  log_error_verbosity            | verbose
>>  log_filename                   | postgresql-%Y-%m-%d_%H%M%S.log
>>  log_line_prefix                | %m %u %h
>>  log_min_duration_statement     | 250ms
>>  log_min_error_statement        | error
>>  log_min_messages               | notice
>>  log_rotation_age               | 1d
>>  logging_collector              | on
>>  maintenance_work_mem           | 32MB
>>  max_connections                | 1000
>>  max_prepared_transactions      | 25
>>  max_stack_depth                | 4MB
>>  port                           | 6432
>>  server_encoding                | UTF8
>>  shared_buffers                 | 8GB
>>  superuser_reserved_connections | 3

Re: [PERFORM] Rather large LA

2011-09-05 Thread Richard Shaw

Hi Craig,

Apologies, I should have made that clearer: I am using PgBouncer 1.4.1 in front 
of Postgres and included the config at the bottom of my original mail.

Regards

Richard


On 5 Sep 2011, at 11:49, Craig Ringer wrote:

> On 5/09/2011 6:28 PM, Richard Shaw wrote:
>>  max_connections| 1000
> 
> Woah! No wonder you have "stampeding herd" problems after a DB or server 
> restart and are having performance issues.
> 
> When you have 1000 clients trying to do work at once, they'll all be fighting 
> over memory, disk I/O bandwidth, and CPU power which is nowhere near 
> sufficient to allow them to all actually achieve something all at once. 
> You'll have a lot of overhead as the OS tries to be fair and allow each to 
> make progress - at the expense of overall throughput.
> 
> If most of those connections are idle most of the time - say, they're 
> persistent connections from some webapp that requires one connection per 
> webserver thread - then the situation isn't so bad. They're still costing you 
> backend RAM and various housekeeping overhead (including task switching) 
> related to lock management and shared memory, though.
> 
> Consider using a connection pooler like PgPool-II or PgBouncer if your 
> application is suitable. Most apps will be quite happy using pooled 
> connections; only a few things like advisory locking and HOLD cursors work 
> poorly with pooled connections. Using a pool allows you to reduce the number 
> of actively working and busy connections to the real Pg backend to something 
> your hardware can cope with, which should dramatically increase performance 
> and reduce startup load spikes. The general very rough rule of thumb for 
> number of active connections is "number of CPU cores + number of HDDs" but of 
> course this is only incredibly rough and depends a lot on your workload and 
> DB.
> 
> Ideally PostgreSQL would take care of this pooling inside the server, 
> breaking the "one connection = one worker backend" equivalence. Unfortunately 
> the server's process-based design makes that harder than it could be. There's 
> also a lot of debate about whether pooling is even the core DB server's job 
> and if it is, which of the several possible approaches is the most 
> appropriate. Then there's the issue of whether in-server connection pooling 
> is even appropriate without admission control - which brings up the 
> "admission control is insanely hard" problem. So for now, pooling lives 
> outside the server in projects like PgPool-II and PgBouncer.
> 
> --
> Craig Ringer
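
Applying that rough "cores + disks" rule of thumb to the box in this thread (24 
cores + 4 data spindles, so roughly 28 active connections), the relevant 
pgbouncer.ini lines might look like the following. This is only a sketch under 
that assumption, not a tested recommendation; the exact numbers would need 
benchmarking against the real workload:

; hypothetical pool sizing per the "cores + spindles" rule (24 + 4, rounded up)
; transaction pooling only if the app tolerates it; otherwise keep session mode
pool_mode = transaction
default_pool_size = 30
; small burst headroom instead of a huge reserve
reserve_pool_size = 10
; the client side can stay high; excess clients simply queue
max_client_conn = 4096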




[PERFORM] Rather large LA

2011-09-05 Thread Richard Shaw

Hi, 

I have a database server that's part of a web stack. It experiences prolonged 
load average spikes of up to 400+ when the db is restarted and first accessed by 
the other parts of the stack, and it has generally poor performance on even 
simple select queries.

There are 30 DBs in total on the server, coming in at 226GB.  The one that's 
used the most is 67GB, and the other 29 come to 159GB.

I'd really appreciate it if you could review my configurations below and make 
any suggestions that might help alleviate the performance issues.  I've been 
looking further into shared buffers, to the point of installing the contrib 
module (pg_buffercache) to check what they're doing; I'm also considering 
installing more RAM, as the most used db @ 67GB might appreciate it, or moving 
the most used DB onto another set of disks, possibly SSD.
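
For anyone wanting to repeat that inspection: with the contrib module loaded, a 
query along these lines summarises what is sitting in shared_buffers per 
relation. A minimal sketch, assuming the standard pg_buffercache view and the 
default 8KB block size; the LIMIT is arbitrary:

-- top 20 relations by space held in shared_buffers (current database only)
SELECT c.relname,
       count(*) AS buffers,
       pg_size_pretty(count(*) * 8192) AS buffered
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
JOIN pg_database d
  ON b.reldatabase = d.oid AND d.datname = current_database()
GROUP BY c.relname
ORDER BY count(*) DESC
LIMIT 20;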


PostgreSQL 9.0.4
Pgbouncer 1.4.1

Linux 2.6.18-238.9.1.el5 #1 SMP Tue Apr 12 18:10:13 EDT 2011 x86_64 x86_64 
x86_64 GNU/Linux

CentOS release 5.6 (Final)

4 x Intel Xeon-NehalemEX E7540-HexCore [2GHz] ( 24 physical cores )
32GB DDR3 RAM
1 x Adaptec 5805 Z  SATA/SAS RAID with battery backup
4 x Seagate Cheetah ST3300657SS 300GB 15K RPM SAS drives in RAID 10
1 x 500GB 7200RPM SATA disk

Postgres and the OS reside on the same ext3 filesystem, whilst query and archive 
logging go onto the SATA disk, which is also ext3.


              name               |                 current_setting
---------------------------------+---------------------------------
 version                         | PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
 archive_command                 | tar jcf /disk1/db-wal/%f.tar.bz2 %p
 archive_mode                    | on
 autovacuum                      | off
 checkpoint_completion_target    | 0.9
 checkpoint_segments             | 10
 client_min_messages             | notice
 effective_cache_size            | 17192MB
 external_pid_file               | /var/run/postgresql/9-main.pid
 fsync                           | off
 full_page_writes                | on
 lc_collate                      | en_US.UTF-8
 lc_ctype                        | en_US.UTF-8
 listen_addresses                |
 log_checkpoints                 | on
 log_destination                 | stderr
 log_directory                   | /disk1/pg_log
 log_error_verbosity             | verbose
 log_filename                    | postgresql-%Y-%m-%d_%H%M%S.log
 log_line_prefix                 | %m %u %h
 log_min_duration_statement      | 250ms
 log_min_error_statement         | error
 log_min_messages                | notice
 log_rotation_age                | 1d
 logging_collector               | on
 maintenance_work_mem            | 32MB
 max_connections                 | 1000
 max_prepared_transactions       | 25
 max_stack_depth                 | 4MB
 port                            | 6432
 server_encoding                 | UTF8
 shared_buffers                  | 8GB
 superuser_reserved_connections  | 3
 synchronous_commit              | on
 temp_buffers                    | 5120
 TimeZone                        | UTC
 unix_socket_directory           | /var/run/postgresql
 wal_buffers                     | 10MB
 wal_level                       | archive
 wal_sync_method                 | fsync
 work_mem                        | 16MB
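
(A listing like the one above is typically generated with a query against 
pg_settings; one plausible form, though not necessarily the exact query used 
here:)

SELECT name, current_setting(name) AS current_setting
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;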


Pgbouncer config

[databases]
* = port=6432
[pgbouncer]
user=postgres
pidfile = /tmp/pgbouncer.pid
listen_addr = 
listen_port = 5432
unix_socket_dir = /var/run/postgresql
auth_type = trust 
auth_file = /etc/pgbouncer/userlist.txt
admin_users = postgres
stats_users = postgres
pool_mode = session 
server_reset_query = DISCARD ALL;
server_check_query = select 1
server_check_delay = 10 
server_idle_timeout = 5
server_lifetime = 0
max_client_conn = 4096 
default_pool_size = 100
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
client_idle_timeout = 30
reserve_pool_size = 800
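
(One arithmetic note on the sizing above, assuming PgBouncer's documented 
semantics: in session mode each database/user pool can grow to 
default_pool_size + reserve_pool_size = 100 + 800 = 900 server connections, and 
with a wildcard [databases] entry every one of the 30 databases gets its own 
pool, so the theoretical ceiling sits well above Postgres's max_connections of 
1000; in other words, the pooler is barely throttling anything.)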


Thanks in advance

Richard

