[PERFORM] High load and iowait but no disk access

2005-08-30 Thread Rémy Beaumont
We have been trying to pinpoint what originally seemed to be an I/O bottleneck, but which now appears to be an issue with either PostgreSQL or RHES 3.

We have the following test environment on which we can reproduce the problem:

1) Test System A
Dell 6650 Quad Xeon Pentium 4
8 Gig of RAM
OS: RHES 3 update 2
Storage: NetApp FAS270 connected via an FC card, using 10 disks

2) Test System B
Dell Dual Xeon Pentium III
2 Gig of RAM
OS: RHES 3 update 2
Storage: NetApp FAS920 connected via an FC card, using 28 disks

Our database size is around 30 GB.

The behavior we see is that when running queries that do random reads on disk, iowait goes over 80% and actual disk I/O falls to a crawl, with throughput below 3000 kB/s. (We usually average 40,000 kB/s to 80,000 kB/s on sequential read operations on the NetApps.)

The stats of the NetApp do confirm that it is sitting idle. Doing an strace on the PostgreSQL process shows that it is doing seeks and reads.

So my question is: where is this iowait time spent?
Is there a way to pinpoint the problem in more detail?
We are able to reproduce this behavior with PostgreSQL 7.4.8 and 8.0.3.
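
For reference, a minimal sketch of how we watch where the time goes (iostat assumes the sysstat package is installed; the PID is the postmaster backend from the top output below):

  # Summarize syscall counts and the time spent in them while the query runs
  # (Ctrl-C to stop and print the summary)
  strace -c -p 30960

  # Extended per-device I/O statistics, repeated every 5 seconds
  iostat -x 5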

I have included the output of top, vmstat, and strace from System B, and sysstat output from the NetApp, captured while running a single query that generates this behavior.

Rémy

top output:
06:27:28  up 5 days, 16:59,  6 users,  load average: 1.04, 1.30, 1.01
72 processes: 71 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    2.7%    0.0%    1.0%   0.1%     0.2%   46.0%   49.5%
           cpu00    0.2%    0.0%    0.2%   0.0%     0.2%    2.2%   97.2%
           cpu01    5.3%    0.0%    1.9%   0.3%     0.3%   89.8%    1.9%
Mem:  2061696k av, 2043936k used,   17760k free,       0k shrd,   3916k buff
                   1566332k actv,  296648k in_d,   30504k in_c
Swap: 16771584k av,   21552k used, 16750032k free,               1933772k cached

  PID USER      PRI  NI  SIZE  RSS  SHARE STAT %CPU %MEM  TIME CPU COMMAND
30960 postgres   15   0 13424  10M   9908 D     2.7  0.5  2:00   1 postmaster
30538 root       15   0  1080  764    524 S     0.7  0.0  0:43   0 sshd
    1 root       15   0   496  456    436 S     0.0  0.0  0:08   0 init
    2 root       RT   0     0    0      0 SW    0.0  0.0  0:00   0 migration/0
    3 root       RT   0     0    0      0 SW    0.0  0.0  0:00   1 migration/1
    4 root       15   0     0    0      0 SW    0.0  0.0  0:01   0 keventd
    5 root       34  19     0    0      0 SWN   0.0  0.0  0:00   0 ksoftirqd/0
    6 root       34  19     0    0      0 SWN   0.0  0.0  0:00   1 ksoftirqd/1
    9 root       15   0     0    0      0 SW    0.0  0.0  0:24   1 bdflush
    7 root       15   0     0    0      0 SW    0.0  0.0  6:53   1 kswapd
    8 root       15   0     0    0      0 SW    0.0  0.0  8:44   1 kscand
   10 root       15   0     0    0      0 SW    0.0  0.0  0:13   0 kupdated
   11 root       25   0     0    0      0 SW    0.0  0.0  0:00   0 mdrecoveryd
   17 root       15   0     0    0      0 SW    0.0  0.0  0:00   0 ahc_dv_0


vmstat output:
procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff   cache   si   so    bi   bo    in    cs  us sy id wa
 0  1  21552  17796   4872 1931928    2    3     3    1    27     6   2  1  7  3
 0  1  21552  18044   4880 1931652    0    0  1652    0   397   512   1  2 50 47
 0  1  21552  17976   4896 1931664    0    0  2468    0   407   552   2  2 50 47
 1  0  21552  17984   4896 1931608    0    0  2124    0   418   538   3  3 48 46
 0  1  21552  18028   4900 1931536    0    0  1592    0   385   509   1  3 50 46
 0  1  21552  18040   4916 1931488    0    0  1620  820   419   581   2  2 50 46
 0  1  21552  17968   4916 1931536    0    4  1708    4   402   554   3  1 50 46
 1  1  21552  18052   4916 1931388    0    0  1772    0   409   531   3  1 49 47
 0  1  21552  17912   4924 1931492    0    0  1772    0   408   565   3  1 48 48
 0  1  21552  17932   4932 1931440    0    4  1356    4   391   545   5  0 49 46
 0  1  21552  18320   4944 1931016    0    4  1500  840   414   571   1  1 48 50
 0  1  21552  17872   4944 1931440    0    0  2116    0   392   496   1  5 46 48
 0  1  21552  18060   4944 1931232    0    0  2232    0   423   597   1  2 48 49
 1  1  21552  17684   4944 1931584    0    0  1752    0   395   537   1  1 50 48
 0  1  21552  18000   4944 1931240    0    0  1576    0   401   549   0  1 50 49


NetApp stats:
CPU   NFS  CIFS  HTTP  Total   Net kB/s    Disk kB/s    Tape kB/s   Cache  Cache   CP   CP  Disk  DAFS  FCP  iSCSI   FCP kB/s
                                in   out   read  write  read write    age    hit  time   ty  util                    in   out
 2%     0     0     0    139     0     0   2788      0     0     0      3    96%    0%    -   15%     0  139      0    3  2277
 2%     0     0     0    144     0     0   2504      0     0     0      3    96%    0%    -   18%     0  144      0    3  2150
 2%     0     0     0    130     0     0   2212      0     0     0      3    96%    0%    -   13%     0  130      0    3  1879
 3%     0     0     0    169     0     0   2937     80     0     0      3    96%    0%    -   13%     0  169      0    4  2718
 2%     0     0     0    139     0     0   2448      0     0     0      3    96%    0%    -   12%     0  139      0    3  2096

Re: [PERFORM] High load and iowait but no disk access

2005-08-30 Thread Rémy Beaumont


On 30-Aug-05, at 12:15, Tom Lane wrote:


Rémy Beaumont [EMAIL PROTECTED] writes:

The stats of the NetApp do confirm that it is sitting idle.


Really?






CPU   NFS  CIFS  HTTP  Total   Net kB/s    Disk kB/s    Tape kB/s   Cache  Cache   CP   CP  Disk  DAFS  FCP  iSCSI   FCP kB/s
                                in   out   read  write  read write    age    hit  time   ty  util                    in   out
 2%     0     0     0    139     0     0   2788      0     0     0      3    96%    0%    -   15%     0  139      0    3  2277
 2%     0     0     0    144     0     0   2504      0     0     0      3    96%    0%    -   18%     0  144      0    3  2150
 2%     0     0     0    130     0     0   2212      0     0     0      3    96%    0%    -   13%     0  130      0    3  1879
 3%     0     0     0    169     0     0   2937     80     0     0      3    96%    0%    -   13%     0  169      0    4  2718
 2%     0     0     0    139     0     0   2448      0     0     0      3    96%    0%    -   12%     0  139      0    3  2096


I know zip about NetApps, but doesn't the 8th column indicate pretty steady disk reads?

Yes, but they are very low.
At 15% disk utilization, it's pretty much sitting idle, especially if you consider that the OS reports that one of the processors is spending more than 80% of its time in iowait.


Rémy


regards, tom lane





Re: [PERFORM] High load and iowait but no disk access

2005-08-30 Thread Rémy Beaumont


On 30-Aug-05, at 12:29, Tom Lane wrote:


Rémy Beaumont [EMAIL PROTECTED] writes:

On 30-Aug-05, at 12:15, Tom Lane wrote:

I know zip about NetApps, but doesn't the 8th column indicate pretty
steady disk reads?



Yes, but they are very low.


Sure, but that's more or less what you'd expect if the thing is randomly seeking all over the disk :-(. Just because it's a NetApp doesn't mean it's got zero seek time.

Per NetApp, the disk utilization percentage they report does include seek time, not just read/write operations.
NetApp has been involved in trying to figure out what is going on, and their claim is that the NetApp filer is not I/O bound.




You did not say what sort of query this is, but I gather that it's doing an indexscan on a table that is not at all in index order.

Yes, most of those queries are doing an indexscan. It's a fresh restore of our production database that we have vacuumed/analyzed.



Possible solutions involve reverting to a seqscan (have you forced the planner to choose an indexscan here, either directly or by lowering random_page_cost?)

No.

or CLUSTERing the table by the index (which would need to be repeated periodically, so it's not a great answer).

Will try to cluster the tables and see if it changes anything. That still doesn't explain what is going on with those seeks, though.
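
For reference, roughly what that boils down to (the database, table, and index names here are hypothetical):

  # Check whether the planner is being steered toward index scans
  psql -d proddb -c "SHOW random_page_cost;"
  psql -d proddb -c "SHOW enable_seqscan;"

  # Rewrite a table in index order (CLUSTER indexname ON tablename is the
  # 7.4/8.0 syntax), then refresh planner statistics
  psql -d proddb -c "CLUSTER orders_customer_idx ON orders;"
  psql -d proddb -c "ANALYZE orders;"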


Thanks,

Rémy



regards, tom lane





Re: [PERFORM] High load and iowait but no disk access

2005-08-30 Thread Rémy Beaumont


On 30-Aug-05, at 14:32, Josh Berkus wrote:


Remy,


The behavior we see is that when running queries that do random reads on disk, iowait goes over 80% and actual disk I/O falls to a crawl, with throughput below 3000 kB/s. (We usually average 40,000 kB/s to 80,000 kB/s on sequential read operations on the NetApps.)

This seems pretty low for a NetApp -- you should be able to manage up to 180 MB/s, if not higher. Are you sure it's configured correctly?

Hi Josh,

The config has been reviewed by NetApp. We do get rates higher than 80 MB/s, but on average, that's what we get.


Do you have NetApp filers deployed?
How many spindles do you have in your volume?
On which OS are you running Postgres?

Thanks,

Rémy



--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco






Re: [PERFORM] High load and iowait but no disk access

2005-08-30 Thread Rémy Beaumont


On 30-Aug-05, at 14:46, Anjan Dave wrote:

I have seen references to changing the kernel I/O scheduler at boot time… not sure if it applies to RHEL 3.0, or whether it will help, but try setting ‘elevator=deadline’ at boot time via grub.conf.

That's only for RHEL 4.0.
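
For reference, the parameter goes on the kernel line in grub.conf on a 2.6 kernel; the kernel version and root device below are placeholders:

  # /boot/grub/grub.conf -- append elevator=deadline to the kernel line
  kernel /vmlinuz-2.6.9-5.ELsmp ro root=/dev/sda1 elevator=deadline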


Have you tried running a simple ‘dd’ on the LUN?

We get amazing performance using dd.
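
A typical sequential-read test looks something like this (the file path and size are hypothetical):

  # Read an 8 GB file in 8 kB blocks (Postgres' page size) and discard it
  dd if=/mnt/netapp/pgdata/testfile of=/dev/null bs=8k count=1000000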

The drives are in RAID10 configuration, right?

NetApp has its own RAID format (RAID4, sitting under its WAFL filesystem), not RAID10.

Rémy

 
Thanks,
Anjan

From: Woody Woodring [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 30, 2005 2:30 PM
To: 'Rémy Beaumont'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] High load and iowait but no disk access
 
Have you tried a different kernel?  We run with a NetApp over NFS without any issues, but we have seen high iowait on other Dell boxes (both running and not running Postgres) with RHES 3.  We replaced a Dell PowerEdge 350 running RH 7.3 with a PE750 with more memory running RHES 3, and the new box would get bogged down with iowait just from syslog writing messages to disk, while the old, slower server handled the same load fine.  I don't know if it is a Dell thing or a RH kernel thing, but we try different kernels on our boxes to find one that works better.  We have not found one that consistently stands out over another, but we have been moving away from the Update 2 kernel (2.4.21-15.ELsmp) due to server lockup issues.  Unfortunately, we still get the best disk throughput on our few remaining 7.3 boxes.

 
Woody
 
IGLASS Networks
www.iglass.net
 

