Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-03 Thread Feizhou



What's the best multi-threaded / multi-process IO benchmark utility that
works with filesystems instead of raw devices, and can read/write multiple
files at once?



http://untroubled.org/benchmarking/2004-04/

No raw numbers but...
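
iozone - which crops up later in this thread - can also drive several
threads against files on a mounted filesystem. A sketch, with
illustrative path and sizes:

cd /mnt/array   # any directory on the filesystem under test
# 4 threads, 512MB file each, 64KB records; -i 0 = write test, -i 1 = read test
iozone -t 4 -s 512m -r 64 -i 0 -i 1 -b /tmp/results.xls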


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-03 Thread Simon Banton

At 13:49 -0400 2/10/07, Ross S. W. Walker wrote:

Sounds like the issue is more of a CPU issue than a disk issue, so
just upgrading the hardware and OS should make a big difference in
itself,


Yeah, that was the plan :-) Basically, we worked out what we needed 
to do (alleviate peak load CPU bottleneck by upgrading hardware), 
sought what we imagined would be suitable (dual faster CPU, hardware 
RAID 1, lots of RAM), and then ran into a brick wall with disk 
performance while testing - something that's never been an issue to 
date on the existing webservers which have a single IDE disk each.



but I would profile the SQL queries to make sure they are
not trying to bite off more than they need to.


Fair point - we've done a lot of database tuning in the 5 years this 
app's been under development, so that's pretty well covered. With the 
existing hardware, (the back-end dbserver's a 1GB 1.6GHz P4 with 
mdadm RAID 1) the dbserver load barely reaches 1 even under peak 
traffic - we're not SQL- or IO-bound, we're CPU-bound on the front 
end.



Well, when you created the file system the write cache wasn't installed
yet, right?


True, but there have been many wipes and installs since the BBUs have
been available, and the same long pauses when the inodes are created
(much more noticeable with CentOS 4.5 than 5, though the default
nr_requests is 128 in 5 rather than 8192 in 4.5) that initially drew
my attention are still apparent.
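
For anyone following along, nr_requests can be inspected and changed on
the fly - sda here is just an example, and the change doesn't survive a
reboot:

cat /sys/block/sda/queue/nr_requests          # show the current value
echo 128 > /sys/block/sda/queue/nr_requests   # set it (as root)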



And it may be that when you were creating the file system it was right
after you created the RAID1 array and the controller may have been
still sync'ing up the disks, which will slow things down tremendously.


I noted that from the documentation at the outset and did an initial 
verify of the RAID 1 through the 3ware BIOS before doing the original 
install. A previous life as a technical author makes me a bit of a 
RTFM freak :-)



I agree that it is the edge cases that can come back and bite you,
but just be sure you don't over-scope those edge cases for situations
that will never arise.


That's why I'm now building the machine as if there wasn't an issue, 
so I can hammer it with apachebench and see if I'm tilting at 
windmills or not.


S.


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-03 Thread Ross S. W. Walker
Simon Banton wrote:
 
 At 12:59 -0400 2/10/07, Ross S. W. Walker wrote:
 Try running the same benchmark but use bs=4k and count=1048576
 
 Just finished doing that now - comparison graphs are here:
 
 http://community.novacaster.com/showarticle.pl?id=7492
 
 While these tests are running can you run any processes on another
 session?
 
 Yes, but responsiveness is sluggish, e.g. (taken during the 4k 'deadline' 
 scheduler test)
 
 # time ls /usr/lib
 
 real    0m15.959s
 user    0m0.011s
 sys     0m0.016s

Well, it looks like CFQ was actually able to get some reads in
while the background write was going on, so it looks better in this
workload scenario.

Based on this, I am going to retract my suggestion that 'deadline' be
a general-purpose scheduler for servers. Instead I would make the
following alternative suggestions.

The issue I had with CFQ is it cannot handle overlapping IO well
from multiple threads of the same process, so if your application
does that (MySQL?) then it is probably NOT the right scheduler for
you and you might be best to consider 'deadline' or 'noop' and
put your data on a separate disk/array that doesn't compete with
any other process/application.
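
Switching is cheap to test, by the way - on CentOS 5 it can be done
per-device at runtime, no reboot needed (sda as an example):

cat /sys/block/sda/queue/scheduler               # current one shown in [brackets]
echo deadline > /sys/block/sda/queue/scheduler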

If an application spawns multiple threads or processes to handle
completely separate data workloads (i.e. not overlapping IO from the
same workload) then you are best using the default 'cfq', I believe.

I would try some web benchmark app next with both cfq and then
deadline to see which works better in a web-server environment. For
web serving that is read only I suspect that either 'cfq' or
'deadline' will work well, but would like to know the results of
your web benchmarks.

In the end, since all your content will be in MySQL and therefore
all file system operations will be reads, the whole issue of being
able to read while writing a large file isn't very relevant, so I
would probably disregard it as an edge case that doesn't fit your
workload.

-Ross




Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread matthias platzer

Hello,

I saw this thread a bit late, but I had / am having the exact same issues 
on a dual 2-core-CPU Opteron box with a 9550SX (CentOS 5 x86_64).
What I did to work around them was basically switching to XFS for 
everything except / (3ware say their cards are fast, but only on XFS) 
AND using very low nr_requests for every blockdev on the 3ware card 
(like 32 or 64). That will limit the iowait times for the CPUs and make 
the 3ware drives respond faster (see for yourself with iostat -x -m 1 
while benchmarking).
If you can, you could also try _not_ putting the system disks on the 
3ware card, because additionally the 3ware driver/card gives writes 
priority. People have suggested that the unresponsive system behaviour 
is because the CPU hangs in iowait for the writes, and reading the system 
binaries won't happen till the writes are done, so the binaries should 
be on another IO path.
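
To make that concrete, this is roughly what I do - sda as an example,
repeated for each blockdev on the card:

echo 64 > /sys/block/sda/queue/nr_requests   # or 32
iostat -x -m 1                               # watch await/%util while benchmarking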


All this seems to be symptomatic of a very complex issue consisting of 
kernel bugs/bad drivers/... and it seems to be worst on an AMD/3ware 
combination.

here is another link:
http://bugzilla.kernel.org/show_bug.cgi?id=7372

regards,
matthias

Simon Banton schrieb:

Dear list,

I thought I'd just share my experiences with this 3Ware card, and see if 
anyone might have any suggestions.


System: Supermicro H8DA8 with 2 x Opteron 250 2.4GHz and 4GB RAM 
installed. 9550SX-8LP hosting 4x Seagate ST3250820SV 250GB in a RAID 1 
plus 2 hot spare config. The array is properly initialized, write cache 
is on, as is queueing (and supported by the drives). StoreSave set to 
Protection.


OS is CentOS 4.5 i386, minimal install, default partitioning as 
suggested by the installer (ext3, small /boot on /dev/sda1, remainder as 
/ on LVM VolGroup with 2GB swap).


Firmware from 3Ware codeset 9.4.1.2 in use, firmware/driver details:
//serv1 /c0 show all
/c0 Driver Version = 2.26.05.007
/c0 Model = 9550SX-8LP
/c0 Memory Installed  = 112MB
/c0 Firmware Version = FE9X 3.08.02.005
/c0 Bios Version = BE9X 3.08.00.002
/c0 Monitor Version = BL9X 3.01.00.006

I initially noticed something odd while installing 4.4: writing the 
inode tables took longer than I expected (I thought the installer 
had frozen), and the system overall felt sluggish when doing its first 
yum update, certainly more sluggish than I'd expect from a comparatively 
powerful machine with hardware RAID 1.


I tried a few simple benchmarks (bonnie++, iozone, dd) and noticed up to 
8 pdflush processes hanging about in uninterruptible sleep when writing 
to disk, along with kjournald and kswapd from time to time. Load average 
during writing climbed considerably (up to 12), with 'ls' taking up to 
30 seconds to give any output. I've tried CentOS 4.4, 4.5, RHEL AS 4 
update 5 (just in case) and openSUSE 10.2, and they all show the same 
symptoms.


Googling around makes me think that this may be related to queue depth, 
nr_requests and possibly VM params (the latter from 
https://bugzilla.redhat.com/show_bug.cgi?id=121434#c275). These are the 
default settings:


/sys/block/sda/device/queue_depth = 254
/sys/block/sda/queue/nr_requests = 8192
/proc/sys/vm/dirty_expire_centisecs = 3000
/proc/sys/vm/dirty_ratio = 30

3Ware mentions elevator=deadline, blockdev --setra 16384 along with 
nr_requests=512 in their performance tuning doc - these alone seem to 
make no difference to the latency problem.
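
For reference, those tweaks applied by hand - on the 4.x kernel the
elevator has to be chosen at boot, while CentOS 5 can also switch it at
runtime:

blockdev --setra 16384 /dev/sda
echo 512 > /sys/block/sda/queue/nr_requests
echo deadline > /sys/block/sda/queue/scheduler   # CentOS 5; on 4.x boot with elevator=deadline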


Setting dirty_expire_centisecs = 1000 and dirty_ratio = 5 does indeed 
reduce the number of processes in 'b' state as reported by vmstat 1 
during an iozone benchmark (./iozone -s 20480m -r 64 -i 0 -i 1 -t 1 -b 
filename.xls, as per 3Ware's own tuning doc), but the problem is obviously 
still there, just mitigated somewhat. The comparison graphs are in a PDF 
here: 
http://community.novacaster.com/attach.pl/7411/482/iozone_vm_tweaks_xls.pdf 
Incidentally, the vmstat 1 output was directed to an NFS-mounted disk to 
avoid writing it to the array during the actual testing.
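
The two VM knobs can be applied on the fly with sysctl (added to
/etc/sysctl.conf to persist):

sysctl -w vm.dirty_expire_centisecs=1000
sysctl -w vm.dirty_ratio=5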


I've tried eliminating LVM from the equation, going to ext2 rather than 
ext3, and booting single-processor, all to no useful effect. I've also 
tried benchmarking with different blocksizes from 512B to 1M in powers 
of 2, and the problem remains - many processes in uninterruptible sleep 
blocking other IO. I'm about to start downloading CentOS 5 to give it a 
go, and after that I might have to resort to seeing if WinXP has the 
same issue.


My only real question is where do I go from here? I don't have enough 
specific tuning knowledge to know what else to look at.


Thanks for any pointers.

Simon


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread Simon Banton

At 12:30 +0200 2/10/07, matthias platzer wrote:


What I did to work around them was basically switching to XFS for 
everything except / (3ware say their cards are fast, but only on 
XFS) AND using very low nr_requests for every blockdev on the 3ware 
card.


Hi Matthias,

Thanks for this. In my CentOS 5 tests the nr_requests turned out by 
default to be 128, rather than the 8192 of CentOS 4.5. I'll have a go 
at reducing it still further.


If you can, you could also try _not_ putting the system disks on the 
3ware card, because additionally the 3ware driver/card gives writes 
priority.


I've noticed that kicking off a simultaneous pair of dd reads and 
writes from/to the RAID 1 array demonstrates that very clearly - only 
with cfq as the elevator did reads get any kind of look-in. Sadly, 
I'm not able to separate the system disks off, as there's no on-board 
SATA on the motherboard nor any room for more internal disks; the original 
intention was to provide the resilience of hardware RAID 1 for the 
entire machine.
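
For the record, the sort of pair I mean - the target path is
illustrative:

dd if=/dev/sda of=/dev/null bs=1M count=4096 &
dd if=/dev/zero of=/some/dir/4G bs=1M count=4096 &
wait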


People suggested the unresponsive system behaviour is because the 
cpu hanging in iowait for writing and then reading the system 
binaries won't happen till the writes are done, so the binaries 
should be on another io path.


Yup, that certainly seems to be what's happening. Wish I had another io path...

All this seems to be symptomatic of a very complex issue consisting of 
kernel bugs/bad drivers/... and it seems to be worst on an AMD/3ware 
combination.

here is another link:
http://bugzilla.kernel.org/show_bug.cgi?id=7372


Ouch - thanks for that link :-( Looks like I'm screwed big time.

S.


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread Ross S. W. Walker
matthias platzer wrote:
 
 hello,
 
 i saw this thread a bit late, but I had /am having the exact 
 same issues 
 on a dual-2-core-cpu opteron box with a 9550SX. (Centos 5 x86_64)
 What I did to work around them was basically switching to XFS for 
 everything except / (3ware say their cards are fast, but only on XFS) 
 AND using very low nr_requests for every blockdev on the 3ware card.
 (like 32 or 64). That will limit the iowait times for the 
 cpus and make 
 the 3ware-drives respond faster (see yourself with iostat -x 
 -m 1 while 
 benchmarking).
 If you can, you could also try _not_ putting the system disks on the 
 3ware card, because additionally the 3ware driver/card gives writes 
 priority. People suggested the unresponsive system behaviour 
 is because 
 the cpu hanging in iowait for writing and then reading the system 
 binaries won't happen till the writes are done, so the 
 binaries should 
 be on another io path.
 
 All this seem to be symptoms of a very complex issue consisting of 
 kernel bugs/bad drivers/... and they seem to be worst on a AMD/3ware 
 Combination.
 here is another link:
 http://bugzilla.kernel.org/show_bug.cgi?id=7372

Actually the real-real fix was to use the 'deadline' or 'noop' scheduler
with this card, as the default 'cfq' scheduler was designed to work with
a single drive and not a multiple-drive RAID, so it acts as a governor on
the amount of IO that a single process can send to the device, and when
you do multiple overlapping IOs performance decreases instead of
increasing.

Personally, I always use 'deadline' as my scheduler of choice.

Of course if your drivers are broken that will always negatively impact
performance.

-Ross




RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread Simon Banton

At 09:24 -0400 2/10/07, Ross S. W. Walker wrote:

Actually the real-real fix was to use the 'deadline' or 'noop' scheduler
with this card, as the default 'cfq' scheduler was designed to work with
a single drive and not a multiple-drive RAID, so it acts as a governor on
the amount of IO that a single process can send to the device, and when
you do multiple overlapping IOs performance decreases instead of
increasing.


Ah - that wasn't actually a complete fix Ross, but it did give a 
noticeable improvement in certain situations. I'm still chasing a 
real real 'general purpose' fix.


S.


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread Ross S. W. Walker
Simon Banton wrote:
 
 At 12:30 +0200 2/10/07, matthias platzer wrote:
 
 What I did to work around them was basically switching to XFS for 
 everything except / (3ware say their cards are fast, but only on 
 XFS) AND using very low nr_requests for every blockdev on the 3ware 
 card.
 
 Hi Matthias,
 
 Thanks for this. In my CentOS 5 tests the nr_requests turned out by 
 default to be 128, rather than the 8192 of CentOS 4.5. I'll have a go 
 at reducing it still further.

Yes, the nr_requests should be a realistic reflection of what the
card itself can handle. If it's too high you will see io_waits stack
up.

64 or 128 are good numbers; rarely have I seen a card that can handle
a depth larger than 128 (some older SCSI cards did 256, I think).

 If you can, you could also try _not_ putting the system disks on the 
 3ware card, because additionally the 3ware driver/card gives writes 
 priority.
 
 I've noticed that kicking off a simultaneous pair of dd reads and 
 writes from/to the RAID 1 array demonstrates that very clearly - only 
 with cfq as the elevator did reads get any kind of look-in. Sadly, 
 I'm not able to separate the system disks off, as there's no on-board 
 SATA on the motherboard nor any room for more internal disks; the original 
 intention was to provide the resilience of hardware RAID 1 for the 
 entire machine.

CFQ will give reads first-to-the-line priority, but this can cause
all sorts of negative side effects for a RAID setup: workloads can be
such that a read operation is dependent on a write succeeding first,
yet both were issued in an overlapping-IO scenario - you can see the
problem. If reads are getting starved with your workload you can try
'anticipatory', but if I remember correctly you have BBU write-back
cache enabled, and that should really limit the impact.

You will always see an impact though, that is just the nature of it.

Writes will beat reads, random will beat sequential, it's the rock,
paper, scissors game that all storage systems must play.

 People suggested the unresponsive system behaviour is because the 
 cpu hanging in iowait for writing and then reading the system 
 binaries won't happen till the writes are done, so the binaries 
 should be on another io path.
 
 Yup, that certainly seems to be what's happening. Wish I had 
 another io path...

You can have another io path, just add more disks to the 3ware,
create another RAID array and locate your application data there.

 All this seem to be symptoms of a very complex issue consisting of 
 kernel bugs/bad drivers/... and they seem to be worst on a AMD/3ware 
 Combination.
 here is another link:
 http://bugzilla.kernel.org/show_bug.cgi?id=7372
 
 Ouch - thanks for that link :-( Looks like I'm screwed big time.

There is always a way out of any mess (without scrapping the whole
project).

-Ross




RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread Simon Banton

What is the recurring performance problem you are seeing?


Pretty much exactly the symptoms described in 
http://bugzilla.kernel.org/show_bug.cgi?id=7372 relating to read 
starvation under heavy write IO causing sluggish system response.


I recently graphed the blocks in/blocks out from vmstat 1 for the 
same test using each of the four IO schedulers (see the PDF attached 
to the article below):


http://community.novacaster.com/showarticle.pl?id=7492

The test was:

dd if=/dev/sda of=/dev/null bs=1M count=4096 ; sleep 5; dd 
if=/dev/zero of=./4G bs=1M count=4096 


Despite appearances, interactive responsiveness subjectively felt 
better using deadline than cfq - but this is obviously an atypical 
workload and so now I'm focusing on finishing building the machine 
completely so I can try profiling the more typical patterns of 
activity that it'll experience when in use.


I find myself wondering whether the fact that the array looks like a 
single SCSI disk to the OS means that cfq is able to perform better 
in terms of interleaving reads and writes to the card but that some 
side effect of its work is causing the responsiveness issue at the 
same time. Pure speculation on my part - this is way outside my 
experience.


I'm also looking into trying an Areca card instead (avoiding LSI 
because they're cited as having the same issue in the bugzilla 
mentioned above).


S.


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread Ross S. W. Walker
Simon Banton wrote:
 
 What is the recurring performance problem you are seeing?
 
 Pretty much exactly the symptoms described in 
 http://bugzilla.kernel.org/show_bug.cgi?id=7372 relating to read 
 starvation under heavy write IO causing sluggish system response.
 
 I recently graphed the blocks in/blocks out from vmstat 1 for the 
 same test using each of the four IO schedulers (see the PDF attached 
 to the article below):
 
 http://community.novacaster.com/showarticle.pl?id=7492
 
 The test was:
 
 dd if=/dev/sda of=/dev/null bs=1M count=4096 ; sleep 5; dd 
 if=/dev/zero of=./4G bs=1M count=4096 
 
 Despite appearances, interactive responsiveness subjectively felt 
 better using deadline than cfq - but this is obviously an atypical 
 workload and so now I'm focusing on finishing building the machine 
 completely so I can try profiling the more typical patterns of 
 activity that it'll experience when in use.
 
 I find myself wondering whether the fact that the array looks like a 
 single SCSI disk to the OS means that cfq is able to perform better 
 in terms of interleaving reads and writes to the card but that some 
 side effect of its work is causing the responsiveness issue at the 
 same time. Pure speculation on my part - this is way outside my 
 experience.
 
 I'm also looking into trying an Areca card instead (avoiding LSI 
 because they're cited as having the same issue in the bugzilla 
 mentioned above).

If the performance issue is identical to the kernel bug mentioned
in the posting then the only real fix that was mentioned was to
switch to 32bit from 64bit or to down-rev your kernel, which on
CentOS means to go down to 4.5 from 5.0.

I'm trying to get confirmation that the culprit has been isolated,
but I have a suspicion that it lies in process scheduling on x86_64
and not in the io scheduler.

And while, yes, the hardware RAID appears as a single disk to the IO
scheduler, CFQ makes certain assumptions about a disk's performance
characteristics that are single-disk minded.

CFQ is meant to favor reads over writes, which is more important
for a single-user workstation than a multi-user server; a server should
handle these fairly while preventing total starvation of either,
which is what deadline was designed to do.

So for a server I would use 'deadline', and for a workstation I would
use 'cfq'.
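
To make that stick across reboots, the usual place is the elevator=
parameter on the kernel line in grub.conf - kernel version and root
path illustrative:

kernel /vmlinuz-2.6.18-8.el5 ro root=/dev/VolGroup00/LogVol00 elevator=deadline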

I myself am thinking of down-revving to CentOS 4.5 to avoid the x86_64
scheduling issue, but I keep holding out hope that the issue will be
uncovered upstream in time for 5.1...

-Ross




RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread Simon Banton

At 12:41 -0400 2/10/07, Ross S. W. Walker wrote:

If the performance issue is identical to the kernel bug mentioned
in the posting then the only real fix that was mentioned was to
switch to 32bit from 64bit or to down-rev your kernel, which on
CentOS means to go down to 4.5 from 5.0.


The irony is that I'm already running 32bit[*], and that the 
responsiveness problem's worse on 4.5.


S.

* we specifically went for the Opteron 250 so we could stay at 32-bit 
because some software components we need to use may not yet be 64bit 
clean. The intention was to migrate later to 64bit on the same 
hardware, once those wrinkles had been ironed out.



RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread Ross S. W. Walker
Simon Banton wrote:
 
 What is the recurring performance problem you are seeing?
 
 Pretty much exactly the symptoms described in 
 http://bugzilla.kernel.org/show_bug.cgi?id=7372 relating to read 
 starvation under heavy write IO causing sluggish system response.
 
 I recently graphed the blocks in/blocks out from vmstat 1 for the 
 same test using each of the four IO schedulers (see the PDF attached 
 to the article below):
 
 http://community.novacaster.com/showarticle.pl?id=7492
 
 The test was:
 
 dd if=/dev/sda of=/dev/null bs=1M count=4096 ; sleep 5; dd 
 if=/dev/zero of=./4G bs=1M count=4096 

Hmmm, with that workload I think you're going to see performance issues
no matter what, as it is using really big request sizes and it reads
from /dev/sda, then 5 seconds later at some offset starts writing a large
file, and both are sequential, so it is going to turn the IO into 1MB
random reads and writes, which on SATA disks is really going to suck
badly (actually it'll suck on any disk). Each request is atomic, so it
will not start servicing another IO request until the current 1MB IO
request is complete, which is a long time in computer terms.

Try running the same benchmark but use bs=4k and count=1048576.

This will use a 4k request size, the average VFS IO size, while keeping
the total at 4GB.

IO will still end up random, but the inter-request latency should be
smaller, which should provide for a better result.
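
Spelled out against your earlier test line, that would be something
like:

dd if=/dev/sda of=/dev/null bs=4k count=1048576 ; sleep 5
dd if=/dev/zero of=./4G bs=4k count=1048576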

While these tests are running can you run any processes on another
session? How about file system use while running?

snip




RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread Ross S. W. Walker
Simon Banton wrote:
 
 At 12:41 -0400 2/10/07, Ross S. W. Walker wrote:
 If the performance issue is identical to the kernel bug mentioned
 in the posting then the only real fix that was mentioned was to
 switch to 32bit from 64bit or to down-rev your kernel, which on
 CentOS means to go down to 4.5 from 5.0.
 
 The irony is that I'm already running 32bit[*], and that the 
 responsiveness problem's worse on 4.5.
 
 S.
 
 * we specifically went for the Opteron 250 so we could stay at 32-bit 
 because some software components we need to use may not yet be 64bit 
 clean. The intention was to migrate later to 64bit on the same 
 hardware, once those wrinkles had been ironed out.

Then I don't think your problem is related.

Have you tried calculating the performance of your current drives on
paper to see if it matches your reality? It may just be that your
disks suck...

What is the server going to be doing? What is the workload of your
application? It may be that it will work fine for what you need it
to do.

-Ross




RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread Simon Banton

At 13:03 -0400 2/10/07, Ross S. W. Walker wrote:

Have you tried calculating the performance of your current drives on
paper to see if it matches your reality? It may just be that your
disks suck...


They're performing to spec for 7200rpm SATA II drives - your help in 
determining which was the appropriate elevator to use showed that.



What is the server going to be doing? What is the workload of your
application?


Originally, it was going to be hosting a number of VMware 
installations, each containing a separate self-contained LAMP website 
(for ease of subsequent migration), but that's gone by the board in 
favour of dispensing with the VMware aspect. Now the websites will be 
NameVhosts under a single Apache directly on the native OS.


The app on each website is MySQL-backed and Perl CGI intensive, with the 
DB intended to be on a separate (identical) server. All running 
swimmingly at present on 4-year-old single 1.6GHz P4s with single IDE 
disks, 512MB RAM and RH7.3 - except at peak times, when they're a bit 
CPU-bound. Load average is rarely above 1 or 2.


Which is why I'm now focused on getting the non-VMWare approach up 
and running so I can profile it, instead of getting hung up on 
benchmarking the empty hardware. I'd never have started if I'd not 
noticed a terrific slowdown halfway through creating the filesystem 
when doing an initial CentOS 4.3 install many many weeks ago.



It may be that it will work fine for what you need it
to do?


Yeah - but it's the edge cases that bite you. Can't be doing with a 
production server where it's possible to accidentally step on an 
indeterminate trigger that sends responsiveness into a nosedive.


S.


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread Pasi Kärkkäinen
On Tue, Oct 02, 2007 at 09:39:09AM -0400, Ross S. W. Walker wrote:
 Simon Banton wrote:
  
  At 12:30 +0200 2/10/07, matthias platzer wrote:
  
  What I did to work around them was basically switching to XFS for 
  everything except / (3ware say their cards are fast, but only on 
  XFS) AND using very low nr_requests for every blockdev on the 3ware 
  card.
  
  Hi Matthias,
  
  Thanks for this. In my CentOS 5 tests the nr_requests turned out by 
  default to be 128, rather than the 8192 of CentOS 4.5. I'll have a go 
  at reducing it still further.
 
 Yes, the nr_requests should be a realistic reflection of what the
 card itself can handle. If too high you will see io_waits stack up
 high.
 
 64 or 128 are good numbers, rarely have I seen a card that can handle
 a depth larger then 128 (some older scsi cards did 256 I think).
 

Hmm.. let's say you have a Linux software md-raid array made of SATA
drives.. what kind of nr_requests values should you use for that for
optimal performance? 

Thanks!

-- Pasi


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread Pasi Kärkkäinen
On Tue, Oct 02, 2007 at 08:57:28PM +0300, Pasi Kärkkäinen wrote:
 On Tue, Oct 02, 2007 at 09:39:09AM -0400, Ross S. W. Walker wrote:
  Simon Banton wrote:
   
   At 12:30 +0200 2/10/07, matthias platzer wrote:
   
   What I did to work around them was basically switching to XFS for 
   everything except / (3ware say their cards are fast, but only on 
   XFS) AND using very low nr_requests for every blockdev on the 3ware 
   card.
   
   Hi Matthias,
   
   Thanks for this. In my CentOS 5 tests the nr_requests turned out by 
   default to be 128, rather than the 8192 of CentOS 4.5. I'll have a go 
   at reducing it still further.
  
  Yes, the nr_requests should be a realistic reflection of what the
  card itself can handle. If too high you will see io_waits stack up
  high.
  
  64 or 128 are good numbers, rarely have I seen a card that can handle
  a depth larger then 128 (some older scsi cards did 256 I think).
  
 
 Hmm.. let's say you have a linux software md-raid array made of sata
 drives.. what kind of nr_request values you should use for that for optimal
 performance? 
 

Or let's put it this way:

You have an md-raid array in dom0. What kind of nr_requests values should
you use for normal 7200 rpm SATA NCQ disks on an Intel ICH8 (NCQ)
controller? 

And then this md-array is seen as xvdb by domU.. what kind of nr_requests
values should you use in domU? 

The io-scheduler/elevator should be deadline in domU, I assume.. how about
in dom0? deadline there too? 
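
e.g. passed on the domU kernel command line from the config file -
something like this, if I have the syntax right:

extra = "elevator=deadline"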

Thanks!

-- Pasi


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-10-02 Thread Ross S. W. Walker
Pasi Kärkkäinen wrote:
 
 On Tue, Oct 02, 2007 at 09:39:09AM -0400, Ross S. W. Walker wrote:
  Simon Banton wrote:
   
   At 12:30 +0200 2/10/07, matthias platzer wrote:
   
   What I did to work around them was basically switching 
 to XFS for 
   everything except / (3ware say their cards are fast, but only on 
   XFS) AND using very low nr_requests for every blockdev 
 on the 3ware 
   card.
   
   Hi Matthias,
   
   Thanks for this. In my CentOS 5 tests the nr_requests 
 turned out by 
   default to be 128, rather than the 8192 of CentOS 4.5. 
 I'll have a go 
   at reducing it still further.
  
  Yes, the nr_requests should be a realistic reflection of what the
  card itself can handle. If too high you will see io_waits stack up
  high.
  
  64 or 128 are good numbers, rarely have I seen a card that 
 can handle
  a depth larger then 128 (some older scsi cards did 256 I think).
  
 
 Hmm.. let's say you have a linux software md-raid array made of sata
 drives.. what kind of nr_request values you should use for 
 that for optimal
 performance? 
 
 Thanks!

Pasi,

Good to hear from you again.

I haven't done much testing with software RAID, but after googling
around I have found that there truly is no one nr_requests setting
that fits all situations.

The nr_requests is the maximum number of requests that can be queued
before the queue is unplugged, and the perfect number of requests queued
is a reflection of the workload and the hardware together. Also, most
system functions unplug the queue after each request, so it isn't such a
big issue unless the system is under high load.

If the default 128 isn't working I would explore hardware or RAID
configuration problems first before trying to tweak this setting.

The old nr_requests = 8192 was definitely too high.
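
For md specifically, my understanding is that the knob that matters is
on the member disks rather than on the md device itself - device names
illustrative:

for d in sda sdb sdc sdd; do
    echo 128 > /sys/block/$d/queue/nr_requests
done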

-Ross




RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-26 Thread Ross S. W. Walker
Simon Banton wrote:
 
 At 13:26 -0400 25/9/07, Ross S. W. Walker wrote:
 Off of 3ware's support site I was able to download and compile the
 latest stable release which has this modinfo:
 
 [EMAIL PROTECTED] driver]# modinfo 3w-9xxx.ko
 filename:   3w-9xxx.ko
 version:2.26.06.002-2.6.18
 
 OK, driver source from the 9.4.1.3 codeset 
 (3w-9xxx-2.6.18kernel_9.4.1.3.tgz) now built and installed for RHEL5, 
 new initrd created and machine re-tested.
 
 [EMAIL PROTECTED] ~]# modinfo 3w-9xxx
 filename:   
 /lib/modules/2.6.18-8.el5/kernel/drivers/scsi/3w-9xxx.ko
 version:2.26.06.002-2.6.18
 license:GPL
 description:3ware 9000 Storage Controller Linux Driver
 author: AMCC
 srcversion: 7F428E7BA74EAFF0FF137E2
 alias:  pci:v13C1d1004sv*sd*bc*sc*i*
 alias:  pci:v13C1d1003sv*sd*bc*sc*i*
 alias:  pci:v13C1d1002sv*sd*bc*sc*i*
 depends:scsi_mod
 vermagic:   2.6.18-8.el5 SMP mod_unload 686 REGPARM 
 4KSTACKS gcc-4.1
 
 tw_cli output just to be sure:
 
 //serv1 /c0 show all
 /c0 Driver Version = 2.26.06.002-2.6.18
 /c0 Model = 9550SX-8LP
 /c0 Memory Installed  = 112MB
 /c0 Firmware Version = FE9X 3.08.02.007
 /c0 Bios Version = BE9X 3.08.00.002
 /c0 Monitor Version = BL9X 3.01.00.006
 
 Well bottom line, there is something very wrong with the 3ware
 drivers on the RHEL 5 implementation.
 
 There still is then, because the figures for the LTP disktest are 
 almost identical post-update.
 
 Sequential reads:
 
 RHEL5, RAID 0:
 | 2007/09/26-09:11:27 | START | 2962 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 976519167) (-c) 
 (-p u)
 | 2007/09/26-09:11:57 | STAT  | 2962 | v1.2.8 | /dev/sdb | Total read 
 throughput: 2430429.9B/s (2.32MB/s), IOPS 593.4/s.
 
 RHEL5, RAID 1:
 | 2007/09/26-09:59:41 | START | 3210 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) 
 (-p u)
 | 2007/09/26-10:00:11 | STAT  | 3210 | v1.2.8 | /dev/sdb | Total read 
 throughput: 2566280.5B/s (2.45MB/s), IOPS 626.5/s.
 
 Sequential writes:
 
 RHEL5, RAID 0:
 | 2007/09/26-09:11:57 | START | 2971 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 976519167) (-c) 
 (-p u)
 | 2007/09/26-09:12:27 | STAT  | 2971 | v1.2.8 | /dev/sdb | Total 
 write throughput: 66337450.7B/s (63.26MB/s), IOPS 16195.7/s.
 
 RHEL5, RAID 1:
 | 2007/09/26-10:00:11 | START | 3217 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) 
 (-p u)
 | 2007/09/26-10:00:41 | STAT  | 3217 | v1.2.8 | /dev/sdb | Total 
 write throughput: 54108160.0B/s (51.60MB/s), IOPS 13210.0/s.
 
 Random reads:
 
 RHEL5, RAID 0:
 | 2007/09/26-09:12:28 | START | 2978 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 976519167) (-c) 
 (-D 100:0)
 | 2007/09/26-09:12:57 | STAT  | 2978 | v1.2.8 | /dev/sdb | Total read 
 throughput: 269206.1B/s (0.26MB/s), IOPS 65.7/s.
 
 RHEL5, RAID 1:
 | 2007/09/26-10:00:41 | START | 3231 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) 
 (-D 100:0)
 | 2007/09/26-10:01:11 | STAT  | 3231 | v1.2.8 | /dev/sdb | Total read 
 throughput: 262144.0B/s (0.25MB/s), IOPS 64.0/s.
 
 Random writes:
 
 RHEL5, RAID 0:
 | 2007/09/26-09:12:57 | START | 2987 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 976519167) (-c) 
 (-D 0:100)
 | 2007/09/26-09:13:34 | STAT  | 2987 | v1.2.8 | /dev/sdb | Total 
 write throughput: 1378440.5B/s (1.31MB/s), IOPS 336.5/s.
 
 RHEL5, RAID 1:
 | 2007/09/26-10:01:12 | START | 11539 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) 
 (-D 0:100)
 | 2007/09/26-10:01:41 | STAT  | 11539 | v1.2.8 | /dev/sdb | Total 
 write throughput: 638976.0B/s (0.61MB/s), IOPS 156.0/s.
 
 I re-ran the tests again, just to be sure (same order as above - 
 SeqR, SeqW, RandomR, RandomW):
 
 RAID 0:
 SR| 2007/09/26-10:16:53 | STAT  | 4602 | v1.2.8 | /dev/sdb | Total 
 read throughput: 2456328.8B/s (2.34MB/s), IOPS 599.7/s.
 SW| 2007/09/26-10:17:23 | STAT  | 4611 | v1.2.8 | /dev/sdb | Total 
 write throughput: 66434662.4B/s (63.36MB/s), IOPS 16219.4/s.
 RR| 2007/09/26-10:17:53 | STAT  | 4618 | v1.2.8 | /dev/sdb | Total 
 read throughput: 273612.8B/s (0.26MB/s), IOPS 66.8/s.
 RW| 2007/09/26-10:18:31 | STAT  | 4626 | v1.2.8 | /dev/sdb | Total 
 write throughput: 1424701.8B/s (1.36MB/s), IOPS 347.8/s.
 
 RAID 1:
 SR| 2007/09/26-10:12:49 | STAT  | 4509 | v1.2.8 | /dev/sdb | Total 
 read throughput: 2479718.4B/s (2.36MB/s), IOPS 605.4/s.
 SW| 2007/09/26-10:13:19 | STAT  | 4516 | v1.2.8 | /dev/sdb | Total 
 write throughput: 53864721.1B/s (51.37MB/s), IOPS 13150.6/s.
 RR| 2007/09/26-10:13:49 | STAT  | 4525 | v1.2.8 | /dev/sdb | Total 
 read throughput: 268151.5B/s (0.26MB/s), IOPS 65.5/s.
 RW| 2007/09/26-10:14:19 | STAT  | 4532 | v1.2.8 | /dev/sdb | Total 
 write throughput: 

RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-26 Thread Simon Banton

At 09:14 -0400 26/9/07, Ross S. W. Walker wrote:

Could you try the benchmarks with the 'deadline' scheduler?


OK, these are all with RHEL5, driver 2.26.06.002-2.6.18, RAID 1:

elevator=deadline:
Sequential reads:
| 2007/09/26-16:19:30 | START | 3065 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) 
(-p u)
| 2007/09/26-16:20:00 | STAT  | 3065 | v1.2.8 | /dev/sdb | Total read 
throughput: 45353642.7B/s (43.25MB/s), IOPS 11072.7/s.

Sequential writes:
| 2007/09/26-16:20:00 | START | 3082 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) 
(-p u)
| 2007/09/26-16:20:30 | STAT  | 3082 | v1.2.8 | /dev/sdb | Total 
write throughput: 53781186.2B/s (51.29MB/s), IOPS 13130.2/s.

Random reads:
| 2007/09/26-16:20:30 | START | 3091 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) 
(-D 100:0)
| 2007/09/26-16:21:00 | STAT  | 3091 | v1.2.8 | /dev/sdb | Total read 
throughput: 545587.2B/s (0.52MB/s), IOPS 133.2/s.

Random writes:
| 2007/09/26-16:21:00 | START | 3098 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) 
(-D 0:100)
| 2007/09/26-16:21:44 | STAT  | 3098 | v1.2.8 | /dev/sdb | Total 
write throughput: 795852.8B/s (0.76MB/s), IOPS 194.3/s.


Here are the others for comparison.

elevator=noop:
Sequential reads:
| 2007/09/26-16:24:02 | START | 3167 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) 
(-p u)
| 2007/09/26-16:24:32 | STAT  | 3167 | v1.2.8 | /dev/sdb | Total read 
throughput: 45467374.9B/s (43.36MB/s), IOPS 11100.4/s.

Sequential writes:
| 2007/09/26-16:24:32 | START | 3176 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) 
(-p u)
| 2007/09/26-16:25:02 | STAT  | 3176 | v1.2.8 | /dev/sdb | Total 
write throughput: 53825672.5B/s (51.33MB/s), IOPS 13141.0/s.

Random reads:
| 2007/09/26-16:25:03 | START | 3193 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) 
(-D 100:0)
| 2007/09/26-16:25:32 | STAT  | 3193 | v1.2.8 | /dev/sdb | Total read 
throughput: 540954.5B/s (0.52MB/s), IOPS 132.1/s.

Random writes:
| 2007/09/26-16:25:32 | START | 3202 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) 
(-D 0:100)
| 2007/09/26-16:26:16 | STAT  | 3202 | v1.2.8 | /dev/sdb | Total 
write throughput: 795989.3B/s (0.76MB/s), IOPS 194.3/s.


elevator=anticipatory:
Sequential reads:
| 2007/09/26-16:37:04 | START | 3277 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) 
(-p u)
| 2007/09/26-16:37:34 | STAT  | 3277 | v1.2.8 | /dev/sdb | Total read 
throughput: 45414126.9B/s (43.31MB/s), IOPS 11087.4/s.

Sequential writes:
| 2007/09/26-16:37:35 | START | 3284 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) 
(-p u)
| 2007/09/26-16:38:04 | STAT  | 3284 | v1.2.8 | /dev/sdb | Total 
write throughput: 53895168.0B/s (51.40MB/s), IOPS 13158.0/s.

Random reads:
| 2007/09/26-16:38:04 | START | 3293 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) 
(-D 100:0)
| 2007/09/26-16:38:34 | STAT  | 3293 | v1.2.8 | /dev/sdb | Total read 
throughput: 467080.5B/s (0.45MB/s), IOPS 114.0/s.

Random writes:
| 2007/09/26-16:38:34 | START | 3300 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) 
(-D 0:100)
| 2007/09/26-16:39:18 | STAT  | 3300 | v1.2.8 | /dev/sdb | Total 
write throughput: 793122.1B/s (0.76MB/s), IOPS 193.6/s.


elevator=cfq (just to re-check):
Sequential reads:
| 2007/09/26-16:42:18 | START | 3353 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) 
(-p u)
| 2007/09/26-16:42:48 | STAT  | 3353 | v1.2.8 | /dev/sdb | Total read 
throughput: 2463470.9B/s (2.35MB/s), IOPS 601.4/s.

Sequential writes:
| 2007/09/26-16:42:48 | START | 3360 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) 
(-p u)
| 2007/09/26-16:43:18 | STAT  | 3360 | v1.2.8 | /dev/sdb | Total 
write throughput: 54572782.9B/s (52.04MB/s), IOPS 13323.4/s.

Random reads:
| 2007/09/26-16:43:19 | START | 3369 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) 
(-D 100:0)
| 2007/09/26-16:43:48 | STAT  | 3369 | v1.2.8 | /dev/sdb | Total read 
throughput: 267652.4B/s (0.26MB/s), IOPS 65.3/s.

Random writes:
| 2007/09/26-16:43:48 | START | 3376 | v1.2.8 | /dev/sdb | Start 
args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) 
(-D 0:100)
| 2007/09/26-16:44:31 | STAT  | 3376 | v1.2.8 | /dev/sdb | Total 
write throughput: 793122.1B/s (0.76MB/s), IOPS 193.6/s.


Certainly cfq is severely cramping the reads, it appears.

S.

RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-26 Thread Ross S. W. Walker
Simon Banton wrote:
 
 At 09:14 -0400 26/9/07, Ross S. W. Walker wrote:
 Could you try the benchmarks with the 'deadline' scheduler?
 
 OK, these are all with RHEL5, driver 2.26.06.002-2.6.18, RAID 1:
 
 elevator=deadline:
 Sequential reads:
 | 2007/09/26-16:19:30 | START | 3065 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) 
 (-p u)
 | 2007/09/26-16:20:00 | STAT  | 3065 | v1.2.8 | /dev/sdb | Total read 
 throughput: 45353642.7B/s (43.25MB/s), IOPS 11072.7/s.

That's a lot better, where it should be for those drives.

 Sequential writes:
 | 2007/09/26-16:20:00 | START | 3082 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) 
 (-p u)
 | 2007/09/26-16:20:30 | STAT  | 3082 | v1.2.8 | /dev/sdb | Total 
 write throughput: 53781186.2B/s (51.29MB/s), IOPS 13130.2/s.

Yup, with the write-back cache you'll see better write throughput than
read at this block size.

 Random reads:
 | 2007/09/26-16:20:30 | START | 3091 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) 
 (-D 100:0)
 | 2007/09/26-16:21:00 | STAT  | 3091 | v1.2.8 | /dev/sdb | Total read 
 throughput: 545587.2B/s (0.52MB/s), IOPS 133.2/s.

Same, random io would really be affected here.

 Random writes:
 | 2007/09/26-16:21:00 | START | 3098 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) 
 (-D 0:100)
 | 2007/09/26-16:21:44 | STAT  | 3098 | v1.2.8 | /dev/sdb | Total 
 write throughput: 795852.8B/s (0.76MB/s), IOPS 194.3/s.

Same here.

 Here are the others for comparison.
 
 elevator=noop:
 Sequential reads:
 | 2007/09/26-16:24:02 | START | 3167 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) 
 (-p u)
 | 2007/09/26-16:24:32 | STAT  | 3167 | v1.2.8 | /dev/sdb | Total read 
 throughput: 45467374.9B/s (43.36MB/s), IOPS 11100.4/s.

About the same as deadline, but you'll probably be better off with
deadline as deadline will attempt to merge requests from separate
sources to the same volume while noop will just send it as it gets
it.

 Sequential writes:
 | 2007/09/26-16:24:32 | START | 3176 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) 
 (-p u)
 | 2007/09/26-16:25:02 | STAT  | 3176 | v1.2.8 | /dev/sdb | Total 
 write throughput: 53825672.5B/s (51.33MB/s), IOPS 13141.0/s.

Same for the others.

 Random reads:
 | 2007/09/26-16:25:03 | START | 3193 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) 
 (-D 100:0)
 | 2007/09/26-16:25:32 | STAT  | 3193 | v1.2.8 | /dev/sdb | Total read 
 throughput: 540954.5B/s (0.52MB/s), IOPS 132.1/s.
 Random writes:
 | 2007/09/26-16:25:32 | START | 3202 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) 
 (-D 0:100)
 | 2007/09/26-16:26:16 | STAT  | 3202 | v1.2.8 | /dev/sdb | Total 
 write throughput: 795989.3B/s (0.76MB/s), IOPS 194.3/s.
 
 elevator=anticipatory:
 Sequential reads:
 | 2007/09/26-16:37:04 | START | 3277 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) 
 (-p u)
 | 2007/09/26-16:37:34 | STAT  | 3277 | v1.2.8 | /dev/sdb | Total read 
 throughput: 45414126.9B/s (43.31MB/s), IOPS 11087.4/s.

While anticipatory appears to be an adequate choice here, it will
cause performance issues with multiple writers as it keeps trying to
anticipate reads. For a server, deadline is still the best.

 Sequential writes:
 | 2007/09/26-16:37:35 | START | 3284 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) 
 (-p u)
 | 2007/09/26-16:38:04 | STAT  | 3284 | v1.2.8 | /dev/sdb | Total 
 write throughput: 53895168.0B/s (51.40MB/s), IOPS 13158.0/s.
 Random reads:
 | 2007/09/26-16:38:04 | START | 3293 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) 
 (-D 100:0)
 | 2007/09/26-16:38:34 | STAT  | 3293 | v1.2.8 | /dev/sdb | Total read 
 throughput: 467080.5B/s (0.45MB/s), IOPS 114.0/s.
 Random writes:
 | 2007/09/26-16:38:34 | START | 3300 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) 
 (-D 0:100)
 | 2007/09/26-16:39:18 | STAT  | 3300 | v1.2.8 | /dev/sdb | Total 
 write throughput: 793122.1B/s (0.76MB/s), IOPS 193.6/s.
 
 elevator=cfq (just to re-check):
 Sequential reads:
 | 2007/09/26-16:42:18 | START | 3353 | v1.2.8 | /dev/sdb | Start 
 args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) 
 (-p u)
 | 2007/09/26-16:42:48 | STAT  | 3353 | v1.2.8 | /dev/sdb | Total read 
 throughput: 2463470.9B/s (2.35MB/s), IOPS 601.4/s.

CFQ is intended for single-disk workstations and its IO limits are
based on that, so it actually acts as an IO governor on RAID setups.

Only use 'cfq' on single disk workstations.

Use 'deadline' on RAID setups and servers.

 Sequential writes:
 | 

RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-26 Thread Simon Banton

At 12:01 -0400 26/9/07, Ross S. W. Walker wrote:

CFQ is intended for single-disk workstations and its IO limits are
based on that, so it actually acts as an IO governor on RAID setups.

Only use 'cfq' on single disk workstations.

Use 'deadline' on RAID setups and servers.


Many thanks Ross, that's one variable tied down at least :-)

S.


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-25 Thread Simon Banton

At 13:35 -0400 24/9/07, Ross S. W. Walker wrote:

Ok, so here is the command I would use:


Thanks - here are the results (tried CentOS 4.5 and RHEL5, with tests 
on sdb when configured as both RAID 0 and as RAID 1):



Sequential reads:
disktest -B 4k -h 1 -I BD -K 4 -p l -P T -T 300 -r /dev/sdX


CentOS 4.5, RAID 0:
| 2007/09/25-14:26:58 | STAT  | 13944 | v1.2.8 | /dev/sdb | Total 
read throughput: 50249728.0B/s (47.92MB/s), IOPS 12268.0/s.

| 2007/09/25-14:26:58 | END   | 13944 | v1.2.8 | /dev/sdb | Test Done (Passed)

CentOS 4.5, RAID 1:
| 2007/09/25-14:20:06 | STAT  | 13807 | v1.2.8 | /dev/sdb | Total 
read throughput: 44994150.4B/s (42.91MB/s), IOPS 10984.9/s.

| 2007/09/25-14:20:06 | END   | 13807 | v1.2.8 | /dev/sdb | Test Done (Passed)

RHEL5, RAID 0:
| 2007/09/25-11:07:46 | STAT  | 2835 | v1.2.8 | /dev/sdb | Total read 
throughput: 2405171.2B/s (2.29MB/s), IOPS 587.2/s.

| 2007/09/25-11:07:46 | END   | 2835 | v1.2.8 | /dev/sdb | Test Done (Passed)

RHEL5, RAID 1:
| 2007/09/25-11:35:53 | STAT  | 3022 | v1.2.8 | /dev/sdb | Total read 
throughput: 2461696.0B/s (2.35MB/s), IOPS 601.0/s.

| 2007/09/25-11:35:53 | END   | 3022 | v1.2.8 | /dev/sdb | Test Done (Passed)


Sequential writes:
disktest -B 4k -h 1 -I BD -K 4 -p l -P T -T 300 -w /dev/sdX


CentOS 4.5, RAID 0:
| 2007/09/25-14:28:19 | STAT  | 13951 | v1.2.8 | /dev/sdb | Total 
write throughput: 66150946.1B/s (63.09MB/s), IOPS 16150.1/s.

| 2007/09/25-14:28:19 | END   | 13951 | v1.2.8 | /dev/sdb | Test Done (Passed)

CentOS 4.5, RAID 1:
| 2007/09/25-14:21:52 | STAT  | 13815 | v1.2.8 | /dev/sdb | Total 
write throughput: 53170039.5B/s (50.71MB/s), IOPS 12981.0/s.

| 2007/09/25-14:21:52 | END   | 13815 | v1.2.8 | /dev/sdb | Test Done (Passed)

RHEL5, RAID 0:
| 2007/09/25-11:13:44 | STAT  | 2850 | v1.2.8 | /dev/sdb | Total 
write throughput: 66031616.0B/s (62.97MB/s), IOPS 16121.0/s.

| 2007/09/25-11:13:44 | END   | 2850 | v1.2.8 | /dev/sdb | Test Done (Passed)

RHEL5, RAID 1:
| 2007/09/25-11:36:36 | STAT  | 3031 | v1.2.8 | /dev/sdb | Total 
write throughput: 56870229.3B/s (54.24MB/s), IOPS 13884.3/s.

| 2007/09/25-11:36:36 | END   | 3031 | v1.2.8 | /dev/sdb | Test Done (Passed)


Random reads:
disktest -B 4k -h 1 -I BD -K 4 -p r -P T -T 300 -r /dev/sdX


CentOS 4.5, RAID 0:
| 2007/09/25-14:28:59 | STAT  | 13958 | v1.2.8 | /dev/sdb | Total 
read throughput: 504217.6B/s (0.48MB/s), IOPS 123.1/s.

| 2007/09/25-14:28:59 | END   | 13958 | v1.2.8 | /dev/sdb | Test Done (Passed)

CentOS 4.5, RAID 1:
| 2007/09/25-14:23:14 | STAT  | 13822 | v1.2.8 | /dev/sdb | Total 
read throughput: 549570.2B/s (0.52MB/s), IOPS 134.2/s.

| 2007/09/25-14:23:14 | END   | 13822 | v1.2.8 | /dev/sdb | Test Done (Passed)

RHEL5, RAID 0:
| 2007/09/25-11:16:21 | STAT  | 2875 | v1.2.8 | /dev/sdb | Total read 
throughput: 273612.8B/s (0.26MB/s), IOPS 66.8/s.

| 2007/09/25-11:16:21 | END   | 2875 | v1.2.8 | /dev/sdb | Test Done (Passed)

RHEL5, RAID 1:
| 2007/09/25-11:39:20 | STAT  | 3042 | v1.2.8 | /dev/sdb | Total read 
throughput: 546816.0B/s (0.52MB/s), IOPS 133.5/s.

| 2007/09/25-11:39:20 | END   | 3042 | v1.2.8 | /dev/sdb | Test Done (Passed)


Random writes:
disktest -B 4k -h 1 -I BD -K 4 -p r -P T -T 300 -w /dev/sdX


CentOS 4.5, RAID 0:
| 2007/09/25-14:29:34 | STAT  | 13965 | v1.2.8 | /dev/sdb | Total 
write throughput: 1379532.8B/s (1.32MB/s), IOPS 336.8/s.

| 2007/09/25-14:29:34 | END   | 13965 | v1.2.8 | /dev/sdb | Test Done (Passed)

CentOS 4.5, RAID 1:
| 2007/09/25-14:24:15 | STAT  | 13829 | v1.2.8 | /dev/sdb | Total 
write throughput: 782199.5B/s (0.75MB/s), IOPS 191.0/s.

| 2007/09/25-14:24:15 | END   | 13829 | v1.2.8 | /dev/sdb | Test Done (Passed)

RHEL5, RAID 0:
| 2007/09/25-11:19:21 | STAT  | 2894 | v1.2.8 | /dev/sdb | Total 
write throughput: 1377894.4B/s (1.31MB/s), IOPS 336.4/s.

| 2007/09/25-11:19:21 | END   | 2894 | v1.2.8 | /dev/sdb | Test Done (Passed)

RHEL5 RAID 1:
| 2007/09/25-11:40:08 | STAT  | 3049 | v1.2.8 | /dev/sdb | Total 
write throughput: 798310.4B/s (0.76MB/s), IOPS 194.9/s.

| 2007/09/25-11:40:08 | END   | 3049 | v1.2.8 | /dev/sdb | Test Done (Passed)

I'm not sure what to make of it, mind you.

Cheers
S.


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-25 Thread Ross S. W. Walker
Simon Banton wrote:
 
 At 13:35 -0400 24/9/07, Ross S. W. Walker wrote:
 Ok, so here is the command I would use:
 
 Thanks - here are the results (tried CentOS 4.5 and RHEL5, with tests 
 on sdb when configured as both RAID 0 and as RAID 1):
 
 Sequential reads:
 disktest -B 4k -h 1 -I BD -K 4 -p l -P T -T 300 -r /dev/sdX
 
 CentOS 4.5, RAID 0:
 | 2007/09/25-14:26:58 | STAT  | 13944 | v1.2.8 | /dev/sdb | Total 
 read throughput: 50249728.0B/s (47.92MB/s), IOPS 12268.0/s.
 | 2007/09/25-14:26:58 | END   | 13944 | v1.2.8 | /dev/sdb | 
 Test Done (Passed)

Ok, this is a 2-disk RAID0? If so then this is ok - not the fastest
config (60MB/s for fast drives), but mid-level SATA performance.

 CentOS 4.5, RAID 1:
 | 2007/09/25-14:20:06 | STAT  | 13807 | v1.2.8 | /dev/sdb | Total 
 read throughput: 44994150.4B/s (42.91MB/s), IOPS 10984.9/s.
 | 2007/09/25-14:20:06 | END   | 13807 | v1.2.8 | /dev/sdb | 
 Test Done (Passed)

Statistically equivalent to RAID0, which is a good sign, as it means the
3ware is doing striped reads off a RAID1.

 RHEL5, RAID 0:
 | 2007/09/25-11:07:46 | STAT  | 2835 | v1.2.8 | /dev/sdb | Total read 
 throughput: 2405171.2B/s (2.29MB/s), IOPS 587.2/s.
 | 2007/09/25-11:07:46 | END   | 2835 | v1.2.8 | /dev/sdb | 
 Test Done (Passed)

Ok there is a problem here with the driver on RHEL5, are you running
the latest version off of 3ware's site?

Can you send the output of modinfo <driver name>?

 RHEL5, RAID 1:
 | 2007/09/25-11:35:53 | STAT  | 3022 | v1.2.8 | /dev/sdb | Total read 
 throughput: 2461696.0B/s (2.35MB/s), IOPS 601.0/s.
 | 2007/09/25-11:35:53 | END   | 3022 | v1.2.8 | /dev/sdb | 
 Test Done (Passed)

Same bad result here too... at least it's consistently bad, definitely
points to a bad driver.

 Sequential writes:
 disktest -B 4k -h 1 -I BD -K 4 -p l -P T -T 300 -w /dev/sdX
 
 CentOS 4.5, RAID 0:
 | 2007/09/25-14:28:19 | STAT  | 13951 | v1.2.8 | /dev/sdb | Total 
 write throughput: 66150946.1B/s (63.09MB/s), IOPS 16150.1/s.
 | 2007/09/25-14:28:19 | END   | 13951 | v1.2.8 | /dev/sdb | 
 Test Done (Passed)

Good write performance here, the BBU cache is definitely helping.

 CentOS 4.5, RAID 1:
 | 2007/09/25-14:21:52 | STAT  | 13815 | v1.2.8 | /dev/sdb | Total 
 write throughput: 53170039.5B/s (50.71MB/s), IOPS 12981.0/s.
 | 2007/09/25-14:21:52 | END   | 13815 | v1.2.8 | /dev/sdb | 
 Test Done (Passed)

Also good write performance here with the BBU cache, RAID1 is
going to be slower by nature as it writes twice for each write,
but the BBU cache is minimizing the hurt.

 RHEL5, RAID 0:
 | 2007/09/25-11:13:44 | STAT  | 2850 | v1.2.8 | /dev/sdb | Total 
 write throughput: 66031616.0B/s (62.97MB/s), IOPS 16121.0/s.
 | 2007/09/25-11:13:44 | END   | 2850 | v1.2.8 | /dev/sdb | 
 Test Done (Passed)

Write performance on RHEL 5 doesn't seem to be affected here;
maybe only read performance is hit, or maybe the BBU cache is
hiding the problem.

 RHEL5, RAID 1:
 | 2007/09/25-11:36:36 | STAT  | 3031 | v1.2.8 | /dev/sdb | Total 
 write throughput: 56870229.3B/s (54.24MB/s), IOPS 13884.3/s.
 | 2007/09/25-11:36:36 | END   | 3031 | v1.2.8 | /dev/sdb | 
 Test Done (Passed)

Same thing here, good write performance.

 Random reads:
 disktest -B 4k -h 1 -I BD -K 4 -p r -P T -T 300 -r /dev/sdX
 
 CentOS 4.5, RAID 0:
 | 2007/09/25-14:28:59 | STAT  | 13958 | v1.2.8 | /dev/sdb | Total 
 read throughput: 504217.6B/s (0.48MB/s), IOPS 123.1/s.
 | 2007/09/25-14:28:59 | END   | 13958 | v1.2.8 | /dev/sdb | 
 Test Done (Passed)

And here is where the difference between a 15K drive and a 7200
RPM drive appears, though with RAID0 one would expect to see
around 1MB/s. What chunk size does it use, 64K?

 CentOS 4.5, RAID 1:
 | 2007/09/25-14:23:14 | STAT  | 13822 | v1.2.8 | /dev/sdb | Total 
 read throughput: 549570.2B/s (0.52MB/s), IOPS 134.2/s.
 | 2007/09/25-14:23:14 | END   | 13822 | v1.2.8 | /dev/sdb | 
 Test Done (Passed)

This is the correct performance of a RAID1 for 7200 RPM drives.

 RHEL5, RAID 0:
 | 2007/09/25-11:16:21 | STAT  | 2875 | v1.2.8 | /dev/sdb | Total read 
 throughput: 273612.8B/s (0.26MB/s), IOPS 66.8/s.
 | 2007/09/25-11:16:21 | END   | 2875 | v1.2.8 | /dev/sdb | 
 Test Done (Passed)

This also shows a serious performance degradation here, the numbers
should be similar to RHEL 4.5 numbers.

 RHEL5, RAID 1:
 | 2007/09/25-11:39:20 | STAT  | 3042 | v1.2.8 | /dev/sdb | Total read 
 throughput: 546816.0B/s (0.52MB/s), IOPS 133.5/s.
 | 2007/09/25-11:39:20 | END   | 3042 | v1.2.8 | /dev/sdb | 
 Test Done (Passed)

This is an oddity, and is inconsistent. I would have expected this
number to be low too, but it is showing normal throughput for this
configuration. I wouldn't put any faith in it; if you ran it
3 times in a row it would probably post slow numbers 2 out of the 3
times.

 Random writes:
 disktest -B 4k -h 1 -I BD -K 4 -p r -P T -T 300 -w /dev/sdX
 
 CentOS 4.5, RAID 0:
 | 2007/09/25-14:29:34 | STAT  | 13965 | v1.2.8 | /dev/sdb | Total 
 write throughput: 1379532.8B/s (1.32MB/s), IOPS 336.8/s.
| 2007/09/25-14:29:34 | END   | 13965 | v1.2.8 | /dev/sdb | Test Done (Passed)

RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-25 Thread Ross S. W. Walker
Simon Banton wrote:
 
 At 10:36 -0400 25/9/07, Ross S. W. Walker wrote:
 Post the modinfo driver name to the list just in case somebody
 else knows of any issues with the version you are running.
 
 This is from RHEL5 - it's the driver that comes built-in:
 
 [EMAIL PROTECTED] ~]# modinfo 3w-9xxx
 filename:   
 /lib/modules/2.6.18-8.el5/kernel/drivers/scsi/3w-9xxx.ko
 version:2.26.02.007
 license:GPL
 description:3ware 9000 Storage Controller Linux Driver
 author: AMCC
 srcversion: 029473DD729D96D687985E4
 alias:  pci:v13C1d1003sv*sd*bc*sc*i*
 alias:  pci:v13C1d1002sv*sd*bc*sc*i*
 depends:scsi_mod
 vermagic:   2.6.18-8.el5 SMP mod_unload 686 REGPARM 
 4KSTACKS gcc-4.1
 
 The overall responsiveness of the system under benchmarking when 
 running RHEL5 is somewhat better than that when running CentOS 4.5.

Off of 3ware's support site I was able to download and compile the
latest stable release which has this modinfo:

[EMAIL PROTECTED] driver]# modinfo 3w-9xxx.ko
filename:   3w-9xxx.ko
version:2.26.06.002-2.6.18
license:GPL
description:3ware 9000 Storage Controller Linux Driver
author: AMCC
srcversion: 7F428E7BA74EAFF0FF137E2
alias:  pci:v13C1d1004sv*sd*bc*sc*i*
alias:  pci:v13C1d1003sv*sd*bc*sc*i*
alias:  pci:v13C1d1002sv*sd*bc*sc*i*
depends:scsi_mod
vermagic:   2.6.20-1.2320.fc5smp SMP mod_unload 686 4KSTACKS

I compiled it on a fc5 box, but that shouldn't matter.

As far as the responsiveness is concerned, the 2.6.18 kernel made
some substantial improvements to the cfq io scheduler that helped
with interactive user experience. If you are using the box as a
server though I would try the deadline or noop scheduler with
this card as you may see substantial performance improvements with
these schedulers and this card.

Try setting the io scheduler to deadline and re-run the benchmarks
to see what I mean.
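
(A minimal sketch of switching it at runtime; the device name is an
assumption, substitute your own:)

cat /sys/block/sdb/queue/scheduler          # current choice shown in brackets
echo deadline > /sys/block/sdb/queue/scheduler
cat /sys/block/sdb/queue/scheduler          # confirm [deadline] is selected

The setting is per-device and lasts until reboot; booting with
elevator=deadline on the kernel command line makes it the default.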

-Ross

__
This e-mail, and any attachments thereto, is intended only for use by
the addressee(s) named herein and may contain legally privileged
and/or confidential information. If you are not the intended recipient
of this e-mail, you are hereby notified that any dissemination,
distribution or copying of this e-mail, and any attachments thereto,
is strictly prohibited. If you have received this e-mail in error,
please immediately notify the sender and permanently delete the
original and any copy or printout thereof.

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-24 Thread Simon Banton

At 07:46 +0800 24/9/07, Feizhou wrote:
... plus an Out of Memory kill of sshd. Second time around (logged 
in on the console rather than over ssh), it's just the same except 
it's hald that happens to get clobbered instead.


Are you saying that running in RAID0 mode with this card and 
motherboard combination, you get a memory leak? Who is the culprit?


I don't know if it's caused by a memory leak or something else, I'm 
just describing what happens. I would be tempted to suspect the RAM 
itself if another identical machine didn't have exactly the same 
issue.



what's left to try?


Bug report...


I've reported the issue to 3ware but they've not responded. I 
replicated the problem with RHEL AS 4 update 5 and contacted RedHat 
but they told me evaluation subscriptions aren't supported.


I see there's a new firmware version out today (3ware codeset 
9.4.1.3...), so I guess I'll update it and push the whole thing back up 
the hill for another go.


I hope that fixes things for you.


Maybe I'm thinking about this all wrong - maybe this responsiveness 
issue won't even arise during normal operation, perhaps it's just a 
symptom of intensive benchmarking when all the resources of the 
machine are devoted to throwing data at the card/disks as fast as 
possible. I'm now way out of my depth, frankly.


I'm going to try the latest firmware upgrade, followed by RHEL/CentOS 
5, and finally see if I can replicate with a different card (Areca or 
LSI, perhaps).


Thanks for all the feedback, at least I feel as if I've tried every 
conceivable obvious thing.


S.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-24 Thread Ross S. W. Walker
Simon Banton wrote:
 
 At 07:46 +0800 24/9/07, Feizhou wrote:
 ... plus an Out of Memory kill of sshd. Second time around (logged 
 in on the console rather than over ssh), it's just the same except 
 it's hald that happens to get clobbered instead.
 
 Are you saying that running in RAID0 mode with this card and 
 motherboard combination, you get a memory leak? Who is the culprit?
 
 I don't know if it's caused by a memory leak or something else, I'm 
 just describing what happens. I would be tempted to suspect the RAM 
 itself if another identical machine didn't have exactly the same 
 issue.
 
 what's left to try?
 
 Bug report...
 
 I've reported the issue to 3ware but they've not responded. I 
 replicated the problem with RHEL AS 4 update 5 and contacted RedHat 
 but they told me evaluation subscriptions aren't supported.
 
 I see there's a new firmware version out today (3ware codeset 
 9.4.1.3... I guess I'll update it and push the whole thing back up 
 the hill for another go.
 
 I hope that fixes things for you.
 
 Maybe I'm thinking about this all wrong - maybe this responsiveness 
 issue won't even arise during normal operation, perhaps it's just a 
 symptom of intensive benchmarking when all the resources of the 
 machine are devoted to throwing data at the card/disks as fast as 
 possible. I'm now way out of my depth, frankly.
 
 I'm going to try the latest firmware upgrade, followed by RHEL/CentOS 
 5, and finally see if I can replicate with a different card (Areca or 
 LSI, perhaps).
 
 Thanks for all the feedback, at least I feel as if I've tried every 
 conceivable obvious thing.

In the end it just may be that the card cannot perform under the load
you need it to.

How about trying your benchmarks with the 'disktest' utility from the
LTP (Linux Test Project), the utility can perform benchmarks on raw
block devices and it gives very accurate benchmarks, it is also a
lot easier to setup and use.

-Ross

__
This e-mail, and any attachments thereto, is intended only for use by
the addressee(s) named herein and may contain legally privileged
and/or confidential information. If you are not the intended recipient
of this e-mail, you are hereby notified that any dissemination,
distribution or copying of this e-mail, and any attachments thereto,
is strictly prohibited. If you have received this e-mail in error,
please immediately notify the sender and permanently delete the
original and any copy or printout thereof.

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-24 Thread Simon Banton

At 10:04 -0400 24/9/07, Ross S. W. Walker wrote:

How about trying your benchmarks with the 'disktest' utility from the
LTP (Linux Test Project),


Now fetched and installed - I'd be grateful for a suggestion as to an 
appropriate disktest command line for a 4GB RAM twin CPU box with 
250GB RAID 1 array, because I think you had your tongue in your cheek 
when you said:



it is also a
lot easier to setup and use.


S.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-24 Thread Ross S. W. Walker
Simon Banton wrote:
 
 At 10:04 -0400 24/9/07, Ross S. W. Walker wrote:
 How about trying your benchmarks with the 'disktest' utility from the
 LTP (Linux Test Project),
 
 Now fetched and installed - I'd be grateful for a suggestion as to an 
 appropriate disktest command line for a 4GB RAM twin CPU box with 
 250GB RAID 1 array, because I think you had your tongue in your cheek 
 when you said:
 
 it is also a
 lot easier to setup and use.

Ok, so maybe it's easy for me... I forget that nothing is easy the first
time using it.

Ok, so here is the command I would use:

Sequential reads:
disktest -B 4k -h 1 -I BD -K 4 -p l -P T -T 300 -r /dev/sdX

Sequential writes:
disktest -B 4k -h 1 -I BD -K 4 -p l -P T -T 300 -w /dev/sdX

Random reads:
disktest -B 4k -h 1 -I BD -K 4 -p r -P T -T 300 -r /dev/sdX

Random writes:
disktest -B 4k -h 1 -I BD -K 4 -p r -P T -T 300 -w /dev/sdX


Description of the options used:
-B 4k = 4k block ios
-h 1 = 1 second heartbeat
-I BD = block device, direct io
-K 4 = 4 threads, or 4 outstanding/overlapping ios, typical pattern
(use -K 1 for the raw performance of single drive, aka dd type output)
-p l|r = io type, l=linear, r=random
-P T = output metrics type Throughput
-T 300 = duration of test 300 seconds
-r = read
-w = write

These tests will run across the whole disk/partition and the write tests
WILL BE DESTRUCTIVE so be warned!
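
(A minimal wrapper sketch to run all four workloads back to back; the
device name is an assumption and, again, the write passes are destructive:)

#!/bin/sh
DEV=/dev/sdb    # scratch device only - the write tests destroy its contents
for args in "-p l -r" "-p l -w" "-p r -r" "-p r -w"; do
    echo "=== disktest $args on $DEV ==="
    disktest -B 4k -h 1 -I BD -K 4 -P T -T 300 $args $DEV
done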

-Ross

__
This e-mail, and any attachments thereto, is intended only for use by
the addressee(s) named herein and may contain legally privileged
and/or confidential information. If you are not the intended recipient
of this e-mail, you are hereby notified that any dissemination,
distribution or copying of this e-mail, and any attachments thereto,
is strictly prohibited. If you have received this e-mail in error,
please immediately notify the sender and permanently delete the
original and any copy or printout thereof.

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-24 Thread Feizhou

Simon Banton wrote:

At 07:46 +0800 24/9/07, Feizhou wrote:
... plus an Out of Memory kill of sshd. Second time around (logged in 
on the console rather than over ssh), it's just the same except it's 
hald that happens to get clobbered instead.


Are you saying that running in RAID0 mode with this card and 
motherboard combination, you get a memory leak? Who is the culprit?


I don't know if it's caused by a memory leak or something else, I'm just 
describing what happens. I would be tempted to suspect the RAM itself if 
another identical machine didn't have exactly the same issue.


This is worth checking and reporting to 3ware.




what's left to try?


Bug report...


I've reported the issue to 3ware but they've not responded. I replicated 
the problem with RHEL AS 4 update 5 and contacted RedHat but they told 
me evaluation subscriptions aren't supported.


If you have something that is reproducible like the above, they will 
definitely be interested. Nothing like causing instability to get them 
on the case.

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-23 Thread Feizhou

Simon Banton wrote:

At 17:34 +0800 14/9/07, Feizhou wrote:

... oh, do you have a BBU for your write cache on your 3ware board?


Not installed, but the machine's on a UPS.


Ugh. The 3ware code will not give OK then until the stuff has hit disk.


Having now installed BBUs, it's made no difference to the underlying 
responsiveness problem I'm afraid.


So a 3ware card will give OK once the stuff is in the cache and you have 
selected write-cache enable even if there is no BBU? My apologies. My 
previous experience has been with the 75xx and 85xx series which do not 
have ram caches.




With ports 2 and 3 now configured as RAID 0, with an ext3 filesystem 
mounted on /mnt/raidtest, running this bonnie++ command:


bonnie++ -m RA-256_NR-8192 -n 0 -u 0 -r 4096 -s 20480 -f -b -d 
/mnt/raidtest


(RA- and NR- relate to kernel params for readahead and nr_requests 
respectively - the values above are CentOS post-installation defaults)


...causes load to climb:

16:36:12 up 13 min,  2 users,  load average: 8.77, 4.78, 1.98

... and uninterruptible processes:

 ps ax | grep D
  PID TTY      STAT   TIME COMMAND
   59 ?        D      0:03 [kswapd0]
 2159 ?        D      0:01 [kjournald]
 2923 ?        Ds     0:00 syslogd -m 0
 4155 ?        D      0:00 [pdflush]
 4175 ?        D      0:00 [pdflush]
 4192 ?        D      0:00 [pdflush]
 4193 ?        D      0:00 [pdflush]
 4197 ?        D      0:00 [pdflush]
 4199 ?        D      0:00 [pdflush]
 4201 pts/1    R+     0:00 grep D

... plus an Out of Memory kill of sshd. Second time around (logged in on 
the console rather than over ssh), it's just the same except it's hald 
that happens to get clobbered instead.


Are you saying that running in RAID0 mode with this card and motherboard 
combination, you get a memory leak? Who is the culprit?




Now that the presence or otherwise of a BBU has been ruled out along 
with OS, 3ware recommended kernel param tweaks, RAID level, LVM, slot 
speed, different but identical-spec hardware (both machine and card), 
what's left to try?


Bug report...



I see there's a new firmware version out today (3ware codeset 9.4.1.3 - 
driver's still at 2.26.05.007 but the fw's updated from 3.08.02.005 
to 3.08.02.007), so I guess I'll update it and push the whole thing back 
up the hill for another go.




I hope that fixes things for you.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-21 Thread Simon Banton

At 17:34 +0800 14/9/07, Feizhou wrote:

... oh, do you have a BBU for your write cache on your 3ware board?


Not installed, but the machine's on a UPS.


Ugh. The 3ware code will not give OK then until the stuff has hit disk.


Having now installed BBUs, it's made no difference to the underlying 
responsiveness problem I'm afraid.


With ports 2 and 3 now configured as RAID 0, with an ext3 filesystem 
mounted on /mnt/raidtest, running this bonnie++ command:


bonnie++ -m RA-256_NR-8192 -n 0 -u 0 -r 4096 -s 20480 -f -b -d /mnt/raidtest

(RA- and NR- relate to kernel params for readahead and nr_requests 
respectively - the values above are CentOS post-installation defaults)


...causes load to climb:

16:36:12 up 13 min,  2 users,  load average: 8.77, 4.78, 1.98

... and uninterruptible processes:

 ps ax | grep D
  PID TTY      STAT   TIME COMMAND
   59 ?        D      0:03 [kswapd0]
 2159 ?        D      0:01 [kjournald]
 2923 ?        Ds     0:00 syslogd -m 0
 4155 ?        D      0:00 [pdflush]
 4175 ?        D      0:00 [pdflush]
 4192 ?        D      0:00 [pdflush]
 4193 ?        D      0:00 [pdflush]
 4197 ?        D      0:00 [pdflush]
 4199 ?        D      0:00 [pdflush]
 4201 pts/1    R+     0:00 grep D

... plus an Out of Memory kill of sshd. Second time around (logged in 
on the console rather than over ssh), it's just the same except it's 
hald that happens to get clobbered instead.
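
(A hedged one-liner for watching those D-state tasks and the kernel
function each is sleeping in, useful detail for a bug report:)

ps -eo pid,stat,wchan:30,comm | awk '$2 ~ /^D/'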


Now that the presence or otherwise of a BBU has been ruled out along 
with OS, 3ware recommended kernel param tweaks, RAID level, LVM, slot 
speed, different but identical-spec hardware (both machine and card), 
what's left to try?


I see there's a new firmware version out today (3ware codeset 9.4.1.3 
- driver's still at 2.26.05.007 but the fw's updated from 
3.08.02.005 to 3.08.02.007), so I guess I'll update it and push the 
whole thing back up the hill for another go.


If there's anyone out there with a 9550SX and a two-disk RAID 1 or 
RAID 0 config on CentOS 4.5 who can give the above bonnie++ benchmark 
a go (params adjusted for their own installed RAM - I'm benchmarking 
using 5x my installed amount) and let me know if they also have the 
same responsiveness problem or not, I'd seriously appreciate it.
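
(For anyone reproducing it, a hedged sketch of the whole setup; the device
name and mount point are assumptions, and -s is scaled to 5x RAM as above:)

blockdev --setra 256 /dev/sdb                  # readahead, in 512-byte sectors
echo 8192 > /sys/block/sdb/queue/nr_requests   # request queue depth
RAM_MB=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
bonnie++ -m RA-256_NR-8192 -n 0 -u 0 -r $RAM_MB -s $((RAM_MB * 5)) -f -b -d /mnt/raidtest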


S.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-18 Thread Ross S. W. Walker
Feizhou wrote:
 
  Is there any way to tell the card to forget about not 
 having a BBU 
  and behave as if it did?
  Short of modifying the code...I do not know of any.
  Well, I've now got BBUs on order for the three identical 
 machines to 
  see if that does anything to improve matters - I'll report 
 back when 
  I've fitted them. A glance through the 2.26.05.007 driver 
 code shows 
  no references to the BBU, so the different code paths 
 (with BBU and 
  without) must be in the firmware itself.
  
  If your card is on a PCI riser try running it plugged 
 directly in the
  slot (if you can) and see if that helps.
  
 
 He said his card is directly plugged in.

Doh, that's the problem with long threads: one forgets everything
that was mentioned earlier unless one re-reads the whole
thread.

-Ross

__
This e-mail, and any attachments thereto, is intended only for use by
the addressee(s) named herein and may contain legally privileged
and/or confidential information. If you are not the intended recipient
of this e-mail, you are hereby notified that any dissemination,
distribution or copying of this e-mail, and any attachments thereto,
is strictly prohibited. If you have received this e-mail in error,
please immediately notify the sender and permanently delete the
original and any copy or printout thereof.

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-18 Thread Feizhou


Is there any way to tell the card to forget about not having a BBU 
and behave as if it did?

Short of modifying the code...I do not know of any.
Well, I've now got BBUs on order for the three identical machines to 
see if that does anything to improve matters - I'll report back when 
I've fitted them. A glance through the 2.26.05.007 driver code shows 
no references to the BBU, so the different code paths (with BBU and 
without) must be in the firmware itself.


If your card is on a PCI riser try running it plugged directly in the
slot (if you can) and see if that helps.



He said his card is directly plugged in.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-14 Thread Simon Banton
Hmm, how are you creating your ext3 filesystem(s) that you test on? 
Try creating it with a large journal (maybe 256MB) and run it in 
full journal mode.


The filesystem was created during the initial CentOS installation, 
and I've tried it with ext2 which made no difference.


S.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-14 Thread Feizhou

Simon Banton wrote:
Hmm, how are you creating your ext3 filesystem(s) that you test on? 
Try creating it with a large journal (maybe 256MB) and run it in full 
journal mode.


The filesystem was created during the initial CentOS installation, and 
I've tried it with ext2 which made no difference.




The journal size was probably 32MB then. A 128MB or larger journal in 
full journal mode on a 3ware card with a BBU write cache should make a 
big difference because fsync calls will now return OK as soon as it hits 
the write cache... oh, do you have a BBU for your write cache on your 
3ware board?
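
(A minimal sketch of that suggestion, assuming the filesystem can be
wiped and recreated and that /dev/sdb1 is the target partition:)

mke2fs -j -J size=256 /dev/sdb1                       # ext3, 256MB journal
mount -t ext3 -o data=journal /dev/sdb1 /mnt/raidtest # full journal mode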

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-14 Thread Simon Banton

At 17:34 +0800 14/9/07, Feizhou wrote:

... oh, do you have a BBU for your write cache on your 3ware board?


Not installed, but the machine's on a UPS.

I see where you're going with the larger journal idea and I'll give that a go.

Cheers
S.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-14 Thread Ross S. W. Walker
Simon Banton wrote:
 
 At 08:09 -0400 14/9/07, Jim Perrin wrote:
 Have you done any filesystem optimization and tried matching the
 filesystem to the raid chunk size?
 
 No, I haven't. This is 3ware hardware RAID-1 on two disks with a 
 single LVM ext3 / partition - I'm afraid I don't know how to go about 
 discovering the chunk size to plug into Ross's calcs.

I wouldn't worry about calculations at this point. Everything you
described is pointing to bad hardware and not software tuning.

Try getting another identical 3ware card and swapping them. If it
produces the same problem, then try putting that card in another
box with a different motherboard to see if it works then.

Once the basic card is working then we can talk about getting the
maximum performance out of it.

-Ross

__
This e-mail, and any attachments thereto, is intended only for use by
the addressee(s) named herein and may contain legally privileged
and/or confidential information. If you are not the intended recipient
of this e-mail, you are hereby notified that any dissemination,
distribution or copying of this e-mail, and any attachments thereto,
is strictly prohibited. If you have received this e-mail in error,
please immediately notify the sender and permanently delete the
original and any copy or printout thereof.

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-14 Thread Sebastian Walter
Simon Banton wrote:
 At 08:09 -0400 14/9/07, Jim Perrin wrote:
 Have you done any filesystem optimization and tried matching the
 filesystem to the raid chunk size?

 No, I haven't. This is 3ware hardware RAID-1 on two disks with a
 single LVM ext3 / partition - I'm afraid I don't know how to go about
 discovering the chunk size to plug into Ross's calcs.

You can see the chunk size either in the raid's BIOS tool (Alt-3 at
startup) or, if installed, in the 3dm CLI (defaults to 64k, I think).

-- Sebastian
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-14 Thread Simon Banton

At 15:43 +0200 14/9/07, Sebastian Walter wrote:

Simon Banton wrote:
  No, I haven't. This is 3ware hardware RAID-1 on two disks with a

 single LVM ext3 / partition - I'm afraid I don't know how to go about
 discovering the chunk size to plug into Ross's calcs.


You can see the chunk size either in the raid's BIOS tool (Alt-3 at
startup) or, if installed, in the 3dm CLI (defaults to 64k, I think).
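
(With the tw_cli utility from 3ware's codeset, something like this should
show it; the controller and unit IDs are assumptions:)

tw_cli /c0 show           # list units and ports on controller 0
tw_cli /c0/u0 show all    # unit details, including stripe size where applicable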


Hmmm, from what I can see in the tw_cli documentation, stripe size 
(and hence, presumably, chunk size) doesn't apply to RAID 1.


(apologies if the formatting goes awry):

Stripe consists of the logical unit stripe size to be used. The 
following table illustrates the supported and applicable stripes on 
unit types and controller models. Stripe size units are in K (kilobytes).


 Model | Raid0 | Raid1 | Raid5 | Raid10 | JBOD | Spare | Raid50 | Single
 ------+-------+-------+-------+--------+------+-------+--------+-------
 9K    | 16    | N/A   | 16    | 16     | N/A  | N/A   | 16     | N/A
       | 64    |       | 64    | 64     |      |       | 64     |
       | 256   |       | 256   | 256    |      |       | 256    |

I'm focused now on swapping the card for a fresh one to see if it 
makes any difference, as per Ross's suggestion.


S.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-14 Thread Simon Banton

At 09:41 -0400 14/9/07, Ross S. W. Walker wrote:

Try getting another identical 3ware card and swapping them. If it
produces the same problem, then try putting that card in another
box with a different motherboard to see if it works then.


I've got three identical machines here - two as yet not unpacked - so 
I guess I'd better start unpacking another one. Getting hold of a 
comparable class machine with a different motherboard is going to be 
tricky though.


It's going to be a busy weekend...

S.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-14 Thread Ross S. W. Walker
Simon Banton wrote:
 
 At 15:43 +0200 14/9/07, Sebastian Walter wrote:
 Simon Banton wrote:
No, I haven't. This is 3ware hardware RAID-1 on two disks with a
   single LVM ext3 / partition - I'm afraid I don't know how 
 to go about
   discovering the chunk size to plug into Ross's calcs.
 
 You can see the chunk size either in the raid's BIOS tool (Alt-3 at
 startup) or, if installed, in the 3dm CLI (defaults to 64k, I think).
 
 Hmmm, from what I can see in the tw_cli documentation, stripe size 
 (and hence, presumably, chunk size) doesn't apply to RAID 1.

Some cards do, some cards don't. I suppose the idea is by chunking the
data in a RAID 1 the logic on the card can be the same between a RAID1
and a RAID10 (just a 2-drive RAID10) and therefore can save R&D money
and NVRAM costs due to smaller firmware.

 (apologies is the formatting goes awry):
 
 Stripe consists of the logical unit stripe size to be used. The 
 following table illustrates the supported and applicable stripes on 
 unit types and controller models. Stripe size units are in K (kilo 
 bytes).
 

I'm going to guess that it is not configurable, but 3ware uses 64k
chunk size on RAID1 simply for the cost savings mentioned earlier.

It will have 0 performance impact either way.

 
 I'm focused now on swapping the card for a fresh one to see if it 
 makes any difference, as per Ross's suggestion.

Well I should have said try on another identical machine too in case
there is something off on this one, and since you have 2 others and
time to kill until a different model machine arrives...

-Ross

__
This e-mail, and any attachments thereto, is intended only for use by
the addressee(s) named herein and may contain legally privileged
and/or confidential information. If you are not the intended recipient
of this e-mail, you are hereby notified that any dissemination,
distribution or copying of this e-mail, and any attachments thereto,
is strictly prohibited. If you have received this e-mail in error,
please immediately notify the sender and permanently delete the
original and any copy or printout thereof.

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-14 Thread Ross S. W. Walker
Feizhou wrote:
 
 Simon Banton wrote:
  At 17:34 +0800 14/9/07, Feizhou wrote:
... oh, do you have a BBU for your write cache on your 3ware board?
  
  Not installed, but the machine's on a UPS.
 
 Ugh. The 3ware code will not give OK then until the stuff has 
 hit disk.
 
  
  I see where you're going with larger journal idea and I'll 
 give that a go.
 
 Well, I do not think it will help much with a larger 
 journal...you want 
 RAM speed, not single 250GB SATA disk speed.

Yes, a write-back cache with a BBU will definitely help, also your config,

 4x Seagate ST3250820SV 250GB in a RAID 1 plus 2 hot spare config

is kinda wasteful, why not create a 4 disk RAID10 and get a 5th drive for
a hot-spare.

Also think about getting 2 internal SATA drives for the OS and keep the
RAID10 as purely for data, that should make things hum nicely and to be
able to upgrade your data storage without messing with your OS/application
installation. It wouldn't cost a lot either, 2 SATA drives + 1 SAS drive.

-Ross

__
This e-mail, and any attachments thereto, is intended only for use by
the addressee(s) named herein and may contain legally privileged
and/or confidential information. If you are not the intended recipient
of this e-mail, you are hereby notified that any dissemination,
distribution or copying of this e-mail, and any attachments thereto,
is strictly prohibited. If you have received this e-mail in error,
please immediately notify the sender and permanently delete the
original and any copy or printout thereof.

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-14 Thread Simon Banton

At 23:07 +0800 14/9/07, Feizhou wrote:
Well, I do not think it will help much with a larger journal...you 
want RAM speed, not single 250GB SATA disk speed.


Right now, I'd be happy with being able to configure the 3Ware card 
as a plain old SATA II passthru interface and do software RAID1 with 
mdadm - but no, Export JBOD doesn't seem possible any more with the 
9550 (unless the units have previously been JBODs on earlier cards), 
you've got to use their 'Single Disk' config which exhibits exactly 
the same problems.
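
(For reference, the mdadm half of that plan is a one-liner; device names
are assumptions:)

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc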


S.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


RE: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-14 Thread Simon Banton

At 11:16 -0400 14/9/07, Ross S. W. Walker wrote:

Yes, a write-back cache with a BBU will definitely help, also your config,


The write-cache is enabled, but what I've not known up to now is that 
the absence of a BBU will impact IO performance in this way - which 
seems to be what you and Feizhou are saying. Is there any way to tell 
the card to forget about not having a BBU and behave as if it did?


The main problem here is the latency under IO load, not the 
throughput (or lack of it). I don't care if it can't achieve 300MB/s 
sustained write speeds, only that it shouldn't bring the machine to 
its knees in the process of getting 35MB/s.



  4x Seagate ST3250820SV 250GB in a RAID 1 plus 2 hot spare config

is kinda wasteful, why not create a 4 disk RAID10 and get a 5th drive for
a hot-spare.


Logistics meant that it was more important to be able to cope with a 
disk failure without needing to visit the hosting centre immediately 
afterwards (which we'd have to do if there was only one hot spare).



Also think about getting 2 internal SATA drives for the OS and keep the
RAID10 as purely for data, that should make things humm nicely and to be
able to upgrade your data storage without messing with your OS/application
installation. It wouldn't cost a lot either, 2 SATA drives + 1 SAS drive.


The server box is a Supermicro AS2020A - there is no onboard SATA nor 
any space for internal disks - there are 6 bays on a hot swap 
backplane and they're all cabled to the 3ware controller.


I've unpacked and fired up one of the other identical machines and 
moved the drives from the original to this one and booted straight 
off them.


The only hardware difference is that the firmware on the 
3Ware card in this one has not been updated (it's 3.04.00.005 from 
codeset 9.3.0 as opposed to 3.08.02.005 from 9.4.1.2).


# /opt/iozone/bin/iozone -s 20480m -r 64 -i 0 -i 1 -t 1

Original box:
  Initial write 34208.20703
Rewrite 38133.20313
   Read 79596.36719
Re-read 79669.22656

Newly unpacked box:
  Initial write 50230.10547
Rewrite 46108.17969
   Read 78739.14844
Re-read 79325.11719

... but the new one still shows the same IO blocking/responsiveness issue.

S.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 3Ware 9550SX and latency/system responsiveness

2007-09-13 Thread Feizhou

Simon Banton wrote:

Dear list,

I thought I'd just share my experiences with this 3Ware card, and see if 
anyone might have any suggestions.


System: Supermicro H8DA8 with 2 x Opteron 250 2.4GHz and 4GB RAM 
installed. 9550SX-8LP hosting 4x Seagate ST3250820SV 250GB in a RAID 1 
plus 2 hot spare config. The array is properly initialized, write cache 
is on, as is queueing (and supported by the drives). StoreSave set to 
Protection.


Well, the first thing I noted was that the H8DA8 was not on the list of 
compatible motherboards on the 3ware website.




My only real question is where do I go from here? I don't have enough 
specific tuning knowledge to know what else to look at.




Perhaps update to the latest firmware for both motherboard and 3ware 
board. Also check that you actually plugged the thing into a PCI-X 
64-bit 100/133 MHz slot and that it is running at those speeds. Next 
question would be whether you are using a riser board?

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos