Re: I/O performance issues on 2.4.23 SMP system

2004-02-03 Thread Theodore Knab
> >>I was the poster who initiated the previous thread on this subject.  The
> >>problem disappeared here after we went down to 2 GB of memory (although
> >>we physically removed it from the server rather than passing the arg to
> >>the kernel... shouldn't make a difference though, I'd imagine).  We went
> >>straight from 4 GB to 2 GB, so I can't comment on the results of using 3
> >>GB.

The above comment sounds a lot like a bounce buffer issue. This is not really an
IO issue.

Bounce buffer issues look a lot like IO problems on the surface. However, the IO
bus gets swamped by the extra copying when too much memory is feeding it. Bounce
buffer issues can occur any time you use more than 2GB of RAM on a 32-bit system.
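As a mental model (a toy sketch, not kernel code), a bounce buffer is just an extra copy step: when a device can only address "low" memory, data living above that limit must first be staged in a low buffer before the IO can happen. The page counts below are made up for illustration:

```python
# Toy model of bounce buffering. A "device" that can only address
# memory below LOW_LIMIT must copy higher pages into a low bounce
# buffer before doing IO; that extra copy is the hidden cost.

LOW_LIMIT = 4  # pretend pages 0-3 are the DMA-addressable "low" zone


def write_page(page_addr, data, low_mem):
    """Write `data` living at `page_addr`; return how many extra copies
    were needed. Pages at or above LOW_LIMIT must bounce through low
    memory first."""
    copies = 0
    if page_addr >= LOW_LIMIT:
        bounce_addr = 0          # assume a free low page is available
        low_mem[bounce_addr] = data
        copies += 1              # the bounce-buffer copy
    return copies


low_mem = {}
print(write_page(2, b"abc", low_mem))   # low page: 0 extra copies
print(write_page(10, b"abc", low_mem))  # high page: 1 extra copy
```

The more of your RAM sits above the device's addressing limit, the larger the fraction of IO that pays this copy, which is why the symptom scales with installed memory.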

I have a Dual SMP Xeon 700 (32 bit) with 10GB of RAM in it. 
It is under a 10-20% CPU load daily.

Originally, I had a bounce buffer problem that occurred during backups and heavy
IO loads. The output from sar (the system activity reporter) told me that
process switch rates were not recovering after backups; IO loads would
'snowball' after backups.

Generally, the whole system seemed to get overwhelmed and unstable after a 
heavy 
IO event, like a backup. I found this strange.

I fixed the problem with the 00_block-highmem-all-18b-3 patch:
http://www.kernel.org/pub/linux/kernel/people/andrea

Since the patch was applied, the server has been running very stably for over
43 days.

For example, the following sar output shows a normal recovery after a heavy IO
event (columns: cpu %usr %sys %nice %idle pswch/s runq nrproc lavg1 lavg5
lavg15; the last three fields of each "all" line are the load averages):

22:30:01  all83 089 1302172  0.35  0.33  0.35
074 090
183 088

-> backup started  #rsync of a 100GB RAID 5 array

23:40:01  all3   14 18261731166  1.44  1.46  1.52
00:10:01  all3   14 18256793166  1.62  1.56  1.53
03   13 183
13   15 181
00:20:01  all4   14 18260683156  1.45  1.46  1.46
03   14 182
14   14 181
00:30:01  all2   13 18355855161  1.10  1.16  1.29
03   13 184
12   14 183
00:40:01  all38 18831912146  0.12  0.63  1.01
038 188
138 188
00:50:01  all33 095  863139  0.15  0.23  0.60

-> sync finished

If your sar output does not look like this after a backup, and you have more
than 2GB of RAM, something is probably going wrong with bounce buffers. You can
fix it in two ways: upgrade to a 64-bit machine, or patch your kernel with the
block-highmem patch written by Andrea.

My Kernel: 2.4.18

image=/boot/vmlinuz-2.4.18
#Compiled using GCC-2.95 on new IMAP server
#Debian 2.4.18 Kernel package
#Debian 2.4.18 xfs kernel patch
#block-highmemory patch from 
http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/
#00_block-highmem-all-18b-3
#HIGHMEM Kernel Support to 64GB
#HIGHMEM IO Support added
label=LinuxHIMEM
read-only

My Hardware:
00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (rev 21)
00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
00:00.2 Host bridge: ServerWorks: Unknown device 0006
00:00.3 Host bridge: ServerWorks: Unknown device 0006
00:01.0 SCSI storage controller: Adaptec 7896
00:01.1 SCSI storage controller: Adaptec 7896
00:05.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet LANCE] 
(rev 44)
00:06.0 VGA compatible controller: S3 Inc. Trio 64 3D (rev 01)
00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 4f)
00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 04)
01:01.0 RAID bus controller: IBM Netfinity ServeRAID controller
01:02.0 RAID bus controller: IBM Netfinity ServeRAID controller
02:06.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 0c)


On 03/02/04 13:25 -0600, Benjamin Sherman wrote:
> Thanks to all who sent comments on this. I did some more testing and 
> went straight to the source for input.
> 
> 
> if you want to try the 4G patch then i'd suggest Andrew Morton's -mm 
> tree, which has it included:
> 
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.2-rc2/2.6.2-rc2-mm2/
> 
> i've got a 2.4 backport too, included in RHEL3. (the SRPM is
> downloadable.) But extracting the patch from this srpm will likely not
> apply to a vanilla 2.4 tree - there are lots of other patches as well 
> and interdependencies. So i'd suggest the RHEL3 kernel as-is, or the -mm 
> tree in 2.6.
> 
> Ingo
> 
> 
> Of course, as newer kernels are released, Andrew releases newer -mm 
> patches. This patch set solved the I/O problem and let me use 4GB RAM.
> 
> 
> 
> Mark Ferlatte wrote


Re: I/O performance issues on 2.4.23 SMP system

2004-02-03 Thread Benjamin Sherman
Thanks to all who sent comments on this. I did some more testing and 
went straight to the source for input.


if you want to try the 4G patch then i'd suggest Andrew Morton's -mm 
tree, which has it included:

http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.2-rc2/2.6.2-rc2-mm2/
i've got a 2.4 backport too, included in RHEL3. (the SRPM is
downloadable.) But extracting the patch from this srpm will likely not
apply to a vanilla 2.4 tree - there are lots of other patches as well 
and interdependencies. So i'd suggest the RHEL3 kernel as-is, or the -mm 
tree in 2.6.

Ingo

Of course, as newer kernels are released, Andrew releases newer -mm 
patches. This patch set solved the I/O problem and let me use 4GB RAM.


Mark Ferlatte wrote:
Daniel Erat said on Thu, Jan 29, 2004 at 08:08:49AM -0800:
I was the poster who initiated the previous thread on this subject.  The
problem disappeared here after we went down to 2 GB of memory (although
we physically removed it from the server rather than passing the arg to
the kernel... shouldn't make a difference though, I'd imagine).  We went
straight from 4 GB to 2 GB, so I can't comment on the results of using 3
GB.
Our problem didn't seem to directly correspond with the 1 GB threshold
-- it wouldn't manifest itself until the server had allocated all 4 GB
of RAM.  After a reboot, it would be nice and speedy again for a day or
two until all the memory was being used for buffering again.

This was the behavior I saw as well.  I did a bunch of research and source
reading before actually figuring out what was going on; it wasn't a well
documented bug for some reason... I guess there aren't that many people running
large boxes using 2.4.
This makes me think that the problems I saw with 2GB were not related to the IO
subsystem, but were something else.  Time to go play around a bit; getting
those boxes up to 2GB without having to do a kernel patch/upgrade cycle would
be nice.
M
--
Benjamin Sherman
Software Developer
Iowa Interactive, Inc
515-323-3468 x14
[EMAIL PROTECTED]





Re: I/O performance issues on 2.4.23 SMP system

2004-01-30 Thread Russell Coker
On Fri, 30 Jan 2004 01:02, Jeff S Wheeler <[EMAIL PROTECTED]> wrote:
> I don't know anything about thos 2.4.23 I/O problem, but I will tell you
> that RAID 5 is not the way to go for big SQL performance. In a RAID 5
> array, all the heads must move for every operation. You already spent a
> lot of money on that server. I suggest you buy more disks for RAID 10.

Any decent RAID-5 implementation will have a non-volatile write-back cache.  
This will hugely increase performance as it allows the possibility of 
combining writes.  NB  This is something that Linux software RAID lacks 
support for.

Moving all heads is not required for every operation. Reading from all disks
is not required for a read unless an entire stripe is to be brought in; the
last time I did read benchmarks, it seemed that this wasn't being done on Mylex
RAID controllers or Sun Metadisk (I've never done any real tests on Linux
software RAID-5).

Reading from all disks is not necessarily required for a one-block write 
either.  Reading the block that is to be written and the parity block is 
enough.  New parity block will be old_block ^ old_parity ^ new_block.  Doing 
reads from and writes to two disks should be significantly faster than reads 
from all disks and writes to two.
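Russell's parity identity is easy to check in a few lines. This sketch (illustrative byte-wise XOR, not a real RAID implementation; the block values are arbitrary) shows that updating one block via old_block ^ old_parity ^ new_block yields the same parity as recomputing across every disk:

```python
from functools import reduce


def xor_blocks(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(blocks):
    """Recompute parity by XORing every data block (reads all disks)."""
    return reduce(xor_blocks, blocks)


# A 4-disk stripe: three data blocks plus one parity block.
blocks = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
old_parity = full_parity(blocks)

# Read-modify-write of block 1: only the old block and old parity
# are read, per Russell's formula.
new_block = b"\x55\x66"
new_parity = xor_blocks(xor_blocks(blocks[1], old_parity), new_block)

blocks[1] = new_block
assert new_parity == full_parity(blocks)  # agrees with full recompute
print(new_parity.hex())  # fedf
```

So a one-block write touches two disks for reads and two for writes, regardless of how wide the array is.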

The benchmark results Craig Sanders posted when comparing RAID-5 and RAID-10 
were surprising, RAID-5 won many of the test scenarios!  I recall that Craig 
posted the results to this list, a google search should return them.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page







Re: I/O performance issues on 2.4.23 SMP system

2004-01-29 Thread Mark Ferlatte
Daniel Erat said on Thu, Jan 29, 2004 at 08:08:49AM -0800:
> I was the poster who initiated the previous thread on this subject.  The
> problem disappeared here after we went down to 2 GB of memory (although
> we physically removed it from the server rather than passing the arg to
> the kernel... shouldn't make a difference though, I'd imagine).  We went
> straight from 4 GB to 2 GB, so I can't comment on the results of using 3
> GB.
> 
> Our problem didn't seem to directly correspond with the 1 GB threshold
> -- it wouldn't manifest itself until the server had allocated all 4 GB
> of RAM.  After a reboot, it would be nice and speedy again for a day or
> two until all the memory was being used for buffering again.

This was the behavior I saw as well.  I did a bunch of research and source
reading before actually figuring out what was going on; it wasn't a well
documented bug for some reason... I guess there aren't that many people running
large boxes using 2.4.

This makes me think that the problems I saw with 2GB were not related to the IO
subsystem, but were something else.  Time to go play around a bit; getting
those boxes up to 2GB without having to do a kernel patch/upgrade cycle would
be nice.

M




Re: I/O performance issues on 2.4.23 SMP system

2004-01-29 Thread Daniel Erat
On Wed, Jan 28, 2004 at 01:38:29PM -0800, Mark Ferlatte wrote:
[snip]
> The problem (bug) is that block device IO has to go through buffers
> that are below 1GB.  The memory manager doesn't know this, so what
> happens is that the IO layer requests a block of memory below 1GB, and
> the swapout daemon (kswapd) then runs around like a madman trying to
> free pages, instead of shuffling pages that don't need to be below 1GB
> to higher memory addresses.  Since many of the pages below 1GB can't
> be freed (they belong to active programs), the IO starves.
> 
> With 1GB of memory, both the IO layer and the swapout daemon are
> working with the same view of memory, so the bug is concealed, and
> performance is good.
> 
> I have heard of people trying 2GB, and having it work, but it didn't
> for me.

I was the poster who initiated the previous thread on this subject.  The
problem disappeared here after we went down to 2 GB of memory (although
we physically removed it from the server rather than passing the arg to
the kernel... shouldn't make a difference though, I'd imagine).  We went
straight from 4 GB to 2 GB, so I can't comment on the results of using 3
GB.

Our problem didn't seem to directly correspond with the 1 GB threshold
-- it wouldn't manifest itself until the server had allocated all 4 GB
of RAM.  After a reboot, it would be nice and speedy again for a day or
two until all the memory was being used for buffering again.

Dan








Re: I/O performance issues on 2.4.23 SMP system

2004-01-29 Thread Benjamin Sherman
The problem (bug) is that block device IO has to go through buffers that are
below 1GB.  The memory manager doesn't know this, so what happens is that the
IO layer requests a block of memory below 1GB, and the swapout daemon (kswapd)
then runs around like a madman trying to free pages, instead of shuffling pages
that don't need to be below 1GB to higher memory addresses.  Since many of the
pages below 1GB can't be freed (they belong to active programs), the IO
starves.
With 1GB of memory, both the IO layer and the swapout daemon are working with
the same view of memory, so the bug is concealed, and performance is good.
I have heard of people trying 2GB, and having it work, but it didn't for me.
Right, I have seen a 2GB success story.
Do you know if this is fixed in kernel 2.6.x?
--
Benjamin Sherman
Software Developer
Iowa Interactive, Inc
515-323-3468 x14
[EMAIL PROTECTED]




Re: I/O performance issues on 2.4.23 SMP system

2004-01-29 Thread Benjamin Sherman
 Is this problem specific to the 3ware cards? Does anyone know of any
issues with the Highpoint 1640 SATA RAID cards?

 Any experience or recommendations with these?
No, this issue is not specific to 3ware cards. The original poster had a
QLogic fibre channel card and Adaptec SCSI.

--
Benjamin Sherman
Software Developer
Iowa Interactive, Inc
515-323-3468 x14
[EMAIL PROTECTED]






Re: I/O performance issues on 2.4.23 SMP system

2004-01-29 Thread Jeff S Wheeler
On Tue, 2004-01-27 at 16:49, Benjamin Sherman wrote:
> I have a server running dual 2.66Ghz Xeons and 4GB RAM, in a 
> PenguinComputing Relion 230S system. It has a 3ware RAID card with 3 
> 120GB SATA drives in RAID5. It is currently running Debian 3.0 w/ 
> vanilla kernel 2.4.23, HIGHMEM4G=y, HIGHIO=y, SMP=y, ACPI=y. I see the 
> problem with ACPI and HT turned off OR if I leave them on.

I don't know anything about this 2.4.23 I/O problem, but I will tell you
that RAID 5 is not the way to go for big SQL performance. In a RAID 5
array, all the heads must move for every operation. You already spent a
lot of money on that server. I suggest you buy more disks for RAID 10.

--
Jeff







Re: I/O performance issues on 2.4.23 SMP system

2004-01-28 Thread Jose Alberto Guzman
Mark Ferlatte wrote:
Benjamin Sherman said on Wed, Jan 28, 2004 at 03:16:56PM -0600:
 

I've got some machines in nearly the same configuration.  What I ended up
doing was to put an `append="mem=1G"' in the lilo.conf boot stanza for the
kernel I was using, and rebooted the machine in question.
This does reduce the available memory in the machine to 1GB, but solves the
IO problem.  In my case, it was much faster, even though MySQL couldn't
buffer nearly as much as with 4GB.
Thanks, Mark. I will probably try this with 3GB instead of 1GB. Did you try
that?

Yes; it didn't work.
The problem (bug) is that block device IO has to go through buffers that are
below 1GB.  The memory manager doesn't know this, so what happens is that the
IO layer requests a block of memory below 1GB, and the swapout daemon (kswapd)
then runs around like a madman trying to free pages, instead of shuffling pages
that don't need to be below 1GB to higher memory addresses.  Since many of the
pages below 1GB can't be freed (they belong to active programs), the IO
starves.
With 1GB of memory, both the IO layer and the swapout daemon are working with
the same view of memory, so the bug is concealed, and performance is good.
I have heard of people trying 2GB, and having it work, but it didn't for me.
M

 Is this problem specific to the 3ware cards? does anyone know of any 
issues with the Highpoint 1640 SATA RAID cards?

 Any experience or recommendations with these?
 Which is the best SATA raid card for linux at the moment?
 Thanks
 José
PS.
please reply to the list.





Re: I/O performance issues on 2.4.23 SMP system

2004-01-28 Thread Mark Ferlatte
Benjamin Sherman said on Wed, Jan 28, 2004 at 03:16:56PM -0600:
 
> >I've got some machines in nearly the same configuration.  What I ended up
> >doing was to put an `append="mem=1G"' in the lilo.conf boot stanza for the
> >kernel I was using, and rebooted the machine in question.
> >
> >This does reduce the available memory in the machine to 1GB, but solves the
> >IO problem.  In my case, it was much faster, even though MySQL couldn't
> >buffer nearly as much as with 4GB.
> Thanks, Mark. I will probably try this with 3GB instead of 1GB. Did you try
> that?

Yes; it didn't work.

The problem (bug) is that block device IO has to go through buffers that are
below 1GB.  The memory manager doesn't know this, so what happens is that the
IO layer requests a block of memory below 1GB, and the swapout daemon (kswapd)
then runs around like a madman trying to free pages, instead of shuffling pages
that don't need to be below 1GB to higher memory addresses.  Since many of the
pages below 1GB can't be freed (they belong to active programs), the IO
starves.
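
As a rough sketch of where the "below 1GB" figure comes from: on i386 the kernel direct-maps only part of its 1GB share of the 32-bit address space, with the top slice reserved for vmalloc/ioremap (the exact split depends on kernel configuration, so treat these numbers as approximate):

```python
# Approximate arithmetic for the i386 low-memory ceiling that block IO
# buffers must fall under (the real split is set by the kernel config).
KERNEL_VA_MB = 1024       # kernel's share of the 4GB 32-bit address space
VMALLOC_RESERVE_MB = 128  # top slice reserved for vmalloc/ioremap mappings
low_mem_mb = KERNEL_VA_MB - VMALLOC_RESERVE_MB  # directly mapped "low" memory
print(low_mem_mb)  # 896
```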

With 1GB of memory, both the IO layer and the swapout daemon are working with
the same view of memory, so the bug is concealed, and performance is good.

I have heard of people trying 2GB, and having it work, but it didn't for me.

M




Re: I/O performance issues on 2.4.23 SMP system

2004-01-28 Thread Benjamin Sherman
> > * Is the I/O patch referenced (by Ingo Molnar) available for 2.4.24?
> Possibly; it's certainly not merged into 2.4.24.

Can anyone point me to the specific patch?

> I've got some machines in nearly the same configuration.  What I ended up doing
> was to put an `append="mem=1G"' in the lilo.conf boot stanza for the kernel I
> was using, and rebooted the machine in question.
>
> This does reduce the available memory in the machine to 1GB, but solves the IO
> problem.  In my case, it was much faster, even though MySQL couldn't buffer
> nearly as much as with 4GB.

Thanks, Mark. I will probably try this with 3GB instead of 1GB. Did you 
try that?

--
Benjamin Sherman
Software Developer
Iowa Interactive, Inc
515-323-3468 x14
[EMAIL PROTECTED]




Re: I/O performance issues on 2.4.23 SMP system

2004-01-28 Thread Mark Ferlatte
Benjamin Sherman said on Tue, Jan 27, 2004 at 03:49:24PM -0600:
> So, I have a couple of questions because this box made it to production 
> before the problem was discovered and I can't test as I'd like.
> * If I were to use 64GB HIGHMEM support. Would this problem go away?

Nope.

> * Is the I/O patch referenced (by Ingo Molnar) available for 2.4.24?

Possibly; it's certainly not merged into 2.4.24.

> * Is the patch available individually, if so, where can it be found? I 
> googled quite a bit, but didn't find anything definite.
> 
> Any thoughts or suggestions?

I've got some machines in nearly the same configuration.  What I ended up doing
was to put an `append="mem=1G"' in the lilo.conf boot stanza for the kernel I
was using, and rebooted the machine in question.
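
A minimal lilo.conf stanza along those lines might look like this (the image, label, and root paths here are hypothetical; adjust them to the machine):

```
# hypothetical paths -- substitute your own kernel image and root device
image=/boot/vmlinuz-2.4.23
        label=linux
        root=/dev/sda1
        read-only
        append="mem=1G"
```

Remember to re-run /sbin/lilo after editing the file so the change takes effect on the next boot.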

This does reduce the available memory in the machine to 1GB, but solves the IO
problem.  In my case, it was much faster, even though MySQL couldn't buffer
nearly as much as with 4GB.

M







I/O performance issues on 2.4.23 SMP system

2004-01-27 Thread Benjamin Sherman
I am following up a message sent to this list:
# Subject: severe I/O performance issues on 2.4.22 SMP system
# From: Daniel Erat <[EMAIL PROTECTED]>
# Date: Fri, 31 Oct 2003 12:38:38 -0800
I have a server running dual 2.66GHz Xeons and 4GB RAM, in a 
Penguin Computing Relion 230S system. It has a 3ware RAID card with 3 
120GB SATA drives in RAID5. It is currently running Debian 3.0 w/ 
vanilla kernel 2.4.23, HIGHMEM4G=y, HIGHIO=y, SMP=y, ACPI=y. I see the 
problem whether ACPI and HT are turned off or left on.
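
For reference, those options correspond to roughly this .config fragment (option names as in 2.4-era kernels; a sketch, not the actual file from this machine):

```
CONFIG_SMP=y
CONFIG_HIGHMEM4G=y
CONFIG_HIGHMEM=y
CONFIG_HIGHIO=y
CONFIG_ACPI=y
```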

I think my problem is perhaps the same as Mr. Erat's.
Basically, I/O on this box sucks. A good example of the problem is 
importing identical data into MySQL on two machines. On this box, importing a 
dataset takes roughly 20 minutes. On another dev server (single Athlon 
2GHz, 1GB RAM, software RAID5 over FireWire), with identical MySQL and 
dataset, the same import takes roughly 4.5 minutes.

So, I have a couple of questions because this box made it to production 
before the problem was discovered and I can't test as I'd like.
* If I were to use 64GB HIGHMEM support, would this problem go away?
* Is the I/O patch referenced (by Ingo Molnar) available for 2.4.24?
  OR is the patch going to be in the kernel anytime soon?
* Is the patch available individually, if so, where can it be found? I 
googled quite a bit, but didn't find anything definite.

Any thoughts or suggestions?
Thanks!
--
Benjamin Sherman
Iowa Interactive, Inc
[EMAIL PROTECTED]



