Re: [CentOS] High load averages with latest kernel and USB drives?

2009-11-20 Thread Todd Denniston
Benjamin Smith wrote, On 11/18/2009 06:11 PM:
> On Tuesday 17 November 2009 15:37:24 Todd Denniston wrote:
>> Benjamin Smith wrote, On 11/17/2009 01:46 PM:
>>> See comments below...
>>>
>>> On Tuesday 17 November 2009 07:52:01 Todd Denniston wrote:
>>>> Benjamin Smith wrote, On 11/16/2009 10:56 PM:
>>>>> I have a 1TB USB drive plugged into a USB2 port that I use to back up
>>>>> the production drives (which are SCSI). It's working fine, but while
>>>>> doing backups (hourly) the load average on the server shoots up from
>>>>> the normal 0.5 - 1.5 or so up to a high between 10 and 30. Strangely,
>>>>> even though the "load is high" the server is completely responsive,
>>>>> even the USB drives being accessed are!
>>>>>
>>>>> Using top to diagnose, nothing seems to be particularly high! IoWait
>>>>> seems reasonable (10-30%) and CPUs are 0.5%, Idle is 70-90%. Even
>>>>> accessing the USB partition while the load is "high" is responsive!
>> you might add another field to top while you are watching, Last used cpu
>>  (SMP), i.e., start top
>> press f
>> press j
>> press enter
>>
>> this should let you see if your process is bouncing between processors.
> 
> The process pg_dump is "adhering" fine to processor 1. I see usb-storage 
> bouncing between processors - I've seen it on 3, 4, 7 over perhaps a minute. 
> What could you recommend next? 
> 

try
#2 set the usb-storage on a particular set of processors,
# Note USBSTORPID= line prototyped on CentOS 5 machine not 4.
# The [u] bracket keeps grep from matching its own command line.
USBSTORPID=`ps aux | grep '[u]sb-storage' | head -n 1 | awk '{print $2}'`
taskset -p -c 4 $USBSTORPID

and the earlier caveat still applies:
I have not had the taskset of the USB driver cause faults when used on a dual
processor Xeon, but if any of the above breaks your system you get to keep the
chunky bits. :0

so if you try it, keep an eye on it.
reversing the above taskset in your case would I _think_ be:
taskset -p -c 0-7 $USBSTORPID
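
A possible way to take the guesswork out of the revert step, sketched under assumptions: read the current affinity mask before pinning and hand the same mask back afterwards. The helper names mask_from_taskset and restore_command are made up for illustration, and the parsing assumes taskset -p reports a line ending in the hex mask, such as "pid 1234's current affinity mask: ff".

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, not part of taskset.

# Pull the hex mask out of a line such as
#   pid 1234's current affinity mask: ff
mask_from_taskset() {
    awk '/current affinity mask/ { print $NF }'
}

# Print (do not run) the command that would put PID back on MASK.
# $1 = pid, $2 = hex mask saved earlier
restore_command() {
    printf 'taskset -p %s %s\n' "$2" "$1"
}

# Possible use around the pinning step (commented out so nothing runs here):
#   OLDMASK=$(taskset -p "$USBSTORPID" | mask_from_taskset)
#   taskset -p -c 4 "$USBSTORPID"
#   restore_command "$USBSTORPID" "$OLDMASK"   # review, then run the output
```

Printing the restore command rather than running it keeps a human in the loop, which seems prudent when changing the affinity of a kernel thread.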

-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] High load averages with latest kernel and USB drives?

2009-11-18 Thread Amos Shapira
Sorry, I can't suggest much about the USB issue, but for such frequent
backups, as well as to enable point-in-time recovery (PITR), you should
consider log archiving. It should also save you heaps of load on CPU,
disk, network and the PostgreSQL server.

-Amos

On 11/17/09, Benjamin Smith wrote:
> I'm having a server report a high load average when backing up Postgres
> database files to an external USB drive. This is driving my loadbalancers
> all out of kilter and causing a large volume of network monitor alerts.
>
> I have a 1TB USB drive plugged into a USB2 port that I use to back up the
> production drives (which are SCSI). It's working fine, but while doing
> backups (hourly) the load average on the server shoots up from the normal
> 0.5 - 1.5 or so up to a high between 10 and 30. Strangely, even though the
> "load is high" the server is completely responsive, even the USB drives
> being accessed are!
>
> Backup script is really simple, run via cron, pretty much just:
>
> #! /bin/sh
> hour=`date +%k`;
> pg_dump  mydatabase > /media/backups/mydatabase.$hour.pgsql;
>
> where /media/backups is the mount point for the USB drive.
>
> Using top to diagnose, nothing seems to be particularly high! IoWait seems
> reasonable (10-30%) and CPUs are 0.5%, Idle is 70-90%. Even accessing the
> USB partition while the load is "high" is responsive!
>
> I'm guessing that something changed in how load average is counted?
>
> Server Stats:
>   Late model 8-way Xeon, SuperMicro brand.
>   CentOS 4.x  / 64 (all updates applied, booted after last kernel update)
>   Kernel 2.6.9-89.0.16.ELsmp
>   4 GB ECC RAM
>   300 GB SCSI HDD.
>   Standard Apache/PHP, Postgres 8.4.
>
> Any idea how to revert to the old load average tracking behavior short of
> using a stale and potentially insecure kernel?
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.


Re: [CentOS] High load averages with latest kernel and USB drives?

2009-11-18 Thread Benjamin Smith
On Tuesday 17 November 2009 15:37:24 Todd Denniston wrote:
> Benjamin Smith wrote, On 11/17/2009 01:46 PM:
> > See comments below...
> >
> > On Tuesday 17 November 2009 07:52:01 Todd Denniston wrote:
> >> Benjamin Smith wrote, On 11/16/2009 10:56 PM:
> >>> I have a 1TB USB drive plugged into a USB2 port that I use to back up
> >>> the production drives (which are SCSI). It's working fine, but while
> >>> doing backups (hourly) the load average on the server shoots up from
> >>> the normal 0.5 - 1.5 or so up to a high between 10 and 30. Strangely,
> >>> even though the "load is high" the server is completely responsive,
> >>> even the USB drives being accessed are!
> >>>
> >>> Using top to diagnose, nothing seems to be particularly high! IoWait
> >>> seems reasonable (10-30%) and CPUs are 0.5%, Idle is 70-90%. Even
> >>> accessing the USB partition while the load is "high" is responsive!
> 
> you might add another field to top while you are watching, Last used cpu
>  (SMP), i.e., start top
> press f
> press j
> press enter
> 
> this should let you see if your process is bouncing between processors.

The process pg_dump is "adhering" fine to processor 1. I see usb-storage 
bouncing between processors - I've seen it on 3, 4, 7 over perhaps a minute. 
What could you recommend next? 



Re: [CentOS] High load averages with latest kernel and USB drives?

2009-11-17 Thread Todd Denniston
Benjamin Smith wrote, On 11/17/2009 01:46 PM:
> See comments below... 
> 
> On Tuesday 17 November 2009 07:52:01 Todd Denniston wrote:
>> Benjamin Smith wrote, On 11/16/2009 10:56 PM:
>>> I have a 1TB USB drive plugged into a USB2 port that I use to back up the
>>> production drives (which are SCSI). It's working fine, but while doing
>>> backups (hourly) the load average on the server shoots up from the normal
>>> 0.5 - 1.5 or so up to a high between 10 and 30. Strangely, even though
>>> the "load is high" the server is completely responsive, even the USB
>>> drives being accessed are!
>>>
>>> Using top to diagnose, nothing seems to be particularly high! IoWait
>>> seems reasonable (10-30%) and CPUs are 0.5%, Idle is 70-90%. Even
>>> accessing the USB partition while the load is "high" is responsive!
>>>
> 

You might add another field to top while you are watching: "Last used cpu
(SMP)", i.e.,
start top
press f
press j
press enter

this should let you see if your process is bouncing between processors.
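
If watching top by eye is awkward, the same information can be sampled from ps, which exposes the processor a task last ran on as the psr column. The following is only a sketch: sample_cpu and distinct_cpus are hypothetical helper names, and the sample count and interval are arbitrary.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, not part of procps.

# Print the processor PID last ran on, once per interval.
# $1 = pid, $2 = number of samples, $3 = seconds between samples
sample_cpu() {
    i=0
    while [ "$i" -lt "$2" ]; do
        ps -o psr= -p "$1"
        sleep "$3"
        i=$((i + 1))
    done
}

# Read one CPU number per line and print the distinct values,
# sorted numerically, on a single space-separated line.
distinct_cpus() {
    awk 'NF { print $1 + 0 }' | sort -n -u | paste -s -d ' ' -
}

# Possible use (commented out so nothing runs here):
#   sample_cpu "$USBSTORPID" 60 1 | distinct_cpus
```

A single number in the output would suggest the task is staying put; several numbers would suggest it is moving between processors during the sampling window.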

>> As workarounds perhaps asking the kernel to schedule in a specific way
>>  might help, i.e.: #1 set the backup on a particular set of processors,
>> #  replace the pg_dump line above with
>> taskset -c 3-4 pg_dump  mydatabase > \
>>  /media/backups/mydatabase.$hour.pgsql;
> 
> There are 8 cores on the machine, none of which are reporting more than 5% 
> load. That's what has me perplexed. When I run top, I see a max of about 30% 
> user. Everything else is zero. When I run the backup script to a non-USB 
> drive, the load average is completely normal (below 0.50, often below 0.10) 

USB chewing up more CPU than normal disks has been my experience all along,
this just seems a little extreme.

> 
>> #2 set the usb-storage on a particular set of processors,
>> # Note USBSTORPID= line prototyped on CentOS 5 machine not 4.
>> # The [u] bracket keeps grep from matching its own command line.
>> USBSTORPID=`ps aux | grep '[u]sb-storage' | head -n 1 | awk '{print $2}'`
>> taskset -p -c 3-4 $USBSTORPID
>> #you might even go back and reduce the processor list
>> #to just 3 or 4 instead of both.
> 
> Could you explain to me what this should accomplish? I'm curious as to why
> you went this route...

Even though the process is not using much processor time, having it bounce
around between processors can:
* thrash the cache of each processor as it goes there
* waste time context switching in the next processor
* bounce other processes around and cascade the same effects as they go along

I know that there has been some scheduler work over time to make these
switches less likely, but I have also seen some good effects from locking
certain processes onto a processor instead of letting them float.  Usually the
best processes to do this to are ones that use large amounts of memory, like X
or Firefox, which are large enough that they thoroughly toss anything else out
of a processor's cache.


-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter


Re: [CentOS] High load averages with latest kernel and USB drives?

2009-11-17 Thread Benjamin Smith
See comments below... 

On Tuesday 17 November 2009 07:52:01 Todd Denniston wrote:
> Benjamin Smith wrote, On 11/16/2009 10:56 PM:
> > I have a 1TB USB drive plugged into a USB2 port that I use to back up the
> > production drives (which are SCSI). It's working fine, but while doing
> > backups (hourly) the load average on the server shoots up from the normal
> > 0.5 - 1.5 or so up to a high between 10 and 30. Strangely, even though
> > the "load is high" the server is completely responsive, even the USB
> > drives being accessed are!
> >
> > Backup script is really simple, run via cron, pretty much just:
> >
> > #! /bin/sh
> > hour=`date +%k`;
> > pg_dump  mydatabase > /media/backups/mydatabase.$hour.pgsql;
> >
> > where /media/backups is the mount point for the USB drive.
> >
> > Using top to diagnose, nothing seems to be particularly high! IoWait
> > seems reasonable (10-30%) and CPUs are 0.5%, Idle is 70-90%. Even
> > accessing the USB partition while the load is "high" is responsive!
> >
> > I'm guessing that something changed in how load average is counted?
> >
> > Server Stats:
> > Late model 8-way Xeon, SuperMicro brand.
> > CentOS 4.x  / 64 (all updates applied, booted after last kernel update)
> > Kernel 2.6.9-89.0.16.ELsmp
> > 4 GB ECC RAM
> > 300 GB SCSI HDD.
> > Standard Apache/PHP, Postgres 8.4.
> >
> > Any idea how to revert to the old load average tracking behavior short of
> > using a stale and potentially insecure kernel?



> Are you saying that when you were running a previous kernel the same
>  operations with the same devices did not have the high load? 

Correct! 

>  Which
>  specific kernels worked as desired (if someone is going to bisect the
>  problem they need a start point)?

kernel-smp-devel-2.6.9-89.0.15.EL  (I always keep my machines updated on at
least a weekly schedule)

> Are there other processes on the machine that are waiting to use the db
>  while the dump is occurring? 

No. Database is actually on a different machine and backups are being done over 
the network. 

>  How many postgres processes are waiting for
>  the dump to finish (it has been a while since I ran postgres so I don't
>  recall how it deals with queries during a dump)?

One - the one performing the backup. Postgres uses MVCC so pg_dump doesn't 
block any other connections from continuing/finishing. 

> As workarounds perhaps asking the kernel to schedule in a specific way
>  might help, i.e.: #1 set the backup on a particular set of processors,
> #  replace the pg_dump line above with
> taskset -c 3-4 pg_dump  mydatabase > \
>   /media/backups/mydatabase.$hour.pgsql;

There are 8 cores on the machine, none of which are reporting more than 5% 
load. That's what has me perplexed. When I run top, I see a max of about 30% 
user. Everything else is zero. When I run the backup script to a non-USB 
drive, the load average is completely normal (below 0.50, often below 0.10) 

> #2 set the usb-storage on a particular set of processors,
> # Note USBSTORPID= line prototyped on CentOS 5 machine not 4.
> # The [u] bracket keeps grep from matching its own command line.
> USBSTORPID=`ps aux | grep '[u]sb-storage' | head -n 1 | awk '{print $2}'`
> taskset -p -c 3-4 $USBSTORPID
> #you might even go back and reduce the processor list
> #to just 3 or 4 instead of both.

Could you explain to me what this should accomplish? I'm curious as to why you 
went this route... 

> #3 don't update atime
> # (should at worst be a minor thing, and you say that
> # the usb mounted file system is responsive,
> # but perhaps it would help some.)
> mount -oremount,noatime /media/backups/

Already mounted noatime... here's the mount line in the backup script: 
# mount -o rw,noatime -t ext3 /dev/sdc1 /home/backup/localdb/




Re: [CentOS] High load averages with latest kernel and USB drives?

2009-11-17 Thread Todd Denniston
Benjamin Smith wrote, On 11/16/2009 10:56 PM:
> I have a 1TB USB drive plugged into a USB2 port that I use to back up the
> production drives (which are SCSI). It's working fine, but while doing
> backups (hourly) the load average on the server shoots up from the normal
> 0.5 - 1.5 or so up to a high between 10 and 30. Strangely, even though the
> "load is high" the server is completely responsive, even the USB drives
> being accessed are!
> 
> Backup script is really simple, run via cron, pretty much just: 
> 
> #! /bin/sh 
> hour=`date +%k`;
> pg_dump  mydatabase > /media/backups/mydatabase.$hour.pgsql; 
> 
> where /media/backups is the mount point for the USB drive. 
> 
> Using top to diagnose, nothing seems to be particularly high! IoWait seems 
> reasonable (10-30%) and CPUs are 0.5%, Idle is 70-90%. Even accessing the USB 
> partition while the load is "high" is responsive! 
> 
> I'm guessing that something changed in how load average is counted?
> 
> Server Stats: 
>   Late model 8-way Xeon, SuperMicro brand. 
>   CentOS 4.x  / 64 (all updates applied, booted after last kernel update) 
>   Kernel 2.6.9-89.0.16.ELsmp
>   4 GB ECC RAM
>   300 GB SCSI HDD. 
>   Standard Apache/PHP, Postgres 8.4. 
> 
> Any idea how to revert to the old load average tracking behavior short of 
> using a stale and potentially insecure kernel? 
> 

Note, although I have a couple of ideas, I am answering/questioning more out of
curiosity than experience. Salt appropriately.

Are you saying that when you were running a previous kernel the same
operations with the same devices did not have the high load?  Which specific
kernels worked as desired (if someone is going to bisect the problem they need
a start point)?

Are there other processes on the machine that are waiting to use the db while
the dump is occurring?
How many postgres processes are waiting for the dump to finish (it has been a
while since I ran postgres so I don't recall how it deals with queries during
a dump)?

As workarounds, perhaps asking the kernel to schedule in a specific way might
help, i.e.:
#1 set the backup on a particular set of processors,
#  replace the pg_dump line above with
taskset -c 3-4 pg_dump  mydatabase > \
/media/backups/mydatabase.$hour.pgsql;

#2 set the usb-storage on a particular set of processors,
# Note USBSTORPID= line prototyped on CentOS 5 machine not 4.
# The [u] bracket keeps grep from matching its own command line.
USBSTORPID=`ps aux | grep '[u]sb-storage' | head -n 1 | awk '{print $2}'`
taskset -p -c 3-4 $USBSTORPID
#you might even go back and reduce the processor list
#to just 3 or 4 instead of both.

#3 don't update atime
# (should at worst be a minor thing, and you say that
# the usb mounted file system is responsive,
# but perhaps it would help some.)
mount -oremount,noatime /media/backups/
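
Before remounting, it may be worth checking whether the option is already in effect. This is a sketch under assumptions: mount_has_option is a hypothetical helper that reads lines in the /proc/mounts layout (device, mount point, type, comma-separated options, dump, pass), and the mount point must be given without a trailing slash, as it appears in that file.

```shell
#!/bin/sh
# Sketch only: mount_has_option is a hypothetical helper.
# Reads /proc/mounts-style lines on stdin and exits 0 if the mount
# point in $1 lists the option in $2, otherwise exits 1.
mount_has_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == opt) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Possible use (commented out so nothing runs here):
#   if mount_has_option /media/backups noatime < /proc/mounts; then
#       echo "already mounted noatime"
#   fi
```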

I have not had the taskset of the USB driver cause faults when used on a dual
processor Xeon, but if any of the above breaks your system you get to keep the
chunky bits. :0
-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter


[CentOS] High load averages with latest kernel and USB drives?

2009-11-16 Thread Benjamin Smith
I'm having a server report a high load average when backing up Postgres 
database files to an external USB drive. This is driving my loadbalancers all 
out of kilter and causing a large volume of network monitor alerts. 

I have a 1TB USB drive plugged into a USB2 port that I use to back up the 
production drives (which are SCSI). It's working fine, but while doing backups 
(hourly) the load average on the server shoots up from the normal 0.5 - 1.5 or 
so up to a high between 10 and 30. Strangely, even though the "load is high" 
the server is completely responsive, even the USB drives being accessed are! 

Backup script is really simple, run via cron, pretty much just: 

#! /bin/sh 
hour=`date +%k`;
pg_dump  mydatabase > /media/backups/mydatabase.$hour.pgsql; 

where /media/backups is the mount point for the USB drive. 
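
A side note on the script above, unrelated to the load question: with GNU date, %k pads single-digit hours with a space, so the expanded file name contains a space for hours 0 through 9, which an unquoted redirect may not handle the way one expects. A minimal sketch of an alternative uses the zero-padded %H and quotes the path. The helper name backup_path is hypothetical; the directory and database name are the ones from the script.

```shell
#!/bin/sh
# Sketch only: backup_path is a hypothetical helper, not from the thread.
# $1 = directory, $2 = database name, $3 = hour (e.g. from: date +%H)
backup_path() {
    printf '%s/%s.%s.pgsql\n' "$1" "$2" "$3"
}

# Possible use in the cron script (commented out so nothing runs here):
#   out=$(backup_path /media/backups mydatabase "$(date +%H)")
#   pg_dump mydatabase > "$out"
```

Switching from %k to %H changes the names of the files for the early hours, so any existing rotation or cleanup that expects the old names would need the same change.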

Using top to diagnose, nothing seems to be particularly high! IoWait seems 
reasonable (10-30%) and CPUs are 0.5%, Idle is 70-90%. Even accessing the USB 
partition while the load is "high" is responsive! 

I'm guessing that something changed in how load average is counted?

Server Stats: 
Late model 8-way Xeon, SuperMicro brand. 
CentOS 4.x  / 64 (all updates applied, booted after last kernel update) 
Kernel 2.6.9-89.0.16.ELsmp
4 GB ECC RAM
300 GB SCSI HDD. 
Standard Apache/PHP, Postgres 8.4. 

Any idea how to revert to the old load average tracking behavior short of 
using a stale and potentially insecure kernel? 
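
One thing that may be worth checking, offered as a possibility rather than a diagnosis of what changed between kernels: the Linux load average counts tasks in uninterruptible sleep (state D, typically waiting on I/O) along with runnable ones, so a load of 10 to 30 with mostly idle CPUs would be consistent with a number of tasks blocked on the USB device. The sketch below tallies both states from ps output; count_states is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: count_states is a hypothetical helper.
# Reads process state codes, one per line (as printed by: ps -eo state=),
# and prints how many are runnable (R) and how many are in
# uninterruptible sleep (D).
count_states() {
    awk '{ s = substr($1, 1, 1) }
         s == "R" { r++ }
         s == "D" { d++ }
         END { printf "R=%d D=%d\n", r, d }'
}

# Possible use while a backup is running (commented out so nothing runs here):
#   ps -eo state= | count_states
#   cat /proc/loadavg
```

If the D count tracks the load figure during a backup, listing those tasks by name would show which ones are stuck waiting.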
