Re: [CentOS] disk I/O problems and Solutions

2009-10-09 Thread Martin Suehowicz
RAID10 should do better on your writes. Random reads and writes are what
matter most for a DB, and for random I/O, the more spindles (disks) you
have the better. I would use SAS over SATA if possible. How big is your
database? If it is small, you may be able to put it on a few solid-state
drives; they're really good for random I/O. I don't know of a website
with those figures, but one would be nice.
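
As a rough illustration of why RAID10 and more spindles help, the usual
back-of-the-envelope estimate divides the raw random IOPS of the array by
a penalty weighted by the read/write mix. The sketch below is only an
estimate. The per-disk figures (about 175 IOPS for a 15K drive, about 75
for a 7200 RPM drive), the write penalties (4 for RAID5, 2 for RAID10) and
the 50/50 read/write mix are common rules of thumb assumed here, not
measurements from this thread, and controller cache is ignored.

```python
# Rule-of-thumb estimate of random IOPS for a RAID array.
# Every constant here is an assumption for illustration, not a measurement.

RAID_WRITE_PENALTY = {"raid5": 4, "raid10": 2}        # back-end I/Os per write
ASSUMED_DISK_IOPS = {"15k_sas": 175, "7200_sata": 75}  # per-spindle guesses


def estimated_iops(disks, disk_type, raid_level, read_fraction=0.5):
    """Estimate the random IOPS an array can sustain for a given read mix."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw = disks * ASSUMED_DISK_IOPS[disk_type]
    penalty = RAID_WRITE_PENALTY[raid_level]
    # Reads cost one back-end I/O each, writes cost `penalty` of them.
    return raw / (read_fraction + (1.0 - read_fraction) * penalty)


if __name__ == "__main__":
    layouts = [
        ("3 x 15K SAS, RAID5 (current)", 3, "15k_sas", "raid5"),
        ("4 x 15K SAS, RAID10", 4, "15k_sas", "raid10"),
        ("12 x 7200 SATA, RAID10", 12, "7200_sata", "raid10"),
        ("12 x 15K SAS, RAID10", 12, "15k_sas", "raid10"),
    ]
    for label, disks, disk_type, level in layouts:
        print("%-30s ~%.0f IOPS" % (label, estimated_iops(disks, disk_type, level)))
```

Under these assumptions even twelve slow SATA drives in RAID10 come out
ahead of a three-drive RAID5, but only a benchmark on the real hardware
and workload can confirm that.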

-Original Message-
From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On
Behalf Of Alan McKay
Sent: Friday, October 09, 2009 9:45 AM
To: Alan McKay
Subject: [CentOS] disk I/O problems and Solutions

Hey folks,

CentOS / PostgreSQL shop over here.

I'm hitting 3 of my favorite lists with this, so here's hoping that
the BCC trick is the right way to do it :-)

We've just discovered, thanks to a new Munin plugin
http://blogs.amd.co.at/robe/2008/12/graphing-linux-disk-io-statistics-with-munin.html
that our production DB is completely maxing out on I/O for about a
3-hour stretch from 6am until 9am.
This is "device utilization" as per the last graph at the above link.

Load went down for a while but is now between 70% and 95% sustained.
We've only had this plugin going for less than a day so I don't really
have any more data going back further.  But we've suspected a disk
issue for some time - just have not been able to prove it.
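
For a cross-check that does not depend on the plugin, device utilization
can be derived from two samples of /proc/diskstats: the 13th field on a
whole-disk line is the cumulative number of milliseconds the device spent
doing I/O, which is the counter iostat's %util is based on. The following
is a minimal sketch; the function names and the 5 second interval are
arbitrary choices made for illustration.

```python
import time

IO_TICKS_INDEX = 12  # 13th whitespace-separated field: ms spent doing I/O


def read_io_ticks(diskstats_text, device):
    """Return the cumulative milliseconds `device` has spent doing I/O."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Skip lines too short to carry the counter (e.g. old partition lines).
        if len(fields) > IO_TICKS_INDEX and fields[2] == device:
            return int(fields[IO_TICKS_INDEX])
    raise KeyError("device not found in diskstats: %s" % device)


def utilization_percent(before_text, after_text, device, interval_seconds):
    """Percentage of the interval during which the device was busy."""
    busy_ms = read_io_ticks(after_text, device) - read_io_ticks(before_text, device)
    return 100.0 * busy_ms / (interval_seconds * 1000.0)


def sample_utilization(device, interval_seconds=5.0, path="/proc/diskstats"):
    """Take two snapshots `interval_seconds` apart and compare them."""
    with open(path) as handle:
        before = handle.read()
    time.sleep(interval_seconds)
    with open(path) as handle:
        after = handle.read()
    return utilization_percent(before, after, device, interval_seconds)
```

Calling sample_utilization("sdb") on the database server during the
6am to 9am window would give a figure comparable to the graph.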

Our system
IBM 3650 - quad 2GHz E5405 Xeon
8K SAS RAID Controller
6 x 300G 15K RPM SAS Drives
/dev/sda - 2 drives configured as a RAID 1 for 300G for the OS
/dev/sdb - 3 drives configured as RAID5 for 600G for the DB
1 drive as a global hot spare

/dev/sdb is the one that is maxing out.

We need to have a very serious look at fixing this situation.   But we
don't have the money to be experimenting with solutions that won't
solve our problem.  And our budget is fairly limited.

Is there a public library somewhere of disk subsystems and their
performance figures?  Done with some semblance of a standard
benchmark?

One benchmark I am partial to is this one :
http://wiki.postgresql.org/wiki/PgCon_2009/Greg_Smith_Hardware_Benchmarking_notes#dd_test

One thing I am thinking of in the immediate term is taking the RAID5 +
hot spare and converting it to RAID10 with the same amount of storage.
Will that perform much better?

In general we are planning to move away from RAID5 toward RAID10.

We also have on order an external IBM array (don't have the exact name
on hand but model number was 3000) with 12 drive bays.  We ordered it
with just 4 x SATAII drives, and were going to put it on a different
system as a RAID10.  These are just 7200 RPM drives - the goal was
cheaper storage because the SAS drives are about twice as much per
drive, and it is only a 300G drive versus the 1T SATA2 drives.   IIRC
the SATA2 drives are about $200 each and the SAS 300G drives about
$500 each.

So I have 2 thoughts with this 12 disk array.   1 is to fill it up
with 12 x cheap SATA2 drives and hope that even though the spin-rate
is a lot slower, that the fact that it has more drives will make it
perform better.  But somehow I am doubtful about that.   The other
thought is to bite the bullet and fill it up with 300G SAS drives.

Any thoughts here?  Recommendations on what to do with a tight budget?
It could be that the answer is that I just have to go back to the bean
counters and tell them we have no choice but to start spending some
real money.  But on what?  And how do I prove that this is the only
choice?


-- 
"Don't eat anything you've ever seen advertised on TV"
 - Michael Pollan, author of "In Defense of Food"
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] syslog: CPU stuck for 10s!

2009-03-17 Thread Martin Suehowicz
Try upgrading to the latest kernel. 

-Original Message-
From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On
Behalf Of tblader
Sent: Tuesday, March 17, 2009 9:05 AM
To: CentOS mailing list
Subject: [CentOS] syslog: CPU stuck for 10s!

Hi All,
I have a CentOS 5 box serving NFSv3 shares from an LSI MegaRAID card.
The box has been up and down for about a week and I am trying to figure
out what's wrong. I found a syslog message today about "APIC error on CPU"
and after rebooting with noapic, I now get this:

  # cat /var/log/kernel | grep BUG
  Mar 17 09:51:05 ofdmz kernel: BUG: soft lockup - CPU#0 stuck for 10s! [migration/0:2]
  Mar 17 09:52:21 ofdmz kernel: BUG: soft lockup - CPU#0 stuck for 10s! [ssh:3491]

Anyone know what this means?  I found a thread* from 2006 on this
list that mentions updating the BIOS, but thought I would get
a message out early in case this doesn't fix it.
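
To see which CPUs and processes turn up in these messages over a longer
period, the log can be tallied with a few lines of Python. This is a
sketch that assumes the message format shown above and nothing more; the
function name is made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "BUG: soft lockup - CPU#0 stuck for 10s! [migration/0:2]".
# \s* tolerates a line break between the message and the [process:pid] part.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!\s*\[([^:\]]+):(\d+)\]"
)


def count_soft_lockups(log_text):
    """Count soft lockup reports per (cpu, process name) pair."""
    counts = Counter()
    for cpu, _seconds, process, _pid in LOCKUP_RE.findall(log_text):
        counts[(int(cpu), process)] += 1
    return counts


if __name__ == "__main__":
    with open("/var/log/kernel") as handle:  # log path as used in the post
        for (cpu, process), hits in count_soft_lockups(handle.read()).most_common():
            print("CPU#%d %-20s %d" % (cpu, process, hits))
```

If every report names the same CPU or the same driver thread, that is
worth mentioning when following up on the list.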

Thanks

[*] - http://lists.centos.org/pipermail/centos/2006-June/023933.html


[CentOS] Looking for a list of default services to disable in centos 5

2009-03-25 Thread Martin Suehowicz
I am looking for a list of services that you disable by default on your
server. 

Here is what I am disabling so far.

avahi-daemon 
bluetooth 
cups 
firstboot 
haldaemon 
hidd 
hplip 
ip6tables 
isdn 
messagebus 
pcscd 
rpcgssd 
rpcidmapd 
sendmail 
xfs 
xinetd 
yum-updatesd 
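
A list like this is easy to turn into commands. The sketch below builds
the chkconfig and service invocations for each name and only prints them
unless asked to run them. It also emits a kickstart `services` line; that
directive and its `--disabled=` form are an assumption to verify against
the kickstart documentation for your release. The function names are
illustrative only.

```python
import subprocess

SERVICES_TO_DISABLE = [
    "avahi-daemon", "bluetooth", "cups", "firstboot", "haldaemon", "hidd",
    "hplip", "ip6tables", "isdn", "messagebus", "pcscd", "rpcgssd",
    "rpcidmapd", "sendmail", "xfs", "xinetd", "yum-updatesd",
]


def chkconfig_commands(services):
    """Return the commands that disable each service at boot and stop it now."""
    commands = []
    for name in services:
        commands.append(["/sbin/chkconfig", name, "off"])
        commands.append(["/sbin/service", name, "stop"])
    return commands


def kickstart_services_line(services):
    """Kickstart equivalent (assumed syntax; check your installer's docs)."""
    return "services --disabled=" + ",".join(services)


def disable_services(services, apply=False):
    """Print each command; run it too only when apply=True (needs root)."""
    for command in chkconfig_commands(services):
        print(" ".join(command))
        if apply:
            subprocess.call(command)


if __name__ == "__main__":
    disable_services(SERVICES_TO_DISABLE)  # dry run by default
    print(kickstart_services_line(SERVICES_TO_DISABLE))
```

Keeping the dry run as the default makes it safe to review the output
before anything is switched off.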

Thanks for any input you provide!
Martin





Re: [CentOS] Looking for a list of default services to disable in centos 5

2009-03-25 Thread Martin Suehowicz
My question was aimed at a minimal install that I could start from, bare
bones: just what you need to run the OS. I would use it as the base for
the rest of my kickstarts, adding the services needed for web servers,
databases, etc. I do see the usefulness in it. For example, you can
pretty much say that nobody building a server needs Bluetooth and that
most people are going to want syslog running. Thanks for the input! I do
see your point about looking at my servers.
Martin

-Original Message-
From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On
Behalf Of Spiro Harvey
Sent: Wednesday, March 25, 2009 1:40 PM
To: centos@centos.org
Subject: Re: [CentOS] Looking for a list of default services to disable
in centos 5

> I am looking for a list of services that you disable by default on 
> your server.

what kind of server? smtp server? pop/imap server? proxy server? web
server? ftp server? logging server? voip gateway? firewall? rpm build
box? swipe card reader server? development/source repo server? LDAP,
NFS? 

or are you looking for a set of things that we disable by default on all
servers? At which point I question your choice of removing sendmail
(unless you're replacing it with something like exim or postfix) because
most servers need to send mail, even if it's just to alert you when a
cron job has barfed.

personally I disable, or don't install, SELinux, NetworkManager (with
extreme prejudice), anything to do with wireless/bluetooth, and X on
every single server.

From there it depends on what the server is doing.

We've got a Kickstart server and boot off USB sticks and CDs that allow
us to pick generic build types off a menu (eg; web server, smtp server,
mail storage server, etc). The kickstart config just pulls down the
packages we want, a few scripts get run doing various things like
updating all packages, setting up our distributed config system,
installing custom packages, and so on. 

However, I don't see the usefulness in seeing what other people disable.
Everybody has different networks, different requirements, and does
different things on their boxes. What you should be doing is looking at
*your* servers and itemising what they do. Then remove all packages that
are not needed to provide those services.

-- 
Spiro Harvey  Knossos Networks Ltd
021-295-1923  www.knossos.net.nz



Re: [CentOS] Two sets of Heartbeat HTTPD clusters on same subnet

2009-04-01 Thread Martin Suehowicz
I think there is a FAQ somewhere on the Heartbeat website that recommends
using multicast and two different ports. Perhaps you could just use two
different UDP ports, though. Have a look at the Linux-HA FAQ.
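
As a sketch of what "two different ports" could look like, the snippet
below generates the communication part of /etc/ha.d/ha.cf for each
cluster. The `udpport`, `bcast`, `mcast` and `node` directives are
standard ha.cf settings, but the interface name, the port numbers and the
multicast group addresses are placeholders assumed for illustration, so
check them against the ha.cf shipped with your Heartbeat version.

```python
def ha_cf_fragment(node_names, interface, udp_port, mcast_group=None):
    """Build the communication lines of ha.cf for one cluster.

    With mcast_group=None the cluster broadcasts on `interface`;
    otherwise it uses the given multicast group (TTL 1, loopback off).
    """
    lines = ["udpport %d" % udp_port]
    if mcast_group is None:
        lines.append("bcast %s" % interface)
    else:
        # mcast <dev> <group> <port> <ttl> <loop>
        lines.append("mcast %s %s %d 1 0" % (interface, mcast_group, udp_port))
    lines.extend("node %s" % name for name in node_names)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values: eth0, ports 694/695 and the 225.0.0.x groups
    # are assumptions, not settings taken from this thread.
    print("# etk cluster\n" + ha_cf_fragment(["etk-1", "etk-2"], "eth0", 694, "225.0.0.1"))
    print("# radu cluster\n" + ha_cf_fragment(["radu-1", "radu-2"], "eth0", 695, "225.0.0.2"))
```

Each cluster then only sees traffic on its own port (and group), so the
"bad node" errors from the other cluster's packets should stop. Each
cluster should also have its own secret in authkeys.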

-Original Message-
From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On
Behalf Of Fabian Arrotin
Sent: Wednesday, April 01, 2009 10:41 AM
To: CentOS mailing list
Subject: Re: [CentOS] Two sets of Heartbeat HTTPD clusters on same
subnet

Devraj Mukherjee wrote:
> Hi all,
> 
> I am new to Heartbeat so please be kind :) I also posted this on
> Linux-HA lists with no responses so I posted it here.
> 
> I have successfully configured two machines to use heartbeat to cluster
> httpd. The two nodes are called etk-1 and etk-2. I am trying to
> configure another two machines to act as a separate cluster (on the
> same IP subnet). These two nodes are called radu-1 and radu-2.
> 
> Obviously being a broadcast protocol radu-1 and radu-2 get these
> messages from etk-1 and I can't seem to get radu-1 and radu-2 to
> cluster (most probably because they are not getting the messages
> from the right nodes).
> 
> Should I just change the name of the test, if I do that I get heaps of
> WARNING log messages.
> 
> Is it possible to have two sets of clusters in the one IP subnet?
> 
> If yes what do I have to change so these clusters don't send messages
> to the wrong nodes.
> 
> heartbeat[3745]: 2009/03/30_04:48:02 ERROR: process_status_message:
> bad node [etk-1] in message
> heartbeat[3745]: 2009/03/30_04:48:02 ERROR: MSG: Dumping message with 10 fields
> heartbeat[3745]: 2009/03/30_04:48:02 ERROR: MSG[0] : [t=NS_ackmsg]
> heartbeat[3745]: 2009/03/30_04:48:02 ERROR: MSG[1] : [dest= etk-2]
> heartbeat[3745]: 2009/03/30_04:48:02 ERROR: MSG[2] : [ackseq=1a9601]
> heartbeat[3745]: 2009/03/30_04:48:02 ERROR: MSG[3] :
> [(1)destuuid=0xdf38de8(37 28)]
> heartbeat[3745]: 2009/03/30_04:48:02 ERROR: MSG[4] : [src= etk-1]
> heartbeat[3745]: 2009/03/30_04:48:02 ERROR: MSG[5] :
> [(1)srcuuid=0xdf39248(36 27)]
> heartbeat[3745]: 2009/03/30_04:48:02 ERROR: MSG[6] : [hg=499a2a65]
> heartbeat[3745]: 2009/03/30_04:48:02 ERROR: MSG[7] : [ts=49cfb452]
> heartbeat[3745]: 2009/03/30_04:48:02 ERROR: MSG[8] : [ttl=3]
> 
> 

I suppose your /etc/ha.d/authkeys files are configured correctly (and
not configured to use the same 'secret' for both clusters).
You can change the port used by the second cluster or, even better
(what I usually do), broadcast the heartbeat signal in a separate VLAN.
That way you don't broadcast onto the production network, which is more
efficient.


-- 
--
Fabian Arrotin
  idea=`grep -i clue /dev/brain`
  test -z "$idea" && echo "sorry, init 6 in progress" || sh ./answer.sh