Re: [CentOS] Two sets of Heartbeat HTTPD clusters on same subnet

2009-04-02 Thread J Potter

 I have successfully configured two machines to use heartbeat to cluster
 httpd. The two nodes are called etk-1 and etk-2. I am trying to
 configure another two machines to act as a separate cluster (on the
 same IP subnet). These two nodes are called radu-1 and radu-2.

We successfully do this with many pairs of HA nodes in the same  
subnet, using different UDP ports...

Under /etc/ha.d/ha.cf:
udpport someNumber

Use a different authkey for each pair so as to avoid accidental snafus  
with mixing up nodes from different pairs.
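For example (a sketch only -- the port numbers and key string here are
made-up placeholders), on both etk nodes:

udpport 9001    # in /etc/ha.d/ha.cf
echo 'auth 1' >> /etc/ha.d/authkeys
echo '1 sha1 etk-pair-secret' >> /etc/ha.d/authkeys

and the same on both radu nodes, but with, say, udpport 9002 and a
different key.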

-Jeff
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] [OT] Network switches

2009-03-26 Thread J Potter

 look at HP Procurves. That is what I use.
 You can get 2524's quite cheap on ebay.

We used these for years; they were great, and super cheap on eBay.  
HP support was fantastic as well. The 26xx series allows for light  
layer 3 routing, so you may want to snag the 2626 or 2650 instead of  
the 25xx series. I believe HP has end-of-lifed these switches,  
though, so firmware updates for security bugs, etc. will, from what I  
understand, cease in a few years.

We upgraded to some Dell PowerConnect 6248s in the past year, so that  
we could use VRRP for (routing-enabled) switch failover. As with all  
Dell things, hammer them on the price and you can get it ~30% cheaper  
than listed.

-Jeff

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] general protection rip?

2009-03-06 Thread J Potter

Hi List,

On one of our CentOS 5 (x86_64) servers, identical to a number of  
other systems, I'm seeing some processes / services failing to run,  
along with the following error in /var/log/messages:

Mar  2 23:25:07 someHostname kernel: wrapper-linux-x[24448] general  
protection rip:805386e rsp:ffc20390 error:0
Mar  2 23:25:09 someHostname kernel: dsm_sa_datamgr3[5063] general  
protection rip:f7f86e7c rsp:f5aab30c error:0
Mar  3 09:57:54 someHostname kernel: dcecfg32[1993] general  
protection rip:b9c71a rsp:ffa20568 error:0
Mar 6 09:38:59 someHostname kernel: omreport[15107] general  
protection rip:af597a rsp:ffdccad0 error:0

I've Googled a bit, but don't see any clear, consistent explanation.  
Does anyone have any pointers as to what is causing this? Faulty  
hardware? I ran memtester, but it didn't find any issue with the  
memory it was able to test.
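For reference, the memtester invocation was roughly (the size and pass
count here are just illustrative):

memtester 1024M 3    # lock and test 1 GB of RAM, 3 passes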

Thanks,
Jeff
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] clustering and load balancing Apache

2009-02-11 Thread J Potter

Look at pound: http://www.apsis.ch/pound/

If you are concerned about traffic volume, you might consider running  
squid as a transparent proxy in front of pound. I.e.:

request -> squid -> pound -> apache

Where squid will return the response for everything marked as  
cacheable and still fresh; and pound will take care of load balancing  
to apache. (Pound can inspect/insert cookies to send visitors to the  
same back-end node on subsequent requests.) On some of our setups,  
squid responds to 98% of the requests coming in and can sustain an  
extremely high request volume. Other list  
users might be able to provide good stats as to what sort of volume  
they can support. (I'd be curious to hear what others have seen...)
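To make the pound side concrete, a minimal sketch (the backend
addresses, port and cookie name below are made-up placeholders --
adjust to your own pound.cfg layout):

cat > /etc/pound.cfg <<'EOF'
ListenHTTP
    Address 0.0.0.0
    Port    8080
    Service
        BackEnd
            Address 10.0.0.11
            Port    80
        End
        BackEnd
            Address 10.0.0.12
            Port    80
        End
        # stick a visitor to the same backend by tracking a session cookie
        Session
            Type COOKIE
            ID   "SESSIONID"
            TTL  300
        End
    End
End
EOF

Squid would then listen on port 80 and forward cache misses to pound
on 8080.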

For HA:
- 2 instances of squid, active/standby or active/active (i.e. two IP  
addresses in DNS for the public hostname, with each squid instance  
picking up the other's address on failure).
- 2 instances of pound, active/standby
- N instances of apache

Re: replication of content on your apache nodes, another poster  
suggested drbd. From my understanding, that is not possible, since  
only one node can mount the drbd volume at a time. If  
you have shared data that needs to be seen across apache nodes, either  
stick it in SQL or mount an NFS volume across the nodes. (But then you  
have NFS in the picture, which might not be so good.)

If your apache code is constant, then have a master apache node and  
write a shell script that runs rsync to push code changes out to the  
other instances.
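A minimal sketch of such a push script (hostnames and paths are
made-up placeholders):

#!/bin/sh
# run on the master apache node: push the docroot to the other nodes
for host in web2.example.com web3.example.com; do
    rsync -az --delete /var/www/html/ "$host":/var/www/html/
done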

It's hard to get very specific about what's best for your setup  
without knowing the specifics of things like the data sync needs on the  
apache nodes, so take all of this with a grain of salt -- or as a  
default starting place.

best,
Jeff
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Problems with mysql multi-master after update.

2009-02-10 Thread J Potter

For what it's worth, I haven't seen this on any systems I manage when  
going from 5.0.22 to 5.0.45, which include permutations of master-slave  
and master-master.

Is there anything useful in /var/log/mysqld.log?


 after I updated from mysql-5.0.22 in CentOS 5.0 to mysql-5.0.45 in
 CentOS 5.2, mysql loses master-slave sync after one node reboots.
 I've noticed that the slave does not respect the information in
 master.info; instead, it tries to read the information from the
 master server file inc-index.index.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] squid HA failover?

2009-02-06 Thread J Potter

Assuming server A with IP M, server B with IP N, and DNS entry X  
currently pointing at IP M:

1) Add heartbeat on servers A and B, with heartbeat managing a new IP  
address O (this is your virtual IP -- nothing to do with VRRP, that's  
for your routers to failover your gateway).
2) If you want active-active load sharing on servers A and B, install  
pound on both servers A and B, and in your pound config, point pound  
to IP M and IP N (same pound config on both servers).
3) Update your DNS to point entry X to IP O (the virtual IP).

If you want active-standby on your squids, then have both squids bind  
to 0.0.0.0 and you're done. The standby server will have squid  
listening for requests, but since the standby won't have the VIP O, it'll  
just sit there. In this setup, heartbeat is only managing the VIP, but  
no services.

If you want active-active on your squids, then have squid on server A  
bind to only IP M, squid on server B bind to only IP N, and pound  
configured to bind to only IP O. Heartbeat will need to be configured  
to start pound on failover (since IP O will only exist on one box at a  
time, so pound can't bind to it unless the interface is up).
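In heartbeat terms that's just one extra resource on the haresources
line, something like (hostname and VIP are placeholders):

a.example.com IPaddr::1.2.3.4 pound

i.e. heartbeat brings up the VIP and then runs the pound init script
on whichever node is active.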

Make sure you test the case where squid (or pound in active-active) on  
the server running VIP O crashes.

-Jeff
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] squid HA failover?

2009-02-06 Thread J Potter

 Yes, I normally want one server handling the full load to maximize the
 cache hits.  But the other one should be up and running.

So, active/standby. Easier config. Squid won't even be aware that  
heartbeat is running; just keep it running on both servers all the time.

See my install notes at bottom.


 And, because
 this is already in production as a mid-level cache working behind a
 loadbalancer, I would like to be able to keep answering on the M or N
 addresses, gradually reconfiguring the web servers in the farm to use
 address O instead of the loadbalancer VIP address.

Go for it. It'll work fine. You could get fancy and switch the  
primary interface from M to O and make M the VIP. It depends on  
whether you can accept the ~30 seconds of downtime and on your  
tolerance for risk.


 I got the impression
 from some of the docs/tutorials that it was a bad idea to access the  
 M/N
 addresses directly.

In your case, it's only bad if M/N go down.


 Or does that only apply to services where it is
 important to only have one instance alive at a time or where you are
 replicating data?

Depends on the service and replication setup. If you had master/slave  
MySQL and connected to the slave, you'd see amnesia on the master,  
since writes to the slave would never make it back. (That setup  
wouldn't allow for fail-back, though, so you'd probably never see it.)  
Things like drbd protect you from concurrent mounting.


 Even after converting all of the farm to use the new
 address, I'll still want to be able to monitor the backup server to be
 sure it is still healthy.

Yup. And you'll want to monitor the active node and force a failover  
if the service fails. My config below doesn't take this into  
consideration; maybe other list lurkers can improve on it. The quick  
and dirty fix is for each node to check whether it is active and, if  
it is, whether squid is still running; if squid is not, run 'service  
heartbeat restart' to fail over to the other node (i.e. a  
once-a-minute cron job). Not as pretty as it should be.
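Something along these lines (the VIP and service name are placeholders,
and this is deliberately quick-and-dirty):

#!/bin/sh
# run from cron once a minute on both nodes:
# if this node holds the VIP but squid has died, bounce heartbeat
# so the VIP moves to the peer.
if /sbin/ip addr show | grep -q '1\.2\.3\.4'; then
    if ! /sbin/service squid status >/dev/null 2>&1; then
        /sbin/service heartbeat restart
    fi
fi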

best,
Jeff


Replace 1.2.3.4 with your VIP IP address, and a.example.com and  
b.example.com with your FQDN hostnames.

server A (a.example.com):
yum -y install heartbeat
chkconfig --add heartbeat
chkconfig --level 345 heartbeat on

echo 'a.example.com IPaddr::1.2.3.4' >> /etc/ha.d/haresources
echo node a.example.com >> /etc/ha.d/ha.cf
echo node b.example.com >> /etc/ha.d/ha.cf
echo udpport 9000 >> /etc/ha.d/ha.cf
echo bcast bond0 >> /etc/ha.d/ha.cf
echo auto_failback off >> /etc/ha.d/ha.cf
echo logfile /var/log/ha-log >> /etc/ha.d/ha.cf
echo logfacility local0 >> /etc/ha.d/ha.cf
echo auth 1 >> /etc/ha.d/authkeys
echo 1 crc >> /etc/ha.d/authkeys
chmod go-rwx /etc/ha.d/authkeys

server B (b.example.com):
yum -y install heartbeat
chkconfig --add heartbeat
chkconfig --level 345 heartbeat on

echo 'a.example.com IPaddr::1.2.3.4' >> /etc/ha.d/haresources
# yes, a again - that's the default host to run the service
echo node a.example.com >> /etc/ha.d/ha.cf
echo node b.example.com >> /etc/ha.d/ha.cf
echo udpport 9000 >> /etc/ha.d/ha.cf
echo bcast bond0 >> /etc/ha.d/ha.cf
echo auto_failback off >> /etc/ha.d/ha.cf
echo logfile /var/log/ha-log >> /etc/ha.d/ha.cf
echo logfacility local0 >> /etc/ha.d/ha.cf
echo auth 1 >> /etc/ha.d/authkeys
echo 1 crc >> /etc/ha.d/authkeys
chmod go-rwx /etc/ha.d/authkeys

# This assumes:
# 1) your network is bond0, not eth0
# 2) you are on a private network where you don't care about security;
#    otherwise see http://www.linux-ha.org/authkeys
# Make sure udpport isn't in use by any other instances; or, use mcast.

On server A:
service heartbeat start
# Then, check your log files (/var/log/ha-log and /var/log/messages).
# Ping the virtual IP.

On server B:
service heartbeat start
# check your log files

On server A:
service heartbeat restart

On server B:
ifconfig -a
# Check whether the interface is now running on server B.

You can monitor the current active node with arp -- the MAC address  
will switch to match the physical interface that the VIP is running on.
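E.g. from a third box on the same subnet (the VIP is a placeholder):

ping -c1 1.2.3.4 >/dev/null; arp -n 1.2.3.4
# the HWaddress shown should match the NIC of whichever node
# currently holds the VIP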

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Virtual NICs (aliases like eth0:1) won't come up after reboot

2008-11-14 Thread J Potter


I found under CentOS 4 a few years ago that the OS would only bring up  
virtual interfaces starting with 0:0 and increasing sequentially -- if  
there was a gap, it would stop at that gap point.


I.e.:
Good: eth0, eth0:0, eth0:1, eth0:2...
Bad: eth0, eth0:0, eth0:2  (system would only start 0 and 0:0)
Bad: eth0, eth0:1, eth0:2  (system would only start 0)

This last case looks like yours -- in your config, after eth0, you  
started with eth0:1.
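If that's the case, a minimal fix is to renumber the alias configs so
they start at :0 (filenames below assume the standard CentOS layout):

cd /etc/sysconfig/network-scripts
mv ifcfg-eth0:1 ifcfg-eth0:0
mv ifcfg-eth0:2 ifcfg-eth0:1
# also change the DEVICE= line inside each file to match its new name
service network restart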


-Jeff

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] count of active tcp sockets?

2008-04-16 Thread J Potter


Hi List,

Is there an easy way to get a count of the number of active socket  
connections, or even better, number of socket connections in the  
time_wait state? (Something lightweight... under /proc/sys/net/ipv4/?  
I'd like to avoid the impact of listing out all the connections a la  
netstat.)


Thanks!
-Jeff
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] count of active tcp sockets?

2008-04-16 Thread J Potter



netstat -an|grep TIME_WAIT|wc  ?


I need to avoid anything that lists out all the connections -- the  
above would take too long if there are tens of thousands of connections.


I'm hoping there's a proc entry that has a summary count of the  
current number of connections?


-Jeff

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] Newer MySQL in centos-plus?

2008-03-10 Thread J. Potter



Hi List,

I'm noticing that the CentOS-plus repo for 4.6 has MySQL 5.0.54 in it,  
but the CentOS 5.1 repo does not have a newer rpm, leaving the  
newest easily-available version as the vendor-provided mysql 5.0.22.


Is there a reason for this? We're wanting to try a newer MySQL under  
CentOS 5.1 -- do any of the standard repos include such an RPM?


Thanks!
-Jeff
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] Is there a fix for the sqlite cache needs updating error?

2008-02-19 Thread J. Potter


I've been seeing the below message from yum whenever the repo has an  
update (CentOS 5):



/etc/cron.daily/yum.cron:

** Message: sqlite cache needs updating, reading in metadata


Googling a bit, it looks like others have seen this happen as well.  
The solutions, when I've found them, have been along the lines of  
"send the output to /dev/null" or "edit this file", but nothing that  
feels like the right fix.


Is there a general solution to this? Or should we just sit tight for  
redhat bug #429689 to be fixed (in 5.2?)?


-Jeff
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Is there a fix for the sqlite cache needs updating error?

2008-02-19 Thread J. Potter



** Message: sqlite cache needs updating, reading in metadata
... Is there a general solution to this? Or should we just sit  
tight for redhat bug #429689 to be fixed (5.2?).


this is not really a bug ...
it is just verbose output that causes an e-mail to be sent.


It seems to be the default, though, and when managing 150+ servers,  
this ends up being a lot of email!


Is there a simple way to flip that message off?



I do not see how this issue is at all related to RH bug 429689 ???


Typo; I meant 429869, as Akemi suggested.

-Jeff
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] limit number of per-vhost or per-user cgi processes?

2008-02-14 Thread J. Potter


Hi List,

Is there a way to limit the number of cgi processes Apache's suExec  
will fork for a given vhost or given user?  (either solution is fine)


suExec doesn't honor the /etc/security/limits.conf nproc value.  
mod_throttle seems to be dead, and I can't figure out whether SELinux  
might be able to manage this (although I would rather not flip SELinux  
from permissive to enforcing).
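(The kind of limits.conf entry I mean -- the username is a placeholder:

someuser   hard   nproc   20

-- which CGIs spawned via suExec don't seem to pick up.)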


What are others doing?

Thanks!

-Jeff
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] RPM for perl-svn-notify?

2008-02-06 Thread J. Potter

... If I can't find an RPM for a Perl module on one of the third-party  
repositories, I usually use cpanflute2 to build an RPM, then  
install that.  That way RPM knows all about the module and can  
handle it appropriately. ...



Thanks, Jay!

Mostly there. For some reason, the RPM is placing its files under  
/var/tmp instead of in the normal system paths:


rpm -ql perl-SVN-Notify
/usr/share/doc/perl-SVN-Notify-2.66
/usr/share/doc/perl-SVN-Notify-2.66/Changes
/usr/share/doc/perl-SVN-Notify-2.66/README
/var/tmp/perl-SVN-Notify-2.66-8-root/usr/bin/svnnotify
/var/tmp/perl-SVN-Notify-2.66-8-root/usr/lib/perl5/site_perl/5.8.8/SVN/Notify.pm
/var/tmp/perl-SVN-Notify-2.66-8-root/usr/lib/perl5/site_perl/5.8.8/SVN/Notify/Alternative.pm

...

Did I miss a setting somewhere?

-Jeff


On CentOS 5 x86_64:

yum -y install perl-RPM-Specfile perl-IO-Zlib rpm-build perl-rpm-build-perl perl-Module-Build perl-HTML-Parser

wget 'http://search.cpan.org/CPAN/authors/id/D/DW/DWHEELER/SVN-Notify-2.66.tar.gz'
gunzip SVN-Notify-2.66.tar.gz
cpanflute2 --name=SVN-Notify --version=2.66 SVN-Notify-2.66.tar --buildall

rpm -Uvh perl-SVN-Notify-2.66-8.src.rpm
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Clustering MySQL

2007-12-11 Thread J. Potter



 - master A is at position X
 - master B, replicating from A, gets to position X
 - master A syncs to its filesystem that it's at position X

 - master A receives some inserts, and is now at position Y
 - master B, replicating from A, gets to position Y
 - master A crashes before the position gets synced to filesystem
 - master A gets rebooted, recovers from innodb log, but has itself
only marked at position X
 - master B requests position Y from master A, but that position
doesn't exist yet, so replication breaks.

Perhaps someone here knows the proper recovery procedure at this  
point?


If this were master-slave, I'd probably do an LVM Snapshot and get a
fresh copy of the master db.  The same could be done for
master-master.


I'm not sure this would work, since some data will have been inserted  
on master B as well. I.e., with master-master, a one-way sync won't  
work. The only recovery option that I can see is to destroy Master A,  
and copy Master B -- either via an LVM snapshot or shutdown, sync,  
startup -- to create a new Master A.  Maybe this is what you're  
suggesting?
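Roughly, I'm picturing something like this for the shutdown/sync/startup
variant (hostnames and binlog coordinates below are placeholders, and
this glosses over re-pointing B back at the rebuilt A):

# stop mysqld on both nodes so the datadir is quiescent
service mysqld stop
# on B: push B's datadir over A's
rsync -az --delete /var/lib/mysql/ masterA:/var/lib/mysql/
# start B, note its current binlog coordinates, then start A
service mysqld start
mysql -e 'SHOW MASTER STATUS\G'
# on the rebuilt A: replicate from B starting at those coordinates
mysql -e "CHANGE MASTER TO MASTER_HOST='masterB',
          MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=98;
          START SLAVE;"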


Is there a better way?

best,
Jeff
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] weird load values

2007-12-06 Thread J. Potter


As mentioned before, IO could give such strange results. I suggest  
launching dstat with logging to a file, and analyzing the file  
afterwards.


Thanks, much appreciated!

This has yielded some interesting data; below I've included the  
samples from a few seconds before and after one of these events.

	system interrupts per second: Note the ~200x jump to almost 200,000  
interrupts per second.

2907
6714
1371
194218
2456
2907

	network received: Note the network received ramps up over 5 seconds,  
peaks at ~50x background, and ramps back down in about 3 seconds. The  
peak is from the same sample as the 200x sample above.

108784
389794
1070850
4843956
352226
353102
96392


Everything else looks sane -- there's enough RAM, nothing's being  
swapped out, etc. This is on a private-network server with a load  
balancer in front of it, so if it's network related, it shouldn't be  
stray traffic hitting it from outside.


Has anyone seen this sort of behavior before? What was the cause? What  
should I do to figure out how to keep the load averages from flipping  
out of control?


(This isn't something as lame as a counter rolling over somewhere  
internal to the kernel, is it? I wouldn't think so, but thought I'd  
ask. We're running 2.6.18-8.1.8.el5; we could reboot into  
2.6.18-8.1.15 if that would be a potential fix.)


Thanks for any insight!

best,
Jeff




total cpu usage     dsk/total             net/total             system
usr      sys        read      writ        recv       send       int      csw
10.5     3.25       0         409600      108784     72286      2907     20376
3.99     2.993      0         319488      389794     661170     6714     23941
0.25     0.25       0         720896      1070850    1189720    1371     16648
9.167    90.442     12288     1122304     4843956    38         194218   55433
56.931   16.832     0         1273856     352226     334506     2456     12844
46.25    20         0         454656      353102     384496     2907     20631
24.25    1.25       0         3260416     96392      72316      1342     17307
23.25    2.25       0         610304      91086      71194      1458     17584
10.973   1.496      0         0           84192      46276      1349     18135
0        0          0         94208       71892      33304      1220     16979
0.25     0.25       0         126976      71184      47576      1268     16973
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] how does one remove bond1?

2007-10-22 Thread J. Potter



Remove the max_bonds=2 from /etc/modprobe.conf


Yup, doing that removed the mystery bond1 under /proc/net/bonding -  
thanks!


best,
Jeff
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos