Re: [CentOS] Two sets of Heartbeat HTTPD clusters on same subnet
> I have successfully configured two machines to use heartbeat to cluster httpd. The two nodes are called etk-1 and etk-2. I am trying to configure another two machines to act as a separate cluster (on the same IP subnet). These two nodes are called radu-1 and radu-2.

We successfully do this with many pairs of HA nodes in the same subnet, using a different UDP port for each pair. Under /etc/ha.d/ha.cf:

udpport someNumber

Use a different authkey for each pair as well, so as to avoid accidental snafus with mixing up nodes from different pairs.

-Jeff

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
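A minimal sketch of the per-pair settings described above, assuming heartbeat's usual ha.cf and authkeys file layout. The function name, the port numbers, the eth0 broadcast interface, and the key strings are all hypothetical; only the idea (one udpport and one authkey per pair) comes from the message.

```shell
#!/bin/sh
# Write a minimal ha.cf and authkeys for ONE cluster pair into a directory.
# usage: write_pair_conf DIR UDPPORT NODE1 NODE2 AUTHKEY
write_pair_conf() {
    dir=$1 port=$2 node1=$3 node2=$4 key=$5
    mkdir -p "$dir"
    {
        echo "udpport $port"     # must differ between pairs on the same subnet
        echo "bcast eth0"        # assumption: heartbeat traffic goes over eth0
        echo "node $node1"
        echo "node $node2"
    } > "$dir/ha.cf"
    {
        echo "auth 1"
        echo "1 sha1 $key"       # a different key per pair
    } > "$dir/authkeys"
    chmod 600 "$dir/authkeys"    # heartbeat refuses world-readable authkeys
}
```

For example, `write_pair_conf ./etk 694 etk-1 etk-2 keyA` and `write_pair_conf ./radu 695 radu-1 radu-2 keyB` would stage two configurations that cannot hear each other; the resulting files would then be copied to /etc/ha.d on each node of the matching pair.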
Re: [CentOS] [OT] Network switches
> look at HP Procurves. That is what I use. You can get 2524's quite cheap on ebay.

We used these for years, and they were great, and super cheap on eBay. HP support was fantastic as well. The 26xx series allows for light layer 3 routing; you may want to snag the 2626 or 2650 instead of the 25xx series.

I believe that HP has end-of-lifed these switches, though, so firmware updates for security bugs, etc., will, from what I understand, cease in a few years.

We upgraded to some Dell PowerConnect 6248s in the past year, so that we could use VRRP for (routing-enabled) switch failover. As with all Dell things, hammer them on the price and you can get it ~30% cheaper than listed.

-Jeff
[CentOS] general protection rip?
Hi List,

On one of our CentOS 5 (x86_64) servers, identical to a number of other systems, I'm seeing some processes / services failing to run, along with the following errors in /var/log/messages:

Mar 2 23:25:07 someHostname kernel: wrapper-linux-x[24448] general protection rip:805386e rsp:ffc20390 error:0
Mar 2 23:25:09 someHostname kernel: dsm_sa_datamgr3[5063] general protection rip:f7f86e7c rsp:f5aab30c error:0
Mar 3 09:57:54 someHostname kernel: dcecfg32[1993] general protection rip:b9c71a rsp:ffa20568 error:0
Mar 6 09:38:59 someHostname kernel: omreport[15107] general protection rip:af597a rsp:ffdccad0 error:0

I've Googled a bit, but don't see any clear, consistent explanation. Does anyone have any pointers as to what is causing this? Faulty hardware? I ran memtester, but it didn't find any issue with the memory it was able to test.

Thanks,
Jeff
Re: [CentOS] clustering and load balancing Apache
Look at pound: http://www.apsis.ch/pound/

If you are concerned about traffic volume, you might consider running squid as a transparent proxy in front of pound. I.e.:

request -> squid -> pound -> apache

Where squid will return the response for everything marked as cacheable and still fresh, and pound will take care of load balancing to apache. (Pound can inspect/insert cookies to send visitors to the same back-end node on subsequent requests.)

On some of our setups, squid responds to 98% of the requests coming in, and is able to handle an extremely high volume of requests. Other list users might be able to provide good stats as to what sort of volume they can support. (I'd be curious to hear what others have seen...)

For HA:

- 2 instances of squid, active/standby or active/active (i.e. two IP addresses in DNS for the public hostname, with each squid instance picking up the other's address during a failure)
- 2 instances of pound, active/standby
- N instances of apache

Re: replication of content on your apache nodes, another poster suggested drbd. From my understanding, I do not think this is possible, since only one node can mount the drbd volume at a time. If you have shared data that needs to be seen across apache nodes, either stick it in SQL or mount an NFS volume across the nodes. (But then you have NFS in the picture, which might not be so good.) If your apache code is constant, then have a master apache node and write a shell script that runs rsync to push code changes out to the other instances.

It's hard to get very specific about what's best for your setup without knowing the specifics of things like the data sync needs on the apache nodes, so take all of this with a grain of salt -- or as a default starting place.

best,
Jeff
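The "master apache node plus rsync" idea above could be sketched roughly as follows. This is an illustration under assumptions, not the poster's script: the document root /var/www/html/, the host names, and the function name are hypothetical, and passwordless ssh from the master to the other nodes is assumed.

```shell
#!/bin/sh
# Push the master's code tree to each apache node given on the command line.
RSYNC=${RSYNC:-rsync}          # overridable, e.g. for a dry run
SRC=${SRC:-/var/www/html/}     # assumption: same path on every node

push_code() {
    status=0
    for host in "$@"; do
        # -a keeps permissions and times, -z compresses in transit,
        # --delete removes files that no longer exist on the master
        $RSYNC -az --delete "$SRC" "$host:$SRC" || status=1
    done
    return $status
}
```

A call such as `push_code web2 web3` would sync both nodes and return non-zero if any transfer failed. Adding rsync's `-n` flag to the command first is a cheap way to preview what `--delete` would remove.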
Re: [CentOS] Problems with mysql multi-master after update.
For what it's worth, I haven't seen this on any systems I manage when going from 5.0.22 to 5.0.45, which include permutations of master-slave and master-master. Is there anything useful in /var/log/mysqld.log?

> after I updated from mysql-5.0.22 in CentOS 5.0 to mysqld-5.0.45 in CentOS 5.2, mysql loses master-slave sync after one node reboots. I've noticed that the slave does not respect the information in master.info; instead, it tries to read the information from the master server file inc-index.index
Re: [CentOS] squid HA failover?
Assuming server A with IP M, server B with IP N, and DNS entry X currently pointing at IP M:

1) Add heartbeat on servers A and B, with heartbeat managing a new IP address O (this is your virtual IP -- nothing to do with VRRP, which is for your routers to fail over your gateway).

2) If you want active-active load sharing on servers A and B, install pound on both server A and B, and in your pound config, point pound to IP M and IP N (same pound config on both servers).

3) Update your DNS to point entry X to IP O.

If you want active-standby on your squids, then have both squids bind to 0.0.0.0 and you're done. The standby server will have squid listening for requests, but since the standby won't have the VIP O, it'll just sit there. In this setup, heartbeat is only managing the VIP, but no services.

If you want active-active on your squids, then have squid on server A bind to only IP M, squid on server B bind to only IP N, and pound configured to bind to only IP O. Heartbeat will need to be configured to start pound on failover (since IP O will only exist on one box at a time, pound can't bind to it unless the interface is up).

Make sure you test the case where squid (or pound in active-active) on the server running VIP O crashes.

-Jeff
Re: [CentOS] squid HA failover?
> Yes, I normally want one server handling the full load to maximize the cache hits. But the other one should be up and running.

So, active/standby. Easier config. Squid won't even be aware that heartbeat is running; just keep it running on both servers all the time. See my install notes at bottom.

> And, because this is already in production as a mid-level cache working behind a loadbalancer, I would like to be able to keep answering on the M or N addresses, gradually reconfiguring the web servers in the farm to use address O instead of the loadbalancer VIP address.

Go for it. It'll work fine. You could get fancy and switch the primary interface from M to O, and make M the VIP. Depends if you can accept the ~30 seconds of downtime and your tolerance for risk.

> I got the impression from some of the docs/tutorials that it was a bad idea to access the M/N addresses directly.

In your case, it's only bad if M/N go down.

> Or does that only apply to services where it is important to only have one instance alive at a time or where you are replicating data?

Depends on the service and replication setup. If you had master/slave MySQL and connected to the slave, you'd see amnesia on the master. (That setup wouldn't allow for fail-back though, so, probably wouldn't see it.) Things like drbd protect you from concurrent mounting.

> Even after converting all of the farm to use the new address, I'll still want to be able to monitor the backup server to be sure it is still healthy.

Yup. And you'll want to monitor the active node and force a failover if the service fails. My config below doesn't take this into consideration; maybe other list lurkers can correct it to be better. The quick and dirty fix is for each node to check whether it is active and, if it is but squid is not running, to run 'service heartbeat restart' to fail over to the other node (i.e. a once-a-minute cron job). Not as pretty as it should be.
best,
Jeff

Replace 1.2.3.4 with your VIP address, and a.example.com and b.example.com with your FQDN hostnames.

server A (a.example.com):

    yum -y install heartbeat
    chkconfig --add heartbeat
    chkconfig --level 345 heartbeat on
    echo 'a.example.com IPaddr::1.2.3.4' > /etc/ha.d/haresources
    echo node a.example.com > /etc/ha.d/ha.cf
    echo node b.example.com >> /etc/ha.d/ha.cf
    echo udpport 9000 >> /etc/ha.d/ha.cf
    echo bcast bond0 >> /etc/ha.d/ha.cf
    echo auto_failback off >> /etc/ha.d/ha.cf
    echo logfile /var/log/ha-log >> /etc/ha.d/ha.cf
    echo logfacility local0 >> /etc/ha.d/ha.cf
    echo auth 1 > /etc/ha.d/authkeys
    echo 1 crc >> /etc/ha.d/authkeys
    chmod go-rwx /etc/ha.d/authkeys

server B (b.example.com):

    yum -y install heartbeat
    chkconfig --add heartbeat
    chkconfig --level 345 heartbeat on
    # yes, a again - that's the default host to run the service
    echo 'a.example.com IPaddr::1.2.3.4' > /etc/ha.d/haresources
    echo node a.example.com > /etc/ha.d/ha.cf
    echo node b.example.com >> /etc/ha.d/ha.cf
    echo udpport 9000 >> /etc/ha.d/ha.cf
    echo bcast bond0 >> /etc/ha.d/ha.cf
    echo auto_failback off >> /etc/ha.d/ha.cf
    echo logfile /var/log/ha-log >> /etc/ha.d/ha.cf
    echo logfacility local0 >> /etc/ha.d/ha.cf
    echo auth 1 > /etc/ha.d/authkeys
    echo 1 crc >> /etc/ha.d/authkeys
    chmod go-rwx /etc/ha.d/authkeys

    # This assumes:
    # 1) your network is bond0, not eth0
    # 2) you are on a private network where you don't care about security, otherwise see http://www.linux-ha.org/authkeys
    # Make sure udpport isn't in use by any other instances; or, use mcast.

On server A:

    service heartbeat start
    # Then, check your log files (/var/log/ha-log and /var/log/messages).
    # Ping the virtual IP.

On server B:

    service heartbeat start
    # check your log files

On server A:

    service heartbeat restart

On server B:

    ifconfig -a
    # Check if the interface is now running on server B.

You can monitor the current active node with arp -- the MAC address will switch to match the physical interface that the VIP is running on.
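The "quick and dirty" cron check mentioned earlier in this message might look something like the sketch below. It is an illustration only: the function names, the use of `ip` and `pgrep` to detect the VIP and squid, and the log tag are assumptions, not part of the original notes.

```shell
#!/bin/sh
# Cron-driven check: if this node holds the VIP but squid is dead,
# restart heartbeat so the VIP moves to the other node.
VIP=${VIP:-1.2.3.4}    # same placeholder VIP as in the install notes

# True if the VIP is currently configured on this node.
holds_vip() {
    ip -o -4 addr show 2>/dev/null | grep -qF " inet $VIP/"
}

# True if a process named exactly "squid" is running.
squid_running() {
    pgrep -x squid >/dev/null 2>&1
}

failover_check() {
    if holds_vip && ! squid_running; then
        logger -t squid-ha "squid down on active node, restarting heartbeat"
        service heartbeat restart
    fi
}

failover_check
```

Saved as, say, /usr/local/sbin/squid-ha-check (a hypothetical path), it could be run once a minute from cron. Cron's PATH often omits /sbin, so the script may need the full path to `service`. On the standby node the check does nothing, because that node does not hold the VIP.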
Re: [CentOS] Virtual NICs (aliases like eth0:1) won't come up after reboot
I found under CentOS 4 a few years ago that the OS would only bring up virtual interfaces starting with 0:0 and increasing sequentially -- if there was a gap, it would stop at that gap point. I.e.:

Good: eth0, eth0:0, eth0:1, eth0:2...
Bad: eth0, eth0:0, eth0:2 (system would only start eth0 and eth0:0)
Bad: eth0, eth0:1, eth0:2 (system would only start eth0)

This last case looks like yours -- in your config, after eth0, you started with eth0:1.

-Jeff
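A small sketch for spotting such a gap before rebooting, assuming the aliases are defined by files named ifcfg-DEVICE:N (on CentOS these normally live in /etc/sysconfig/network-scripts). The function name is hypothetical.

```shell
#!/bin/sh
# Print the first missing alias number for a device and return 1,
# or print nothing and return 0 if aliases run 0, 1, 2, ... without a gap.
# usage: first_alias_gap DIR DEVICE
first_alias_gap() {
    dir=$1 dev=$2 expected=0
    for n in $(ls "$dir" | sed -n "s/^ifcfg-$dev:\([0-9][0-9]*\)\$/\1/p" | sort -n); do
        if [ "$n" -ne "$expected" ]; then
            echo "$expected"
            return 1
        fi
        expected=$((expected + 1))
    done
    return 0
}
```

Running `first_alias_gap /etc/sysconfig/network-scripts eth0` against the second "Bad" layout above (eth0:1 and eth0:2 but no eth0:0) would print 0, the number of the missing alias.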
[CentOS] count of active tcp sockets?
Hi List, Is there an easy way to get a count of the number of active socket connections, or even better, number of socket connections in the time_wait state? (Something lightweight... under /proc/sys/net/ipv4/? I'd like to avoid the impact of listing out all the connections a-la netstat.) Thanks! -Jeff
Re: [CentOS] count of active tcp sockets?
> netstat -an|grep TIME_WAIT|wc ?

I need to avoid anything that lists out all the connections -- the above would take too long if there are tens of thousands of connections. I'm hoping there's a proc entry that has a summary count of the current number of connections?

-Jeff
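One candidate, offered as a sketch rather than a definitive answer: Linux exposes summary socket counters in /proc/net/sockstat, and its TCP line carries name/value pairs including `inuse` and `tw` (sockets in TIME_WAIT). Reading it does not walk the connection table. The function name below is hypothetical.

```shell
#!/bin/sh
# Print one counter from the "TCP:" line of a sockstat-style file.
# usage: tcp_sockstat_field FIELD [FILE]   (FILE defaults to /proc/net/sockstat)
tcp_sockstat_field() {
    field=$1
    file=${2:-/proc/net/sockstat}
    awk -v want="$field" '
        $1 == "TCP:" {
            # fields after the label come in name/value pairs
            for (i = 2; i < NF; i += 2)
                if ($i == want) print $(i + 1)
        }
    ' "$file"
}
```

`tcp_sockstat_field tw` would then print the current TIME_WAIT count, and `tcp_sockstat_field inuse` the number of TCP sockets in use.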
[CentOS] Newer MySQL in centos-plus?
Hi List, I'm noticing that the CentOS-plus repo for 4.6 has MySQL 5.0.54 in it, but the CentOS 5.1 repo does not have a newer rpm, leaving the newest easily-available version as the vendor-provided mysql 5.0.22. Is there a reason for this? We're wanting to try a newer MySQL under CentOS 5.1 -- do any of the standard repos include such an RPM? Thanks! -Jeff
[CentOS] Is there a fix for the sqlite cache needs updating error?
I've been seeing the below message from yum whenever the repo has an update (CentOS 5):

/etc/cron.daily/yum.cron:
** Message: sqlite cache needs updating, reading in metadata

Googling a bit, it looks like others have seen this happen as well. The solutions, when I've found them, have been along the lines of "send output to /dev/null" or "edit this file", but nothing that feels like the right fix.

Is there a general solution to this? Or should we just sit tight for redhat bug #429689 to be fixed (5.2?).

-Jeff
Re: [CentOS] Is there a fix for the sqlite cache needs updating error?
>> ** Message: sqlite cache needs updating, reading in metadata
>> ...
>> Is there a general solution to this? Or should we just sit tight for redhat bug #429689 to be fixed (5.2?).

> this is not really a bug ... it is just verbose output that causes an e-mail to be sent.

It seems to be the default, though, and when managing 150+ servers, this ends up being a lot of email! Is there a simple way to flip that message off?

> I do not see how this issue is at all related to RH bug 429689 ???

Typo; meant 429869, as Akemi suggested.

-Jeff
[CentOS] limit number of per-vhost or per-user cgi processes?
Hi List,

Is there a way to limit the number of cgi processes Apache's suExec will fork for a given vhost or given user? (Either solution is fine.)

suExec doesn't honor the /etc/security/limits.conf nproc value. mod_throttle seems to be dead, and I can't figure out if selinux might be able to manage this (although I would rather not flip selinux from permissive to enforcing).

What are others doing?

Thanks!
-Jeff
Re: [CentOS] RPM for perl-svn-notify?
> ... If I can't find an RPM for a Perl module on one of the third-party repositories, I usually use cpanflute2 to build an RPM, then install that. That way RPM knows all about the module and can handle it appropriately. ...

Thanks, Jay! Mostly there. For some reason, the rpm lists most of its files under /var/tmp instead of under their normal paths on the system:

    rpm -ql perl-SVN-Notify
    /usr/share/doc/perl-SVN-Notify-2.66
    /usr/share/doc/perl-SVN-Notify-2.66/Changes
    /usr/share/doc/perl-SVN-Notify-2.66/README
    /var/tmp/perl-SVN-Notify-2.66-8-root/usr/bin/svnnotify
    /var/tmp/perl-SVN-Notify-2.66-8-root/usr/lib/perl5/site_perl/5.8.8/SVN/Notify.pm
    /var/tmp/perl-SVN-Notify-2.66-8-root/usr/lib/perl5/site_perl/5.8.8/SVN/Notify/Alternative.pm
    ...

Did I miss a setting somewhere?

-Jeff

On CentOS 5 x86_64:

    yum -y install perl-RPM-Specfile perl-IO-Zlib rpm-build perl-rpm-build-perl perl-Module-Build perl-HTML-Parser
    wget 'http://search.cpan.org/CPAN/authors/id/D/DW/DWHEELER/SVN-Notify-2.66.tar.gz'
    gunzip SVN-Notify-2.66.tar.gz
    cpanflute2 --name=SVN-Notify --version=2.66 SVN-Notify-2.66.tar --buildall
    rpm -Uvh perl-SVN-Notify-2.66-8.src.rpm
Re: [CentOS] Clustering MySQL
>> - master A is at position X
>> - master B, replicating from A, gets to position X
>> - master A syncs to its filesystem that it's at position X
>> - master A receives some inserts, and is now at position Y
>> - master B, replicating from A, gets to position Y
>> - master A crashes before the position gets synced to filesystem
>> - master A gets rebooted, recovers from innodb log, but has itself only marked at position X
>> - master B requests position Y from master A, but that position doesn't exist yet, so replication breaks.
>>
>> Perhaps someone here knows the proper recovery procedure at this point?

> If this were master-slave, I'd probably do an LVM Snapshot and get a fresh copy of the master db. The same could be done for master-master.

I'm not sure this would work, since some data will have been inserted on master B as well. I.e., with master-master, a one-way sync won't work.

The only recovery option that I can see is to destroy Master A, and copy Master B -- either via an LVM snapshot or shutdown, sync, startup -- to create a new Master A. Maybe this is what you're suggesting?

Is there a better way?

best,
Jeff
Re: [CentOS] weird load values
> As mentioned before, IO could give such strange results. I suggest launching dstat with logging to a file, and analyzing the file afterwards.

Thanks, much appreciated! This has yielded some interesting data. I've included a few seconds before and after one of these events below.

system interrupts per second: Note the ~200x jump to almost 200,000 interrupts per second.

2907 6714 1371 194218 2456 2907

network received: Note the network received ramps up over 5 seconds, peaks at ~50x background, and ramps back down in about 3 seconds. The peak is from the same sample as the 200x sample above.

108784 389794 1070850 4843956 352226 353102 96392

Everything else looks sane -- there's enough ram, nothing's being swapped out, etc. This is on a private-network server that has a load balancer in front of it, so if it's network related, it wouldn't be misdirected random bits.

Has anyone seen this sort of behavior before? What was the cause? What should I do to figure out how to keep the load averages from flipping out of control?

(This isn't something as lame as a counter rolling over somewhere internal to the kernel, is it? Wouldn't think so, but thought to ask. Running 2.6.18-8.1.8.el5. We could reboot to run 2.6.18-8.1.15 if that'd be a potential fix.)

Thanks for any insight!

best,
Jeff

total cpu usage     dsk/total           net/total           system
   usr     sys      read     writ       recv     send       int    csw
  10.5    3.25         0   409600     108784    72286      2907  20376
  3.99   2.993         0   319488     389794   661170      6714  23941
  0.25    0.25         0   720896    1070850  1189720      1371  16648
 9.167  90.442     12288  1122304    4843956       38    194218  55433
56.931  16.832         0  1273856     352226   334506      2456  12844
 46.25      20         0   454656     353102   384496      2907  20631
 24.25    1.25         0  3260416      96392    72316      1342  17307
 23.25    2.25         0   610304      91086    71194      1458  17584
10.973   1.496         0        0      84192    46276      1349  18135
     0       0         0    94208      71892    33304      1220  16979
  0.25    0.25         0   126976      71184    47576      1268  16973
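For picking such events out of a longer dstat log, a rough sketch follows. It assumes one column of samples (for example the interrupt counts) has already been extracted to one number per line. The function name and the default factor of 50 are hypothetical, and using the smallest sample as the baseline is an arbitrary simplification.

```shell
#!/bin/sh
# Read one number per line on stdin; print "sample_number value" for every
# sample more than FACTOR times the smallest sample seen.
# usage: flag_spikes [FACTOR]
flag_spikes() {
    factor=${1:-50}
    awk -v f="$factor" '
        { v[NR] = $1; if (NR == 1 || $1 < min) min = $1 }
        END {
            for (i = 1; i <= NR; i++)
                if (v[i] > f * min) print i, v[i]
        }
    '
}
```

Fed the six interrupt samples quoted in the message (`printf '%s\n' 2907 6714 1371 194218 2456 2907 | flag_spikes 50`), it flags only the fourth sample, 194218. A baseline of zero would make every positive sample count as a spike, so idle counters need a different baseline.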
Re: [CentOS] how does one remove bond1?
> Remove the max_bonds=2 from /etc/modprobe.conf

Yup, doing that removed the mystery bond1 under /proc/net/bonding - thanks!

best,
Jeff