Re: [Ltsp-discuss] Round-Robin Load Balancing for multiple LTSP servers?

John D. Robertson Thu, 22 Jan 2004 10:20:16 -0800

Anders,

Please see comments below:

On Thursday 22 January 2004 10:58 am, Anders Bruun Olsen wrote:
> This would not get over the problem that all the users would need an
> account on both servers and the fact that their settings wouldn't
> migrate from one server to the other.

Well, there's no way you could do these things with any dhcp daemon anyway, no 
matter how cleverly configured.

If you set up NIS for user authentication and use an NFS mounted volume for 
user home directories, then the solution I suggested would work just fine. If 
you are sysadmining an LTSP installation, you should already know this. I 
assume you are a Windows sysadmin and new to Unix.
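
For reference, a minimal sketch of the files involved in that NIS + NFS setup. 
The host name "homeserver", the NIS domain "example-nis" and the 192.168.0.0 
network are placeholders I made up for illustration, not values from any real 
installation; adjust the export options to taste.

```text
# /etc/exports on the server that holds the home directories
/home   192.168.0.0/255.255.255.0(rw,sync)

# /etc/fstab on every other LTSP server
homeserver:/home   /home   nfs   rw,hard,intr   0 0

# /etc/yp.conf on every LTSP server (NIS client side)
domain example-nis server homeserver

# /etc/nsswitch.conf on every LTSP server
passwd: files nis
group:  files nis
shadow: files nis
```

With something like this in place a user has the same account and the same 
home directory no matter which server the thin client lands on.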

You need to know that Linux servers just don't go down unless the hardware 
fails, or the sysadmin is incompetent. I have an installation where the 
entire enterprise (24 users, email, web server, database, firewall, dns, 
dhcp, ...) is running on _one_ dual processor system. The users access it 
using LTSP thin clients. The system has been up for 50 days. The last time it 
was down was for a kernel upgrade.

If you want true high availability then the cost and complexity will be 
substantial. You may find that it is much less expensive to put all your 
software on one fast server with RAID'd disks. Keep a complete set of spare 
parts handy so you could fix a hardware failure and have the server back up 
again quickly.

>
> > Put an entry in the crontab to run this script every minute.
>
> I believe this would put quite a load on the server - CPU time which in
> my opinion could be used better for other things.

Better upgrade that '386, man!
Seriously, you need to look at this quantitatively. What makes you think that 
a tiny job that runs once a _minute_ would put excessive load on the CPU? It 
would probably take a few _milliseconds_ of CPU time at the most. Compared to 
the CPU load of things like Gnome, KDE, Mozilla, Java, and Open Office, this 
is several _orders_of_magnitude_ smaller, i.e. negligible.
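
To put a rough number on it, here is a small sketch of the arithmetic. The 
5 ms per run is an assumed figure for illustration, not a measurement, and 
cpu_share_percent is just a helper name I picked.

```bash
#!/bin/bash
# cpu_share_percent CPU_MS INTERVAL_S
# Percentage of one CPU consumed by a job that burns CPU_MS milliseconds of
# CPU time once every INTERVAL_S seconds.
cpu_share_percent () {
   awk -v ms="$1" -v s="$2" 'BEGIN { printf "%.4f\n", ms / (s * 1000) * 100 }'
}

# Assumed figure: 5 ms of CPU per run, one run every 60 seconds
cpu_share_percent 5 60
```

Under that assumption the job costs about 0.0083% of one CPU. To get a real 
figure for your own box, run the job under bash's "time" keyword and plug the 
user + sys time into the function.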

In terms of wasted CPU, you should be concerned about processes like "java_vm" 
and "wineloader", which are often particularly obnoxious. For example, one 
user can bring the system to its knees when they hit a web page with 20 Java 
applets on it, each one spawning a different "java_vm" that is eating all the 
CPU it can get!
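
If you want to check by hand whether this is happening, filtering ps output 
on %CPU is enough. A minimal sketch; the function name over_threshold and the 
15% cut-off in the usage line are only illustrative choices.

```bash
#!/bin/bash
# over_threshold THRESH
# Read "PID %CPU" pairs on stdin and print the PIDs whose integer %CPU is
# strictly greater than THRESH.
over_threshold () {
   local thresh=$1 pid cpu
   while read -r pid cpu; do
      cpu=${cpu%%.*}            # drop the decimal part, e.g. 97.3 -> 97
      if [ -n "$cpu" ] && [ "$cpu" -gt "$thresh" ]; then
         echo "$pid"
      fi
   done
}

# Live use (uncomment on a system where java_vm processes are running):
# ps --no-headers -C java_vm -o 'pid pcpu' | over_threshold 15
```

Because the decimal part is dropped, a process at 15.9% is treated as 15 and 
is not reported with a threshold of 15.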

My solution to this is the following script, which runs once per minute:
-----------------------------------------------------------------------------------------------------------------------------
#!/bin/bash
# (bash, not plain sh: the script uses 'function', 'let' and '&>')
#
# 3/21/01  created by JDR
# Checks for rampant processes, and renices or kills them if they
# continue to hog the CPU.
#
#
KILL_PROCNAMES=''
NICE_PROCNAMES='java_vm wineloader wineserver'

KILL_LIST=/tmp/manageRampantProcs_kill.lst
NICE_LIST=/tmp/manageRampantProcs_nice.lst
REFFILE=/var/run/crond.pid
# This is the %CPU utilization that gets our attention
THRESHOLD=15

function getPidCpu ()
# Get a list of PID's and %CPU utilization given a list of command names.
{
   THRESH=$1
   shift
   unset CMDSTR

#  Get ps entries for all processes we are monitoring
   while [ -n "$1" ];do
      CMDSTR="$CMDSTR -C $1"
      shift
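   done
#  Nothing to monitor: without any -C option ps would fall back to its
#  default process selection, so bail out early
   [ -z "$CMDSTR" ] && return 0
#  Run ps command, strip decimal part of %CPU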
   ps --no-headers $CMDSTR -o 'pid pcpu'| sed 's/\.[0123456789]*$//' | \
   (
      while read PID CPU; do
#ps --no-header -p $PID 1>&2
         if [ -n "$CPU" ] && let "$CPU > $THRESH"; then
            echo $PID
         fi
      done
   )
}

function getOffendingPids ()
# Search the supplied list of PID's, and return those that are using more CPU
# than the threshold
{
   THRESH=$1
   shift

#  check pid's for offending processes
   while [ -n "$1" ]; do
      CPU=$(ps --no-headers -p $1 -o 'pcpu'| sed 's/\.[0123456789]*$//')
      if [ -n "$CPU" ] && let "$CPU > $THRESH"; then
         echo $1
      fi
      shift
   done
}

######################## main() ############################
# If the list is legitimate (newer than crond's pid file, i.e. written
# since cron last started), search for processes still using too much CPU.
if [ -e "$NICE_LIST" ] && [ "$NICE_LIST" -nt "$REFFILE" ]; then
   PIDS=$(getOffendingPids $THRESHOLD $(cat $NICE_LIST))
   if [ -n "$PIDS" ]; then
      renice +19 -p $PIDS &>/dev/null
   fi
fi

if [ -e "$KILL_LIST" ] && [ "$KILL_LIST" -nt "$REFFILE" ]; then
   PIDS=$(getOffendingPids $THRESHOLD $(cat $KILL_LIST))
   if [ -n "$PIDS" ]; then
      kill -s SIGTERM $PIDS &>/dev/null
#     Give honest processes a chance to exit gracefully
      sleep 5
#     Re-check only the PIDs we just signalled, not the whole kill list
      PIDS=$(ps --no-headers -p $PIDS -o 'pid')
      if [ -n "$PIDS" ]; then
         kill -s SIGKILL $PIDS &>/dev/null
      fi
   fi
fi

# Generate updated lists of PIDs to watch on the next run
getPidCpu $THRESHOLD $KILL_PROCNAMES >$KILL_LIST
getPidCpu $THRESHOLD $NICE_PROCNAMES >$NICE_LIST

exit 0
---------------------------------------------------------------------------------
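
To run it once per minute, an entry along these lines in root's crontab 
("crontab -e" as root) should do. The install path is an assumption on my 
part; use wherever you actually put the script. It needs to run as root so 
that it can renice and kill other users' processes.

```text
# min hour dom mon dow  command
*     *    *   *   *    /usr/local/sbin/manageRampantProcs
```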

-- 
=============================================================
John D. Robertson, Computer / Engineering Consultant
Robertson & Robertson Consultants, Inc.
3637 West Georgia Rd.
Pelzer, SC  29669

Phone: (864) 243-2436
  Fax: (864) 243-3023
Email: [EMAIL PROTECTED]
  WWW: http://www.rrci.com
