Hello all,

NJIT has been running virtualized fileservers and database servers for 4
years.  We operate two cells: our administrative cell is fully virtualized,
and our Student/Research cell is partially virtualized (full virtualization is
in progress now).

OpenAFS at NJIT forms the backbone of many of our IT operations.  It is
critical to our system management philosophy and is used widely by
administrative staff, faculty, and students.  (Even though they may not always
know they are using OpenAFS under the covers!)

The VM platform we use is VMware vSphere 4.1 running on 30 hosts in three HA
clusters.  Our storage is currently Sun/StorageTek FC SAN arrays and NetApp
(11 arrays total).  Each vSphere host is dual-attached to the SAN at 4 Gb/s
and has a dual-pathed EtherChannel uplink to the campus network's Cisco 65xx
VSS datacenter switch complex.  (Migration to 10 GbE networking and 8 Gb FC is
underway as we also migrate to a new hardware vendor.)

All VMs use normal VMDK virtual disks.
The VMware HA clusters have DRS in "Fully automated" mode, so VMs will
vMotion between nodes seeking the best use of available resources.  vSphere
does a pretty good job at this, provided you adequately configure how you want
resources allocated using VMware "Resource Pools".

Overall, we have about 360 VMs in the entire environment, and about 30 of
those are fileservers.  All other VMs, and nearly every desktop on campus, are
clients.

Performance is good.  Complaints are rare and have never been attributed to
"virtualization" as the real cause.  (Although it is sometimes the first thing
to be blamed.)  We host everything from user home directories to web content,
software, and data files: wildly different workloads.

OpenAFS is a workload just like any other.  It uses CPU, memory, and I/O.
When building up a virtual environment, all of those resources need to be
accounted for and scaled appropriately.  With modern hardware, the
"virtualization tax" imposed by the hypervisor abstraction is getting smaller
and smaller.  This is in large part due to the work AMD and Intel have done to
include virtualization instruction enhancements in their CPUs, MMUs, and
chipsets.

Suggestions (these are somewhat VMware-specific, but other VM platforms share
these features):

*         Use a real virtualization platform (sorry, that sounds snobbish,
but it's true; the free stuff doesn't cut it when you scale up).  Features
which are very important:

o   Dynamic resource scheduling (VMware DRS) to move VMs around the cluster
to seek out available resources

o   High Availability (HA) at the VM platform layer greatly improves total
uptime.  This eliminates the need to support separate HA solutions for every
application: HA is implemented BELOW the OS, in the VM layer, so we have one
HA cluster technology.  (Goodbye MS cluster, Sun Cluster, Linux clusters, and
the cluster-of-the-month club.)

o   Storage migration (VMware Storage vMotion):  There have been a number of
situations where we have had to "evacuate" a disk array for hardware
replacement, upgrades, etc.  This can be done non-disruptively.

o   Memory overcommitment

o   Integrated backup management (VMware Consolidated Backup, Site Recovery
Manager, etc.)

*         Deploying VMs from templates:  Templates allow us to deploy a new
fileserver VM in (a small number of) minutes.  Deploy the VM, run the
configuration scripts, done.  It's ready to vos move volumes onto.  This is
how we perform most major software upgrades: new fileserver VMs are deployed,
and old fileserver VMs are evacuated and destroyed.  All non-disruptive.
(See the evacuation sketch after this list.)

*         Many fileservers / smaller fileservers:  This philosophy has
evolved as we have moved more into virtualized fileservers.  With physical
hardware you are limited by ABILITY TO PURCHASE.  Meaning, you can only get
"x" number of servers of "n" size.  So if you want highly resilient servers,
you can only afford to buy a few of them, which can lead to very fat
fileservers.  If you instead go for many cheap fileservers, you can spread the
data out more, but you end up suffering more small individual outages.  With
virtualized fileservers you have full flexibility: on the virtual platform you
get HA by default on every VM, and after that you get to base your fileserver
layout decisions on the DATA they will store.  For example, in our layout we
have the following classes of fileservers:

o   Infrastructure (INF) fileservers:  Very small fileservers for the highly
critical root.* volumes, software, etc., the "bones" of the cell.  Replicated,
of course.

o   User fileservers (USR):  Home volumes; 'nuff said.

o   Bulk fileservers (BLK):  Almost everything else: projects, web content,
research data, yadda yadda.

o   Jumbo fileservers (JMB):  Used for ridiculously large volumes.  These are
the only fileservers with a VARIABLE vicep partition size.  Used for archival
data and some research projects.

*         Headroom is maintained on the fileservers in a mostly n+1 fashion,
so that at least one fileserver can be evacuated at any given time for
maintenance.  Almost ALL maintenance is non-disruptive.  (Volumes move, and FS
VMs move from blade to blade and even from array to array, all
non-disruptively.)

*         Balancing, evacuation, and new volume placement should be automated
as much as possible.  (We have scripts to do this; rough sketches of an
evacuation helper and a class-aware placement helper follow below.)
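
To make the evacuation idea concrete, here is a minimal sketch of the kind of
script we mean (this is NOT our production tooling): it simply shells out to
the standard vos commands, and the host names, partition names, and lack of
error handling are placeholders for illustration only.

#!/usr/bin/env python
# Minimal sketch: evacuate an OpenAFS fileserver by vos-moving every RW
# volume it holds onto a replacement server.  Host/partition names are
# placeholders; real tooling would add error handling, logging, dry-run, etc.
import subprocess

def run(cmd):
    """Echo and run a command, returning its stdout; raises on failure."""
    print("+ " + " ".join(cmd))
    return subprocess.check_output(cmd, universal_newlines=True)

def rw_volumes_on(server):
    """Yield (volume, partition) for each RW volume on 'server', parsed
    from the default 'vos listvol' output."""
    part = None
    for line in run(["vos", "listvol", "-server", server]).splitlines():
        words = line.split()
        if line.startswith("Total number of volumes on server"):
            part = words[8].rstrip(":")           # e.g. /vicepa
        elif len(words) >= 6 and words[2] == "RW":
            yield (words[0], part)                # (volume name, partition)

def evacuate(old_server, new_server, new_part="/vicepa"):
    """vos move every RW volume from old_server onto new_server."""
    for vol, part in rw_volumes_on(old_server):
        run(["vos", "move", "-id", vol,
             "-fromserver", old_server, "-frompartition", part,
             "-toserver", new_server, "-topartition", new_part])

if __name__ == "__main__":
    evacuate("oldfs.example.edu", "newfs.example.edu")   # placeholder hosts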
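
And a similarly hand-waved sketch of class-aware placement for new volumes,
matching the INF/USR/BLK/JMB layout above.  The class-to-server map and host
names are made up for illustration; the only real piece is parsing the free
space that vos partinfo reports.

#!/usr/bin/env python
# Minimal sketch: pick the fileserver/partition within a class that has the
# most free space, then create new volumes there.  The CLASSES map and host
# names are hypothetical.
import subprocess

CLASSES = {
    "INF": ["inf-fs1.example.edu", "inf-fs2.example.edu"],
    "USR": ["usr-fs1.example.edu", "usr-fs2.example.edu"],
    "BLK": ["blk-fs1.example.edu", "blk-fs2.example.edu"],
    "JMB": ["jmb-fs1.example.edu"],
}

def emptiest_partition(server):
    """Return (free_kb, partition) for the emptiest vicep on 'server', parsed
    from vos partinfo lines such as:
    'Free space on partition /vicepa: 123456 K blocks out of total 654321'"""
    out = subprocess.check_output(["vos", "partinfo", "-server", server],
                                  universal_newlines=True)
    best = (0, "")
    for line in out.splitlines():
        if line.startswith("Free space on partition"):
            words = line.split()
            best = max(best, (int(words[5]), words[4].rstrip(":")))
    return best

def place(volume_class):
    """Choose the server and partition in this class with the most free space."""
    (free_kb, part), server = max(
        (emptiest_partition(s), s) for s in CLASSES[volume_class])
    return server, part

if __name__ == "__main__":
    server, part = place("USR")
    print("new user volumes go on %s %s" % (server, part))
    # followed by e.g.: vos create -server <server> -partition <part> -name user.jdoe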


So, to boil this down: it works.  It can work, and it can actually work very
well, as long as the system is scaled and built properly.  When you
consolidate workloads in this manner, you can afford to buy better hardware
and storage.  You get HA everywhere.  Capacity that is heavily used during the
day is freed up and available at night for other workloads.

Perhaps one of the nicest features is that your operation becomes more vendor
agnostic.  When you have squeezed all the pennies out of your server or
storage hardware, moving to new hardware is much easier: add new hardware,
move VMs.

Hope this helps.  If anyone has questions, please reply to the list or to me
personally.  One of us will reply.

Thanks
-Matt

