There is at least one company (not a public reference) using LVS as an HA
vehicle for their solution in production on Linux for zSeries, and they are
very happy with it. We also did some experiments at another company that
tested LVS with a director in two different LPARs on two different machines:
when you bring down zVM in the LPAR running the primary director, the
secondary director takes over in the other LPAR and everything looks as if
nothing happened (well... you know what I mean).

The good thing about LVS is that you can have Linux on Intel or any other
platform serving as real servers. If you use tunneling, the real servers can
be located anywhere, and it supports Windows or any other operating system
that can run the application and speaks TCP/IP. I have tested having real
servers on both Linux for zSeries and Intel, and everything worked fine. I
know I am oversimplifying things, but it is really not difficult to set up
and test.

Carlos :-)
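
P.S. For what it's worth, the director side of the tunneling setup I'm
describing is roughly the sketch below. The addresses, port, and weights are
made up for illustration, and it assumes ipvsadm is installed and you run it
as root; the -i flag selects IP tunneling, which is what lets the real
servers sit anywhere.

#!/usr/bin/env python
# Illustrative only: configure an LVS director for a tunneled virtual
# service whose real servers may be zSeries or Intel Linux boxes.
# The VIP, real server addresses, and port are invented examples.
import subprocess

VIP = "192.0.2.10:80"                # virtual service (example address)
REAL_SERVERS = ["10.1.1.11:80",      # e.g. a Linux guest on zSeries
                "10.2.2.22:80"]      # e.g. a Linux box on Intel, anywhere reachable

def run(args):
    print(" ".join(args))
    subprocess.check_call(args)

# Create the virtual service with weighted least-connection scheduling.
run(["ipvsadm", "-A", "-t", VIP, "-s", "wlc"])

# Add each real server with IP-IP tunneling (-i); each real server also
# needs the VIP configured on a tunnel interface on its side.
for rs in REAL_SERVERS:
    run(["ipvsadm", "-a", "-t", VIP, "-r", rs, "-i", "-w", "1"])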


Saying goes: Great minds think alike - I say: Great minds think for
themselves!

Carlos A. Ordonez
IBM Corporation
Server Consolidation



From: David Boyes <dboyes@sinenomine.net>
Sent by: Linux on 390 Port <[EMAIL PROTECTED]RIST.EDU>
Date: 12/18/2002 09:59 PM
To: [EMAIL PROTECTED]
Please respond to: Linux on 390 Port
Subject: Re: High Availability




> Thanks for the reply.  We've actually already looked into using CSE if it
> happens to go on two VM systems.  Other possibilities are vm/linux and
> rs-6000/linux or intel-linux, or even a Unix system.  In other words, one
> on a VM guest linux and one somewhere else as the failover site.
> So, the question still remains, "What kind of software do people use on
> Linux/390 for high-availability clustering?"

It's a multipart problem. As Dave J mentioned, CSE is one part of the
solution when there are Linux/390 systems involved, but there are some
missing pieces before you can consider doing HA beyond a trivial
load-balancing solution, primarily the ability to do frame-based bridging
between internal LANs and external VLAN infrastructure without paying a
huge computational penalty.

I think there are a couple of cases here:

Case 1: HA for virtual machine implementations only. This has three
sub-cases: VM HA, Linux HA, and application HA.

Case 2: HA for mixed virtual machine/discrete machine, both Linux systems.
This has several sub-cases as well: VM HA, Linux HA, application HA, and HA
signaling processes.

Case 3: HA for mixed virtual machine/discrete machine, differing operating
systems. This has many sub-cases, some of which are VM HA, n cases of
discrete machine HA, network HA, HA signaling, etc.

At a crude level, layer 4 content switching (devices such as the Cisco
Content Switch (aka Local Director and Distributed Director) or the
Alteon/Big IP style solutions) is vendor neutral, and applies equally well
to all three cases for handling distribution of incoming traffic to a set
of systems. It addresses only one-way traffic (from the outside to the
server farm), and requires an agent of some type to run on the systems to
detect heartbeat and feed back load information to balance work. Failover
is handled in two sub-classes: system failure and system overload, both of
which are handled by eliminating the failing or overloaded server from
receiving new work. One still needs an HSRP/VRRP-style virtual address HA
network solution for outgoing traffic as well. The current VRRP code does
not work on Linux/390 systems using OSAs because the adapters do not return
an error when attempting to set a duplicate IP address on multiple adapters.
Adam Thornton has written a stopgap solution (VRT, available for download
from www.sinenomine.net) that provides a limited virtual address takeover
capability for Linux TCP stacks, and has recently added preferred interface
processing for virtual addresses, which allows better utilization of
multiple adapters.
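
To make the "virtual address takeover" idea concrete, the sketch below shows
the kind of logic such a tool implements; it is not the actual VRT or VRRP
code, and the peer endpoint, VIP, and interface name are invented. A real
implementation also announces the move to the network and guards against
split-brain, and on OSAs it runs into the duplicate-address problem
described above.

#!/usr/bin/env python
# Rough sketch of a backup director watching the primary and claiming the
# virtual address if the primary stops answering. All names are examples.
import socket, subprocess, time

VIP = "192.0.2.10"                 # virtual service address (example)
PEER = ("10.1.1.1", 80)            # primary's service endpoint (example)
DEV = "eth0"                       # interface to bring the VIP up on (example)
FAILURES_BEFORE_TAKEOVER = 3

def peer_alive(timeout=2.0):
    try:
        socket.create_connection(PEER, timeout=timeout).close()
        return True
    except OSError:
        return False

failures = 0
while True:
    failures = 0 if peer_alive() else failures + 1
    if failures >= FAILURES_BEFORE_TAKEOVER:
        # Claim the virtual address locally; real tools would also send a
        # gratuitous ARP and make sure the old owner has really let go.
        subprocess.check_call(["ip", "addr", "add", VIP + "/32", "dev", DEV])
        break
    time.sleep(2)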

Most people doing "HA" for Linux/390 are using this type of mechanism
today,
via an external load-balancing system.

In a more detailed case, you need to address each level in turn. At the VM
layer, you need both local clustering and remote clustering capabilities.
CSE provides the local capability in that systems in a CSE cluster can share
a CP directory and disk resources, allowing systems that normally reside on
a failed VM node to be brought up quickly on one of the remaining nodes in
the complex without having to worry about moving data around. ISFC, another
CP feature, provides distributed IUCV processing, which allows separating
the TCP stack from the application processing so that each can be scaled
independently of the network node processing (the old SNA "CHOST" concept,
where one system owned the FEPs and everyone else did cross-domain sessions,
is alive and well here). If a particular node fails, ISFC can reroute IUCV
traffic via other links to the remaining systems (think of a crude
counter-rotating token ring architecture where the ISFC cluster is connected
in a ring and one node fails). By starting with the VM layer and handling
the basic structure, you take advantage of a lot of goodies that Linux
doesn't necessarily need to know about -- it just sees the system keep
running.
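
As a simplified illustration of "bring the guests up on a surviving node":
with the CP directory and disks shared via CSE, a surviving member can in
principle just autolog the guests that normally live on the failed node.
The guest names below are invented, and the sketch assumes it runs in a
privileged Linux guest that can issue CP commands through vmcp; a real setup
would drive this from automation with proper fencing, not a bare loop.

#!/usr/bin/env python
# Hypothetical sketch: restart the failed node's guests on this node.
import subprocess

GUESTS_ON_FAILED_NODE = ["LNXWEB1", "LNXDB1"]   # invented guest names

for guest in GUESTS_ON_FAILED_NODE:
    # CP XAUTOLOG logs a guest on without operator interaction; with a
    # shared directory and shared disks it comes up with the same
    # definitions it had on the failed node.
    subprocess.check_call(["vmcp", "xautolog", guest])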

Once that's done, you can apply the existing Linux clustering tools over the
top of the VM tools. The RH "cluster" tool works OK (if you have zVM 4.3),
provided you have dedicated Linux systems configured as layer 2 bridges (not
routers -- just frame forwarders) to connect guest LANs between machines.
This is ugly and expensive in terms of CPU cycles, as these tools tend to
use a lot of broadcast and multicast techniques to keep things synced up.
Within a single box they're fine; however, that doesn't help the HA problem
much. Other clustering tools exist, but they have similar problems.
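
For the record, the "frame forwarder" configuration itself is nothing
exotic; on a dedicated Linux guest with two interfaces it amounts to
something like the sketch below (interface names are examples, and it
assumes the bridge-utils package). The point is that the guest copies
frames at layer 2 rather than routing at layer 3, which is exactly where
the CPU cycles go.

#!/usr/bin/env python
# Illustrative layer 2 bridge setup on a dedicated Linux guest.
# Interface names are examples only.
import subprocess

BRIDGE = "br0"
LEGS = ["eth0", "eth1"]   # one leg on the guest LAN, one toward the outside

def run(args):
    subprocess.check_call(args)

run(["brctl", "addbr", BRIDGE])               # create the bridge device
for dev in LEGS:
    run(["brctl", "addif", BRIDGE, dev])      # enslave each interface
    run(["ip", "link", "set", dev, "up"])
run(["ip", "link", "set", BRIDGE, "up"])      # no IP addresses or routing
                                              # needed; frames are forwarded
                                              # between the two legs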

Individual applications also have clustering capabilities (BEA uses a
multicast-based scheme, etc.). Once you have the first two layers set up,
it's up to the application to play nicely, and each one works differently.
The basic assumption for most of them is that there is a
broadcast/multicast-capable network available, for scalability reasons.
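
To see why that broadcast/multicast assumption bites, the sketch below is
roughly what such cluster members do on the wire: each node announces itself
to a multicast group and listens for the others (group address, port, and
payload are invented). Inside a single guest LAN this just works; once the
members are spread across machines, the network in between has to carry
those multicast frames, which is the expensive part on 390 hardware today.

#!/usr/bin/env python
# Minimal multicast announce/listen sketch; addresses are examples.
import socket, struct

GROUP, PORT = "239.1.2.3", 5007

# Listener: join the multicast group and wait for announcements.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
rx.bind(("", PORT))
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# Announcer: tell the group this node is alive.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
tx.sendto(b"node-A alive", (GROUP, PORT))

# Announcements (including our own, via multicast loopback) arrive here.
print(rx.recvfrom(1024))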

That's kind of the crux of the problem if you want to extend a solution
outside the box; the cost of doing layer 2 frame-forwarding into an external
VLAN is prohibitive -- you either have to dedicate a physical interface and
a Linux router and burn lots of CPU cycles to forward frames (on a slow,
expensive 390 CPU with low-density network interfaces) to an external VLAN,
or you keep it inside the boxes and forego really survivable HA.  I'm not
aware of a solution that works well for clustering virtual machines and
external systems, mostly because it's so expensive to do the layer 2 part.
I've heard rumors of VLAN trunking support for OSAs, but no official
comments yet -- this would be a Very Good Thing if it happened.  Another
reason not to toss those CIPs just yet -- the HSRP and VLAN trunking
support
in IOS is very handy for this sort of problem, especially because it
doesn't
consume host cycles, and can be shared between multiple physical 390 frames
(which you can't do with an OSA).

( It's kind of too bad that both hipersockets and guest LANs didn't take
some ideas from the ATM ELAN concepts. There are a lot of good ideas there
that would have made implementing multisystem HA complexes a lot simpler --
all the layer 2 and 3 forwarding stuff is completely worked out, documented
and tested. C'est la vie, I guess. )

In the heterogeneous case (virtual machine and non-Linux system), clustering
tools need a fair amount of information to work. Other than the crude layer
4 solutions, it's more of a confederation of separate solutions than a true
HA solution, with limited load balancing. I don't think anyone's managed to
do that one right yet.

So, after the long digression, the short answer I'd give is that the state
of the art for HA is still the external load balancing box for the most
part, with some HA being provided by individual applications or clusters of
virtual or real servers within a specific geographic complex providing
failover within the complex.   For linking internal and external servers,
the solutions available are prohibitively expensive until/unless
significant
advances in networking infrastructure support for the OSA and similar
devices are made. Work is being done, but at a relatively slow pace. Your
best bet right now is the external load balancing box for incoming network
traffic HA management and a combination of VRT for virtual servers and HSRP
for external servers for outgoing traffic. Each platform cluster will need
an HA solution of its own, and there are no stable, manageable,
cost-effective clustering tools for internal and external servers just yet.

 -- db
