Hello,

I've been testing an RHCS (CentOS 5.2) cluster with GFS1 for a while, and
I'm about to transition the cluster to production, so I'd appreciate a
quick review of the architecture and filesystem choices. I have some
concerns about GFS (1 & 2) stability and performance vs. ext3fs, but the
increased flexibility of a clustered filesystem has a lot of advantages.

If there are fundamental stability advantages to a design that does not
cluster the filesystems (i.e., one that uses GFS in lock_nolock mode or
ext3fs), that would outweigh any performance considerations.

Assuming that stability is not an issue, my basic question in choosing an
architecture is whether performance is better with multiple cluster nodes
accessing the same data via GFS (gaining some CPU and network load
balancing at the cost of the GFS locking penalty), or with each volume
served from a single server via NFS (using RHCS solely for fail-over).
Obviously, I don't expect anyone to provide definitive answers or data
that's unique to our environment, but I'd greatly appreciate your view on
the architecture choices.
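
For what it's worth, the kind of rough comparison I have in mind (mount
points and the tarball below are just placeholders) is to time a
metadata-heavy job such as untarring and deleting a kernel tree on each
candidate filesystem, first from one node and then from two nodes at once,
since the concurrent case is what should expose the GFS locking cost:

        # rough comparison sketch; paths and tarball are placeholders
        for fs in /gfs/scratch /nfs/scratch /ext3/scratch; do
            ( cd "$fs" && \
              time tar xf /tmp/linux-2.6.27.tar && \
              time rm -rf linux-2.6.27 )
        done
        # repeat the GFS run from two nodes concurrently to see lock_dlm contention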


Background:
    Our lab does basic science research on software to process medical
    images. There are about 40 lab members, with 15-25 logged in
    at any given time. Most people will be logged into multiple servers
    at once, with their home directory and all data directories provided
    via NFS at this time.

    The workload is divided between a software development environment
    (compile/test cycles) and image processing. The software development
    process is interactive, and includes algorithm testing which requires
    reading/writing multi-MB files.  There's a reasonably high performance
    expectation for interactive work, less so for the testing phase.

    Many lab members also mount filesystems from the servers to their
    desktop machines via SAMBA, and there is a high performance
    expectation for that access.

    The image processing is very strongly CPU-bound, but involves reading
    many image files in the 1 to 50MB range, and writing results files
    in the same range, along with smaller index and metadata files. The
    image processing is largely non-interactive, so the I/O performance
    is not critical.

The RHCS cluster will be used for infrastructure services (not as a
compute resource for image processing, not as login servers, not as
compilation servers). The primary services to be run on the clustered
machines are:

        network file sharing (NFS, Samba)
        SVN repository
        backup server (bacula, to fibre-attached tape drive)
        Wiki
        nagios

None of those services requires much CPU. The network file sharing could
benefit from load balancing, so that the NFS and SAMBA clients have
multiple network paths to the storage, but neither protocol is well
suited to load balancing through RHCS, so this may not be possible (using
LVS or a front-end hardware load balancer is not an option at this time;
HA is much more important than load balancing).
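
The crude alternative I can see is a static split: define two floating IPs
as separate cluster services, prefer one per node, and point half of the
clients at each, so that on a node failure both VIPs simply land on the
survivor. A minimal sketch, assuming hypothetical service and node names:

        # each service carries one VIP plus the exports that ride on it
        clusvcadm -e nfs_svc_a -m infra1    # enable/start service A on node infra1
        clusvcadm -e nfs_svc_b -m infra2    # enable/start service B on node infra2
        clustat                             # verify placement; after a failure
                                            # both services run on the survivor

That gives some load distribution in normal operation without pretending
RHCS is a real load balancer.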

The goals of using RHCS and clustering those functions are (in order of
importance):

        stability of applications
        high availability of applications
        performance
        expandability of filesystems (i.e., expand volumes at the SAN, LUN,
                LVM, and filesystem layers; a rough sketch of that path
                follows this list)
        expandability of servers (add more servers to the cluster, with
                machines dedicated to functions, as a crude form of load
                balancing)
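
For the filesystem-expansion goal, the path I have in mind looks roughly
like this (device, VG/LV, and mount-point names are placeholders; in the
GFS cases the VG would be clustered via clvmd):

        # grow at the SAN first (new LUN presented to the hosts), then:
        pvcreate /dev/mapper/mpath1
        vgextend vg_data /dev/mapper/mpath1
        lvextend -L +500G /dev/vg_data/lv_images
        gfs_grow /gfs/images                   # GFS1, grown while mounted
        # or, for the ext3 alternative:
        # resize2fs /dev/vg_data/lv_images     # ext3; online grow on CentOS 5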
        
The computing environment consists of:
        2 RHCS servers
                fibre attached to storage and backup tape device

        ~15TB EMC fibre-attached storage
        ~14TB fibre and iSCSI attached storage in the near future

        4 compute servers
                currently accessing storage via NFS, could be
                fibre-attached and configured as cluster members

        35 compute servers
                NFS-only access to storage, possibly iSCSI in the
                future, no chance of fibre attachment


As I see it, there are 3 possible architecture choices (rough command
sketches for them follow the outline):

        [1] infrastructure-only GFS+NFS
                the 2 cluster nodes share storage via GFS, and
                act as NFS servers to all compute servers

                + load balancing of some services
                - complexity of GFS
                - performance of shared GFS storage

        [2] shared storage/NFS
                2 cluster nodes and 4 fibre-attached compute servers
                share storage via GFS (all machines are RHCS nodes, but
                the compute nodes do not provide infrastructure services,
                just use cluster membership for GFS file access)

                each GFS node is potentially an NFS server (via a VIP) to
                the 35 compute servers that are not on the fibre SAN

                + potentially faster access to data for 4 fibre-attached
                  compute servers

                - potentially slow access to data for 4 fibre-attached
                  compute servers due to GFS locking

                + increased stability over a 2-node cluster
                - increased complexity
                
        [3] exclusive storage/NFS
                filesystems are formatted as ext3fs and mounted exclusively
                on one of the 2 infrastructure cluster nodes at a time;
                each filesystem resource includes a child (dependent) NFS
                export, so the node that owns the mount also acts as its
                NFS server, and all compute nodes access the data via NFS

                + reliability of filesystem
                + performance of filesystem
                - potential for corruption in case of non-exclusive access
                - decreased flexibility due to exclusive use
                - no potential for load balancing across cluster nodes
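
To make the three choices concrete, the rough command-level sketches I have
in mind are below (cluster, filesystem, service, and node names are all
placeholders). For [1]/[2], one shared GFS filesystem with a journal per
cluster node, mounted and exported from every node:

        # two journals for option [1]; six for option [2]'s six cluster nodes
        gfs_mkfs -p lock_dlm -t labcluster:images -j 2 /dev/vg_data/lv_images
        mount -t gfs /dev/vg_data/lv_images /gfs/images        # on each node
        # export from every node; keep the fsid identical across servers
        exportfs -o rw,sync,fsid=10 @computenodes:/gfs/images

For [3], each ext3 filesystem plus its NFS export and a floating IP would
form one failover service, owned by a single node at a time and moved as a
unit:

        clustat                            # see which node currently owns nfs_home
        clusvcadm -r nfs_home -m infra2    # relocate: umount on infra1, mount and
                                           # re-export on infra2
        # clients always mount the service's VIP, never a node's own hostname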

I'm very interested in getting your opinion of the choices, and would like
to learn about other ideas that I may have overlooked.

Thanks,

Mark


----
Mark Bergman                              voice: 215-662-7310
[email protected]                 fax: 215-614-0266
System Administrator     Section of Biomedical Image Analysis
Department of Radiology            University of Pennsylvania
      PGP Key: https://www.rad.upenn.edu/sbia/bergman

