Comparing vendors: scaling + resiliency

Smith, Curt H. Thu, 21 Dec 2000 09:00:52 -0800
I'm getting close to making a buy decision on an architecture
and appserver vendor for a very large-scale, mission-critical system:
hundreds of boxes, many thousands of services.

This must include facilities for control, monitoring and status of the whole
system.  SNMP and what else???  The vendors are weak here!

I'd like to hear what folks feel about current vendors and possibly what
might be in next releases if anyone's got some info?

Scaling, my views: efficient Name Service (NS) and robust client Stub

        - Clustering.
                To me, the appserver should support flexible asymmetric
                clusters.  The cluster should spin up new VMs when load
                demands.  Weblogic does neither, Gemstone does both, and
                Inprise supports asymmetric clusters, but I don't remember
                whether it spins up VMs.

        - Name service

           The best-of-breed NS that I've seen is in Weblogic server.  It
           uses a multicast group to keep all NS instances in a cluster in
           sync.  The worst choice is using commercial LDAP directory
           servers for the local store and LDAP replication to keep N
           instances in sync.  Our current appserver uses this architecture
           and we hate it for so many reasons.

        - Client side Stub:
                - All vendors support re-bind and re-dispatch on IOException.
                  Is there differentiation between the vendors?

                - Does anyone support a timeout on a method call??  I.e. an
                  SLA, with re-dispatch to a different instance when the
                  timeout occurs?  To me this is much lighter weight than
                  creating a client-side transaction context and setting a
                  transaction timeout.  The former approach would not
                  appreciably increase per-method latency, whereas
                  javax.transaction.UserTransaction would be a costly
                  operation for every method.

           How do other vendors stack up?

        - Load balancing.

                - Some vendors use a bulletin board of which services are
                  busiest and do method-level load balancing.

                  I'm not sure I want this all the time??  Another source of
                  latency and a bottleneck???

                - Bind-level load balancing.  All vendors that support
                  clustering support at least this form of load balancing,
                  unless they do method-level balancing.
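
To make the method-level timeout question concrete, here is a minimal Java
sketch (assuming Java 8 or later) of a client stub that bounds each call with
a deadline and re-dispatches to the next instance on timeout or IOException.
FailoverStub, RemoteCall and the rotating start index are hypothetical names
and choices for illustration, not any vendor's API.  The thread-pool handoff
is only a stand-in for the deadline mechanism; a real stub could instead set a
read timeout on the underlying connection.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical client-side stub: per-call timeout plus re-dispatch. */
final class FailoverStub<S> {

    /** Hypothetical shape of one remote method call against an instance. */
    interface RemoteCall<S, R> {
        R apply(S instance) throws IOException;
    }

    private final List<S> instances;      // replicas handed out at bind time
    private final long timeoutMillis;     // the per-call SLA
    private final AtomicInteger next = new AtomicInteger();
    // Daemon threads so abandoned (timed-out) calls never block JVM exit.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    FailoverStub(List<S> instances, long timeoutMillis) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("no instances bound");
        }
        this.instances = new ArrayList<>(instances);
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Tries each instance once, starting at a rotating index.
     * Only safe for idempotent calls: a call that timed out here may
     * still complete on the instance that was abandoned.
     */
    <R> R invoke(RemoteCall<S, R> call) throws IOException {
        int start = Math.floorMod(next.getAndIncrement(), instances.size());
        IOException last = null;
        for (int i = 0; i < instances.size(); i++) {
            S target = instances.get((start + i) % instances.size());
            Callable<R> task = () -> call.apply(target);
            Future<R> pending = pool.submit(task);
            try {
                return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                pending.cancel(true);     // best effort; then try the next one
                last = new IOException("timed out after " + timeoutMillis + " ms", e);
            } catch (ExecutionException e) {
                if (e.getCause() instanceof IOException) {
                    last = (IOException) e.getCause();   // re-dispatch
                } else {
                    throw new IllegalStateException(e.getCause());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting", e);
            }
        }
        throw last;   // non-null: the list is non-empty, so the loop ran
    }
}
```

A caller would wrap each idempotent method as
stub.invoke(svc -> svc.someMethod(args)).  No transaction context is created;
whether the deadline itself is cheap depends on how it is enforced.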

Resiliency / fault tolerance (clustered boxes):

        - NS, object activation and system status must not have a single
          point of failure.

          Any vendor differentiation here, or particularly bad designs??

        - Client Stub and NS provide re-dispatch of a failed (idempotent)
          method call.

          I need but haven't found method-level timeout.  Any vendors,
          present or future?

        - When a service / box fails, the NS must be quickly scrubbed of
          dead references.

        - If singleton services fail they must be restarted elsewhere.

          Does any vendor support this?
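
One common way to get dead references out of a name service quickly is to
treat every binding as a lease that the instance must keep renewing.  The
sketch below illustrates that idea only; LeasedRegistry and its methods are
hypothetical names, time is passed in explicitly to keep it deterministic, and
it says nothing about how any of the vendors above actually do it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical name-service table where each binding is a renewable lease. */
final class LeasedRegistry {

    private static final class Lease {
        final String service;
        final long expiresAt;

        Lease(String service, long expiresAt) {
            this.service = service;
            this.expiresAt = expiresAt;
        }
    }

    private final long leaseMillis;
    // instance id (e.g. "box7") -> its current lease
    private final Map<String, Lease> bindings = new ConcurrentHashMap<>();

    LeasedRegistry(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Called on bind and on every heartbeat from a live instance. */
    void renew(String service, String instanceId, long nowMillis) {
        bindings.put(instanceId, new Lease(service, nowMillis + leaseMillis));
    }

    /** Removes every binding whose lease has lapsed; returns the ids removed. */
    List<String> scrub(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Lease> e : bindings.entrySet()) {
            // remove(key, value) leaves a lease alone if it was renewed meanwhile
            if (e.getValue().expiresAt <= nowMillis
                    && bindings.remove(e.getKey(), e.getValue())) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }

    /** Returns only instances of the service whose lease is still valid. */
    List<String> lookup(String service, long nowMillis) {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Lease> e : bindings.entrySet()) {
            Lease lease = e.getValue();
            if (lease.service.equals(service) && lease.expiresAt > nowMillis) {
                live.add(e.getKey());
            }
        }
        return live;
    }
}
```

The list returned by scrub() is also a natural hook for the singleton case:
if a removed id was the only instance of a service, something can restart it
on another box.  How short the lease can be is a trade-off between detection
time and heartbeat traffic.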

Thanks for your thoughts and experiences!

Curt
Architect of our next-gen telephone system.


Curt Smith
Z-Tel
email:  [EMAIL PROTECTED]
work:   404-237-1166  x182
FAX:    404-237-1167
