[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699411#comment-13699411 ]
Robert Munteanu commented on SLING-2939:
----------------------------------------

[~egli] - JGroups can function well with either UDP or TCP. I actually switched to TCP at some point in my cluster for better reliability, but things may have changed in the last 4 years. As for dedicated servers vs. embedded: since JGroups is easily embeddable, I solved this problem by running a mini-app which just embedded JGroups and (IIRC) was designated to be the JGroups coordinator. Being always available and under low load, this application was a perfect fit for a coordinator. The details are a bit unclear to me right now, but I can dig them up if needed.

> 3rd-party based implementation of discovery.api
> -----------------------------------------------
>
>                 Key: SLING-2939
>                 URL: https://issues.apache.org/jira/browse/SLING-2939
>             Project: Sling
>          Issue Type: Task
>          Components: Extensions
>    Affects Versions: Discovery API 1.0.0
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>
> The Sling Discovery API introduces the abstraction of a topology which
> contains (Sling) clusters and instances, and supports liveliness detection,
> leader election within a cluster and property propagation between the
> instances. As a default and reference implementation, a resource-based, OOTB
> implementation was created (org.apache.sling.discovery.impl).
>
> Pros and cons of the discovery.impl
> Although the discovery.impl supports everything required by discovery.api, it
> has a few limitations. Here's a list of pros and cons:
>
> Pros
> - No additional software required (leverages the repository for intra-cluster
>   communication/storage and HTTP-REST calls for cross-cluster communication)
> - Very small footprint
> - Perfectly suited for a single cluster/instance and for small, rather
>   stable hub-based topologies
>
> Cons
> - Config-/deployment-limitations (aka embedded-limitation): connections
>   between clusters are peer-to-peer and explicit.
>   To span a topology, a number of instances must be made known to each
>   other, and changes in the topology typically require config adjustments
>   to guarantee high availability of the discovery service
>   - except if a natural "hub cluster" exists that can serve as connection
>     point for all "satellite clusters"; other than that, it is less suited
>     for large and/or dynamic topologies
> - Change propagation (for topology parts reported via connectors) is
>   non-atomic and slow, hop-by-hop based
> - No guarantee on the order of TopologyEvents sent in individual instances,
>   i.e. different instances might see different orders of TopologyEvents
>   (i.e. changes in the topology), but eventually the topology is guaranteed
>   to be consistent
> - Robustness of discovery.impl wrt storm situations depends on the
>   robustness of the underlying cluster (not a real negative, but
>   discovery.impl might in theory unveil repository bugs which would
>   otherwise not have been a problem)
> - Rather new, little-tested code which might have issues with edge cases
>   wrt network problems
>   - although partitioning support is not a requirement per se, similar
>     edge cases might exist wrt network delays/timing/crashes
>
> Reusing a suitable 3rd party library
> To provide an additional option as implementation of the discovery.api, one
> idea is to use a suitable 3rd party library.
>
> Requirements
> The following is a list of requirements a 3rd party library must support:
> - liveliness detection: detect whether an instance is up and running
> - stable leader election within a cluster: stable describes the fact that a
>   leader will remain leader until it leaves/crashes, and no new, joining
>   instance shall take over while a leader exists
> - stable instance ordering: the list of instances within a cluster is
>   ordered and stable; new, joining instances are put at the end of the list
> - property propagation: propagate the properties provided within one
>   instance to everybody in the topology.
>   There are no timing requirements bound to this, but the intention is not
>   for this to be used as messaging but to announce config parameters to the
>   topology
> - support large, dynamic clusters: configuration of the new discovery
>   implementation should be easy and support frequent changes in the (large)
>   topology
> - no single point of failure: this is obvious; there should of course be no
>   single point of failure in the setup
> - embedded or dedicated: this might be a hot topic. Embedding a library has
>   the advantage of not having to install anything additional; a dedicated
>   service, on the other hand, requires additional handling in deployment.
>   Embedding implies a peer-to-peer setup: nodes communicate peer-to-peer
>   rather than via a centralized service. This IMHO is a negative for large
>   topologies, which would typically span data centers; hence a dedicated
>   service could be seen as an advantage in the end.
> - due to the need for cross data-center deployments, the transport protocol
>   must be TCP (or HTTP for that matter)
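The "stable instance ordering", "stable leader election" and "property propagation" requirements above can be sketched in a few lines of plain Java. This is an illustrative model only, not Sling or JGroups code; the class and method names are hypothetical. The key idea is that members are kept in join order, joiners are appended at the end, and the leader is simply the longest-lived member, so a newly joining instance can never take over from an existing leader.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the stable-ordering / stable-leader semantics
 * described in the requirements list. A LinkedHashMap preserves insertion
 * (join) order, which gives both properties almost for free.
 */
public class StableClusterView {

    // instance id -> properties announced by that instance
    private final Map<String, Map<String, String>> members = new LinkedHashMap<>();

    /** A new instance joins: it is appended at the end of the ordering. */
    public void join(String instanceId, Map<String, String> properties) {
        members.putIfAbsent(instanceId, properties);
    }

    /** An instance leaves or crashes: the ordering of the rest is unchanged. */
    public void leave(String instanceId) {
        members.remove(instanceId);
    }

    /** Stable ordering: join order, never re-sorted. */
    public List<String> instances() {
        return new ArrayList<>(members.keySet());
    }

    /** Stable leader: the oldest surviving member. */
    public String leader() {
        return members.isEmpty() ? null : members.keySet().iterator().next();
    }

    /** Property propagation: any member can read any other member's properties. */
    public Map<String, String> propertiesOf(String instanceId) {
        return members.get(instanceId);
    }
}
```

Note that when the leader leaves, leadership moves to the next-oldest member rather than to a newcomer, which matches the "leader remains leader until it leaves/crashes" requirement.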