Really appreciate your time and explanation! Let's summarize the pros and cons of using the Observer versus the View Aggregator here. (I should have included this section in the design doc and will add it there.)
*Observer*

Pros:
- Existing ZK component, officially supported by the community and officially recommended for scaling read-only access
- Guarantees event sequence consistency

Cons:
- All data in ZK is propagated in real time no matter what, wasting cross-DC traffic, especially when Helix shares ZK with other services or when only a few clusters need to be aggregated
- A new interface with aggregation logic needs to be implemented and upgraded on the client side, which adds configuration, upgrade, and performance overhead
- Deployment overhead, since we need to maintain observer deployments in each DC for the different ZKs

*View Cluster and View Aggregator*

Pros:
- Easy to control which clusters, and which data inside each cluster, to aggregate, at what granularity, and with what semantics
- No change on the client side: the view cluster data model follows exactly the same semantics as a normal Helix cluster, so all current Helix libraries can be used to access the aggregated view (see the sketch below)
- Reduced deployment overhead, since we only need to maintain one deployment per DC and choose which ZKs to aggregate from by configuring the "source clusters"

Cons:
- No guarantee of event sequence consistency: events in the view cluster do not reflect the ordering / timing of those in the source clusters (though we can enforce such a sequence when needed by altering the implementation)
- Development overhead: if the aggregation requirements get more complicated, the implementation's complexity will grow

From the above analysis, I still think the View Cluster and View Aggregator provide value that the native ZK Observer cannot, which makes it worth developing such a service in the Helix ecosystem. What do you think?
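To make the "no change on the client side" point concrete, here is a minimal sketch of a spectator reading the aggregated view: it connects to the view cluster with the standard Helix client APIs, exactly as it would to any regular cluster. The cluster name "MyDBView", the instance name, and the ZK address are placeholders I made up for this example, not names from the design doc.

```java
import java.util.List;

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.model.InstanceConfig;
import org.apache.helix.spectator.RoutingTableProvider;

public class ViewClusterReader {
  public static void main(String[] args) throws Exception {
    // Connect to the view cluster the same way as to any normal Helix cluster.
    // "MyDBView", "view-reader", and the ZK address are placeholders.
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "MyDBView", "view-reader", InstanceType.SPECTATOR, "zk-local.example.com:2181");
    manager.connect();

    // Standard spectator API: the routing table is built from the aggregated
    // ExternalView stored in the view cluster, with no client-side changes.
    // (In a real client the snapshot may take a moment to populate.)
    RoutingTableProvider routingTable = new RoutingTableProvider(manager);
    List<InstanceConfig> masters =
        routingTable.getInstancesForResource("MyDB", "MASTER");
    masters.forEach(instance -> System.out.println(instance.getInstanceName()));

    manager.disconnect();
  }
}
```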
On Tue, Oct 23, 2018 at 8:21 PM kishore g <g.kish...@gmail.com> wrote:

> What you are proposing is no different from an Observer (without consistency guarantees) etc. Initially, this might look simple but once we start handling all the edge cases, it will start looking more like Observer.
>
> "Observers forward these requests to the Leader like Followers do, but they then simply wait to hear the result of the vote" The documentation is referring to the writes sent to observers. The use case we are trying to address involves only reading. Observers do not forward the read requests to leaders.
>
> On Tue, Oct 23, 2018 at 5:16 PM Hao Zhang <hzzh0...@gmail.com> wrote:
>
> > I agree that it is undoubtedly true that using the native ZooKeeper observer has the big advantage that it also provides clients with ordered events, but our use cases (i.e. administration, federation, or Ambry's data replication and serving requests from remote) are just not latency sensitive, and therefore strict event sequence enforcement is likely to be overkill.
> >
> > In addition, according to ZooKeeper's official documentation, "Observers forward these requests to the Leader like Followers do, but they then simply wait to hear the result of the vote". Since observers are just proxying requests, they cannot actually resolve our real pain point - the massive cross data center traffic generated when all storage nodes and routers in the Ambry cluster need to know information from all data centers - which makes it worthwhile to build a customized view aggregation service to "cache" information locally.
> >
> > —
> > Best,
> > Harry
> >
> > On Tue, Oct 23, 2018 at 16:24 kishore g <g.kish...@gmail.com> wrote:
> >
> > > It's better to use observers since the replication is timeline consistent, i.e. changes are seen in the same order as they happened on the originating cluster. Achieving correctness is easier with the observer model. I agree that we might have to replicate changes we don't care about, but changes to ZK are multiple orders of magnitude smaller than replicating a database.
> > >
> > > You can still have the aggregation logic as part of the client library.
> > >
> > > On Tue, Oct 23, 2018 at 2:02 PM zhan849 <g...@git.apache.org> wrote:
> > >
> > > > Github user zhan849 commented on a diff in the pull request:
> > > >
> > > >     https://github.com/apache/helix/pull/266#discussion_r227562948
> > > >
> > > > --- Diff: designs/aggregated-cluster-view/design.md ---
> > > > @@ -0,0 +1,353 @@
> > > > +Aggregated Cluster View Design
> > > > +==============================
> > > > +
> > > > +## Introduction
> > > > +Currently Helix organizes information by cluster - clusters are autonomous entities that hold resource / node information.
> > > > +In real practice, a Helix client might need to access aggregated information of Helix clusters from different data center regions for management or coordination purposes.
> > > > +This design proposes a service in the Helix ecosystem for clients to retrieve cross-datacenter information in a more efficient way.
> > > > +
> > > > +## Problem Statement
> > > > +We identified a couple of use cases for accessing cross datacenter information. [Ambry](https://github.com/linkedin/ambry) is one of them.
> > > > +Here is a simplified example: some service has a Helix cluster "MyDBCluster" in each of 3 data centers, and each cluster has a resource named "MyDB".
> > > > +To federate this "MyDBCluster", current usage is to have each federation client (usually a Helix spectator) connect to metadata store endpoints in all fabrics to retrieve information and aggregate it locally.
> > > > +Such usage has the following drawbacks:
> > > > +
> > > > +* As there are a lot of clients in each DC that need cross-DC information, there is a lot of expensive cross-DC traffic
> > > > +* Every client needs to know information about metadata stores in all fabrics, which
> > > > +  * Increases operational cost when this information changes
> > > > +  * Increases security concerns by allowing cross data center traffic
> > > > +
> > > > +To solve the problem, we have the following requirements:
> > > > +* Clients should still be able to GET/WATCH aggregated information from 1 or more metadata stores (likely but not necessarily from different data centers)
> > > > +* Cross-DC traffic should be minimized
> > > > +* Reduce the amount of information about data centers that a client needs
> > > > +* Agility of information aggregation can be configured
> > > > +* Currently, it's good enough to have only LiveInstance, InstanceConfig, and ExternalView aggregated
> > > > +
> > > > +## Proposed Design
> > > > +
> > > > +To provide an aggregated cluster view, the solution I'm proposing is to add a special type of cluster, i.e. a **View Cluster**.
> > > > +A view cluster leverages current Helix semantics to store aggregated information of various **Source Clusters**.
> > > > +There will be another micro service (Helix View Aggregator) running, fetching information from the clusters (likely in other data centers) to be aggregated, and storing it in the view cluster.
> > > > --- End diff --
> > > >
> > > > Though setting up an observer local to clients can potentially reduce cross data center traffic, it has a few drawbacks:
> > > > 1. All data changes will be propagated immediately, and if such information is not required frequently, there will be wasted traffic. Building a service makes it possible to customize aggregation granularity.
> > > > 2. Using a ZooKeeper observer leaves aggregation logic to the client - providing aggregated data will make it easier for users to consume.
> > > > 3. Building a service leaves room to customize the aggregated data in the future, i.e. if we want to aggregate ideal state, we might not need to aggregate the preference list, etc.
> > > >
> > > > Will add these points into the design doc
> > > >
> > > > ---
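Picking up on the "configure the source clusters" and aggregation-granularity points discussed above, the sketch below shows one way such a configuration could be modeled per source cluster and per data type. The `SourceCluster` type, the field names, and the ZK addresses are hypothetical and exist only for this discussion; they are not an existing Helix API. Only `PropertyType` comes from Helix itself.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.helix.PropertyType;

// Illustrative sketch only: SourceCluster is a hypothetical config type for this
// discussion, not part of the Helix codebase.
public class ViewAggregatorConfigSketch {

  static final class SourceCluster {
    final String zkAddress;          // metadata store in the remote DC
    final String clusterName;        // source cluster to aggregate from
    final List<PropertyType> scope;  // which data to pull across the DC boundary

    SourceCluster(String zkAddress, String clusterName, List<PropertyType> scope) {
      this.zkAddress = zkAddress;
      this.clusterName = clusterName;
      this.scope = scope;
    }
  }

  public static void main(String[] args) {
    // Only the listed property types would be fetched and written into the view
    // cluster, instead of streaming every ZK change as an Observer would.
    List<SourceCluster> sources = Arrays.asList(
        new SourceCluster("zk-dc1.example.com:2181", "MyDBCluster",
            Arrays.asList(PropertyType.LIVEINSTANCES, PropertyType.CONFIGS,
                PropertyType.EXTERNALVIEW)),
        new SourceCluster("zk-dc2.example.com:2181", "MyDBCluster",
            Arrays.asList(PropertyType.EXTERNALVIEW)));

    sources.forEach(s -> System.out.printf(
        "aggregate %s from %s (%s)%n", s.scope, s.clusterName, s.zkAddress));
  }
}
```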