Re: [openstack-dev] [tricircle] multiple cascade services
Hi Joe,

I think you misunderstood what Eran proposed. Eran proposed the "single service / multi-fake-node" scheme not only to enforce state sync (what ZooKeeper is usually used for), but also the execution *order*. It means that even if we implement it the way the PoC did, with multiple services and one service per bottom node, we would still need another upper layer that provides an ordered view of those cascade services.

I think what Eran proposed is to make Tricircle an independent service, as we envisioned. Tricircle would then represent a single cascade service, which presents state-synced, order-preserved bottom OpenStack instances to the top layer via one set of API or RPC call interfaces. When you deploy Tricircle, like any other OpenStack service, you run as many processes as necessary. Fake nodes would be spawned like any other processes, and there are well-known techniques to keep these fake nodes synced and ordered in an active/passive arrangement. These are, as Eran mentioned, implementation details.

In essence, fake nodes are just like the multiple cascade services running in parallel in the PoC design. However, to make Tricircle more like a standard OpenStack service, and to cooperate better with Mistral on task ordering, it would be a good idea to let Tricircle provide one abstract interface at the top and run the fake node processes inside.

My 2 cents, not sure if I got it all right :)

On Sat, Aug 29, 2015 at 9:42 AM, joehuang wrote:

> Hi,
>
> I think you may have some misunderstanding of the PoC design (the proxy
> node only listens for the RPC to the compute-node/cinder-volume/L2/L3
> agents…).
>
> 1) The cascading layer, including the proxy nodes, is assumed to run in
> VMs rather than on physical servers (though you can do that). Even in the
> CJK (China, Japan, Korea) intercloud, the cascading layer, including the
> API, message bus, DB and proxy nodes, runs in VMs.
>
> 2) With proxy nodes running in VMs, it is not strange for multiple proxy
> nodes to run on one physical server. If the load of one proxy node
> increases, it is easy to move the VM from one physical server to another;
> this is quite mature technology and easy to monitor and manage. Most
> virtualization platforms also support hot scale-up of a single VM.
>
> 3) In some scenarios ZooKeeper is already used to manage the proxy node
> role and membership, with a backup node taking over the responsibility of
> a failed node.
>
> So I did not see that the "fake node" mode brings extra benefit. On the
> other hand, the "fake node" adds complexity:
>
> 1) the complexity of the code in the cascade service, which must
> implement both the RPC to the scheduler and the RPC to the compute
> node / cinder volume.
>
> 2) how to judge the load of a "fake node". If all fake nodes run flatly
> in the same process (no dedicated process or thread, just a symbol), how
> can you judge the load of a fake node? By message count? Message count
> does not imply load; load is usually measured by CPU utilization and
> memory occupancy. So how do you calculate the load for each fake node and
> then decide which nodes to move to another physical server? How do you
> manage fake nodes in ZooKeeper-like clusterware? If you make each fake
> node a separate process or thread, you then have to manage the fake-node
> to process/thread relationship.
>
> I admit that proposal 3 is much more complex to make work for flexible
> load balancing. We have to record a relative stamp for each message in
> the queue, pick the message from the message bus, put it into a per-site
> task queue in the DB, and then execute the tasks in order.
>
> As described above, proposal 2 does not bring extra benefit, so if we do
> not want to strive for the third direction, we had better fall back to
> proposal 1.
>
> Best Regards
>
> Chaoyi Huang ( Joe Huang )
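The active/passive fake-node arrangement discussed above can be sketched in a few lines. This is a minimal Python sketch, not Tricircle code: the names FakeNode and SiteCluster are illustrative, and the "election" here is just "first node still alive" standing in for what ZooKeeper-style leader election would provide. It shows the key property Eran relies on: because only one node is active per bottom site and every task is stamped before dispatch, failover never reorders the site's task stream.

```python
import itertools

class FakeNode:
    """One 'fake node' process handling a single bottom OpenStack site.

    Only the current leader executes tasks; passive nodes stand by,
    so per-site task order is preserved by construction.
    """
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.log = []          # tasks this node actually executed

    def execute(self, stamped_task):
        self.log.append(stamped_task)

class SiteCluster:
    """Active/passive group of fake nodes for one bottom site.

    A stand-in for ZooKeeper-backed leader election; here the leader
    is simply the first node that is still alive.
    """
    def __init__(self, nodes):
        self.nodes = nodes
        self.seq = itertools.count()   # monotonic order stamp per site

    @property
    def leader(self):
        for node in self.nodes:
            if node.alive:
                return node
        raise RuntimeError("no live fake node for this site")

    def submit(self, task):
        # Stamp first, then hand to whoever is leader *now*; a failover
        # between submissions cannot reorder already-stamped work.
        stamped = (next(self.seq), task)
        self.leader.execute(stamped)
        return stamped

a, b = FakeNode("node-a"), FakeNode("node-b")
cluster = SiteCluster([a, b])
cluster.submit("create_network")
a.alive = False                        # the active node fails ...
cluster.submit("create_subnet")        # ... and the passive takes over
print([t for _, t in a.log])   # ['create_network']
print([t for _, t in b.log])   # ['create_subnet']
```

Whether the election is implemented with ZooKeeper, etcd, or something else is, as the thread says, an implementation detail; the sketch only illustrates the single-active-per-site invariant.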
Re: [openstack-dev] [tricircle] multiple cascade services
Hi,

I think you may have some misunderstanding of the PoC design (the proxy node only listens for the RPC to the compute-node/cinder-volume/L2/L3 agents…).

1) The cascading layer, including the proxy nodes, is assumed to run in VMs rather than on physical servers (though you can do that). Even in the CJK (China, Japan, Korea) intercloud, the cascading layer, including the API, message bus, DB and proxy nodes, runs in VMs.

2) With proxy nodes running in VMs, it is not strange for multiple proxy nodes to run on one physical server. If the load of one proxy node increases, it is easy to move the VM from one physical server to another; this is quite mature technology and easy to monitor and manage. Most virtualization platforms also support hot scale-up of a single VM.

3) In some scenarios ZooKeeper is already used to manage the proxy node role and membership, with a backup node taking over the responsibility of a failed node.

So I did not see that the "fake node" mode brings extra benefit. On the other hand, the "fake node" adds complexity:

1) the complexity of the code in the cascade service, which must implement both the RPC to the scheduler and the RPC to the compute node / cinder volume.

2) how to judge the load of a "fake node". If all fake nodes run flatly in the same process (no dedicated process or thread, just a symbol), how can you judge the load of a fake node? By message count? Message count does not imply load; load is usually measured by CPU utilization and memory occupancy. So how do you calculate the load for each fake node and then decide which nodes to move to another physical server? How do you manage fake nodes in ZooKeeper-like clusterware? If you make each fake node a separate process or thread, you then have to manage the fake-node to process/thread relationship.

I admit that proposal 3 is much more complex to make work for flexible load balancing. We have to record a relative stamp for each message in the queue, pick the message from the message bus, put it into a per-site task queue in the DB, and then execute the tasks in order.

As described above, proposal 2 does not bring extra benefit, so if we do not want to strive for the third direction, we had better fall back to proposal 1.

Best Regards

Chaoyi Huang ( Joe Huang )

From: e...@gampel.co.il [mailto:e...@gampel.co.il] On Behalf Of Eran Gampel
Sent: Thursday, August 27, 2015 7:07 PM
To: joehuang; Irena Berezovsky; Eshed Gal-Or; Ayal Baron; OpenStack Development Mailing List (not for usage questions); caizhiyuan (A); Saggi Mizrahi; Orran Krieger; Gal Sagie; Zhipeng Huang
Subject: Re: [openstack-dev][tricircle] multiple cascade services

Hi,

Please see my comments inline.

BR,

Eran

Hello,

As we discussed in yesterday's meeting, the point of contention is how to scale out cascade services.

1) In the PoC, one proxy node only forwards to one bottom OpenStack. The proxy node is added to the corresponding AZ, and multiple proxy nodes for one bottom OpenStack are feasible by adding more proxy nodes to that AZ; the proxy nodes are then scheduled as usual. Is this perfect? No. Because a VM's host attribute is bound to a specific proxy node, these multiple proxy nodes cannot work in cluster mode, and each proxy node has to be backed up by one slave node.

[Eran] I agree with this point. In the PoC you had a limitation of a single active proxy per bottom site. In addition, each proxy could only support a single bottom site by design.

2) The fake node is introduced in the cascade service. Because a fanout RPC call for the Neutron API is assumed, multiple fake nodes for one bottom OpenStack are not allowed.

[Eran] In fact, this is not a limitation of the current design. We could have multiple "fake nodes" handling the same bottom site, but only one that is active. If this active node becomes unavailable, one of the other "passive" nodes can take over through leader election or another well-known design pattern (it's an implementation decision).

And because the traffic to one bottom OpenStack is unpredictable, and moving these fake nodes dynamically among cascade services is very complicated, we cannot deploy multiple fake nodes in one cascade service.

[Eran] I'm not sure I follow you on this point... As we see it, there are three places where load is an issue (and a potential bottleneck):

1. API + message queue + database
2. The cascading service itself (dependency builder, communication service, DAL)
3. Task execution

I think you were concerned about #2, which in our design must be single-active per bottom site (to maintain the order of task execution). In our opinion, the heaviest part is actually #3 (task execution), which is delegated to a separate execution path (a Mistral workflow or otherwise).
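The ordering scheme the thread keeps returning to (stamp each message, queue per site, execute strictly in stamp order, delegate heavy execution elsewhere) can be sketched briefly. This is an illustrative Python sketch only, not Tricircle or Mistral code: the name CascadeDispatcher and the site/task strings are invented for the example, and a real deployment would persist the queues in a DB rather than in memory.

```python
import heapq
from collections import defaultdict

class CascadeDispatcher:
    """Single cascade service fronting several bottom sites.

    Each message taken off the bus carries a per-site stamp; tasks are
    queued per site and drained strictly in stamp order. Actual task
    execution is handed to a caller-supplied executor, mirroring the
    delegation to a separate execution path (e.g. a workflow engine).
    """
    def __init__(self):
        self.queues = defaultdict(list)   # site -> heap of (stamp, task)

    def enqueue(self, site, stamp, task):
        heapq.heappush(self.queues[site], (stamp, task))

    def drain(self, site, executor):
        """Run every queued task for one site in stamp order."""
        done = []
        q = self.queues[site]
        while q:
            stamp, task = heapq.heappop(q)
            executor(task)                # delegate the heavy work
            done.append((stamp, task))
        return done

d = CascadeDispatcher()
# messages may arrive out of order on the bus ...
d.enqueue("site-1", 2, "boot_vm")
d.enqueue("site-1", 1, "create_port")
d.enqueue("site-2", 1, "create_volume")
ran = []
d.drain("site-1", ran.append)
print(ran)   # ['create_port', 'boot_vm'] -- per-site order restored
```

Note how this separates the two concerns debated above: the dispatcher (bottleneck #2) stays single-active and cheap because it only orders tasks, while the executor callback absorbs the heavy load (bottleneck #3).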