Re: [openstack-dev] [tricircle] multiple cascade services

2015-08-30 Thread Zhipeng Huang
Hi Joe,

I think you misunderstood what Eran proposed.

Eran proposed a "single service/multi-fake-node" scheme not only to enforce
state sync (what ZooKeeper is usually used for), but also to preserve the
execution *order*.

It means that even if we implement it the way the PoC did (multiple services,
one service per bottom node), we still need another upper layer that provides
an ordered view of those cascade services.

I think what Eran proposed is to make Tricircle an independent service, as we
envisioned. Tricircle would then represent a single cascade service that
presents state-synced, order-preserved bottom OpenStack instances to the top,
via one set of API or RPC call interfaces.

When you deploy Tricircle, like any other OpenStack service, you run the
necessary processes. Fake nodes would be spawned like any other process, and
there are well-known techniques to keep these fake nodes synced/ordered in an
active/passive arrangement. These are, as Eran mentioned, implementation
details.
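
Just to make the active/passive idea concrete, a minimal sketch (assuming
ZooKeeper via the kazoo Python client, with made-up paths and a placeholder
serve_site() loop, not actual Tricircle code) could look like this:

# Hypothetical sketch: one active fake node per bottom site, elected via
# ZooKeeper.  Paths, identifiers and serve_site() are illustrative only.
import time

from kazoo.client import KazooClient


def serve_site(site_name):
    # Placeholder for the real work: consume the API/RPC calls destined for
    # this bottom OpenStack and forward them in order.
    while True:
        print("active fake node for %s, dispatching requests..." % site_name)
        time.sleep(10)


def run_fake_node(zk_hosts, site_name, node_id):
    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    # Every candidate blocks here; only the elected leader runs serve_site().
    # If the active node dies, its session expires and a passive node is
    # elected and takes over.
    election = zk.Election("/tricircle/fake-nodes/%s" % site_name, node_id)
    election.run(serve_site, site_name)


if __name__ == "__main__":
    run_fake_node("127.0.0.1:2181", "bottom-site-1", "fake-node-a")

Whichever fake node wins the election serves the site; if it dies, a passive
one takes over. This is exactly the kind of implementation detail Eran
referred to.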

In essence, fake nodes are just like the multiple cascade services running in
parallel in the PoC design. However, to make Tricircle more like a standard
OpenStack service, and to cooperate better with Mistral on task ordering, it
would be a good idea to let Tricircle provide one abstract interface at the
top and run the fake node processes inside.

My 2 cents, not sure if I got it all right :)




Re: [openstack-dev] [tricircle] multiple cascade services

2015-08-28 Thread joehuang
Hi,

I think you may have some misunderstanding of the PoC design (the proxy node
only listens to the RPC destined for the compute-node/cinder-volume/L2/L3 agents…).


1)  The cascading layer, including the proxy nodes, is assumed to run in VMs
rather than on physical servers (though you can do that). Even in the CJK
(China, Japan, Korea) intercloud, the cascading layer, including the API,
message bus, DB and proxy nodes, runs in VMs.



2)  With proxy nodes running in VMs, it is not unusual for multiple proxy
nodes to run on one physical server. If the load on one proxy node increases,
it is easy to move the VM from one physical server to another; this is mature
technology and easy to monitor and deal with. Most virtualization platforms
also support hot scale-up of a single virtual machine.



3)  In some scenarios ZooKeeper is already used to manage the proxy node role
and membership, and a backup node will take over the responsibility of a
failed node.
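
For reference, such membership management is typically done with ephemeral
znodes, e.g. with the kazoo client (the znode paths and takeover callback
below are only illustrative, not the PoC code):

# Hypothetical sketch of ZooKeeper-based proxy membership, using kazoo.
# Znode paths and the takeover callback are illustrative only.
import time

from kazoo.client import KazooClient

GROUP = "/cascading/proxy-nodes/bottom-site-1"

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path(GROUP)

# Each proxy registers an ephemeral znode; it disappears if the proxy dies.
zk.create(GROUP + "/proxy-backup", b"standby", ephemeral=True)


@zk.ChildrenWatch(GROUP)
def on_membership_change(children):
    # The backup takes over when the active proxy's znode is gone.
    if "proxy-active" not in children:
        print("active proxy is gone, backup taking over bottom-site-1")


while True:
    time.sleep(1)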


So I do not see that the “fake node” mode will bring extra benefit. On the
other hand, the “fake node” adds additional complexity:

1) The complexity of the code in the cascade service, which has to implement
both the RPC to the scheduler and the RPC to the compute node/cinder volume.

2) How to judge the load of a “fake node”. If all “fake nodes” run flatly (no
dedicated process or thread, just a symbol) in the same process, how can you
judge the load of a “fake node”? By message count? But message count does not
imply load. Load is usually measured through CPU utilization and memory
occupancy, so how do you calculate the load of each “fake node” and then
decide which nodes to move to another physical server? And how do you manage
these “fake nodes” in ZooKeeper-like cluster ware? You may want to run each
fake node in a separate process or thread space, but then you need to manage
the mapping between “fake nodes” and processes/threads.

I admit that proposal 3 is much more complex to make work for flexible load
balancing. We have to record a relative stamp for each message in the queue,
pick the message off the message bus, put the message into a per-site task
queue in the DB, and then execute the tasks in order.
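
As a rough illustration (in-memory structures standing in for the DB-backed
per-site task queue, names made up), proposal 3 implies something like:

# Hypothetical sketch of proposal 3: stamp messages, queue them per site,
# and execute each site's tasks strictly in stamp order.
import heapq
import itertools
from collections import defaultdict

_stamp = itertools.count()          # relative stamp generator
_site_queues = defaultdict(list)    # site -> heap of (stamp, task)


def enqueue(site, task):
    """Record a stamp for the message and put it into the site's task queue."""
    heapq.heappush(_site_queues[site], (next(_stamp), task))


def run_site(site):
    """Execute the site's tasks in stamp order (oldest first)."""
    queue = _site_queues[site]
    while queue:
        stamp, task = heapq.heappop(queue)
        print("site=%s stamp=%d executing %s" % (site, stamp, task))


enqueue("site-1", "create_port")
enqueue("site-2", "create_volume")
enqueue("site-1", "boot_vm")
run_site("site-1")   # create_port runs before boot_vm
run_site("site-2")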

As described above, proposal 2 does not bring extra benefit, and if we don’t
want to strive for the third direction, we had better fall back to proposal 1.

Best Regards
Chaoyi Huang ( Joe Huang )

From: e...@gampel.co.il [mailto:e...@gampel.co.il] On Behalf Of Eran Gampel
Sent: Thursday, August 27, 2015 7:07 PM
To: joehuang; Irena Berezovsky; Eshed Gal-Or; Ayal Baron; OpenStack Development 
Mailing List (not for usage questions); caizhiyuan (A); Saggi Mizrahi; Orran 
Krieger; Gal Sagie; Orran Krieger; Zhipeng Huang
Subject: Re: [openstack-dev][tricircle] multiple cascade services

Hi,
Please see my comments inline
BR,
Eran

Hello,

As we discussed in yesterday’s meeting, the point of contention is how to
scale out the cascade services.


1)  In the PoC, one proxy node only forwards to one bottom OpenStack. The
proxy node is added to the corresponding AZ, and multiple proxy nodes for one
bottom OpenStack are feasible by adding more proxy nodes into this AZ; the
proxy nodes are then scheduled as usual.



Is this perfect? No. Because the VM’s host attribute is bound to a specific
proxy node, these multiple proxy nodes can’t work in cluster mode, and each
proxy node has to be backed up by one slave node.



[Eran] I agree with this point. In the PoC you had a limitation of a single
active proxy per bottom site. In addition, each proxy could only support a
single bottom site by design.



2)  The fake node introduced in the cascade service.

Because a fanout RPC call for the Neutron API is assumed, multiple fake nodes
for one bottom OpenStack are not allowed.



[Eran] In fact, this is not a limitation in the current design. We could have
multiple "fake nodes" handling the same bottom site, but only one that is
active. If the active node becomes unavailable, one of the other "passive"
nodes can take over via leader election or any other known design pattern
(it's an implementation decision).
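
To illustrate why the fanout assumption matters: with oslo.messaging, a fanout
cast is delivered to every server listening on the topic, so every fake node
subscribed for a bottom site receives the same call, and only the elected
active one should act on it. A generic sketch (topic and method names are made
up, a configured message broker is assumed):

# Hypothetical sketch: a fanout cast reaches every listener on the topic,
# which is why only the elected active fake node should act on it.
# Topic and method names are made up; a configured transport_url is assumed.
from oslo_config import cfg
import oslo_messaging as messaging

transport = messaging.get_transport(cfg.CONF)

# Caller side: the fanout cast goes to all servers listening on this topic.
target = messaging.Target(topic="tricircle_fake_node")
client = messaging.RPCClient(transport, target)
client.prepare(fanout=True).cast({}, "update_port", port_id="1234")


class FakeNodeEndpoint(object):
    """Endpoint each fake node would register on the fanout topic."""

    def __init__(self, is_active):
        self.is_active = is_active

    def update_port(self, ctxt, port_id):
        # Every fake node receives the cast; only the active one forwards it
        # to the bottom site.
        if self.is_active:
            print("forwarding update_port(%s) to the bottom site" % port_id)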

And because the traffic to one bottom OpenStack is unpredictable, and moving
these fake nodes dynamically among cascade services is very complicated, we
can’t deploy multiple fake nodes in one cascade service.



[Eran] I'm not sure I follow you on this point... As we see it, there are 3
places where load is an issue (and a potential bottleneck):

1. API + message queue + database

2. Cascading Service itself (dependency builder, communication service, DAL)

3. Task execution



I think you were concerned about #2, which in our design must be single-active
per bottom site (to maintain the order of task execution).

In our opinion, the heaviest part is actually #3 (task execution), which is
delegated to a separate execution path (a Mistral workflow or otherwise).
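
To illustrate the delegation of #3, a minimal sketch using python-mistralclient
could look like the following (the workflow name, inputs, endpoint and auth
details are assumptions for illustration only, not an agreed interface):

# Hypothetical sketch: the cascade service keeps per-site ordering (#2) but
# hands the heavy task execution (#3) to Mistral.  Workflow name, inputs and
# auth details are illustrative assumptions only.
from mistralclient.api import client as mistral_client


def delegate_to_mistral(auth_url, token, site, task):
    mistral = mistral_client.client(mistral_url="http://mistral:8989/v2",
                                    auth_token=token,
                                    auth_url=auth_url)
    # Fire one workflow execution per ordered task; Mistral handles the
    # actual calls to the bottom OpenStack site.
    return mistral.executions.create(
        "tricircle.forward_task",
        workflow_input={"site": site, "task": task})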