[controller-dev] Re: Re: Re: Is Read from follower shard ok and openflowplugin master must be shard leader?

2019-06-04 Thread 杨燚
Robert, thank you so much for your insightful answers. I'm wondering if we can 
have a meeting dedicated to this topic. Last week we discussed it in the 
openflowplugin weekly meeting, and I believe you participated in the ODL DDF and 
joined the discussion about "ODL scale" that Luis presented. It would be great 
to have a meeting focused on this.

Abhijit, Anil, does my suggestion make sense? Robert is the strongest advocate 
for the akka-based ODL cluster.

Robert, but in an ODL cluster there is only one leader node no matter how many 
nodes we have. I think that is a bottleneck, isn't it? In my mind, a message 
queue is the only feasible way to synchronize data in a large-scale distributed 
application, and I'm not sure whether akka handles data synchronization the 
same way. I would like to get your view on this. I know akka uses gossip, but 
the leader node is responsible for synchronizing data to all the other follower 
nodes, and this is a big issue. In a message queue solution, the message 
servers handle this workload and the data producer sends the data just once; in 
the current ODL cluster, I think the leader node sends the data N-1 times, once 
to each of the other follower nodes. Please correct me if I'm wrong.

-Original Message-
From: Robert Varga [mailto:n...@hq.sk] 
Sent: June 5, 2019 3:20
To: Yi Yang (杨燚)-Cloud Service Group ; vishnoia...@gmail.com
Cc: avish...@luminanetworks.com; openflowplugin-...@lists.opendaylight.org; 
robert.va...@pantheon.tech; mdsal-...@lists.opendaylight.org; 
abhijit.kumbh...@ericsson.com; d...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Subject: Re: Re: [controller-dev] Re: Is Read from follower shard ok and 
openflowplugin master must be shard leader?

On 04/06/2019 02:29, Yi Yang (杨燚)-Cloud Service Group wrote:
> Robert, we're talking about scalability. Can you tell us how many nodes the 
> current akka-based clustering can support at most?

Yi,

I think we have a vocabulary (i.e. language) discrepancy. To be
clear:

- "performance" means how fast a system is when operating with a certain 
working set

- "scalability" means how well a system is able to maintain performance when 
the working set is increased. I think you may have meant this when you asked 
about IMDT "efficiency", but I can't be sure.

In a potentially-distributed system, there are two distinct parts which affect 
how the system can scale:

- "vertical scalability" means how well the system can be scaled by increasing 
resources available to individual nodes

- "horizontal scalability" means how well the system can be scaled by 
increasing the number of individual nodes

I think it is always a more efficient use of resources to allocate them to 
scaling vertically rather than horizontally -- each node participating in a 
distributed system typically requires non-zero overhead.
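The per-node overhead argument can be made concrete with a toy model. This is a hypothetical sketch, not a measurement of CDS: it assumes every node pays the same fixed overhead (membership, heartbeats, replication bookkeeping) and that capacity otherwise adds up linearly. The function names and numbers are illustrative only.

```python
"""Toy model: the same total resources spent on one big node vs. many small ones.

Hypothetical assumptions: each node pays a fixed `overhead`, and capacity
otherwise adds up linearly. Real systems are messier than this.
"""


def horizontal_capacity(nodes: int, per_node: float, overhead: float) -> float:
    """Spread the resources over `nodes` machines; each one pays `overhead`."""
    return nodes * (per_node - overhead)


def vertical_capacity(nodes: int, per_node: float, overhead: float) -> float:
    """Put the same total resources into a single machine; `overhead` is paid once."""
    return nodes * per_node - overhead


if __name__ == "__main__":
    for n in (1, 3, 5, 9):
        h = horizontal_capacity(n, per_node=100.0, overhead=10.0)
        v = vertical_capacity(n, per_node=100.0, overhead=10.0)
        print(f"{n} units of resources: horizontal={h:.0f} vertical={v:.0f}")
```

Under these assumptions the gap is (n - 1) * overhead and grows with every node added. A single machine can only grow so large, so horizontal scaling still has its place; the model only shows why it is not free.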

The number of potential nodes is limited by what Akka can provide us with -- 
which I see no problem with based on 
https://www.lightbend.com/blog/running-a-2400-akka-nodes-cluster-on-google-compute-engine.

> Per my understanding, current ODL clustering is more like a disaster backup 
> solution for data store, I don't think it can work correctly if we have 128 
> nodes there.

I am not sure what that understanding is based on. CDS uses an implementation 
of RAFT, which does not place artificial limits on the number of participating 
nodes.

I do not see any design issue with deploying CDS on such a large number of 
nodes. There may be bugs, but those are just bugs -- I do believe it
*will* work correctly.
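In RAFT the group size enters only through the majority rule, which is why there is no built-in cap on the number of voters. The sketch below shows that generic arithmetic; it is not CDS code, and the helper names are hypothetical. The leader still sends each log entry to all N-1 followers, but it commits once a majority (counting itself) has the entry.

```python
"""Generic RAFT majority arithmetic (illustrative, not the CDS implementation)."""


def quorum_size(voters: int) -> int:
    """Smallest majority of a RAFT voting group; the leader counts itself."""
    if voters < 1:
        raise ValueError("a RAFT group needs at least one voting member")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters may be unavailable while the group can still commit."""
    return voters - quorum_size(voters)


def follower_acks_needed(voters: int) -> int:
    """Follower acknowledgements the leader waits for before committing an entry."""
    return quorum_size(voters) - 1


if __name__ == "__main__":
    for n in (3, 5, 128):
        print(n, quorum_size(n), tolerated_failures(n), follower_acks_needed(n))
```

A larger group therefore costs more replication traffic and more acknowledgements per commit, which is a performance trade-off rather than a hard limit.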

> In a cloud environment, tenants are dynamically creating and destroying VMs, 
> which installs and removes flows very often, and openflow statistics also put 
> considerable stress on openflow. With the current openflowplugin clustering, 
> one ovs node is connected to 3 odl nodes over permanent tcp connections. How 
> many ovs nodes can 3 odl nodes support at most? Has anybody tested it? I think 
> it won't surpass 100.

That largely depends on what flows are loaded on the switches.

Yes, somebody tested it, and yes, it did surpass 100, thank you:
https://slides.com/dfarrell07/odl-perf/fullscreen#/1

> As I said, config inventory will have 2MB of data in a 3-node environment; you 
> can estimate how much data there will be if we have 1 nodes. Do you think the 
> current ODL replication mechanism can work well?

As I wrote previously, this heavily depends on the structure of the data, what 
the application does and how. It also depends on the software being used.

To get definitive answers, I do suggest running some tests and evaluating them.

> I know Pantheon has some commercial deployments in production environments; 
> can you tell us how many devices/nodes you can support at most in a 3-node 
> ODL cluster?

Not really, sorry.

Even if I could, the numbers depend on the particulars of a deployment and I 
have precious few details about what exactly it is you are doing and how -- 
and thus could not select the relevant data to share.

> Performance and scalability are two things, we always 

Re: [controller-dev] Re: Re: Re: Is Read from follower shard ok and openflowplugin master must be shard leader?

2019-06-06 Thread Robert Varga
On 05/06/2019 02:49, Yi Yang (杨燚)-Cloud Service Group wrote:
> Robert, thank you so much for your insightful answers. I'm wondering if we 
> can have a meeting dedicated to this topic. Last week we discussed it in the 
> openflowplugin weekly meeting, and I believe you participated in the ODL DDF 
> and joined the discussion about "ODL scale" that Luis presented. It would be 
> great to have a meeting focused on this.

Well, I believe in rough consensus and running code -- and I have not
seen anything aside from high-level slides so far.

> Abhijit, Anil, does my suggestion make sense? Robert is the strongest advocate 
> for the akka-based ODL cluster.

I am not arguing either way. What I am doing here is dispelling
misconceptions and trying to understand the *exact* nature of the problem.

> Robert, but in an ODL cluster there is only one leader node no matter how many 
> nodes we have. I think that is a bottleneck, isn't it?

That is not correct. Every Shard has its own leader and it is a matter
of how shards are laid out.
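To illustrate why a per-shard leader spreads the load, here is a hypothetical layout of six shards over a three-member cluster. The shard and member names and the round-robin policy are assumptions made for this sketch; in a real cluster each shard's leader is the outcome of that shard's own RAFT election, not of a placement function like this one.

```python
"""Hypothetical layout of per-shard leaders across a three-member cluster."""
from collections import Counter


def assign_leaders(shards: list[str], members: list[str]) -> dict[str, str]:
    """Round-robin placement: shard i is led by member i % len(members).

    Illustrative policy only; it stands in for the outcome of per-shard
    elections to show that leadership is not concentrated on one node.
    """
    return {shard: members[i % len(members)] for i, shard in enumerate(shards)}


if __name__ == "__main__":
    shards = ["default", "topology", "inventory", "toaster", "shard-5", "shard-6"]
    members = ["member-1", "member-2", "member-3"]
    leaders = assign_leaders(shards, members)
    for shard, leader in leaders.items():
        print(f"{shard:10s} -> {leader}")
    print(Counter(leaders.values()))
```

With this layout every member leads two shards, so write traffic for different shards lands on different nodes. How evenly it spreads in practice depends on how the data is split into shards and where the leaders end up.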

> In my mind, a message queue is the only feasible way to synchronize data in a 
> large-scale distributed application, and I'm not sure whether akka handles 
> data synchronization the same way. I would like to get your view on this. 
> I know akka uses gossip, but the leader node is responsible for 
> synchronizing data to all the other follower nodes, and this is a big issue. 
> In a message queue solution, the message servers handle this workload and the 
> data producer sends the data just once; in the current ODL cluster, I think 
> the leader node sends the data N-1 times, once to each of the other follower 
> nodes. Please correct me if I'm wrong.

I do not see how a message queue (directly) is solving the problem. At
the end of the day, every actor has a message queue, so what are we
really talking about?
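Counting the copies of a single replicated write makes the point concrete. This is a hypothetical sketch assuming N participating nodes, a single broker that is not itself replicated, and no acknowledgement traffic; the function names are illustrative.

```python
"""Copies sent for one replicated write: leader fan-out vs. a single broker.

Hypothetical assumptions: N participating nodes, one non-replicated broker,
acknowledgements not counted.
"""


def leader_fanout(nodes: int) -> dict[str, int]:
    """The leader sends one copy to each of the other nodes."""
    return {"leader": nodes - 1}


def broker_fanout(nodes: int) -> dict[str, int]:
    """The producer sends once to the broker, which delivers to the other nodes."""
    return {"producer": 1, "broker": nodes - 1}


if __name__ == "__main__":
    for n in (3, 5, 128):
        direct, brokered = leader_fanout(n), broker_fanout(n)
        print(n, direct, sum(direct.values()), brokered, sum(brokered.values()))
```

The broker moves the fan-out off the producer, but the N-1 deliveries still happen, plus one extra hop. If the broker is itself replicated for availability, it adds replication traffic of its own.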

This thread has been heavy on CDS criticism, without ever mentioning
what the applications are doing, how or why.

Before jumping to solutions, I would *really* like us to understand the
requirements and the problem we are faced with -- software engineering
is *engineering*, i.e. picking reasonable trade-offs from the solution
space for a given problem space and constraints.

Currently, there has been no data floated in this thread, so ... can we
follow UNPHAT, as per
https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb ?

Regards,
Robert



___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev