Robert, thank you so much for your insightful answers. I'm wondering if we can
have a meeting dedicated to this topic. Last week we discussed it in the
openflowplugin weekly meeting. I believe you participated in the ODL DDF and
joined the discussion about the "ODL scale" topic Luis presented. It would be
great to have a meeting focused on this.
Abhijit, Anil, does my suggestion make sense? Robert is the strongest advocate
for the akka-based ODL cluster.
Robert, in an ODL cluster there is only one leader node no matter how many
nodes we have. I think that is a bottleneck, isn't it? In my mind, a message
queue is the only feasible way to synchronize data in a larger-scale
distributed application, and I'm not sure whether akka handles data
synchronization the same way. I would like to hear your thoughts on this. I
know akka uses gossip, but the leader node is still responsible for
synchronizing data to all the follower nodes, which is a big issue. In a
message queue solution, the message servers handle this workload and the data
producer sends the data only once. In the current ODL cluster, I think the
leader node sends the data N-1 times, once to each follower node. Please
correct me if I'm wrong.
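To make the N-1 point concrete, here is a rough counting sketch in Python. It
is only a message-count model under simple assumptions (one send per follower
per write, no batching or pipelining) and does not describe how CDS or akka
actually move data; the function names are hypothetical. It also shows that a
broker moves the fan-out away from the producer rather than removing it.

```python
def leader_sends_per_write(cluster_size: int) -> int:
    """Sends made by the leader to replicate one write to every follower.

    Assumption: the leader sends exactly one message per follower.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size - 1


def broker_sends_per_write(subscriber_count: int) -> dict:
    """Sends per write when a broker sits between producer and subscribers.

    Assumption: the producer publishes once and the broker delivers one
    copy to each subscriber.
    """
    if subscriber_count < 0:
        raise ValueError("subscriber_count must not be negative")
    return {"producer": 1, "broker": subscriber_count}


if __name__ == "__main__":
    for nodes in (3, 5, 128):
        direct = leader_sends_per_write(nodes)
        via_broker = broker_sends_per_write(nodes - 1)
        print(nodes, "nodes: leader sends", direct, "| via broker:", via_broker)
```

In this model the total number of deliveries is the same in both cases; what
changes is which component carries them.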
-Original Message-
From: Robert Varga [mailto:n...@hq.sk]
Sent: June 5, 2019 3:20
To: Yi Yang (杨燚)-云服务集团 ; vishnoia...@gmail.com
Cc: avish...@luminanetworks.com; openflowplugin-...@lists.opendaylight.org;
robert.va...@pantheon.tech; mdsal-...@lists.opendaylight.org;
abhijit.kumbh...@ericsson.com; d...@lists.opendaylight.org;
controller-dev@lists.opendaylight.org
Subject: Re: Re: [controller-dev] Re: Is Read from follower shard ok and
openflowplugin master must be shard leader?
On 04/06/2019 02:29, Yi Yang (杨燚)-云服务集团 wrote:
> Robert, we're talking about scalability. Can you tell us how many nodes the
> current akka-based clustering can support at most?
Yi,
I think we have a vocabulary (i.e. language) discrepancy. To be clear:
- "performance" means how fast a system is when operating with a certain
working set
- "scalability" means how well a system is able to maintain performance when
the working set is increased. I think you may have meant this when you asked
about IMDT "efficiency", but I can't be sure.
In a potentially-distributed system, there are two distinct parts which affect
how the system can scale:
- "vertical scalability" means how well the system can be scaled by increasing
resources available to individual nodes
- "horizontal scalability" means how well the system can be scaled by
increasing the number of individual nodes
I think it is always a more efficient use of resources to allocate them to
scaling vertically rather than horizontally -- each node participating in a
distributed system typically requires non-zero overhead.
The number of potential nodes is limited by what Akka can provide us with --
which I see no problem with based on
https://www.lightbend.com/blog/running-a-2400-akka-nodes-cluster-on-google-compute-engine.
> Per my understanding, current ODL clustering is more like a disaster backup
> solution for data store, I don't think it can work correctly if we have 128
> nodes there.
I am not sure what that understanding is based on. CDS uses an implementation
of RAFT, which does not place artificial limits on the number of participating
nodes.
I do not see any design issue with deploying CDS on such a large number of
nodes. There may be bugs, but those are just bugs -- I do believe it
*will* work correctly.
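As a rough illustration of why the algorithm itself does not cap the node
count: Raft commits an entry once a majority of voting members have it. Below
is a minimal sketch of that arithmetic, assuming every node is a voting member.
The function names are hypothetical, and this says nothing about how CDS
configures voting or how it performs at that size.

```python
def raft_quorum(voters: int) -> int:
    """Smallest number of voters that forms a majority."""
    if voters < 1:
        raise ValueError("voters must be at least 1")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be down while a majority is still reachable."""
    return voters - raft_quorum(voters)


if __name__ == "__main__":
    for voters in (3, 5, 128):
        print(voters, "voters: quorum", raft_quorum(voters),
              "| tolerated failures", tolerated_failures(voters))
```

For 3 voters this gives a quorum of 2; for 128 voters, a quorum of 65.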
> In a cloud environment, tenants are dynamically creating and destroying VMs,
> which installs and removes flows very often, and openflow statistics are also
> a non-trivial load for openflow. Per current openflowplugin clustering, one
> ovs node is connected to 3 odl nodes over permanent tcp connections. How
> many ovs nodes can 3 odl nodes support at most? Has anybody tested it? I think
> it won't surpass 100.
That largely depends on what flows are loaded on the switches.
Yes, somebody tested it, and yes, it did surpass 100, thank you:
https://slides.com/dfarrell07/odl-perf/fullscreen#/1
> As I said, config inventory will have 2MB data in a 3-node environment, you
> can evaluate how much data is there if we have 1 nodes, do you think
> current ODL replication mechanism can work well?
As I wrote previously, this heavily depends on the structure of the data, what
the application does and how. It also depends on the software being used.
To get definitive answers, I do suggest running some tests and evaluating them.
> I know Pantheon has some commercial deployment in production environments,
> can you tell us how many devices/nodes you can support at most in a 3 node
> ODL cluster?
Not really, sorry.
Even if I could, the numbers depend on the particulars of a deployment and I
have precious few details about what exactly it is you are doing and how --
and thus could not select the relevant data to share.
> Performance and scalability are two things, we always