Hi again Joe, (+ list)

On 11/04/15 02:00, joehuang wrote:
Hi, Neil,

See inline comments.

Best Regards

Chaoyi Huang

________________________________________
From: Neil Jerram [neil.jer...@metaswitch.com]
Sent: 09 April 2015 23:01
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

Hi Joe,

Many thanks for your reply!

On 09/04/15 03:34, joehuang wrote:
Hi, Neil,

  In theory, Neutron is like a "broadcast" domain: for example, enforcement of DVR and
security group rules has to touch every host where a VM of the project resides. Even using an SDN
controller, the "touch" to each such host is inevitable. If there are plenty of physical hosts, for
example 10k, inside one Neutron deployment, it's very hard to overcome the "broadcast storm" issue under
concurrent operation; that's the bottleneck for the scalability of Neutron.

I think I understand that in general terms - but can you be more
specific about the broadcast storm?  Is there one particular message
exchange that involves broadcasting?  Is it only from the server to
agents, or are there 'broadcasts' in other directions as well?

[[joehuang]] For example, L2 population, security group rule updates, and DVR route
updates. These go in both directions, depending on the scenario.
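To make the fan-out concern concrete, here is a hypothetical sketch (not Neutron code; the function and parameter names are illustrative): each update event must be delivered to every host that has an affected VM, so the control-plane message count grows with the product of event rate and host count.

```python
# Hypothetical illustration of control-plane fan-out (not Neutron code):
# one update (e.g. a security group rule change) produces one RPC per
# host that runs an affected VM, so N concurrent updates across H
# affected hosts cost N * H messages in total.

def fanout_messages(updates: int, affected_hosts: int) -> int:
    """Total messages when each update is broadcast to all affected hosts."""
    return updates * affected_hosts

# 100 concurrent updates in a project spanning 10k hosts:
print(fanout_messages(100, 10_000))  # -> 1000000
```

This is why the thread calls it a "broadcast storm": the cost is linear in the number of hosts per event, so concurrent events multiply it.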

Thanks. In case it's helpful to see all the cases together, sync_routers (from the L3 agent) was also mentioned in another part of this thread. Plus, of course, the liveness reporting from all agents.

(I presume you are talking about control plane messages here, i.e.
between Neutron components.  Is that right?  Obviously there can also be
broadcast storm problems in the data plane - but I don't think that's
what you are talking about here.)

[[joehuang]] Yes, the control plane here.

Thanks for confirming that.

We need a layered architecture in Neutron to solve the "broadcast domain" bottleneck of
scalability. The test report from OpenStack cascading shows that through the layered architecture
"Neutron cascading", Neutron can support up to a million ports and on the order of 100k
physical hosts. You can find the report here:
http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Many thanks, I will take a look at this.

It was very interesting, thanks. And by following through your links I also learned more about Nova cells, and about how some people question whether we need any kind of partitioning at all, and should instead solve scaling/performance problems in other ways... It will be interesting to see how this plays out.

I'd still like to see more information, though, about how far people have scaled OpenStack - and in particular Neutron - as it exists today. Surely having a consensus set of current limits is an important input into any discussion of future scaling work.

For example, Kevin mentioned benchmarking where the Neutron server processed a liveness update in <50ms and a sync_routers in 300ms. Suppose the liveness update time is 50ms (since I don't know in detail what that < means) and agents report liveness every 30s. Does that mean that a single Neutron server can only support 600 agents?
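The back-of-envelope bound behind that 600 figure can be sketched as follows (assuming, as a simplification, that the server processes liveness reports serially and does nothing else; the function name is illustrative):

```python
# Rough capacity estimate: if a serial server spends update_cost_ms
# milliseconds on each agent liveness report, and every agent reports
# once per report_interval_s seconds, the server can keep up with at
# most (report_interval_s * 1000) / update_cost_ms agents before the
# reports alone saturate it.

def max_agents(update_cost_ms: int, report_interval_s: int) -> int:
    """Upper bound on agents a serial server can service with liveness reports."""
    return (report_interval_s * 1000) // update_cost_ms

# 50 ms per update, agents reporting every 30 s:
print(max_agents(50, 30))  # -> 600
```

In practice the server handles requests concurrently and the <50ms figure may be much lower, so this is a worst-case bound rather than a hard limit.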

I'm also especially interested in the DHCP agent, because in Calico we have one of those on every compute host. We've just run tests which appeared to be hitting trouble from just 50 compute hosts onwards, and apparently because of DHCP agent communications. We need to continue looking into that and report findings properly, but if anyone already has any insights, they would be much appreciated.

Many thanks,
        Neil

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
