On Fri, Jun 9, 2017 at 5:25 AM, Dmitry Tantsur <dtant...@redhat.com> wrote:
> On 06/08/2017 02:21 PM, Justin Kilpatrick wrote:
>>
>> Morning everyone,
>>
>> I've been working on a performance testing tool for TripleO hardware
>> provisioning operations off and on for about a year now, and I've been
>> using it to collect more detailed data about how TripleO performs in
>> scale and production use cases. Perhaps more importantly, YODA (Yet
>> Openstack Deployment Tool, Another) automates the task enough that
>> days of deployment testing become a set-it-and-forget-it operation.
>>
>> You can find my testing tool here [0], and the test report [1] has
>> links to raw data and visualizations. Just scroll down, click the
>> captcha and click "go to kibana". I still need to port that machine
>> from my own solution over to Search Guard.
>>
>> If you have too much email to consider clicking links, I'll copy the
>> results summary here.
>>
>> TripleO inspection workflows have seen massive improvements since
>> Newton, with the failure rate for 50 nodes using the default workflow
>> falling from 100% to <15%. With patches slated for Pike, that spurious
>> failure rate drops to zero.
>
> \o/
>
>>
>> Overcloud deployments show a significant improvement in deployment
>> speed in HA and stack update tests.
>>
>> Ironic deployments in the overcloud allow the use of Ironic for
>> bare-metal scale-out alongside more traditional VM compute.
>> Considering that a single conductor starts to struggle around 300
>> nodes, it will be difficult to push a multi-conductor setup to its
>> limits.
>
> This number of "300", does it come from your testing or from other sources?
Dmitry - The "300" comes from my testing on different environments. Most
recently, here is what I saw at CNCF -
https://snapshot.raintank.io/dashboard/snapshot/Sp2wuk2M5adTpqfXMJenMXcSlCav2PiZ
The undercloud was "idle" during this period.

> If the former, which driver were you using?

pxe_ipmitool.

> Exactly what problems have you seen approaching this number?

I would have to restart ironic-conductor before every scale-up; here is
what ironic-conductor looks like after a restart -
https://snapshot.raintank.io/dashboard/snapshot/Im3AxP6qUfMnTeB97kryUcQV6otY0bHP
(a rough sketch of that restart step is at the bottom of this mail).
Without restarting ironic, the scale-up would fail because of ironic (I
did not document the exact error we encountered).

>
>>
>> Finally, Ironic node cleaning shows a similar failure rate to
>> inspection and will require similar attention in TripleO workflows to
>> become painless.
>
> Could you please elaborate? (A bug report could also help.) What
> exactly were you doing?
>
>>
>> [0] https://review.openstack.org/#/c/384530/
>> [1] https://docs.google.com/document/d/194ww0Pi2J-dRG3-X75mphzwUZVPC2S1Gsy1V0K0PqBo/
>>
>> Thanks for your time!
>
> Thanks for YOUR time, this work is extremely valuable!
>
>>
>> - Justin

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
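
For anyone who wants to automate the restart workaround mentioned above,
a minimal sketch follows. This is not part of YODA; it only illustrates
the idea, and it assumes a systemd-managed undercloud where the conductor
unit is named "openstack-ironic-conductor" and the pxe_ipmitool driver is
enabled - adjust both for your environment.

    #!/usr/bin/env python
    # Hypothetical helper: restart ironic-conductor before a scale-up and
    # wait for it to re-register its drivers before continuing.
    import subprocess
    import time

    CONDUCTOR_UNIT = "openstack-ironic-conductor"  # assumed unit name
    DRIVER = "pxe_ipmitool"                        # driver used in these tests

    def restart_conductor(timeout=300):
        # Restart the conductor service on the undercloud.
        subprocess.check_call(["sudo", "systemctl", "restart", CONDUCTOR_UNIT])
        # Poll the driver list; the conductor only shows up there again
        # once it has re-registered.
        deadline = time.time() + timeout
        while time.time() < deadline:
            out = subprocess.check_output(
                ["openstack", "baremetal", "driver", "list"])
            if DRIVER.encode() in out:
                return True
            time.sleep(10)
        return False

    if __name__ == "__main__":
        ok = restart_conductor()
        print("conductor ready" if ok else "conductor did not re-register in time")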