Re: [openstack-dev] [openstack-ansible] Debugging slow Xenial gate

2016-11-10 Thread Major Hayden
On 11/02/2016 08:51 AM, Major Hayden wrote:
> At this point, I'm still trying to test some additional theories. Does anyone 
> have any other ideas?

Here's an update for today.  There are a few bugs open now:

  OpenStack-Ansible bug: https://bugs.launchpad.net/openstack-ansible/+bug/1637494
  Ubuntu python2.7 bug: https://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/1638695

The suggestion from the python2.7 bug is to compile python 2.7.12 with gcc-4.8 
on 16.04 to see if the performance issue is related to GCC.  I haven't had a 
chance to test that out yet, but if someone else has a moment to try it, I'd be 
much obliged. ;)

There is also a private bug opened with Canonical that has been escalated as 
part of my company's support contract with Canonical.  I'll provide relevant 
updates from that bug when I get them.

--
Major Hayden



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [openstack-ansible] Debugging slow Xenial gate

2016-11-02 Thread Jesse Pretorius
> On 11/2/16, 1:51 PM, "Major Hayden" wrote:
> I tossed up a horribly written hack[0] to change some CPU scheduler 
> settings back to the Trusty settings.  My initial tests were great!  Also, 
> the first test in OpenStack CI was really good -- 62 minutes for Trusty and 
> 65 minutes for Xenial.  However, that seems to be a fluke since the second 
> test had a 30 minute gap between the test durations. :(

I think that difference was due to the hardware/contention profiles of the 
different nodepool providers. You'll have to run the tests somewhere you can 
execute on a consistent hardware profile, ideally with no other contention on 
the host, in order to get reliable comparisons.
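
As a rough sketch of what a more reliable comparison could look like, the 
snippet below repeats a workload several times and reports min/median/max 
rather than trusting a single run. The workload() function is a hypothetical 
stand-in for a real test run, and the repeat count is an arbitrary choice.

```python
import statistics
import time


def time_runs(func, repeats=5):
    """Run func() `repeats` times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min, median, and max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def workload():
    # Hypothetical stand-in for the real job being timed.
    sum(i * i for i in range(100000))


if __name__ == "__main__":
    print(summarize(time_runs(workload)))
```

A large spread between min and max on the same host suggests contention, in 
which case a single Trusty-vs-Xenial pair of runs says very little.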

I think Logan may be able to help with that. Alternatively perhaps you can get 
access to an OSIC host or instance for testing?






Re: [openstack-dev] [openstack-ansible] Debugging slow Xenial gate

2016-11-02 Thread Major Hayden
On 10/28/2016 04:02 AM, Major Hayden wrote:
> On the topic of threads, the sysbench output from both Trusty and Xenial is 
> nearly identical with the exception of threads.  Trusty is usually about 
> 15-20% faster on that benchmark than Xenial.

I spoke with a few other people and it seems like the culprit could be a CPU 
scheduler difference and/or a glibc change.  After messing around with perf for 
a long time, I found that context switches and CPU migrations were slightly 
higher on Xenial than Trusty, but by a negligible amount (< 10% at worst).
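
For anyone who wants a quick look at context switches without perf, here is a 
minimal sketch using Python's resource module (Unix only). It is a much 
cruder measurement than perf: it covers only the current process and does not 
report CPU migrations. The workload() function is a hypothetical placeholder.

```python
import resource
import time


def context_switches():
    """Return voluntary and involuntary context switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {"voluntary": usage.ru_nvcsw, "involuntary": usage.ru_nivcsw}


def measure(func):
    """Run func() and return how many context switches happened while it ran."""
    before = context_switches()
    func()
    after = context_switches()
    return {key: after[key] - before[key] for key in before}


def workload():
    # Hypothetical placeholder: sleeping forces the process off the CPU.
    for _ in range(10):
        time.sleep(0.001)


if __name__ == "__main__":
    print(measure(workload))
```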

I tossed up a horribly written hack[0] to change some CPU scheduler settings 
back to the Trusty settings.  My initial tests were great!  Also, the first 
test in OpenStack CI was really good -- 62 minutes for Trusty and 65 minutes 
for Xenial.  However, that seems to be a fluke since the second test had a 30 
minute gap between the test durations. :(

Those scheduler changes for busy_factor, min_interval, and max_interval appear 
to have been made in the upstream Linux kernel, and they're present on various 
distributions like Ubuntu, CentOS, and Fedora.
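
To compare those values between hosts, something like the sketch below could 
dump them. It assumes the tunables are exposed under 
/proc/sys/kernel/sched_domain/cpuN/domainM/, which depends on the kernel 
version and on scheduler debugging being enabled in the kernel build; on other 
kernels the directory may not exist, in which case the function simply returns 
an empty dict.

```python
import glob
import os

# The three tunables discussed above.
TUNABLES = ("busy_factor", "min_interval", "max_interval")


def read_sched_domain_tunables(base="/proc/sys/kernel/sched_domain"):
    """Return {relative_path: value} for each tunable found under `base`.

    The default `base` is an assumption about where the running kernel
    exposes these files; pass a different directory if yours differs.
    """
    results = {}
    for name in TUNABLES:
        pattern = os.path.join(base, "cpu*", "domain*", name)
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path) as handle:
                    results[os.path.relpath(path, base)] = handle.read().strip()
            except OSError:
                # Unreadable entries are skipped rather than treated as fatal.
                continue
    return results


if __name__ == "__main__":
    for path, value in sorted(read_sched_domain_tunables().items()):
        print("%s = %s" % (path, value))
```

Running it on a Trusty host and a Xenial host and diffing the output should 
show whether the defaults actually differ on the machines being tested.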

At this point, I'm still trying to test some additional theories. Does anyone 
have any other ideas?

[0] https://review.openstack.org/392316

--
Major Hayden





Re: [openstack-dev] [openstack-ansible] Debugging slow Xenial gate

2016-10-28 Thread Major Hayden
On 10/28/2016 10:17 AM, Major Hayden wrote:
>> Also, when running the tests on both systems, track cpu usage and number
>> of threads to see if one has more restrictions than the other.
> Almost no difference here.

On the topic of threads, the sysbench output from both Trusty and Xenial is 
nearly identical with the exception of threads.  Trusty is usually about 15-20% 
faster on that benchmark than Xenial.

That leads me to rule out a few things:

  1) It's probably not python that is slow since it affects sysbench, too
  2) The kernel version doesn't seem to make a difference
  3) The way python was compiled doesn't matter (I tried pyenv)
  4) Kernel tunables (via sysctl) look very similar, especially with regard to
     threads

I also ran the full suite of tests from nova and got these results:

  Trusty: 375 seconds
  Xenial: 531 seconds
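
To put those two numbers in relative terms, here is the arithmetic as a small 
sketch. The slowdown() helper is just an illustrative name; the inputs are the 
nova durations listed above.

```python
def slowdown(baseline_seconds, candidate_seconds):
    """Return (ratio, percent_slower) of the candidate relative to the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    ratio = candidate_seconds / baseline_seconds
    return ratio, (ratio - 1.0) * 100.0


# Durations in seconds from the nova test run above.
trusty, xenial = 375.0, 531.0
ratio, percent = slowdown(trusty, xenial)
print("Xenial/Trusty ratio: %.2f (%.0f%% slower)" % (ratio, percent))
```

That works out to Xenial taking about 1.42 times as long as Trusty on this 
suite.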
 
--
Major Hayden





Re: [openstack-dev] [openstack-ansible] Debugging slow Xenial gate

2016-10-28 Thread Major Hayden
On 10/28/2016 01:44 AM, Mike Carden wrote:
> I bounced this off my 'distro differences' go-to guy, Chris Smart. Here are 
> his thoughts:
> 
> "Run the 14.04 kernel on a 16.04 system and re-run the tests to see if it's
> kernel related.
> 
> If 16.04 userland with 14.04 kernel is as fast as Ubuntu 14.04, then
> compare the kernel .config files to see if there were major changes,
> like switching out schedulers.

14.04 with 16.04's kernel is actually just a small amount (~ 3-5%) faster than 
14.04 with its standard kernel.

> Also, when running the tests on both systems, track cpu usage and number
> of threads to see if one has more restrictions than the other.

Almost no difference here.

> Check swappiness and also "vmstat 1" to see if you're getting more pages
> swapped in and out in 16.04.

No difference here, either.

> I'm assuming that the two virtual machines are identical (CPU type, memory,
> threads, virtio, etc)."

They are!  We've seen this occur in the OpenStack CI jobs (with KVM), and I've 
also tested this with Xen and bare metal.

--
Major Hayden





Re: [openstack-dev] [openstack-ansible] Debugging slow Xenial gate

2016-10-27 Thread Mike Carden
Major,

I bounced this off my 'distro differences' go-to guy, Chris Smart. Here are
his thoughts:

"Run the 14.04 kernel on a 16.04 system and re-run the tests to see if it's
kernel related.

If 16.04 userland with 14.04 kernel is as fast as Ubuntu 14.04, then
compare the kernel .config files to see if there were major changes,
like switching out schedulers.

Also, when running the tests on both systems, track cpu usage and number
of threads to see if one has more restrictions than the other.

Check swappiness and also "vmstat 1" to see if you're getting more pages
swapped in and out in 16.04.

I'm assuming that the two virtual machines are identical (CPU type, memory,
threads, virtio, etc)."

-- 
MC


[openstack-dev] [openstack-ansible] Debugging slow Xenial gate

2016-10-27 Thread Major Hayden
Hey there,

We talked about the slow Xenial gate during the OpenStack Summit this week, 
and I decided to do a little digging.  I built two quick test instances: one 
with Trusty and the other with Xenial.

Trusty comes with python 2.7.6 and Xenial has 2.7.12.  Here are the initial 
comparisons:

  https://gist.github.com/major/20d7d11442685355c30d0abf0c07be98

The worst test shows that 2.7.12 on Xenial is 1.88x slower than 2.7.6 on 
Trusty.  Wow.

I compiled 2.7.12 from source on Xenial to see if it's a packaging issue, but 
that didn't change much.  I then compiled 2.7.12 on 14.04 and found it to be 
slightly slower than 2.7.6 on 14.04, but faster than 2.7.12 on 16.04.  
That's confusing, so here's a ranking from fastest to slowest performance:

1) 2.7.6 on Ubuntu 14.04 (fastest)
2) 2.7.12 compiled from source on Ubuntu 14.04 (a little slower than #1)
3) 2.7.12 compiled from source on Ubuntu 16.04 (slightly faster than #4)
4) 2.7.12 on Ubuntu 16.04 (significantly slower than #1)

It's evident that 2.7.12 is a little bit slower, but something in Ubuntu 16.04 
makes it much worse.  I checked sysctl settings and the only big difference was 
the max threads per process (16.04 was about half of 14.04).  I set them both 
to the same value but the performance testing didn't change.
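
For anyone who wants to repeat the sysctl comparison, a minimal sketch is 
below. It assumes the output of "sysctl -a" from each host has been captured 
as text; the sample strings and their values are made up for illustration and 
are not the real numbers from my instances.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines (as printed by sysctl -a) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def diff_sysctl(first, second):
    """Return {key: (first_value, second_value)} for settings that differ.

    A key missing on one side shows up with None for that side.
    """
    keys = sorted(set(first) | set(second))
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }


# Illustrative sample data only; the values are hypothetical.
trusty_text = "kernel.threads-max = 62000\nvm.swappiness = 60\n"
xenial_text = "kernel.threads-max = 31000\nvm.swappiness = 60\n"

if __name__ == "__main__":
    differences = diff_sysctl(parse_sysctl(trusty_text), parse_sysctl(xenial_text))
    for key, (trusty_value, xenial_value) in differences.items():
        print("%s: trusty=%s xenial=%s" % (key, trusty_value, xenial_value))
```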

Does anyone else have any ideas of what might be causing this?

--
Major Hayden


