One note: even on a super-fast SSD there is a huge overhead on IO. Basically, you can't go below about 50 us per IO, and 50 us is 50,000 ns, almost an eternity for a modern processor. On top of that you get a page fault, which is not the fastest thing in the world, a few context switches, the filesystem/block-device layers... And 50 us is the best possible case; normally you will see something like 150 us, which is very slow.
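To put numbers on it (a rough back-of-the-envelope sketch; the 3 GHz clock and ~100 ns DRAM access cost are illustrative assumptions, not measurements from any particular host):

# Latency-gap arithmetic for the figures above. 3 GHz and 100 ns are
# assumed, illustrative values.
CPU_GHZ = 3.0                # cycles per nanosecond
DRAM_ACCESS_NS = 100         # rough cost of a cache-missing memory access
for us in (50, 150):         # best case vs. typical swap-in quoted above
    ns = us * 1000
    print(f"{us:>4} us swap-in = {ns:>7} ns"
          f" = ~{int(ns * CPU_GHZ):>7} CPU cycles"
          f" = ~{ns // DRAM_ACCESS_NS:>5}x a DRAM access")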

It's OK to push some unused or rarely used parts of guest memory to swap, but don't expect it to be a silver bullet. The line between 'normal swap operations' and a 'thrashed system' is very blurry, and the main symptom your guests will experience during overswapping is an extreme rise in latency (everything: IO, networking...). And when this happens you will have no knobs left to fix things... Even if you kill some of the guests, it can take up to 10 minutes to finish thrashing through that part of the swap and reduce the IO congestion.
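If you want to watch for that borderline in practice, a minimal host-side sketch is to sample the pswpin/pswpout counters in /proc/vmstat and alert on a sustained rate; the interval and threshold below are arbitrary placeholders, not tuned values:

# Sample swap-in/out rates from /proc/vmstat and flag possible thrashing.
# INTERVAL_S and THRESHOLD_PAGES_PER_S are assumptions; tune for your SSD.
import time

def read_swap_counters():
    counters = {}
    with open("/proc/vmstat") as f:
        for line in f:
            key, value = line.split()
            if key in ("pswpin", "pswpout"):
                counters[key] = int(value)
    return counters

INTERVAL_S = 10
THRESHOLD_PAGES_PER_S = 5000

prev = read_swap_counters()
while True:
    time.sleep(INTERVAL_S)
    cur = read_swap_counters()
    rate_in = (cur["pswpin"] - prev["pswpin"]) / INTERVAL_S
    rate_out = (cur["pswpout"] - prev["pswpout"]) / INTERVAL_S
    if rate_in + rate_out > THRESHOLD_PAGES_PER_S:
        print(f"possible thrashing: {rate_in:.0f} pages/s in, "
              f"{rate_out:.0f} pages/s out")
    prev = cur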

In my experience, on an average compute node no more than 20% of memory can be pushed to swap without significant consequences.
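A quick way to check a node against that rule of thumb (the 20% is an experience-based figure, not a kernel limit) is to compare swap actually in use with total RAM from /proc/meminfo:

# Compare swap in use to total host RAM; the 20% threshold is the
# rule-of-thumb figure from the paragraph above, not a hard limit.
def meminfo_kb():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":")
            info[key] = int(value.split()[0])   # values are in kB
    return info

m = meminfo_kb()
swap_used_kb = m["SwapTotal"] - m["SwapFree"]
ratio = swap_used_kb / m["MemTotal"]
print(f"swap in use: {swap_used_kb / 1024:.0f} MiB "
      f"({ratio:.1%} of {m['MemTotal'] / 1024:.0f} MiB RAM)")
if ratio > 0.20:
    print("warning: above the ~20% comfort zone")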

... And swap inside the guests is better, because a guest can drop a few pages from its own page cache if needed, whereas the host will swap out guest page cache just like real process memory. Allocate that SSD as an ephemeral drive for the guests and let them swap.
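For example, inside the guest something along these lines turns the ephemeral disk into swap (assuming the ephemeral disk shows up as /dev/vdb, which you should verify first, since mkswap destroys whatever is on it):

# Guest-side sketch: format the ephemeral disk as swap and enable it.
# /dev/vdb is an assumption; check with lsblk before running.
import subprocess

EPHEMERAL_DEV = "/dev/vdb"

subprocess.run(["mkswap", EPHEMERAL_DEV], check=True)
subprocess.run(["swapon", "--discard", EPHEMERAL_DEV], check=True)

# Make it persistent across reboots.
with open("/etc/fstab", "a") as fstab:
    fstab.write(f"{EPHEMERAL_DEV} none swap sw,discard 0 0\n")

In practice you would do this from cloud-init or your config management rather than by hand, but the steps are the same.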

On 07/03/2015 11:19 AM, Blair Bethwaite wrote:
Damnit! So no-one has done this or has a feel for it?
I was really hoping for the lazy option here.

So next question: ideas for concocting a reasonable test case?
Assuming I've got a compute node with 256GB RAM and 350GB of PCIe SSD
for swap, what next? We've got Rally going so could potentially use
that, but I'm not sure whether it can do different tasks in parallel
in order to simulate a set of varied workloads... Ideally we'd want at
least these workloads happening in parallel:
- web servers
- db servers
- idle servers
- batch processing

On 30 June 2015 at 03:24, Warren Wang <war...@wangspeed.com> wrote:
I'm gonna forward this to my co-workers :) I've been kicking this idea
around for some time now, and it hasn't caught traction. I think it could
work for a modest overcommit, depending on the memory workload. We decided
that it should be possible to do this sanely, but that it needed testing.
I'm happy to help test this out. Sounds like the results could be part of a
Tokyo talk :P

Warren


On Mon, Jun 29, 2015 at 9:36 AM, Blair Bethwaite <blair.bethwa...@gmail.com>
wrote:
Hi all,

Question up-front:

Do the performance characteristics of modern PCIe-attached SSDs
invalidate/challenge the old "don't overcommit memory with KVM" wisdom
(recently discussed on this list and at meetups and summits)? Has
anyone out there tried & tested this?

Long-form:

I'm currently looking at possible options for increasing virtual
capacity in a public/community KVM-based cloud. We started very
conservatively at a 1:1 CPU allocation ratio, so perhaps predictably
we have boatloads of CPU headroom to work with. We also see maybe 50%
of memory actually in use on a host that is, from Nova's perspective,
more-or-less full.

The most obvious thing to do here is increase available memory. There
are at least three ways to achieve that:
1/ physically add RAM
2/ reduce RAM per vcore (i.e., introduce lower RAM flavors)
3/ increase virtual memory capacity (i.e., add swap) and make
ram_allocation_ratio > 1 (rough capacity arithmetic just below)
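For #3, the capacity arithmetic with the knobs Nova exposes (ram_allocation_ratio, reserved_host_memory_mb) looks roughly like this; the figures are illustrative assumptions, and the exact point at which the host reservation is applied depends on the Nova release:

# Rough overcommit sizing; 256 GB node, 4 GB host reservation and the old
# 1.5 ratio (mentioned below) are assumptions for illustration.
physical_ram_mb = 256 * 1024
reserved_host_memory_mb = 4096       # whatever your nova.conf reserves
ram_allocation_ratio = 1.5

schedulable_mb = (physical_ram_mb - reserved_host_memory_mb) * ram_allocation_ratio
overcommit_mb = schedulable_mb - physical_ram_mb
print(f"Nova will place up to ~{schedulable_mb / 1024:.0f} GiB of guest RAM")
print(f"worst-case swap needed: ~{overcommit_mb / 1024:.0f} GiB")

The worst-case number is what the swap device would have to absorb if every guest touched all of its RAM at once.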

We're already doing a bit of #2, but at the end of the day, taking
away flavors and trying to change user behaviour is actually harder
than just upgrading hardware. #1 is ideal, but I do wonder whether we'd
be better off spending that same money on some PCIe SSD and using it
for #3 (at least for our 'standard' flavor classes), the advantage
being that SSD is cheaper per GB (and it might also help alleviate
IOPS starvation for hosts using local storage)...

The question is whether the performance characteristics of modern
PCIe-attached SSDs invalidate the old "don't overcommit memory with
KVM" wisdom (recently discussed on this list:
http://www.gossamer-threads.com/lists/openstack/operators/46104 and
also apparently at the Kilo mid-cycle:
https://etherpad.openstack.org/p/PHL-ops-capacity-mgmt where there was
an action to update the default from 1.5 to 1.0, though that doesn't
seem to have happened). Has anyone out there tried this?

I'm also curious whether anyone has any recent info on the state of
automated memory ballooning and/or memory hotplug. Ideally a
RAM-overcommitted host would try to inflate guest balloons before
swapping.
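For reference, the manual version of "inflate the balloon" for one guest via the libvirt Python bindings looks something like this; the domain name and target size are placeholders, and this is just the primitive an automated policy would build on, not the policy itself:

# Manually shrink a guest's balloon target with libvirt; name and size
# below are hypothetical placeholders.
import libvirt

DOMAIN_NAME = "instance-00000042"   # hypothetical guest
TARGET_MIB = 4096                   # desired balloon target

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName(DOMAIN_NAME)

# setMemoryFlags takes KiB; AFFECT_LIVE asks the in-guest balloon driver
# to hand memory back to the host without a reboot.
dom.setMemoryFlags(TARGET_MIB * 1024, libvirt.VIR_DOMAIN_AFFECT_LIVE)
print("current balloon:", dom.memoryStats().get("actual"), "KiB")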

--
Cheers,
~Blairo

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




