On 02/11/2015 12:25 AM, Mark Hahn wrote:
is net-swap really ever a good idea?  it always seems like asking for
trouble, though in principle there's no reason why net IO should be
"worse" than disk IO...

... except for the need to allocate memory to build packets to send
the swap data.

I thought the implication was clear, that doing disk IO may also require
memory allocations.

Paging to local scratch is less memory intensive than constructing a packet in memory to hold a buffer for transfer over the network. In fact, local paging is really quite memory efficient.


There are still a few places that look at you funny if you suggest
running w/o swap.  The 6 orders of magnitude performance difference
for random page touching suggests you should stare them back down.

absolutely: if you have reason to believe all your pages are uniformly hot,
more power to you!

Bad analysis. In the old days (ugh), locality of reference was something you had to work very hard at to make effective use of your memory. You re-ordered your loops and did all manner of other things. Nowadays you have to worry about objects and their instance data, and you don't know so much when and where they will be touched.

Feel free to use a modern OO code on a memory-starved system ... it's just not pleasant. That 6 orders of magnitude performance variance between hot and cold pages will bite you.
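
A rough back-of-envelope in Python, with assumed round numbers rather than measurements from any particular box, of where that gap comes from:

  # Hot vs. cold page latency gap, with assumed round numbers (not measurements).
  dram_access_s = 100e-9   # ~100 ns to touch a page already resident in RAM (assumed)
  disk_pagein_s = 10e-3    # ~10 ms for a random 4k page-in from spinning disk (assumed)

  ratio = disk_pagein_s / dram_access_s
  print(f"cold/hot latency ratio: ~{ratio:,.0f}x")  # ~100,000x, i.e. 5 orders of magnitude
  # Fault handling, queueing, and seek contention under real thrashing push it
  # toward the 6 orders of magnitude figure above.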


Seriously, if you can avoid under-spec'ing/provisioning ram, you should.

in other words: buy extra ram to hold your cold pages!  after all, dram
is only O($10/GB), and disk is O($0.05/GB).  oh, wait...

And this is what I was waiting for: someone to pull out a bad analysis and then use it as a strawman.

Ok, using your underlying theory here (disk is cheap, ram is expensive), let's go to zero ram and save money.

Oh ... Wait ...

Yes, it should be obvious why this is silly. And by extension, the original argument is silly.

But the more subtle point (which is the one I had hoped you would go for, as it's the one that makes sense) is that there is a fine balancing act between the size of ram and (if you use it) swap. This balancing act is influenced by the opportunity cost of the decision: less ram -> more swap, longer execution time/cost for memory intensive codes; versus more ram -> less swap, shorter execution time, though higher cost per node.

In fact this gets to the very definition of opportunity cost: what is the amount of value I am giving up by making the alternative choice? Another way of thinking of this is to ask what the marginal value is of choosing more or less ram.
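
To make that marginal value question concrete, here is a toy sketch in Python; every number in it is assumed purely for illustration, not a benchmark or a price quote:

  # Toy opportunity-cost comparison; all numbers are assumptions for illustration.
  ram_price_per_gb   = 10.0   # $/GB (assumed)
  extra_ram_gb       = 64     # RAM we could add instead of leaning on swap (assumed)
  node_cost_per_hour = 1.50   # $/hour of node time: power, space, amortization (assumed)
  hours_swapping     = 20.0   # runtime when the job pages to swap (assumed)
  hours_in_ram       = 2.0    # runtime when the working set fits in RAM (assumed)
  jobs_over_lifetime = 500    # how many such jobs the node will run (assumed)

  ram_cost            = extra_ram_gb * ram_price_per_gb
  hours_saved         = (hours_swapping - hours_in_ram) * jobs_over_lifetime
  value_of_time_saved = hours_saved * node_cost_per_hour

  print(f"extra RAM costs        ${ram_cost:,.0f}")
  print(f"runtime saved is worth ${value_of_time_saved:,.0f}")
  # If value_of_time_saved dwarfs ram_cost, skimping on RAM was the expensive choice;
  # if not, the extra RAM is the waste. That is the balancing act.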

This is why I argue that sizing memory (and almost all other things) is very important. Building a 1TB ram machine for problems that run in 4GB is a waste of resources (too much ram). Building a 16GB ram machine for problems that run in 1TB is a waste (too little).

wish for the wild west of the OOM killer shooting random things, in comparison to
random 4k page touches.  Yes, I've seen the latter.

thrashing is bad.  it's not the same as *using* swap.  that's why swap
still makes sense.

Thrashing *is* using swap as a transparent memory extension. It is one of the worst possible cases, and it is seen quite frequently when you have large OO codes where you can't predict what object is going to do what. Or when you have large in-memory databases. Or ...

That is, swap/paging provide a memory extension, and it's a crutch relative to in-app memory management. The latter is generally frowned upon in most development circles these days, especially with GC systems in OO code.

The world has evolved significantly since I spilled my first matrices to local files.

interesting thought: SSD is about $0.5/GB, so would make a great swap
dev - has anyone tried tuning the swap cluster size to match the SSD
flash block?

We've done quite a bit of this, yes.
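
For anyone wanting to poke at this, the knob is vm.page-cluster, which (as I understand it) groups swap I/O into 2^n pages at a time. A minimal Python sketch of picking a value to match an assumed 256 KiB flash block with 4 KiB pages; check your own device's geometry before copying anything:

  import math

  page_size        = 4 * 1024     # 4 KiB pages, typical on x86_64
  flash_block_size = 256 * 1024   # assumed SSD flash/erase block; device dependent

  pages_per_block = flash_block_size // page_size
  page_cluster    = int(math.log2(pages_per_block))  # vm.page-cluster is a log2 exponent

  # Here: 6, i.e. 2^6 = 64 pages = 256 KiB per swap cluster.
  print(f"echo {page_cluster} > /proc/sys/vm/page-cluster")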

What it comes down to is: a) swap is a terrible thing to do, avoid it if possible; b) if you can't avoid it, do it as quickly as you can; c) the incremental cost of increasing RAM size versus paying the (often far) longer run time (with all its attendant costs and effects: slower throughput, fewer jobs per unit time, more power spent per job, etc.) is heavily biased *against* building sizable swap. This is why we use zram, zcache, and very fast, tuned swap partitions whenever possible.
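
For reference, a minimal sketch of the kind of zram swap setup meant here, in Python driving the usual zram sysfs knobs and swap tools; the device name, size, and priority are assumptions, adjust to your own boxes:

  import subprocess

  def run(cmd):
      # Thin wrapper so each step is visible; raises if a command fails. Needs root.
      print("+", cmd)
      subprocess.run(cmd, shell=True, check=True)

  run("modprobe zram")                        # load the compressed-RAM block driver
  run("echo 4G > /sys/block/zram0/disksize")  # assumed size: 4 GiB of compressed swap
  run("mkswap /dev/zram0")                    # format the zram device as swap
  run("swapon -p 100 /dev/zram0")             # high priority so it fills before disk swap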

Note though, and this has happened to us before: if a swap device dies while you have pages out on it ... let's just say that's a new experience in crashing. It's exactly like pulling a random DIMM out of a running machine.



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: [email protected]
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
