Adrian Cockcroft writes:
> I think this is a good forum to discuss a systematic performance issue with
> swap. The problem has been there for a long time, I tried to get people
> interested in doing something about it around ten years ago, I left Sun in
> 2004 and don't even use Sun's at my current job, but it would be nice if
> someone figured out how to do a comprehensive redesign of swapfs that solves
> the performance, observability and management problems that were designed in
> around 1990. Since there is an easy workaround (just add RAM) it was not
> considered a high priority issue when I complained.
What can we propose, then?

Assume the application text is preserved in memory. For swap-in of
application data, what I see happening is: an application becomes
runnable; an instruction loads some data into a register; the
corresponding page was swapped out earlier, so we take a page fault;
we wait a disk rotation, fill the memory page, and resume execution.

This has to proceed at roughly one fault per disk latency (a vague
number; the track buffer can help here).
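To put rough numbers on it (the latency figure is an assumption; 8K is
the page size discussed below):

#include <stdio.h>

/*
 * Back-of-envelope throughput when every swap-in costs one synchronous
 * disk access per 8K page.  The 8ms latency is an assumed figure, not
 * a measurement.
 */
int
main(void)
{
    double latency_ms = 8.0;    /* assumed seek + rotation */
    double page_kb = 8.0;       /* one 8K page per fault */
    double faults_per_sec = 1000.0 / latency_ms;

    printf("%.0f faults/s -> %.0f KB/s of swap-in\n",
        faults_per_sec, faults_per_sec * page_kb);
    return (0);
}

That is on the order of a megabyte per second of swap-in, regardless
of how fast the disk streams.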
It seems that the only way to improve on this is to have 'something'
that reduces the number of faults. That seems to imply that on one
page fault we have to issue an I/O bigger than 8K, one that also
brings in the anon data we expect to be faulted in next.
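A toy model of that idea, just to show the arithmetic for a densely
used segment faulted in with a 64K readaround instead of single 8K
reads (the data structures and the fixed kluster size are illustrative
assumptions, not Solaris code):

#include <stdio.h>
#include <string.h>

/*
 * Toy model of fault-time readaround: on a fault, bring in a whole
 * aligned kluster of swapped-out pages with one I/O instead of a
 * single 8K page.  Everything here is illustrative only.
 */
#define NPAGES  1024
#define KLUSTER 8                   /* 8 x 8K = 64K per swap-in I/O */

static char swapped[NPAGES];        /* 1 = page lives on swap */
static int disk_reads;

static void
access_page(int idx)
{
    if (!swapped[idx])
        return;                     /* already resident, no fault */

    int start = idx - (idx % KLUSTER);

    disk_reads++;                   /* one I/O covers the kluster */
    for (int i = start; i < start + KLUSTER && i < NPAGES; i++)
        swapped[i] = 0;             /* neighbours come in for free */
}

int
main(void)
{
    memset(swapped, 1, sizeof (swapped));
    for (int i = 0; i < NPAGES; i++)    /* dense, sequential use */
        access_page(i);
    printf("%d page accesses, %d disk reads\n", NPAGES, disk_reads);
    return (0);
}

This only works if the pages we want next are sitting at contiguous
swap slots, which is why the swap-out side matters.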
Moreover, we have to determine at swap-out time what the swap-in
sequence is likely to be. Clearly, for sparsely used anonymous
segments there is not much hope. But for densely used segments (that
is: if I use a page, I'm likely to use adjacent pages in the near
future) we can see a ray of hope:
Instead of paging out one page at a time, we could decide to group a
set of 64K or 128K together. The scanner today is set up to detect
pages that have not been recently used (through the two-handed clock
mechanism). At some point it holds one page that it wants to swap
out. The new scheme would say: in the interest of future swap-in
performance, I will decide now that pages adjacent to the one I am
holding, from the same segment, are also candidates for swap-out,
even though they have not yet been swept by the scanner. So instead
of swapping out individual pages, I will swap out a kluster of pages
from the same segment (as soon as one page in the kluster is scanned
as 'not recently used').
In the best case the gains will be proportional to the kluster size
(8x or more). In the bad case, we'll swap out pages that are still
live. Maybe we could swap out klusters like this but only release the
individual pages to the freelist when the scanner decides so.
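On the pageout side that could look roughly like this toy model: one
scanned 'not recently used' page drags its neighbours out to
contiguous swap slots, but only the scanned page itself goes to the
freelist (all names and structures are made up for illustration):

#include <stdio.h>

/*
 * Toy model of klustered pageout: when the scanner holds one page it
 * wants to swap out, write an aligned kluster from the same segment
 * to contiguous swap slots in a single I/O.  Only the scanned page is
 * released now; neighbours stay mapped until the scanner reaches them.
 */
#define NPAGES  64
#define KLUSTER 8

struct page {
    int on_swap;        /* copy exists at a contiguous swap slot */
    int resident;       /* still mapped in memory */
};

static struct page seg[NPAGES];
static int swap_writes;

static void
scanner_selects(int idx)    /* scanner found page idx not recently used */
{
    int start = idx - (idx % KLUSTER);

    if (!seg[idx].on_swap) {
        swap_writes++;      /* one contiguous write for the kluster */
        for (int i = start; i < start + KLUSTER && i < NPAGES; i++)
            seg[i].on_swap = 1;
    }
    seg[idx].resident = 0;  /* only this page goes to the freelist */
}

int
main(void)
{
    for (int i = 0; i < NPAGES; i++)
        seg[i].resident = 1;

    for (int i = 0; i < NPAGES; i++)    /* scanner sweeps the segment */
        scanner_selects(i);

    printf("%d pages paged out, %d swap writes\n", NPAGES, swap_writes);
    return (0);
}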
In this scheme, is there an easy way to decide what a densely used
anonymous segment is? Should we consider all segments as such, or try
to be more clever?
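One naive possibility, just to make the question concrete (the window,
the threshold and the whole heuristic are made up):

#include <stdio.h>

/*
 * Made-up heuristic for "does this anonymous segment look densely
 * used?": count how many pages in a window around the candidate have
 * ever been touched (have anon backing).  Window and threshold are
 * arbitrary.
 */
#define WINDOW    16
#define DENSE_PCT 50

static int
looks_dense(const char *touched, int npages, int idx)
{
    int start = idx - (idx % WINDOW);
    int hits = 0;

    for (int i = start; i < start + WINDOW && i < npages; i++)
        hits += touched[i];
    return (hits * 100 >= WINDOW * DENSE_PCT);
}

int
main(void)
{
    char touched[64] = { 0 };

    for (int i = 0; i < 32; i++)
        touched[i] = 1;     /* first half dense, second half sparse */

    printf("page 10: %s, page 50: %s\n",
        looks_dense(touched, 64, 10) ? "kluster it" : "page out singly",
        looks_dense(touched, 64, 50) ? "kluster it" : "page out singly");
    return (0);
}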
Does Linux manage to issue big I/Os on page-fault-induced swap-ins?
Do we need to do something special for tmpfs files?
-r
> My opinion of Solaris swap - "If you think you understand how it works, you
> weren't looking closely enough". See this
> http://www.itworld.com/Comp/2377/UIR980701perf/
>
> Adrian