On Mon, 22 Oct 2012, Shaohua Li wrote:
> On Tue, Oct 16, 2012 at 08:50:49AM +0800, Shaohua Li wrote:
> > On Mon, Oct 08, 2012 at 03:09:58PM -0700, Hugh Dickins wrote:
> > > On Thu, 4 Oct 2012, Konstantin Khlebnikov wrote:
> > > 
> > > > Here are the results of my test. The workload isn't very realistic,
> > > > but at least it's threaded: compiling linux-3.6 with defconfig in 16
> > > > threads on tmpfs, 512MB RAM, dual-core CPU, ordinary hard disk.
> > > > (Test script in the attachment.)
> > > > 
> > > > average results for ten runs:
> > > > 
> > > >                 RA=3     RA=0     RA=1     RA=2     RA=4     Hugh     Shaohua
> > > > real time       500      542      528      519      500      523      522
> > > > user time       738      737      735      737      739      737      739
> > > > sys time        93       93       91       92       96       92       93
> > > > pgmajfault      62918    110533   92454    78221    54342    86601    77229
> > > > pgpgin          2070372  795228   1034046  1471010  3177192  1154532  1599388
> > > > pgpgout         2597278  2022037  2110020  2350380  2802670  2286671  2526570
> > > > pswpin          462747   138873   202148   310969   739431   232710   341320
> > > > pswpout         646363   502599   524613   584731   697797   568784   628677
> > > > 
> > > > So, the last two columns show mostly equal results: +4.6% and +4.4%
> > > > in comparison to the vanilla kernel with RA=3, but your version shows
> > > > more stable results (std-error 2.7% against 4.8%). (All these numbers
> > > > are in the huge table in the attachment.)
> > > 
> > > Thanks for doing this, Konstantin, but I'm stuck for anything much to say!
> > > Shaohua and I are both about 4.5% bad for this particular test, but I'm
> > > more consistently bad - hurrah!
> > > 
> > > I suspect (not a convincing argument) that if the test were just slightly
> > > different (a little more or a little less memory, SSD instead of hard
> > > disk, diskcache instead of tmpfs), then it would come out differently.
> > > 
> > > Did you draw any conclusions from the numbers you found?
> > > 
> > > I haven't done any more on this in the last few days, except to verify
> > > that once an anon_vma is judged random with Shaohua's, then it appears
> > > to be condemned to no-readahead ever after.
> > > 
> > > That's probably something that a hack like I had in mine would fix,
> > > but that addition might change its balance further (and increase vma
> > > or anon_vma size) - not tried yet.
> > > 
> > > All I want to do right now, is suggest to Andrew that he hold Shaohua's
> > > patch back from 3.7 for the moment: I'll send a response to Sep 7th's
> > > mm-commits mail to suggest that - but no great disaster if he ignores me.
> > 
> > Ok, I tested Hugh's patch. My test is a multithreaded random write workload.
> > With Hugh's patch, 49:28.06elapsed
> > With mine, 43:23.39elapsed
> > Hugh's patch takes 12% more time.
> > 
> > In the steady state of this workload, the SI:SO ratio should be roughly 1:1.
> > With Hugh's patch it's around 1.6:1, so there is still unnecessary swapin.
> > 
> > I also tried a workload with sequential/random writes mixed; Hugh's patch
> > is 10% worse there too.
> 
> With the change below, the SI:SO ratio is back to around 1:1 in my workload.
> I guess the run time of my test will be reduced too, though I haven't tested that yet.
> -     used = atomic_xchg(&swapra_hits, 0) + 1;
> +     used = atomic_xchg(&swapra_hits, 0);

Thank you for playing and trying that; I haven't found time to revisit it
at all.  I'll give that adjustment a go at my end.  The "+ 1" was for the
target page itself; but whatever works best, there's not much science to it.
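To make the shape of that heuristic concrete, here is a throwaway sketch
in standalone C11 rather than kernel code (swapra_win and the grow/shrink
rule below are made up for illustration; only the swapra_hits counter
comes from the patch under discussion):

#include <stdatomic.h>

/* Readahead pages that were later faulted in since the last major fault. */
static atomic_uint swapra_hits;
/* Current readahead window (illustrative name and bounds). */
static unsigned int swapra_win = 4;

static unsigned int next_swap_ra_window(void)
{
	/*
	 * Consume the hits recorded since the previous fault.  Adding 1
	 * here would also count the faulting page itself - that is the
	 * "+ 1" mentioned above.
	 */
	unsigned int used = atomic_exchange(&swapra_hits, 0);

	if (used >= swapra_win && swapra_win < 8)
		swapra_win *= 2;	/* readahead paid off: widen */
	else if (used < swapra_win / 2 && swapra_win > 1)
		swapra_win /= 2;	/* mostly wasted: narrow */

	return swapra_win;
}

The point of the sketch is only that a single global counter, reset on
every fault, is what decides the next window: there is no per-vma state.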

> 
> I'm wondering how a global-counter-based method could detect readahead
> correctly. For example, if there are a sequential access thread and a random
> access thread, doesn't this method always make the wrong decision?

But only in the simplest cases is the sequentiality of placement on swap
well correlated with the sequentiality of placement in virtual memory.
Once you have a sequential access thread and a random access thread
swapping out at the same time, their pages will be interspersed.
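
A toy picture of that interleaving (purely illustrative, not the real
reclaim or allocation path):

#include <stdio.h>

/*
 * If reclaim alternates between a sequential thread's pages (S) and a
 * random thread's pages (R), swap slots are handed out in that mixed
 * order, so the S pages land on every other slot rather than on
 * contiguous ones.  Slot numbering is made up for illustration.
 */
int main(void)
{
	int slot = 0;

	for (int i = 0; i < 4; i++) {
		printf("S page %d -> swap slot %d\n", i, slot++);
		printf("R page %d -> swap slot %d\n", i, slot++);
	}
	return 0;
}

So even a perfectly sequential mapping reads back from swap with gaps,
and offset-based readahead across those gaps drags in the other
thread's pages.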

I'm pretty sure that if you give it more thought than I am giving it
at the moment, you can devise a test case which would go amazingly
faster by your per-vma method than by keeping just this global state.

But I doubt such a test case would be so realistic as to deserve that
extra sophistication.  I do prefer to keep the heuristic as stupid and
unpretentious as possible.

Especially when I remember how get_swap_page() stripes across swap
areas of equal priority: my guess is that nobody uses that feature,
and we don't even want to consider it here; but it feels wrong to
ignore it if we aim for more cleverness at the readahead end.
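
For anyone who wants to see what that striping does to offsets, a toy
model (simple round-robin between two equal-priority areas; this only
approximates what get_swap_page() really does):

#include <stdio.h>

/*
 * Allocations rotate between two swap areas of equal priority, so
 * consecutive allocations land on different devices, and adjacent
 * offsets within one device were not allocated back to back.
 * Illustrative only.
 */
int main(void)
{
	const int nr_areas = 2;
	int next_offset[2] = { 0, 0 };

	for (int i = 0; i < 8; i++) {
		int area = i % nr_areas;	/* round-robin stripe */
		printf("allocation %d -> area %d, offset %d\n",
		       i, area, next_offset[area]++);
	}
	return 0;
}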

Hugh