On Tue, 27 May 2014, Vitaly Wool wrote:

> Hi,
>
> I have recently been poking around saving memory on low-RAM Android
> devices, basically following the Google KSM+ZRAM guidelines for KitKat
> and measuring the gain/performance. While getting quite some RAM savings
> indeed (in the range of 10k-20k pages), we noticed that kswapd used a lot
> of CPU cycles most of the time, and that iowait times reported by e.g.
> top were sometimes beyond reasonable limits (up to 40%). From what I
> could see, the reason for that behavior, at least in part, is that KSM
> has to traverse really long VMA lists.
>
> Android userspace should be held somewhat responsible for that, since it
> "advises" KSM that all MAP_PRIVATE|MAP_ANONYMOUS mmap'ed pages are
> mergeable. This seems exhaustive and not quite in line with the kernel
> KSM documentation, which says:
> "Applications should be considerate in their use of MADV_MERGEABLE,
> restricting its use to areas likely to benefit. KSM's scans may use a lot
> of processing power: some installations will disable KSM for that reason."
>
> As a mitigation, we suggest adding one more parameter to the
> sysfs-exported KSM ones. It allows small VM areas advertised as mergeable
> to be bypassed, so that only bigger ones are added to the KSM lists,
> keeping the default behavior intact.
>
> The RFC/patch code may then look like this:
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 68710e8..069f6b0 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -232,6 +232,10 @@ static int ksm_nr_node_ids = 1;
>  #define ksm_nr_node_ids 1
>  #endif
> +/* Threshold for minimal VMA size to consider */
> +static unsigned long ksm_vma_size_threshold = 4096;
> +
> +
>
>  #define KSM_RUN_STOP 0
>  #define KSM_RUN_MERGE 1
>  #define KSM_RUN_UNMERGE 2
> @@ -1757,6 +1761,9 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>  			return 0;
>  #endif
>
> +		if (end - start < ksm_vma_size_threshold)
> +			return 0;
> +
>  		if (!test_bit(MMF_VM_MERGEABLE, &mm->flags)) {
>  			err = __ksm_enter(mm);
>  			if (err)
> @@ -2240,6 +2247,29 @@ static ssize_t merge_across_nodes_store(struct kobject *kobj,
>  KSM_ATTR(merge_across_nodes);
>  #endif
>
> +static ssize_t vma_size_threshold_show(struct kobject *kobj,
> +				       struct kobj_attribute *attr, char *buf)
> +{
> +	return sprintf(buf, "%lu\n", ksm_vma_size_threshold);
> +}
> +
> +static ssize_t vma_size_threshold_store(struct kobject *kobj,
> +					struct kobj_attribute *attr,
> +					const char *buf, size_t count)
> +{
> +	int err;
> +	unsigned long thresh;
> +
> +	err = strict_strtoul(buf, 10, &thresh);
> +	if (err || thresh > UINT_MAX)
> +		return -EINVAL;
> +
> +	ksm_vma_size_threshold = thresh;
> +
> +	return count;
> +}
> +KSM_ATTR(vma_size_threshold);
> +
>  static ssize_t pages_shared_show(struct kobject *kobj,
>  				 struct kobj_attribute *attr, char *buf)
>  {
> @@ -2297,6 +2327,7 @@ static struct attribute *ksm_attrs[] = {
>  #ifdef CONFIG_NUMA
>  	&merge_across_nodes_attr.attr,
>  #endif
> +	&vma_size_threshold_attr.attr,
>  	NULL,
>  };
>
> With our (narrow) use case, setting vma_size_threshold to 65536
> significantly decreases the iowait time and the CPU idle load, while the
> KSM gain decreases only slightly (by 5-15%).
>
> Any comments will be greatly appreciated,
It's interesting, even amusing, but I think the emphasis has to be on your
"(narrow) use case". I can't see any particular per-vma overhead in KSM's
scan; and what little per-vma overhead there is (find_vma, vma->vm_next)
includes the non-mergeable vmas along with the mergeable ones.

And I don't think it's a universal rule of nature that small vmas are less
likely to contain identical pages than large ones - beyond, of course, the
obvious fact that small vmas are likely to contain fewer pages than large
ones, so to that degree less likely to have merge hits. But you see a
significant/slight effect beyond that: any theory why?

I think it's just a feature of your narrow use case, and the adjustment for
it is best made in userspace (or hacked into your own kernel if you wish);
but I cannot at present see the case for doing this in an upstream kernel.

Hugh
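
For reference, a minimal userspace sketch of the behaviour the proposed
patch would introduce. The test program is illustrative, not part of the
patch; it only assumes that the new knob appears as
/sys/kernel/mm/ksm/vma_size_threshold alongside the existing KSM sysfs
files and has been set to 65536 as in the reported experiment:

/* Illustrative only: with the proposed patch applied and
 * vma_size_threshold set to 65536, the advice on the small mapping would
 * be silently ignored by ksm_madvise(), while the large mapping would
 * still be registered with KSM. */
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

int main(void)
{
	size_t small = 16 * 1024;	/* below the 65536 threshold */
	size_t large = 1024 * 1024;	/* above it */

	void *a = mmap(NULL, small, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	void *b = mmap(NULL, large, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (a != MAP_FAILED)
		madvise(a, small, MADV_MERGEABLE);	/* no-op under the patch */
	if (b != MAP_FAILED)
		madvise(b, large, MADV_MERGEABLE);	/* still added to KSM's list */

	return 0;
}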
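
And a minimal sketch of the userspace-side alternative suggested above:
filter before calling madvise() rather than inside ksm_madvise(). The
helper name and threshold constant here are made up for illustration:

#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical allocator-side filter: only advise KSM about regions large
 * enough to plausibly pay back the scanning cost; tune per device. */
#define KSM_ADVISE_MIN_BYTES	(64 * 1024)

void maybe_advise_mergeable(void *addr, size_t len)
{
	if (len >= KSM_ADVISE_MIN_BYTES)
		madvise(addr, len, MADV_MERGEABLE);
}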