On Sun, 2013-11-03 at 18:57 -0500, KOSAKI Motohiro wrote:
> >> I'm slightly surprised this cache gives a 15% hit-rate improvement.
> >> Which applications get a benefit? You listed a lot of applications,
> >> but I'm not sure which depend heavily on the largest vma.
> >
> > Well, I chose the largest vma because it gives us the greatest chance
> > that the cached vma will already cover the faulted address when we do
> > the lookup.
> >
> > The 15% improvement was with Hadoop. According to my notes it was at
> > ~48% with the baseline kernel and increased to ~63% with this patch.
> >
> > In any case I didn't measure the rates at a per-task granularity, but
> > at a general system level. When a system is first booted I can see
> > that the mmap_cache access rate established at boot becomes the
> > determining factor, and adding a workload doesn't change it much. One
> > exception to this was a kernel build, where the hit rate goes from
> > ~50% on a vanilla kernel to ~89% with the patch.
> 
> I looked at this patch a bit. Its value is that it improves the cache
> hit ratio for the heap.
> 
> 1) For single-threaded applications, the heap is frequently the largest
> mapping in the process.

Right.
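
For reference, the lookup fast path conceptually ends up looking like
the sketch below; mm->largest_vma is an illustrative field name for the
new cache, not necessarily what the patch uses:

struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
{
	struct vm_area_struct *vma;
	struct rb_node *rb_node;

	/* 1) Last-used vma cache, as in the vanilla kernel. */
	vma = ACCESS_ONCE(mm->mmap_cache);
	if (vma && vma->vm_end > addr && vma->vm_start <= addr)
		return vma;

	/*
	 * 2) Largest-vma cache (illustrative name): a big mapping such
	 * as the heap has the best odds of covering an arbitrary
	 * faulted address.
	 */
	vma = ACCESS_ONCE(mm->largest_vma);
	if (vma && vma->vm_end > addr && vma->vm_start <= addr)
		return vma;

	/* 3) Fall back to the usual rb-tree walk. */
	vma = NULL;
	rb_node = mm->mm_rb.rb_node;
	while (rb_node) {
		struct vm_area_struct *tmp;

		tmp = rb_entry(rb_node, struct vm_area_struct, vm_rb);
		if (tmp->vm_end > addr) {
			vma = tmp;
			if (tmp->vm_start <= addr)
				break;
			rb_node = rb_node->rb_left;
		} else
			rb_node = rb_node->rb_right;
	}
	if (vma)
		mm->mmap_cache = vma;
	return vma;
}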

> 2) For the Java VM, "java -Xms1000m -Xmx1000m HelloWorld" produces the
> following /proc/<pid>/smaps entry. That is, the JVM allocates a single
> heap even when the application is multi-threaded.

Oh, this is new to me, and it nicely explains why I see the most benefit
in Java-related workloads.

> 
> c1800000-100000000 rw-p 00000000 00:00 0
> Size:            1024000 kB
> Rss:                 244 kB
> Pss:                 244 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:       244 kB
> Referenced:          244 kB
> Anonymous:           244 kB
> AnonHugePages:         0 kB
> Swap:                  0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> 
> That's good.
> 
> However, we know there is a situation where this patch doesn't work:
> glibc creates per-thread heaps (arenas) by default, so it can't be
> expected to work well for multi-threaded glibc programs. That's a
> fairly big limitation.

I think this is what Linus was referring to.
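
A quick way to see the arena behavior is a toy program that allocates
from a few threads; each thread's first malloc() typically comes from
its own arena, i.e. a separate mapping, so no single vma dominates
(demo only, it leaks on purpose):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static void *worker(void *arg)
{
	/*
	 * The first malloc() from a new thread tends to be served from
	 * a freshly mmap'ed per-thread arena.
	 */
	void *p = malloc(4096);

	printf("thread %ld: %p\n", (long)arg, p);
	return p;
}

int main(void)
{
	pthread_t t[4];
	long i;

	printf("main:     %p\n", malloc(4096));	/* main arena (brk heap) */
	for (i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, worker, (void *)i);
	for (i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	return 0;
}

Built with "gcc -pthread", the per-thread pointers typically land far
away from the main-arena pointer, one region per thread.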

> 
> Anyway, I haven't observed a real performance difference, because the
> biggest penalty in find_vma comes from taking mmap_sem, not from the
> rb-tree search.

Yes, undoubtedly, which is why I'm reporting hit/miss rates rather than
workload throughput.
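
For reference, the typical caller pattern shows why: find_vma() must be
called with mmap_sem held, so every lookup pays for the rwsem round
trip (and its cacheline bouncing under contention) before the tree walk
even starts. Something like this, where handle_the_fault() is just a
placeholder:

	down_read(&mm->mmap_sem);
	vma = find_vma(mm, address);
	if (vma && vma->vm_start <= address)
		handle_the_fault(vma, address);	/* placeholder */
	up_read(&mm->mmap_sem);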

Thanks,
Davidlohr
