On Tue, 20 Nov 2012, Ingo Molnar wrote:

> The current updated table of performance results is:
> 
> -------------------------------------------------------------------------
>   [ seconds         ]    v3.7  AutoNUMA   |  numa/core-v16    [ vs. v3.7]
>   [ lower is better ]   -----  --------   |  -------------    -----------
>                                           |
>   numa01                340.3    192.3    |      139.4          +144.1%
>   numa01_THREAD_ALLOC   425.1    135.1    |      121.1          +251.0%
>   numa02                 56.1     25.3    |       17.5          +220.5%
>                                           |
>   [ SPECjbb transactions/sec ]            |
>   [ higher is better         ]            |
>                                           |
>   SPECjbb 1x32 +THP      524k     507k    |     638k           +21.7%
>   SPECjbb 1x32 !THP      395k             |     512k           +29.6%
>                                           |
> -----------------------------------------------------------------------
>                                           |
>   [ SPECjbb multi-4x8 ]                   |
>   [ tx/sec            ]  v3.7             |  numa/core-v16
>   [ higher is better  ] -----             |  -------------
>                                           |
>               +THP:      639k             |       655k            +2.5%
>               -THP:      510k             |       517k            +1.3%
> 
> So I think I've addressed all regressions reported so far - if 
> anyone can still see something odd, please let me know so I can 
> reproduce and fix it ASAP.
> 

I started profiling on a new machine that is an exact duplicate of the 
16-way, 4-node, 32GB machine I was profiling with earlier, to rule out any 
machine-specific problems.  I pulled master and ran new comparisons with 
THP enabled at c418de93e398 ("Merge branch 'x86/mm'"):

  CONFIG_NUMA_BALANCING disabled        136521.55 SPECjbb2005 bops
  CONFIG_NUMA_BALANCING enabled         132476.07 SPECjbb2005 bops (-3.0%)

Aside: neither 4739578c3ab3 ("x86/mm: Don't flush the TLB on #WP pmd 
fixups") nor 01e9c2441eee ("x86/vsyscall: Add Kconfig option to use native 
vsyscalls and switch to it") significantly improved throughput on this 
system.

Over the past 24 hours, however, throughput has improved significantly: 
the regression has shrunk from 6.3% to 3.0%, thanks to 246c0b9a1caf ("mm, 
numa: Turn 4K pte NUMA faults into effective hugepage ones")!

One request: I noticed that the patchset doesn't add any fields to 
/proc/vmstat through count_vm_event() the way THP did; I found those 
counters very useful when profiling the THP patchset during its review.  
Would it be possible to add some vm events to the balancing code so we can 
capture data on how the NUMA balancing is behaving?  Their usefulness 
would extend beyond just the review period.
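
To make the request concrete, here is a rough sketch of what I have in 
mind, following the pattern THP used with count_vm_event().  The event 
names below are only illustrative, not a proposal for final naming, and 
the fault-path hook would of course go wherever the hinting fault is 
actually handled:

/* include/linux/vm_event_item.h: new events, guarded by the config */
enum vm_event_item {
	/* ... existing events ... */
#ifdef CONFIG_NUMA_BALANCING
	NUMA_HINT_FAULTS,	/* illustrative: hinting faults taken */
	NUMA_PAGE_MIGRATIONS,	/* illustrative: pages migrated for locality */
#endif
	NR_VM_EVENT_ITEMS
};

/* mm/vmstat.c: matching strings so they show up in /proc/vmstat */
const char * const vmstat_text[] = {
	/* ... existing names ... */
#ifdef CONFIG_NUMA_BALANCING
	"numa_hint_faults",
	"numa_page_migrations",
#endif
};

/* in the NUMA hinting fault handler: */
	count_vm_event(NUMA_HINT_FAULTS);

/* and wherever a page is actually migrated for locality: */
	count_vm_event(NUMA_PAGE_MIGRATIONS);

Even a handful of counters like these would make it much easier to see 
how often the hinting faults fire and how much migration traffic they 
generate on a workload like SPECjbb.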