On 12/17/2013 01:10 PM, David Timothy Strauss wrote:

> System specs:
>  * Fedora 19 with the 3.11.10-200.fc19.x86_64 kernel (just the stock RPM)
>  * Bare-metal servers with 128GB RAM split between two NUMA regions,
> each region with one hex-core processor
>  * More than 700 processes, a couple hundred of which are active
> fairly frequently. The systems were at 7000 processes, but we've
> dropped it while we dive into this issue.
>  * Many of the processes are short-lived. The long-lived ones
> experience spikes in CPU and memory usage while processing requests.
> 
> Here's what we've tried, to no avail:
>  * tuned-adm on latency-performance and virtual-host profiles; this
> places the system on the deadline scheduler, but this problem occurred
> on the default one too
>  * kernel.sched_migration_cost_ns=5000000 (which tuned will do for
> those profiles in v3.3/Fedora 20)
>  * numad to balance between regions
>  * Global use of sched_relax_domain_level=1 and sched_relax_domain_level=2
>  * Splitting the system with cpuset into management tasks (6 virtual
> cores) and workload tasks (18 virtual cores) with
> sched_relax_domain_level=2. This is based on recommendations for NUMA
> systems in the cpuset man page.

Just for a quick sanity check, can you try disabling the
automatic NUMA balancing code?

# echo NO_NUMA > /sys/kernel/debug/sched_features
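
To confirm the flag actually took effect, and to put it back
afterwards, something along these lines should work. This is only
a sketch: the function names and the FEATURES_FILE variable are
made up for illustration, and it assumes debugfs is mounted at
/sys/kernel/debug and that you run it as root.

```shell
# Path to the scheduler feature flags; override FEATURES_FILE if debugfs
# is mounted somewhere else (the default path is the one used above).
FEATURES_FILE="${FEATURES_FILE:-/sys/kernel/debug/sched_features}"

# Hypothetical helper: write one flag name to the features file.
# $1 is a flag such as NO_NUMA (disable) or NUMA (re-enable).
set_sched_feature() {
    echo "$1" > "$FEATURES_FILE"
}

# Hypothetical helper: print only the NUMA-related flags, one per line.
show_numa_features() {
    tr ' ' '\n' < "$FEATURES_FILE" | grep -i 'NUMA'
}

# Example use, as root:
#   set_sched_feature NO_NUMA
#   show_numa_features
```

Writing NUMA (without the NO_ prefix) turns the feature back on
once the test is done.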

> Here's what we've used for analysis:
>  * powertop
>  * top/htop
>  * perf record -a -g

Does "perf report -g" show where the calls to the
migration code are coming from? Something must be
migrating tasks around, and it will be good to know
what it is...
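
If the full report is too long to read through, a rough way to
narrow it down is to filter the text output for anything that
mentions migration. The helper name and the default context size
below are arbitrary choices of mine, and the example assumes the
perf.data from your "perf record -a -g" run is in the current
directory.

```shell
# Hypothetical helper: read a text report on stdin and keep only the lines
# that mention migration, plus some following lines of call-chain context.
# $1: number of context lines after each match (default 12, arbitrary).
migration_chains() {
    grep -i -A "${1:-12}" 'migrat'
}

# Example use:
#   perf report -g --stdio | migration_chains
```

This is just a text filter, so it will also catch unrelated symbols
that happen to contain "migrat"; treat the result as a starting
point for reading the call chains, not as a complete answer.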

-- 
All rights reversed