Re: [PATCH] x86/mm: Tracking linear mapping split events since boot

2021-01-25 Thread Dave Hansen
On 1/25/21 12:11 PM, Saravanan D wrote:
> Numerous hugepage splits in the linear mapping would give
> admins a signal to help narrow down sluggishness caused by TLB
> misses/reloads.
> 
> One of the many lasting (as we don't coalesce back) sources of huge page
> splits is tracing: the granular page attribute/permission changes force
> the kernel to split code segments mapped to huge pages into smaller ones,
> thereby increasing the probability of TLB misses/reloads even after
> tracing has been stopped.
> 
> The split event information will be displayed at the bottom of
> /proc/meminfo
> 
DirectMap4k:        3505112 kB
DirectMap2M:       19464192 kB
DirectMap1G:       12582912 kB
DirectMap2MSplits:     1705
DirectMap1GSplits:       20
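
[For illustration, a rough sketch of how counters like these could be maintained and surfaced; the names below (the atomic counters, count_direct_map_split(), and the meminfo helper) are assumptions for this sketch, not taken from the posted patch.]

#include <linux/atomic.h>
#include <linux/seq_file.h>

static atomic_long_t direct_map_2m_splits;
static atomic_long_t direct_map_1g_splits;

/* Called from the path that demotes a large direct-map page. */
static void count_direct_map_split(int from_level)
{
        if (from_level == 2)            /* a 2M mapping split into 4k pages */
                atomic_long_inc(&direct_map_2m_splits);
        else if (from_level == 3)       /* a 1G mapping split into 2M pages */
                atomic_long_inc(&direct_map_1g_splits);
}

/* Printed alongside the existing DirectMap* lines in /proc/meminfo. */
static void report_direct_map_splits(struct seq_file *m)
{
        seq_printf(m, "DirectMap2MSplits: %8ld\n",
                   atomic_long_read(&direct_map_2m_splits));
        seq_printf(m, "DirectMap1GSplits: %8ld\n",
                   atomic_long_read(&direct_map_1g_splits));
}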

This seems much more like something we'd want in /proc/vmstat or as a
tracepoint than meminfo.  A tracepoint would be especially nice because
the trace buffer could actually be examined if an admin finds an
excessive number of these.
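
[As a sketch of the tracepoint idea, something along these lines could be defined in a trace header; the event name and fields are assumptions, not an existing kernel tracepoint.]

#include <linux/tracepoint.h>

TRACE_EVENT(x86_direct_map_split,

        TP_PROTO(unsigned long addr, int level),

        TP_ARGS(addr, level),

        TP_STRUCT__entry(
                __field(unsigned long,  addr)
                __field(int,            level)
        ),

        TP_fast_assign(
                __entry->addr  = addr;
                __entry->level = level;
        ),

        /* address of the mapping being split, and the level it was split from */
        TP_printk("addr=%lx level=%d", __entry->addr, __entry->level)
);

An admin who sees an excessive count could then enable the event and read the trace buffer to see exactly which addresses were being split and when.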


Re: [PATCH] x86/mm: Tracking linear mapping split events since boot

2021-01-25 Thread Tejun Heo
Hello,

On Mon, Jan 25, 2021 at 12:15:51PM -0800, Dave Hansen wrote:
> > DirectMap4k:        3505112 kB
> > DirectMap2M:       19464192 kB
> > DirectMap1G:       12582912 kB
> > DirectMap2MSplits:     1705
> > DirectMap1GSplits:       20
> 
> This seems much more like something we'd want in /proc/vmstat or as a
> tracepoint than meminfo.  A tracepoint would be especially nice because
> the trace buffer could actually be examined if an admin finds an
> excessive number of these.

Adding a TP sure can be helpful but I'm not sure how that'd make counters
unnecessary given that the accumulated number of events since boot is what
matters.

Thanks.

-- 
tejun


Re: [PATCH] x86/mm: Tracking linear mapping split events since boot

2021-01-26 Thread Dave Hansen
On 1/25/21 4:53 PM, Tejun Heo wrote:
>> This would be a lot more useful if you could reset the counters.  Then
>> just reset them from userspace at boot.  Adding read-write debugfs
>> exports for these should be pretty trivial.
> While this would work for hands-on cases, I'm a bit worried that this might
> be more challenging to gain confidence in large production environments.

Which part?  Large production environments don't trust data from
debugfs?  Or don't trust it if it might have been reset?

You could stick the "reset" switch in debugfs, and dump something out in
dmesg like we do for /proc/sys/vm/drop_caches so it's not a surprise
that it happened.
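
[A reset switch of that sort could look roughly like this; the file name and function names are assumptions for illustration, reusing the counters sketched earlier in the thread.]

#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/printk.h>
#include <linux/sched.h>

static ssize_t split_counters_reset_write(struct file *file,
                                          const char __user *buf,
                                          size_t count, loff_t *ppos)
{
        atomic_long_set(&direct_map_2m_splits, 0);
        atomic_long_set(&direct_map_1g_splits, 0);
        /* Leave a note in dmesg, as drop_caches does, so the reset is visible. */
        pr_info("direct map split counters reset by %s\n", current->comm);
        return count;
}

static const struct file_operations split_reset_fops = {
        .open  = simple_open,
        .write = split_counters_reset_write,
};

static int __init split_counters_debugfs_init(void)
{
        debugfs_create_file("reset_direct_map_split_counts", 0200, NULL, NULL,
                            &split_reset_fops);
        return 0;
}
late_initcall(split_counters_debugfs_init);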

BTW, counts of *events* don't really belong in meminfo.  These really do
belong in /proc/vmstat if anything.
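
[The /proc/vmstat route would follow the usual vm-event pattern, roughly as below; the event and counter names are assumptions for illustration, not what the posted patch does.]

/* include/linux/vm_event_item.h: new entries in enum vm_event_item */
        DIRECT_MAP_2M_SPLIT,
        DIRECT_MAP_1G_SPLIT,

/* mm/vmstat.c: matching names in vmstat_text[] */
        "direct_map_2m_splits",
        "direct_map_1g_splits",

/* at the split site in the x86 direct-map code */
        count_vm_event(level == PG_LEVEL_2M ? DIRECT_MAP_2M_SPLIT
                                            : DIRECT_MAP_1G_SPLIT);

This keeps the counters monotonic since boot and makes them visible through the same tooling that already scrapes /proc/vmstat.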


Re: [PATCH] x86/mm: Tracking linear mapping split events since boot

2021-01-26 Thread Dave Hansen
On 1/25/21 12:32 PM, Tejun Heo wrote:
> On Mon, Jan 25, 2021 at 12:15:51PM -0800, Dave Hansen wrote:
>>> DirectMap4k:        3505112 kB
>>> DirectMap2M:       19464192 kB
>>> DirectMap1G:       12582912 kB
>>> DirectMap2MSplits:     1705
>>> DirectMap1GSplits:       20
>> This seems much more like something we'd want in /proc/vmstat or as a
>> tracepoint than meminfo.  A tracepoint would be especially nice because
>> the trace buffer could actually be examined if an admin finds an
>> excessive number of these.
> Adding a TP sure can be helpful but I'm not sure how that'd make counters
> unnecessary given that the accumulated number of events since boot is what
> matters.

Kinda.  The thing that *REALLY* matters is how many of these splits were
avoidable and *could* be coalesced.

The patch here does not actually separate out pre-boot from post-boot
splits, so it's pretty hard to tell whether they came from something like
tracing, which is totally unnecessary, or were the result of something at
boot that we can't do anything about.

This would be a lot more useful if you could reset the counters.  Then
just reset them from userspace at boot.  Adding read-write debugfs
exports for these should be pretty trivial.
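
[If the counters were plain unsigned longs, the read-write exports could be one line each, e.g. (names assumed):]

        debugfs_create_ulong("direct_map_2m_splits", 0644, NULL,
                             &direct_map_2m_splits);
        debugfs_create_ulong("direct_map_1g_splits", 0644, NULL,
                             &direct_map_1g_splits);

Writing 0 to them from an init script at boot would then give a clean post-boot baseline.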


Re: [PATCH] x86/mm: Tracking linear mapping split events since boot

2021-01-26 Thread Tejun Heo
Hello,

On Mon, Jan 25, 2021 at 05:04:00PM -0800, Dave Hansen wrote:
> Which part?  Large production environments don't trust data from
> debugfs?  Or don't trust it if it might have been reset?

When the last reset was. Not saying it's impossible or anything, but in
general it's a lot better to have the counters be monotonically increasing,
with time/event-stamped markers kept alongside them, than to have the
counters themselves get reset or modified in other ways, because the
ownership of a specific counter might not be obvious to everyone and
accidents and mistakes happen.

Note that the "time/event stamped markers" above don't need to and shouldn't
be in the kernel. They can be managed by whoever wants to monitor a given
time period, and there can be any number of them.
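
[A minimal userspace sketch of that kind of marker, assuming the counter ends up in /proc/vmstat under a name like direct_map_2m_splits; the monitor logs (timestamp, value) pairs and diffs them over whatever window it cares about.]

#include <stdio.h>
#include <string.h>
#include <time.h>

/* Read one named counter out of /proc/vmstat; returns -1 if not found. */
static long read_vmstat_counter(const char *name)
{
        char key[64];
        long val;
        FILE *f = fopen("/proc/vmstat", "r");

        if (!f)
                return -1;
        while (fscanf(f, "%63s %ld", key, &val) == 2) {
                if (!strcmp(key, name)) {
                        fclose(f);
                        return val;
                }
        }
        fclose(f);
        return -1;
}

int main(void)
{
        /* Emit one time-stamped sample; run periodically from the monitor. */
        printf("%ld %ld\n", (long)time(NULL),
               read_vmstat_counter("direct_map_2m_splits"));
        return 0;
}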

> You could stick the "reset" switch in debugfs, and dump something out in
> dmesg like we do for /proc/sys/vm/drop_caches so it's not a surprise
> that it happened.

Processing dmesgs can work too but isn't particularly reliable or scalable.

> BTW, counts of *events* don't really belong in meminfo.  These really do
> belong in /proc/vmstat if anything.

Oh yeah, I don't have a strong opinion on where the counters should go.

Thanks.

-- 
tejun


Re: [PATCH] x86/mm: Tracking linear mapping split events since boot

2021-01-26 Thread Tejun Heo
Hello, Dave.

On Mon, Jan 25, 2021 at 04:47:42PM -0800, Dave Hansen wrote:
> The patch here does not actually separate out pre-boot from post-boot
> splits, so it's pretty hard to tell whether they came from something like
> tracing, which is totally unnecessary, or were the result of something at
> boot that we can't do anything about.

Ah, right, didn't know they also included splits during boot. It'd be a lot
more useful if they were counting post-boot splits.

> This would be a lot more useful if you could reset the counters.  Then
> just reset them from userspace at boot.  Adding read-write debugfs
> exports for these should be pretty trivial.

While this would work for hands-on cases, I'm a bit worried that this might
be more challenging to gain confidence in large production environments.

Thanks.

-- 
tejun