Re: [PATCH v5 4/4] zram: introduce zram memory tracking

Minchan Kim Thu, 19 Apr 2018 19:10:11 -0700

On Wed, Apr 18, 2018 at 02:07:15PM -0700, Andrew Morton wrote:
> On Wed, 18 Apr 2018 10:26:36 +0900 Minchan Kim <[email protected]> wrote:
> 
> > Hi Andrew,
> > 
> > On Tue, Apr 17, 2018 at 02:59:21PM -0700, Andrew Morton wrote:
> > > On Mon, 16 Apr 2018 18:09:46 +0900 Minchan Kim <[email protected]> wrote:
> > > 
> > > > zRAM as swap is useful for small-memory devices. However, swap means
> > > > the pages on zram are mostly cold pages, due to the VM's LRU algorithm.
> > > > In particular, once an application's init data has been touched at
> > > > launch, it tends not to be accessed again and is eventually swapped out.
> > > > zRAM can store such cold pages in compressed form, but it is pointless
> > > > to keep them in memory. A better idea is for app developers to free them
> > > > directly rather than leaving them on the heap.
> > > > 
> > > > This patch tells us the last access time of each zram block via
> > > > "cat /sys/kernel/debug/zram/zram0/block_state".
> > > > 
> > > > The output is as follows,
> > > >       300    75.033841 .wh
> > > >       301    63.806904 s..
> > > >       302    63.806919 ..h
> > > > 
> > > > The first column is zram's block index and the third one represents the
> > > > symbols (s: same page, w: page written to the backing store, h: huge page)
> > > > of the block state. The second column is the time at which the block was
> > > > last accessed, in seconds with microsecond resolution. So the above
> > > > example means the 300th block was accessed at 75.033841 seconds and it
> > > > was huge, so it was written to the backing store.
> > > > 
> > > > An admin can leverage this information, together with *pagemap*, to
> > > > catch a process's cold or incompressible pages once parts of its heap
> > > > are swapped out.
> > > 
> > > A few things..
> > > 
> > > - Terms like "Admin can" and "Admin could" are worrisome.  How do we
> > >   know that admins *will* use this?  How do we know that we aren't
> > >   adding a bunch of stuff which nobody will find to be (sufficiently)
> > >   useful?  For example, is there some userspace tool to which you are
> > >   contributing which will be updated to use this feature?
> > 
> > Actually, I used this feature two years ago to find memory hogs,
> > although back then it was only a quick prototype. It was very useful
> > for reducing memory cost in the embedded space.
> > 
> > The reason I am trying to upstream the feature is I need the feature
> > again. :)
> > 
> > Yup, I have a userspace tool that uses the feature, although it is
> > not compatible with this new version. It needs to be updated for the
> > new format. I will find time to submit the tool.
> 
> hm, OK, can we get this info into the changelog?


No problem. I will add as follows,

"I used the feature a few years ago to find memory hogs in userspace
and to notify them of the memory they had wasted by leaving it untouched
for a long time. With it, they could reduce unnecessary memory usage.
At that time I hacked up zram for the feature, but now I need it again,
so I decided it would be better to upstream it than to keep it to myself.
I hope to submit the userspace tool that uses the feature soon."

> 
> > > 
> > > - block_state's second column is in microseconds since some
> > >   undocumented time.  But how is userspace to know how much time has
> > >   elapsed since the access?  ie, "current time".
> > 
> > It's sched_clock, so it should be the elapsed time since system boot.
> > I should have written that explicitly.
> > I will fix it.
> > 
> > > 
> > > - Is the sched_clock() return value suitable for exporting to
> > >   userspace?  Is it monotonic?  Is it consistent across CPUs, across
> > >   CPU hotadd/remove, across suspend/resume, etc?  Does it run all the
> > >   way up to 2^64 on all CPU types, or will some processors wrap it at
> > >   (say) 32 bits?  etcetera.  Documentation/timers/timekeeping.txt
> > >   points out that suspend/resume can mess it up and that the counter
> > >   can drift between cpus.
> > 
> > Good point!
> > 
> > I just borrowed it from ftrace because I thought the goal was similar:
> > "no need to be exact unless the drift is frequent, but it has to be fast".
> > 
> > AFAIK, ftrace and printk are active users of the function, so if the
> > problem happened frequently, it would be serious. :)
> 
> It could be that ktime_get() is a better fit here - especially if
> sched_clock() goes nuts after resume.  Unfortunately ktime_get()
> appears to be totally undocumented :(
> 

I will use ktime_get_boottime(). With it, zram is not affected by
suspend/resume and the code will be simpler and clearer. For users, it
will also be more straightforward to parse the time.

Thanks for the good suggestion, Andrew!
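
To illustrate how userspace might consume block_state, here is a
minimal sketch in Python. It assumes the three-column layout shown in
the changelog above (block index, last access time in seconds, and a
three-character flag field using s/w/h). The names BlockState,
parse_block_state and cold_blocks are hypothetical, are not part of
the patch, and this is not the userspace tool mentioned in the thread.

```python
from typing import List, NamedTuple


class BlockState(NamedTuple):
    index: int           # zram block index (first column)
    last_access: float   # last access time in seconds (second column)
    same: bool           # 's': same-filled page
    written_back: bool   # 'w': page written to the backing store
    huge: bool           # 'h': huge (incompressible) page


def parse_block_state(text: str) -> List[BlockState]:
    """Parse block_state text, skipping lines that do not have three fields."""
    blocks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        index, stamp, flags = fields
        blocks.append(BlockState(
            index=int(index),
            last_access=float(stamp),
            same="s" in flags,
            written_back="w" in flags,
            huge="h" in flags,
        ))
    return blocks


def cold_blocks(blocks: List[BlockState], now: float,
                min_age: float) -> List[BlockState]:
    """Return blocks whose last access is at least min_age seconds before now."""
    return [b for b in blocks if now - b.last_access >= min_age]
```

The caller supplies "now" on the same clock as the second column. If
the timestamps end up based on ktime_get_boottime() as proposed, then
time.clock_gettime(time.CLOCK_BOOTTIME) on Linux would be a plausible
source, but that depends on the final version of the patch.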
