On Wed, Apr 18, 2018 at 02:07:15PM -0700, Andrew Morton wrote:
> On Wed, 18 Apr 2018 10:26:36 +0900 Minchan Kim <minc...@kernel.org> wrote:
>
> > Hi Andrew,
> >
> > On Tue, Apr 17, 2018 at 02:59:21PM -0700, Andrew Morton wrote:
> > > On Mon, 16 Apr 2018 18:09:46 +0900 Minchan Kim <minc...@kernel.org> wrote:
> > >
> > > > zRam as swap is useful for small memory device. However, swap means
> > > > those pages on zram are mostly cold pages due to VM's LRU algorithm.
> > > > Especially, once init data for application are touched for launching,
> > > > they tend to be not accessed any more and finally swapped out.
> > > > zRAM can store such cold pages as compressed form but it's pointless
> > > > to keep in memory. Better idea is app developers free them directly
> > > > rather than remaining them on heap.
> > > >
> > > > This patch tell us last access time of each block of zram via
> > > > "cat /sys/kernel/debug/zram/zram0/block_state".
> > > >
> > > > The output is as follows,
> > > >       300    75.033841 .wh
> > > >       301    63.806904 s..
> > > >       302    63.806919 ..h
> > > >
> > > > First column is zram's block index and 3rd one represents symbol
> > > > (s: same page w: written page to backing store h: huge page) of the
> > > > block state. Second column represents usec time unit of the block
> > > > was last accessed. So above example means the 300th block is accessed
> > > > at 75.033841 second and it was huge so it was written to the backing
> > > > store.
> > > >
> > > > Admin can leverage this information to catch cold|incompressible pages
> > > > of process with *pagemap* once part of heaps are swapped out.
> > >
> > > A few things..
> > >
> > > - Terms like "Admin can" and "Admin could" are worrisome. How do we
> > >   know that admins *will* use this? How do we know that we aren't
> > >   adding a bunch of stuff which nobody will find to be (sufficiently)
> > >   useful? For example, is there some userspace tool to which you are
> > >   contributing which will be updated to use this feature?
> >
> > Actually, I used this feature two years ago to find memory hogger
> > although the feature was very fast prototyping. It was very useful
> > to reduce memory cost in embedded space.
> >
> > The reason I am trying to upstream the feature is I need the feature
> > again. :)
> >
> > Yup, I have a userspace tool to use the feature although it was
> > not compatible with this new version. It should be updated with
> > new format. I will find a time to submit the tool.
>
> hm, OK, can we get this info into the changelog?

No problem. I will add the following:

"I used this feature a few years ago to find memory hogs in userspace
and to show them how much memory they had left untouched for a long
time. With it, they could reduce unnecessary memory usage. At that time
I hacked up zram for the feature, but now that I need it again, I
decided it would be better to upstream it rather than keep it to
myself. I hope to submit the userspace tool that uses the feature soon."

>
> > >
> > > - block_state's second column is in microseconds since some
> > >   undocumented time. But how is userspace to know how much time has
> > >   elapsed since the access? ie, "current time".
> >
> > It's a sched_clock so it should be elapsed time since the system boot.
> > I should have written it explicitly.
> > I will fix it.
> >
> > >
> > > - Is the sched_clock() return value suitable for exporting to
> > >   userspace? Is it monotonic? Is it consistent across CPUs, across
> > >   CPU hotadd/remove, across suspend/resume, etc? Does it run all the
> > >   way up to 2^64 on all CPU types, or will some processors wrap it at
> > >   (say) 32 bits? etcetera. Documentation/timers/timekeeping.txt
> > >   points out that suspend/resume can mess it up and that the counter
> > >   can drift between cpus.
> >
> > Good point!
> >
> > I just referenced it from ftrace because I thought the goal is similar
> > "no need to be exact unless the drift is frequent but wanted to be fast"
> >
> > AFAIK, ftrace/printk is active user of the function so if the problem
> > happens frequently, it might be serious. :)
>
> It could be that ktime_get() is a better fit here - especially if
> sched_clock() goes nuts after resume. Unfortunately ktime_get()
> appears to be totally undocumented :(
>

I will use ktime_get_boottime(). With it, zram is not affected by
suspend/resume, and the code will be simpler and clearer. For users,
the time will also be more straightforward to parse.

Thanks for the good suggestion, Andrew!
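
To illustrate what "straightforward to parse" could mean for userspace, here is a rough Python sketch. It is not the tool mentioned above. It assumes the three-column block_state format quoted in this thread (index, seconds, flags) and assumes the exported timestamp will be comparable to CLOCK_BOOTTIME once ktime_get_boottime() is used. The names BlockState, parse_block_state, cold_blocks and boot_seconds are made up for this example.

```python
import time
from typing import Iterable, List, NamedTuple


class BlockState(NamedTuple):
    index: int          # zram block index (first column)
    last_access: float  # seconds since boot (second column, assumed)
    same: bool          # 's': same-filled page
    written: bool       # 'w': written to the backing store
    huge: bool          # 'h': huge (incompressible) page


def parse_block_state(lines: Iterable[str]) -> List[BlockState]:
    """Parse lines such as '300 75.033841 .wh' (format assumed from the thread)."""
    blocks = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        index, stamp, flags = fields
        blocks.append(BlockState(int(index), float(stamp),
                                 "s" in flags, "w" in flags, "h" in flags))
    return blocks


def cold_blocks(blocks: Iterable[BlockState], now: float,
                min_age: float) -> List[BlockState]:
    """Return blocks whose last access is at least min_age seconds before now."""
    return [b for b in blocks if now - b.last_access >= min_age]


def boot_seconds() -> float:
    """Current time on the boot clock (Linux only; keeps counting across suspend)."""
    return time.clock_gettime(time.CLOCK_BOOTTIME)
```

A tool would read /sys/kernel/debug/zram/zram0/block_state, pass the lines to parse_block_state(), and call cold_blocks(blocks, boot_seconds(), min_age) to list blocks that have not been touched for min_age seconds.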