On Mon, 16 Apr 2018 18:09:46 +0900 Minchan Kim <minc...@kernel.org> wrote:
> zRam as swap is useful for small memory devices. However, swap means
> those pages on zram are mostly cold pages, due to the VM's LRU
> algorithm. Especially, once an application's init data has been
> touched for launching, it tends not to be accessed any more and is
> finally swapped out. zRAM can store such cold pages in compressed
> form, but it's pointless to keep them in memory. A better idea is for
> app developers to free them directly rather than leaving them on the
> heap.
>
> This patch tells us the last access time of each block of zram via
> "cat /sys/kernel/debug/zram/zram0/block_state".
>
> The output is as follows:
>
> 300 75.033841 .wh
> 301 63.806904 s..
> 302 63.806919 ..h
>
> The first column is zram's block index and the third one is the
> symbolic state of the block (s: same page, w: written to backing
> store, h: huge page). The second column is the time, with usec
> resolution, at which the block was last accessed. So the example
> above means the 300th block was accessed at 75.033841 seconds, and it
> was huge so it was written to the backing store.
>
> Admins can leverage this information to catch a process's cold or
> incompressible pages with *pagemap* once part of its heap has been
> swapped out.

A few things..

- Terms like "Admin can" and "Admin could" are worrisome. How do we
  know that admins *will* use this? How do we know that we aren't
  adding a bunch of stuff which nobody will find to be (sufficiently)
  useful? For example, is there some userspace tool to which you are
  contributing which will be updated to use this feature? (A sketch of
  what such a tool might look like is appended below.)

- block_state's second column is in microseconds since some
  undocumented time. But how is userspace to know how much time has
  elapsed since the access? ie, "current time".

- Is the sched_clock() return value suitable for exporting to
  userspace? Is it monotonic? Is it consistent across CPUs, across CPU
  hotadd/remove, across suspend/resume, etc? Does it run all the way
  up to 2^64 on all CPU types, or will some processors wrap it at
  (say) 32 bits? Etcetera.

  Documentation/timers/timekeeping.txt points out that suspend/resume
  can mess it up and that the counter can drift between CPUs.
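
For concreteness, here is a minimal sketch of the kind of userspace
consumer the first point asks about. It is hypothetical, not an
existing tool, and it assumes the timestamps are on a timebase that
userspace can also read, such as CLOCK_BOOTTIME -- which the patch as
posted does not provide (it uses sched_clock()), and which is exactly
what the second and third points are about:

/* cold-blocks.c: hypothetical consumer of zram's block_state file.
 *
 * Assumes (NOT guaranteed by the patch as posted) that the second
 * column is seconds.microseconds on a timebase comparable to
 * CLOCK_BOOTTIME.  Reports blocks idle longer than a threshold.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv)
{
	double threshold = argc > 1 ? atof(argv[1]) : 60.0; /* seconds */
	FILE *f = fopen("/sys/kernel/debug/zram/zram0/block_state", "r");
	unsigned long index;
	double stamp;
	char flags[8];
	struct timespec now;

	if (!f) {
		perror("block_state");
		return 1;
	}
	clock_gettime(CLOCK_BOOTTIME, &now);

	/* One line per block: "<index> <sec.usec> <flags>" */
	while (fscanf(f, "%lu %lf %7s", &index, &stamp, flags) == 3) {
		double age = now.tv_sec + now.tv_nsec / 1e9 - stamp;

		if (age > threshold)
			printf("block %lu idle %.1fs flags %s\n",
			       index, age, flags);
	}
	fclose(f);
	return 0;
}

If the kernel stamped the blocks with something like
ktime_get_boottime() instead of sched_clock(), the CLOCK_BOOTTIME
comparison above would be well defined across CPUs and across
suspend/resume.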
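
On the process side, the *pagemap* correlation the changelog suggests
would look roughly like the sketch below. The bit layout comes from
Documentation/vm/pagemap.txt (bit 63: page present, bit 62: page
swapped, bits 5-54: swap offset); note that reading the swap offset
requires privilege. When zram is the only swap device, the reported
swap offset should be the zram block index to look up in block_state.

/* pagemap-swap.c: hypothetical helper -- check whether one virtual
 * page of a process is swapped out and report its swap offset.
 * Entry layout per Documentation/vm/pagemap.txt; run as root to see
 * the swap offset.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
	char path[64];
	uint64_t entry;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <pid> <hex-vaddr>\n", argv[0]);
		return 1;
	}
	pid_t pid = atoi(argv[1]);
	uint64_t vaddr = strtoull(argv[2], NULL, 16);
	long psize = sysconf(_SC_PAGESIZE);

	snprintf(path, sizeof(path), "/proc/%d/pagemap", pid);
	int fd = open(path, O_RDONLY);
	if (fd < 0) {
		perror(path);
		return 1;
	}
	/* pagemap holds one 64-bit entry per virtual page. */
	if (pread(fd, &entry, sizeof(entry),
		  (vaddr / psize) * sizeof(entry)) != sizeof(entry)) {
		perror("pread");
		return 1;
	}
	close(fd);

	if (entry & (1ULL << 62))	/* bit 62: page is swapped */
		printf("swapped, offset %llu\n",
		       (unsigned long long)((entry >> 5) &
					    ((1ULL << 50) - 1)));
	else if (entry & (1ULL << 63))	/* bit 63: page present */
		printf("present in RAM\n");
	else
		printf("not present\n");
	return 0;
}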