Hi Michal,

On Fri, Jul 26, 2024 at 10:16 AM Michal Koutný <mkou...@suse.com> wrote:
>
> Hello David.
>
> On Wed, Jul 24, 2024 at 12:19:41PM GMT, David Finkel <dav...@vimeo.com> wrote:
> > Writing a specific string to the memory.peak and memory.swap.peak
> > pseudo-files reset the high watermark to the current usage for
> > subsequent reads through that same fd.
>
> This is elegant and nice work! (Caught my attention, so a few nits below.)

Thanks! You can thank Johannes for the algorithm.

>
> > --- a/include/linux/cgroup-defs.h
> > +++ b/include/linux/cgroup-defs.h
> > @@ -775,6 +775,11 @@ struct cgroup_subsys {
> >
> >  extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
> >
> > +struct cgroup_of_peak {
> > +	long			value;
>
> Wouldn't this better be unsigned like watermarks themselves?

Hmm, interesting question.
I originally set that to be signed to handle the special value of -1.
However, that's kind of irrelevant if I'm casting it to an unsigned u64
in the only place that value's being handled.

I've switched this over now.

>
> > +	struct list_head	list;
> > +};
> >
> > --- a/include/linux/page_counter.h
> > +++ b/include/linux/page_counter.h
> > @@ -26,6 +26,7 @@ struct page_counter {
> >  	atomic_long_t children_low_usage;
> >
> >  	unsigned long watermark;
> > +	unsigned long local_watermark;
>
> At first, I struggled understading what the locality is (when the local
> value is actually in of_peak), IIUC, it's more about temporal position.
>
> I'd suggest a comment (if not a name) like:
> 	/* latest reset watermark */
> +	unsigned long local_watermark;

Yeah, I had a comment before that was a bit inaccurate, and was advised
to remove it instead of trying to fix it in a previous round.
I've added one that says "Latest cg2 reset watermark".

> >
> > +
> > +	/* User wants global or local peak? */
> > +	if (fd_peak == -1UL)
>
> Here you use typed -1UL but not in other places. (Maybe define an
> explicit macro value ((unsigned long)-1)?)

Good idea!

>
> > +static ssize_t peak_write(struct kernfs_open_file *of, char *buf, size_t
> > nbytes,
> > +			  loff_t off, struct page_counter *pc,
> > +			  struct list_head *watchers)
> > +{
> ...
> > +	list_for_each_entry(peer_ctx, watchers, list)
> > +		if (usage > peer_ctx->value)
> > +			peer_ctx->value = usage;
>
> The READ_ONCE() in peak_show() suggests it could be WRITE_ONCE() here.

Good point. I've sprinkled a few more READ_ONCE and WRITE_ONCE calls.

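To make the sentinel discussion above concrete, here is a minimal userspace
sketch of what an explicit "unset" macro could look like once the value is
unsigned. The macro name PEAK_UNSET and both helper functions are
hypothetical, made up for illustration, and are not taken from the patch.
It only shows how the all-ones value behaves in the two comparisons
discussed above.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical name for the "fd has not reset the peak yet" sentinel. */
#define PEAK_UNSET ((unsigned long)-1)

/* True while this fd has never written a reset (illustrative helper). */
static bool peak_is_unset(unsigned long fd_peak)
{
	return fd_peak == PEAK_UNSET;
}

/*
 * Raise a recorded peak to `usage` if usage is larger.
 * PEAK_UNSET converts to ULONG_MAX, so `usage > fd_peak` is never
 * true for an unset value and the sentinel passes through unchanged.
 */
static unsigned long peak_raise(unsigned long fd_peak, unsigned long usage)
{
	return usage > fd_peak ? usage : fd_peak;
}
```

With these definitions, peak_raise(10, 20) yields 20, while
peak_raise(PEAK_UNSET, 20) leaves the sentinel in place, which is why the
unset case would need explicit handling wherever it is compared.
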
>
> > +
> > +	/* initial write, register watcher */
> > +	if (ofp->value == -1)
> > +		list_add(&ofp->list, watchers);
> > +
> > +	ofp->value = usage;
>
> Move the registration before iteration and drop the extra assignment?

My original reason was that I could avoid an extra list hop and
conditional, but at this point I see three reasons to keep it separate:
 - We need to reset this value either way. If it's already been reset,
   it may not get reset by the loop.
 - Since these values are now unsigned, -1 compares greater than
   everything, so it would need a special case (or an additional cast).
   (Assuming we're on a system that uses two's complement.)
 - I think it's a bit clearer this way.

>
> Thanks,
> Michal

Thanks for the review!

-- 
David Finkel
Senior Principal Software Engineer, Core Services