On 4/15/21 1:53 PM, Johannes Weiner wrote:
On Tue, Apr 13, 2021 at 09:20:27PM -0400, Waiman Long wrote:
Most kmem_cache_alloc() calls are from user context. With instrumentation
enabled, the measured number of kmem_cache_alloc() calls from non-task
context was about 0.01% of the total.
The irq disable/enable sequence used in this case to access content
from the object stock is slow. To optimize for user context access, there
are now two object stocks, one for task context access and one for
interrupt context access.
The task context object stock can be accessed after disabling preemption,
which is cheap in a non-preemptible kernel. The interrupt context object
stock can only be accessed after disabling interrupts. User context code
can access the interrupt object stock, but not vice versa.
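The access rule above can be modeled in a few lines. This is a minimal userspace sketch, not kernel code: sim_in_task, sim_preempt_count, sim_irqs_disabled and sim_may_access() are hypothetical stand-ins for in_task(), the preemption count and the local irq state. It shows that the task stock is usable from task context with only preemption disabled, while the irq stock needs interrupts off from any context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state (not real kernel APIs). */
static bool sim_in_task = true;     /* models in_task() */
static int sim_preempt_count;       /* models preempt_disable() nesting */
static bool sim_irqs_disabled;      /* models the local irq state of this CPU */

struct obj_stock {
	unsigned int nr_bytes;          /* bytes cached in this stock */
};

/* One CPU's stock holding the two object stocks from the patch. */
struct memcg_stock_pcp {
	struct obj_stock task_obj;      /* task context: preemption off suffices */
	struct obj_stock irq_obj;       /* any context: interrupts must be off */
};

static struct memcg_stock_pcp sim_stock;

/* Access rule: irq stock needs irqs off; task stock is task-context only. */
static bool sim_may_access(const struct obj_stock *s)
{
	if (s == &sim_stock.irq_obj)
		return sim_irqs_disabled;
	return sim_in_task && (sim_preempt_count > 0 || sim_irqs_disabled);
}

/* Pick the stock owned by the current context, like current_obj_stock(). */
static struct obj_stock *sim_current_obj_stock(void)
{
	struct obj_stock *s = sim_in_task ? &sim_stock.task_obj
					  : &sim_stock.irq_obj;

	assert(sim_may_access(s));      /* caller must already hold the protection */
	return s;
}
```

The asymmetry falls out of sim_may_access(): a task that disables interrupts also cannot be preempted, so it may touch either stock, whereas an interrupt handler could have interrupted a task that holds task_obj with only preemption disabled.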
The mod_objcg_state() function is also modified to make sure that memcg
and lruvec stat updates are done with interrupts disabled.
The downside of this change is that more data is stored in the local
object stocks and not yet reflected in the charge counter and the vmstat
arrays. However, this is a small price to pay for better performance.
Signed-off-by: Waiman Long <long...@redhat.com>
Acked-by: Roman Gushchin <g...@fb.com>
Reviewed-by: Shakeel Butt <shake...@google.com>
This makes sense, and also explains the previous patch a bit
better. But please merge those two.
The reason I broke it into two is so that the patches are individually
easier to review. Rather than merging the two, I would prefer to update
the commit log of patch 4 to explain why the obj_stock structure is
introduced.
@@ -2229,7 +2229,8 @@ struct obj_stock {
struct memcg_stock_pcp {
struct mem_cgroup *cached; /* this never be root cgroup */
unsigned int nr_pages;
- struct obj_stock obj;
+ struct obj_stock task_obj;
+ struct obj_stock irq_obj;
struct work_struct work;
unsigned long flags;
@@ -2254,11 +2255,48 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
}
#endif
+/*
+ * Most kmem_cache_alloc() calls are from user context. The irq disable/enable
+ * sequence used in this case to access content from object stock is slow.
+ * To optimize for user context access, there are now two object stocks for
+ * task context and interrupt context access respectively.
+ *
+ * The task context object stock can be accessed by disabling preemption only,
+ * which is cheap in a non-preemptible kernel. The interrupt context object
+ * stock can only be accessed after disabling interrupts. User context code
+ * can access the interrupt object stock, but not vice versa.
+ */
static inline struct obj_stock *current_obj_stock(void)
{
struct memcg_stock_pcp *stock = this_cpu_ptr(&memcg_stock);
- return &stock->obj;
+ return in_task() ? &stock->task_obj : &stock->irq_obj;
+}
+
+#define get_obj_stock(flags) \
+({ \
+ struct memcg_stock_pcp *stock; \
+ struct obj_stock *obj_stock; \
+ \
+ if (in_task()) { \
+ preempt_disable(); \
+ (flags) = -1L; \
+ stock = this_cpu_ptr(&memcg_stock); \
+ obj_stock = &stock->task_obj; \
+ } else { \
+ local_irq_save(flags); \
+ stock = this_cpu_ptr(&memcg_stock); \
+ obj_stock = &stock->irq_obj; \
+ } \
+ obj_stock; \
+})
+
+static inline void put_obj_stock(unsigned long flags)
+{
+ if (flags == -1L)
+ preempt_enable();
+ else
+ local_irq_restore(flags);
}
Please make them both functions and use 'unsigned long *flags'.
Sure, I can do that.
Also, I'm not sure calling in_task() twice would actually be more
expensive than the == -1 special case, and it would be easier to understand.
I can make that change too. Either way is fine with me.
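A minimal userspace sketch of what the suggested shape could look like: both helpers as functions, flags passed as 'unsigned long *', and put_obj_stock() re-checking the context instead of decoding a -1L sentinel. The sim_* globals and helpers are hypothetical stand-ins for in_task(), preempt_disable()/preempt_enable() and local_irq_save()/local_irq_restore(); this is an illustration of the review comment, not the code that was eventually merged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for in_task(), preempt_*() and local_irq_*(). */
static bool sim_in_task = true;
static int sim_preempt_count;
static bool sim_irqs_disabled;

static void sim_preempt_disable(void) { sim_preempt_count++; }

static void sim_preempt_enable(void)
{
	assert(sim_preempt_count > 0);
	sim_preempt_count--;
}

/* Save the current irq state into *flags, then disable irqs. */
static void sim_local_irq_save(unsigned long *flags)
{
	*flags = sim_irqs_disabled;
	sim_irqs_disabled = true;
}

static void sim_local_irq_restore(unsigned long flags)
{
	sim_irqs_disabled = flags != 0;
}

struct obj_stock { unsigned int nr_bytes; };
struct memcg_stock_pcp { struct obj_stock task_obj, irq_obj; };
static struct memcg_stock_pcp sim_stock;

static struct obj_stock *get_obj_stock(unsigned long *pflags)
{
	if (sim_in_task) {
		*pflags = 0;            /* unused in task context, but never garbage */
		sim_preempt_disable();
		return &sim_stock.task_obj;
	}
	sim_local_irq_save(pflags);
	return &sim_stock.irq_obj;
}

/* Re-check the context rather than testing flags for a sentinel value. */
static void put_obj_stock(unsigned long flags)
{
	if (sim_in_task)
		sim_preempt_enable();
	else
		sim_local_irq_restore(flags);
}
```

Calling in_task() twice is only correct because the context cannot change between get_obj_stock() and the matching put_obj_stock() in the same code path.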
@@ -2327,7 +2365,9 @@ static void drain_local_stock(struct work_struct *dummy)
local_irq_save(flags);
stock = this_cpu_ptr(&memcg_stock);
- drain_obj_stock(&stock->obj);
+ drain_obj_stock(&stock->irq_obj);
+ if (in_task())
+ drain_obj_stock(&stock->task_obj);
drain_stock(stock);
clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
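The drain_local_stock() hunk above drains irq_obj unconditionally but task_obj only from task context. A minimal userspace sketch of that rule, assuming hypothetical sim_* stand-ins for the irq state and for where drained bytes go: with irqs off, irq_obj is always safe, but an interrupt may have landed while a task held task_obj with only preemption disabled, so only task context drains task_obj.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins (not real kernel APIs). */
static bool sim_in_task = true;
static bool sim_irqs_disabled;
static unsigned long sim_returned_bytes;   /* where drained bytes end up */

struct obj_stock { unsigned int nr_bytes; };
struct memcg_stock_pcp { struct obj_stock task_obj, irq_obj; };
static struct memcg_stock_pcp sim_stock;

static void sim_local_irq_save(unsigned long *flags)
{
	*flags = sim_irqs_disabled;
	sim_irqs_disabled = true;
}

static void sim_local_irq_restore(unsigned long flags)
{
	sim_irqs_disabled = flags != 0;
}

/* Hand a stock's cached bytes back; the sketch only accumulates them. */
static void sim_drain_obj_stock(struct obj_stock *s)
{
	assert(sim_irqs_disabled);
	sim_returned_bytes += s->nr_bytes;
	s->nr_bytes = 0;
}

static void sim_drain_local_stock(void)
{
	unsigned long flags;

	sim_local_irq_save(&flags);
	sim_drain_obj_stock(&sim_stock.irq_obj);     /* safe with irqs off */
	/* An irq may have interrupted a task that holds task_obj with only
	 * preemption disabled, so only task context drains task_obj. */
	if (sim_in_task)
		sim_drain_obj_stock(&sim_stock.task_obj);
	sim_local_irq_restore(flags);
}
```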
@@ -3183,7 +3223,7 @@ static inline void mod_objcg_state(struct obj_cgroup *objcg,
memcg = obj_cgroup_memcg(objcg);
if (pgdat)
lruvec = mem_cgroup_lruvec(memcg, pgdat);
- __mod_memcg_lruvec_state(memcg, lruvec, idx, nr);
+ mod_memcg_lruvec_state(memcg, lruvec, idx, nr);
rcu_read_unlock();
This is actually a bug introduced in the earlier patch, isn't it?
Calling __mod_memcg_lruvec_state() without irqs disabled...
Not really. In patch 3, mod_objcg_state() is called only from the stock
update context, where interrupts have already been disabled. That is no
longer the case now, which is why I need to update mod_objcg_state() to
make sure irqs are disabled before updating the vmstat data array.
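The distinction being discussed is the usual kernel convention that a double-underscore helper requires its caller to have irqs disabled, while the plain-named wrapper disables them itself. A minimal userspace sketch, assuming that convention holds for __mod_memcg_lruvec_state() and mod_memcg_lruvec_state(); sim_mod_state_irqoff(), sim_mod_state() and sim_vmstat[] are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

static bool sim_irqs_disabled;      /* hypothetical local irq state */
static long sim_vmstat[4];          /* hypothetical per-CPU stat array */

static void sim_local_irq_save(unsigned long *flags)
{
	*flags = sim_irqs_disabled;
	sim_irqs_disabled = true;
}

static void sim_local_irq_restore(unsigned long flags)
{
	sim_irqs_disabled = flags != 0;
}

/* Models the __mod_*() variant: the caller must already have irqs off. */
static void sim_mod_state_irqoff(int idx, long nr)
{
	assert(sim_irqs_disabled);
	sim_vmstat[idx] += nr;
}

/* Models the mod_*() wrapper: safe from any context, saves/restores irqs. */
static void sim_mod_state(int idx, long nr)
{
	unsigned long flags;

	sim_local_irq_save(&flags);
	sim_mod_state_irqoff(idx, nr);
	sim_local_irq_restore(flags);
}
```

Under patch 3, calling the irqoff variant was fine because every caller already had irqs disabled. Once the task stock is used with only preemption disabled, the wrapper is needed.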
Cheers,
Longman