On 2/9/26 8:29 PM, Chen Ridong wrote:

On 2026/2/10 4:29, Waiman Long wrote:
On 2/9/26 2:12 AM, Chen Ridong wrote:
         return;
     }
-    WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
-    isolated_cpus_updating = false;
+    /*
+     * update_isolation_cpumasks() may be called more than once in the
+     * same cpuset_mutex critical section.
+     */
+    lockdep_assert_held(&cpuset_top_mutex);
+    if (isolcpus_twork_queued)
+        return;
+
+    init_task_work(&twork_cb, isolcpus_tworkfn);
+    if (!task_work_add(current, &twork_cb, TWA_RESUME))
+        isolcpus_twork_queued = true;
+    else
+        WARN_ON_ONCE(1);    /* Current task shouldn't be exiting */
 }
Timeline:

user A                              user B
write isolated cpus                 write isolated cpus
isolated_cpus_update
update_isolation_cpumasks
task_work_add
isolcpus_twork_queued = true

// before returning to userspace
// waiting for worker
                                    isolated_cpus_update
                                    if (isolcpus_twork_queued)
                                        return // early exit
                                    // return to userspace

// workqueue finishes
// return to userspace

For User B, the isolated_cpus value appears to be set and the syscall returns
successfully to userspace. However, because isolcpus_twork_queued was already
true (set by User A), User B's call skipped the actual mask update
(update_isolation_cpumasks).
Thus, the new isolated_cpus value is not yet effective in the kernel, even
though User B's write operation returned without error.

Is this a valid issue? Should User B's write be blocked?
It is perfectly possible for isolated_cpus to be modified more than once by
different tasks before a work or task_work function is executed. When that
function is invoked, isolated_cpus should contain the changes from both. It will
copy isolated_cpus to isolated_hk_cpus and pass it to housekeeping_update(). When
the 2nd work or task_work function is invoked, it will see that isolated_cpus
matches isolated_hk_cpus and skip the housekeeping_update() action. There is no
need to block user B's write as only one task can update isolated_cpus at any
time.

The handling of isolated_hk_cpus and isolated_cpus is clear.
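A minimal userspace model of the coalescing behavior described above may help. It assumes a 64-bit mask stands in for struct cpumask and a counter stands in for housekeeping_update(); write_isolated_cpus(), isolation_workfn() and hk_update_calls are hypothetical names for illustration, not kernel APIs, and cpuset_mutex serialization is assumed rather than modeled. Two writes land before the deferred function runs; the first invocation applies the combined value and a second invocation sees that isolated_hk_cpus already matches and skips the update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: a 64-bit mask stands in for struct cpumask.
 * Writers are assumed to be serialized (cpuset_mutex is not modeled).
 */
static uint64_t isolated_cpus;		/* updated by cpuset writers */
static uint64_t isolated_hk_cpus;	/* last mask passed to housekeeping */
static int hk_update_calls;		/* counts stand-in housekeeping updates */

/* A cpuset write from any task; only one writer runs at a time. */
static void write_isolated_cpus(uint64_t mask)
{
	isolated_cpus = mask;
}

/*
 * Body of a deferred work/task_work function: snapshot the latest
 * isolated_cpus and apply it only if it differs from what was last
 * applied. Returns true if an update was performed.
 */
static bool isolation_workfn(void)
{
	if (isolated_cpus == isolated_hk_cpus)
		return false;		/* already applied: skip the update */

	isolated_hk_cpus = isolated_cpus;
	hk_update_calls++;		/* housekeeping_update() would go here */
	return true;
}
```

Because the deferred function always reads the latest isolated_cpus, a write that does not queue its own callback is still picked up by the one already queued; the open question is only when that happens relative to the second writer returning.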

The main question remains: user B receives a success return even though
isolated_hk_cpus has not yet taken effect (i.e.,
/sys/devices/system/cpu/isolated does not reflect the change). In that case, how
can user B confirm whether their configuration is actually applied?
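One way to check from userspace is to read /sys/devices/system/cpu/isolated and compare it with the expected CPU set. The sketch below assumes the file uses the usual cpulist syntax (for example "2-5,8", or an empty line when nothing is isolated) and that CPU numbers are below 64; parse_cpulist() and isolated_matches() are hypothetical helpers, not an existing library API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a cpulist string such as "2-5,8" into a bitmask.
 * Assumes CPU numbers < 64. Returns false on malformed input.
 */
static bool parse_cpulist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	while (*s && *s != '\n') {
		char *end;
		unsigned long lo = strtoul(s, &end, 10);
		unsigned long hi = lo;

		if (end == s)
			return false;		/* no number where one was expected */
		s = end;
		if (*s == '-') {		/* range "lo-hi" */
			s++;
			hi = strtoul(s, &end, 10);
			if (end == s)
				return false;
			s = end;
		}
		if (lo > hi || hi >= 64)
			return false;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			m |= UINT64_C(1) << cpu;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return false;		/* unexpected character */
	}
	*mask = m;
	return true;
}

/* Read a cpulist file and compare it with the expected mask. */
static bool isolated_matches(const char *path, uint64_t expected)
{
	char buf[256];
	uint64_t mask;
	FILE *f;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return false;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';			/* empty file: treat as no CPUs */
	fclose(f);
	return parse_cpulist(buf, &mask) && mask == expected;
}
```

For example, isolated_matches("/sys/devices/system/cpu/isolated", 0x3c) would check that exactly CPUs 2-5 are reported as isolated.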

The task_work function is synchronous. IOW, if a user writes to a cpuset
control file to modify an isolated partition, it is guaranteed that the
task_work function, if queued, will have been executed by the time control is
passed back to userspace.
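To illustrate the synchronous case, here is a userspace model in which callbacks queued during a "syscall" run on the same task before it returns, similar in spirit to TWA_RESUME task_work running on the return-to-userspace path. struct model_twork, model_task_work_add(), run_task_work() and cpuset_write_syscall() are hypothetical names for this sketch, not the kernel's task_work API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task callback list modeling TWA_RESUME task_work. */
struct model_twork {
	void (*func)(struct model_twork *);
	struct model_twork *next;
};

static struct model_twork *pending;	/* current task's queued callbacks */

static void model_task_work_add(struct model_twork *tw,
				void (*func)(struct model_twork *))
{
	tw->func = func;
	tw->next = pending;
	pending = tw;
}

/* Runs all queued callbacks; models the return-to-userspace path. */
static void run_task_work(void)
{
	while (pending) {
		struct model_twork *tw = pending;

		pending = tw->next;
		tw->func(tw);
	}
}

static bool applied;
static bool twork_queued;
static struct model_twork twork_cb;

/* Stand-in for the deferred isolation update. */
static void apply_fn(struct model_twork *tw)
{
	(void)tw;
	applied = true;
	twork_queued = false;	/* allow a later write to queue again */
}

/* Models a cpuset write: queue at most once, then return to userspace. */
static void cpuset_write_syscall(void)
{
	if (!twork_queued) {
		model_task_work_add(&twork_cb, apply_fn);
		twork_queued = true;
	}
	run_task_work();	/* runs before control reaches userspace */
	assert(pending == NULL);
}
```

After cpuset_write_syscall() returns, applied is already true for the task that queued the callback, which is the guarantee described above.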

The wq work function, OTOH, is asynchronous. So if a user brings an isolated
CPU offline and thereby makes an isolated partition invalid, the corresponding
changes to the sched domains may not be complete by the time the offline
operation returns. However, this is an operation that normal users shouldn't
perform on a production system anyway, and they do so at their own risk.
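By contrast, an asynchronous worker may still be pending when the caller returns. In this pthread-based sketch a separate thread stands in for the wq work item and pthread_join() stands in, loosely, for waiting on that work; sched_domains_rebuilt, rebuild_workfn(), cpu_offline_model() and flush_model() are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool sched_domains_rebuilt;	/* hypothetical state */

/* Worker body standing in for the wq work function. */
static void *rebuild_workfn(void *arg)
{
	(void)arg;
	atomic_store(&sched_domains_rebuilt, true);
	return NULL;
}

/*
 * Models the CPU offline path: queue the work and return immediately.
 * There is no guarantee rebuild_workfn() has run when this returns.
 */
static int cpu_offline_model(pthread_t *worker)
{
	atomic_store(&sched_domains_rebuilt, false);
	return pthread_create(worker, NULL, rebuild_workfn, NULL);
}

/* Waits for the queued work to finish, loosely like flushing it. */
static void flush_model(pthread_t worker)
{
	int rc = pthread_join(worker, NULL);

	assert(rc == 0);
	(void)rc;
}
```

Only after flush_model() returns is sched_domains_rebuilt guaranteed to be true; checking it right after cpu_offline_model() may observe either value.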

Cheers,
Longman

