On 2026/2/7 4:37, Waiman Long wrote:
> Now that we are going to defer any changes to the HK_TYPE_DOMAIN
> housekeeping cpumasks to either task_work or workqueue, where the
> rebuild_sched_domains() call will be issued, the current
> rebuild_sched_domains_locked() call near the end of the cpuset critical
> section can be removed in such cases.
> 
> Currently, a boolean force_sd_rebuild flag is used to decide if
> rebuild_sched_domains_locked() call needs to be invoked. To allow
> such deferral, we change it to a tri-state sd_rebuild enumeration
> type.
> 
> Signed-off-by: Waiman Long <[email protected]>
> ---
>  kernel/cgroup/cpuset.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index d26c77a726b2..e224df321e34 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -173,7 +173,11 @@ static bool              isolcpus_twork_queued;  /* T */
>   * Note that update_relax_domain_level() in cpuset-v1.c can still call
>   * rebuild_sched_domains_locked() directly without using this flag.
>   */
> -static bool force_sd_rebuild;                        /* RWCS */
> +static enum {
> +     SD_NO_REBUILD = 0,
> +     SD_REBUILD,
> +     SD_DEFER_REBUILD,
> +} sd_rebuild;                                        /* RWCS */
>  
>  /*
>   * Partition root states:
> @@ -990,7 +994,7 @@ void rebuild_sched_domains_locked(void)
>  
>       lockdep_assert_cpus_held();
>       lockdep_assert_cpuset_lock_held();
> -     force_sd_rebuild = false;
> +     sd_rebuild = SD_NO_REBUILD;
>  
>       /* Generate domain masks and attrs */
>       ndoms = generate_sched_domains(&doms, &attr);
> @@ -1377,6 +1381,9 @@ static void update_isolation_cpumasks(void)
>       else
>               isolated_cpus_updating = false;
>  

If isolated_hk_cpus is defined, I believe isolated_cpus_updating becomes 
redundant.

> +     /* Defer rebuild_sched_domains() to task_work or wq */
> +     sd_rebuild = SD_DEFER_REBUILD;
> +

There is a potential issue: we defer all domain rebuilds here, including those
triggered by hotplug events which may change the isolation state.

The problem is that functions like cpuset_cpu_active(), which rely on the
scheduler domains being up to date, will also be delayed. Is that okay?
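
To illustrate the concern, here is a small user-space model of the flag
transitions as I read them from this patch. It is only a sketch, not kernel
code: the model_* names and the nr_rebuilds counter are made up for
illustration, and the call order inside the hotplug path is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model of the tri-state flag from the patch; not kernel code. */
enum sd_rebuild_state {
	SD_NO_REBUILD = 0,
	SD_REBUILD,
	SD_DEFER_REBUILD,
};

static enum sd_rebuild_state sd_rebuild;
static int nr_rebuilds;		/* hypothetical: counts synchronous rebuilds */

/* Stand-in for rebuild_sched_domains_locked(): it clears the flag. */
static void model_rebuild(void)
{
	sd_rebuild = SD_NO_REBUILD;
	nr_rebuilds++;
}

/* Mirrors cpuset_force_rebuild(): only upgrades from SD_NO_REBUILD. */
static void model_force_rebuild(void)
{
	if (!sd_rebuild)
		sd_rebuild = SD_REBUILD;
}

/* Mirrors the line added to update_isolation_cpumasks(). */
static void model_update_isolation(void)
{
	sd_rebuild = SD_DEFER_REBUILD;
}

/* Mirrors the tail of cpuset_handle_hotplug(); true if rebuilt in place. */
static bool model_hotplug_tail(void)
{
	if (sd_rebuild == SD_REBUILD) {
		model_rebuild();
		return true;
	}
	return false;
}

/*
 * A hotplug event that also changes the isolation state.
 * Assumption: the isolation update runs before the forced rebuild
 * request; the opposite order ends in the same state because the
 * assignment to SD_DEFER_REBUILD is unconditional.
 */
static bool model_hotplug_with_isolation_change(void)
{
	model_update_isolation();
	model_force_rebuild();		/* no-op: flag is already non-zero */
	return model_hotplug_tail();
}

/* Small self-check of the scenario described above. */
static void model_selfcheck(void)
{
	sd_rebuild = SD_NO_REBUILD;
	assert(!model_hotplug_with_isolation_change());
	assert(sd_rebuild == SD_DEFER_REBUILD);
}
```

In this model a plain hotplug event still rebuilds synchronously, but once
update_isolation_cpumasks() has run, the hotplug path returns without
rebuilding and leaves the flag at SD_DEFER_REBUILD until the deferred work
executes.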

>       /*
>        * This function can be reached either directly from regular cpuset
>        * control file write or via CPU hotplug. In the latter case, it is
> @@ -3011,7 +3018,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
>       update_partition_sd_lb(cs, old_prs);
>  
>       notify_partition_change(cs, old_prs);
> -     if (force_sd_rebuild)
> +     if (sd_rebuild == SD_REBUILD)
>               rebuild_sched_domains_locked();
>       free_tmpmasks(&tmpmask);
>       return 0;
> @@ -3288,7 +3295,7 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
>       }
>  
>       free_cpuset(trialcs);
> -     if (force_sd_rebuild)
> +     if (sd_rebuild == SD_REBUILD)
>               rebuild_sched_domains_locked();
>  out_unlock:
>       cpuset_full_unlock();
> @@ -3771,7 +3778,8 @@ hotplug_update_tasks(struct cpuset *cs,
>  
>  void cpuset_force_rebuild(void)
>  {
> -     force_sd_rebuild = true;
> +     if (!sd_rebuild)
> +             sd_rebuild = SD_REBUILD;
>  }
>  
>  /**
> @@ -3981,7 +3989,7 @@ static void cpuset_handle_hotplug(void)
>       }
>  
>       /* rebuild sched domains if necessary */
> -     if (force_sd_rebuild)
> +     if (sd_rebuild == SD_REBUILD)
>               rebuild_sched_domains_cpuslocked();
>  
>       free_tmpmasks(ptmp);

-- 
Best regards,
Ridong

