On Tue, 03 Apr 2007 16:00:50 -0700 Ulrich Drepper <[EMAIL PROTECTED]> wrote:
> Andrew Morton wrote:
> > Now it could be argued that the current behaviour is the sane thing: we
> > allow the process to "pin" itself to not-present CPUs and just handle it
> > in the CPU scheduler.
>
> As a stop-gap solution Jakub will likely implement the sched_getaffinity
> hack.  So, it would really be best to get the masks updated.
>
> But all this of course does not solve the issue sysconf() has.  In
> sysconf we cannot use sched_getaffinity since all of the system's CPUs
> must be reported.

OK.  This is exceptionally gruesome, but one could run sched_getaffinity()
against pid 1 (init).  Which will break nicely in the OS-virtualised future
when the system has multiple pid-1 inits running in containers...

> > Is it kernel overhead, or userspace?  The overhead of counting the bits?
>
> The overhead I meant is userland.

OK.  Your cost of counting those bits is proportional to CONFIG_NR_CPUS.

It's a bit sad that sys_sched_getaffinity() returns sizeof(cpumask_t),
because that means that userspace must handle 256 or whatever CPUs on a
machine which only has two CPUs.

Does anyone see a reason why sys_sched_getaffinity() cannot be altered to
return the maximum-possible-cpu-id-on-this-machine?  That way, your hweight
operation will be much faster on sane-sized machines.  (A rough userspace
sketch of that bit-counting cost is appended at the end of this mail.)

> > Because sched_getaffinity() could be easily sped up in the case where
> > it is operating on the current process.
>
> If there is a possibility to treat this case specially and make it faster,
> please do so.  It would be best to allow pid==0 as a special case so
> that callers don't have to find out the TID (which they shouldn't have
> to know).

OK.  Does anyone see a reason why we cannot do this?

--- a/kernel/sched.c~sched_getaffinity-speedup
+++ a/kernel/sched.c
@@ -4381,8 +4381,12 @@ long sched_getaffinity(pid_t pid, cpumas
 	struct task_struct *p;
 	int retval;
 
-	lock_cpu_hotplug();
-	read_lock(&tasklist_lock);
+	if (pid) {
+		lock_cpu_hotplug();
+		read_lock(&tasklist_lock);
+	} else {
+		preempt_disable();	/* Prevent CPU hotplugging */
+	}
 
 	retval = -ESRCH;
 	p = find_process_by_pid(pid);
@@ -4396,12 +4400,13 @@ long sched_getaffinity(pid_t pid, cpumas
 	cpus_and(*mask, p->cpus_allowed, cpu_online_map);
 
 out_unlock:
-	read_unlock(&tasklist_lock);
-	unlock_cpu_hotplug();
-	if (retval)
-		return retval;
-
-	return 0;
+	if (pid) {
+		read_unlock(&tasklist_lock);
+		unlock_cpu_hotplug();
+	} else {
+		preempt_enable();
+	}
+	return retval;
 }
 
 /**
_

> > Anyway, where do we stand?  Assuming we can address the CPU hotplug
> > issues, does sched_getaffinity() look like it will be suitable?
>
> It's only usable for the special case in the OpenMP code where the
> number of available CPUs is used to determine the number of worker
> threads.  For sysconf() we still need better support.  Maybe now
> somebody will step up and say they need faster sysconf as well.

I guess we could add a simple sys_get_nr_cpus(), as sketched below.  If we
want more than that (ie: topology, SMT/MC/NUMA/numa-distance etc) then it
gets much more complex and sysfs is more appropriate for that.
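For concreteness, a minimal sketch of what such a syscall might look like.
sys_get_nr_cpus() does not exist in the kernel, so the name, the return
convention and the (omitted) syscall-table wiring are all hypothetical:

/*
 * Hypothetical sketch only -- sys_get_nr_cpus() is not a real syscall.
 * It merely illustrates how cheap a "how many CPUs are online?" query
 * for sysconf() could be made.
 */
asmlinkage long sys_get_nr_cpus(void)
{
        /*
         * A snapshot of the online count is inherently racy against
         * CPU hotplug anyway, so take no locks here.
         */
        return num_online_cpus();
}

Userspace would of course still need a fallback (sched_getaffinity() or
sysfs) on kernels which don't provide it.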
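And, for reference, a rough userspace sketch (not from the original thread)
of the bit-counting that OpenMP-style code has to do today with the
existing sched_getaffinity() API.  The loop over CPU_SETSIZE is the hweight
cost discussed above: it is proportional to the configured mask size, not
to the number of CPUs actually present:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Count the CPUs the calling thread may run on. */
static int count_usable_cpus(void)
{
        cpu_set_t set;
        int cpu, n = 0;

        /* pid 0 means the calling thread, so no gettid() is needed. */
        if (sched_getaffinity(0, sizeof(set), &set) < 0)
                return -1;

        /* The per-call cost: one pass over the whole fixed-size mask. */
        for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
                if (CPU_ISSET(cpu, &set))
                        n++;
        return n;
}

int main(void)
{
        printf("usable CPUs: %d\n", count_usable_cpus());
        return 0;
}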