* Nick Piggin <[EMAIL PROTECTED]> wrote:

> [...] Although I'd imagine it may be something distros may want. For 
> example, a generic x86-64 kernel for both AMD and Intel systems could 
> easily have SMT and NUMA turned on.

yes, that's true - in fact reducing the number of separate kernel 
packages is of utmost importance to all distributions. (I'm not sure we 
are there yet with CONFIG_NUMA, but small steps won't hurt.)

> I agree with the downside of exercising less code paths though.

if we make CONFIG_NUMA good enough on small boxes so that distributors 
can turn it on then in the long run the loss could be offset by the win 
the extra QA gives.

> >is there any case where we'd want to simplify the domain tree? One more 
> >domain level is just one (and very minor) aspect of CONFIG_NUMA - i'd 
> >not want to run a CONFIG_NUMA kernel on a non-NUMA box, even if the 
> >domain tree got optimized. Hm?
> 
> I guess there is the SMT issue too, and even booting an SMP kernel on 
> a UP system. Also small ia64 NUMA systems will probably have one 
> redundant NUMA level.

i think most of the reasons for not running an SMP kernel on a UP box 
are not due to scheduler overhead: the biggest cost is spinlock 
overhead. Someone 
should try a little prototype: use the 'alternate instructions' 
framework to patch out calls to spinlock functions to NOPs, and 
benchmark the resulting kernel against UP. If it's "good enough", 
distros will use it. Having just a single binary kernel RPM that 
supports everything from NUMA through SMP to UP is the holy grail of 
distros. (especially the ones that offer commercial support and 
services.)
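
a rough user-space sketch of the idea, in case it helps whoever tries 
the prototype. It is not kernel code: the real 'alternate instructions' 
framework rewrites instruction bytes at boot, whereas this stand-in 
swaps function pointers. All names (sketch_spinlock_t, 
sketch_patch_locks() and friends) are made up for illustration, and it 
ignores what UP spinlocks may still need to do for preemption and 
interrupts.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock type, not the kernel's spinlock_t. */
typedef struct { atomic_int locked; } sketch_spinlock_t;

/* SMP variants: a plain test-and-set spinlock. */
static void smp_lock(sketch_spinlock_t *l)
{
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
        ; /* spin until the previous holder releases the lock */
}

static void smp_unlock(sketch_spinlock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* UP variant: stands in for the NOPs that would be patched over the call. */
static void up_nop(sketch_spinlock_t *l)
{
    (void)l;
}

/* Function pointers play the role of the patchable call sites. */
static void (*lock_op)(sketch_spinlock_t *) = smp_lock;
static void (*unlock_op)(sketch_spinlock_t *) = smp_unlock;

/* "Boot-time patching": pick the variants once the CPU count is known. */
void sketch_patch_locks(int nr_cpus)
{
    assert(nr_cpus >= 1);
    if (nr_cpus == 1) {
        lock_op = up_nop;
        unlock_op = up_nop;
    } else {
        lock_op = smp_lock;
        unlock_op = smp_unlock;
    }
}

void sketch_lock(sketch_spinlock_t *l)   { lock_op(l); }
void sketch_unlock(sketch_spinlock_t *l) { unlock_op(l); }
```

the indirect call here costs more than a patched-out NOP would, so this 
only shows the control flow, not the performance. The benchmark against 
a real UP kernel is what would decide whether it's "good enough".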

a single binary is probably not possible on x86 - e.g. it would 
probably be expensive (in terms of runtime cost) to make the 
PAE/non-PAE decision at runtime (the distro boot kernel needs to be 
non-PAE). But for newer arches like x86-64 it should be easier.

> If/when topologies get more complex (for example, the recent Altix 
> discussions we had with Paul), it will be generally easier to set up 
> all levels in a generic way, then weed them out using something like 
> this, rather than put the logic in the domain setup code.

ok. That should also make it easier to put more of the arch domain setup 
code into sched.c. E.g. i'm still uneasy about having so much scheduler 
code in arch/ia64/kernel/domain.c, and all the ripple effects. 
(the #ifdefs, include file impact, etc.)
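
to make the "set up all levels generically, then weed them out" idea 
concrete, here is a minimal sketch. It assumes a level is redundant 
when it spans exactly the same CPUs as the level below it, which is a 
simplification (a real check would presumably also look at the 
domain's flags and groups). struct sketch_domain and 
sketch_weed_domains() are hypothetical names, not the actual 
sched-domains code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down domain: one node per level, per CPU. */
struct sketch_domain {
    struct sketch_domain *parent; /* next level up, NULL at the top */
    unsigned long span;           /* bitmask of CPUs this level covers */
};

/* A parent adds no balancing scope if it covers the same CPUs as its child. */
static int level_is_redundant(const struct sketch_domain *child,
                              const struct sketch_domain *parent)
{
    return parent->span == child->span;
}

/*
 * Walk up from the base domain and unlink every redundant parent.
 * Returns the number of levels left in the chain, counting the base.
 */
int sketch_weed_domains(struct sketch_domain *base)
{
    struct sketch_domain *d = base;
    int levels = 1;

    assert(base != NULL);
    while (d->parent) {
        if (level_is_redundant(d, d->parent)) {
            d->parent = d->parent->parent; /* skip the redundant level */
        } else {
            d = d->parent;
            levels++;
        }
    }
    return levels;
}
```

e.g. on a 2-CPU box where the generic setup built SMT (CPU 0 only), 
SMP (CPUs 0-1) and NUMA (CPUs 0-1) levels, the NUMA level gets 
unlinked and the chain for CPU 0 ends at the SMP level. The arch code 
would only have to describe the topology, and the weeding could live 
in one place.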

        Ingo