> Right now, Simics tells Solaris that all of the
> memory is on a single board, even though my add-on
> module to Simics actually implements the timing of
> NUMA. The bottom line is that we currently model
> the timing of NUMA, but Solaris does not do any
> memory placement optimization because it thinks
> memory is all on one board.
>
> Thus, in order to get memory placement optimization
> working, I believe I need to bring up a newer version
> of Solaris, and possibly get Simics to properly
> tell Solaris about NUMA hardware.
Right.
> I posted the same question in the code group.
> Apparently for SPARC, the platform-specific files
> statically define the lgroups, and many/most of
> the platform-specific files are *not* included with
> OpenSolaris.
They are included. The SPARC lgrp platform code lives
in usr/src/uts/sun4/os/lgrpplat.c.
You'll notice that many of the routines call a platform-specific
version (plat_lgrp_* vs. lgrp_plat_*), and those
functions live in various platform modules (platmods).
For example, the routine associating CPUs with lgroups on Serengeti-class
systems lives in usr/src/uts/sun4u/serengeti/os/serengeti.c:
/*
 * Return the platform handle for the lgroup containing the given CPU
 *
 * For Serengeti, lgroup platform handle == board number
 */
lgrp_handle_t
plat_lgrp_cpu_to_hand(processorid_t id)
{
	return (CPUID_TO_BOARD(id));
}
Memnodes (expanses of physical memory) are also associated
with lgroups. That association is made in lgrpplat.c as well,
although the code that populates the mnode <-> lgrp platform handle
lookup arrays lives in the platmods (for SPARC systems).
So changing these routines so that they return something other than
LGRP_DEFAULT_HANDLE is at least the basis for doing what you want.
The devil is always in the details, though, so if you get stuck, let us know.
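To make that concrete, here is a minimal sketch of what a hardcoded
platmod mapping could look like for a simulated two-board machine.
The board layout and the SIM_CPUS_PER_BOARD macro are assumptions
about your Simics configuration, not code from any shipping platmod:

/*
 * Hypothetical example for a simulated 2-board machine:
 * CPUs 0-3 live on board 0, CPUs 4-7 on board 1, and the
 * lgroup platform handle is just the simulated board number.
 */
#define	SIM_CPUS_PER_BOARD	4

lgrp_handle_t
plat_lgrp_cpu_to_hand(processorid_t id)
{
	return ((lgrp_handle_t)(id / SIM_CPUS_PER_BOARD));
}

The memnode side is analogous: instead of letting everything collapse
to LGRP_DEFAULT_HANDLE, have your platmod populate the mnode <-> handle
lookup arrays so that each expanse of physical memory maps to its
simulated board.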
> Thus it seems that maybe my best solution is to add a
> mechanism (a system call or something) so that I can
> manually define lgroups before running my database
> workload. Or, I can go with the Opteron approach and
> have Solaris manually probe the memory system at boot
> time in order to figure out the lgroups dynamically.
> Doing so might be easiest because it is easy for me
> to affect the timing of Simics, but it isn't so
> easy to make Simics return the right information to
> Solaris (static hardware platform).
Nah, it "should" be pretty easy. :) To start, I would suggest
just hardcoding what the lgrp_plat_* routines return to see if you can get
more than one lgroup created.
The lgrp observability tools (found in the perf community) should
prove helpful here.
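For a quick sanity check from userland, liblgrp (which those tools are
built on) will tell you how many lgroups the OS actually created;
something like this, linked with -llgrp:

#include <stdio.h>
#include <sys/lgrp_user.h>

int
main(void)
{
	/* Ask for the OS view of the lgroup hierarchy */
	lgrp_cookie_t cookie = lgrp_init(LGRP_VIEW_OS);

	if (cookie == LGRP_COOKIE_NONE) {
		perror("lgrp_init");
		return (1);
	}

	/* More than one lgroup means your platform changes took effect */
	(void) printf("lgroups: %d\n", lgrp_nlgrps(cookie));

	(void) lgrp_fini(cookie);
	return (0);
}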
This might also be a good time to mention that we've been talking for
some time about (re)examining the common/architecture/platform NUMA
interfaces to see if they can be refactored in a way that's more modular.
Enabling MPO support for new architectures/platforms should be easier
than it is... and it would be nice to be able to say: "just implement your
platform's version of the 4 functions defined in foo.h to enable/configure
MPO". Adding/configuring MPO support by dropping in a new loadable kernel
module would be useful too.
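Just to sketch the sort of thing I mean (the header name and the exact
set of entry points below are hypothetical, not anything in the gate
today):

/*
 * Hypothetical "foo.h": the minimal set of entry points a platform
 * (or a drop-in platmod) might implement to enable/configure MPO.
 * Names and signatures are illustrative only.
 */
lgrp_handle_t	plat_lgrp_cpu_to_hand(processorid_t);	/* CPU -> lgroup */
lgrp_handle_t	plat_lgrp_mem_to_hand(pfn_t);		/* memory -> lgroup */
int	plat_lgrp_latency(lgrp_handle_t, lgrp_handle_t);	/* inter-lgroup distance */
void	plat_lgrp_init(void);				/* discover/describe topology */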
Thanks,
-Eric
> Thanks for the interest and response. Would welcome
> any ideas.
Sure, any time.
-Eric