> Right now, Simics tells Solaris that all of the
> memory is on a single board, even though my add-on
> module to Simics actually implements the timing of
> NUMA. The bottom line is that we currently model
> the timing of NUMA, but Solaris does not do any
> memory placement optimization because it thinks
> memory is all on one board.
>
> Thus, in order to get memory placement optimization
> working, I believe I need to bring up a newer version
> of Solaris, and possibly get Simics to properly
> tell Solaris about NUMA hardware.
Right.
> I posted the same question in the code group.
> Apparently for SPARC, the platform-specific files
> statically define the lgroups, and many/most of
> the platform-specific files are *not* included with
> OpenSolaris.
They are included. The SPARC lgrp platform code lives
in usr/src/uts/sun4/os/lgrpplat.c.
You'll notice that many of the routines call a platform-specific
version (plat_lgrp_* vs. lgrp_plat_*), and those
functions live in various platform modules (platmods).
For example, the routine associating CPUs with lgroups on Serengeti-class
systems lives in usr/src/uts/sun4u/serengeti/os/serengeti.c:
/*
 * Return the platform handle for the lgroup containing the given CPU
 *
 * For Serengeti, lgroup platform handle == board number
 */
lgrp_handle_t
plat_lgrp_cpu_to_hand(processorid_t id)
{
	return (CPUID_TO_BOARD(id));
}
Memnodes (expanses of physical memory) are also associated
with lgroups. That association is made in lgrpplat.c as well,
although the code that populates the mnode <-> lgrp platform handle
lookup arrays lives in the platmods (for SPARC systems).
So changing these routines so that they return something other than
LGRP_DEFAULT_HANDLE is at least the basis for doing what you want.
The devil is always in the details, though, so if you get stuck, let us know.
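To make that concrete, here is a minimal sketch of what a hardcoded
platmod mapping could look like for a simulated two-board machine.
The board layout and the SIM_CPUS_PER_BOARD macro are assumptions
about your Simics configuration, not code from any shipping platmod:

/*
 * Hypothetical example for a simulated 2-board machine:
 * CPUs 0-3 live on board 0, CPUs 4-7 on board 1, and the
 * lgroup platform handle is just the simulated board number.
 */
#define	SIM_CPUS_PER_BOARD	4

lgrp_handle_t
plat_lgrp_cpu_to_hand(processorid_t id)
{
	return ((lgrp_handle_t)(id / SIM_CPUS_PER_BOARD));
}

The memnode side is analogous: instead of letting everything collapse
to LGRP_DEFAULT_HANDLE, have your platmod populate the mnode <-> handle
lookup arrays so that each expanse of physical memory maps to its
simulated board.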
> Thus it seems that maybe my best solution is to add a
> mechanism (a system call or something) so that I can
> manually define lgroups before running my database
> workload. Or, I can go with the Opteron approach and
> have Solaris manually probe the memory system at boot
> time in order to figure out the lgroups dynamically.
> Doing so might be easiest because it is easy for me
> to affect the timing of Simics, but it isn't so
> easy to make Simics return the right information to
> Solaris (static hardware platform).
Nah, it "should" be pretty easy. :) To start, I would suggest
just hardcoding what the lgrp_plat_* routines return to see if you can get
more than one lgroup created.
The lgrp observability tools (found in the perf community) should
prove helpful here.
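For a quick sanity check from userland, liblgrp (which those tools are
built on) will tell you how many lgroups the OS actually created;
something like this, linked with -llgrp:

#include <stdio.h>
#include <sys/lgrp_user.h>

int
main(void)
{
	/* Ask for the OS view of the lgroup hierarchy */
	lgrp_cookie_t cookie = lgrp_init(LGRP_VIEW_OS);

	if (cookie == LGRP_COOKIE_NONE) {
		perror("lgrp_init");
		return (1);
	}

	/* More than one lgroup means your platform changes took effect */
	(void) printf("lgroups: %d\n", lgrp_nlgrps(cookie));

	(void) lgrp_fini(cookie);
	return (0);
}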
This might also be a good time to mention that we've been talking for
some time about (re)examining the common/architecture/platform NUMA
interfaces to see if they can be refactored in a way that's more modular.
Enabling MPO support for new architectures/platforms should be easier
than it is... and it would be nice to be able to say: "just implement your
platform's version of the 4 functions defined in foo.h to enable/configure
MPO". Adding/configuring MPO support by dropping in a new loadable kernel
module would be useful too.
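Just to sketch the sort of thing I mean (the header name and the exact
set of entry points below are hypothetical, not anything in the gate
today):

/*
 * Hypothetical "foo.h": the minimal set of entry points a platform
 * (or a drop-in platmod) might implement to enable/configure MPO.
 * Names and signatures are illustrative only.
 */
lgrp_handle_t	plat_lgrp_cpu_to_hand(processorid_t);	/* CPU -> lgroup */
lgrp_handle_t	plat_lgrp_mem_to_hand(pfn_t);		/* memory -> lgroup */
int	plat_lgrp_latency(lgrp_handle_t, lgrp_handle_t);	/* inter-lgroup distance */
void	plat_lgrp_init(void);				/* discover/describe topology */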
Thanks,
-Eric
> Thanks for the interest and response. Would welcome
> any ideas.
Sure, any time.
-Eric