On Jun 19, 2011, at 3:34 PM, Jed Brown wrote:

> On Sun, Jun 19, 2011 at 21:39, Barry Smith <bsmith at mcs.anl.gov> wrote:
> Huh? VecDot() { if n >> big, use 2 threads; else use 1 }. I don't see why 
> that is hard.
> 
> VecMAXPY() when some vectors were faulted with different affinity. Almost any 
> use of VecPlaceArray(). Any bubbling of threads to a higher level (e.g., if 
> thread dispatch is not done strictly at the finest level of granularity). 
> Client code that uses a different affinity during residual evaluation. Matrix 
> preallocation with variation in row length. Index sets, which have different 
> sizes than vectors.
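Barry's size-threshold rule for VecDot() can be sketched with OpenMP's `if` and `num_threads` clauses. This is a minimal illustration, not PETSc's implementation: `dot_threshold` and the cutoff `DOT_THREAD_THRESHOLD` are hypothetical, and the sketch silently assumes that a two-way static split of both arrays touches local memory, which is exactly what the cases listed above say cannot be taken for granted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff: below this length, thread start-up cost is assumed
   to outweigh the gain from a second thread. Would need tuning per machine. */
#define DOT_THREAD_THRESHOLD 100000

/* Dot product that uses 2 OpenMP threads only when n is large.
   The `if` clause makes the parallel region run with one thread otherwise.
   Compiled without OpenMP support, the pragma is ignored and the loop is serial. */
double dot_threshold(const double *x, const double *y, size_t n)
{
  assert((x != NULL && y != NULL) || n == 0);
  double sum = 0.0;
#pragma omp parallel for num_threads(2) reduction(+ : sum) if (n >= DOT_THREAD_THRESHOLD)
  for (long i = 0; i < (long)n; i++)
    sum += x[i] * y[i];
  return sum;
}
```

Note that the reduction order changes with the thread count, so results for large n can differ from the serial loop in the last bits.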

   Since Vecs can now track whether the CPU memory or the GPU memory is valid, 
can we not add information to the Vec (and Mat) indicating the memory 
"affinity", etc., and then dispatch different versions based on that? For 
example, VecPlaceArray() would mark the affinity as "unknown" or something. 
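A minimal sketch of that proposal, assuming a hypothetical `SketchVec` that carries an affinity tag next to its array (none of these names are PETSc API): a VecPlaceArray()-style call resets the tag to "unknown", and an AXPY takes the threaded, statically scheduled path only when both operands report the same known placement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical affinity tags; not part of PETSc. */
typedef enum {
  MEM_AFFINITY_UNKNOWN,     /* e.g. after a VecPlaceArray()-style call */
  MEM_AFFINITY_STATIC_SPLIT /* pages faulted by threads with a static split */
} MemAffinity;

typedef struct {
  double     *array;
  size_t      n;
  MemAffinity affinity;
  int         nthreads; /* thread count used when the pages were faulted */
} SketchVec;

/* Analogue of VecPlaceArray(): nothing is known about who faulted the
   caller's pages, so the tag is reset to "unknown". */
void sketch_place_array(SketchVec *v, double *array)
{
  assert(v != NULL);
  v->array    = array;
  v->affinity = MEM_AFFINITY_UNKNOWN;
  v->nthreads = 1;
}

typedef enum { AXPY_SERIAL, AXPY_THREADED } AxpyPath;

/* y += a*x, dispatched on the affinity tags of both operands. The threaded
   path is taken only when both were faulted with the same static split, so
   (in practice, with pinned threads) each thread mostly touches local pages. */
AxpyPath sketch_axpy(SketchVec *y, double a, const SketchVec *x)
{
  assert(x->n == y->n);
  int matched = x->affinity == MEM_AFFINITY_STATIC_SPLIT &&
                y->affinity == MEM_AFFINITY_STATIC_SPLIT &&
                x->nthreads == y->nthreads;
  if (matched) {
#pragma omp parallel for schedule(static) num_threads(y->nthreads)
    for (long i = 0; i < (long)y->n; i++)
      y->array[i] += a * x->array[i];
    return AXPY_THREADED;
  }
  /* Unknown or mismatched placement: conservative serial fallback. */
  for (size_t i = 0; i < y->n; i++)
    y->array[i] += a * x->array[i];
  return AXPY_SERIAL;
}
```

Falling back to the serial loop is only one possible policy; a real implementation might instead pick a partition that matches whichever operand dominates the memory traffic.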

   Barry


> 
> > A related matter that I keep harping on is that the memory hierarchy is 
> > very non-uniform. In the old days, it was reasonably uniform within a 
> > socket, but some of the latest hardware has multiple dies within a socket, 
> > each with more-or-less independent memory buses.
> 
>  So what is this numa.h you've been using? If we use it to allocate the 
> vector and matrix arrays, does that give you the locality?
> 
> That lets you specify explicitly, at allocation time, how you want the memory 
> mapped. Without it, this can be achieved, more or less, by spawning a suitable 
> number of OpenMP (or other paradigm) threads, making sure the OS/environment is 
> configured so that they have the affinity you desire, partitioning their 
> workload as you want, and faulting the memory from those threads.
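The first-touch approach described above can be sketched with OpenMP. This assumes Linux's default first-touch page placement, thread pinning set through the standard `OMP_PROC_BIND`/`OMP_PLACES` environment variables, and an allocation large enough that `malloc` hands back fresh, unfaulted pages (small blocks may come from heap pages that are already mapped). `alloc_first_touch` and `scale_first_touch` are illustrative names only.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate n doubles and fault them with a static OpenMP partition, so that
   under first-touch placement each page should land on the NUMA node of the
   thread that first writes it. Pinning is NOT checked here; it must come
   from the environment (e.g. OMP_PROC_BIND=close OMP_PLACES=cores). */
double *alloc_first_touch(size_t n)
{
  double *a = malloc(n * sizeof *a);
  if (!a) return NULL;
#pragma omp parallel for schedule(static)
  for (long i = 0; i < (long)n; i++)
    a[i] = 0.0; /* the faulting loop */
  return a;
}

/* Later kernels reuse the same static schedule and thread count, so each
   thread mostly touches the pages it faulted. */
void scale_first_touch(double *a, size_t n, double s)
{
  assert(a != NULL || n == 0);
#pragma omp parallel for schedule(static)
  for (long i = 0; i < (long)n; i++)
    a[i] *= s;
}
```

The fragility is that every later loop over the array must repeat the same partition; any kernel that splits the work differently quietly loses locality.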
> 
> But numa.h also has primitives to move the physical pages backing memory that 
> you have already allocated (e.g., numa_move_pages()), as well as to query the 
> mapping of other memory. If every platform supported libnuma (it's 
> Linux-only), I think we would be a lot better off: we could build a slightly 
> higher-level abstraction on libnuma and have a predictable, debuggable mapping 
> of memory.
> 
> One option is to experiment by building this higher-level abstraction on 
> libnuma, with a default implementation that does something less reliable on 
> platforms without libnuma (non-Linux). Some primitives, like 
> numa_move_pages(), are not available at all there, so they would just do 
> nothing and suffer the performance consequences.
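A sketch of the wrapper-with-fallback idea, assuming a `PETSC_HAVE_NUMA_H` macro set by configure and linking with `-lnuma` when it is defined. The `sketch_*` names are hypothetical; the libnuma calls (`numa_available()`, `numa_move_pages()`) are real, and passing `NULL` as the target-node list asks the kernel only to report where each page currently lives. Entries in `pages` should be page-aligned addresses. Without libnuma, both functions degrade to reporting "unknown" and moving nothing.

```c
#include <assert.h>
#include <stddef.h>

#ifdef PETSC_HAVE_NUMA_H
#include <numa.h>
#include <numaif.h> /* MPOL_MF_MOVE */
#endif

/* Query the NUMA node of each page. Returns 0 if nodes[] holds real data
   (a negative entry is a per-page error), 1 if this is the fallback and every
   entry is -1 ("unknown"). */
int sketch_query_nodes(void **pages, size_t count, int *nodes)
{
  assert(nodes != NULL || count == 0);
#ifdef PETSC_HAVE_NUMA_H
  if (numa_available() != -1 &&
      numa_move_pages(0, count, pages, NULL, nodes, 0) == 0)
    return 0;
#else
  (void)pages;
#endif
  for (size_t i = 0; i < count; i++) nodes[i] = -1;
  return 1;
}

/* Try to move each page to target[i]. Returns 0 if the move was attempted
   (per-page results in status[]), -1 on error, 1 if unsupported (no-op). */
int sketch_move_pages(void **pages, size_t count, const int *target, int *status)
{
  assert(status != NULL || count == 0);
#ifdef PETSC_HAVE_NUMA_H
  if (numa_available() != -1)
    return numa_move_pages(0, count, pages, target, status, MPOL_MF_MOVE) < 0 ? -1 : 0;
#else
  (void)pages;
  (void)target;
#endif
  for (size_t i = 0; i < count; i++) status[i] = -1;
  return 1;
}
```

Returning a distinct code for "unsupported" lets callers tell a platform that cannot report placement apart from one that reported a bad placement, which is part of what makes the mapping debuggable.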
> 
>   BTW: if it doesn't do so yet, ./configure needs to check for numa.h and 
> define PETSC_HAVE_NUMA_H.
> 
> It doesn't, but I agree.

