Re: [saj...@gmail.com: Google Summer of Code: a NUMA wishlist!]

2012-03-29 Thread Simon Marlow

On 28/03/2012 16:57, Tyson Whitehead wrote:

On March 28, 2012 04:41:16 Simon Marlow wrote:

Sure.  Do you have a NUMA machine to test on?


My understanding is non-NUMA machines went away when AMD and Intel moved
away from frontside buses (FSB) and integrated the memory controllers on die.

Intel is more recent to this game.  I believe AMD's last non-NUMA machines
were the Athlon XP series and Intel's the Core 2 series.

An easy way to see what you've got is to see what 'numactl --hardware' says.
If the node distance matrix is not uniform, you have NUMA hardware.

As an example, on an 8-socket Opteron machine (32 cores) you get

$ numactl --hardware
available: 8 nodes (0-7)
node 0 size: 16140 MB
node 0 free: 3670 MB
node 1 size: 16160 MB
node 1 free: 3472 MB
node 2 size: 16160 MB
node 2 free: 4749 MB
node 3 size: 16160 MB
node 3 free: 4542 MB
node 4 size: 16160 MB
node 4 free: 3110 MB
node 5 size: 16160 MB
node 5 free: 1963 MB
node 6 size: 16160 MB
node 6 free: 1715 MB
node 7 size: 16160 MB
node 7 free: 2862 MB
node distances:
node   0   1   2   3   4   5   6   7
   0:  10  20  20  20  20  20  20  20
   1:  20  10  20  20  20  20  20  20
   2:  20  20  10  20  20  20  20  20
   3:  20  20  20  10  20  20  20  20
   4:  20  20  20  20  10  20  20  20
   5:  20  20  20  20  20  10  20  20
   6:  20  20  20  20  20  20  10  20
   7:  20  20  20  20  20  20  20  10


Well, you learn something new every day!  On the new 32-core Opteron box 
we have here:


available: 8 nodes (0-7)
node 0 cpus: 0 4 8 12
node 0 size: 8182 MB
node 0 free: 1994 MB
node 1 cpus: 16 20 24 28
node 1 size: 8192 MB
node 1 free: 2783 MB
node 2 cpus: 3 7 11 15
node 2 size: 8192 MB
node 2 free: 2961 MB
node 3 cpus: 19 23 27 31
node 3 size: 8192 MB
node 3 free: 5359 MB
node 4 cpus: 2 6 10 14
node 4 size: 8192 MB
node 4 free: 3030 MB
node 5 cpus: 18 22 26 30
node 5 size: 8192 MB
node 5 free: 4667 MB
node 6 cpus: 1 5 9 13
node 6 size: 8192 MB
node 6 free: 3240 MB
node 7 cpus: 17 21 25 29
node 7 size: 8192 MB
node 7 free: 4031 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  22  16  22  16  22
  1:  16  10  16  22  22  16  22  16
  2:  16  16  10  16  16  16  16  22
  3:  22  22  16  10  16  16  22  16
  4:  16  22  16  16  10  16  16  16
  5:  22  16  16  16  16  10  22  22
  6:  16  22  16  22  16  22  10  16
  7:  22  16  22  16  16  22  16  10

The node distances on this box are less uniform than yours.
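
The "uniform or not" test on the distance matrix can be sketched in C. This is only an illustration, not part of numactl: the names is_numa and remote_spread are hypothetical, and the matrix is assumed to have been copied by hand from the "node distances:" table into a flat row-major array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* d is an n-by-n distance matrix in row-major order, as printed under
 * "node distances:" by numactl --hardware (copied in by hand here). */

/* True if any entry differs from node 0's local distance d[0],
 * i.e. the matrix is not uniform. */
bool is_numa(size_t n, const int *d)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (d[i * n + j] != d[0])
                return true;
    return false;
}

/* Largest minus smallest off-diagonal distance: 0 means every remote
 * node is equally far away; larger values mean a less uniform topology. */
int remote_spread(size_t n, const int *d)
{
    assert(n > 1);
    int lo = d[1], hi = d[1];   /* d[1] is the first off-diagonal entry */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            if (i == j)
                continue;
            int v = d[i * n + j];
            if (v < lo) lo = v;
            if (v > hi) hi = v;
        }
    return hi - lo;
}
```

On the first matrix above (10 local, 20 everywhere else) is_numa is true and remote_spread is 0; on the second matrix the remote distances range from 16 to 22, so remote_spread is 6.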

Cheers,
Simon

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Code review for new primop's CMM code?

2012-03-29 Thread Ryan Newton
Perhaps a question mark is more appropriate in the title.  It is a code
review I am seeking, not one on offer ;-).

On Thu, Mar 29, 2012 at 12:56 AM, Ryan Newton <rrnew...@gmail.com> wrote:

 Hi all,

 In preparation for students working on concurrent data structure GSoC
 project(s), I wanted to make sure they could count on CAS for array
 elements as well as IORefs.  The following patch represents my first attempt:


 https://github.com/rrnewton/ghc/commit/18ed460be111b47a759486677960093d71eef386

 It passes a simple test [Appendix 2 below], but I am very unsure as to
 whether the GC write barrier is correct.  Could someone do a code-review on
 the following few lines of CMM:

if (GET_INFO(arr) == stg_MUT_ARR_PTRS_CLEAN_info) {
   SET_HDR(arr, stg_MUT_ARR_PTRS_DIRTY_info, CCCS);
   len = StgMutArrPtrs_ptrs(arr);
   // The write barrier.  We must write a byte into the mark table:
   I8[arr + SIZEOF_StgMutArrPtrs + WDS(len) + (ind >> MUT_ARR_PTRS_CARD_BITS)] = 1;
}
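
For reviewers less familiar with the card table, the address arithmetic in that last line can be modelled in plain C. This is a sketch under stated assumptions: MutArr is a hypothetical stand-in for StgMutArrPtrs, the function names are made up, and CARD_BITS is set to 7 because that is the value I believe the RTS uses for MUT_ARR_PTRS_CARD_BITS (check rts/Constants.h in your tree).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 2^7 = 128 elements share one mark byte. */
#define CARD_BITS 7

/* Hypothetical stand-in for StgMutArrPtrs: a header, then `ptrs` element
 * words, then one mark byte per card. */
typedef struct {
    const void *info;      /* CLEAN or DIRTY info pointer in the real RTS */
    size_t      ptrs;      /* number of elements */
    size_t      size;      /* elements plus card table, in words */
    void       *payload[]; /* elements, followed by the card bytes */
} MutArr;

/* Number of card bytes needed for n elements (rounding up). */
size_t card_count(size_t n)
{
    return (n + ((size_t)1 << CARD_BITS) - 1) >> CARD_BITS;
}

/* Address of the mark byte covering element `ind`, mirroring
 * arr + SIZEOF_StgMutArrPtrs + WDS(len) + (ind >> MUT_ARR_PTRS_CARD_BITS). */
uint8_t *card_for(MutArr *arr, size_t ind)
{
    assert(ind < arr->ptrs);
    uint8_t *cards = (uint8_t *)&arr->payload[arr->ptrs];
    return cards + (ind >> CARD_BITS);
}

/* The write barrier itself: flag the card so the GC rescans it. */
void mark_card(MutArr *arr, size_t ind)
{
    *card_for(arr, ind) = 1;
}
```

Note that this sketch marks the card on every call, whereas the draft only does so when the header is still CLEAN; whether that guard is safe is part of what the review is asking.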

 Thanks,
   -Ryan

 -- Appendix 1: First draft code CMM definition for casArray#
 ---
 stg_casArrayzh
 /* MutableArray# s a -> Int# -> a -> a -> State# s -> (# State# s, Int#, a #) */
 {
     W_ arr, p, ind, old, new, h, len;
     arr = R1; // anything else?
     ind = R2;
     old = R3;
     new = R4;

     p = arr + SIZEOF_StgMutArrPtrs + WDS(ind);
     (h) = foreign "C" cas(p, old, new) [];

     if (h != old) {
         // Failure, return what was there instead of 'old':
         RET_NP(1,h);
     } else {
         // Compare and Swap Succeeded:
         if (GET_INFO(arr) == stg_MUT_ARR_PTRS_CLEAN_info) {
             SET_HDR(arr, stg_MUT_ARR_PTRS_DIRTY_info, CCCS);
             len = StgMutArrPtrs_ptrs(arr);
             // The write barrier.  We must write a byte into the mark table:
             I8[arr + SIZEOF_StgMutArrPtrs + WDS(len) + (ind >> MUT_ARR_PTRS_CARD_BITS)] = 1;
         }
         RET_NP(0,h);
     }
 }
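
 The return convention of the draft (a flag plus the value seen in the slot) can be mimicked with C11 atomics. This is a rough sketch, not the RTS's cas(): CasResult and cas_slot are hypothetical names, and uintptr_t stands in for a word-sized closure pointer. As in the draft, the comparison is on the word itself, so two distinct heap objects holding equal values would not compare equal.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Result in the shape the draft returns via RET_NP: a flag
 * (0 = swapped, 1 = not swapped) plus the value seen in the slot. */
typedef struct {
    int       failed;
    uintptr_t seen;
} CasResult;

/* Compare-and-swap one array slot. atomic_compare_exchange_strong writes
 * the slot's current value into `expected` when the comparison fails and
 * leaves it untouched on success, so `expected` always ends up holding
 * what was in the slot before the call. */
CasResult cas_slot(_Atomic uintptr_t *slot, uintptr_t old, uintptr_t new_val)
{
    assert(slot != NULL);
    uintptr_t expected = old;
    int ok = atomic_compare_exchange_strong(slot, &expected, new_val);
    CasResult r = { !ok, expected };
    return r;
}
```

 Starting from a slot holding 33, a first cas_slot(&slot, 33, 44) reports failed == 0 with seen == 33, and a second identical call reports failed == 1 with seen == 44, which mirrors the two attempts in the test file in Appendix 2.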

 -- Appendix 2:  Simple test file; when run it should print:
 ---
 -- Perform a CAS within a MutableArray#
 --   1st try should succeed: (True,33)
 -- 2nd should fail: (False,44)
 -- Printing array:
 --   33  33  33  44  33
 -- Done.
 ---
 {-# Language MagicHash, UnboxedTuples  #-}

 import GHC.IO
 import GHC.IORef
 import GHC.ST
 import GHC.STRef
 import GHC.Prim
 import GHC.Base
 import Data.Primitive.Array
 import Control.Monad

 

 -- | Compare-and-swap the value in the array at the given index:
 casArrayST :: MutableArray s a -> Int -> a -> a -> ST s (Bool, a)
 casArrayST (MutableArray arr#) (I# i#) old new = ST$ \s1# ->
   case casArray# arr# i# old new s1# of
     (# s2#, x#, res #) -> (# s2#, (x# ==# 0#, res) #)

 
 {-# NOINLINE mynum #-}
 mynum :: Int
 mynum = 33

 main = do
   putStrLn "Perform a CAS within a MutableArray#"
   arr <- newArray 5 mynum

   res  <- stToIO$ casArrayST arr 3 mynum 44
   res2 <- stToIO$ casArrayST arr 3 mynum 44
   putStrLn$ "  1st try should succeed: "++show res
   putStrLn$ "2nd should fail: "++show res2

   putStrLn "Printing array:"
   forM_ [0..4] $ \ i -> do
     x <- readArray arr i
     putStr ("  "++show x)
   putStrLn ""
   putStrLn "Done."

