Rolf,

Wow! That's actually good news, since in our own tests hierarch has always been slower. That might have various reasons, though, including the fact that we only have two cores per node. BTW: I would actually expect the IMB test to show worse performance for hierarch than many other benchmarks, since the rotating root causes some additional work/overhead.
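For reference, the IMB Bcast measurement basically does something like the following (just a simplified sketch from my side, not the actual IMB code): the root changes on every repetition, so any per-root setup a component has to do is paid on every iteration.

/* Simplified sketch of the IMB Bcast measurement loop:
 * the root rank is rotated on every repetition, so any
 * per-root setup a component does is paid every time. */
char buffer[4096];
int nprocs, i;
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
for (i = 0; i < 1000; i++) {
    int root = i % nprocs;
    MPI_Bcast(buffer, sizeof(buffer), MPI_BYTE, root, MPI_COMM_WORLD);
}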

Rolf Vandevaart wrote:
I am curious whether anyone is currently doing any work on the hierarchical collectives. I ask because I just did some runs on a cluster made up of 4 servers with 4 processors per server, using TCP over IB. I was running with np=16 and using the IMB benchmark to test MPI_Bcast. What I am seeing is that the hierarchical collectives appear to boost performance. The IMB test rotates the root, so one could imagine that performance increases because the hierarchical collectives minimize internode communication. See the table at the end of this post for the comparison of MPI_Bcast between tuned and hierarchical. This leads me to a few other questions.

1. From what I can tell from the debug messages, we still cannot stack the hierarchical on top of the tuned. I know that Brian Barrett did some work after the collectives meeting to allow for this, but I could not figure out how to get it to work.

Actually, this should be possible. We are, however, experiencing some problems with the current trunk, so I cannot verify it right now. Just for the sake of clarity: did you run hierarch on top of tuned, or on top of basic and/or sm?
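In case it helps, this is roughly how I would try to force hierarch on top of tuned. The parameter names below just follow the usual coll_<component>_priority convention, so please double-check them with ompi_info against your build, and adjust the hostfile name:

mpirun -np 16 --hostfile myhosts \
    --mca coll_hierarch_priority 90 \
    --mca coll_tuned_priority 80 \
    ./IMB-MPI1 Bcast

ompi_info --param coll hierarch should show whether the component and its priority parameter are actually available in your build.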


2. Enabling the hierarchical collectives causes a massive slowdown during MPI_Init. I know it was discussed a little at the collectives meeting and it appears that this is still something we need to solve. For a simple hello_world on a 2-node cluster with np=4, I see around 5 seconds to run with the tuned collectives, but around 18 seconds with the hierarchical ones.

Yes. A faster, albeit simpler, hierarchy detection is implemented, but not yet committed.
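Just to give an idea of the direction (this is only a rough illustration of the principle, not the code that will go in): one simple way to detect the node-level hierarchy is to gather the processor names and group identical ones.

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Rough sketch of a simple node-level hierarchy detection:
 * every rank publishes its processor name, and ranks with the
 * same name end up in the same node-local communicator.
 * Illustration only -- not the actual hierarch code. */
static void detect_node_comm(MPI_Comm comm, MPI_Comm *node_comm)
{
    int rank, size, i, color = 0, len;
    char name[MPI_MAX_PROCESSOR_NAME];
    char *all;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    memset(name, 0, sizeof(name));
    MPI_Get_processor_name(name, &len);

    all = malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);
    MPI_Allgather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
                  all,  MPI_MAX_PROCESSOR_NAME, MPI_CHAR, comm);

    /* color = lowest rank that reported the same processor name */
    for (i = 0; i < size; i++) {
        if (0 == strcmp(name, &all[i * MPI_MAX_PROCESSOR_NAME])) {
            color = i;
            break;
        }
    }
    MPI_Comm_split(comm, color, rank, node_comm);
    free(all);
}

The real component obviously does more than this, but that is the general idea.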


3. Apart from the MPI_Init issue, is hierarchical ready to go?

Clearly, the algorithms in hierarch are very simple, and they still lack large-scale testing, so that is something which would have to happen first.

We have also experimented with various other hierarchical algorithms for bcast over the last few months, though our overall progress has been significantly slower than I had hoped. I know, however, that various other groups are also interested in the hierarch component and might be ready to invest some time to bring it up to speed.
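For those who have not looked at the code: the basic pattern behind a hierarchical bcast is one broadcast among the node leaders followed by a broadcast inside each node. A bare-bones sketch, as an illustration only, assuming node_comm groups the on-node processes, leader_comm connects the local rank-0 process of every node, and the data already sits at rank 0 of leader_comm:

/* Sketch of a two-level broadcast. Assumes node_comm groups the
 * processes on one node, leader_comm contains the local rank-0
 * process of every node (MPI_COMM_NULL elsewhere), and the data
 * starts at rank 0 of leader_comm. Illustration only. */
static int two_level_bcast(void *buf, int count, MPI_Datatype dtype,
                           MPI_Comm leader_comm, MPI_Comm node_comm)
{
    /* Phase 1: inter-node broadcast among the node leaders. */
    if (MPI_COMM_NULL != leader_comm) {
        MPI_Bcast(buf, count, dtype, 0, leader_comm);
    }
    /* Phase 2: intra-node broadcast from each leader. */
    MPI_Bcast(buf, count, dtype, 0, node_comm);
    return MPI_SUCCESS;
}

Pipelining the two phases for large messages is where it gets more interesting, but the sketch above shows the basic structure.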



Thanks
Edgar


4. As the nodes get fatter, I assume the need for hierarchical will increase, so this may become a larger issue for all of us?

RESULTS FROM TWO RUNS OF IMB-MPI1

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 16             TUNED         HIERARCH
#----------------------------------------------------------------
        #bytes #repetitions  t_avg[usec]  t_avg[usec]
             0         1000         0.11         0.22
             1         1000       205.97       319.86
             2         1000       159.23       180.80
             4         1000       175.32       189.16
             8         1000       153.10       184.26
            16         1000       170.98       192.33
            32         1000       160.69       187.17
            64         1000       159.75       182.62
           128         1000       175.47       185.19
           256         1000       160.77       194.68
           512         1000       265.45       313.89
          1024         1000       185.66       215.43
          2048         1000       815.97       257.37
          4096         1000      1208.48       442.93
          8192         1000      1521.23       530.54
         16384         1000      2357.45       813.44
         32768         1000      3341.29      1455.78
         65536          640      6485.70      3387.02
        131072          320     13488.35      5261.65
        262144          160     24783.09     10747.28
        524288           80     50906.06     21817.64
       1048576           40     95466.82     41397.49
       2097152           20    180759.72     81319.54
       4194304           10    322327.71    163274.55


=========================
rolf.vandeva...@sun.com
781-442-3043
=========================
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
