Scott, out of curiosity ...
Generally speaking, when HT is more efficient, how is it used?
- flat MPI, with one task per hardware thread
- hybrid MPI+OpenMP, with each task bound to a core or socket, but never to a thread

Cheers,

Gilles

On Thursday, March 24, 2016, Atchley, Scott <atchle...@ornl.gov> wrote:

> Hi Aurélien,
>
> I have said the same thing to many users over the years. Our colleagues at NERSC, however, have found that 20% of their codes work better when using HT. Some codes benefit from SMT2 (i.e. HT) and even SMT4 (available on Power8) in order to provide enough latency hiding of memory accesses.
>
> As with everything in computer science, the answer is “It depends”. Try with and without for each new generation of hardware.
>
> Scott
>
> > On Mar 23, 2016, at 4:32 PM, Aurélien Bouteiller <boute...@icl.utk.edu> wrote:
> >
> > To add to what Ralph said, you probably do not want to use Hyper Threads for HPC workloads, as that generally results in very poor performance (as you noticed). Set the number of slots to the number of real cores (not HTs); that will yield optimal results 95% of the time.
> >
> > Aurélien
> >
> > --
> > Aurélien Bouteiller, Ph.D. ~~ https://icl.cs.utk.edu/~bouteill/
> >
> >> Le 23 mars 2016 à 16:24, Ralph Castain <r...@open-mpi.org> a écrit :
> >>
> >> “Slots” are an abstraction commonly used by schedulers as a way of indicating how many processes are allowed to run on a given node. It has nothing to do with hardware, either cores or HTs.
> >>
> >> MPI programmers frequently like to bind a process to one or more hardware assets (cores or HTs). Thus, you will see confusion in the community where people mix the term “slot” with “cores” or “cpus”. This is unfortunate, as the terms really do mean very different things.
> >>
> >> In OMPI, we chose to try and “help” the user by not requiring them to specify detailed info in a hostfile. So if you don’t specify the number of “slots” for a given node, we will sense the number of cores on that node and set the slots to match that number. This best matches user expectations today.
> >>
> >> If you do specify the number of slots, then we use that to guide the desired number of processes assigned to each node. We then bind each of those processes according to the user-provided guidance.
> >>
> >> HTH
> >> Ralph
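To make Ralph's distinction concrete, here is a rough sketch for a node like Federico's (2 sockets, 16 cores, 32 hardware threads). The hostfile and the binary name are placeholders, and exact option spellings can vary between Open MPI releases:

    $ cat hosts
    # "slots" only caps how many processes may land on node0; it says nothing about binding
    node0 slots=32

    # one rank per physical core, each bound to its core
    $ mpirun -np 16 --hostfile hosts --map-by core --bind-to core --report-bindings ./app

    # one rank per hardware thread, each bound to its hwthread
    $ mpirun -np 32 --hostfile hosts --map-by hwthread --bind-to hwthread --report-bindings ./app

The slots value only limits mpirun to 32 processes on node0; where each rank actually runs is decided by --map-by/--bind-to, and --report-bindings prints the resulting placement so you can check that no two ranks end up sharing a core unintentionally.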
> >>> On Mar 23, 2016, at 9:35 AM, Federico Reghenzani <federico1.reghenz...@mail.polimi.it> wrote:
> >>>
> >>> OK, I've investigated further today: it seems "--map-by hwthread" does not remove the problem. However, if I specify "node0 slots=32" in the hostfile, it runs much slower than specifying only "node0". In both cases I run mpirun with -np 32. So I'm quite sure I didn't understand what slots are.
> >>>
> >>> __
> >>> Federico Reghenzani
> >>> M.Eng. Student @ Politecnico di Milano
> >>> Computer Science and Engineering
> >>>
> >>> 2016-03-22 18:56 GMT+01:00 Federico Reghenzani <federico1.reghenz...@mail.polimi.it>:
> >>> Hi guys,
> >>>
> >>> I'm really confused about slots in resource allocation: I thought that slots are the number of processes spawnable on a certain node, so they should correspond to the number of processing elements of the node. For example, each of my nodes has 2 processors, 16 cores in total, and with hyperthreading a total of 32 processing elements per node (i.e. 32 hw threads). However, considering a single node, passing 32 slots in the hostfile and requesting "-np 32" results in a 20x performance degradation compared to using only "-np 16". The problem disappears when specifying --map-by hwthread.
> >>>
> >>> Investigating the problem, I found these counterintuitive things:
> >>> - here it is stated that "slots are Open MPI's representation of how many processors are available"
> >>> - here it is stated that "Slots indicate how many processes can potentially execute on a node. For best performance, the number of slots may be chosen to be the number of cores on the node or the number of processor sockets"
> >>> - I tried to remove the slots information from the hostfile, which according to this should be interpreted as "1", but it spawns 32 processes anyway
> >>> - I'm not sure what --map-by and --rank-by do
> >>>
> >>> In the custom RAS we are developing, what do we have to send to mpirun? The number of processor sockets, the number of cores, or the number of hwthreads available? How do --map-by and --rank-by affect the spawn policy?
> >>>
> >>> Thank you!
> >>>
> >>> OFFTOPIC: Is anyone going to EuroMPI 2016 in September? We will be there to present our migration technique.
> >>>
> >>> Cheers,
> >>> Federico
> >>>
> >>> __
> >>> Federico Reghenzani
> >>> M.Eng. Student @ Politecnico di Milano
> >>> Computer Science and Engineering
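For reference, the two HT usage patterns listed at the top of this mail would look roughly like this on such a 2-socket, 16-core, 32-hwthread node (again only a sketch: it reuses the placeholder hostfile above, assumes the OpenMP runtime keeps its threads inside the socket each rank is bound to, and exact option spellings can differ between Open MPI releases):

    # flat MPI: one rank per hardware thread
    $ mpirun -np 32 --hostfile hosts --map-by hwthread --bind-to hwthread --report-bindings ./app

    # hybrid MPI+OpenMP: one rank per socket, bound to the socket,
    # with OpenMP threads filling the hwthreads inside it
    $ mpirun -np 2 --hostfile hosts --map-by socket --bind-to socket -x OMP_NUM_THREADS=16 --report-bindings ./app

In the hybrid case the MPI task is bound to the socket rather than to an individual hardware thread, which is the distinction the question at the top was getting at.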