Scott,

out of curiosity ...

generally speaking, when HT is more efficient, how is it used? For example (see the sketch below):
- flat MPI, with one task bound to each hardware thread
- hybrid MPI+OpenMP, with a task bound to a core or a socket, but never to a single hardware thread
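
For instance, something like this (a sketch only; the rank counts, thread count and binary name are made up):

  # flat MPI: one rank bound to each hardware thread
  mpirun -np 64 --map-by hwthread --bind-to hwthread ./app

  # hybrid MPI+OpenMP: one rank per socket, OpenMP threads fill the socket
  mpirun -np 4 --map-by socket --bind-to socket -x OMP_NUM_THREADS=16 ./app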

Cheers,

Gilles

On Thursday, March 24, 2016, Atchley, Scott <atchle...@ornl.gov> wrote:

> Hi Aurélien,
>
> I have said the same thing to many users over the years. Our colleagues at NERSC, however, have found that 20% of their codes work better when using HT. Some codes benefit from SMT2 (i.e. HT), and even SMT4 (available on POWER8), because the extra hardware threads help hide memory-access latency.
>
> As with everything in computer science, the answer is “It depends”. Try it both with and without HT on each new generation of hardware.
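>
> For a quick A/B comparison, something along these lines should work (a sketch; the rank counts and binary name are placeholders):
>
>   # HT ignored: one rank bound to each physical core
>   mpirun -np 16 --map-by core --bind-to core ./app
>
>   # HT used: count hardware threads as cpus and bind one rank to each
>   mpirun -np 32 --use-hwthread-cpus --map-by hwthread --bind-to hwthread ./app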
>
> Scott
>
> > On Mar 23, 2016, at 4:32 PM, Aurélien Bouteiller <boute...@icl.utk.edu> wrote:
> >
> > To add to what Ralph said, you probably do not want to use HyperThreads for HPC workloads, as that generally results in very poor performance (as you noticed). Set the number of slots to the number of real cores (not hardware threads); that will yield optimal results 95% of the time.
> >
> > Aurélien
> >
> > --
> > Aurélien Bouteiller, Ph.D. ~~ https://icl.cs.utk.edu/~bouteill/
> >
> >> On Mar 23, 2016, at 4:24 PM, Ralph Castain <r...@open-mpi.org> wrote:
> >>
> >> “Slots” are an abstraction commonly used by schedulers to indicate how many processes are allowed to run on a given node. They have nothing to do with hardware, either cores or HTs.
> >>
> >> MPI programmers frequently like to bind a process to one or more hardware assets (cores or HTs). Thus, you will see confusion in the community where people mix the term “slot” with “cores” or “cpus”. This is unfortunate, as the terms really do mean very different things.
> >>
> >> In OMPI, we chose to try to “help” the user by not requiring them to specify detailed info in a hostfile. So if you don’t specify the number of “slots” for a given node, we detect the number of cores on that node and set the slots to match that number. This best matches user expectations today.
> >>
> >> If you do specify the number of slots, then we use that to guide the
> desired number of processes assigned to each node. We then bind each of
> those processes according to the user-provided guidance.
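> >>
> >> For example, with a hostfile like this (a minimal sketch; the hostnames are invented):
> >>
> >>   node0            # no slots given: slots default to the number of cores detected on node0
> >>   node1 slots=16   # explicit: at most 16 processes will be mapped onto node1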
> >>
> >> HTH
> >> Ralph
> >>
> >>> On Mar 23, 2016, at 9:35 AM, Federico Reghenzani <federico1.reghenz...@mail.polimi.it> wrote:
> >>>
> >>> Ok, I've investigated further today, and it seems "--map-by hwthread" does not remove the problem. However, if I specify "node0 slots=32" in the hostfile, it runs much slower than if I specify only "node0". In both cases I run mpirun with -np 32. So I'm quite sure I haven't understood what slots are.
> >>>
> >>> __
> >>> Federico Reghenzani
> >>> M.Eng. Student @ Politecnico di Milano
> >>> Computer Science and Engineering
> >>>
> >>>
> >>>
> >>> 2016-03-22 18:56 GMT+01:00 Federico Reghenzani <federico1.reghenz...@mail.polimi.it>:
> >>> Hi guys,
> >>>
> >>> I'm really confused about slots in resource allocation: I thought that slots are the number of processes that can be spawned on a given node, so they should correspond to the number of processing elements of the node. For example, each of my nodes has 2 processors with a total of 16 cores; with hyperthreading that gives 32 processing elements per node (i.e. 32 hardware threads). However, on a single node, putting 32 slots in the hostfile and requesting "-np 32" results in a roughly 20x slowdown compared to using only "-np 16". The problem disappears when specifying --map-by hwthread.
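> >>>
> >>> Roughly, the runs I am comparing are these (the hostfile and binary names are placeholders):
> >>>
> >>>   # hostfile "hf" contains the single line "node0 slots=32"
> >>>   mpirun -np 32 --hostfile hf ./app    # ~20x slower
> >>>   mpirun -np 16 --hostfile hf ./app    # fast
> >>>   # adding "--map-by hwthread" to the 32-rank run makes the slowdown disappear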
> >>>
> >>> Investigating the problem, I found these counterintuitive things:
> >>> - here it is stated that "slots are Open MPI's representation of how many processors are available"
> >>> - here it is stated that "Slots indicate how many processes can potentially execute on a node. For best performance, the number of slots may be chosen to be the number of cores on the node or the number of processor sockets"
> >>> - I tried removing the slots information from the hostfile, which according to this should be interpreted as "1", but it spawns 32 processes anyway
> >>> - I'm not sure what --map-by and --rank-by do
> >>>
> >>> In the custom RAS we are developing, what do we have to send to mpirun: the number of processor sockets, the number of cores, or the number of hardware threads available? How do --map-by and --rank-by affect the spawn policy?
> >>>
> >>>
> >>> Thank you!
> >>>
> >>>
> >>> OFF-TOPIC: is anyone going to EuroMPI 2016 in September? We will be there to present our migration technique.
> >>>
> >>>
> >>> Cheers,
> >>> Federico
> >>>
> >>> __
> >>> Federico Reghenzani
> >>> M.Eng. Student @ Politecnico di Milano
> >>> Computer Science and Engineering
> >>>
> >>>
> >>>