Re: [OMPI devel] MPI Error

2016-03-23 Thread Dominic Kedelty
Hi Gilles,

I believe I have found the problem. Initially I thought it might have been an
MPI issue, since it occurred inside an MPI function. However, now I am
sure that the problem is an overflow of 4-byte signed integers.

I am dealing with computational domains that have a little more than a
billion cells (1024^3 cells). However, I am still within the limits of the
4-byte integer. The area where I am running into the problem is below; I have
shortened the code:

 ! Fileviews
integer :: fileview_hexa
integer :: fileview_hexa_conn

integer, dimension(:), allocatable :: blocklength
integer, dimension(:), allocatable :: map
integer(KIND=8) :: size
integer(KIND=MPI_OFFSET_KIND) :: disp   ! MPI_OFFSET_KIND seems to be 8 bytes

allocate(map(ncells_hexa_),blocklength(ncells_hexa_))
map = hexa_-1
hexa_ is a 4-byte integer array that labels the local hexa elements on a
given rank. In my current code its maximum value is 1024^3 and its minimum is 1.
blocklength = 1
call MPI_TYPE_INDEXED(ncells_hexa_,blocklength,map,MPI_REAL_SP,fileview_hexa,ierr)
MPI_REAL_SP is used for the 4-byte scalar data types that are going to be
written to the file (e.g. a temperature scalar stored at a given hexa cell).
call MPI_TYPE_COMMIT(fileview_hexa,ierr)
map = map * 8
Here is where the problems arise. map is multiplied by 8 because the hexa
cell node connectivity needs to be written. The node numbers being written
to the file need to be 4 bytes, and the maximum node number still fits
within a 4-byte signed integer. But since I have to map displacements up to
8*1024^3, map itself needs to be integer(kind=8).
blocklength = 8
call MPI_TYPE_INDEXED(ncells_hexa_,blocklength,map,MPI_INTEGER,fileview_hexa_conn,ierr)
MPI_TYPE_INDEXED( 1024^3, blocklength=(/8, 8, 8, ..., 8/),
map=(/0, 8, 16, 24, ..., 8589934592/), MPI_INTEGER, file_view_hexa_conn, ierr)
Would this be a correct way to declare the new datatype file_view_hexa_conn?
In this call blocklength would be a 4-byte integer array and map would be
an 8-byte integer array. To be clear, the code currently has both map and
blocklength as 4-byte integer arrays.
call MPI_TYPE_COMMIT(fileview_hexa_conn,ierr)
deallocate(map,blocklength)



disp = disp+84
call MPI_FILE_SET_VIEW(iunit,disp,MPI_INTEGER,fileview_hexa,"native",MPI_INFO_NULL,ierr)
call MPI_FILE_WRITE_ALL(iunit,hexa_,ncells_hexa_,MPI_INTEGER,status,ierr)
I believe this could be wrong as well: fileview_hexa is being used to write
the 4-byte integer hexa labeling, but as you said, since MPI_REAL_SP and
MPI_INTEGER are both 4 bytes it may be fine. It has not given any problems so far.
disp = disp+4*ncells_hexa
call MPI_FILE_SET_VIEW(iunit,disp,MPI_INTEGER,fileview_hexa_conn,"native",MPI_INFO_NULL,ierr)
size = 8*ncells_hexa_
call MPI_FILE_WRITE_ALL(iunit,conn_hexa,size,MPI_INTEGER,status,ierr)
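
One thing I am considering (untested, and I am not sure I am reading the man
page right) is building the connectivity fileview with MPI_TYPE_CREATE_HINDEXED
instead, since its displacements are byte offsets of kind MPI_ADDRESS_KIND
(8 bytes on my build, like MPI_OFFSET_KIND) and so cannot overflow. A sketch,
reusing ncells_hexa_, hexa_ and blocklength from above (bmap is just a name I
made up for the byte-displacement array):

integer(KIND=MPI_ADDRESS_KIND), dimension(:), allocatable :: bmap

allocate(bmap(ncells_hexa_))
! byte displacement of each element's 8 connectivity integers (4 bytes each)
bmap = (int(hexa_,MPI_ADDRESS_KIND)-1) * 8 * 4
blocklength = 8
call MPI_TYPE_CREATE_HINDEXED(ncells_hexa_,blocklength,bmap,MPI_INTEGER,fileview_hexa_conn,ierr)
call MPI_TYPE_COMMIT(fileview_hexa_conn,ierr)
deallocate(bmap)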


Hopefully that is enough information about the issue. Now my questions.

   1. Does this implementation look correct?
   2. What kind should fileview_hexa and fileview_hexa_conn be?
   3. Is it okay that map and blocklength are different integer kinds?
   4. What does the blocklength parameter specify exactly? I played with
   it some, and changing the blocklength did not seem to change anything.

Thanks for the help.

-Dominic Kedelty

On Wed, Mar 16, 2016 at 12:02 AM, Gilles Gouaillardet wrote:

> Dominic,
>
> at first, you might try to add
> call MPI_Barrier(comm,ierr)
> between
>
>   if (file_is_there .and. irank.eq.iroot) call MPI_FILE_DELETE(file,MPI_INFO_NULL,ierr)
>
> and
>
>   call MPI_FILE_OPEN(comm,file,IOR(MPI_MODE_WRONLY,MPI_MODE_CREATE),MPI_INFO_NULL,iunit,ierr)
>
> /* there might be a race condition, i am not sure about that */
>
>
> fwiw, the
>
> STOP A configuration file is required
>
> error message is not coming from OpenMPI.
> it might be indirectly triggered by an ompio bug/limitation, or even a bug
> in your application.
> did you get your application to work with another flavor of OpenMPI?
> e.g. are you reporting an OpenMPI bug?
> or are you asking for some help with your application (the bug could be
> either in your code or in OpenMPI, and you do not know for sure)?
>
> i am a bit surprised you are using the same fileview_node type with both
> MPI_INTEGER and MPI_REAL_SP, but since they should be the same size, that
> might not be an issue.
>
> the subroutine depends on too many external parameters
> (nnodes_, fileview_node, ncells_hexa, ncells_hexa_, unstr2str, ...)
> so writing a simple reproducer might not be trivial.
>
> i recommend you first write a self-contained program that reproduces the
> issue, and then i will investigate it. for that, you might want to dump the
> array sizes and the description of fileview_node in your application, and
> then hard-code them into your self-contained program.
> also how many nodes/tasks are you running and what filesystem are you
> running on ?
>
> Cheers,
>
> Gilles
>
>
> On 3/16/2016 3:05 PM, Dominic Kedelty wrote

Re: [OMPI devel] MPI Error

2016-03-23 Thread Gilles Gouaillardet

Dominic,

with MPI_Type_indexed, array_of_displacements is an int[],
so yes, there is a risk of overflow.

on the other hand, with MPI_Type_create_hindexed, array_of_displacements is
an MPI_Aint[].


note:
 array_of_displacements
     Displacement for each block, in multiples of oldtype extent for
     MPI_Type_indexed and bytes for MPI_Type_create_hindexed (array of
     integer for MPI_TYPE_INDEXED, array of MPI_Aint for
     MPI_TYPE_CREATE_HINDEXED).


i do not fully understand what you are trying to achieve ...

MPI_TYPE_INDEXED( 1024^3, blocklength=(/8, 8, 8, ..., 8/),
map=(/0, 8, 16, 24, ..., 8589934592/), MPI_INTEGER,
file_view_hexa_conn, ierr)


at first glance, this is equivalent to
MPI_Type_contiguous(1024^3, MPI_INTEGER, file_view_hexa_conn, ierr)

so unless your data has an irregular shape, i recommend you use other
MPI primitives to create your datatype.

This should be much more efficient, and less prone to integer overflow
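
for example, if each task's hexa ids happen to be one contiguous ascending
range, and conn_hexa is stored in that same order, you do not even need a
derived datatype for the connectivity block. an untested sketch, reusing the
variable names from your snippet (disp being the byte offset where the
connectivity block starts):

integer(KIND=MPI_OFFSET_KIND) :: my_disp
integer :: nbefore

! number of elements stored in the file before this task's block
nbefore = minval(hexa_) - 1
my_disp = disp + int(nbefore,MPI_OFFSET_KIND) * 8 * 4
call MPI_FILE_SET_VIEW(iunit,my_disp,MPI_INTEGER,MPI_INTEGER,"native",MPI_INFO_NULL,ierr)
! 8*ncells_hexa_ is a per-task count, so it must still fit in a default integer
call MPI_FILE_WRITE_ALL(iunit,conn_hexa,8*ncells_hexa_,MPI_INTEGER,status,ierr)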

Cheers,

Gilles

Re: [OMPI devel] MPI Error

2016-03-23 Thread Dominic Kedelty
I am open to any suggestions to make the code better, especially if the way
it's coded now is wrong.

I believe what the MPI_TYPE_INDEXED call is trying to do is this...

I have a domain of, for example, 8 hexahedral elements (a 2x2x2 cell domain)
that has 27 unique connectivity nodes (3x3x3 nodes).
In this portion of the code I am trying to write the hexa cell labeling and
its connectivity via nodes, and these elements can be spread across nprocs.

The portion of the binary file that is being written should have this format:
[id_e1 id_e2 ... id_ne]

This block of the file has nelems = 12 4-byte binary integers.

n1_e1 n2_e1 ... n8_e1
n1_e2 n2_e2 ... n8_e2
 ...    ...
n1_e12 n2_e12 ... n8_e12

This block of the file has 8*nelems 4-byte binary integers.

It is not an irregular shape, since I know that I have an array hexa_ that
holds [id_e1 id_e2 id_e3 id_e4] on rank 3 and [id_e5 id_e6 id_e7 id_e8] on
rank 1, etc., and for the most part every processor has the same number of
elements (unless I am running on an odd number of processors).

Note: I am using arbitrary ranks in the example because I am not sure whether
rank 0 gets the first ids.


If MPI_Type_contiguous would work better I am open to switching to that.
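
For instance, if each rank's ids really are one contiguous ascending block (and
conn_hexa is ordered the same way), I suppose I could even drop the fileviews
and write both blocks with explicit 8-byte offsets, something like this
(untested sketch, reusing the variables from my earlier mail; disp0 stands for
the byte offset where the id block starts):

integer(KIND=MPI_OFFSET_KIND) :: disp0, off_id, off_conn
integer :: nbefore

! elements that precede this rank's block in the file
nbefore  = minval(hexa_) - 1
off_id   = disp0 + int(nbefore,MPI_OFFSET_KIND)*4
off_conn = disp0 + int(ncells_hexa,MPI_OFFSET_KIND)*4 + int(nbefore,MPI_OFFSET_KIND)*8*4
call MPI_FILE_WRITE_AT_ALL(iunit,off_id,hexa_,ncells_hexa_,MPI_INTEGER,status,ierr)
call MPI_FILE_WRITE_AT_ALL(iunit,off_conn,conn_hexa,8*ncells_hexa_,MPI_INTEGER,status,ierr)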


Re: [OMPI devel] MPI Error

2016-03-23 Thread Gilles Gouaillardet
Dominic,

I can only recommend you write a small self-contained program that writes the
data in parallel, and then check from task 0 only that the data was written as
you expected.
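
something along these lines (an untested sketch, just to illustrate the idea:
each task writes its rank number into its own cells via an indexed fileview,
then task 0 reads the file back and checks it):

program mpiio_check
  use mpi
  implicit none
  integer, parameter :: ncells = 4
  integer :: comm, rank, nprocs, iunit, ftype, ierr, i
  integer :: blocklength(ncells), map(ncells), buf(ncells)
  integer, allocatable :: check(:)
  integer(KIND=MPI_OFFSET_KIND) :: disp
  integer :: status(MPI_STATUS_SIZE)

  call MPI_INIT(ierr)
  comm = MPI_COMM_WORLD
  call MPI_COMM_RANK(comm,rank,ierr)
  call MPI_COMM_SIZE(comm,nprocs,ierr)

  ! each rank owns cells rank*ncells+1 .. (rank+1)*ncells
  blocklength = 1
  map = (/ (rank*ncells + i - 1, i = 1, ncells) /)
  buf = rank
  call MPI_TYPE_INDEXED(ncells,blocklength,map,MPI_INTEGER,ftype,ierr)
  call MPI_TYPE_COMMIT(ftype,ierr)

  call MPI_FILE_OPEN(comm,"check.bin",IOR(MPI_MODE_WRONLY,MPI_MODE_CREATE), &
                     MPI_INFO_NULL,iunit,ierr)
  disp = 0
  call MPI_FILE_SET_VIEW(iunit,disp,MPI_INTEGER,ftype,"native",MPI_INFO_NULL,ierr)
  call MPI_FILE_WRITE_ALL(iunit,buf,ncells,MPI_INTEGER,status,ierr)
  call MPI_FILE_CLOSE(iunit,ierr)
  call MPI_TYPE_FREE(ftype,ierr)

  ! task 0 reads the whole file back and verifies it
  if (rank == 0) then
     allocate(check(nprocs*ncells))
     open(10,file="check.bin",access="stream",form="unformatted")
     read(10) check
     close(10)
     do i = 1, nprocs*ncells
        if (check(i) /= (i-1)/ncells) print *, "mismatch at ", i, check(i)
     end do
     deallocate(check)
  end if

  call MPI_FINALIZE(ierr)
end program mpiio_check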

Feel free to take some time reading MPI-IO tutorials.

If you are still struggling with your code, i will try to help you,
once i can download and compile it.

Also, since this does not look like an OpenMPI bug, i recommend you post this
kind of request to the users mailing list.

Cheers,

Gilles


Re: [OMPI devel] Confusion about slots

2016-03-23 Thread Federico Reghenzani
Ok, I've investigated further today; it seems "--map-by hwthread" does not
remove the problem. However, if I specify "node0 slots=32" in the hostfile,
it runs much slower than specifying only "node0". In both cases I run mpirun
with -np 32. So I'm fairly sure I haven't understood what slots are.

__
Federico Reghenzani
M.Eng. Student @ Politecnico di Milano
Computer Science and Engineering



2016-03-22 18:56 GMT+01:00 Federico Reghenzani <federico1.reghenz...@mail.polimi.it>:

> Hi guys,
>
> I'm really confused about *slots* in resource allocation: I thought that
> slots are the number of processes spawnable on a certain node, so it should
> correspond to the number of processing elements of the node. For example,
> on each of my nodes I have 2 processors, 16 cores in total with
> hyperthreading, so a total of 32 processing elements per node (i.e. 32 hw
> threads). However, considering a single node, putting 32 slots in the
> hostfile and requesting "-np 32" results in a performance degradation of
> 20x compared to using only "-np 16". The problem disappears when specifying
> "--map-by hwthread".
>
> Investigating the problem I found these counterintuitive things:
> - here it is stated that "*slots* are Open MPI's representation of how many
> processors are available"
> - here it is stated that "Slots indicate how many processes can potentially
> execute on a node. For best performance, the number of slots may be chosen
> to be the number of cores on the node or the number of processor sockets"
> - I tried to remove the slots information from the hostfile, so according
> to this it should be interpreted as "1", but it spawns 32 processes anyway
> - I'm not sure what --map-by and --rank-by do
>
> In the custom RAS we are developing, what do we have to send to mpirun? The
> number of processor sockets, the number of cores, or the number of hwthreads
> available? How do --map-by and --rank-by affect the spawn policy?
>
>
> Thank you!
>
>
> OFFTOPIC: is anyone going to EuroMPI 2016 in September? We will be there
> to present our migration technique.
>
>
> Cheers,
> Federico
>
> __
> Federico Reghenzani
> M.Eng. Student @ Politecnico di Milano
> Computer Science and Engineering
>
>
>


Re: [OMPI devel] Confusion about slots

2016-03-23 Thread Ralph Castain
“Slots” are an abstraction commonly used by schedulers as a way of indicating 
how many processes are allowed to run on a given node. It has nothing to do 
with hardware, either cores or HTs.

MPI programmers frequently like to bind a process to one or more hardware 
assets (cores or HTs). Thus, you will see confusion in the community where 
people mix the term “slot” with “cores” or “cpus”. This is unfortunate as it 
the terms really do mean very different things.

In OMPI, we chose to try and “help” the user by not requiring them to specify 
detailed info in a hostfile. So if you don’t specify the number of “slots” for 
a given node, we will sense the number of cores on that node and set the slots 
to match that number. This best matches user expectations today.

If you do specify the number of slots, then we use that to guide the desired 
number of processes assigned to each node. We then bind each of those processes 
according to the user-provided guidance.

HTH
Ralph



Re: [OMPI devel] Confusion about slots

2016-03-23 Thread Aurélien Bouteiller
To add to what Ralph said, you probably do not want to use hyperthreads for HPC 
workloads, as that generally results in very poor performance (as you noticed). 
Set the number of slots to the number of real cores (not HTs); that will yield 
optimal results 95% of the time.
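
For example, on your nodes (16 real cores, 32 hardware threads each), a hostfile
entry like

node0 slots=16

together with something like

mpirun -np 16 --hostfile myhosts --bind-to core ./your_app

(host name, hostfile name and binary are placeholders) should give you one
process per physical core.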

Aurélien 

--
Aurélien Bouteiller, Ph.D. ~~ https://icl.cs.utk.edu/~bouteill/ 



