Hi Gilles,
I believe I have found the problem. Initially I thought it might have been an
MPI issue, since it showed up inside an MPI function. However, I am now sure
that the problem is an overflow of 4-byte signed integers.
I am dealing with computational domains that have a little
Dominik,
with MPI_Type_indexed, array_of_displacements is an int[],
so yes, there is a risk of overflow.
On the other hand, with MPI_Type_create_hindexed, array_of_displacements is
an MPI_Aint[].
Note the description of the MPI_Type_indexed argument:
    array_of_displacements
        Displacement for each block, in multiples of oldtype
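In code, the switch Gilles describes might look like the sketch below; the
MPI_DOUBLE oldtype and the helper's shape are assumptions for illustration,
not Dominik's actual code. The point is that MPI_Type_indexed takes int
displacements in multiples of oldtype, while MPI_Type_create_hindexed takes
MPI_Aint displacements in bytes, so the latter survives offsets past 2^31.

#include <mpi.h>
#include <stdlib.h>

/* Hypothetical helper: element offsets arrive as 64-bit values.
 * MPI_Type_indexed would need them as int (overflowing past 2^31);
 * MPI_Type_create_hindexed takes MPI_Aint byte displacements instead. */
void build_hindexed(int count, const int *blocklens,
                    const long long *elem_offsets, MPI_Datatype *newtype)
{
    MPI_Aint lb, extent;
    MPI_Type_get_extent(MPI_DOUBLE, &lb, &extent);  /* oldtype assumed MPI_DOUBLE */

    MPI_Aint *byte_disps = malloc(count * sizeof(MPI_Aint));
    for (int i = 0; i < count; i++)
        byte_disps[i] = (MPI_Aint)elem_offsets[i] * extent;  /* bytes, 64-bit safe */

    MPI_Type_create_hindexed(count, blocklens, byte_disps, MPI_DOUBLE, newtype);
    MPI_Type_commit(newtype);
    free(byte_disps);
}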
I am open to any suggestions to make the code better, especially if the way
it's coded now is wrong.
I believe what the MPI_TYPE_INDEXED call is trying to do is this:
I have a domain of, for example, 8 hexahedral elements (a 2x2x2 cell domain)
that has 27 unique connectivity nodes (3x3x3 nodes).
In this
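To make the 2x2x2 example concrete, a hypothetical sketch of that selection
follows; the node indices, the MPI_DOUBLE payload, and the one-element-per-rank
ownership are all invented for illustration:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* say this rank owns the corner element, i.e. nodes (i,j,k) with
     * i,j,k in {0,1} of the 3x3x3 node grid, flattened as i + 3j + 9k */
    int my_nodes[8]  = {0, 1, 3, 4, 9, 10, 12, 13};
    int blocklens[8] = {1, 1, 1, 1, 1, 1, 1, 1};

    MPI_Datatype node_type;
    MPI_Type_indexed(8, blocklens, my_nodes, MPI_DOUBLE, &node_type);
    MPI_Type_commit(&node_type);

    /* node_type now picks this rank's nodes out of a 27-entry array of
     * doubles (e.g. as a filetype for MPI_File_set_view); with large
     * global meshes these int displacements are what overflow */

    MPI_Type_free(&node_type);
    MPI_Finalize();
    return 0;
}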
Dominik,
I can only recommend you write a small, self-contained program that writes the
data in parallel, and then checks from task 0 only that the data was written as
you expected.
Feel free to take some time reading MPI-IO tutorials.
If you are still struggling with your code, I will try to help you.
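A minimal sketch of the kind of test Gilles means (file name and payload are
arbitrary assumptions): every rank writes its rank id at a distinct offset,
then task 0 alone reads the file back and verifies it.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* every rank writes its rank id at offset rank*sizeof(int) */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "check.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, (MPI_Offset)rank * sizeof(int), &rank, 1,
                      MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Barrier(MPI_COMM_WORLD);

    /* task 0 only: read everything back and check it */
    if (rank == 0) {
        MPI_File_open(MPI_COMM_SELF, "check.dat", MPI_MODE_RDONLY,
                      MPI_INFO_NULL, &fh);
        for (int i = 0; i < size; i++) {
            int v;
            MPI_File_read_at(fh, (MPI_Offset)i * sizeof(int), &v, 1,
                             MPI_INT, MPI_STATUS_IGNORE);
            printf("offset %d: %s\n", i, v == i ? "ok" : "WRONG");
        }
        MPI_File_close(&fh);
    }

    MPI_Finalize();
    return 0;
}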
OK, I've investigated further today; it seems "--map-by hwthread" does not
remove the problem. However, if I specify "node0 slots=32" in the hostfile,
it runs much more slowly than with only "node0". In both cases I run mpirun
with -np 32, so I'm quite sure I didn't understand what slots are.
“Slots” are an abstraction commonly used by schedulers as a way of indicating
how many processes are allowed to run on a given node. They have nothing to do
with hardware, either cores or HTs.
MPI programmers frequently like to bind a process to one or more hardware
assets (cores or HTs). Thus, yo
To add to what Ralf said, you probably do not want to use Hyper-Threads for HPC
workloads, as that generally results in very poor performance (as you noticed).
Set the number of slots to the number of real cores (not HTs); that should
yield optimal results 95% of the time.
Aurélien
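For what it's worth, combining Ralf's and Aurélien's advice, a hostfile and
mpirun invocation might look like this ("node0" comes from the thread; the
16 physical cores are an assumption for the example):

# hostfile: slots = number of physical cores, not hardware threads
node0 slots=16

# bind one process per core and print the bindings so they can be verified
mpirun --hostfile hostfile -np 16 --map-by core --bind-to core \
       --report-bindings ./my_app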