Hi,

We are working on a portals4 component and have found a bug (causing a
segmentation fault) that must be related to the coll/basic component.
Due to a lack of time, we cannot investigate further, but it seems to be
caused by a "free(disps);" (around line 300 of coll_basic_reduce_scatter) in
some specific situations. In our case it happens with osu_reduce_scatter (from
the OSU micro-benchmarks) with at least 97 processes, for message sizes larger
than 512 KB.

Steps to reproduce:
export OMPI_MCA_mtl=^portals4
export OMPI_MCA_btl=^portals4
export OMPI_MCA_coll=basic,libnbc,self,tuned
export OMPI_MCA_osc=^portals4
export OMPI_MCA_pml=ob1
mpirun -n 97 osu_reduce_scatter -m 524288:

(Reducing the number of iterations with -i 1 -x 0 should still trigger the bug.)
Our git branch is based on the v2.x branch, and its files differ from it almost
exclusively in the portals4 parts.

Could someone confirm this bug?

Emmanuel BRELLE



_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
