Hi,

We are working on Portals4 components and have found a bug (a segmentation fault) which must be related to the coll/basic component. For lack of time we cannot investigate further, but in some specific situations the crash seems to be caused by a "free(disps);" (around line 300 in coll_basic_reduce_scatter). In our case it happens with osu_reduce_scatter (from the OSU micro-benchmarks) when running at least 97 processes with message sizes larger than 512 KB.
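For readers unfamiliar with this code path: in a reduce_scatter implementation, an array named disps usually holds one displacement per rank, computed as a running sum of the receive counts. The sketch below illustrates that pattern only. It is not the Open MPI source, the helper names split_counts and build_displacements are hypothetical, and the way the counts are split across ranks is an assumption. It makes no claim about the actual cause of the crash.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: split `total` elements across `nprocs` ranks,
 * giving the first (total % nprocs) ranks one extra element.
 * This split policy is an assumption made for illustration. */
void split_counts(int total, int nprocs, int *rcounts)
{
    assert(nprocs > 0);
    int portion = total / nprocs;
    int remainder = total % nprocs;
    for (int i = 0; i < nprocs; ++i) {
        rcounts[i] = portion + (i < remainder ? 1 : 0);
    }
}

/* Hypothetical helper: build the per-rank displacement array.
 * disps[i] is the offset of rank i's block in the reduced buffer.
 * Returns a malloc'ed array of nprocs ints (caller frees), or NULL
 * if allocation fails. *total_out receives the sum of all counts. */
int *build_displacements(const int *rcounts, int nprocs, int *total_out)
{
    assert(nprocs > 0);
    int *disps = malloc(sizeof(int) * (size_t)nprocs);
    if (disps == NULL) {
        return NULL;
    }
    disps[0] = 0;
    for (int i = 0; i < nprocs - 1; ++i) {
        disps[i + 1] = disps[i] + rcounts[i];
    }
    *total_out = disps[nprocs - 1] + rcounts[nprocs - 1];
    return disps;
}
```

With 97 ranks and 131072 elements (524288 bytes of 4-byte floats, again an assumption about how the benchmark sizes its buffers), the first 25 ranks get 1352 elements, the remaining 72 get 1351, and the last displacement is 129721. Since the array itself is only 97 ints, a crash inside free(disps) may point to heap corruption that happened earlier rather than to the free call itself.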
Steps to reproduce:

    export OMPI_MCA_mtl=^portals4
    export OMPI_MCA_btl=^portals4
    export OMPI_MCA_coll=basic,libnbc,self,tuned
    export OMPI_MCA_osc=^portals4
    export OMPI_MCA_pml=ob1
    mpirun -n 97 osu_reduce_scatter -m 524288:

(Reducing the number of iterations with -i 1 -x 0 should still trigger the bug.)

Our git branch is based on the v2.x branch, and the files differ almost exclusively in the portals4 parts.

Could someone confirm this bug?

Emmanuel BRELLE
_______________________________________________ devel mailing list devel@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/devel