Larry,

Thanks for following up with us on this. I think your patch is cleaner than what we currently have in the trunk, so I went ahead and pushed it to the trunk (r25461). I will request a push to 1.5 and 1.4 as well.
Regards,
george.

On Nov 8, 2011, at 13:57, Larry Baker wrote:

> The good news is that the issue reported in r25290 is fixed in the latest
> Intel compilers release (2011.7.256). The bad news is that both the
> 2011.6.233 and 2011.7.256 releases identify themselves as V12.1.0 from the
> command line. (I reported this bug to Intel already.) They can only be
> reliably distinguished using the predefined __INTEL_COMPILER_BUILD_DATE
> macro. I verified that the build dates for all three compilers we have --
> Linux, Mac OS X, and Windows -- are the same.
>
> I developed a more targeted patch (attached) for OpenMPI 1.4.3
> opal/mca/memory/ptmalloc2/malloc.c which disables vectorization for
> _int_malloc() only if an Intel compiler with the 2011.6.233 release build
> date is found (__INTEL_COMPILER_BUILD_DATE == 20110811). This patch could
> presumably make its way into all the copies of
> opal/mca/memory/ptmalloc2/malloc.c in the various versions of OpenMPI that
> are still being maintained.
>
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
>
> On 17 Oct 2011, at 8:18 PM, George Bosilca wrote:
>
>> Larry,
>>
>> Sorry for not updating this thread. The issue was identified and fixed by
>> Rainer in r25290 (https://svn.open-mpi.org/trac/ompi/changeset/25290).
>> Please read the comments and the linked thread on the Intel forum for more
>> info.
>>
>> I couldn't find a trace of this being fixed in the 1.4 series, so I would
>> wait to upgrade until this issue gets resolved.
>>
>> Thanks,
>> george.
>>
>> On Oct 17, 2011, at 23:00, Larry Baker wrote:
>>
>>> George,
>>>
>>> I have not had time to look over the 1.4.3 make check failure for Intel
>>> 2011.6.233 compilers. Have you?
>>>
>>> I had planned to get 1.4.3 compiled on all six of our compilers using the
>>> latest compiler releases. I was putting off upgrading to 1.4.4 or 1.5.x
>>> until after that to minimize the number of things that could go wrong. Do
>>> you recommend otherwise?
>>>
>>> Larry Baker
>>> US Geological Survey
>>> 650-329-5608
>>> ba...@usgs.gov
>>>
>>> On 7 Oct 2011, at 6:46 PM, George Bosilca wrote:
>>>
>>>> The may_alias attribute was part of a forward-looking attribute check,
>>>> added at a time when few compilers supported such attributes. This
>>>> explains why they are not widely used in the library itself. Moreover, as
>>>> they do not affect the compilation itself (as your test highlights, this
>>>> is not the issue with the icc 2011.6.233 compiler), there is no urgency
>>>> to remove the may_alias support.
>>>>
>>>> I just got that particular version of the compiler installed on one of our
>>>> machines. I'll give it a try over the weekend.
>>>>
>>>> george.
>>>>
>>>> On Oct 7, 2011, at 20:21, Larry Baker wrote:
>>>>
>>>>> The test for the __may_alias__ attribute uses the following short code
>>>>> snippet:
>>>>>
>>>>>> int * p_value __attribute__ ((__may_alias__));
>>>>>> int
>>>>>> main ()
>>>>>> {
>>>>>>
>>>>>> ;
>>>>>> return 0;
>>>>>> }
>>>>>
>>>>> Indeed, for Intel 2011 compilers prior to 2011.6.233, this results in a
>>>>> warning:
>>>>>
>>>>>> [root@hydra openmpi-1.4.3]# module load compilers/intel/2011.5.220
>>>>>> [root@hydra openmpi-1.4.3]# icc -c may_alias_test.c
>>>>>> may_alias_test.c(123): warning #1292: attribute "__may_alias__" ignored
>>>>>> int * p_value __attribute__ ((__may_alias__));
>>>>>> ^
>>>>>>
>>>>>> [root@hydra openmpi-1.4.3]# module unload compilers/intel/2011.5.220
>>>>>
>>>>>> [root@hydra openmpi-1.4.3]# module load compilers/intel/2011.6.233
>>>>>> [root@hydra openmpi-1.4.3]# icc -c may_alias_test.c
>>>>>
>>>>>
>>>>> I modified ./configure to force
>>>>>
>>>>>> ompi_cv___attribute__may_alias=0
>>>>>
>>>>>
>>>>> Then I compiled and tested the library.
>>>>> Unfortunately, the results were exactly the same:
>>>>>
>>>>>> make check-TESTS
>>>>>> make[3]: Entering directory
>>>>>> `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
>>>>>> /bin/sh: line 4: 26326 Segmentation fault ${dir}$tst
>>>>>> FAIL: checksum
>>>>>> /bin/sh: line 4: 26359 Segmentation fault ${dir}$tst
>>>>>> FAIL: position
>>>>>> ========================================================
>>>>>> 2 of 2 tests failed
>>>>>> Please report to http://www.open-mpi.org/community/help/
>>>>>> ========================================================
>>>>>
>>>>>
>>>>> I could not find any use of the may_alias attribute, other than in a
>>>>> #define in opal/include/opal_config_bottom.h. Is
>>>>> OMPI_HAVE_ATTRIBUTE_MAY_ALIAS just cruft that can be removed?
>>>>>
>>>>> Larry Baker
>>>>> US Geological Survey
>>>>> 650-329-5608
>>>>> ba...@usgs.gov
>>>>>
>>>>> On 7 Oct 2011, at 11:08 AM, Larry Baker wrote:
>>>>>
>>>>>> I ran into a problem this past week trying to upgrade our OpenMPI 1.4.3
>>>>>> for the latest Intel 2011 compiler, 2011.6.233.
>>>>>>
>>>>>> make check fails with Segmentation Fault errors:
>>>>>>
>>>>>>> [root@hydra openmpi-1.4.3]# tail -20
>>>>>>> ../openmpi-1.4.3-check-intel.6.233.log
>>>>>>> /bin/sh ../../libtool --tag=CC --mode=link icc -DNDEBUG -g -O3
>>>>>>> -finline-functions -fno-strict-aliasing -restrict -pthread
>>>>>>> -fvisibility=hidden -shared-intel -export-dynamic -shared-intel -o
>>>>>>> ddt_pack ddt_pack.o ../../ompi/libmpi.la -lnsl -lutil
>>>>>>> libtool: link: icc -DNDEBUG -g -O3 -finline-functions
>>>>>>> -fno-strict-aliasing -restrict -pthread -fvisibility=hidden
>>>>>>> -shared-intel -shared-intel -o .libs/ddt_pack ddt_pack.o
>>>>>>> -Wl,--export-dynamic ../../ompi/.libs/libmpi.so
>>>>>>> /usr/local/src/openmpi-1.4.3/orte/.libs/libopen-rte.so
>>>>>>> /usr/local/src/openmpi-1.4.3/opal/.libs/libopen-pal.so -ldl -lnsl
>>>>>>> -lutil -pthread -Wl,-rpath -Wl,/usr/local/lib
>>>>>>> make[3]: Leaving directory
>>>>>>> `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
>>>>>>> make check-TESTS
>>>>>>> make[3]: Entering directory
>>>>>>> `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
>>>>>>> /bin/sh: line 4: 6322 Segmentation fault ${dir}$tst
>>>>>>> FAIL: checksum
>>>>>>> /bin/sh: line 4: 6355 Segmentation fault ${dir}$tst
>>>>>>> FAIL: position
>>>>>>> ========================================================
>>>>>>> 2 of 2 tests failed
>>>>>>> Please report to http://www.open-mpi.org/community/help/
>>>>>>> ========================================================
>>>>>>> make[3]: *** [check-TESTS] Error 1
>>>>>>> make[3]: Leaving directory
>>>>>>> `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
>>>>>>> make[2]: *** [check-am] Error 2
>>>>>>> make[2]: Leaving directory
>>>>>>> `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
>>>>>>> make[1]: *** [check-recursive] Error 1
>>>>>>> make[1]: Leaving directory
>>>>>>> `/state/partition1/root/src/openmpi-1.4.3/test'
>>>>>>> make: *** [check-recursive] Error 1
>>>>>>
>>>>>>
>>>>>> Before trying to
>>>>>> track down the problem, I thought I'd describe what I
>>>>>> see here in case someone recognizes what might be happening.
>>>>>>
>>>>>> We have been using OpenMPI 1.4.3 compiled with the Intel 2011.3.174
>>>>>> compiler. I've updated the Intel 2011 compilers as they have come out
>>>>>> with new versions: 2011.4.191, 2011.5.220, and now 2011.6.233. However,
>>>>>> I've not recompiled OpenMPI 1.4.3 until now.
>>>>>>
>>>>>> Since the original compilation of OpenMPI 1.4.3 with the Intel
>>>>>> 2011.3.174 compilers, I have installed libnuma and libnuma-devel RPMs on
>>>>>> our cluster front end. I noticed that changed the OpenMPI 1.4.3
>>>>>> ./configure output. To test that this was not the cause of the problem,
>>>>>> I recompiled OpenMPI 1.4.3 using both the CentOS/Rocks GNU compilers and
>>>>>> the Intel 2011.3.174 compilers. They both passed all the make check
>>>>>> tests.
>>>>>>
>>>>>> To find out when this problem first occurs, I systematically configured,
>>>>>> compiled, and checked OpenMPI 1.4.3 with all four versions of the Intel
>>>>>> 2011 compilers we have.
>>>>>> We use the modules package to load the compiler
>>>>>> environment:
>>>>>>
>>>>>>> [root@hydra openmpi-1.4.3]# env | grep /opt/intel
>>>>>>> LD_LIBRARY_PATH=/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64:/opt/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64
>>>>>>> PATH=/opt/intel/composer_xe_2011_sp1.6.233/bin/intel64:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/java/latest/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/eclipse:/opt/ganglia/bin:/opt/ganglia/sbin:/opt/maui/bin:/opt/torque/bin:/opt/torque/sbin:/opt/rocks/bin:/opt/rocks/sbin:/root/bin
>>>>>>
>>>>>>
>>>>>> Here are the steps I use to make and test OpenMPI 1.4.3 (I use a patched
>>>>>> version to accommodate the six compilers we have; I've submitted those
>>>>>> patches here in the past):
>>>>>>
>>>>>>> # cd /usr/local/src
>>>>>>> # tar -xjf openmpi-1.4.3-patched.tar.bz2
>>>>>>> # cd openmpi-1.4.3
>>>>>>> # module load compilers/intel/2011.6.233
>>>>>>> # ./configure >../openmpi-1.4.3-configure-intel.6.233.log 2>&1
>>>>>>> --with-tm --with-openib --without-valgrind --without-udapl
>>>>>>> --enable-contrib-no-build=vt --with-wrapper-ldflags="-shared-intel"
>>>>>>> CC="icc" CFLAGS="-g -O3" CXX="icpc" CXXFLAGS="-g -O3" FC="ifort"
>>>>>>> FCFLAGS="-g -O3" F77="ifort" FFLAGS="-g -O3" LDFLAGS="-shared-intel"
>>>>>>> # make >../openmpi-1.4.3-make-intel.6.233.log 2>&1
>>>>>>> # make check >../openmpi-1.4.3-check-intel.6.233.log 2>&1
>>>>>>
>>>>>> (When I generate the OpenMPI 1.4.3 library we actually use, I also add a
>>>>>> --prefix. But that complicates diffs of the stdout files for these
>>>>>> steps, so it is not used here. Thus, I do NOT proceed to make install
>>>>>> any of these libraries.)
>>>>>>
>>>>>> The three earlier versions of the Intel 2011 compilers all pass the make
>>>>>> check tests. When I compare the ./configure stdout files, they are all
>>>>>> identical.
>>>>>> However, the ./configure stdout file for the Intel
>>>>>> 2011.6.233 compilers has one difference:
>>>>>>
>>>>>>> [root@hydra openmpi-1.4.3]# diff
>>>>>>> ../openmpi-1.4.3-configure-intel.{5.220,6.233}.log
>>>>>>> 178c178
>>>>>>> < checking for __attribute__(may_alias)... no
>>>>>>> ---
>>>>>>> > checking for __attribute__(may_alias)... yes
>>>>>>
>>>>>> That is obviously where I will start looking for the source of the
>>>>>> problem.
>>>>>>
>>>>>> Maybe someone reading this list knows what the purpose of that test is,
>>>>>> whether the Intel 2011 compilers are expected to have this feature
>>>>>> enabled, and whether the code this enables can cause this problem if the
>>>>>> Intel 2011.6.233 compilers do not fully support whatever this test is
>>>>>> intended to discern.
>>>>>>
>>>>>> Larry Baker
>>>>>> US Geological Survey
>>>>>> 650-329-5608
>>>>>> ba...@usgs.gov
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> <Intel20110811Fix.patch.txt>
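
For readers without the attachment: the guard described in the Nov 8 message (apply a workaround only when __INTEL_COMPILER_BUILD_DATE == 20110811) might be sketched in C roughly as follows. This is an illustration under assumptions, not the contents of Intel20110811Fix.patch.txt. The macro ICC_20110811_WORKAROUND and the functions icc_20110811_workaround_needed() and describe_compiler_build() are hypothetical names introduced here, and the directive that actually disables vectorization for _int_malloc() is left as a comment because the thread does not show it.

```c
#include <stdio.h>

/*
 * Hypothetical guard macro (not taken from the attached patch). It is 1 only
 * when an Intel compiler reports the build date that the thread associates
 * with the 2011.6.233 release, and 0 for every other compiler or build.
 */
#if defined(__INTEL_COMPILER) && defined(__INTEL_COMPILER_BUILD_DATE)
#  if __INTEL_COMPILER_BUILD_DATE == 20110811
#    define ICC_20110811_WORKAROUND 1
#  endif
#endif
#ifndef ICC_20110811_WORKAROUND
#  define ICC_20110811_WORKAROUND 0
#endif

#if ICC_20110811_WORKAROUND
/* A compiler-specific directive that disables vectorization for the affected
 * function would go here. The thread does not show which directive the patch
 * uses, so none is guessed. */
#endif

/* Report at run time whether the guard was active at compile time. */
static int icc_20110811_workaround_needed(void)
{
    return ICC_20110811_WORKAROUND;
}

/* Write a one-line description of the detected compiler build into buf.
 * Returns the value returned by snprintf(). */
static int describe_compiler_build(char *buf, size_t len)
{
#if defined(__INTEL_COMPILER) && defined(__INTEL_COMPILER_BUILD_DATE)
    return snprintf(buf, len, "icc %d, build date %d, workaround %s",
                    (int)__INTEL_COMPILER, (int)__INTEL_COMPILER_BUILD_DATE,
                    ICC_20110811_WORKAROUND ? "on" : "off");
#else
    return snprintf(buf, len, "not an Intel compiler, workaround off");
#endif
}
```

Keying on the build date rather than on __INTEL_COMPILER follows the observation above that 2011.6.233 and 2011.7.256 both identify themselves as V12.1.0. Calling describe_compiler_build() from a small test program and printing the buffer is one way to check which build a given "module load" actually selected.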