Larry,
Thanks for following up with us on this. I think your patch is cleaner than
what we currently have in the trunk, so I went ahead and pushed it to the trunk
(25461). I will request a push to 1.5 and 1.4 as well.
Regards,
george.
On Nov 8, 2011, at 13:57 , Larry Baker wrote:
> The good news is that the issue reported in r25290 is fixed in the latest
> Intel compiler release (2011.7.256). The bad news is that both the
> 2011.6.233 and 2011.7.256 releases identify themselves as V12.1.0 from the
> command line. (I reported this bug to Intel already.) They can only be
> reliably distinguished using the predefined __INTEL_COMPILER_BUILD_DATE
> macro. I verified that the build dates for all three compilers we have --
> Linux, Mac OS X, and Windows -- are the same.
>
> I developed a more targeted patch (attached) for OpenMPI 1.4.3
> opal/mca/memory/ptmalloc2/malloc.c which disables vectorization for
> _int_malloc() only if an Intel compiler with the 2011.6.233 release build
> date is found (__INTEL_COMPILER_BUILD_DATE == 20110811). This patch could
> presumably make its way into all the copies of
> opal/mca/memory/ptmalloc2/malloc.c in the various versions of OpenMPI that
> are still being maintained.
>
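The attached patch is not reproduced in this thread, so what follows is only a
sketch of the kind of build-date guard described above, not the patch itself.
It assumes that the Intel-specific "#pragma novector" is an acceptable way to
keep a loop from being vectorized. The macro BROKEN_INTEL_VECTORIZER and the
function total_size() are hypothetical stand-ins; total_size() is not
_int_malloc().

```c
#include <assert.h>
#include <stddef.h>

/* Set to 1 only for the Intel build identified in this thread as 2011.6.233
 * (__INTEL_COMPILER_BUILD_DATE == 20110811); 0 for every other compiler. */
#if defined(__INTEL_COMPILER) && defined(__INTEL_COMPILER_BUILD_DATE)
#  if __INTEL_COMPILER_BUILD_DATE == 20110811
#    define BROKEN_INTEL_VECTORIZER 1
#  endif
#endif
#ifndef BROKEN_INTEL_VECTORIZER
#  define BROKEN_INTEL_VECTORIZER 0
#endif

/* Hypothetical stand-in for a function containing a loop that must not be
 * vectorized by the affected compiler build. */
static size_t total_size(const size_t *sizes, size_t count)
{
    size_t total = 0;
    size_t i;

    assert(sizes != NULL || count == 0);

    /* Assumption: "#pragma novector" is used here as the mechanism; it asks
     * the Intel compiler not to vectorize the loop that follows. */
#if BROKEN_INTEL_VECTORIZER
#pragma novector
#endif
    for (i = 0; i < count; i++) {
        total += sizes[i];
    }
    return total;
}
```

With any other compiler or build date the guard compiles away and the loop is
built exactly as before, which is the point of keying on the build date rather
than on the V12.1.0 version string.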
> Larry Baker
> US Geological Survey
> 650-329-5608
> [email protected]
>
> On 17 Oct 2011, at 8:18 PM, George Bosilca wrote:
>
>> Larry,
>>
>> Sorry for not updating this thread. The issue was identified and fixed by
>> Rainer in r25290 (https://svn.open-mpi.org/trac/ompi/changeset/25290).
>> Please read the comments and the linked thread on the Intel forum for more
>> information.
>>
>> I couldn't find any trace of this being fixed in the 1.4 series, so I would
>> hold off on upgrading until this issue is resolved.
>>
>> Thanks,
>> george.
>>
>> On Oct 17, 2011, at 23:00 , Larry Baker wrote:
>>
>>> George,
>>>
>>> I have not had time to look over the 1.4.3 make check failure for Intel
>>> 2011.6.233 compilers. Have you?
>>>
>>> I had planned to get 1.4.3 compiled on all six of our compilers using the
>>> latest compiler releases. I was putting off upgrading to 1.4.4 or 1.5.x
>>> until after that to minimize the number of things that could go wrong. Do
>>> you recommend otherwise?
>>>
>>> Larry Baker
>>> US Geological Survey
>>> 650-329-5608
>>> [email protected]
>>>
>>> On 7 Oct 2011, at 6:46 PM, George Bosilca wrote:
>>>
>>>> The may_alias attribute was part of a forward-looking set of attribute
>>>> checks, added at a time when few compilers supported them. This explains
>>>> why they are not widely used in the library itself. Moreover, as they do
>>>> not affect the compilation itself (as your test highlights, this is not the
>>>> issue with the icc 2011.6.233 compiler), there is no urgency to remove the
>>>> may_alias support.
>>>>
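For context on what the attribute is meant to enable: GCC documents may_alias
as a way to mark a type whose accesses are exempt from type-based alias
analysis. The following is a minimal sketch, not taken from Open MPI. The names
u32_alias_t and float_bits() are hypothetical, and it assumes a GCC-compatible
compiler and a 32-bit IEEE-754 float.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: a 32-bit unsigned type that is allowed to alias objects
 * of any other type, in the same way a character type can. */
typedef uint32_t __attribute__((__may_alias__)) u32_alias_t;

/* Return the raw bit pattern of a float by reading it through the may_alias
 * type. Reading it through a plain uint32_t pointer would break the strict
 * aliasing rules. */
static uint32_t float_bits(const float *f)
{
    const u32_alias_t *p = (const u32_alias_t *)f;

    /* Assumption: float and uint32_t have the same size on this platform. */
    assert(sizeof(float) == sizeof(uint32_t));
    return *p;
}
```

On such a platform, float_bits() applied to 1.0f returns 0x3F800000.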
>>>> I just got that particular version of the compiler installed on one of our
>>>> machines. I'll give it a try over the weekend.
>>>>
>>>> george.
>>>>
>>>> On Oct 7, 2011, at 20:21 , Larry Baker wrote:
>>>>
>>>>> The test for the __may_alias__ attribute uses the following short code
>>>>> snippet:
>>>>>
>>>>>> int * p_value __attribute__ ((__may_alias__));
>>>>>> int
>>>>>> main ()
>>>>>> {
>>>>>>
>>>>>> ;
>>>>>> return 0;
>>>>>> }
>>>>>
>>>>> Indeed, for Intel 2011 compilers prior to 2011.6.233, this results in a
>>>>> warning:
>>>>>
>>>>>> [root@hydra openmpi-1.4.3]# module load compilers/intel/2011.5.220
>>>>>> [root@hydra openmpi-1.4.3]# icc -c may_alias_test.c
>>>>>> may_alias_test.c(123): warning #1292: attribute "__may_alias__" ignored
>>>>>> int * p_value __attribute__ ((__may_alias__));
>>>>>> ^
>>>>>>
>>>>>> [root@hydra openmpi-1.4.3]# module unload compilers/intel/2011.5.220
>>>>>
>>>>>> [root@hydra openmpi-1.4.3]# module load compilers/intel/2011.6.233
>>>>>> [root@hydra openmpi-1.4.3]# icc -c may_alias_test.c
>>>>>
>>>>>
>>>>> I modified ./configure to force
>>>>>
>>>>>> ompi_cv___attribute__may_alias=0
>>>>>
>>>>>
>>>>> Then I compiled and tested the library. Unfortunately, the results were
>>>>> exactly the same:
>>>>>
>>>>>> make check-TESTS
>>>>>> make[3]: Entering directory
>>>>>> `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
>>>>>> /bin/sh: line 4: 26326 Segmentation fault ${dir}$tst
>>>>>> FAIL: checksum
>>>>>> /bin/sh: line 4: 26359 Segmentation fault ${dir}$tst
>>>>>> FAIL: position
>>>>>> ========================================================
>>>>>> 2 of 2 tests failed
>>>>>> Please report to http://www.open-mpi.org/community/help/
>>>>>> ========================================================
>>>>>
>>>>>
>>>>> I could not find any use of the may_alias attribute, other than in a
>>>>> #define in opal/include/opal_config_bottom.h. Is
>>>>> OMPI_HAVE_ATTRIBUTE_MAY_ALIAS just cruft that can be removed?
>>>>>
>>>>> Larry Baker
>>>>> US Geological Survey
>>>>> 650-329-5608
>>>>> [email protected]
>>>>>
>>>>> On 7 Oct 2011, at 11:08 AM, Larry Baker wrote:
>>>>>
>>>>>> I ran into a problem this past week trying to upgrade our OpenMPI 1.4.3
>>>>>> for the latest Intel 2011 compiler, 2011.6.233.
>>>>>>
>>>>>> make check fails with Segmentation Fault errors:
>>>>>>
>>>>>>> [root@hydra openmpi-1.4.3]# tail -20
>>>>>>> ../openmpi-1.4.3-check-intel.6.233.log
>>>>>>> /bin/sh ../../libtool --tag=CC --mode=link icc -DNDEBUG -g -O3
>>>>>>> -finline-functions -fno-strict-aliasing -restrict -pthread
>>>>>>> -fvisibility=hidden -shared-intel -export-dynamic -shared-intel -o
>>>>>>> ddt_pack ddt_pack.o ../../ompi/libmpi.la -lnsl -lutil
>>>>>>> libtool: link: icc -DNDEBUG -g -O3 -finline-functions
>>>>>>> -fno-strict-aliasing -restrict -pthread -fvisibility=hidden
>>>>>>> -shared-intel -shared-intel -o .libs/ddt_pack ddt_pack.o
>>>>>>> -Wl,--export-dynamic ../../ompi/.libs/libmpi.so
>>>>>>> /usr/local/src/openmpi-1.4.3/orte/.libs/libopen-rte.so
>>>>>>> /usr/local/src/openmpi-1.4.3/opal/.libs/libopen-pal.so -ldl -lnsl
>>>>>>> -lutil -pthread -Wl,-rpath -Wl,/usr/local/lib
>>>>>>> make[3]: Leaving directory
>>>>>>> `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
>>>>>>> make check-TESTS
>>>>>>> make[3]: Entering directory
>>>>>>> `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
>>>>>>> /bin/sh: line 4: 6322 Segmentation fault ${dir}$tst
>>>>>>> FAIL: checksum
>>>>>>> /bin/sh: line 4: 6355 Segmentation fault ${dir}$tst
>>>>>>> FAIL: position
>>>>>>> ========================================================
>>>>>>> 2 of 2 tests failed
>>>>>>> Please report to http://www.open-mpi.org/community/help/
>>>>>>> ========================================================
>>>>>>> make[3]: *** [check-TESTS] Error 1
>>>>>>> make[3]: Leaving directory
>>>>>>> `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
>>>>>>> make[2]: *** [check-am] Error 2
>>>>>>> make[2]: Leaving directory
>>>>>>> `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
>>>>>>> make[1]: *** [check-recursive] Error 1
>>>>>>> make[1]: Leaving directory
>>>>>>> `/state/partition1/root/src/openmpi-1.4.3/test'
>>>>>>> make: *** [check-recursive] Error 1
>>>>>>
>>>>>>
>>>>>> Before trying to track down the problem, I thought I'd describe what I
>>>>>> see here in case someone recognizes what might be happening.
>>>>>>
>>>>>> We have been using OpenMPI 1.4.3 compiled with the Intel 2011.3.174
>>>>>> compiler. I've updated the Intel 2011 compilers as they have come out
>>>>>> with new versions: 2011.4.191, 2011.5.220, and now 2011.6.233. However,
>>>>>> I've not recompiled OpenMPI 1.4.3 until now.
>>>>>>
>>>>>> Since the original compilation of OpenMPI 1.4.3 with the Intel
>>>>>> 2011.3.174 compilers, I have installed libnuma and libnuma-devel RPMs on
>>>>>> our cluster front end. I noticed that changed the OpenMPI 1.4.3
>>>>>> ./configure output. To test that this was not the cause of the problem,
>>>>>> I recompiled OpenMPI 1.4.3 using both the CentOS/Rocks GNU compilers and
>>>>>> the Intel 2011.3.174 compilers. They both passed all the make check
>>>>>> tests.
>>>>>>
>>>>>> To find out when this problem first occurs, I systematically configured,
>>>>>> compiled, and checked OpenMPI 1.4.3 with all four versions of the Intel
>>>>>> 2011 compilers we have. We use the modules package to load the compiler
>>>>>> environment:
>>>>>>
>>>>>>> [root@hydra openmpi-1.4.3]# env | grep /opt/intel
>>>>>>> LD_LIBRARY_PATH=/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64:/opt/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64
>>>>>>> PATH=/opt/intel/composer_xe_2011_sp1.6.233/bin/intel64:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/java/latest/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/eclipse:/opt/ganglia/bin:/opt/ganglia/sbin:/opt/maui/bin:/opt/torque/bin:/opt/torque/sbin:/opt/rocks/bin:/opt/rocks/sbin:/root/bin
>>>>>>
>>>>>>
>>>>>> Here are the steps I use to make and test OpenMPI 1.4.3 (I use a patched
>>>>>> version to accommodate the six compilers we have; I've submitted those
>>>>>> patches here in the past):
>>>>>>
>>>>>>> # cd /usr/local/src
>>>>>>> # tar -xjf openmpi-1.4.3-patched.tar.bz2
>>>>>>> # cd openmpi-1.4.3
>>>>>>> # module load compilers/intel/2011.6.233
>>>>>>> # ./configure \
>>>>>>>     --with-tm --with-openib --without-valgrind --without-udapl \
>>>>>>>     --enable-contrib-no-build=vt --with-wrapper-ldflags="-shared-intel" \
>>>>>>>     CC="icc" CFLAGS="-g -O3" CXX="icpc" CXXFLAGS="-g -O3" FC="ifort" \
>>>>>>>     FCFLAGS="-g -O3" F77="ifort" FFLAGS="-g -O3" LDFLAGS="-shared-intel" \
>>>>>>>     >../openmpi-1.4.3-configure-intel.6.233.log 2>&1
>>>>>>> # make >../openmpi-1.4.3-make-intel.6.233.log 2>&1
>>>>>>> # make check >../openmpi-1.4.3-check-intel.6.233.log 2>&1
>>>>>>
>>>>>> (When I generate the OpenMPI 1.4.3 library we actually use, I also add a
>>>>>> --prefix. But that complicates diffs of the stdout files for these
>>>>>> steps, so it is not used here. Thus, I do NOT proceed to make install
>>>>>> any of these libraries.)
>>>>>>
>>>>>> The three earlier versions of the Intel 2011 compilers all pass the make
>>>>>> check tests. When I compare the ./configure stdout files, they are all
>>>>>> identical. However, the ./configure stdout file for the Intel
>>>>>> 2011.6.233 compilers has one difference:
>>>>>>
>>>>>>> [root@hydra openmpi-1.4.3]# diff
>>>>>>> ../openmpi-1.4.3-configure-intel.{5.220,6.233}.log
>>>>>>> 178c178
>>>>>>> < checking for __attribute__(may_alias)... no
>>>>>>> ---
>>>>>>> > checking for __attribute__(may_alias)... yes
>>>>>>
>>>>>> That is obviously where I will start looking for the source of the
>>>>>> problem.
>>>>>>
>>>>>> Maybe someone reading this list knows what the purpose of that test is,
>>>>>> whether the Intel 2011 compilers are expected to have this feature
>>>>>> enabled, and whether the code this enables can cause this problem if the
>>>>>> Intel 2011.6.233 compilers do not fully support whatever this test is
>>>>>> intended to discern.
>>>>>>
>>>>>> Larry Baker
>>>>>> US Geological Survey
>>>>>> 650-329-5608
>>>>>> [email protected]
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> [email protected]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> <Intel20110811Fix.patch.txt>