On Fri, Feb 6, 2015 at 9:12 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> George,
>
> I cannot access parsec: HTTP error 403 :-(
>

Our webserver was down over the weekend. Please try again.

  George.



>
> I understand your point of view.
> Back to the opal_lifo test: if I remember correctly, it hangs in the
> non-multithreaded part. The very first pop loops forever because the CAS
> always fails when comparing values that are in fact equal.
> Though it is possible that the problem comes from ompi and we are just
> lucky it works with recent icc, I would not go "all in" on this ...
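To make the hang concrete: a lock-free LIFO pop is a CAS retry loop, and with a single thread nothing can modify the head between the load and the CAS, so the CAS compares equal values and must succeed on the first attempt. The following is a minimal sketch under stated assumptions, not the opal_lifo code: the hypothetical `tagged_lifo_t` packs a 32-bit node index and a 32-bit ABA tag into one 64-bit word updated with C11 `<stdatomic.h>`, a portable stand-in for the 128-bit {pointer, counter} head that cmpxchg16b updates. The `retries` counter exists only to expose a CAS that keeps failing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical tagged LIFO over a fixed array of node indices (not opal_lifo).
 * The head packs (tag << 32) | index into one 64-bit word, a portable
 * stand-in for the 128-bit {pointer, counter} pair updated by cmpxchg16b. */
#define LIFO_CAP 16u
#define LIFO_NIL UINT32_MAX

typedef struct {
    _Atomic uint32_t next[LIFO_CAP]; /* successor index of each node */
    _Atomic uint64_t head;           /* (tag << 32) | top index */
} tagged_lifo_t;

static uint64_t pack(uint32_t idx, uint32_t tag) { return ((uint64_t)tag << 32) | idx; }
static uint32_t head_index(uint64_t h) { return (uint32_t)h; }
static uint32_t head_tag(uint64_t h) { return (uint32_t)(h >> 32); }

void lifo_init(tagged_lifo_t *l)
{
    for (uint32_t i = 0; i < LIFO_CAP; ++i)
        atomic_init(&l->next[i], LIFO_NIL);
    atomic_init(&l->head, pack(LIFO_NIL, 0));
}

void lifo_push(tagged_lifo_t *l, uint32_t idx)
{
    assert(idx < LIFO_CAP);
    uint64_t old = atomic_load(&l->head);
    do {
        atomic_store(&l->next[idx], head_index(old));
        /* on failure, old is refreshed with the current head and we retry */
    } while (!atomic_compare_exchange_weak(&l->head, &old,
                                           pack(idx, head_tag(old) + 1)));
}

/* Returns the popped index (LIFO_NIL if empty) and counts failed CAS attempts
 * in *retries. Single-threaded, *retries must stay 0: the expected value was
 * just read from head, so a correct CAS compares equal values and succeeds. */
uint32_t lifo_pop(tagged_lifo_t *l, unsigned *retries)
{
    uint64_t old = atomic_load(&l->head);
    *retries = 0;
    for (;;) {
        uint32_t top = head_index(old);
        if (top == LIFO_NIL)
            return LIFO_NIL;
        uint64_t desired = pack(atomic_load(&l->next[top]), head_tag(old) + 1);
        if (atomic_compare_exchange_strong(&l->head, &old, desired))
            return top;
        ++*retries; /* a miscompiled CAS would keep landing here */
    }
}
```

Used from one thread, pushing nodes 3 and 7 and then popping should return 7 and then 3 with `retries` equal to 0; a CAS that misreports equal values as different would instead spin in `lifo_pop`. The strong CAS is used in the pop so that spurious failures (allowed for the weak form on LL/SC machines) cannot muddy that check.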
>
> And as you pointed out, even if the problem does come from the compiler,
> that does not mean the ompi algorithms are necessarily correct.
>
> Cheers,
>
> Gilles
>
> George Bosilca <bosi...@icl.utk.edu> wrote:
> On Fri, Feb 6, 2015 at 8:54 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> George,
>>
>> Can you point me to another project that uses 128-bit atomics?
>>
>
> http://icl.cs.utk.edu/parsec/. It heavily uses lock-free structures, and
> 128-bit atomics are the safest and fastest way to implement them.
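For readers wondering why the width matters: a common layout pairs the pointer with a modification counter, so that an A -> B -> A change of the pointer (the ABA problem) is still caught by the CAS. This is a hypothetical sketch, not the PaRSEC or Open MPI definition; `counted_ptr_t`, `counted_next` and `counted_equal` are illustrative names. On LP64 targets the pair fills 16 bytes, which is why a double-width CAS such as x86-64 cmpxchg16b (whose operand must be 16-byte aligned) is needed to swap it atomically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counted pointer: the item pointer a lock-free LIFO/FIFO swings,
 * plus a counter bumped on every update. */
typedef union {
    struct {
        _Alignas(16) void *item; /* forces 16-byte alignment of the pair */
        uintptr_t counter;
    } data;
#ifdef __SIZEOF_INT128__
    unsigned __int128 value; /* whole pair as one 128-bit word (compiler extension) */
#endif
} counted_ptr_t;

static_assert(sizeof(counted_ptr_t) == 16, "pair must be 16 bytes");
static_assert(_Alignof(counted_ptr_t) == 16, "cmpxchg16b needs 16-byte alignment");

/* The value a push/pop would try to install: new pointer, counter + 1. */
static counted_ptr_t counted_next(counted_ptr_t cur, void *new_item)
{
    counted_ptr_t n;
    n.data.item = new_item;
    n.data.counter = cur.data.counter + 1;
    return n;
}

/* What a double-width CAS compares: both halves must match. */
static bool counted_equal(counted_ptr_t a, counted_ptr_t b)
{
    return a.data.item == b.data.item && a.data.counter == b.data.counter;
}
```

After the head goes from `&a` to `&b` and back to `&a`, the pointer alone looks unchanged, but the counter differs, so a CAS expecting the original pair fails as it should.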
>
>
>> In my tests, I noticed that the volatile keyword is (one of) the triggers
>> of the compiler bug.
>>
>
> I usually use it for the location to be atomically changed.
>
>
>> At this stage, I could not see anything wrong in ompi; plus, this works
>> fine with recent gcc and icc. So I concluded this is an icc bug that is
>> now fixed, and all ompi can do is hide the symptom.
>>
>
> These issues are pretty tricky to trigger: they require specific race
> conditions while manipulating pointers. There are tens of papers about how
> to correctly implement FIFOs with CAS2, and even after peer review some of
> them turned out to be incorrect. What I am saying is that we are quick to
> blame these failures on the icc compiler, while we have no formal proof
> that the FIFO algorithm in Open MPI is correct.
>
>   George.
>
>
>
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> George Bosilca <bosi...@icl.utk.edu> wrote:
>> My feeling is that the current patch hides the symptoms without addressing
>> the real issue.
>>
>> As a side note: the compiler incriminated in this thread works perfectly
>> for 128-bit atomic operations in other projects where I use atomic LIFOs &
>> FIFOs (but not the ones from OMPI, as I have already raised my concerns
>> about them).
>>
>>   George.
>>
>> PS: Why are there totally unrelated comments about FIFOs in
>> opal_lifo.h (starting at line 61)?
>>
>> On Wed, Feb 4, 2015 at 11:30 PM, Gilles Gouaillardet <
>> gilles.gouaillar...@iferc.org> wrote:
>>
>>>  Paul and all,
>>>
>>> I just pushed
>>> https://github.com/open-mpi/ompi/commit/b42e3441294e9fe787fe8e9ad7403d5b8e465163
>>>
>>> When a buggy compiler is detected, configure now forces
>>> OPAL_HAVE_CMPXCHG16B=0.
>>> This is enough to make the opal_lifo test and make check happy again.
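A configure-time guard of this kind can be pictured as a tiny program that exercises a 16-byte CAS and reports whether it behaves. The function below is only a sketch of such a probe and is not the check added by the commit above: `cas128_probe` is a hypothetical name, it uses GCC's `__sync_bool_compare_and_swap` builtin on a `volatile unsigned __int128` (volatile being, per this thread, one of the triggers), and it is only compiled when the compiler advertises a native 16-byte CAS through `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` (for example GCC with `-mcx16` on x86-64).

```c
#include <assert.h>

/* Hypothetical configure-style probe (not Open MPI's actual check).
 * Returns 1 if a 16-byte CAS on a volatile location behaves correctly,
 * 0 if it misbehaves, and -1 if no native 16-byte CAS is available. */
int cas128_probe(void)
{
#if defined(__SIZEOF_INT128__) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
    static_assert(sizeof(unsigned __int128) == 16, "need a 16-byte integer");
    /* cmpxchg16b requires a 16-byte aligned operand */
    static volatile unsigned __int128 loc __attribute__((aligned(16)));
    const unsigned __int128 expected =
        ((unsigned __int128)0x0123456789abcdefULL << 64) | 0xfedcba9876543210ULL;
    const unsigned __int128 desired = expected + 1;

    loc = expected;
    /* expected equals the stored value, so this CAS must succeed... */
    if (!__sync_bool_compare_and_swap(&loc, expected, desired) || loc != desired)
        return 0;
    /* ...and now expected is stale, so this one must fail. */
    if (__sync_bool_compare_and_swap(&loc, expected, desired))
        return 0;
    return 1;
#else
    return -1; /* no native 16-byte CAS: a build would fall back to a non-CAS2 path */
#endif
}
```

A build system could run such a probe and disable the double-width CAS path unless it returns 1; a result of 0 corresponds to the "CAS fails although the values are equal" symptom described earlier in the thread.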
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On 2015/02/04 17:26, Gilles Gouaillardet wrote:
>>>
>>> Paul,
>>>
>>> My previous email was misleading.
>>>
>>> What I really meant is that the opal_fifo test works fine with icc
>>> 2013u5 (the release before 2013sp1) and with icc 2013sp1u2 and later.
>>>
>>> So even if the reproducer fails with icc older than 2013sp1u2, that
>>> might not impact ompi, since for other reasons the bug is not hit.
>>>
>>> For example, with icc 2013u5, OPAL_HAVE_CMPXCHG16B=0, so ompi stays away
>>> from the compiler bug.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 2015/02/04 17:15, Paul Hargrove wrote:
>>>
>>>  Gilles,
>>>
>>> Who says only 2 versions are affected?
>>>
>>> I have access to 9 revisions of icc.
>>> Using your reduced case I find 7 that fail and only 2 (the latest two) that
>>> pass.
>>> Discounting icc-12 (which can't compile the test), that makes 6 versions
>>> affected by the bug (not 2).
>>>
>>> -Paul
>>>
>>> $ for x in 12.1.5.339 13.0.0.079 13.0.1.117 13.1.2.183 13.1.3.192 \
>>>     14.0.0.080 14.0.1.106 14.0.2.144 15.0.1.133; do \
>>>   module swap intel intel/$x; echo @ Testing Intel compiler version $x; \
>>>   icc conftest.c && ./a.out && echo PASS; done
>>> @ Testing Intel compiler version 12.1.5.339
>>> conftest.c(10): error: identifier "__int128_t" is undefined
>>>       __int128_t value;
>>>       ^
>>>
>>> compilation aborted for conftest.c (code 2)
>>> @ Testing Intel compiler version 13.0.0.079
>>> a.out: conftest.c:36: main: Assertion `a.value == b.value' failed.
>>> Aborted
>>> @ Testing Intel compiler version 13.0.1.117
>>> a.out: conftest.c:36: main: Assertion `a.value == b.value' failed.
>>> Aborted
>>> @ Testing Intel compiler version 13.1.2.183
>>> a.out: conftest.c:36: main: Assertion `a.value == b.value' failed.
>>> Aborted
>>> @ Testing Intel compiler version 13.1.3.192
>>> a.out: conftest.c:36: main: Assertion `a.value == b.value' failed.
>>> Aborted
>>> @ Testing Intel compiler version 14.0.0.080
>>> a.out: conftest.c:36: main: Assertion `a.value == b.value' failed.
>>> Aborted
>>> @ Testing Intel compiler version 14.0.1.106
>>> a.out: conftest.c:36: main: Assertion `a.value == b.value' failed.
>>> Aborted
>>> @ Testing Intel compiler version 14.0.2.144
>>> PASS
>>> @ Testing Intel compiler version 15.0.1.133
>>> PASS
>>>
>>> On Tue, Feb 3, 2015 at 11:45 PM, Gilles Gouaillardet 
>>> <gilles.gouaillar...@iferc.org> wrote:
>>>
>>>
>>>   Nathan,
>>>
>>> IMHO, this is a compiler bug, and only two versions are affected:
>>> - intel icc 14.0.0.080 (aka 2013sp1)
>>> - intel icc 14.0.1.106 (aka 2013sp1u1)
>>> /* note the bug only occurs with -O1 and higher optimization levels */
>>>
>>> Attached is a simple reproducer.
>>>
>>> A simple workaround is to configure with ac_cv_type___int128=0.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 2015/02/04 4:17, Nathan Hjelm wrote:
>>>
>>> That's the second report involving icc 14. I will dig into this later
>>> this week.
>>>
>>> -Nathan
>>>
>>> On Mon, Feb 02, 2015 at 11:03:41PM -0800, Paul Hargrove wrote:
>>>
>>>     I have seen opal_fifo hang on 2 distinct systems
>>>     + Linux/ppc32 with xlc-11.1
>>>     + Linux/x86-64 with icc-14.0.1.106
>>>    I have no explanation to offer for either hang.
>>>    No "weird" configure options were passed to either.
>>>    -Paul
>>>    --
>>>    Paul H. Hargrove                          phhargr...@lbl.gov
>>>    Computer Languages & Systems Software (CLaSS) Group
>>>    Computer Science Department               Tel: +1-510-495-2352
>>>    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>
>>>   _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2015/02/16911.php
>
