Hi,

An update on the issue:

So far, I have confirmed that the problem is related to the Intel *libiomp5.so*
library, and I have found the following workaround:

link with the flag

*  -qopenmp-link=static  *

This way the libiomp5.so library is linked statically, and DMTCP
checkpoint and restart work.
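
For reference, a minimal sketch of how the build and a quick check might
look. The source file name omp_test.f90 and the helper uses_dynamic_iomp
are assumptions for illustration only, and -qopenmp is assumed to be the
flag used to enable OpenMP in the build; adapt them to your setup.

```shell
#!/bin/sh
# Sketch: link the Intel OpenMP runtime statically, then check with ldd
# that libiomp5.so is no longer a dynamic dependency of the executable.
SRC=omp_test.f90   # assumed source file name (hypothetical)
EXE=omp_test.x

# Hypothetical helper: succeeds if the ldd output on stdin lists libiomp5.so.
uses_dynamic_iomp() {
    grep -q 'libiomp5\.so'
}

if command -v ifort >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # -qopenmp-link=static is the workaround described above.
    ifort -qopenmp -qopenmp-link=static "$SRC" -o "$EXE"
    if ldd "$EXE" | uses_dynamic_iomp; then
        echo "libiomp5.so is still linked dynamically"
    else
        echo "libiomp5.so is not a dynamic dependency"
    fi
else
    echo "ifort or $SRC not found; skipping build"
fi
```

If the check reports that libiomp5.so is still a dynamic dependency, the
flag most likely did not reach the link step.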


best regards,

adolfo





2018-05-19 17:03 GMT-03:00 Kapil Arya <[email protected]>:

> Unfortunately, I don't have access to ifort :/. I have applied for an OSS
> license, so can't do much until I get it.
>
> On Fri, May 18, 2018 at 9:08 PM ADOLFO JAVIER BANCHIO <
> [email protected]> wrote:
>
>>
>> Hi,
>>
>> As a follow-up: if I compile with ifort version 18 using the static
>> option, the program runs fine, but
>> with the WARNING about being statically linked.
>>
>> This confirms that the problem lies in the intel *libiomp5.so* library.
>>
>>
>> If the problem cannot be solved,
>> what are the drawbacks of static linking when using DMTCP?
>>
>> regards,
>>
>> adolfo
>>
>>
>> 2018-05-18 19:21 GMT-03:00 Kapil Arya <[email protected]>:
>>
>>> I just tried the example compiled with gcc-fortran and don't see any
>>> issues:
>>>
>>> $ export OMP_NUM_THREADS="3"
>>> $ dmtcp_launch ./omp_test.x
>>> [42000] NOTE at socketconnlist.cpp:220 in scanForPreExisting;
>>> REASON='found pre-existing socket... will not be restored'
>>>     fd = 30
>>>     device = socket:[1530774]
>>> [42000] WARNING at socketconnection.cpp:236 in TcpConnection;
>>> REASON='JWARNING((domain == AF_INET || domain == AF_UNIX || domain ==
>>> AF_INET6) && (type & 077) == SOCK_STREAM) failed'
>>>     domain = 0
>>>     type = 0
>>>     protocol = 0
>>> [42000] NOTE at socketconnlist.cpp:220 in scanForPreExisting;
>>> REASON='found pre-existing socket... will not be restored'
>>>     fd = 31
>>>     device = socket:[1530775]
>>> [42000] WARNING at socketconnection.cpp:236 in TcpConnection;
>>> REASON='JWARNING((domain == AF_INET || domain == AF_UNIX || domain ==
>>> AF_INET6) && (type & 077) == SOCK_STREAM) failed'
>>>     domain = 0
>>>     type = 0
>>>     protocol = 0
>>> [42000] NOTE at socketconnlist.cpp:220 in scanForPreExisting;
>>> REASON='found pre-existing socket... will not be restored'
>>>     fd = 39
>>>     device = socket:[1536308]
>>> [42000] WARNING at socketconnection.cpp:236 in TcpConnection;
>>> REASON='JWARNING((domain == AF_INET || domain == AF_UNIX || domain ==
>>> AF_INET6) && (type & 077) == SOCK_STREAM) failed'
>>>     domain = 0
>>>     type = 0
>>>     protocol = 0
>>> [42000] NOTE at socketconnlist.cpp:220 in scanForPreExisting;
>>> REASON='found pre-existing socket... will not be restored'
>>>     fd = 40
>>>     device = socket:[1536309]
>>> [42000] WARNING at socketconnection.cpp:236 in TcpConnection;
>>> REASON='JWARNING((domain == AF_INET || domain == AF_UNIX || domain ==
>>> AF_INET6) && (type & 077) == SOCK_STREAM) failed'
>>>     domain = 0
>>>     type = 0
>>>     protocol = 0
>>> Hello ...
>>> num threads =    622879781
>>>           0 /           0     -- >            8
>>>           0 /           0     -- >            8
>>>           0 /           0     -- >            8
>>>           0 /           0     -- >            8
>>>           0 /           0     -- >            8
>>>           0 /           0     -- >            8
>>>           0 /           0     -- >            8
>>>           0 /           0     -- >            8
>>>           0 /           0     -- >            8
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>>           0 /           0     -- >            9
>>> $
>>>
>>> On Fri, May 18, 2018 at 4:00 PM Kapil Arya <[email protected]>
>>> wrote:
>>>
>>>> Hi Adolfo,
>>>>
>>>> Can you also provide instructions to compile this code?
>>>>
>>>> Kapil
>>>>
>>>> On Fri, May 18, 2018 at 3:53 PM ADOLFO JAVIER BANCHIO <
>>>> [email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> After having googled quite a lot without success and also having
>>>>> checked the archived posts, I still cannot run Fortran-compiled OpenMP
>>>>> codes
>>>>> using dmtcp_launch.
>>>>>
>>>>> I have installed DMTCP version 2.5.2 on a Rocks 7 (CentOS 7) cluster
>>>>> (from rpm and also compiled with --enable-openm flag),
>>>>> and I still cannot run OpenMP executables produced from f90 codes
>>>>> compiled with ifort.
>>>>>
>>>>> I run:
>>>>>
>>>>> in *shell 1*
>>>>>
>>>>> /export/added_soft/dmtcp/dmtcp-2.5.2/bin/dmtcp_coordinator
>>>>>
>>>>>
>>>>>  and in *shell 2*
>>>>>
>>>>> export OMP_NUM_THREADS=3
>>>>>
>>>>> /export/added_soft/dmtcp/dmtcp-2.5.2/bin/dmtcp_launch ./omp_test.x
>>>>>
>>>>>
>>>>> output in *shell 1 *is:
>>>>>
>>>>> $ /export/added_soft/dmtcp/dmtcp-2.5.2/bin/dmtcp_coordinator
>>>>> dmtcp_coordinator starting...
>>>>>     Host: bandurria.fis.uncor.edu (0.0.0.0)
>>>>>     Port: 7779
>>>>>     Checkpoint Interval: disabled (checkpoint manually instead)
>>>>>     Exit on last client: 0
>>>>> Type '?' for help.
>>>>>
>>>>> [28865] NOTE at dmtcp_coordinator.cpp:1368 in
>>>>> updateCheckpointInterval; REASON='CheckpointInterval updated (for this
>>>>> computation only)'
>>>>>      oldInterval = 0
>>>>>      theCheckpointInterval = 0
>>>>> [28865] NOTE at dmtcp_coordinator.cpp:917 in onConnect; REASON='worker
>>>>> connected'
>>>>>      hello_remote.from = 1ba5f63f5ba22d27-29111-99b9e2da0f18
>>>>> [28865] NOTE at dmtcp_coordinator.cpp:667 in onData; REASON='Updating
>>>>> process Information after exec()'
>>>>>      progname = omp_test.x
>>>>>      msg.from = 1ba5f63f5ba22d27-40000-99b9e3d17fe2
>>>>>      client->identity() = 1ba5f63f5ba22d27-29111-99b9e2da0f18
>>>>>
>>>>>
>>>>> And in *shell 2*, the code starts (if I run top, it is running with
>>>>> one thread only, using 100% of CPU), but it does not seem to spawn
>>>>> the threads. It seems to get stuck when it reaches a parallel
>>>>> section (the part of the code before the parallel block is
>>>>> actually executed).
>>>>>
>>>>>
>>>>> Thank you in advance for any help.
>>>>> I am new to DMTCP (coming from BLCR), so my apologies if this is
>>>>> a stupid issue ...
>>>>>
>>>>> regards,
>>>>>
>>>>> adolfo
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> P.S.: the code I am using for testing (other real codes fail in the
>>>>> same way)
>>>>> program omp_test
>>>>> implicit none
>>>>> integer(8)   :: i,j
>>>>> integer      :: nt,tn,omp_get_num_threads,omp_get_thread_num
>>>>>
>>>>> write(*,*) "Hello ..."
>>>>>
>>>>> !nt = omp_get_num_threads()
>>>>> write(*,*) 'num threads = ',nt
>>>>>
>>>>> !$OMP PARALLEL PRIVATE(i,tn,nt)
>>>>> do i = 1, 10**9
>>>>>   j = int( sqrt( log( real(i)/real(i**2.4) ) ) )
>>>>>   if (mod(i,10**8) == 0) then
>>>>> !    nt = omp_get_num_threads()
>>>>> !    tn =  omp_get_thread_num()
>>>>>     write(*,*)  tn, '/',nt,'    -- > ', nint( log(real(i))/log(10.) )
>>>>>   endif
>>>>> enddo
>>>>> !$OMP END PARALLEL
>>>>>
>>>>> end program
>>>>>
>>>>>
>>>>> ------------------------------------------------------------
>>>>> ------------------
>>>>> Check out the vibrant tech community on one of the world's most
>>>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot______
>>>>> _________________________________________
>>>>> Dmtcp-forum mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>>>>
>>>>
>>
>>
>