Re: [OMPI users] parallel I/O on 64-bit indexed arrays

2011-06-06 Thread Jeff Squyres
If I understand your question correctly, this is *exactly* one of the reasons 
that the MPI Forum has been arguing about the use of a new type, "MPI_Count", 
for certain parameters that can get very, very large.

-
Sidenote: I believe that a workaround for you is to create some new MPI 
datatypes (e.g., contiguous types) that you can then use as multipliers to get 
to the offsets that you want.  I.e., if you make a contiguous datatype of 4 
doubles, you can still only specify up to 2B of them, but that now gets you 
up to an offset of (2B * 4 * sizeof(double)) rather than (2B * sizeof(double)). 
 Make sense?
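
In code, the workaround looks roughly like this (a Fortran sketch; the factor 
of 4 and the variable names are just illustrative, not anything specific to 
your application):

    integer :: chunk4, ierr
    ! One chunk4 element spans 4 doubles, so a 32-bit count of chunk4
    ! elements describes 4x as much data as a 32-bit count of
    ! MPI_DOUBLE_PRECISION elements.
    call MPI_TYPE_CONTIGUOUS(4, MPI_DOUBLE_PRECISION, chunk4, ierr)
    call MPI_TYPE_COMMIT(chunk4, ierr)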
-

This ticket for the MPI-3 standard is a first step in the right direction, but 
won't do everything you need (this is more FYI):

https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/265

See the PDF attached to the ticket; it's going up for a "first reading" in a 
month.  It'll hopefully be part of the MPI-3 standard by the end of the year 
(Fab Tillier, CC'ed, has been the chief proponent of this ticket for the past 
several months).

Quincey Koziol from The HDF Group is going to propose a follow-on to this 
ticket, specifically about the case you're referring to -- large counts for 
file functions and datatype constructors.  Quincey -- can you expand on what 
you'll be proposing, perchance?



On Jun 6, 2011, at 5:26 AM, Troels Haugboelle wrote:

> Hello!
> 
> The problem I face is not Open MPI-specific, but I hope the MPI wizards 
> on the list can help me nonetheless.
> 
> I am running and developing a large-scale scientific code written in 
> Fortran90. One type of object is a global 1-D vector, which contains data 
> for particles in the application. I want to use MPI commands for saving the 
> particle data, but the global 1-D array holding the data can reach up to 100 
> billion elements, so array offsets and global sizes have to be 64-bit.
> 
> We use MPI_TYPE_CREATE_SUBARRAY to make a custom type and then 
> MPI_FILE_SET_VIEW and MPI_FILE_WRITE_ALL to save the 3-D field data. This 
> works with good performance even on very large installations / runs, but the 
> arguments to MPI_TYPE_CREATE_SUBARRAY are 32-bit integers, and that is not 
> sufficient for the 1-D particle array: it needs 64-bit offsets and 64-bit 
> global sizes. The local sizes for each thread are 32-bit, though.
> 
> What MPI call could I use to make a custom MPI type that describes the above 
> data with 64-bit indices / global sizes?
> 
> As an example, for 3 threads the type layout would be :
> 
> Thread 0: n0 reals, n1 holes, n2 holes
> Thread 1: n0 holes, n1 reals, n2 holes
> Thread 2: n0 holes, n1 holes, n2 reals
> 
> The problem is that I have to generalize this to 100 billion elements and 
> 250k threads.
> 
> As a remark, given that data keeps getting bigger: it would be very nice if 
> the arguments to MPI_TYPE_CREATE_SUBARRAY, and the arguments to other similar 
> routines, could be 64-bit.
> 
> TIA,
> 
> Troels
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] difference between MTL and BTL

2011-06-06 Thread Jeff Squyres
Yes -- check out the README; there's a section on MTLs vs. BTLs.

If that's not clear, post back here and we can explain further (and update the 
README :-) ).
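
In a nutshell (a rough summary -- the README has the authoritative text): the 
mx MTL hands messages to the MX library's own matching engine via the "cm" PML, 
while the mx BTL just moves bytes for Open MPI's "ob1" PML, which does the MPI 
matching itself.  Selecting one or the other looks something like this 
(illustrative command lines):

    # use the MX MTL
    mpirun --mca pml cm --mca mtl mx -np 4 ./a.out

    # use the MX BTL
    mpirun --mca pml ob1 --mca btl mx,sm,self -np 4 ./a.out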


On Jun 4, 2011, at 10:15 AM, amjad ali wrote:

> Hello all,
>  
> the FAQ page about using myrinet
> http://www.open-mpi.org/faq/?category=myrinet 
>  
> says that
>  
> Note that one cannot use both the mx MTL and the mx BTL components at once. 
> Deciding which to use largely depends on the application being run.
>  
> Can anybody give any further clue on how to decide which one to use, the MTL 
> or the BTL?
>  
> Thank you.
>  
> Regards,
> Amjad Ali
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] openmpi (1.2.8 or above) and Intel composer XE 2011 (aka 12.0)

2011-06-06 Thread Jeff Squyres
Done -- how's this:

http://www.open-mpi.org/faq/?category=openfabrics#ib-btl
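
In other words, when you explicitly list BTLs for an IB run, the shared-memory 
BTL should be in the list too, e.g. (an illustrative command line, not a 
verbatim copy of the FAQ text):

    mpirun --mca btl openib,sm,self -np 16 ./a.out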



On May 27, 2011, at 12:53 PM, Gus Correa wrote:

> Eugene Loh wrote:
>> On 5/27/2011 4:32 AM, Jeff Squyres wrote:
>>> On May 27, 2011, at 4:30 AM, Robert Horton wrote:
> To be clear, if you explicitly list which BTLs to use, OMPI will only
> (try to) use exactly those and no others.
 It might be worth putting the sm btl in the FAQ:
 
 http://www.open-mpi.org/faq/?category=openfabrics#ib-btl
>>> Is this entry not clear enough?
>>> 
>>> http://www.open-mpi.org/faq/?category=tuning#selecting-components
>> I think his point is that the example in the ib-btl entry would be more 
>> helpful as a template for usage if it added sm.  Why point users to a 
>> different FAQ entry (which we don't do anyhow) when three more characters, 
>> ",sm", would make the ib-btl entry so much more helpful?
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> Hi Jeff, list
> 
> I agree with Eugene and Robert.
> By all means, please add ",sm" to "openib,self" in:
> 
> http://www.open-mpi.org/faq/?category=openfabrics#ib-btl
> 
> I have yet to see a situation where you would want to run with openib and 
> self but exclude sm (except for testing, perhaps when memcpy is broken).
> 
> Maybe that is what led Salvatore Podda to think there was a
> "Law of Least Astonishment" behind the mca parameter syntax
> that would add "sm" automatically to the other two btls,
> which is not really the case.
> 
> Like Salvatore, I have been confused by the mca parameter
> syntax in the past as well.
> My recollection is that Jeff wrote the second
> FAQ to placate my whining in the list about
> to sm or not to sm.
> 
> However, the second FAQ clarifies the mca parameter logic,
> along with the role of the "^" clause, and IMHO should be kept there:
> 
> http://www.open-mpi.org/faq/?category=tuning#selecting-components
> 
> My two cents,
> Gus Correa
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] ifort 12.0.4 install problem

2011-06-06 Thread Jeff Squyres
On Jun 6, 2011, at 10:43 AM, Virginie trinite wrote:

> I am trying to compile Open MPI with ifort 12.0.4. My system is Ubuntu
> Lucid. A previous installation with ifort 11.1 was fine.
> 
> configure and make all seem to work well, but make install reports an error:
> libtool: line 7847: icc: command not found
> libtool: install: error: relink `libopen-rte.la' with the above
> command before installing it
> 
> I want to underline that icc is a known command for bash.

Somehow it became unknown.  Is your PATH being reset somewhere?  Perhaps your 
.bashrc resets your PATH, so that even though "which icc" finds it at the shell 
prompt, when sub-shells source your .bashrc the PATH gets reset (or the icc 
environment settings don't get inherited properly), and therefore icc becomes 
unknown...?
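
One quick thing to check (just a guess at the cause): compare what an 
interactive shell and a fresh non-interactive shell see, e.g.:

    which icc
    bash -c 'which icc'
    bash -lc 'which icc'

If the first finds icc but the others don't, then something in your startup 
files is hiding icc from the sub-shells that make install / libtool spawn.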

> I have checked the FAQ and it seems to me that the problem is more like
> the one reported for the IBM compiler. So I tried with

I'm a little confused about why you're mentioning the IBM compiler...?  This 
looks like a shell/build issue (I assume...?  You only sent a few lines of the 
output, so I can't tell exactly where the error is occurring).

> configure CC=icc CXX=icpc F77=ifort FC=ifort --disable-shared --enable-static
> Now the install finishes without error, but when I try to run MPI I get an
> error message:

Now I'm very confused.  :-\

Can you please send all the information listed here:

http://www.open-mpi.org/community/help/

This will help me understand what the problem is and what you tried to do to 
fix it.

Thanks.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Program hangs when using OpenMPI and CUDA

2011-06-06 Thread Fengguang Song
Hi Rolf,

I double-checked the flag just now. It was set correctly, but the hanging 
problem was still there.
However, I found another way to solve the hanging problem: just setting the 
environment variable CUDA_NIC_INTEROP to 1 resolves the issue.
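
(E.g., something along these lines -- mpirun's -x option exports an environment 
variable to the launched processes; the application name is just a placeholder:

    export CUDA_NIC_INTEROP=1
    mpirun -x CUDA_NIC_INTEROP -np 2 ./my_gpu_app
)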

Thanks,
Fengguang


On Jun 6, 2011, at 10:44 AM, Rolf vandeVaart wrote:

> Hi Fengguang:
> 
> It is odd that you see the problem even when running with the openib flags 
> set as Brice indicated.  Just to be extra sure there are no typos in 
> your flag settings, maybe you can verify with the ompi_info command like this?
> 
> ompi_info -mca btl_openib_flags 304 -param btl openib | grep btl_openib_flags
> 
> When running with the 304 setting, all communications travel through a 
> regular send/receive protocol on IB.  The message is broken up into a 12K 
> fragment, followed by however many 64K fragments it takes to move the message.
> 
> I will try to find time to reproduce the other 1 Mbyte issue that Brice 
> reported.
> 
> Rolf
> 
> 
> 
> PS: Not sure if you are interested, but in the trunk, you can configure in 
> support so that you can send and receive GPU buffers directly.  There are 
> still many performance issues to be worked out, but just thought I would 
> mention it.
> 
> 
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Fengguang Song
> Sent: Sunday, June 05, 2011 9:54 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Program hangs when using OpenMPI and CUDA
> 
> Hi Brice,
> 
> Thank you! I saw your previous discussion and actually have tried "--mca 
> btl_openib_flags 304".
> Unfortunately, it didn't solve the problem. In our case, the MPI buffer is 
> different from the cudaMemcpy buffer, and we copy between them manually. 
> I'm still trying to figure out how to configure OpenMPI's mca parameters to 
> solve the problem...
> 
> Thanks,
> Fengguang
> 
> 
> On Jun 5, 2011, at 2:20 AM, Brice Goglin wrote:
> 
>> Le 05/06/2011 00:15, Fengguang Song a écrit :
>>> Hi,
>>> 
>>> I'm confronting a problem when using OpenMPI 1.5.1 on a GPU cluster. 
>>> My program uses MPI to exchange data between nodes, and uses 
>>> cudaMemcpyAsync to exchange data between Host and GPU devices within a node.
>>> When the MPI message size is less than 1MB, everything works fine. 
>>> However, when the message size is > 1MB, the program hangs (i.e., an MPI 
>>> send never reaches its destination based on my trace).
>>> 
>>> The issue may be related to locked-memory contention between OpenMPI and 
>>> CUDA.
>>> Does anyone have experience solving this problem? Which MCA 
>>> parameters should I tune to allow message sizes > 1MB (and avoid 
>>> the hang)? Any help would be appreciated.
>>> 
>>> Thanks,
>>> Fengguang
>> 
>> Hello,
>> 
>> I may have seen the same problem when testing GPU direct. Do you use 
>> the same host buffer for copying from/to the GPU and for sending/receiving 
>> on the network?  If so, you need a GPUDirect-enabled kernel and 
>> Mellanox drivers, but that only helps below 1MB.
>> 
>> You can work around the problem with one of the following solutions:
>> * add --mca btl_openib_flags 304 to force OMPI to always send/recv 
>> through an intermediate (internal) buffer, but it'll decrease 
>> performance below 1MB too
>> * use different host buffers for the GPU and the network, and manually 
>> copy between them
>> 
>> I never got any reply from NVIDIA/Mellanox/here when I reported this 
>> problem with GPUDirect and messages larger than 1MB.
>> http://www.open-mpi.org/community/lists/users/2011/03/15823.php
>> 
>> Brice
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> ---
> This email message is for the sole use of the intended recipient(s) and may 
> contain
> confidential information.  Any unauthorized review, use, disclosure or 
> distribution
> is prohibited.  If you are not the intended recipient, please contact the 
> sender by
> reply email and destroy all copies of the original message.
> ---
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Program hangs when using OpenMPI and CUDA

2011-06-06 Thread Rolf vandeVaart
Hi Fengguang:

It is odd that you see the problem even when running with the openib flags 
set as Brice indicated.  Just to be extra sure there are no typos in your 
flag settings, maybe you can verify with the ompi_info command like this?

ompi_info -mca btl_openib_flags 304 -param btl openib | grep btl_openib_flags

When running with the 304 setting, all communications travel through a 
regular send/receive protocol on IB.  The message is broken up into a 12K 
fragment, followed by however many 64K fragments it takes to move the message.

I will try to find time to reproduce the other 1 Mbyte issue that Brice 
reported.

Rolf



PS: Not sure if you are interested, but in the trunk, you can configure in 
support so that you can send and receive GPU buffers directly.  There are still 
many performance issues to be worked out, but just thought I would mention it.


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Fengguang Song
Sent: Sunday, June 05, 2011 9:54 AM
To: Open MPI Users
Subject: Re: [OMPI users] Program hangs when using OpenMPI and CUDA

Hi Brice,

Thank you! I saw your previous discussion and actually have tried "--mca 
btl_openib_flags 304".
Unfortunately, it didn't solve the problem. In our case, the MPI buffer is 
different from the cudaMemcpy buffer, and we copy between them manually. I'm 
still trying to figure out how to configure OpenMPI's mca parameters to solve 
the problem...

Thanks,
Fengguang


On Jun 5, 2011, at 2:20 AM, Brice Goglin wrote:

> Le 05/06/2011 00:15, Fengguang Song a écrit :
>> Hi,
>> 
>> I'm confronting a problem when using OpenMPI 1.5.1 on a GPU cluster. 
>> My program uses MPI to exchange data between nodes, and uses cudaMemcpyAsync 
>> to exchange data between Host and GPU devices within a node.
>> When the MPI message size is less than 1MB, everything works fine. 
>> However, when the message size is > 1MB, the program hangs (i.e., an MPI 
>> send never reaches its destination based on my trace).
>> 
>> The issue may be related to locked-memory contention between OpenMPI and 
>> CUDA.
>> Does anyone have experience solving this problem? Which MCA 
>> parameters should I tune to allow message sizes > 1MB (and avoid 
>> the hang)? Any help would be appreciated.
>> 
>> Thanks,
>> Fengguang
> 
> Hello,
> 
> I may have seen the same problem when testing GPU direct. Do you use 
> the same host buffer for copying from/to the GPU and for sending/receiving 
> on the network?  If so, you need a GPUDirect-enabled kernel and 
> Mellanox drivers, but that only helps below 1MB.
> 
> You can work around the problem with one of the following solutions:
> * add --mca btl_openib_flags 304 to force OMPI to always send/recv 
> through an intermediate (internal) buffer, but it'll decrease 
> performance below 1MB too
> * use different host buffers for the GPU and the network, and manually 
> copy between them
> 
> I never got any reply from NVIDIA/Mellanox/here when I reported this 
> problem with GPUDirect and messages larger than 1MB.
> http://www.open-mpi.org/community/lists/users/2011/03/15823.php
> 
> Brice
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---



[OMPI users] ifort 12.0.4 install problem

2011-06-06 Thread Virginie trinite
Hello

I am trying to compile Open MPI with ifort 12.0.4. My system is Ubuntu
Lucid. A previous installation with ifort 11.1 was fine.

configure and make all seem to work well, but make install reports an error:
libtool: line 7847: icc: command not found
libtool: install: error: relink `libopen-rte.la' with the above
command before installing it

I want to underline that icc is a known command for bash.
I have checked the FAQ and it seems to me that the problem is more like
the one reported for the IBM compiler. So I tried with:
configure CC=icc CXX=icpc F77=ifort FC=ifort --disable-shared --enable-static
Now the install finishes without error, but when I try to run MPI I get an
error message:

No available pml components were found
This mean ..


PML ob1 cannot be selected

The output of ompi_info is:

Package: Open MPI user@CAPYS Distribution
Open MPI: 1.4.3
   Open MPI SVN revision: r23834
   Open MPI release date: Oct 05, 2010
Open RTE: 1.4.3
   Open RTE SVN revision: r23834
   Open RTE release date: Oct 05, 2010
OPAL: 1.4.3
   OPAL SVN revision: r23834
   OPAL release date: Oct 05, 2010
Ident string: 1.4.3
  Prefix: /usr/local
 Configured architecture: x86_64-unknown-linux-gnu
  Configure host: CAPYS
   Configured by: user
   Configured on: Mon Jun  6 11:00:10 CEST 2011
  Configure host: CAPYS
Built by: user
Built on: lundi 6 juin 2011, 11:03:21 (UTC+0200)
  Built host: CAPYS
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: icc
 C compiler absolute: /opt/intel/composerxe-2011.4.191/bin/intel64/icc
C++ compiler: icpc
   C++ compiler absolute: /opt/intel/composerxe-2011.4.191/bin/intel64/icpc
  Fortran77 compiler: ifort
  Fortran77 compiler abs: /opt/intel/composerxe-2011.4.191/bin/intel64/ifort
  Fortran90 compiler: ifort
  Fortran90 compiler abs: /opt/intel/composerxe-2011.4.191/bin/intel64/ifort
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
   Sparse Groups: no
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
   Heterogeneous support: no
 mpirun default --prefix: no
 MPI I/O support: yes
   MPI_WTIME support: gettimeofday
Symbol visibility support: yes
   FT Checkpoint support: no  (checkpoint thread: no)
   MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4.3)
  MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4.3)
   MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.3)
   MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4.3)
   MCA carto: file (MCA v2.0, API v2.0, Component v1.4.3)
   MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.3)
   MCA timer: linux (MCA v2.0, API v2.0, Component v1.4.3)
 MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4.3)
 MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4.3)
 MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4.3)
  MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4.3)
   MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4.3)
   MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: self (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: sync (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.4.3)
  MCA io: romio (MCA v2.0, API v2.0, Component v1.4.3)
   MCA mpool: fake (MCA v2.0, API v2.0, Component v1.4.3)
   MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4.3)
   MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4.3)
 MCA pml: cm (MCA v2.0, API v2.0, Component v1.4.3)
 MCA pml: csum (MCA v2.0, API v2.0, Component v1.4.3)
 MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4.3)
 MCA pml: v (MCA v2.0, API v2.0, Component v1.4.3)
 MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4.3)
  MCA rcache: vma (MCA v2.0, API v2.0, Component v1.4.3)
 MCA btl: self (MCA v2.0, API v2.0, Component v1.4.3)
 MCA btl: sm (MCA v2.0, API v2.0, 

[OMPI users] parallel I/O on 64-bit indexed arrays

2011-06-06 Thread Troels Haugboelle

Hello!

The problem I face is not Open MPI-specific, but I hope the MPI 
wizards on the list can help me nonetheless.


I am running and developing a large-scale scientific code written in 
Fortran90. One type of object is a global 1-D vector, which contains 
data for particles in the application. I want to use MPI commands for 
saving the particle data, but the global 1-D array holding the data can 
reach up to 100 billion elements, so array offsets and global sizes 
have to be 64-bit.


We use MPI_TYPE_CREATE_SUBARRAY to make a custom type and then 
MPI_FILE_SET_VIEW and MPI_FILE_WRITE_ALL to save the 3-D field data. 
This works with good performance even on very large installations / 
runs, but the arguments to MPI_TYPE_CREATE_SUBARRAY are 32-bit 
integers, and that is not sufficient for the 1-D particle array: it 
needs 64-bit offsets and 64-bit global sizes. The local sizes for each 
thread are 32-bit, though.
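
Schematically, what each MPI rank does today is something like the following 
(a simplified sketch, not our actual code -- the names are made up and the 
element type may differ):

    integer :: filetype, ierr
    integer :: gsize(1), lsize(1), start(1)    ! default (32-bit) integers
    integer(kind=MPI_OFFSET_KIND) :: disp

    gsize(1) = nglobal     ! global particle count: overflows beyond ~2^31
    lsize(1) = nlocal      ! local count: fits in 32 bits
    start(1) = my_offset   ! global offset of this rank: also overflows

    call MPI_TYPE_CREATE_SUBARRAY(1, gsize, lsize, start, &
         MPI_ORDER_FORTRAN, MPI_REAL, filetype, ierr)
    call MPI_TYPE_COMMIT(filetype, ierr)

    disp = 0
    call MPI_FILE_SET_VIEW(fh, disp, MPI_REAL, filetype, &
         'native', MPI_INFO_NULL, ierr)
    call MPI_FILE_WRITE_ALL(fh, buf, nlocal, MPI_REAL, &
         MPI_STATUS_IGNORE, ierr)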


What MPI call could I use to make a custom MPI type that describes the 
above data with 64-bit indices / global sizes?


As an example, for 3 threads the type layout would be :

Thread 0: n0 reals, n1 holes, n2 holes
Thread 1: n0 holes, n1 reals, n2 holes
Thread 2: n0 holes, n1 holes, n2 reals

The problem is that I have to generalize this to 100 billion elements and 
250k threads.


As a remark, given that data keeps getting bigger: it would be very 
nice if the arguments to MPI_TYPE_CREATE_SUBARRAY, and the arguments to 
other similar routines, could be 64-bit.


TIA,

Troels




Re: [OMPI users] running MPI application and using C/R OpenMPI 1.5.3

2011-06-06 Thread Marcin Zielinski

Hello,

Did anyone try to fiddle with this riddle of mine?

> $> mpirun -n 2 -am ft-enable-cr ./myapp < <input file for myapp>
>
> $> ompi-checkpoint -s --term <PID of mpirun>
> produces the following error for myapp (in the case of -n 2):
>
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> [hostname:29664] local) Error: Unable to read state from named pipe
> (/global_dir/opal_cr_prog_write.29666). 0
> [hostname:29664] [[27518,0],0] ORTE_ERROR_LOG: Error in file
> snapc_full_local.c at line 1602
> 
--

> mpirun has exited due to process rank 1 with PID 29666 on
> node hostname exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> 
--

>
> The /global_dir/, /local_dir/ and /tmp_dir/ are all readable/writable by
> the user who invokes mpirun myapp.

best regards,

On 05/30/2011 11:25 AM, Marcin Zielinski wrote:

Dear all,

After looking through various topics to solve my problem, I'm forced to turn
to you all here. Though I have to say, I have not found this particular
problem reported yet. It could be something amazingly easy to solve.

Anyway, I'm running an MPI application compiled with OpenMPI 1.5.3.
OpenMPI 1.5.3 was compiled with BLCR support. BLCR compiled with
no errors and works fine. The configure looks like this:
export CC='icc'
export CXX='icpc'
export F77='ifort'
export FC='ifort'
export F90='ifort'
export FCFLAGS='-O2'
export FFLAGS='-O2'
export CFLAGS='-O2'
export CXXFLAGS='-O2'
export OMP_NUM_THREADS='1'
# export CPP='cpp'

export LD_RUNPATH=$installdir/lib

make clean
./configure --prefix=$installdir \
--enable-orterun-prefix-by-default \
--with-openib=$ofed \
--enable-mpi-threads \
--enable-ft-thread \
--with-ft=cr \
--with-blcr=/path_to_blcr_0.8.2_build_dir/ \
--with-blcr-libdir=/path_to_blcr_lib_dir/ \
--disable-dlopen \
&& \
make && make install || exit

ifort and icc are:
$ ifort --version
ifort (IFORT) 11.0 20080930 / 11.0.069 64bit

$ icc --version
icc (ICC) 11.0 20080930 / 11.0.074 64bit

The MPI application (let's skip the name and what it does) runs
perfectly fine when invoking:
mpirun ./myapp < <input file>  (running the parallel code serially)

and when invoking:
mpirun -n <nprocs> ./myapp < <input file>

In both cases it always produces the right results from the calculations.

Now, enabling C/R works for one case only:
mpirun -am ft-enable-cr ./myapp < <input file>  (running the parallel
code serially, with C/R enabled)

Later on, invoking ompi-checkpoint -s --term <PID of mpirun>
produces a nice global snapshot, and
ompi-restart <global snapshot reference>
re-runs the calculation from the checkpointed point perfectly fine,
finishing to the end with the proper results.
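
(For concreteness, the working serial sequence is roughly the following; the 
PID and snapshot reference below are placeholders -- the actual reference is 
whatever ompi-checkpoint reports:

    $ mpirun -am ft-enable-cr ./myapp < input_file
    # from another terminal:
    $ ompi-checkpoint -s --term <PID of mpirun>
    $ ompi-restart <global snapshot reference reported by ompi-checkpoint>
)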

Now, invoking:
mpirun -n <nprocs, > 1> -am ft-enable-cr ./myapp < <input file>

and checkpointing:
ompi-checkpoint -s --term <PID of mpirun>
produces the following error for myapp (in the case of -n 2):

forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
[hostname:29664] local) Error: Unable to read state from named pipe
(/global_dir/opal_cr_prog_write.29666). 0
[hostname:29664] [[27518,0],0] ORTE_ERROR_LOG: Error in file
snapc_full_local.c at line 1602
--
mpirun has exited due to process rank 1 with PID 29666 on
node hostname exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--

The /global_dir/, /local_dir/ and /tmp_dir/ are all readable/writable by
the user who invokes mpirun myapp.

Any suggestions off the top of your heads?
I would appreciate any help on this.

Best regards,