[OMPI users] CuEventCreate Failed...

2014-10-17 Thread Steven Eliuk
Hi All,

We have run into errors that do not seem to produce incorrect 
results; nonetheless, we hope to figure out why we are getting them.

We have several test environments, ranging from one machine with, say, 1-16 
processes to several machines with 1-16 processes per node. All systems are 
certified by Nvidia and use Nvidia Tesla K40 GPUs.

We frequently see the following:

--

The call to cuEventCreate failed. This is a unrecoverable error and will

cause the program to abort.

  Hostname: aHost

  cuEventCreate return value:   304

Check the cuda.h file for what the return value means.

--

--

The call to cuIpcGetEventHandle failed. This is a unrecoverable error and will

cause the program to abort.

  cuIpcGetEventHandle return value:   304

Check the cuda.h file for what the return value means.

--

--

The call to cuIpcGetMemHandle failed. This means the GPU RDMA protocol

cannot be used.

  cuIpcGetMemHandle return value:   304

  address: 0x700fd0400

Check the cuda.h file for what the return value means. Perhaps a reboot

of the node will clear the problem.

--

Our test suite still verifies correct results, but when the error occurs it 
also causes the following:

The call to cuEventDestory failed. This is a unrecoverable error and will

cause the program to abort.

  cuEventDestory return value:   400

Check the cuda.h file for what the return value means.

--

---

Primary job  terminated normally, but 1 process returned

a non-zero exit code.. Per user-direction, the job has been aborted.

---

--

mpiexec detected that one or more processes exited with non-zero status, thus 
causing

the job to be terminated. The first process to do so was:


  Process name: [[37290,1],2]

  Exit code:1


We have traced the code back to the following files:
-ompi/mca/common/cuda/common_cuda.c :: 
mca_common_cuda_construct_event_and_handle()

We also know the following:
- it happens on every machine, on the very first entry to the function 
mentioned above;
- it does not happen if the buffer size is under 128 bytes… likely a different 
mechanism is used instead of IPC.
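If CUDA IPC is indeed the mechanism that kicks in at 128 bytes, one way to confirm is to disable it and see whether the errors disappear. A sketch (assuming the smcuda btl is in use; the application name is hypothetical):

```shell
# Disable the CUDA IPC path in Open MPI 1.8's smcuda btl; if the
# cuEventCreate / cuIpcGetMemHandle 304 errors disappear, the IPC
# setup path is the trigger.
mpirun -np 16 -mca btl_smcuda_use_cuda_ipc 0 ./test_suite
```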

Last, here is an intermittent one that produces a lot of failed tests in our 
suite… when in fact the tests are solid apart from this error. It causes 
notifications and annoyances, and it would be nice to clean it up.

mpi_rank_3][cudaipc_allocate_ipc_region] 
[src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_ipc.c:487] cuda failed with 
mapping of buffer object failed


We have not been able to reproduce these errors with other MPI libraries.

Thank you for your time & looking forward to your response,


Kindest Regards,
—
Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.



Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-17 Thread Ralph Castain

> On Oct 17, 2014, at 12:06 PM, Gus Correa  wrote:
> 
> Hi Jeff
> 
> Many thanks for looking into this and filing a bug report at 11:16PM!
> 
> Thanks to Aurelien, Ralph and Nathan for their help and clarifications
> also.
> 
> **
> 
> Related suggestion:
> 
> Add a note to the FAQ explaining that in OMPI 1.8
> the new (default) btl is vader (and what it is).
> 
> It was a real surprise to me.
> If Aurelien Bouteiller didn't tell me about vader,
> I might have never realized it even existed.
> 
> That could be part of one of the already existent FAQs
> explaining how to select the btl.
> 
> **
> 
> Doubts (btl in OMPI 1.8):
> 
> I still don't understand clearly the meaning and scope of vader
> being a "default btl”.

We mean that it has a higher priority than the other shared memory 
implementation, and so it will be used for intra-node messaging by default.

> Which is the scope of this default: intra-node btl only perhaps?

Yes - strictly intra-node

> Was there a default btl before vader, and which?

The “sm” btl was the default shared memory transport before vader

> Is vader the intra-node default only (i.e. replaces sm  by default),

Yes

> or does it somehow extend beyond node boundaries, and replaces (or brings in) 
> network btls (openib,tcp,etc) ?

Nope - just intra-node

> 
> If I am running on several nodes, and want to use openib, not tcp,
> and, say, use vader, what is the right syntax?
> 
> * nothing (OMPI will figure it out ... but what if you have 
> IB,Ethernet,Myrinet,OpenGM, altogether?)

If you have higher-speed connections, we will pick the fastest for inter-node 
messaging as the “default” since we expect you would want the fastest possible 
transport.

> * -mca btl openib (and vader will come along automatically)

Among the ones you show, these would indeed be the likely choices (openib and 
vader)

> * -mca btl openib,self (and vader will come along automatically)

The “self” btl is *always* active as the loopback transport
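Putting the answers above together, a sketch of the resulting command line (hostfile and application names are hypothetical; listing self explicitly is harmless even though it is always active):

```shell
# openib carries inter-node traffic; vader (shared memory) handles
# intra-node messaging; self is the loopback transport.
mpirun -np 16 --hostfile myhosts -mca btl openib,vader,self ./my_app
```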

> * -mca btl openib,self,vader (because vader is default only for 1-node jobs)
> * something else (or several alternatives)
> 
> Whatever happened to the "self" btl in this new context?
> Gone? Still there?
> 
> Many thanks,
> Gus Correa
> 
> On 10/16/2014 11:16 PM, Jeff Squyres (jsquyres) wrote:
>> On Oct 16, 2014, at 1:35 PM, Gus Correa  wrote:
>> 
>>> and on the MCA parameter file:
>>> 
>>> btl_sm_use_knem = 1
>> 
>> I think the logic enforcing this MCA param got broken when we revamped the 
>> MCA param system.  :-(
>> 
>>> I am scratching my head to understand why a parameter with such a
>>> suggestive name ("btl_sm_have_knem_support"),
>>> so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
>>> somehow vanished from ompi_info in OMPI 1.8.3.
>> 
>> It looks like this MCA param was also dropped when we revamped the MCA 
>> system.  Doh!  :-(
>> 
>> There's some deep mojo going on that is somehow causing knem to not be used; 
>> I'm too tired to understand the logic right now.  I just opened 
>> https://github.com/open-mpi/ompi/issues/239 to track this issue -- feel free 
>> to subscribe to the issue to get updates.
>> 
> 
> ___
> users mailing list
> us...@open-mpi.org 
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users 
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25532.php 
> 


Re: [OMPI users] Open MPI 1.8.3 openmpi-mca-params.conf: old and new parameters

2014-10-17 Thread Ralph Castain

> On Oct 17, 2014, at 10:23 AM, Gus Correa  wrote:
> 
> Hi Ralph
> 
> Thank you.
> Your fixes covered much more than I could find.
> The section about the three levels of process placement
> (" Mapping, Ranking, and Binding: Oh My!") really helps.
> I would just add at the very beginning
> short sentences quickly characterizing each of the three levels.
> Kind of an "abstract".
> Then explain each level in more detail.

Will do - thanks!

> 
> **
> 
> Also, I found Jeff's 2013 presentation about the new style
> of process placement.
> 
> http://www.slideshare.net/jsquyres/open-mpi-explorations-in-process-affinity-eurompi13-presentation
> 
> The title calls it "LAMA".
> (That is mud in Portuguese! But the presentation is clear.)
> OK, the acronym means "Locality Aware Mapping Algorithm".
> In any case, it sounds very similar to the current process placement
> features of OMPI 1.8, although only Jeff and you can really tell if it
> is exactly the same.
> 
> If it is the same, it may help to link it to the OMPI FAQ,
> or somehow make it more visible, printable, etc.
> If there are differences between OMPI 1.8 and the presentation,
> it may be worth adjusting the presentation to the
> current OMPI 1.8, and posting it as well.
> That would be a good way to convey the OMPI 1.8
> process placement conceptual model, along with its syntax
> and examples.

Yeah, I need to do that. LAMA was an alternative implementation of the current 
map/rank/bind system. It hasn’t been fully maintained since it was introduced, 
and so I’m not sure how much of it is functional. I need to create an 
equivalent for the current implementation.


> 
> Thank you,
> Gus Correa
> 
> 
> On 10/17/2014 12:10 AM, Ralph Castain wrote:
>> I know this commit could be a little hard to parse, but I have updated
>> the mpirun man page on the trunk and will port the change over to the
>> 1.8 series tomorrow. FWIW, I’ve provided the link to the commit below so
>> you can “preview” it.
>> 
>> https://github.com/open-mpi/ompi/commit/f9d620e3a772cdeddd40b4f0789cf59c75b44868
>> 
>> HTH
>> Ralph
>> 
>> 
>>> On Oct 16, 2014, at 9:43 AM, Gus Correa wrote:
>>> 
>>> Hi Ralph
>>> 
>>> Yes, I know the process placement features are powerful.
>>> They were already very good in 1.6, even in 1.4,
>>> and I just tried the new 1.8
>>> "-map-by l2cache" (works nicely on Opteron 6300).
>>> 
>>> Unfortunately I couldn't keep track, test, and use the 1.7 series.
>>> I did that in the previous "odd/new feature" series (1.3, 1.5).
>>> However, my normal workload require that
>>> I focus my attention on the "even/stable" series
>>> (less fun, more production).
>>> Hence I hopped directly from 1.6 to 1.8,
>>> although I read a number of mailing list postings about the new
>>> style of process placement.
>>> 
>>> Pestering you again about documentation (last time for now):
>>> The mpiexec man page also seems to need an update.
>>> That is probably the first place people look for information
>>> about runtime features.
>>> For instance, the process placement examples still
>>> use deprecated parameters and mpiexec options:
>>> -bind-to-core, rmaps_base_schedule_policy, orte_process_binding, etc.
>>> 
>>> Thank you,
>>> Gus Correa
>>> 
>>> On 10/15/2014 11:10 PM, Ralph Castain wrote:
 
 On Oct 15, 2014, at 11:46 AM, Gus Correa wrote:
 
> Thank you Ralph and Jeff for the help!
> 
> Glad to hear the segmentation fault is reproducible and will be fixed.
> 
> In any case, one can just avoid the old parameter name
> (rmaps_base_schedule_policy),
> and use instead the new parameter name
> (rmaps_base_mapping_policy)
> without any problem in OMPI 1.8.3.
> 
 
 Fix is in the nightly 1.8 tarball - I'll release a 1.8.4 soon to cover
 the problem.
 
> **
> 
> Thanks Ralph for sending the new (OMPI 1.8)
> parameter names for process binding.
> 
> My recollection is that sometime ago somebody (Jeff perhaps?)
> posted here a link to a presentation (PDF or PPT) explaining the
> new style of process binding, but I couldn't find it in the
> list archives.
> Maybe the link could be part of the FAQ (if not already there)?
 
 I don't think it is, but I'll try to add it over the next day or so.
 
> 
> **
> 
> The Open MPI runtime environment is really great.
> However, to take advantage of it one often has to do parameter guessing,
> and to do time consuming tests by trial and error,
> because the main sources of documentation are
> the terse output of ompi_info, and several sparse
> references in the FAQ.
> (Some of them outdated?)
> 
> In addition, the runtime environment has evolved over time,
> which is certainly a good thing.
> However, 

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-17 Thread Gus Correa

Hi Jeff

Many thanks for looking into this and filing a bug report at 11:16PM!

Thanks to Aurelien, Ralph and Nathan for their help and clarifications
also.

**

Related suggestion:

Add a note to the FAQ explaining that in OMPI 1.8
the new (default) btl is vader (and what it is).

It was a real surprise to me.
If Aurelien Bouteiller didn't tell me about vader,
I might have never realized it even existed.

That could be part of one of the already existent FAQs
explaining how to select the btl.

**

Doubts (btl in OMPI 1.8):

I still don't understand clearly the meaning and scope of vader
being a "default btl".
Which is the scope of this default: intra-node btl only perhaps?
Was there a default btl before vader, and which?
Is vader the intra-node default only (i.e. replaces sm  by default),
or does it somehow extend beyond node boundaries, and replaces (or 
brings in) network btls (openib,tcp,etc) ?


If I am running on several nodes, and want to use openib, not tcp,
and, say, use vader, what is the right syntax?

* nothing (OMPI will figure it out ... but what if you have 
IB,Ethernet,Myrinet,OpenGM, altogether?)

* -mca btl openib (and vader will come along automatically)
* -mca btl openib,self (and vader will come along automatically)
* -mca btl openib,self,vader (because vader is default only for 1-node jobs)
* something else (or several alternatives)

Whatever happened to the "self" btl in this new context?
Gone? Still there?

Many thanks,
Gus Correa

On 10/16/2014 11:16 PM, Jeff Squyres (jsquyres) wrote:

On Oct 16, 2014, at 1:35 PM, Gus Correa  wrote:


and on the MCA parameter file:

btl_sm_use_knem = 1


I think the logic enforcing this MCA param got broken when we revamped the MCA 
param system.  :-(


I am scratching my head to understand why a parameter with such a
suggestive name ("btl_sm_have_knem_support"),
so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
somehow vanished from ompi_info in OMPI 1.8.3.


It looks like this MCA param was also dropped when we revamped the MCA system.  
Doh!  :-(

There's some deep mojo going on that is somehow causing knem to not be used; 
I'm too tired to understand the logic right now.  I just opened 
https://github.com/open-mpi/ompi/issues/239 to track this issue -- feel free to 
subscribe to the issue to get updates.





Re: [OMPI users] Open MPI 1.8.3 openmpi-mca-params.conf: old and new parameters

2014-10-17 Thread Gus Correa

Hi Ralph

Thank you.
Your fixes covered much more than I could find.
The section about the three levels of process placement
(" Mapping, Ranking, and Binding: Oh My!") really helps.
I would just add at the very beginning
short sentences quickly characterizing each of the three levels.
Kind of an "abstract".
Then explain each level in more detail.

**

Also, I found Jeff's 2013 presentation about the new style
of process placement.

http://www.slideshare.net/jsquyres/open-mpi-explorations-in-process-affinity-eurompi13-presentation

The title calls it "LAMA".
(That is mud in Portuguese! But the presentation is clear.)
OK, the acronym means "Locality Aware Mapping Algorithm".
In any case, it sounds very similar to the current process placement
features of OMPI 1.8, although only Jeff and you can really tell if it
is exactly the same.

If it is the same, it may help to link it to the OMPI FAQ,
or somehow make it more visible, printable, etc.
If there are differences between OMPI 1.8 and the presentation,
it may be worth adjusting the presentation to the
current OMPI 1.8, and posting it as well.
That would be a good way to convey the OMPI 1.8
process placement conceptual model, along with its syntax
and examples.

Thank you,
Gus Correa


On 10/17/2014 12:10 AM, Ralph Castain wrote:

I know this commit could be a little hard to parse, but I have updated
the mpirun man page on the trunk and will port the change over to the
1.8 series tomorrow. FWIW, I’ve provided the link to the commit below so
you can “preview” it.

https://github.com/open-mpi/ompi/commit/f9d620e3a772cdeddd40b4f0789cf59c75b44868

HTH
Ralph



On Oct 16, 2014, at 9:43 AM, Gus Correa wrote:

Hi Ralph

Yes, I know the process placement features are powerful.
They were already very good in 1.6, even in 1.4,
and I just tried the new 1.8
"-map-by l2cache" (works nicely on Opteron 6300).

Unfortunately I couldn't keep track, test, and use the 1.7 series.
I did that in the previous "odd/new feature" series (1.3, 1.5).
However, my normal workload require that
I focus my attention on the "even/stable" series
(less fun, more production).
Hence I hopped directly from 1.6 to 1.8,
although I read a number of mailing list postings about the new
style of process placement.

Pestering you again about documentation (last time for now):
The mpiexec man page also seems to need an update.
That is probably the first place people look for information
about runtime features.
For instance, the process placement examples still
use deprecated parameters and mpiexec options:
-bind-to-core, rmaps_base_schedule_policy, orte_process_binding, etc.

Thank you,
Gus Correa

On 10/15/2014 11:10 PM, Ralph Castain wrote:


On Oct 15, 2014, at 11:46 AM, Gus Correa wrote:


Thank you Ralph and Jeff for the help!

Glad to hear the segmentation fault is reproducible and will be fixed.

In any case, one can just avoid the old parameter name
(rmaps_base_schedule_policy),
and use instead the new parameter name
(rmaps_base_mapping_policy)
without any problem in OMPI 1.8.3.



Fix is in the nightly 1.8 tarball - I'll release a 1.8.4 soon to cover
the problem.


**

Thanks Ralph for sending the new (OMPI 1.8)
parameter names for process binding.

My recollection is that sometime ago somebody (Jeff perhaps?)
posted here a link to a presentation (PDF or PPT) explaining the
new style of process binding, but I couldn't find it in the
list archives.
Maybe the link could be part of the FAQ (if not already there)?


I don't think it is, but I'll try to add it over the next day or so.



**

The Open MPI runtime environment is really great.
However, to take advantage of it one often has to do parameter guessing,
and to do time consuming tests by trial and error,
because the main sources of documentation are
the terse output of ompi_info, and several sparse
references in the FAQ.
(Some of them outdated?)

In addition, the runtime environment has evolved over time,
which is certainly a good thing.
However, along with this evolution, several runtime parameters
changed both name and functionality, new ones were introduced,
old ones were deprecated, which can be somewhat confusing,
and can lead to an ineffective use of the runtime environment.
(In 1.8.3 I was using several deprecated parameters from 1.6.5
that seem to be silently ignored at runtime.
I only noticed the problem because that segmentation fault happened.)

I know asking for thorough documentation is foolish,


Not really - it is something we need to get better about :-(


but I guess a simple table of runtime parameter names and valid values
in the FAQ, maybe sorted by their purpose/function, along with a few
examples of use, could help a lot.
Some of this material is now spread across several FAQ, but not so
easy to find/compare.
That doesn't need to be a comprehensive table, but commonly used

Re: [OMPI users] Open MPI 1.8: link problem when Fortran+C+Platform LSF

2014-10-17 Thread Ralph Castain
Sigh - the original message didn’t get in there, I think. See below:

Paul - it looks to me like we are adding the required libraries, but perhaps 
not to the wrapper compilers. Jeff is on travel today, but I’ll check with him 
next week.

Ralph


> Dear Open MPI developer,
> 
> we have both Open MPI 1.6(.5) and 1.8(.3) in our cluster, configured to be 
> used with Platform LSF.
> 
> One of our users ran into an issue when trying to link his code (a 
> combination of lex/C and Fortran) with v1.8, whereas with Open MPI 1.6 the 
> code links OK.
> 
> > $ make
> > mpif90 -c main.f90
> > yacc -d example4.y
> > mpicc -c y.tab.c
> > mpicc -c mymain.c
> > lex example4.l
> > mpicc -c lex.yy.c
> > mpif90 -o example main.o y.tab.o mymain.o lex.yy.o
> > ld: y.tab.o(.text+0xd9): unresolvable R_X86_64_PC32 relocation against 
> > symbol `yylval'
> > ld: y.tab.o(.text+0x16f): unresolvable R_X86_64_PC32 relocation against 
> > symbol `yyval'
> > ...
> 
> Looking at "mpif90 --show-me" we see that the link line, and possibly the 
> philosophy behind it, has changed; there is also a note on it:
> 
> # Note that per https://svn.open-mpi.org/trac/ompi/ticket/3422, we
> # intentionally only link in the MPI libraries (ORTE, OPAL, etc. are
> # pulled in implicitly) because we intend MPI applications to only use
> # the MPI API.
> 
> 
> 
> 
> Well, by now we know two workarounds:
> a) add "-lbat -llsf" to the link line
> b) add " -Wl,--as-needed" to the link line
> 
> What would be better? Maybe one of these should be added to linker_flags=..." 
> in the .../share/openmpi/mpif90-wrapper-data.txt file? As of the note above, 
> (b) would be better?
> 
> Best
> 
> Paul Kapinos
> 
> P.S. $ mpif90 --show-me
> 
> 1.6.5
> ifort -nofor-main -I/opt/MPI/openmpi-1.6.5/linux/intel/include -fexceptions 
> -I/opt/MPI/openmpi-1.6.5/linux/intel/lib 
> -L/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib 
> -L/opt/MPI/openmpi-1.6.5/linux/intel/lib -lmpi_f90 -lmpi_f77 -lmpi -losmcomp 
> -lrdmacm -libverbs -lrt -lnsl -lutil -lpsm_infinipath -lbat -llsf -ldl -lm 
> -lnuma -lrt -lnsl -lutil
> 
> 1.8.3
> ifort -I/opt/MPI/openmpi-1.8.3/linux/intel/include -fexceptions 
> -I/opt/MPI/openmpi-1.8.3/linux/intel/lib 
> -L/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath 
> -Wl,/opt/lsf/9.1/linux2.6-glibc2.3-x86_64/lib -Wl,-rpath 
> -Wl,/opt/MPI/openmpi-1.8.3/linux/intel/lib -Wl,--enable-new-dtags 
> -L/opt/MPI/openmpi-1.8.3/linux/intel/lib -lmpi_usempif08 
> -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
> 
> P.S.2 $ man ld
> 
>--as-needed
>--no-as-needed
>This option affects ELF DT_NEEDED tags for dynamic libraries
>mentioned on the command line after the --as-needed option.
>Normally the linker will add a DT_NEEDED tag for each dynamic
>library mentioned on the command line, regardless of whether the
>library is actually needed or not.  --as-needed causes a DT_NEEDED
>tag to only be emitted for a library that satisfies an undefined
>symbol reference from a regular object file or, if the library is
>not found in the DT_NEEDED lists of other libraries linked up to
>that point, an undefined symbol reference from another dynamic
>library.  --no-as-needed restores the default behaviour.
> 
> 
> 
> -- 
> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
> RWTH Aachen University, IT Center
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241/80-24915
> 


> On Oct 17, 2014, at 7:49 AM, Ralph Castain  wrote:
> 
> Hi Paul
> 
> You should probably update your email address to the list so you can respond 
> to the comments. I’ll forward this for you in the meantime.
> 
> FWIW: we did do some fixes to the LSF linkage (provided by IBM) for the 1.8 
> series. I’ll check and see if they made it to 1.8.3 (in which case, it sounds 
> like we may need to revisit the patch) or are sitting in the branch waiting 
> for release.
> 
> 
>> On Oct 17, 2014, at 6:38 AM, Paul Kapinos  wrote:
>> 
>> Jeff, Ralph,
>> sorry for this - but it seem I'm not allowed to post anything, maybe because 
>> my mail has been changed from  to 
>> . Could you please forward my question to the 
>> developer? Thank!
>> 
>> Paul
>> 
>> 
>>  Original Message 
>> Subject: Open MPI 1.8: link problem when Fortran+C+Platform LSF
>> Date: Fri, 17 Oct 2014 09:35:36 -0400
>> From: 
>> To: 
>> 
>> You are not allowed to post to this mailing list, and your message has
>> been automatically rejected.  If you think that your messages are
>> being rejected in error, contact the mailing list owner at
>> users-ow...@open-mpi.org.
>> 
>> 
>> 
>> 
>> 
> 



Re: [OMPI users] [FEniCS] Question about MPI barriers

2014-10-17 Thread Jeff Squyres (jsquyres)
Thanks; I filed https://github.com/open-mpi/ompi/issues/242.


On Oct 17, 2014, at 5:59 AM, Jed Brown  wrote:

> Martin Sandve Alnæs  writes:
> 
>> Thanks, but ibarrier doesn't seem to be in the stable version of openmpi:
>> http://www.open-mpi.org/doc/v1.8/
>> Otherwise mpi_ibarrier+mpi_test+homemade time/sleep loop would do the trick.
> 
> MPI_Ibarrier is there (since 1.7), just missing a man page.


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] [FEniCS] Question about MPI barriers

2014-10-17 Thread Jed Brown
Martin Sandve Alnæs  writes:

> Thanks, but ibarrier doesn't seem to be in the stable version of openmpi:
> http://www.open-mpi.org/doc/v1.8/
> Otherwise mpi_ibarrier+mpi_test+homemade time/sleep loop would do the trick.

MPI_Ibarrier is there (since 1.7), just missing a man page.
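A sketch of the homemade pattern Martin describes, MPI_Ibarrier + MPI_Test + a sleep loop, so a rank can wait at a barrier without spinning at 100% CPU (the 1 ms back-off is an arbitrary choice):

```c
/* Wait at a barrier by polling a nonblocking barrier request,
 * sleeping between polls. MPI_Ibarrier is available in Open MPI
 * since the 1.7 series. */
#include <mpi.h>
#include <unistd.h>   /* usleep */

int main(int argc, char **argv)
{
    MPI_Request req;
    int flag = 0;

    MPI_Init(&argc, &argv);

    MPI_Ibarrier(MPI_COMM_WORLD, &req);
    while (!flag) {
        MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
        if (!flag)
            usleep(1000);   /* back off for 1 ms between polls */
    }

    MPI_Finalize();
    return 0;
}
```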




[OMPI users] large memory usage and hangs when preconnecting beyond 1000 cpus

2014-10-17 Thread Marshall Ward
I currently have a numerical model that, for reasons unknown, requires
preconnection to avoid hanging on an initial MPI_Allreduce call. But
when we try to scale out beyond around 1000 cores, we are unable to
get past MPI_Init's preconnection phase.

To test this, I have a basic C program containing only MPI_Init() and
MPI_Finalize() named `mpi_init`, which I compile and run using `mpirun
-mca mpi_preconnect_mpi 1 mpi_init`.
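The minimal test program described above is just initialization and teardown:

```c
/* mpi_init.c -- does nothing but enter and leave MPI, so that any cost
 * observed under "-mca mpi_preconnect_mpi 1" is the preconnection itself. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
```

Built and run as, e.g., `mpicc mpi_init.c -o mpi_init && mpirun -np 4 -mca mpi_preconnect_mpi 1 mpi_init`.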

This preconnection seems to consume a large amount of memory, exceeding the
available memory on our nodes (~2 GiB/core) as the core count gets into the
thousands (~4000 or so). If we try to preconnect around ~6000 cores, we
start to see hangs and crashes.

A failed 5600 core preconnection gave this warning (~10k times) while
hanging for 30 minutes:

[warn] opal_libevent2021_event_base_loop: reentrant invocation.
Only one event_base_loop can run on each event_base at once.

A failed 6000-core preconnection job crashed almost immediately with
the following error.

[r104:18459] [[32743,0],0] ORTE_ERROR_LOG: File open failure in
file ras_tm_module.c at line 159
[r104:18459] [[32743,0],0] ORTE_ERROR_LOG: File open failure in
file ras_tm_module.c at line 85
[r104:18459] [[32743,0],0] ORTE_ERROR_LOG: File open failure in
file base/ras_base_allocate.c at line 187

Should we expect to use very large amounts of memory for
preconnections of thousands of CPUs? And can these

I am using Open MPI 1.8.2 on Linux 2.6.32 (centOS) and FDR infiniband
network. This is probably not enough information, but I'll try to
provide more if necessary. My knowledge of implementation is
unfortunately very limited.


Re: [OMPI users] Open MPI 1.8.3 openmpi-mca-params.conf: old and new parameters

2014-10-17 Thread Ralph Castain
I know this commit could be a little hard to parse, but I have updated the 
mpirun man page on the trunk and will port the change over to the 1.8 series 
tomorrow. FWIW, I’ve provided the link to the commit below so you can “preview” 
it.

https://github.com/open-mpi/ompi/commit/f9d620e3a772cdeddd40b4f0789cf59c75b44868
 


HTH
Ralph


> On Oct 16, 2014, at 9:43 AM, Gus Correa  wrote:
> 
> Hi Ralph
> 
> Yes, I know the process placement features are powerful.
> They were already very good in 1.6, even in 1.4,
> and I just tried the new 1.8
> "-map-by l2cache" (works nicely on Opteron 6300).
> 
> Unfortunately I couldn't keep track, test, and use the 1.7 series.
> I did that in the previous "odd/new feature" series (1.3, 1.5).
> However, my normal workload require that
> I focus my attention on the "even/stable" series
> (less fun, more production).
> Hence I hopped directly from 1.6 to 1.8,
> although I read a number of mailing list postings about the new
> style of process placement.
> 
> Pestering you again about documentation (last time for now):
> The mpiexec man page also seems to need an update.
> That is probably the first place people look for information
> about runtime features.
> For instance, the process placement examples still
> use deprecated parameters and mpiexec options:
> -bind-to-core, rmaps_base_schedule_policy, orte_process_binding, etc.
> 
> Thank you,
> Gus Correa
> 
> On 10/15/2014 11:10 PM, Ralph Castain wrote:
>> 
>> On Oct 15, 2014, at 11:46 AM, Gus Correa wrote:
>> 
>>> Thank you Ralph and Jeff for the help!
>>> 
>>> Glad to hear the segmentation fault is reproducible and will be fixed.
>>> 
>>> In any case, one can just avoid the old parameter name
>>> (rmaps_base_schedule_policy),
>>> and use instead the new parameter name
>>> (rmaps_base_mapping_policy)
>>> without any problem in OMPI 1.8.3.
>>> 
>> 
>> Fix is in the nightly 1.8 tarball - I'll release a 1.8.4 soon to cover
>> the problem.
>> 
>>> **
>>> 
>>> Thanks Ralph for sending the new (OMPI 1.8)
>>> parameter names for process binding.
>>> 
>>> My recollection is that sometime ago somebody (Jeff perhaps?)
>>> posted here a link to a presentation (PDF or PPT) explaining the
>>> new style of process binding, but I couldn't find it in the
>>> list archives.
>>> Maybe the link could be part of the FAQ (if not already there)?
>> 
>> I don't think it is, but I'll try to add it over the next day or so.
>> 
>>> 
>>> **
>>> 
>>> The Open MPI runtime environment is really great.
>>> However, to take advantage of it one often has to do parameter guessing,
>>> and to do time consuming tests by trial and error,
>>> because the main sources of documentation are
>>> the terse output of ompi_info, and several sparse
>>> references in the FAQ.
>>> (Some of them outdated?)
>>> 
>>> In addition, the runtime environment has evolved over time,
>>> which is certainly a good thing.
>>> However, along with this evolution, several runtime parameters
>>> changed both name and functionality, new ones were introduced,
>>> old ones were deprecated, which can be somewhat confusing,
>>> and can lead to an ineffective use of the runtime environment.
>>> (In 1.8.3 I was using several deprecated parameters from 1.6.5
>>> that seem to be silently ignored at runtime.
>>> I only noticed the problem because that segmentation fault happened.)
>>> 
>>> I know asking for thorough documentation is foolish,
>> 
>> Not really - it is something we need to get better about :-(
>> 
>>> but I guess a simple table of runtime parameter names and valid values
>>> in the FAQ, maybe sorted by their purpose/function, along with a few
>>> examples of use, could help a lot.
>>> Some of this material is now spread across several FAQ, but not so
>>> easy to find/compare.
>>> That doesn't need to be a comprehensive table, but commonly used
>>> items like selecting the btl, selecting interfaces,
>>> dealing with process binding,
>>> modifying/enriching the stdout/sterr output
>>> (tagging output, increasing verbosity, etc),
>>> probably have their place there.
>> 
>> Yeah, we fell down on this one. The changes were announced with each
>> step in the 1.7 series, but if you step from 1.6 directly to 1.8, you'll
>> get caught flat-footed. We honestly didn't think of that case, and so we
>> mentally assumed that "of course people have been following the series -
>> they know what happened".
>> 
>> You know what they say about those who "assume" :-/
>> 
>> I'll try to get something into the FAQ about the entire new mapping,
>> ranking, and binding system. It is actually VERY powerful, allowing you
>> to specify pretty much any placement pattern you can imagine and bind it
>> to whatever level you desire. It was developed in response to requests
>> from researchers who wanted to explore application performance versus
>> 

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-17 Thread Jeff Squyres (jsquyres)
On Oct 16, 2014, at 1:35 PM, Gus Correa  wrote:

> and on the MCA parameter file:
> 
> btl_sm_use_knem = 1

I think the logic enforcing this MCA param got broken when we revamped the MCA 
param system.  :-(

> I am scratching my head to understand why a parameter with such a
> suggestive name ("btl_sm_have_knem_support"),
> so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
> somehow vanished from ompi_info in OMPI 1.8.3.

It looks like this MCA param was also dropped when we revamped the MCA system.  
Doh!  :-(

There's some deep mojo going on that is somehow causing knem to not be used; 
I'm too tired to understand the logic right now.  I just opened 
https://github.com/open-mpi/ompi/issues/239 to track this issue -- feel free to 
subscribe to the issue to get updates.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/