[OMPI devel] (no subject)

2010-11-30 Thread ananda.mudar
Jeff, George, Ralph

Thanks a lot for your clarifications!!

-   Ananda

--- PREVIOUS MESSAGE ---

Subject: Re: [OMPI devel] Warning on fork() disappears if I use MPI
threads!!
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-11-29 19:50:33




On Nov 29, 2010, at 6:25 PM, George Bosilca wrote:

> The main problem is that openib requires pinning memory pages in order
> to take advantage of RMA features. There is a major issue with these
> pinned pages and fork, leading to segmentation faults in some specific
> cases. However, we only pin the pages in the MPI calls related to data
> transfers. Therefore, if you call fork __before__ any other MPI data
> transfer function (but after MPI_Init, since you use the process rank),
> your application should be safe.

Note that Open MPI also pins some internal memory during MPI_INIT, but
that memory is totally internal to libmpi, so you should be safe (i.e.,
you should never be able to find it and therefore never be able to try
to touch it).
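
A minimal sketch of that ordering (hypothetical example code, not taken from
Open MPI; error handling omitted): the fork happens after MPI_Init but before
any data-transfer call, so no user pages have been pinned yet.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, token;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* no data transfer: safe */

    /* Safe window: no MPI send/recv has pinned any user pages yet. */
    pid_t pid = fork();
    if (0 == pid) {
        /* Child: touch only memory it allocates itself, then exit. */
        char *buf = malloc(64);
        snprintf(buf, 64, "child of rank %d", rank);
        puts(buf);
        free(buf);
        _exit(0);
    }
    waitpid(pid, NULL, 0);

    /* First data transfer: pages get pinned from here on, so further
     * fork() calls become risky with the openib BTL. */
    token = rank;
    MPI_Bcast(&token, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}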

>> How can one be sure that disabling the warning is OK? Could you
>> please elaborate on what makes forks vulnerable? Maybe that will guide
>> the developers to make an informed decision on whether to disable them
>> or find another alternative.
>
> No way to know at 100%. Now for a more elaborate answer: Once upon a time
> ... The fork story is a long and boring one; we would all have preferred
> never to have heard about it (believe me). A quick and compressed version
> can be found on the QLogic download page
> (http://filedownloads.qlogic.com/files/driver/70277/release_QLogicIB-Basic_4400_Rev_A.html).

That's a good summary. The issue is with OFED itself, not with Open MPI.


Note, too, that calling popen() should also be safe (even though we'll
warn about it -- our atfork hook has no way of knowing whether you're
calling system, popen, or something else).
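
For context, a rough sketch of why the hook cannot discriminate (illustrative
only -- this is not the actual Open MPI hook): pthread_atfork() callbacks fire
on every fork, and system()/popen() both fork internally.

#include <pthread.h>
#include <stdio.h>

static void warn_in_child(void)
{
    /* Fires for fork(), system(), and popen() alike; the callback cannot
     * tell which caller triggered the fork. */
    fprintf(stderr, "warning: fork() detected while openib is active\n");
}

static void install_hook(void)
{
    /* prepare and parent handlers unused in this sketch */
    pthread_atfork(NULL, NULL, warn_in_child);
}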

--
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/







Ananda B Mudar, PMP

Senior Technical Architect

Wipro Technologies

Ph: 972 765 8093

ananda.mu...@wipro.com






Re: [OMPI devel] Warning on fork() disappears if I use MPI threads!!

2010-11-30 Thread N.M. Maclaren

On Nov 30 2010, Ralph Castain wrote:


> Here is what one IB vendor says about the issue on their web site
> (redacted to protect the innocent):
>
> "At the time of this release, the (redacted-openib) driver has issues
> with buffers sharing pages when fork( ) is used. Pinned (locked in
> memory) pages are normally marked copy-on-write during a fork.


That is TRULY demented!  It is almost always precisely the wrong thing
to do.

> If a page
> is pinned before a fork and subsequently written to while RDMA operations
> are being performed on the same page, silent data corruption can occur as
> RDMA operations continue to stream data to a page that has moved. To
> avoid this, the (redacted-openib) driver does not use copy-on-write
> behavior during a fork for pinned pages. Instead, access to these pages
> by the child process will result in a segmentation violation."


That is sane.  Not user-friendly, but at least sane.

> While there is some variation, I believe you will find that all IB comm
> shares this problem. So it is wise to avoid using fork if you want to use
> the openib transport.


Yes and no.  Some such communication may allow RDMA only to shared memory,
which solves the problem in another way.  Several specialist HPC networks
were (are?) like that, and I can see no reason why an IB driver should not
use the same design.  That, of course, means that most MPI transfers need
a copy.

> Hence the warning. Ignoring it is purely a "user beware" situation, but
> we provide that mechanism for the truly adventurous...or IB developers
> who want to someday resolve the problem.


Well, there is a much simpler case where it will "just work", which is
very probably what the OP was doing.  When the fork is immediately
followed by an exec in the child process, there isn't an issue.  We all
know the history, but the mainframe designs of having a proper spawn
primitive were much cleaner.  However, that's not what we've got.

It might be worth adding to the note that this is the ONLY case when the
ordinary user is advised to use that facility.  Or it might not, depending
on the level of Clue that readers are expected to have.
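
To make that concrete, a minimal sketch of the fork-then-exec-immediately
pattern (hypothetical code; /bin/hostname is an arbitrary example command):
the exec replaces the child's address space, so it can never touch the
parent's pinned pages.

#include <mpi.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    pid_t pid = fork();
    if (0 == pid) {
        /* Child: exec immediately; do not touch inherited memory. */
        execl("/bin/hostname", "hostname", (char *) NULL);
        _exit(127);             /* reached only if exec fails */
    } else if (pid > 0) {
        waitpid(pid, NULL, 0);  /* parent reaps the child */
    }

    MPI_Finalize();
    return 0;
}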


Regards,
Nick Maclaren.







Re: [OMPI devel] RFC: Bring the latest ROMIO version from MPICH2-1.3 into the trunk

2010-11-30 Thread Pascal Deveze

Hi Jeff,

Thanks for having a look at my unified diff (see comments in the text).

I have committed all my latest changes to bitbucket, including those that 
follow.



Pascal

Jeff Squyres a écrit :

Some questions about the patch:

configure.in:

@@ -2002,9 +1987,8 @@
# Turn off the building of the Fortran interface and the Info routines
EXTRA_DIRS=""
AC_DEFINE(HAVE_STATUS_SET_BYTES,1,[Define if status_set_bytes available])
-   DEFINE_HAVE_MPI_GREQUEST="#define HAVE_MPI_GREQUEST"
-   # Add the MPICH2_INCLUDE_FLAGS to CPPFLAGS
-   CPPFLAGS="$CPPFLAGS $MPICH2_INCLUDE_FLAGS"
+   DEFINE_HAVE_MPI_GREQUEST="#define HAVE_MPI_GREQUEST 1"
+   AC_DEFINE(HAVE_MPIU_FUNCS,1,[Define if MPICH2 memory tracing macros 
defined])
  
 fi

 #
 #

Do we have the MPIU functions?  Or is that an MPICH2-specific thing?
  


I have put in comments this last "AC_DEFINE":
# Open MPI does not have the MPIU functions
# AC_DEFINE(HAVE_MPIU_FUNCS,1,[Define if MPICH2 memory tracing macros 
defined])



I see that you moved confdb/aclocal_cc.m4 to acinclude.m4.  Shouldn't we just -I 
confdb instead to get all of their .m4 files?

  

That was done during the previous porting (years ago).
I have now changed this: all confdb/*.m4 files are now copied from 
MPICH2. Only the definition of PAC_FUNC_NEEDS_DECL is still kept in 
acinclude.m4.

If I do not do so, configure still blocks on this macro.
Everything seems to work this way. If you have any clue about this, I will take it!


In mpipr.h, why remove the #if 0?

-/* Open MPI: these functions are not supposed to be profiled */
-#if 0
 #undef MPI_Wtick
 #define MPI_Wtick PMPI_Wtick
 #undef MPI_Wtime
 #define MPI_Wtime PMPI_Wtime
-#endif

  


OK, I have put the #if 0 back.


In configure.in, please update the version number in AM_INIT_AUTOMAKE.
  


AM_INIT_AUTOMAKE(io-romio, 1.0.0, 'no')
is changed to
AM_INIT_AUTOMAKE(io-romio, 1.0.1, 'no')


I thought there was one other thing that I saw, but I can't recall it right 
now...

This is just from looking at your diff; I didn't try to run it yet because you 
said there were some things that weren't pushed back up to bitbucket yet.





On Nov 24, 2010, at 10:48 AM, Pascal Deveze wrote:

  

Hi Jeff,

Here is the unified diff.
As only the romio subtree is modified, I ran the following commands:
  diff -u -r -x .svn ompi-trunk/ompi/mca/io/romio/romio/ 
NEW-ROMIO-FOR-OPENMPI/ompi/mca/io/romio/romio/ > DIFF_UPDATE
  tar cvzf DIFF_UPDATE.TGZ DIFF_UPDATE

Compilation is OK. I ran the ROMIO tests.

There are a few new modifications that are not in bitbucket. I think it is not 
necessary to update bitbucket 
(http://bitbucket.org/devezep/new-romio-for-openmpi/ ).

Pascal
 
Jeff Squyres a écrit :


Thanks Pascal!

Is there any chance you could send a unified diff of the tip of your hg vs. the 
SVN trunk HEAD?

E.g., if you have an hg+ssh combo tree, could you "hg up" in there to get all your work, and 
then "svn diff > diff.out" and then compress and send the diff.out?

Thanks!



On Nov 10, 2010, at 8:43 AM, Pascal Deveze wrote:

  

  

WHAT: Port the latest ROMIO version from MPICH2-1.3 into the trunk.

WHY: There is considerable interest in updating the ROMIO branch that was 
ported from mpich2-1.0.7.

WHERE: ompi/mca/io/romio/

WHEN: Before 1.5.2, so asap

TIMEOUT: Next Tuesday teleconf, 23 Nov 2010

-

I am in charge of ticket 1888 (see 
https://svn.open-mpi.org/trac/ompi/ticket/1888).
The ported ROMIO has been available on bitbucket since September 17th, 2010 
(http://bitbucket.org/devezep/new-romio-for-openmpi/).
So far I have not received any reports on this port, and it is now time to 
bring it into the trunk.
All modified files are located under the romio subtree.

Pascal Devèze

___
devel mailing list

de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




  

  

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




  




Re: [OMPI devel] 1.5 plans

2010-11-30 Thread Joshua Hursey
Can you make a v1.7 milestone on Trac, so I can move some of my tickets?

Some are CMRs, but a couple are defects, with fixes in development, that 
without those CMRs cannot be moved to v1.5.

Thanks,
Josh


On Nov 29, 2010, at 11:43 AM, Jeff Squyres wrote:

> I'm about 2 weeks late on this email; apologies.  SC and Thanksgiving got in 
> the way.
> 
> Per a discussion on the devel teleconf nearly 3 weeks ago, we have decided 
> what to do with the v1.5 series:
> 
> - 1.5.1 will be a bug fix release.  There's 2 blocker bugs right now that 
> need to be reviewed; those and the currently ready-to-commit major CMR are 
> all that is planned for 1.5.1.  Hopefully, they could be ready by tonight.
> 
> - 1.5.2 (and successive releases) will be "normal" feature releases.  There's 
> a bit of divergence between the trunk and the v1.5 branch, meaning that some 
> porting of features may be required to get over to the v1.5 branch (FWIW, I 
> think that many things will not require much porting at all -- but some 
> will).  Many of the CMRs filed against v1.5.2 are still relevant; *some* of 
> the features/bugs are still relevant.  We'll start [re-]examining the v1.5.2 
> tickets in more detail soon.  So feel free to apply to have your favorite 
> feature brought over to the v1.5 branch.  Bigger features may be kept in the 
> wings for v1.7 (e.g., the wholesale ORTE refresh for v1.5.x has been axed and 
> will wait until v1.7).  There is a bunch of affinity work occurring on the 
> trunk (and/or in hg branches) right now; we plan to bring all that stuff in 
> to the v1.5 series when ready (probably 3+ months at the earliest -- 
> especially with the December holidays delaying everything).  Once that's 
> done, we can then probably start thinking about wrapping up the v1.5 series, converting 
> it to its stable counterpart (1.6), and then branching for v1.7.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI devel] 1.5 plans

2010-11-30 Thread Jeff Squyres
On Nov 30, 2010, at 8:54 AM, Joshua Hursey wrote:

> Can you make a v1.7 milestone on Trac, so I can move some of my tickets?

Done.

> Some are CMRs, but a couple are defects, with fixes in development, that 
> without those CMRs cannot be moved to v1.5.
> 
> Thanks,
> Josh
> 
> 
> On Nov 29, 2010, at 11:43 AM, Jeff Squyres wrote:
> 
>> I'm about 2 weeks late on this email; apologies.  SC and Thanksgiving got in 
>> the way.
>> 
>> Per a discussion on the devel teleconf nearly 3 weeks ago, we have decided 
>> what to do with the v1.5 series:
>> 
>> - 1.5.1 will be a bug fix release.  There's 2 blocker bugs right now that 
>> need to be reviewed; those and the currently ready-to-commit major CMR are 
>> all that is planned for 1.5.1.  Hopefully, they could be ready by tonight.
>> 
>> - 1.5.2 (and successive releases) will be "normal" feature releases.  
>> There's a bit of divergence between the trunk and the v1.5 branch, meaning 
>> that some porting of features may be required to get over to the v1.5 branch 
>> (FWIW, I think that many things will not require much porting at all -- but 
>> some will).  Many of the CMRs filed against v1.5.2 are still relevant; 
>> *some* of the features/bugs are still relevant.  We'll start [re-]examining 
>> the v1.5.2 tickets in more detail soon.  So feel free to apply to have your 
>> favorite feature brought over to the v1.5 branch.  Bigger features may be 
>> kept in the wings for v1.7 (e.g., the wholesale ORTE refresh for v1.5.x has 
>> been axed and will wait until v1.7).  There is a bunch of affinity work 
>> occurring on the trunk (and/or in hg branches) right now; we plan to bring 
>> all that stuff in to the v1.5 series when ready (probably 3+ months at the 
>> earliest -- especially with the December holidays delaying everything).  
>> Once that's done, we can then probably start thinking about wrapping up the v1.5 series, converting 
>> it to its stable counterpart (1.6), and then branching for v1.7.
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
> 
> 
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: Bring the latest ROMIO version from MPICH2-1.3 into the trunk

2010-11-30 Thread Jeff Squyres
On Nov 30, 2010, at 6:44 AM, Pascal Deveze wrote:

> I have commited all my last changes in bitbucket, including those that 
> follows.

I got a checkout, and still have some problems/questions.  More below.

If you do the IM thing, ping me on IM (I sent you my IDs in an off-list email).

>> Do we have the MPIU functions?  Or is that an MPICH2-specific thing?
> 
> I have put in comments this last "AC_DEFINE":
> # Open MPI does not have the MPIU functions
> # AC_DEFINE(HAVE_MPIU_FUNCS,1,[Define if MPICH2 memory tracing macros 
> defined]) 

Good.

>> I see that you moved confdb/aclocal_cc.m4 to acinclude.m4.  Shouldn't we just 
>> -I confdb instead to get all of their .m4 files?
>> 
> That was done during the previous porting (years ago).
> I have now changed this: all confdb/*.m4 files are now copied from MPICH2. 
> Only the definition of PAC_FUNC_NEEDS_DECL is still kept in acinclude.m4.
> If I do not do so, configure still blocks on this macro.
> Everything seems to work this way. If you have any clue about this, I will take it!

I see that we have the whole romio/confdb directory, so it seems like we should 
use that tree rather than copy to acinclude.m4.

But I note that when I get an hg clone of your repo:

- there's no .hgignore file -- making "hg status" difficult.  In your SVN+HG 
tree, can you run ./contrib/hg/build-hgignore.pl and commit/push the resulting 
.hgignore?  That would be most helpful.

- ompi/mca/io/romio/romio/adio/include/romioconf.h.in is in the hg repo, but 
should not be (it's generated).

- I don't see a romio/acinclude.m4 file in the repo, so whatever you did there 
doesn't show up for me.  

- I tried to add an ompi/mca/io/romio/romio/autogen.sh executable file that 
contained:

:
autoreconf -ivf -I confdb

and that seems to make everything work.  Can you confirm/double check?

>> In configure.in, please update the version number in AM_INIT_AUTOMAKE.
> AM_INIT_AUTOMAKE(io-romio, 1.0.0, 'no')
> is changed to
> AM_INIT_AUTOMAKE(io-romio, 1.0.1, 'no')

Can we use whatever the real ROMIO version number is?  

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] 1.5 plans

2010-11-30 Thread Terry Dontje

On 11/30/2010 09:00 AM, Jeff Squyres wrote:

On Nov 30, 2010, at 8:54 AM, Joshua Hursey wrote:


Can you make a v1.7 milestone on Trac, so I can move some of my tickets?

Done.
I have a question about Josh's recent ticket moves.  One of them 
mentions that 1.5 is stabilizing quickly.  Josh, can you clarify what you 
mean by quickly?  I think there will be a 1.5 release 3-6 months from 
now; does that fall into your definition of quickly?


--td

Some are CMRs, but a couple are defects, with fixes in development, that 
without those CMRs cannot be moved to v1.5.

Thanks,
Josh


On Nov 29, 2010, at 11:43 AM, Jeff Squyres wrote:


I'm about 2 weeks late on this email; apologies.  SC and Thanksgiving got in 
the way.

Per a discussion on the devel teleconf nearly 3 weeks ago, we have decided what 
to do with the v1.5 series:

- 1.5.1 will be a bug fix release.  There's 2 blocker bugs right now that need 
to be reviewed; those and the currently ready-to-commit major CMR are all that 
is planned for 1.5.1.  Hopefully, they could be ready by tonight.

- 1.5.2 (and successive releases) will be "normal" feature releases.  There's a 
bit of divergence between the trunk and the v1.5 branch, meaning that some porting of 
features may be required to get over to the v1.5 branch (FWIW, I think that many things 
will not require much porting at all -- but some will).  Many of the CMRs filed against 
v1.5.2 are still relevant; *some* of the features/bugs are still relevant.  We'll start 
[re-]examining the v1.5.2 tickets in more detail soon.  So feel free to apply to have 
your favorite feature brought over to the v1.5 branch.  Bigger features may be kept in 
the wings for v1.7 (e.g., the wholesale ORTE refresh for v1.5.x has been axed and will 
wait until v1.7).  There is a bunch of affinity work occurring on the trunk (and/or in hg 
branches) right now; we plan to bring all that stuff in to the v1.5 series when ready 
(probably 3+ months at the earliest -- especially with the December holidays delaying 
everything).  Once that's done, we can then probably start thinking about wrapping up the v1.5 series, converting it 
to its stable counterpart (1.6), and then branching for v1.7.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com





Re: [OMPI devel] Warning on fork() disappears if I use MPI threads!!

2010-11-30 Thread Ken Cain

Hi Jeff,

We have had some recent experience with this in an Open MPI 1.4.x 
version and thought it would be useful to contribute to the discussion. 
Please see below.


Jeff Squyres wrote:

On Nov 29, 2010, at 6:25 PM, George Bosilca wrote:


The main problem is that openib requires pinning memory pages in order to take 
advantage of RMA features. There is a major issue with these pinned pages and 
fork, leading to segmentation faults in some specific cases. However, we only 
pin the pages in the MPI calls related to data transfers. Therefore, if you 
call fork __before__ any other MPI data transfer function (but after MPI_Init, 
since you use the process rank), your application should be safe.


Note that Open MPI also pins some internal memory during MPI_INIT, but that 
memory is totally internal to libmpi, so you should be safe (i.e., you should 
never be able to find it and therefore never be able to try to touch it).


This is what we believe happened in our testing:

1. MPI_Init allocated and pinned down some memory. This memory was 64-byte 
aligned, not page-aligned to 4096 bytes. So an allocation that ideally 
should have resulted in 2 pages being pinned actually had 3 pages pinned, 
with lots of unused memory on the 3rd page.


2. A child process created via popen tried to allocate some memory 
(perhaps a byproduct of popen execution itself) and was given memory 
on that last page with lots of unused memory. When the child tried to 
touch the allocation, there was a seg fault.


We could reduce the probability of this happening by changing the 
alignment of MPI allocations to 4096 bytes. But since MPI allocations 
are not sized to be a multiple of the page size, this isn't a foolproof method.
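
To illustrate the arithmetic, a hedged sketch (not the actual Open MPI
allocator): rounding both the alignment and the size up to whole pages
guarantees a registration never shares its last page with unrelated
allocations.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Return memory that occupies whole pages only, so a pinned region can
 * never share a page with other heap allocations. */
static void *alloc_page_isolated(size_t sz)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);   /* typically 4096 */
    size_t padded = ((sz + page - 1) / page) * page;
    void *p = NULL;
    return (0 == posix_memalign(&p, page, padded)) ? p : NULL;
}

int main(void)
{
    /* A 64-byte-aligned 8000-byte buffer can straddle 3 pages; the padded,
     * page-aligned version occupies exactly 2 whole pages. */
    void *buf = alloc_page_isolated(8000);
    printf("%p\n", buf);
    free(buf);
    return 0;
}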


One way (agreed, not ideal) to avoid the potential seg fault is to set 
the MCA parameter btl_openib_want_fork_support = 0. But then you are 
"trusting" any child processes not to touch, intentionally or as a result 
of a bug, the memory regions that have been registered/pinned by the 
parent.
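
For reference, both knobs can be set on the mpirun command line (assuming a
typical 1.4.x install; parameter names as documented in the FAQ):

  # disable fork support entirely (the workaround above)
  mpirun --mca btl_openib_want_fork_support 0 -np 4 ./a.out

  # keep fork support but silence the warning
  mpirun --mca mpi_warn_on_fork 0 -np 4 ./a.out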





How can one be sure that disabling the warning is OK? Could you please 
elaborate on what makes forks vulnerable? Maybe that will guide the developers 
to make an informed decision on whether to disable them or find another 
alternative.

No way to know at 100%. Now for a more elaborate answer: Once upon a time ... The 
fork story is a long and boring one; we would all have preferred never to have 
heard about it (believe me). A quick and compressed version can be 
QLogic download page 
(http://filedownloads.qlogic.com/files/driver/70277/release_QLogicIB-Basic_4400_Rev_A.html).


That's a good summary.  The issue is with OFED itself, not with Open MPI.

Note, too, that calling popen() should also be safe (even though we'll warn 
about it -- our atfork hook has no way of knowing whether you're calling 
system, popen, or something else).



Thanks,

-Ken
--
Ken Cain
Mercury Computer Systems, Inc. (http://www.mc.com)



Re: [OMPI devel] 1.5 plans

2010-11-30 Thread Joshua Hursey
(Insert jab at the definition of 'quickly' when talking about OMPI releases)

From the way I read Jeff's original email, it seems that we are trying to get 
v1.5 stable so we can start v1.7 in the next few months (3-5). The C/R 
functionality on the trunk is significantly different from that on v1.5 
(and more so with v1.4). So bringing these features over to the v1.5 branch will 
require a CMR that will look like re-syncing to the trunk (it requires the 
ORTE refresh, and a couple other odds and ends). Since the ORTE refresh was 
killed due to the size of the feature, so were the C/R features. So even though 
v1.5 is a feature branch, the C/R feature is locked out of it at the 
moment and pushed to v1.7.

So, from my perspective, there is now a push to hurry up on the v1.7 so users 
will have a release branch with the latest-n-greatest C/R functionality. 
Releasing v1.7 next summer would be fine with me, but pushing it further into 
the future seems bad to me.


As a side comment:
The stable branch is a great idea for the production side of the house since it 
is more carefully crafted and maintained. The feature branch is a great idea 
for the researchers in the group to gain exposure for new features, and 
enhancements on old features (many of these require changes to internal APIs 
and data structures). From my perspective, a slow moving feature branch is no 
longer that useful to the research community since it becomes more and more 
painful to synchronize the trunk and branch the longer it takes for the feature 
branch to stabilize for release. So the question often becomes: why bother? But 
this is a longer discussion for another time, maybe.

-- Josh

On Nov 30, 2010, at 9:36 AM, Terry Dontje wrote:

> On 11/30/2010 09:00 AM, Jeff Squyres wrote:
>> On Nov 30, 2010, at 8:54 AM, Joshua Hursey wrote:
>> 
>> 
>>> Can you make a v1.7 milestone on Trac, so I can move some of my tickets?
>>> 
>> Done.
>> 
> I have a question about Josh's recent ticket moves.  One of them mentions that 
> 1.5 is stabilizing quickly.  Josh, can you clarify what you mean by quickly? 
> I think there will be a 1.5 release 3-6 months from now; does that fall 
> into your definition of quickly?
> 
> --td
>>> Some are CMRs, but a couple are defects, with fixes in development, that 
>>> without those CMRs cannot be moved to v1.5.
>>> 
>>> Thanks,
>>> Josh
>>> 
>>> 
>>> On Nov 29, 2010, at 11:43 AM, Jeff Squyres wrote:
>>> 
>>> 
 I'm about 2 weeks late on this email; apologies.  SC and Thanksgiving got 
 in the way.
 
 Per a discussion on the devel teleconf nearly 3 weeks ago, we have decided 
 what to do with the v1.5 series:
 
 - 1.5.1 will be a bug fix release.  There's 2 blocker bugs right now that 
 need to be reviewed; those and the currently ready-to-commit major CMR are 
 all that is planned for 1.5.1.  Hopefully, they could be ready by tonight.
 
 - 1.5.2 (and successive releases) will be "normal" feature releases.  
 There's a bit of divergence between the trunk and the v1.5 branch, meaning 
 that some porting of features may be required to get over to the v1.5 
 branch (FWIW, I think that many things will not require much porting at 
 all -- but some will).  Many of the CMRs filed against v1.5.2 are still 
 relevant; *some* of the features/bugs are still relevant.  We'll start 
 [re-]examining the v1.5.2 tickets in more detail soon.  So feel free to 
 apply to have your favorite feature brought over to the v1.5 branch.  
 Bigger features may be kept in the wings for v1.7 (e.g., the wholesale 
 ORTE refresh for v1.5.x has been axed and will wait until v1.7).  There is 
 a bunch of affinity work occurring on the trunk (and/or in hg branches) 
 right now; we plan to bring all that stuff in to the v1.5 series when 
 ready (probably 3+ months at the earliest -- especially with the December 
 holidays delaying everything).  Once that's done, we can then probably start thinking about wrapping up the v1.5 series, 
 converting it to its stable counterpart (1.6), and then branching for v1.7.
 
 -- 
 Jeff Squyres
 
 jsquy...@cisco.com
 
 For corporate legal information go to:
 
 http://www.cisco.com/web/about/doing_business/legal/cri/
 
 
 
 ___
 devel mailing list
 
 de...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/devel
 
 
 
>>> 
>>> Joshua Hursey
>>> Postdoctoral Research Associate
>>> Oak Ridge National Laboratory
>>> 
>>> http://users.nccs.gov/~jjhursey
>>> 
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> 
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
> 
> 
> -- 
> 
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631

Re: [OMPI devel] 1.5 plans

2010-11-30 Thread Terry Dontje

On 11/30/2010 10:10 AM, Joshua Hursey wrote:

(Insert jab at the definition of 'quickly' when talking about OMPI releases)

From the way I read Jeff's original email, it seems that we are trying to get 
v1.5 stable so we can start v1.7 in the next few months (3-5). The C/R 
functionality on the trunk is significantly different from that on v1.5 (and 
more so with v1.4). So bringing these features over to the v1.5 branch will require a 
CMR that will look like re-syncing to the trunk (it requires the ORTE refresh, and 
a couple other odds and ends). Since the ORTE refresh was killed due to the size 
of the feature, so were the C/R features. So even though v1.5 is a feature 
branch, the C/R feature is locked out of it at the moment and pushed to v1.7.

Yeah, we have successfully deadlocked ourselves.  We have features that 
cannot go in because they rely on stuff we refuse to bring over for 
stability reasons, but at the same time we cannot promote 1.5 to 1.6 
because 1.5 isn't stable enough itself.  Quite a pickle.  I still believe a 
refresh/sync of the trunk to 1.5 makes sense.  The only other solution is to 
start 1.7 and put 1.5 to bed.  Unfortunately there are some 
implications for Oracle if all the current stuff is put into 1.7 instead 
of 1.5.

So, from my perspective, there is now a push to hurry up on the v1.7 so users 
will have a release branch with the latest-n-greatest C/R functionality. 
Releasing v1.7 next summer would be fine with me, but pushing it further into 
the future seems bad to me.

Well, I think we need to really think about this carefully to make sure 
we do not end up in the same situation 6 months from now.

As a side comment:
The stable branch is a great idea for the production side of the house since it 
is more carefully crafted and maintained. The feature branch is a great idea 
for the researchers in the group to gain exposure for new features, and 
enhancements on old features (many of these require changes to internal APIs 
and data structures). From my perspective, a slow moving feature branch is no 
longer that useful to the research community since it becomes more and more 
painful to synchronize the trunk and branch the longer it takes for the feature 
branch to stabilize for release. So the question often becomes: why bother? But 
this is a longer discussion for another time, maybe.

IMO, the problem is that we did not stabilize 1.5 quickly enough, causing 
so great a divergence that we are now in this pickle.  The whole idea was 
that we would push stuff into 1.5 quickly.  If we cannot do that, then we 
may want to reconsider how we handle releases again :-(.


--td

-- Josh

On Nov 30, 2010, at 9:36 AM, Terry Dontje wrote:


On 11/30/2010 09:00 AM, Jeff Squyres wrote:

On Nov 30, 2010, at 8:54 AM, Joshua Hursey wrote:



Can you make a v1.7 milestone on Trac, so I can move some of my tickets?


Done.


I have a question about Josh's recent ticket moves.  One of them mentions that 
1.5 is stabilizing quickly.  Josh, can you clarify what you mean by quickly? 
I think there will be a 1.5 release 3-6 months from now; does that fall into 
your definition of quickly?

--td

Some are CMRs, but a couple are defects, with fixes in development, that 
without those CMRs cannot be moved to v1.5.

Thanks,
Josh


On Nov 29, 2010, at 11:43 AM, Jeff Squyres wrote:



I'm about 2 weeks late on this email; apologies.  SC and Thanksgiving got in 
the way.

Per a discussion on the devel teleconf nearly 3 weeks ago, we have decided what 
to do with the v1.5 series:

- 1.5.1 will be a bug fix release.  There's 2 blocker bugs right now that need 
to be reviewed; those and the currently ready-to-commit major CMR are all that 
is planned for 1.5.1.  Hopefully, they could be ready by tonight.

- 1.5.2 (and successive releases) will be "normal" feature releases.  There's a 
bit of divergence between the trunk and the v1.5 branch, meaning that some porting of 
features may be required to get over to the v1.5 branch (FWIW, I think that many things 
will not require much porting at all -- but some will).  Many of the CMRs filed against 
v1.5.2 are still relevant; *some* of the features/bugs are still relevant.  We'll start 
[re-]examining the v1.5.2 tickets in more detail soon.  So feel free to apply to have 
your favorite feature brought over to the v1.5 branch.  Bigger features may be kept in 
the wings for v1.7 (e.g., the wholesale ORTE refresh for v1.5.x has been axed and will 
wait until v1.7).  There is a bunch of affinity work occurring on the trunk (and/or in hg 
branches) right now; we plan to bring all that stuff in to the v1.5 series when ready 
(probably 3+ months at the earliest -- especially with the December holidays delaying 
everything).  Once that's done, we can then probably start thinking about wrapping up the v1.5 series, converting it 
to its stable counterpart (1.6), and then branching for v1.7.

--
Jeff Squyres

jsquy...@cisco.com

For corporate legal 

Re: [OMPI devel] Warning on fork() disappears if I use MPI threads!!

2010-11-30 Thread Jeff Squyres
Excellent points Ken; thanks!

I expanded the FAQ entry here to include these points:

http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork



On Nov 30, 2010, at 9:52 AM, Ken Cain wrote:

> Hi Jeff,
> 
> We have had some recent experience with this in an Open MPI 1.4.x version and 
> thought it would be useful to contribute to the discussion. Please see below.
> 
> Jeff Squyres wrote:
>> On Nov 29, 2010, at 6:25 PM, George Bosilca wrote:
>>> The main problem is that openib requires pinning memory pages in order to 
>>> take advantage of RMA features. There is a major issue with these pinned 
>>> pages and fork, leading to segmentation faults in some specific cases. 
>>> However, we only pin the pages in the MPI calls related to data transfers. 
>>> Therefore, if you call fork __before__ any other MPI data transfer function 
>>> (but after MPI_Init, since you use the process rank), your application should 
>>> be safe.
>> Note that Open MPI also pins some internal memory during MPI_INIT, but that 
>> memory is totally internal to libmpi, so you should be safe (i.e., you 
>> should never be able to find it and therefore never be able to try to touch 
>> it).
> 
> This is what we believe happened in our testing:
> 
> 1. MPI_Init allocated and pinned down some memory. This memory was 64-byte 
> aligned, not page-aligned to 4096 bytes. So an allocation that ideally 
> should have resulted in 2 pages being pinned actually had 3 pages pinned, 
> with lots of unused memory on the 3rd page.
> 
> 2. A child process created via popen tried to allocate some memory (perhaps a 
> byproduct of popen execution itself) and was given memory on that last 
> page with lots of unused memory. When the child tried to touch the 
> allocation, there was a seg fault.
> 
> We could reduce the probability of this happening by changing the alignment 
> of MPI allocations to 4096 bytes. But since MPI allocations are not sized to 
> be a multiple of the page size, this isn't a foolproof method.
> 
> One way (agreed, not ideal) to avoid the potential seg fault is to set the MCA 
> parameter btl_openib_want_fork_support = 0. But then you are "trusting" any 
> child processes not to touch, intentionally or as a result of a bug, the 
> memory regions that have been registered/pinned by the parent.
> 
 How can one be sure that disabling the warning is OK? Could you please 
 elaborate on what makes forks vulnerable? Maybe that will guide the 
 developers to make an informed decision on whether to disable them or find 
 another alternative.
>>> No way to know at 100%. Now for a more elaborate answer: Once upon a time ... 
>>> The fork story is a long and boring one; we would all have preferred never 
>>> to have heard about it (believe me). A quick and compressed version can be 
>>> found on the QLogic download page 
>>> (http://filedownloads.qlogic.com/files/driver/70277/release_QLogicIB-Basic_4400_Rev_A.html).
>> That's a good summary.  The issue is with OFED itself, not with Open MPI.
>> Note, too, that calling popen() should also be safe (even though we'll warn 
>> about it -- our atfork hook has no way of knowing whether you're calling 
>> system, popen, or something else).
> 
> Thanks,
> 
> -Ken
> -- 
> Ken Cain
> Mercury Computer Systems, Inc. (http://www.mc.com)
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI devel] 1.5.1rc1 is out

2010-11-30 Thread Jeff Squyres
This is primarily a bug-fix release over the 1.5 release.  Please test it 
heavily so that we can get 1.5.1 (final) out the door before Christmas.

http://www.open-mpi.org/software/ompi/v1.5/

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI devel] confusion between slots and procs in mca/rmaps

2010-11-30 Thread Damien Guinier

Hi all,

Most of the time there is no difference between a "proc" and a "slot", but 
when you use "mpirun -cpus-per-proc X", each proc spans X cpus.
In orte/mca/rmaps/base/rmaps_base_common_mappers.c there is confusion 
between procs and slots; this little error affects the mapping:


With the latest OMPI version, on 32-core compute nodes:
salloc -n 8 -c 8 mpirun -bind-to-core -bycore ./a.out
[rank:0]: host:compute18
[rank:1]: host:compute19
[rank:2]: host:compute18
[rank:3]: host:compute19
[rank:4]: host:compute18
[rank:5]: host:compute19
[rank:6]: host:compute18
[rank:7]: host:compute19

with patch:
[rank:0]: host:compute18
[rank:1]: host:compute18
[rank:2]: host:compute18
[rank:3]: host:compute18
[rank:4]: host:compute19
[rank:5]: host:compute19
[rank:6]: host:compute19
[rank:7]: host:compute19

Can you tell me if my patch is correct?

Thank you,

Damien

diff -r 97ad060b8e48 orte/mca/rmaps/base/rmaps_base_common_mappers.c
--- a/orte/mca/rmaps/base/rmaps_base_common_mappers.c   Thu Oct 14 11:05:54 2010 +0200
+++ b/orte/mca/rmaps/base/rmaps_base_common_mappers.c   Mon Oct 18 13:57:22 2010 +0200
@@ -191,7 +191,8 @@
         if (0 == node->slots_alloc) {
             num_procs_to_assign = 1;
         } else {
-            num_possible_procs = node->slots_alloc / jdata->map->cpus_per_rank;
+            // In rmaps_base_common_mappers, 'num_possible_procs' counts the number of ranks
+            num_possible_procs = node->slots_alloc;
             if (0 == num_possible_procs) {
                 num_procs_to_assign = 1;
             } else {
@@ -199,7 +200,8 @@
             }
         }
     } else {
-        num_possible_procs = (node->slots_alloc - node->slots_inuse) / jdata->map->cpus_per_rank;
+        // In rmaps_base_common_mappers, 'num_possible_procs' counts the number of ranks
+        num_possible_procs = (node->slots_alloc - node->slots_inuse);
         if (0 == num_possible_procs) {
             num_procs_to_assign = 1;
         } else {
diff -r 97ad060b8e48 orte/mca/rmaps/base/rmaps_base_support_fns.c
--- a/orte/mca/rmaps/base/rmaps_base_support_fns.c  Thu Oct 14 11:05:54 2010 +0200
+++ b/orte/mca/rmaps/base/rmaps_base_support_fns.c  Mon Oct 18 13:57:22 2010 +0200
@@ -339,7 +339,7 @@
                      ORTE_JOBID_PRINT(jdata->jobid), current_node->name));
 
     /* Be sure to demarcate the slots for this proc as claimed from the node */
-    current_node->slots_inuse += cpus_per_rank;
+    current_node->slots_inuse += 1;
 
     /* see if this node is oversubscribed now */
     if (current_node->slots_inuse > current_node->slots) {


Re: [OMPI devel] Sending large messages over RDMA fails

2010-11-30 Thread Jeff Squyres
On Nov 29, 2010, at 3:51 AM, Doron Shoham wrote:

> If only the PUT flag is set and/or the btl supports only the PUT method, then 
> the sender will allocate a rendezvous header and will not eager-send any data. 
> The receiver will schedule RDMA PUT(s) of the entire message.
> This is done in mca_pml_ob1_recv_request_schedule_once()
> (ompi/mca/pml/ob1/pml_ob1_recvreq.c:683).
> We can limit the size passed to mca_bml_base_prepare_dst() to be the minimum 
> of the btl.max_message_size supported by the HCA and the actual message size.
> This will result in fragmentation of the RDMA write messages.

I would think that we should set btl.max_message_size during init to be the 
minimum of the MCA param and the max supported by the HCA, right?  Then there's 
no need for this min() in the critical path.
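
A hedged sketch of that init-time clamp (placeholder function and parameter
names, not the actual openib BTL fields): struct ibv_port_attr exposes the
port's max_msg_sz, so the clamp is a one-time min() at startup.

#include <stdint.h>
#include <infiniband/verbs.h>

/* Clamp the configured max message size to what the local HCA port
 * supports. Placeholder names, not OMPI code. */
static int clamp_max_message_size(struct ibv_context *ctx, uint8_t port_num,
                                  uint32_t *max_message_size)
{
    struct ibv_port_attr attr;
    if (0 != ibv_query_port(ctx, port_num, &attr))
        return -1;
    if (*max_message_size > attr.max_msg_sz)
        *max_message_size = attr.max_msg_sz;   /* min(param, HCA limit) */
    return 0;
}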

Additionally, the message must be smaller than the max message size of *both* 
HCAs, right?  So it might be necessary to add the max message size into the 
openib BTL modex data so that you can use it in mca_bml_base_prepare_dst() (or 
whatever -- been a long time since I've mucked around in there...) to compute 
the min between the two peers.

So you might still need a min, but for a different reason than what you 
originally mentioned.

> The bigger problem is when using the GET flow.
> In this flow the receiver allocates one big buffer and receives the message 
> with an RDMA read in one chunk.
> There is no fragmentation mechanism in this flow, which makes it harder to 
> solve this issue.

Doh.  I'm afraid I don't know why this was done this way originally...

> Reading the max message size supported by the HCA can be done by using verbs.
>  
> The second approach is to use RDMA direct only if the message size is smaller 
> than the max message size supported by the HCA.
>  
> Here is where the long message protocol is chosen:
> ompi/mca/pml/ob1/pml_ob1_sendreq.h line 382.
>  
> We could use the second approach until a fragmentation mechanism is 
> added to the RDMA direct GET flow.

Are you suggesting that pml_ob1_sendreq.h:382 compare the message length to the 
btl.max_message_size and choose RDMA direct vs. RDMA pipelined?  If so, that 
might be sufficient.

But what to do about the peer's max message size?
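
If the peer's limit were carried in the modex, the protocol choice could then
be a simple comparison -- a sketch with hypothetical names (peer_max_msg_sz
standing in for whatever the modex would provide):

#include <stddef.h>
#include <stdint.h>

/* Use RDMA direct (one-chunk GET/PUT) only when both ends can handle the
 * whole message; otherwise fall back to the pipeline protocol. */
static int use_rdma_direct(size_t msg_len, uint32_t local_max_msg_sz,
                           uint32_t peer_max_msg_sz)
{
    uint32_t limit = (local_max_msg_sz < peer_max_msg_sz)
                   ? local_max_msg_sz : peer_max_msg_sz;
    return msg_len <= (size_t) limit;
}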

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/