[OMPI devel] CUDA support doesn't work starting from 1.9a1r27862

2013-01-24 Thread Alessandro Fanfarillo
Dear all,
I would like to report a bug for the CUDA support on the last 5 trunk
versions.
The attached code is a simple send/receive test case which works correctly
with version 1.9a1r27844.
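
For illustration, the test is roughly of the following shape (a minimal
sketch, not the attached reproducer itself, assuming a CUDA-aware build
and a device buffer):

    /* Minimal sketch only -- not the attached test.tar.bz2 reproducer.
     * Assumes a CUDA-aware Open MPI build so a device pointer can be
     * passed directly to MPI_Send/MPI_Recv. */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, n = 1024;
        double *dbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMalloc((void **)&dbuf, n * sizeof(double));   /* device buffer */

        if (rank == 0) {
            MPI_Send(dbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(dbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d doubles into device memory\n", n);
        }

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }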

Starting from version 1.9a1r27862 up to 1.9a1r27897 I get the following
message:

./test: symbol lookup error: /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so:
undefined symbol: progress_one_cuda_htod_event
./test: symbol lookup error: /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so:
undefined symbol: progress_one_cuda_htod_event
--
mpirun has exited due to process rank 0 with PID 21641 on
node ip-10-16-24-100 exiting improperly. There are three reasons this could
occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.

-

I'm using gcc-4.7.2 and CUDA 4.2. The test fails also with CUDA 4.1.

Thanks in advance.

Best regards.

Alessandro Fanfarillo


test.tar.bz2
Description: BZip2 compressed data


Re: [OMPI devel] CUDA support doesn't work starting from 1.9a1r27862

2013-01-24 Thread Rolf vandeVaart
Thanks for this report.  I will look into this.  Can you tell me what your 
mpirun command looked like and do you know what transport you are running over?
Specifically, is this on a single node or multiple nodes?

Rolf

From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of Alessandro Fanfarillo
Sent: Thursday, January 24, 2013 4:11 AM
To: de...@open-mpi.org
Subject: [OMPI devel] CUDA support doesn't work starting from 1.9a1r27862

Dear all,
I would like to report a bug for the CUDA support on the last 5 trunk versions.
The attached code is a simple send/receive test case which works correctly with 
version 1.9a1r27844.
Starting from version 1.9a1r27862 up to 1.9a1r27897 I get the following message:

./test: symbol lookup error: /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so: 
undefined symbol: progress_one_cuda_htod_event
./test: symbol lookup error: /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so: 
undefined symbol: progress_one_cuda_htod_event
--
mpirun has exited due to process rank 0 with PID 21641 on
node ip-10-16-24-100 exiting improperly. There are three reasons this could 
occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.

-
I'm using gcc-4.7.2 and CUDA 4.2. The test fails also with CUDA 4.1.
Thanks in advance.

Best regards.

Alessandro Fanfarillo





Re: [OMPI devel] New ARM patch

2013-01-24 Thread Leif Lindholm

On 24/01/13 02:54, Jeff Squyres (jsquyres) wrote:

[snip] Basic point is - this is an insufficiently validated patch
referred to as "an ugly kludge" by the original author (Jon
Masters@Red Hat), who created it to be able to include it in the
Fedora ARMv5 port. I had previously provided suggestions for
improvements, but it has still been submitted to the Open MPI
users list without any of those suggestions being acted on.

I admit to being slightly miffed with it being accepted and
applied without ever being mentioned on the Open MPI developers
list


It was done by one of the core committers (George); it's in our
community's culture to go commit without discussion on the devel
list for many kinds of things.


OK. In which case I probably _should_ be on that list.
*cough* might I however suggest that a statement to that effect is added
to http://www.open-mpi.org/community/lists/ompi.php ?


FWIW: Since we all know each other pretty well, we do a lot of
communication via IM and telephone in addition to the public mailing
list discussions.  This is not because we're discussing secret
things -- it's just that you can get a lot more accomplished in a 10
minute phone call than 15 back-n-forth, 10-page, highly detailed
emails.


Sure.


A list to which I now find myself subscribed without having
asked for it or been told about it - miffed again.


Sorry about that; this was my fault.  I interpreted your off-post
mails to me about not being able to post to the users list as an ask
to be subscribed (since we don't allow posts from unsubscribed
users).


Understandable - apologies for overreacting.


Rather than unsubscribe you, though, I just marked you as "nomail"
on the users' list.  So you won't receive any further mail from that
list, but you're still subscribed, so you can post.


Thanks.


I tested this patch in v1.6 and v1.7 on my Pi, and it seems to work
just fine.  "make check" passes all the ASM tests.


Just to be perfectly clear: it wouldn't pass on ARMv5 though, and the ARMv6
ASM test executed with NOPs for barriers, although it would correctly
pass all other tests.


To be clear: I consider you to be the primary author and maintainer
of this code, and you're certainly more of an ARM expert than any of
us.  George may not have realized that someone from ARM was still an
active part of the community; I'm not sure.


I'm certainly not very visible :)
But I do try to pay attention.


But I, too, vote that we should back out his changes from the trunk
and put in your suggested patch (his patch did not make it over to v1.6
or v1.7, because I was waiting for your response).

We actually do try to get consensus for these kinds of things, so
let's give George a little time to respond before backing it out.


Sure.

Regards,

Leif





Re: [OMPI devel] CUDA support doesn't work starting from 1.9a1r27862

2013-01-24 Thread Alessandro Fanfarillo
I usually run "mpirun -np 2 ./test". I always execute on a single node. The
message appears with either 1 or 2 GPUs on the single node.


2013/1/24 Rolf vandeVaart 

> Thanks for this report.  I will look into this.  Can you tell me what your
> mpirun command looked like and do you know what transport you are running
> over?
>
> Specifically, is this on a single node or multiple nodes?
>
>
> Rolf
>
>
> *From:* devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] *On
> Behalf Of *Alessandro Fanfarillo
> *Sent:* Thursday, January 24, 2013 4:11 AM
> *To:* de...@open-mpi.org
> *Subject:* [OMPI devel] CUDA support doesn't work starting from
> 1.9a1r27862
>
>
> Dear all,
>
> I would like to report a bug for the CUDA support on the last 5 trunk
> versions.
>
> The attached code is a simple send/receive test case which works correctly
> with version 1.9a1r27844. 
>
> Starting from version 1.9a1r27862 up to 1.9a1r27897 I get the following
> message:
>
> ./test: symbol lookup error:
> /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so: undefined symbol:
> progress_one_cuda_htod_event
> ./test: symbol lookup error:
> /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so: undefined symbol:
> progress_one_cuda_htod_event
> --
> mpirun has exited due to process rank 0 with PID 21641 on
> node ip-10-16-24-100 exiting improperly. There are three reasons this
> could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
> orte_create_session_dirs is set to false. In this case, the run-time cannot
> detect that the abort call was an abnormal termination. Hence, the only
> error message you will receive is this one.
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
>
> You can avoid this message by specifying -quiet on the mpirun command line.
> 
>
>
>
> -
> 
>
> I'm using gcc-4.7.2 and CUDA 4.2. The test fails also with CUDA 4.1.
>
> Thanks in advance.
>
> Best regards.
>
> Alessandro Fanfarillo
>


Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27880 - trunk/ompi/request

2013-01-24 Thread Jeff Squyres (jsquyres)
Many thanks for the summary!

Can you file tickets about this stuff against 1.7?  Include your patches, etc. 

These are pretty obscure issues and I'm ok not fixing them in the 1.6 branch 
(unless someone has a burning desire to get them fixed in 1.6). 

But we should properly track and fix these in the 1.7 series. I'd mark them as 
"critical" so that they don't get lost in the wilderness of other bugs. 

Sent from my phone. No type good. 

On Jan 22, 2013, at 8:57 PM, "Kawashima, Takahiro"  
wrote:

> George,
> 
> I reported the bug three months ago.
> Your commit r27880 resolved one of the bugs reported by me,
> in another approach.
> 
>  http://www.open-mpi.org/community/lists/devel/2012/10/11555.php
> 
> But other bugs are still open.
> 
> "(1) MPI_SOURCE of MPI_Status for a null request must be MPI_ANY_SOURCE."
> in my previous mail is not fixed yet. This can be fixed by my patch
> (ompi/mpi/c/wait.c and ompi/request/request.c part only) attached
> to another mail of mine.
> 
>  http://www.open-mpi.org/community/lists/devel/2012/10/11561.php
> 
> "(2) MPI_Status for an inactive request must be an empty status."
> in my previous mail is partially fixed. MPI_Wait is fixed by your
> r27880. But MPI_Waitall and MPI_Testall should be fixed.
> Code similar to your r27880 should be inserted into
> ompi_request_default_wait_all and ompi_request_default_test_all.
> 
> You can confirm the fixes by the test program status.c attached in
> my previous mail. Run with -n 2. 
> 
>  http://www.open-mpi.org/community/lists/devel/2012/10/11555.php
> 
> Regards,
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
> 
>> To be honest it was hanging in one of my repos for some time. If I'm not 
>> mistaken it is somehow related to one active ticket (but I couldn't find the 
>> info). It might be good to push it upstream.
>> 
>>  George.
>> 
>> On Jan 22, 2013, at 16:27 , "Jeff Squyres (jsquyres)"  
>> wrote:
>> 
>>> George --
>>> 
>>> Is there any reason not to CMR this to v1.6 and v1.7?
>>> 
>>> 
>>> On Jan 21, 2013, at 6:35 AM, svn-commit-mai...@open-mpi.org wrote:
>>> 
 Author: bosilca (George Bosilca)
 Date: 2013-01-21 06:35:42 EST (Mon, 21 Jan 2013)
 New Revision: 27880
 URL: https://svn.open-mpi.org/trac/ompi/changeset/27880
 
 Log:
 My understanding is that an MPI_WAIT() on an inactive request should
 return the empty status (MPI 3.0 page 52 line 46).
 
 Text files modified: 
 trunk/ompi/request/req_wait.c | 3 +++  

 1 files changed, 3 insertions(+), 0 deletions(-)
 
 Modified: trunk/ompi/request/req_wait.c
 ==
 --- trunk/ompi/request/req_wait.c    Sat Jan 19 19:33:42 2013    (r27879)
 +++ trunk/ompi/request/req_wait.c    2013-01-21 06:35:42 EST (Mon, 21 Jan 2013)    (r27880)
 @@ -61,6 +61,9 @@
   }
   if( req->req_persistent ) {
   if( req->req_state == OMPI_REQUEST_INACTIVE ) {
 +if (MPI_STATUS_IGNORE != status) {
 +*status = ompi_status_empty;
 +}
   return OMPI_SUCCESS;
   }
   req->req_state = OMPI_REQUEST_INACTIVE;
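
For reference, a rough user-level check of the two expectations described
above (not the status.c attached to the earlier mail) could look like this;
it runs with any number of ranks:

    /* Rough illustration only; not the status.c test from the earlier mail. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, buf = 0;
        MPI_Request req = MPI_REQUEST_NULL;
        MPI_Status  status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* (1) A wait on a null request must yield MPI_SOURCE == MPI_ANY_SOURCE. */
        MPI_Wait(&req, &status);
        printf("null request:     source is MPI_ANY_SOURCE? %d\n",
               status.MPI_SOURCE == MPI_ANY_SOURCE);

        /* (2) A wait on an inactive persistent request must yield the empty
         *     status; per the report, MPI_Waitall/MPI_Testall still miss this. */
        MPI_Send_init(&buf, 1, MPI_INT, rank, 0, MPI_COMM_WORLD, &req);
        MPI_Waitall(1, &req, &status);   /* never started, hence inactive */
        printf("inactive request: tag is MPI_ANY_TAG? %d\n",
               status.MPI_TAG == MPI_ANY_TAG);

        MPI_Request_free(&req);
        MPI_Finalize();
        return 0;
    }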



Re: [OMPI devel] [patch] MPI-2.2: Ordering of attribution deletion callbacks on MPI_COMM_SELF

2013-01-24 Thread KAWASHIMA Takahiro
Jeff, George,

I've implemented George's idea for ticket #3123 "MPI-2.2: Ordering of
attribution deletion callbacks on MPI_COMM_SELF". See attached
delete-attr-order.patch.

It is implemented by creating a temporary array of ordered attribute_value_t
pointers at ompi_attr_delete_all() time, using attribute creation sequence
numbers. It requires only linear cost, and only at the communicator destruction
stage, and its implementation is rather simpler than my previous patch.
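
In outline, the approach is something like the following (a simplified
sketch with invented names, not the patch itself):

    /* Simplified sketch of the idea; these types and names are invented
     * for illustration and are not the actual Open MPI attribute code. */
    #include <stdlib.h>

    typedef struct {
        unsigned long av_sequence;   /* global counter sampled when the attribute is set */
        int           av_key;
        void         *av_value;
    } attr_sketch_t;

    static int cmp_by_sequence(const void *lhs, const void *rhs)
    {
        const attr_sketch_t *a = lhs, *b = rhs;
        return (a->av_sequence > b->av_sequence) - (a->av_sequence < b->av_sequence);
    }

    /* Called once, at object destruction: sort the temporary array by the
     * creation sequence number, then run the user delete callbacks in
     * reverse order of setting, as MPI-2.2 requires for MPI_COMM_SELF. */
    static void delete_all_in_order(attr_sketch_t *vals, size_t count)
    {
        qsort(vals, count, sizeof(*vals), cmp_by_sequence);
        for (size_t i = count; i > 0; --i) {
            /* invoke the delete callback registered for vals[i - 1].av_key */
        }
    }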

And apart from this MPI-2.2 ticket, I found some minor bugs and typos
in attribute.c and attribute.h. They can be fixed by the attached
attribute-bug-fix.patch. All fixes are assembled into one patch file.

I've pushed my modifications to Bitbucket.
  https://bitbucket.org/rivis/openmpi-delattrorder/src/49bf3dc7cdbc/?at=sequence
Note that my modifications are in "sequence" branch, not "default" branch.
I committed each implementation/fix independently; they are
assembled into the two patches attached to this mail. So you can see
the comment/diff of each modification on Bitbucket.
  https://bitbucket.org/rivis/openmpi-delattrorder/commits/all
Changesets eaa2432 and ace994b are for ticket #3123,
and the other 7 latest changesets are for bug/typo fixes.

Regards,
KAWASHIMA Takahiro

> Jeff,
> 
> OK. I'll try implementing George's idea and then you can compare which
> one is simpler.
> 
> Regards,
> KAWASHIMA Takahiro
> 
> > Not that I'm aware of; that would be great.
> > 
> > Unlike George, however, I'm not concerned about converting to linear 
> > operations for attributes.
> > 
> > Attributes are not used often, but when they are:
> > 
> > a) there aren't many of them (so a linear penalty is trivial)
> > b) they're expected to be low performance
> > 
> > So if it makes the code simpler, I certainly don't mind linear operations.
> > 
> > 
> > 
> > On Jan 17, 2013, at 9:32 AM, KAWASHIMA Takahiro 
> >  wrote:
> > 
> > > George,
> > > 
> > > Your idea makes sense.
> > > Is anyone working on it? If not, I'll try.
> > > 
> > > Regards,
> > > KAWASHIMA Takahiro
> > > 
> > >> Takahiro,
> > >> 
> > >> Thanks for the patch. I deplore the loss of the hash table in the 
> > >> attribute management, as the potential of transforming all attribute 
> > >> operations to linear complexity is not very appealing.
> > >> 
> > >> As you already took the decision C, it means that at the communicator 
> > >> destruction stage the hash table is not relevant anymore. Thus, I would 
> > >> have converted the hash table to an ordered list (ordered by the 
> > >> creation index, a global entity atomically updated every time an 
> > >> attribute is created), and proceeded to destroy the attributes in the 
> > >> desired order. Thus instead of having a linear operation for every 
> > >> operation on attributes, we only have a single linear operation per 
> > >> communicator (and this during the destruction stage).
> > >> 
> > >>  George.
> > >> 
> > >> On Jan 16, 2013, at 16:37 , KAWASHIMA Takahiro 
> > >>  wrote:
> > >> 
> > >>> Hi,
> > >>> 
> > >>> I've implemented ticket #3123 "MPI-2.2: Ordering of attribution deletion
> > >>> callbacks on MPI_COMM_SELF".
> > >>> 
> > >>> https://svn.open-mpi.org/trac/ompi/ticket/3123
> > >>> 
> > >>> As this ticket says, attributes had been stored in unordered hash.
> > >>> So I've replaced opal_hash_table_t with opal_list_t and made necessary
> > >>> modifications for it. And I've also fixed some multi-threaded concurrent
> > >>> (get|set|delete)_attr call issues.
> > >>> 
> > >>> By this modification, following behavior changes are introduced.
> > >>> 
> > >>> (A) MPI_(Comm|Type|Win)_(get|set|delete)_attr function may be slower
> > >>> for MPI objects that have many attributes attached.
> > >>> (B) When the user-defined delete callback function is called, the
> > >>> attribute is already removed from the list. In other words,
> > >>> if MPI_(Comm|Type|Win)_get_attr is called by the user-defined
> > >>> delete callback function for the same attribute key, it returns
> > >>> flag = false.
> > >>> (C) Even if the user-defined delete callback function returns non-
> > >>> MPI_SUCCESS value, the attribute is not reverted to the list.
> > >>> 
> > >>> (A) is due to a sequential list search instead of a hash. See find_value
> > >>> function for its implementation.
> > >>> (B) and (C) are due to an atomic deletion of the attribute to allow
> > >>> multi-threaded concurrent (get|set|delete)_attr call in 
> > >>> MPI_THREAD_MULTIPLE.
> > >>> See ompi_attr_delete function for its implementation. I think this does
> > >>> not matter because MPI standard doesn't specify behavior in such cases.
> > >>> 
> > >>> The patch for Open MPI trunk is attached. If you like it, take in
> > >>> this patch.
> > >>> 
> > >>> Though I'm an employee of a company, this is my independent and private
> > >>> work at my home. No intellectual property from my company. If needed,
> > >>> I'll sign to Individual Contributor License Agreement.
diff -r 287add548d08

Re: [OMPI devel] [EXTERNAL] Re: RTE Framework

2013-01-24 Thread Richard Graham
So 3 units of modularity at this stage - got it.


Thanks,
Rich

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of Barrett, Brian W
Sent: Wednesday, January 23, 2013 5:26 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] [EXTERNAL] Re: RTE Framework

That's not entirely true; there's some state that's required to be held by the 
RTE framework (the ompi_process_info structure), but it's minimal and does not 
scale with the number of peers in a job.

In terms of interface, there are now three MPI frameworks which encompass the set 
of functionality the MPI layer needs: rte, pubsub, and dpm (the last two are 
the dynamic process stuff).  The RTE framework is a fairly small set of 
functions, probably 20?  I'm hoping we can shrink it slightly over time, but 
it's going to require some thought and changes to the OMPI layer, so I didn't 
want to do it all in one go.
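
To give a feel for the "thin renaming" shape, a mapping header along these
lines (hypothetical names only; these are not the real ompi/mca/rte symbols)
is roughly what a component boils down to:

    /* rte_sketch.h -- purely illustrative; every name here is invented.
     * The MPI layer codes against one small rte-prefixed API, and each
     * component maps it onto its runtime with little more than renames. */
    #if defined(SKETCH_RTE_ORTE)
        /* ORTE-backed component: calls forward almost 1:1 to the runtime. */
    #   define sketch_rte_init(pargc, pargv)   sketch_orte_init(pargc, pargv)
    #   define sketch_rte_abort(code, msg)     sketch_orte_abort(code, msg)
    #   define sketch_rte_modex(procs)         sketch_orte_modex(procs)
    #   define sketch_rte_barrier()            sketch_orte_barrier()
    #elif defined(SKETCH_RTE_PMI)
        /* PMI-backed component: same API, different runtime underneath. */
    #   define sketch_rte_init(pargc, pargv)   sketch_pmi_init()
    #   define sketch_rte_abort(code, msg)     sketch_pmi_abort(code, msg)
    #   define sketch_rte_modex(procs)         sketch_pmi_kvs_exchange(procs)
    #   define sketch_rte_barrier()            sketch_pmi_barrier()
    #endif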

Brian

On 1/23/13 8:03 AM, "Ralph Castain"  wrote:

>I'm not entirely sure what you're asking here. There is no state at all 
>in the MPI layer - just a set of function calls. Each component in the 
>ompi/mca/rte framework is required to map those function calls to their 
>own implementation. The function calls themselves are just a rename of 
>the current ORTE calls, so the implementations must provide the same 
>functionality - they are simply free to do so however they choose.
>
>
>On Jan 22, 2013, at 11:31 PM, Richard Graham 
>wrote:
>
>> Brian,
>>  First - thanks.  I am very happy this is proceeding.
>>  General question here - do you have any idea how much global state 
>>sits behind the current implementation?  What I am trying to gauge is at 
>>what level of granularity one can bring in additional capabilities.
>>  I have not looked in detail yet, but will in the near future.
>> 
>> Thanks,
>> Rich
>> 
>> -Original Message-
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] 
>>On Behalf Of Barrett, Brian W
>> Sent: Monday, January 21, 2013 9:31 PM
>> To: Open MPI Developers
>> Subject: [OMPI devel] RFC: RTE Framework
>> 
>> Hi all -
>> 
>> As discussed at the December developer's meeting, a number of us have 
>>been working on a framework in OMPI to encompass the RTE resources 
>>(typically provided by ORTE).  This follows on work Oak Ridge did on 
>>the ORCA layer, which ended up having a number of technical challenges 
>>and was dropped for a simpler approach.
>> 
>> The interface is still a work in progress and designed around the 
>>concept that the ORTE component is a thin renaming around ORTE itself 
>>(this was one of the points the ORTE developers felt strongly about).
>>We think it's ready for comments and coming into the trunk, so are 
>>trying to get it looked at by a broader community.  The Mercurial 
>>repository is available
>> at:
>> 
>>  https://bitbucket.org/rhc/ompi-trunk
>> 
>> This work is focussed only on the creation of a framework to 
>>encompass the RTE interface between OMPI and ORTE.  There are 
>>currently two
>>components:
>> the ORTE component and a test component implemented over PMI.  The 
>>PMI component is only really useful if ORTE is disabled at autogen 
>>time with the --no-orte option to autogen.  Future work to build 
>>against an external OMPI (in progress, on a different branch) will 
>>make using non-orte components slightly more useful.
>> 
>> Anyway, if there aren't any major comments, I'll plan on bringing 
>>this work to the trunk this weekend (Jan 26/27).
>> 
>> Brian
>> 
>> --
>>  Brian W. Barrett
>>  Scalable System Software Group
>>  Sandia National Laboratories
>> 
>> 
>> 
>> 


--
  Brian W. Barrett
  Scalable System Software Group
  Sandia National Laboratories






Re: [OMPI devel] RTE Framework

2013-01-24 Thread Richard Graham
I was trying to figure out what the new interface provides.  Is it supposed to 
provide the ability to replace the entire run-time functionality, does it increase 
the modularity of the RTE, or something else?

Thanks,
Rich

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Wednesday, January 23, 2013 5:05 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] RTE Framework

I'm not entirely sure what you're asking here. There is no state at all in the 
MPI layer - just a set of function calls. Each component in the ompi/mca/rte 
framework is required to map those function calls to their own implementation. 
The function calls themselves are just a rename of the current ORTE calls, so 
the implementations must provide the same functionality - they are simply free 
to do so however they choose.


On Jan 22, 2013, at 11:31 PM, Richard Graham  wrote:

> Brian,
>  First - thanks.  I am very happy this is proceeding.
>  General question here - do you have any idea how much global state sits 
> behind the current implementation?  What I am trying to gauge is at what level 
> of granularity one can bring in additional capabilities.
>  I have not looked in detail yet, but will in the near future.
> 
> Thanks,
> Rich
> 
> -Original Message-
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On 
> Behalf Of Barrett, Brian W
> Sent: Monday, January 21, 2013 9:31 PM
> To: Open MPI Developers
> Subject: [OMPI devel] RFC: RTE Framework
> 
> Hi all -
> 
> As discussed at the December developer's meeting, a number of us have been 
> working on a framework in OMPI to encompass the RTE resources (typically 
> provided by ORTE).  This follows on work Oak Ridge did on the ORCA layer, 
> which ended up having a number of technical challenges and was dropped for a 
> simpler approach.
> 
> The interface is still a work in progress and designed around the concept that 
> the ORTE component is a thin renaming around ORTE itself (this was one of the 
> points the ORTE developers felt strongly about).  We think it's ready for 
> comments and coming into the trunk, so are trying to get it looked at by a 
> broader community.  The Mercurial repository is available
> at:
> 
>  https://bitbucket.org/rhc/ompi-trunk
> 
> This work is focussed only on the creation of a framework to encompass the 
> RTE interface between OMPI and ORTE.  There are currently two components:
> the ORTE component and a test component implemented over PMI.  The PMI 
> component is only really useful if ORTE is disabled at autogen time with the 
> --no-orte option to autogen.  Future work to build against an external OMPI 
> (in progress, on a different branch) will make using non-orte components 
> slightly more useful.
> 
> Anyway, if there aren't any major comments, I'll plan on bringing this work 
> to the trunk this weekend (Jan 26/27).
> 
> Brian
> 
> --
>  Brian W. Barrett
>  Scalable System Software Group
>  Sandia National Laboratories
> 
> 
> 
> 



Re: [OMPI devel] RTE Framework

2013-01-24 Thread Ralph Castain
Cool - it doesn't actually increase modularity or anything. The goal wasn't to 
generalize things very much, but rather to provide a way to use different 
ORTE-like versions - e.g., if you want to use PMI without any of the rest of 
the ORTE support, or if you want to attach a fault tolerant version of ORTE.

Any RTE capable of providing the required functionality for modex and barrier 
should be okay, though they might require some ugliness in their rte component 
for the integration.

On Jan 24, 2013, at 8:17 AM, Richard Graham  wrote:

> I was trying to figure out what the new interface provides.  Is it supposed 
> to provide the ability to replace the entire run-time functionality, does it 
> increase the modularity of the RTE, or something else?
> 
> Thanks,
> Rich
> 
> -Original Message-
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On 
> Behalf Of Ralph Castain
> Sent: Wednesday, January 23, 2013 5:05 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] RTE Framework
> 
> I'm not entirely sure what you're asking here. There is no state at all in 
> the MPI layer - just a set of function calls. Each component in the 
> ompi/mca/rte framework is required to map those function calls to their own 
> implementation. The function calls themselves are just a rename of the 
> current ORTE calls, so the implementations must provide the same 
> functionality - they are simply free to do so however they choose.
> 
> 
> On Jan 22, 2013, at 11:31 PM, Richard Graham  wrote:
> 
>> Brian,
>> First - thanks.  I am very happy this is proceeding.
>> General question here - do you have any idea how much global state sits 
>> behind the current implementation?  What I am trying to gauge is at what level 
>> of granularity one can bring in additional capabilities.
>> I have not looked in detail yet, but will in the near future.
>> 
>> Thanks,
>> Rich
>> 
>> -Original Message-
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On 
>> Behalf Of Barrett, Brian W
>> Sent: Monday, January 21, 2013 9:31 PM
>> To: Open MPI Developers
>> Subject: [OMPI devel] RFC: RTE Framework
>> 
>> Hi all -
>> 
>> As discussed at the December developer's meeting, a number of us have been 
>> working on a framework in OMPI to encompass the RTE resources (typically 
>> provided by ORTE).  This follows on work Oak Ridge did on the ORCA layer, 
>> which ended up having a number of technical challenges and was dropped for a 
>> simpler approach.
>> 
>> The interface is still a work in progress and designed around the concept 
>> that the ORTE component is a thin renaming around ORTE itself (this was one 
>> of the points the ORTE developers felt strongly about).  We think it's ready 
>> for comments and coming into the trunk, so are trying to get it looked at by 
>> a broader community.  The Mercurial repository is available
>> at:
>> 
>> https://bitbucket.org/rhc/ompi-trunk
>> 
>> This work is focussed only on the creation of a framework to encompass the 
>> RTE interface between OMPI and ORTE.  There are currently two components:
>> the ORTE component and a test component implemented over PMI.  The PMI 
>> component is only really useful if ORTE is disabled at autogen time with the 
>> --no-orte option to autogen.  Future work to build against an external OMPI 
>> (in progress, on a different branch) will make using non-orte components 
>> slightly more useful.
>> 
>> Anyway, if there aren't any major comments, I'll plan on bringing this work 
>> to the trunk this weekend (Jan 26/27).
>> 
>> Brian
>> 
>> --
>> Brian W. Barrett
>> Scalable System Software Group
>> Sandia National Laboratories
>> 
>> 
>> 
>> 




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27881 - trunk/ompi/mca/btl/tcp

2013-01-24 Thread George Bosilca
http://fault-tolerance.org/

  George.

On Wed, Jan 23, 2013 at 5:10 PM, Jeff Squyres (jsquyres)
 wrote:
> On Jan 23, 2013, at 10:27 AM, George Bosilca  wrote:
>
>> While we always strive to improve this functionality, it was available as a 
>> separate software package for quite some time.
>
> What separate software package are you referring to?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>


Re: [OMPI devel] New ARM patch

2013-01-24 Thread Jeff Squyres (jsquyres)
On Jan 24, 2013, at 8:18 AM, Leif Lindholm  wrote:

> OK. In which case I probably _should_ be on that list.
> *cough* might I however suggest that a statement to that effect is added
> to http://www.open-mpi.org/community/lists/ompi.php ?

Fair point.  Done.

>> I tested this patch in v1.6 and v1.7 on my Pi, and it seems to work
>> just fine.  "make check" passes all the ASM tests.
> 
> Just to be perfectly clear: it wouldn't pass on ARMv5 though, and the ARMv6
> ASM test executed with NOPs for barriers, although it would correctly
> pass all other tests.

Mmm.  Ok.  So is this a correct list of what is supported right now (i.e., in 
v1.6 with your patch)

ARM4: no
ARM5: no
ARM6: sorta (not multi-core, or anywhere we would need barriers)
ARM7: yes

?

How would George's patch have changed that list?
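
(For background, a rough illustration of why barriers are the sticking point
across these revisions; this is not the actual Open MPI assembly code, and
the architecture-macro list is abbreviated:)

    /* Illustrative only -- not the Open MPI atomics code.  The available
     * barrier instruction differs by architecture revision:
     *   ARMv7+:          dedicated DMB instruction
     *   ARMv6:           DMB via a CP15 coprocessor operation
     *   ARMv5 and older: no architectural data memory barrier */
    static inline void sketch_memory_barrier(void)
    {
    #if defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_7__)
        __asm__ __volatile__ ("dmb" ::: "memory");
    #elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6ZK__)
        __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" :: "r" (0) : "memory");
    #else
        /* nothing architectural to emit; a compiler barrier at best */
        __asm__ __volatile__ ("" ::: "memory");
    #endif
    }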

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27880 - trunk/ompi/request

2013-01-24 Thread Kawashima, Takahiro
Jeff,

I've filed the ticket.
https://svn.open-mpi.org/trac/ompi/ticket/3475

Thanks,
Takahiro Kawashima,
MPI development team,
Fujitsu

> Many thanks for the summary!
> 
> Can you file tickets about this stuff against 1.7?  Include your patches, 
> etc. 
> 
> These are pretty obscure issues and I'm ok not fixing them in the 1.6 branch 
> (unless someone has a burning desire to get them fixed in 1.6). 
> 
> But we should properly track and fix these in the 1.7 series. I'd mark them 
> as "critical" so that they don't get lost in the wilderness of other bugs. 
> 
> Sent from my phone. No type good. 
> 
> On Jan 22, 2013, at 8:57 PM, "Kawashima, Takahiro" 
>  wrote:
> 
> > George,
> > 
> > I reported the bug three months ago.
> > Your commit r27880 resolved one of the bugs reported by me,
> > in another approach.
> > 
> >  http://www.open-mpi.org/community/lists/devel/2012/10/11555.php
> > 
> > But other bugs are still open.
> > 
> > "(1) MPI_SOURCE of MPI_Status for a null request must be MPI_ANY_SOURCE."
> > in my previous mail is not fixed yet. This can be fixed by my patch
> > (ompi/mpi/c/wait.c and ompi/request/request.c part only) attached
> > to another mail of mine.
> > 
> >  http://www.open-mpi.org/community/lists/devel/2012/10/11561.php
> > 
> > "(2) MPI_Status for an inactive request must be an empty status."
> > in my previous mail is partially fixed. MPI_Wait is fixed by your
> > r27880. But MPI_Waitall and MPI_Testall should be fixed.
> > Code similar to your r27880 should be inserted into
> > ompi_request_default_wait_all and ompi_request_default_test_all.
> > 
> > You can confirm the fixes by the test program status.c attached in
> > my previous mail. Run with -n 2. 
> > 
> >  http://www.open-mpi.org/community/lists/devel/2012/10/11555.php
> > 
> > Regards,
> > Takahiro Kawashima,
> > MPI development team,
> > Fujitsu
> > 
> >> To be honest it was hanging in one of my repos for some time. If I'm not 
> >> mistaken it is somehow related to one active ticket (but I couldn't find 
> >> the info). It might be good to push it upstream.
> >> 
> >>  George.
> >> 
> >> On Jan 22, 2013, at 16:27 , "Jeff Squyres (jsquyres)"  
> >> wrote:
> >> 
> >>> George --
> >>> 
> >>> Is there any reason not to CMR this to v1.6 and v1.7?
> >>> 
> >>> 
> >>> On Jan 21, 2013, at 6:35 AM, svn-commit-mai...@open-mpi.org wrote:
> >>> 
>  Author: bosilca (George Bosilca)
>  Date: 2013-01-21 06:35:42 EST (Mon, 21 Jan 2013)
>  New Revision: 27880
>  URL: https://svn.open-mpi.org/trac/ompi/changeset/27880
>  
>  Log:
>  My understanding is that an MPI_WAIT() on an inactive request should
>  return the empty status (MPI 3.0 page 52 line 46).
>  
>  Text files modified: 
>  trunk/ompi/request/req_wait.c | 3 +++
>   
>  1 files changed, 3 insertions(+), 0 deletions(-)
>  
>  Modified: trunk/ompi/request/req_wait.c
>  ==
>  --- trunk/ompi/request/req_wait.c    Sat Jan 19 19:33:42 2013    (r27879)
>  +++ trunk/ompi/request/req_wait.c    2013-01-21 06:35:42 EST (Mon, 21 Jan 2013)    (r27880)
>  @@ -61,6 +61,9 @@
>    }
>    if( req->req_persistent ) {
>    if( req->req_state == OMPI_REQUEST_INACTIVE ) {
>  +if (MPI_STATUS_IGNORE != status) {
>  +*status = ompi_status_empty;
>  +}
>    return OMPI_SUCCESS;
>    }
>    req->req_state = OMPI_REQUEST_INACTIVE;