Re: [OMPI devel] MPI_Win_get_group

2007-07-31 Thread Dries Kimpe

* Lisandro Dalcin  [2007-07-30 18:19:21]:

> On 7/30/07, George Bosilca  wrote:
> > In the data-type section there is advice to implementors stating
> > that a copy can simply increase the reference count if applicable.
> > So, we might want to apply the same logic here ...

> BTW, you just mentioned another obscure case. Does this apply to NAMED
> datatypes? This issue is really cumbersome in File.Get_view().

The MPI_File_get_view description in the standard has some issues related
to copies and named datatypes:

see 
http://www-unix.mcs.anl.gov/~gropp/projects/parallel/MPI/mpi-errata/discuss/fileview/fileview-1-clean.txt
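
To make the caller's dilemma concrete, here is a minimal fragment (the
function name is illustrative only, not taken from any real code):

    #include <mpi.h>

    /* Illustrative only: what should happen to 'etype'/'filetype'? */
    static void inspect_view(MPI_File fh)
    {
        MPI_Offset   disp;
        MPI_Datatype etype, filetype;
        char         datarep[MPI_MAX_DATAREP_STRING];

        MPI_File_get_view(fh, &disp, &etype, &filetype, datarep);

        /* The returned handles may be copies that the caller must free,
         * but named (predefined) datatypes should not be freed; deciding
         * which case applies is the ambiguity discussed in the errata. */
    }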

  Dries






Re: [OMPI devel] minor bug report for building openmpi-1.2.3 on cygwin

2007-07-31 Thread George Bosilca

Andrew,

Thanks for the info. I have fixed this problem for the 1.3 release. If you
download a nightly build after revision 15711, it will be corrected.


  Thanks,
george.

On Jul 27, 2007, at 10:19 AM, Andrew Lofthouse wrote:


Hi,

I've just built and installed openmpi-1.2.3 on cygwin.  It seems that
most files depend on opal/mca/timer/windows/timer_windows.h, but
opal/mca/timer/windows/timer_windows_component.c depends on
opal/timer/windows/timer_windows_component.h (which doesn't exist).  I
simply copied timer_windows.h to timer_windows_component.h and it built
correctly.  I haven't yet compiled any MPI applications to check correct
operation.

Regards,

AJL




Re: [OMPI devel] MPI_Win_get_group

2007-07-31 Thread Lisandro Dalcin
On 7/31/07, Dries Kimpe  wrote:
> The MPI_File_get_view description in the standard has some issues related
> to copies and named datatypes:
>
> see
> http://www-unix.mcs.anl.gov/~gropp/projects/parallel/MPI/mpi-errata/discuss/fileview/fileview-1-clean.txt

Indeed, your comment was exactly the source of my comment (BTW, thank
you; this helped me to fix my Python wrappers).

In general, I think the MPI standard should be fixed/clarified in many
places regarding the handling of returned references. Testing for
predefined Comm and Group handles is rather easy, but for Datatypes it
is really cumbersome. Perhaps an MPI_Type_is_named(MPI_Datatype, int *flag)
would help a lot. What do you think?
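
For now, the only portable test I know of is to look at the combiner
reported by MPI_Type_get_envelope: named (predefined) datatypes report
MPI_COMBINER_NAMED. A minimal sketch (the helper name is just my own,
not an existing API):

    #include <mpi.h>

    /* Return non-zero if 'datatype' is a named (predefined) datatype. */
    static int type_is_named(MPI_Datatype datatype)
    {
        int num_ints, num_addrs, num_types, combiner;
        MPI_Type_get_envelope(datatype, &num_ints, &num_addrs,
                              &num_types, &combiner);
        return combiner == MPI_COMBINER_NAMED;
    }

This is what a wrapper has to do today after, e.g., MPI_File_get_view,
to decide whether the returned etype/filetype handles should be freed.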



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] MPI_Win_get_group

2007-07-31 Thread Jeff Squyres

On Jul 31, 2007, at 4:52 PM, Lisandro Dalcin wrote:


In general, I think the MPI standard should be fixed/clarified in many
places regarding the handling of returned references. Testing for
predefined Comm and Group handles is rather easy, but for Datatypes it
is really cumbersome. Perhaps an MPI_Type_is_named(MPI_Datatype, int *flag)
would help a lot. What do you think?


Just curious -- why do you need to know if a handle refers to a  
predefined object?


--
Jeff Squyres
Cisco Systems



[OMPI devel] pml failures?

2007-07-31 Thread Jeff Squyres
I'm getting a pile of test failures when running with the openib and
tcp BTLs on the trunk.  Gleb is getting some failures, too, but his
seem to be different from mine.


Here's what I'm seeing from manual MTT runs on my SVN/development  
install -- did you know that MTT could do that? :-)


+-+---+--+--+--+--+
| Phase   | Section   | Pass | Fail | Time out | Skip |
+-+---+--+--+--+--+
| Test Run| intel | 442  | 0| 26   | 0|
| Test Run| ibm   | 173  | 3| 1| 3|
+-+---+--+--+--+--+

The tests that are failing are:

*** WARNING: Test: MPI_Recv_pack_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Ssend_ator_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Irecv_pack_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Isend_ator_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Irsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Ssend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Send_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Send_ator_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Rsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Reduce_loc_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Isend_ator2_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Issend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Isend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Send_ator2_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Issend_ator_c, np=16, variant=1: TIMED OUT (failed)

*** WARNING: Test: comm_join, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: getcount, np=16, variant=1: FAILED
*** WARNING: Test: spawn, np=3, variant=1: FAILED
*** WARNING: Test: spawn_multiple, np=3, variant=1: FAILED

I'm not too worried about the comm spawn/join tests because I think  
they're heavily oversubscribing the nodes and therefore timing out.   
These were all from a default trunk build running with "mpirun --mca  
btl openib,self".


For all of these tests, I'm running on 4 nodes, 4 cores each, but  
they have varying numbers of network interfaces:


            nodes 1,2            nodes 3,4
  openib    3 active ports       2 active ports
  tcp       4 tcp interfaces     3 tcp interfaces

Is anyone else seeing these kinds of failures?

--
Jeff Squyres
Cisco Systems



[OMPI devel] openib modular wireup

2007-07-31 Thread Jeff Squyres

Short version:
--

The modular wireup code on /tmp/jms-modular-wireup seems to be  
working.  Can people give it a whirl before I bring it back to the  
trunk?  The more esoteric your hardware setup, the better.


Longer version:
---

I think that I have completed round 1 of the modular wireup work in
/tmp/jms-modular-wireup, meaning that all the wireup code has been
moved out of btl_openib_endpoint.* and into connect/*.  The endpoint.c
file now simply calls the connect interface through a function pointer
(allowing the choice of the current RML-based wireup or the RDMA CM).
The selected connect "module" will call back to the openib endpoint for
two things (a rough sketch of the interface's shape follows the list):

1. to post receive buffers on a locally-created-but-not-yet-connected QP
2. to be notified when the QP is fully connected and ready to be used
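
To make that shape concrete, here is a rough sketch in plain C.  The
identifiers below are illustrative only; they are not the actual names
used in connect/* or btl_openib_endpoint.*:

    /* Opaque endpoint type from the openib BTL (name is illustrative). */
    struct mca_btl_openib_endpoint_t;

    /* One connect "module" per wireup mechanism: the current RML-based
     * wireup or, eventually, the RDMA CM. */
    typedef struct openib_connect_module_t {
        const char *name;  /* e.g. "rml" or "rdma_cm" */
        int (*start_connect)(struct mca_btl_openib_endpoint_t *endpoint);
        int (*finalize)(void);
    } openib_connect_module_t;

    /* Callbacks from the selected module back into the endpoint code:    */
    /* 1) post receive buffers on a created-but-not-yet-connected QP, and */
    int openib_endpoint_post_recvs(struct mca_btl_openib_endpoint_t *ep);
    /* 2) notification that the QP is fully connected and ready to use.   */
    int openib_endpoint_connected(struct mca_btl_openib_endpoint_t *ep);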

This cleaned up the endpoint.* code a *lot*.  I also simplified the  
RML connection code a bit -- I removed some useless sub-functions, etc.


I *think* that this new connection code is all working, but per  
http://www.open-mpi.org/community/lists/devel/2007/07/2058.php, I'm  
seeing other weird failures so I'm a little reluctant to put this  
back on the trunk until I know that everything is working properly.   
Granted, the failures in the other post sound like pml errors and  
this should be a wholly separate issue (we would get different  
warnings/errors if the btl failed to connect), but still -- it seems  
a little safer to be prudent.


Still to do:

- make the static rate be exchanged and set properly during the RML  
wireup

- RDMA CM support (it returns ERR_NOT_IMPLEMENTED right now)

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] MPI_Win_get_group

2007-07-31 Thread Lisandro Dalcin
On 7/31/07, Jeff Squyres  wrote:
> Just curious -- why do you need to know if a handle refers to a
> predefined object?

If I understand correctly, new handles should be freed in order not to
leak resources, to follow good programming practice, and to be
completely sure that a valgrind run does not report any problems.

I am working on the development of MPI for Python, a port of MPI to
Python, a high-level language with automatic memory management. That
said, in such an environment, having to call XXX.Free() for every
object I get from a call like XXX.Get_something() is a really
unnecessary pain.

Many things in MPI are LOCAL (datatypes, groups, predefined
operations), and in general destroying them from user space is
guaranteed by MPI not to conflict with system (MPI) space or with
communication (i.e. if you create a derived datatype for use in the
construction of another derived datatype, you can safely free the
first).
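
A minimal, self-contained sketch of that last point (the particular
datatypes here are arbitrary, chosen only for illustration):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Datatype vec, twovecs;

        MPI_Init(&argc, &argv);

        MPI_Type_vector(4, 1, 2, MPI_DOUBLE, &vec);  /* first derived type */
        MPI_Type_contiguous(2, vec, &twovecs);       /* built on top of it */
        MPI_Type_free(&vec);       /* legal: 'vec' is only marked for
                                      deallocation; 'twovecs' still works  */
        MPI_Type_commit(&twovecs);

        /* ... use 'twovecs' in communication ... */

        MPI_Type_free(&twovecs);
        MPI_Finalize();
        return 0;
    }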

Well, for all those LOCAL objects I could implement automatic
deallocation of handles in Python (for Comm, Win, and File that is not
so easy, as freeing them is a collective operation AFAIK, and
automatically freeing them can lead to deadlocks).

My Python wrappers (mpi4py) are intended to be used on any platform
with any MPI implementation. But things are not so easy, as there are
many corner cases in the MPI standard.

Python is a wonderful, powerful language that is very friendly for
writing things; proof of that is the many bug reports I have provided
here. By using Python, I can run all my unit test scripts in a single
MPI run, so they have the potential to find interaction problems
between all parts of MPI. If any of you OMPI developers have some
knowledge of Python, I invite you to try mpi4py: you would be able to
write many, many tests very quickly, not only for things that should
work but also for things that should fail.

Sorry for the long mail. In short, many things in MPI are not clearly
designed for languages other than C and Fortran. Even in the C++
specification there are things that are unacceptable, like the open
door to the problem of dangling references, which could be avoided at
negligible cost. Anyway, all those issues are minor for me, and the
MPI specification is just great. I hope I can find the time to
contribute to the MPI-2.1 effort to better define MPI behavior in the
corner cases (fortunately, there are really only a small number of
them).

Regards,

-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594