[OMPI devel] MPI_Mrecv(..., MPI_STATUS_IGNORE) in Open MPI 1.7.1

2013-05-01 Thread Lisandro Dalcin
It seems that Mrecv() tries to write to the status argument even when it
is STATUS_IGNORE. Looking at the sources (pmrecv.c and pmprobe.c),
there are some memcheck code paths that access status but do not check
for STATUS_IGNORE; please review them.

$ cat tmp.c
#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Message message;
  MPI_Init(&argc, &argv);
  message = MPI_MESSAGE_NO_PROC;
  MPI_Mrecv(NULL, 0, MPI_BYTE, &message, MPI_STATUS_IGNORE);
  MPI_Finalize();
  return 0;
}

$ mpicc tmp.c
$ valgrind ./a.out
...
==17489==
==17489== Invalid write of size 8
==17489==at 0x4CA811C: PMPI_Mrecv (pmrecv.c:62)
==17489==by 0x400816: main (in /tmp/a.out)
==17489==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==17489==
[localhost:17489] *** Process received signal ***
[localhost:17489] Signal: Segmentation fault (11)
[localhost:17489] Signal code: Address not mapped (1)
[localhost:17489] Failing at address: (nil)
...


--
Lisandro Dalcin
---
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169


[OMPI devel] MPI_Is_thread_main() with provided=MPI_THREAD_SERIALIZED

2013-09-04 Thread Lisandro Dalcin
I'm using Open MPI 1.6.5 as packaged in Fedora 19. This build does not
enable THREAD_MULTIPLE support:

$ ompi_info | grep Thread
  Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)

In my code I call MPI_Init_thread(required=MPI_THREAD_MULTIPLE). After
that, MPI_Query_thread() returns MPI_THREAD_SERIALIZED. But calling
MPI_Is_thread_main() always returns TRUE, both in the main thread and
in newly spawned threads.
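
For reference, here is a minimal sketch of the kind of test I am running
(simplified from my actual code, error checking omitted):

#include <mpi.h>
#include <stdio.h>
#include <pthread.h>

static void *thread_fn(void *arg)
{
  int flag;
  /* expected to report "not main" from a spawned thread */
  MPI_Is_thread_main(&flag);
  printf("spawned thread: is_thread_main=%d\n", flag);
  return NULL;
}

int main(int argc, char *argv[])
{
  int provided, flag;
  pthread_t t;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  MPI_Query_thread(&provided);
  printf("provided=%d\n", provided);
  MPI_Is_thread_main(&flag);
  printf("main thread: is_thread_main=%d\n", flag);
  pthread_create(&t, NULL, thread_fn, NULL);
  pthread_join(t, NULL);
  MPI_Finalize();
  return 0;
}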

I think this code is wrong for the case provided==MPI_THREAD_SERIALIZED :
https://bitbucket.org/ompiteam/ompi-svn-mirror/src/0a159982d7204d4b4b9fa61771d0fc7e9dc16771/ompi/mpi/c/is_thread_main.c?at=default#cl-50


-- 
Lisandro Dalcin
---
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169


[OMPI devel] Missing MPI 3 definitions

2014-03-27 Thread Lisandro Dalcin
In 1.7.5, you guys bumped MPI_VERSION to 3 but forgot to add
definitions for the following constants:

MPI_ERR_RMA_SHARED
MPI_WEIGHTS_EMPTY

Also, the following two functions are missing:

MPI_Comm_set_info()
MPI_Comm_get_info()

PS: The two missing functions are trivial to provide: the first could
simply ignore the info handle, and the second could just return a
brand-new empty info handle (well, unless you implemented
MPI_Comm_dup_with_info() to actually use the info hints).
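
Just to illustrate what I mean, a rough sketch (hypothetical code, ignoring
Open MPI's internals, error checking, and parameter validation):

#include <mpi.h>

/* hypothetical minimal implementation: accept and ignore the hints */
int MPI_Comm_set_info(MPI_Comm comm, MPI_Info info)
{
  (void) comm; (void) info;
  return MPI_SUCCESS;
}

/* hypothetical minimal implementation: hand back a brand-new empty info */
int MPI_Comm_get_info(MPI_Comm comm, MPI_Info *info_used)
{
  (void) comm;
  return MPI_Info_create(info_used);
}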


-- 
Lisandro Dalcin
---
CIMEC (UNL/CONICET)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1016)
Tel/Fax: +54-342-4511169


[OMPI devel] Missing error strings for MPI_ERR_RMA_XXX error classes

2014-04-10 Thread Lisandro Dalcin
I'm testing openmpi-1.8.

MPI_Error_string() is failing for the following error classes. I
guess you just forgot to update the list of error strings.

MPI_ERR_RMA_RANGE
MPI_ERR_RMA_ATTACH
MPI_ERR_RMA_FLAVOR
MPI_ERR_RMA_SHARED

I'm attaching a simple test code for you to verify the issue.

Additionally, please update the following comment in mpi.h

/* Per MPI-3 p349 47, MPI_ERR_LASTCODE must be >= the last predefined
   MPI_ERR_ code.  So just set it equal to the last code --
   MPI_ERR_RMA_FLAVOR, in this case. */
#define MPI_ERR_LASTCODE  MPI_ERR_RMA_SHARED

The comment is wrong: the last predefined error class is
MPI_ERR_RMA_SHARED, not MPI_ERR_RMA_FLAVOR.


-- 
Lisandro Dalcin
---
CIMEC (UNL/CONICET)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1016)
Tel/Fax: +54-342-4511169
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
  int ierr;
  int errclasses[] = {
MPI_SUCCESS,
MPI_ERR_RMA_RANGE,
MPI_ERR_RMA_ATTACH,
MPI_ERR_RMA_FLAVOR,
MPI_ERR_RMA_SHARED
  };
  int resultlen = 0;
  char errstring[MPI_MAX_ERROR_STRING];
  int i, n = sizeof(errclasses)/sizeof(int);

  MPI_Init(0,0);
  MPI_Comm_set_errhandler(MPI_COMM_WORLD,MPI_ERRORS_RETURN);
  for (i=0; i<n; i++) {
    ierr = MPI_Error_string(errclasses[i], errstring, &resultlen);
    printf("errclass:%d ierr:%d errstring:%s\n", errclasses[i], ierr, errstring);
  }
  MPI_Finalize();
  return 0;
}

[OMPI devel] querying Op commutativity for predefined reduction operations.

2014-04-21 Thread Lisandro Dalcin
IMHO, MPI_Op_commutative() should not fail for predefined reduction operations.

[dalcinl@kw2060 openmpi]$ ompi_info --version
Open MPI v1.8

http://www.open-mpi.org/community/help/
[dalcinl@kw2060 openmpi]$ cat op_commutative.c
#include <mpi.h>
int main(int argc, char *argv[])
{
  int flag;
  MPI_Init(&argc,&argv);
  MPI_Op_commutative(MPI_SUM,&flag);
  MPI_Finalize();
  return 0;
}
[dalcinl@kw2060 openmpi]$ mpicc op_commutative.c
[dalcinl@kw2060 openmpi]$ ./a.out
[kw2060:19303] *** An error occurred in MPI_Op_commutative
[kw2060:19303] *** reported by process [140737157201921,140239272148992]
[kw2060:19303] *** on communicator MPI_COMM_WORLD
[kw2060:19303] *** MPI_ERR_OP: invalid reduce operation
[kw2060:19303] *** MPI_ERRORS_ARE_FATAL (processes in this
communicator will now abort,
[kw2060:19303] ***and potentially your MPI job)


-- 
Lisandro Dalcin
---
CIMEC (UNL/CONICET)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1016)
Tel/Fax: +54-342-4511169


[OMPI devel] Win_fence() with assertion=MPI_MODE_NOPRECEDE|MPI_MODE_NOSUCCEED

2014-04-21 Thread Lisandro Dalcin
Open MPI errors in Win_fence() when the assertion contains both
MPI_MODE_NOPRECEDE and MPI_MODE_NOSUCCEED

Could you explain to me why the following code is wrong? Please note that
the fence call with assertion != 0 is preceded and followed by fence
calls with assertion = 0, and I'm not making any modification to the
local window, nor issuing any RMA call. My understanding of MPI RMA
operations is quite limited, but I would say that my code is valid and
should not fail.

[dalcinl@kw2060 openmpi]$ cat win_fence.c
#include <mpi.h>
int main(int argc, char *argv[])
{
  MPI_Win win;
  MPI_Init(&argc, &argv);
  MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL, MPI_COMM_SELF, &win);
  MPI_Win_fence(0, win);
  MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPUT |
                MPI_MODE_NOPRECEDE | MPI_MODE_NOSUCCEED, win);
  MPI_Win_fence(0, win);
  MPI_Finalize();
  return 0;
}
[dalcinl@kw2060 openmpi]$ mpicc win_fence.c
[dalcinl@kw2060 openmpi]$ ./a.out
[kw2060:19890] *** An error occurred in MPI_Win_fence
[kw2060:19890] *** reported by process [140737129086977,0]
[kw2060:19890] *** on win rdma window 3
[kw2060:19890] *** MPI_ERR_ASSERT: invalid assert argument
[kw2060:19890] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[kw2060:19890] ***and potentially your MPI job)
[dalcinl@kw2060 openmpi]$


-- 
Lisandro Dalcin
---
CIMEC (UNL/CONICET)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1016)
Tel/Fax: +54-342-4511169


[OMPI devel] MPI_Type_create_hindexed_block() segfaults

2014-04-21 Thread Lisandro Dalcin
I believe the problem is in the following source code line
(file:ompi_datatype_args.c, line:221):

https://bitbucket.org/ompiteam/ompi-svn-mirror/src/v1.8/ompi/datatype/ompi_datatype_args.c?at=v1.8#cl-221

I think you should just remove that bogus line, and that's all.


[dalcinl@kw2060 openmpi]$ cat type_hindexed_block.c
#include <mpi.h>
int main(int argc, char *argv[])
{
  MPI_Aint disps[] = {0};
  MPI_Datatype datatype;
  MPI_Init(&argc, &argv);
  MPI_Type_create_hindexed_block(1, 1, disps, MPI_BYTE, &datatype);
  MPI_Finalize();
  return 0;
}
[dalcinl@kw2060 openmpi]$ mpicc type_hindexed_block.c
[dalcinl@kw2060 openmpi]$ ./a.out
[kw2060:20304] *** Process received signal ***
[kw2060:20304] Signal: Segmentation fault (11)
[kw2060:20304] Signal code: Address not mapped (1)
[kw2060:20304] Failing at address: 0x6a
[kw2060:20304] [ 0] /lib64/libpthread.so.0[0x327c40f750]
[kw2060:20304] [ 1] /lib64/libc.so.6[0x327bc94126]
[kw2060:20304] [ 2]
/home/devel/mpi/openmpi/1.8.0/lib/libmpi.so.1(ompi_datatype_set_args+0x7f1)[0x7f8f0158b62a]
[kw2060:20304] [ 3]
/home/devel/mpi/openmpi/1.8.0/lib/libmpi.so.1(MPI_Type_create_hindexed_block+0x24d)[0x7f8f015cedc8]
[kw2060:20304] [ 4] ./a.out[0x40080c]
[kw2060:20304] [ 5] /lib64/libc.so.6(__libc_start_main+0xf5)[0x327bc21d65]
[kw2060:20304] [ 6] ./a.out[0x4006f9]
[kw2060:20304] *** End of error message ***
Segmentation fault (core dumped)

-- 
Lisandro Dalcin
---
CIMEC (UNL/CONICET)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1016)
Tel/Fax: +54-342-4511169


[OMPI devel] Issues with MPI_Add_error_class()

2014-04-21 Thread Lisandro Dalcin
It seems the implementation of MPI_Add_error_class() is out of sync
with the definition of MPI_ERR_LASTCODE.

Please review the list of error classes in mpi.h and the code in this
file: 
https://bitbucket.org/ompiteam/ompi-svn-mirror/src/v1.8/ompi/errhandler/errcode.c

BTW, in that file, none of the MPI_T_ERR_XXX classes are handled. The MPI-3
standard says they should be treated like other MPI error classes.
Trying to get an error string out of them (e.g. MPI_T_ERR_MEMORY)
generates an error.



[dalcinl@kw2060 openmpi]$ cat add_error_class.c
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
  int errorclass,*lastused,flag;
  MPI_Init(&argc, &argv);
  MPI_Add_error_class(&errorclass);
  MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_LASTUSEDCODE, &lastused, &flag);
  printf("errorclass:%d lastused:%d MPI_ERR_LASTCODE:%d\n",
errorclass, *lastused, MPI_ERR_LASTCODE);
  MPI_Finalize();
  return 0;
}
[dalcinl@kw2060 openmpi]$ mpicc add_error_class.c
[dalcinl@kw2060 openmpi]$ ./a.out
errorclass:54 lastused:54 MPI_ERR_LASTCODE:71


[dalcinl@kw2060 openmpi]$ cat error_string.c
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
  char errorstring[MPI_MAX_ERROR_STRING];
  int slen;
  MPI_Init(&argc, &argv);
  MPI_Error_string(MPI_T_ERR_MEMORY, errorstring, &slen);
  printf("errorclass:%d errorstring:%s\n", MPI_T_ERR_MEMORY, errorstring);
  MPI_Finalize();
  return 0;
}
[dalcinl@kw2060 openmpi]$ mpicc error_string.c
[dalcinl@kw2060 openmpi]$ ./a.out
[kw2060:20883] *** An error occurred in MPI_Error_string
[kw2060:20883] *** reported by process [140737332576257,0]
[kw2060:20883] *** on communicator MPI_COMM_WORLD
[kw2060:20883] *** MPI_ERR_ARG: invalid argument of some other kind
[kw2060:20883] *** MPI_ERRORS_ARE_FATAL (processes in this
communicator will now abort,
[kw2060:20883] ***    and potentially your MPI job)

-- 
Lisandro Dalcin
---
CIMEC (UNL/CONICET)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1016)
Tel/Fax: +54-342-4511169


[OMPI devel] Different behaviour with MPI_IN_PLACE in MPI_Reduce_scatter() and MPI_Ireduce_scatter()

2014-04-21 Thread Lisandro Dalcin
I'm not sure this is actually a bug, but the difference may surprise
users. It seems that the implementation of
MPI_Ireduce_scatter(MPI_IN_PLACE,...) (ab?)uses the recvbuf to compute
the intermediate reduction, while MPI_Reduce_scatter(MPI_IN_PLACE,...)
does not.

Look at the following code (set up to be run in up to 16 processes).
While MPI_Reduce_scatter() does not change the second and following
elements of recvbuf, the nonblocking variant does modify the second and
following entries in some ranks.


[dalcinl@kw2060 openmpi]$ cat ireduce_scatter.c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
  int i,size,rank;
  int recvbuf[] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
  int rcounts[] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (size > 16) MPI_Abort(MPI_COMM_WORLD,1);
#ifndef NBCOLL
#define NBCOLL 1
#endif
#if NBCOLL
  {
MPI_Request request;
MPI_Ireduce_scatter(MPI_IN_PLACE, recvbuf, rcounts, MPI_INT,
MPI_SUM, MPI_COMM_WORLD, &request);
MPI_Wait(&request,MPI_STATUS_IGNORE);
  }
#else
  MPI_Reduce_scatter(MPI_IN_PLACE, recvbuf, rcounts, MPI_INT,
 MPI_SUM, MPI_COMM_WORLD);
#endif
  printf("[%d] rbuf[%d]=%2d  expected:%2d\n", rank, 0, recvbuf[0], size);
  for (i=1; i<16; i++)
    printf("[%d] rbuf[%d]=%2d  expected:%2d\n", rank, i, recvbuf[i], 1);
  MPI_Finalize();
  return 0;
}

[OMPI devel] MPI_Comm_create_group()

2014-04-21 Thread Lisandro Dalcin
uninitialised byte(s)
==22675==at 0x327BCBCCF9: _Exit (in /usr/lib64/libc-2.18.so)
==22675==by 0x327BC3948A: __run_exit_handlers (in /usr/lib64/libc-2.18.so)
==22675==by 0x327BC39514: exit (in /usr/lib64/libc-2.18.so)
==22675==by 0x4FEF419: orte_ess_base_app_abort (ess_base_std_app.c:450)
==22675==by 0x4CF53C5: ompi_rte_abort (rte_orte_module.c:81)
==22675==by 0x4C60B04: ompi_mpi_abort (ompi_mpi_abort.c:203)
==22675==by 0x4C4B6AA: backend_fatal (errhandler_predefined.c:346)
==22675==by 0x4C4AB7C: ompi_mpi_errors_are_fatal_comm_handler
(errhandler_predefined.c:69)
==22675==by 0x4C4A63E: ompi_errhandler_invoke (errhandler_invoke.c:53)
==22675==by 0x4C81E81: PMPI_Comm_create_group (pcomm_create_group.c:79)
==22675==by 0x4008FF: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
==22675==


-- 
Lisandro Dalcin
---
CIMEC (UNL/CONICET)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1016)
Tel/Fax: +54-342-4511169


[OMPI devel] Patch to fix valgrind warning

2014-04-24 Thread Lisandro Dalcin
Please review the attached patch, which addresses the following valgrind warnings:

==19533== Conditional jump or move depends on uninitialised value(s)
==19533==at 0x140DAB78: component_select (osc_sm_component.c:352)
==19533==by 0xD9BA0B2: ompi_osc_base_select (osc_base_init.c:73)
==19533==by 0xD9314C1: ompi_win_allocate (win.c:182)
==19533==by 0xD982C4E: PMPI_Win_allocate (pwin_allocate.c:79)
==19533==by 0xD628887: __pyx_pw_6mpi4py_3MPI_3Win_11Allocate
(mpi4py.MPI.c:109170)
==19533==by 0x38442E0BD3: PyEval_EvalFrameEx (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442E21EC: PyEval_EvalCodeEx (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442E22F1: PyEval_EvalCode (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F20DB: PyImport_ExecCodeModuleEx (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F2357: ??? (in /usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F2FF0: ??? (in /usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F323C: ??? (in /usr/lib64/libpython2.7.so.1.0)
==19533==
==19533== Conditional jump or move depends on uninitialised value(s)
==19533==at 0x140DAB78: component_select (osc_sm_component.c:352)
==19533==by 0xD9BA0B2: ompi_osc_base_select (osc_base_init.c:73)
==19533==by 0xD93174D: ompi_win_allocate_shared (win.c:213)
==19533==by 0xD982FD0: PMPI_Win_allocate_shared (pwin_allocate_shared.c:80)
==19533==by 0xD62C727:
__pyx_pw_6mpi4py_3MPI_3Win_13Allocate_shared (mpi4py.MPI.c:109409)
==19533==by 0x38442E0BD3: PyEval_EvalFrameEx (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442E21EC: PyEval_EvalCodeEx (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442E22F1: PyEval_EvalCode (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F20DB: PyImport_ExecCodeModuleEx (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F2357: ??? (in /usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F2FF0: ??? (in /usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F323C: ??? (in /usr/lib64/libpython2.7.so.1.0)


-- 
Lisandro Dalcin
---
CIMEC (UNL/CONICET)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1016)
Tel/Fax: +54-342-4511169
diff -up ompi/mca/osc/sm/osc_sm_component.c.orig ompi/mca/osc/sm/osc_sm_component.c
--- ompi/mca/osc/sm/osc_sm_component.c.orig 2014-04-24 10:28:58.790702380 +0300
+++ ompi/mca/osc/sm/osc_sm_component.c  2014-04-24 10:30:15.138137733 +0300
@@ -341,7 +341,7 @@ component_select(struct ompi_win_t *win,
 #if HAVE_PTHREAD_CONDATTR_SETPSHARED && HAVE_PTHREAD_MUTEXATTR_SETPSHARED
 pthread_mutexattr_t mattr;
 pthread_condattr_t cattr;
-bool blocking_fence;
+bool blocking_fence = false;
 int flag;
 
 if (OMPI_SUCCESS != ompi_info_get_bool(info, "blocking_fence",
@@ -349,7 +349,7 @@ component_select(struct ompi_win_t *win,
 goto error;
 }
 
-if (blocking_fence) {
+if (flag && blocking_fence) {
 ret = pthread_mutexattr_init(&mattr);
 ret = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
 if (ret != 0) {


[OMPI devel] likely bad return from MPI_File_c2f

2009-02-10 Thread Lisandro Dalcin
Try to run the trivial program below. MPI_File_c2f(MPI_FILE_NULL)
returns "-1" (minus one); however, it seems the routine should return
"0" (zero).

#include <mpi.h>
#include <stdio.h>
int main()
{
  MPI_Fint i;
  MPI_File f;
  MPI_Init(0,0);
  i = MPI_File_c2f(MPI_FILE_NULL);
  printf("MPI_File_c2f(MPI_FILE_NULL) -> %d\n", i);
  f = MPI_File_f2c(0);
  printf("MPI_File_f2c(0) == MPI_FILE_NULL -> %s\n", (f ==
MPI_FILE_NULL)?"yes":"no");
  MPI_Finalize();
}


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] possible bugs and unexpected values in returned errors classes

2009-02-11 Thread Lisandro Dalcin
Below is a list of stuff that I've got by running the mpi4py testsuite. I
never reported these before just because some of them are not actually
errors, but anyway, I want to raise the discussion.

- Likely bugs (regarding my interpretation of the MPI standard)

1) When passing MPI_REQUEST_NULL, MPI_Request_free() DO NOT fail.

2) When passing MPI_REQUEST_NULL, MPI_Cancel() DO NOT fail.

3) When passing MPI_REQUEST_NULL, MPI_Request_get_status() DO NOT fail.

4)  When passing MPI_WIN_NULL, MPI_Win_get_errhandler() and
MPI_Win_set_errhandler()  DO NOT fail.
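
(For items 1-4, here is a quick sketch of the calls involved; with the
default fatal error handlers a failure would abort, yet this runs to
completion:)

#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Request req;
  MPI_Status status;
  int flag;
  MPI_Init(&argc, &argv);
  req = MPI_REQUEST_NULL;
  MPI_Request_free(&req);                        /* 1) does not fail */
  req = MPI_REQUEST_NULL;
  MPI_Cancel(&req);                              /* 2) does not fail */
  req = MPI_REQUEST_NULL;
  MPI_Request_get_status(req, &flag, &status);   /* 3) does not fail */
  MPI_Win_set_errhandler(MPI_WIN_NULL, MPI_ERRORS_RETURN); /* 4) does not fail */
  MPI_Finalize();
  return 0;
}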


- Unexpected errors classes (at least for me)

1) When passing MPI_COMM_NULL, MPI_Comm_get_errhandler() fails with
MPI_ERR_ARG. I would expect MPI_ERR_COMM.

2) MPI_Type_free() fails with MPI_ERR_INTERN when passing predefined
datatypes like MPI_INT or MPI_FLOAT. I would expect MPI_ERR_TYPE.


- Controversial (I'm even fine with the current behavior)

1) MPI_Info_get_nthkey(info, n) returns MPI_ERR_INFO_KEY when "n" is
larger than the number of keys. Perhaps MPI_ERR_ARG would be more
appropriate? A possible rationale would be that the error is not
related to the contents of a 'key' string, but an out-of-range value
for "n".


That's all. Sorry for being so pedantic :-) and not offering help for
the patches, but I'm really busy.


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] possible bugs and unexpected values in returned errors classes

2009-02-16 Thread Lisandro Dalcin
On Thu, Feb 12, 2009 at 10:02 PM, Jeff Squyres  wrote:
> On Feb 11, 2009, at 8:24 AM, Lisandro Dalcin wrote:
>
>> Below a list of stuff that I've got by running mpi4py testsuite. Never
>> reported them before just because some of them are not actually
>> errors, but anyway, I want to raise the discussion.
>>
>> - Likely bugs (regarding my interpretation of the MPI standard)
>>
>> 1) When passing MPI_REQUEST_NULL, MPI_Request_free() DO NOT fail.
>>
>> 2) When passing MPI_REQUEST_NULL, MPI_Cancel() DO NOT fail.
>>
>> 3) When passing MPI_REQUEST_NULL, MPI_Request_get_status() DO NOT fail.
>
> I agree with all of these; I'm not sure why we allowed MPI_REQUEST_NULL.  I
> double checked LAM/MPI -- it errors in all of these cases.  So OMPI now
> does, too.
>
>> 4)  When passing MPI_WIN_NULL, MPI_Win_get_errhandler() and
>> MPI_Win_set_errhandler()  DO NOT fail.
>
> I was a little more dubious here; the param checking code was specifically
> checking for MPI_WIN_NULL and not classifying it as an error.  Digging to
> find out why we did that, the best that I can come up with is that it is
> *not* an error to call MPI_File_set|get_errhandler on MPI_FILE_NULL (to set
> behavior for what happens when FILE_OPEN fails); I'm *guessing* that we
> simply copied the _File_ code to the _Win_ code and forgot to remove that
> extra check.
>
> I can't find anything in MPI-2.1 that says it is legal to call set|get
> errhandler on MPI_WIN_NULL.  I checked LAM as well; LAM errors in this case.
>  So I made this now be an error in OMPI as well.
>
> Do you need these in the 1.3 series?  Or are you ok waiting for 1.4
> (assuming 1.4 takes significantly less time to release than 1.3 :-) ).
>

I do not have a strong need to get those fixes in the 1.3 series. In
mpi4py, I have a compatibility layer, maintained on an
implementation-by-implementation (well, actually just MPICH 1/2, Open
MPI and LAM) and release-by-release basis, that tries to hide those
small discrepancies and bugs in the MPIs out there.

>> - Unexpected errors classes (at least for me)
>>
>> 1) When passing MPI_COMM_NULL, MPI_Comm_get_errhandler() fails with
>> MPI_ERR_ARG. I would expect MPI_ERR_COMM.
>
> I don't have a strong feeling on this one; I think you could probably argue
> either way.  That being said, we haven't paid too close attention to the
> error values that we return.  Unfortunately, I don't think there's much
> standardization between different MPI implementations, unless they share a
> common code ancestry.
>

You are right... However, IMHO, some agreement between Open MPI and
MPICH2 would be great, right :) ? In the end, they are the
reference/basis for other implementations.

>> 2) MPI_Type_free() fails with MPI_ERR_INTERN when passing predefined
>> datatypes like MPI_INT or MPI_FLOAT. I would expect MPI_ERR_TYPE.
>
> Ya, that seems weird.  Fixed.
>
>> - Controversial (I'm even fine with the current behavior)
>>
>> 1) MPI_Info_get_nthkey(info, n) returns MPI_ERR_INFO_KEY when "n" is
>> larger that the number of keys. Perhaps MPI_ERR_ARG would be more
>> appropriate? A possible rationale would be that the error is not
>> related to the contents of a 'key' string, but an out of range value
>> for "n".
>
> I don't have a particular opinion on this one.
>
>> That's all. Sorry for being so pedantic :-) and not offering help for
>> the patches, but I'm really busy.
>
>
> No worries; this stuff is great.  Thanks -- and keep it coming!  (we usually
> remember to cite people who submit stuff like this; e.g.,
> https://svn.open-mpi.org/trac/ompi/changeset/20537 and
> https://svn.open-mpi.org/trac/ompi/changeset/20538).
>

Jeff, once again, many thanks for your fast response, and even more
thanks for fixing the issues!


> --
> Jeff Squyres
> Cisco Systems
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] possible bugs and unexpected values in returned errors classes

2009-02-16 Thread Lisandro Dalcin
Just found something new to comment on after diving into the actual sources.

On Thu, Feb 12, 2009 at 10:02 PM, Jeff Squyres  wrote:
> On Feb 11, 2009, at 8:24 AM, Lisandro Dalcin wrote:
>>
>> 1) When passing MPI_COMM_NULL, MPI_Comm_get_errhandler() fails with
>> MPI_ERR_ARG. I would expect MPI_ERR_COMM.
>
> I don't have a strong feeling on this one; I think you could probably argue
> either way.  That being said, we haven't paid too close attention to the
> error values that we return.  Unfortunately, I don't think there's much
> standardization between different MPI implementations, unless they share a
> common code ancestry.
>

After running my testsuite again and then looking at
"ompi/mpi/c/comm_set_errhandler.c", I noticed that
MPI_Comm_set_errhandler() does return MPI_ERR_COMM when invalid
communicators are passed. IMHO, for the sake of consistency, you
should fix MPI_Comm_get_errhandler() to behave the same as the setter.
Would this rationale be enough?
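
(For reference, a quick sketch showing both return codes side by side; it
assumes the invalid-handle error is reported through MPI_COMM_WORLD's
handler, so setting MPI_ERRORS_RETURN there lets the codes come back:)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  MPI_Errhandler eh;
  int ierr_set, ierr_get;
  MPI_Init(&argc, &argv);
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  /* both calls get MPI_COMM_NULL, yet the returned classes differ */
  ierr_set = MPI_Comm_set_errhandler(MPI_COMM_NULL, MPI_ERRORS_RETURN);
  ierr_get = MPI_Comm_get_errhandler(MPI_COMM_NULL, &eh);
  printf("set: %d (MPI_ERR_COMM=%d)  get: %d (MPI_ERR_ARG=%d)\n",
         ierr_set, MPI_ERR_COMM, ierr_get, MPI_ERR_ARG);
  MPI_Finalize();
  return 0;
}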


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] possible bugs and unexpected values in returned errors classes

2009-02-19 Thread Lisandro Dalcin
On Thu, Feb 19, 2009 at 10:54 AM, Jeff Squyres  wrote:
> On Feb 16, 2009, at 9:14 AM, Lisandro Dalcin wrote:
>
>> After running my testsuite again and next looking at
>> "ompi/mpi/c/comm_set_errhandler.c", I noticed that
>> MPI_Comm_set_errhandler() do return MPI_ERR_COMM when invalid
>> communicators are passed. IMHO, for the sake of consistency, you
>> should fix MPI_Comm_get_errhandler() to behave the same as the setter.
>> Would this rationale be enough?
>
>
> Looks like we're a bit all over the map:
>
> - comm_set_errhandler: mpi_err_comm
> - comm_get_errhandler: mpi_err_arg
> - file_set_errhandler: mpi_err_file
> - file_get_errhandler: mpi_err_file
> - win_set_errhandler: mpi_err_arg
> - win_get_errhandler: mpi_err_arg
>
> I agree that it would be good to have these all be consistent.  Just to be
> sure: are you saying you'd prefer MPI_ERR_COMM|FILE|WIN for each of these
> (respectively), vs. all of them returning MPI_ERR_ARG?
>

Yes, I prefer MPI_ERR_COMM|FILE|WIN if you pass the null handle to
MPI_XXX_{get|set}_errhandler. Of course, remember that for
MPI_File the rules are a bit different: MPI_FILE_NULL has to be
special-cased, as it is a valid handle for this call...

OTOH, if you have a valid Comm/File/Win handle, but you try to set
MPI_ERRHANDLER_NULL, then in all cases we should get MPI_ERR_ARG (as
MPI does not provide a dedicated error class for signaling invalid
Errhandler handles).



> --
> Jeff Squyres
> Cisco Systems
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] some comments on attribute caching, create/free() keyvals and all that.

2009-03-13 Thread Lisandro Dalcin
I've posted this to MPICH2-Dev, and then decided to re-post it here,
as the behavior of Open MPI is exactly the same.

You may also want to try the code right below, and then the one at the
end of the forwarded message.

#include <mpi.h>
#include <stdio.h>
int main( int argc, char ** argv ) {
  int Key1, tmp1, Key2, tmp2;
  MPI_Init(&argc, &argv);

  MPI_Keyval_create(MPI_NULL_COPY_FN, MPI_NULL_DELETE_FN, &Key1,(void *) 0);
  tmp1=Key1;
  MPI_Keyval_free(&tmp1);

  MPI_Keyval_create(MPI_NULL_COPY_FN, MPI_NULL_DELETE_FN, &Key2, (void *) 0);
  tmp2=Key2;
  MPI_Keyval_free(&tmp2);

  MPI_Finalize();

  printf("MPI_KEYVAL_INVALID: %d\n", MPI_KEYVAL_INVALID);

  printf("Key1: %d\n", Key1);
  printf("tmp1: %d\n", tmp1);

  printf("Key2: %d\n", Key2);
  printf("tmp2: %d\n", tmp2);

  return 0;
}


-- Forwarded message --
From: Lisandro Dalcin 
Date: Fri, Mar 13, 2009 at 4:01 PM
Subject: some comments on attribute caching, create/free() keyvals
and all that.
To: mpich2-...@mcs.anl.gov


As I've shown in a previous email, MPICH2 likely implements
create()/free() for keyvals using a counter that is incremented/decremented ...

Now, give the code pasted below a try. It shows that (at least in
MPICH2) MPI_Keyval_free() has to be used with great care, as it is
IMHO dangerous; basically these calls should all be done near
MPI_Finalize() time... or bad things could happen...

The only reference I can find in the MPI standard is at
(http://www.mpi-forum.org/docs/mpi21-report-bw/node147.htm#Node147)
where MPI_Comm_free_keyval() is explained... However, I believe that
description is talking about different things...

Should MPICH2 stop decrementing the keyval counter? You know, about 2<<31
values should be enough, right ;-)?
But then... what's the purpose of having MPI_Keyval_free()? Just to
invalidate the passed value by setting it to KEYVAL_INVALID?


#include <mpi.h>
#include <stdio.h>

int free_KeyVal(MPI_Comm c, int k, void *v,void *ctx)
{
 printf("free_KeyVal()\n");
 return MPI_SUCCESS;
}

int main( int argc, char ** argv ) {
 int Key1, Key2, Val1=1, Val2=2, ValOut;
 MPI_Init(&argc, &argv);

 MPI_Keyval_create(MPI_NULL_COPY_FN, free_KeyVal, &Key1,(void *) 0);
 MPI_Attr_put(MPI_COMM_SELF, Key1, &Val1);
 MPI_Keyval_free(&Key1);

 MPI_Keyval_create(MPI_NULL_COPY_FN, MPI_NULL_DELETE_FN, &Key2,(void *) 0);
 MPI_Attr_put(MPI_COMM_SELF, Key2, &Val2);
 MPI_Keyval_free(&Key2);

 MPI_Finalize();

 return 0;
}


--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] MPI_Accumulate() with MPI_PROC_NULL target rank

2009-07-15 Thread Lisandro Dalcin
The MPI 2.1 standard says:

"MPI_PROC_NULL is a valid target rank in the MPI RMA calls
MPI_ACCUMULATE, MPI_GET, and MPI_PUT. The effect is the same as for
MPI_PROC_NULL in MPI point-to-point communication. After any RMA
operation with rank MPI_PROC_NULL, it is still necessary to finish the
RMA epoch with the synchronization method that started the epoch."


Unfortunately, MPI_Accumulate() is not quite the same as
point-to-point, as a reduction is involved. Suppose you make this call
(let me abuse and use keyword arguments):

MPI_Accumulate(..., target_rank=MPI_PROC_NULL,
target_datatype=MPI_BYTE, op=MPI_SUM, ...)

IIUC, the call fails (with MPI_ERR_OP) in Open MPI because MPI_BYTE is
an invalid datatype for MPI_SUM.

But provided that the target rank is MPI_PROC_NULL, would it make
sense for the call to succeed?
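
In concrete terms, the kind of call I have in mind looks like this (just a
sketch with a trivial one-byte window, not the exact code from my test
suite):

#include <mpi.h>

int main(int argc, char *argv[])
{
  char buf = 0;
  MPI_Win win;
  MPI_Init(&argc, &argv);
  MPI_Win_create(&buf, 1, 1, MPI_INFO_NULL, MPI_COMM_SELF, &win);
  MPI_Win_fence(0, win);
  /* target rank is MPI_PROC_NULL, yet Open MPI still validates the
     op/datatype pair and raises MPI_ERR_OP (MPI_SUM over MPI_BYTE) */
  MPI_Accumulate(&buf, 1, MPI_BYTE, MPI_PROC_NULL,
                 0, 1, MPI_BYTE, MPI_SUM, win);
  MPI_Win_fence(0, win);
  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}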


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] Cannot Free() a datatype created with Dup() or Create_resized()

2009-08-31 Thread Lisandro Dalcin
In current ompi-trunk (svn up'ed and built a few minutes ago), calling
Free() on a datatype obtained with Dup() or Create_resized() from a
predefined datatype is failing with ERR_TYPE...

Is this change intentional or is it a regression?


$ cat typedup.py
from mpi4py import MPI
t = MPI.INT.Dup()
t.Free()

$ python typedup.py
Traceback (most recent call last):
  File "typedup.py", line 3, in <module>
t.Free()
  File "Datatype.pyx", line 328, in mpi4py.MPI.Datatype.Free
(src/mpi4py.MPI.c:28632)
mpi4py.MPI.Exception: MPI_ERR_TYPE: invalid datatype


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] more bug/comments for current trunk

2009-09-02 Thread Lisandro Dalcin
Disclaimer: this is for trunk svn up'ed yesterday.

The code below should fail with ERR_COMM, but it succeeds...

#include <mpi.h>
int main(int argc, char **argv)
{
  int *value, flag;
  MPI_Init(NULL, NULL);
  MPI_Comm_get_attr(MPI_COMM_NULL, MPI_TAG_UB, &value, &flag);
  MPI_Finalize();
  return 0;
}


Additionally, this is really not a bug, but I'll comment about it
anyway (I think I've commented about this some time ago)...

I would expect the two codes below to fail with MPI_ERR_KEYVAL, but
they fail with MPI_ERR_OTHER...

#include <mpi.h>
int main(int argc, char **argv)
{
  int *value, flag;
  MPI_Init(NULL, NULL);
  MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_KEYVAL_INVALID, &value, &flag);
  MPI_Finalize();
  return 0;
}

#include <mpi.h>
int main(int argc, char **argv)
{
  MPI_Win win;
  int *value, flag;
  MPI_Init(NULL, NULL);
  MPI_Win_create(MPI_BOTTOM, 0, 1,
 MPI_INFO_NULL, MPI_COMM_SELF, &win);
  MPI_Win_get_attr(win, MPI_KEYVAL_INVALID, &value, &flag);
  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] Dynamic languages, dlopen() issues, and symbol visibility of libtool ltdl API in current trunk

2009-09-16 Thread Lisandro Dalcin
Hi all.. I have to contact you again about the issues related to
dlopen()ing libmpi with RTLD_LOCAL, as many dynamic languages (Python
in my case) do.

So far, I've been able to manage the issues (despite the "do nothing"
policy from Open MPI devs, which I understand) in a more or less
portable manner by taking advantage of the availability of libtool
ltdl symbols in the Open MPI libraries (specifically, in libopen-pal).
For reference, all this hackery is here:
http://code.google.com/p/mpi4py/source/browse/trunk/src/compat/openmpi.h

However, I noticed that in current trunk (v1.4, IIUC) things have
changed and libtool symbols are not externally available. Again, I
understand the reason and acknowledge that such change is a really
good thing. However, this change has broken all my hackery for
dlopen()ing libmpi before the call to MPI_Init().

Is there any chance that libopen-pal could provide some properly
prefixed (let's say, using "opal_" as a prefix) wrapper calls to a small
subset of the libtool ltdl API? The following set of wrapper calls
is the minimum required to properly load libmpi in a portable
manner and clean up resources (let me abuse my previous suggestion
and add the opal_ prefix):

opal_lt_dlinit()
opal_lt_dlexit()

opal_lt_dladvise_init(a)
opal_lt_dladvise_destroy(a)
opal_lt_dladvise_global(a)
opal_lt_dladvise_ext(a)

opal_lt_dlopenadvise(n,a)
opal_lt_dlclose(h)

Any chance this request could be considered? I would really like to
have this before any Open MPI tarball gets released without libtool
symbols exposed...


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] Dynamic languages, dlopen() issues, and symbol visibility of libtool ltdl API in current trunk

2009-09-22 Thread Lisandro Dalcin
On Mon, Sep 21, 2009 at 9:45 AM, Jeff Squyres  wrote:
> Ick; I appreciate Lisandro's quandry, but don't quite know what to do.
>

I'm just asking that the library "libopen-pal.so" expose ltdl calls
wrapped with an "opal_" prefix. This way, the original ltdl calls are
hidden (no chance to collide with user code using an incompatible
libtool version), but Open MPI provides a portable way to dlopen()
shared libs/dynamic modules. In simple terms, I'm asking
"libopen-pal.so" to contain ltdl wrapper calls like this one:

OMPI_DECLSPEC lt_dlhandle opal_lt_dlopenadvise(const char *filename,
                                               lt_dladvise advise) /* note opal_ prefix! */
{
  return lt_dlopenadvise(filename, advise); /* original ltdl call */
}


Then, third-party code (like mpi4py or any other dynamic MPI module
for any other dynamic language) can do this:

#include <mpi.h>
#if defined(OPEN_MPI)
typedef void *lt_dlhandle;
typedef void *lt_dladvise;
OMPI_DECLSPEC extern lt_dlhandle opal_lt_dlopenadvise(const char *, lt_dladvise);
#endif
...
#if defined(OPEN_MPI)
/* init advice, not shown ... */
opal_lt_dlopenadvise("mpi", advice);
/* destroy advice, not shown ... */
#endif
MPI_Init(0,0);

>
> How about keeping libltdl fvisibility=hidden inside mpi4py?
>

Not sure if I was clear enough in my comments above, but mpi4py does
not bundle or link libtool. It just takes advantage of libtool's
availability in "libopen-pal.so" for the sake of portability.

>
> On Sep 17, 2009, at 11:16 AM, Josh Hursey wrote:
>
>> So I started down this road a couple months ago. I was using the
>> lt_dlopen() and friends in the OPAL CRS self module. The visibility
>> changes broke that functionality. The one solution that I started
>> implementing was precisely what you suggested, wrapping a subset the
>> libtool calls and prefixing them with opal_*. The email thread is below:
>>   http://www.open-mpi.org/community/lists/devel/2009/07/6531.php
>>
>> The problem that I hit was that libtool's build system did not play
>> well with the visibility symbols. This caused dlopen to be disabled
>> incorrectly. The libtool folks have a patch and, I believe, they are
>> planning on incorporating in the next release. The email thread is
>> below:
>>   http://thread.gmane.org/gmane.comp.gnu.libtool.patches/9446
>>
>> So we would (others can speak up if not) certainly consider such a
>> wrapper, but I think we need to wait for the next libtool release
>> (unless there is other magic we can do) before it would be usable.
>>
>> Do others have any other ideas on how we might get around this in the
>> mean time?
>>
>> -- Josh
>>
>>
>> On Sep 16, 2009, at 5:59 PM, Lisandro Dalcin wrote:
>>
>> > Hi all.. I have to contact you again about the issues related to
>> > dlopen()ing libmpi with RTLD_LOCAL, as many dynamic languages (Python
>> > in my case) do.
>> >
>> > So far, I've been able to manage the issues (despite the "do nothing"
>> > policy from Open MPI devs, which I understand) in a more or less
>> > portable manner by taking advantage of the availability of libtool
>> > ltdl symbols in the Open MPI libraries (specifically, in libopen-pal).
>> > For reference, all this hackery is here:
>> > http://code.google.com/p/mpi4py/source/browse/trunk/src/compat/openmpi.h
>> >
>> > However, I noticed that in current trunk (v1.4, IIUC) things have
>> > changed and libtool symbols are not externally available. Again, I
>> > understand the reason and acknowledge that such change is a really
>> > good thing. However, this change has broken all my hackery for
>> > dlopen()ing libmpi before the call to MPI_Init().
>> >
>> > Is there any chance that libopen-pal could provide some properly
>> > prefixed (let say, using "opal_" as a prefix) wrapper calls to a small
>> > subset of the libtool ltdl API? The following set of wrapper calls
>> > would is the minimum required to properly load libmpi in a portable
>> > manner and cleanup resources (let me abuse of my previous suggestion
>> > and add the opal_ prefix):
>> >
>> > opal_lt_dlinit()
>> > opal_lt_dlexit()
>> >
>> > opal_lt_dladvise_init(a)
>> > opal_lt_dladvise_destroy(a)
>> > opal_lt_dladvise_global(a)
>> > opal_lt_dladvise_ext(a)
>> >
>> > opal_lt_dlopenadvise(n,a)
>> > opal_lt_dlclose(h)
>> >
>> > Any chance this request could be considered? I would really like to
>> > have this before any Open MPI tarball get released without libtool
>> > symbols exposed...
>> &

[OMPI devel] ompi-trunk: have MPI_REAL2 (if available) but missing MPI_COMPLEX4

2009-09-23 Thread Lisandro Dalcin
Disclaimer: I have almost no experience with Fortran, nor am I needing
this, but anyway (perhaps just as a reminder for you) :-)...

Provided that:

1) Open MPI exposes MPI_LOGICAL{1|2|4|8}, and they are not (AFAIK)
listed in the MPI standard (I cannot find them in MPI-2.2)

2) The MPI-2.2 standard DOES list MPI_COMPLEX4 (at least in 2.2) ...

would it make sense that you add MPI_COMPLEX4 support ASAP, even
before full MPI-2.2 support?

-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] ompi-trunk: have MPI_REAL2 (if available) but missing MPI_COMPLEX4

2009-09-26 Thread Lisandro Dalcin
Jeff, about ticket #2032 ...

I was not asking you to eliminate MPI_LOGICALx, I just asked for
MPI_COMPLEX4 to be added...

If you still think the MPI_LOGICALx should be removed from Open MPI,
what about renaming the macros to OMPI_LOGICALx? IMHO, the MPI_LOGICALx
(provided that Fortran compilers do support them) are an omission in
the 2.2 standard.


On Wed, Sep 23, 2009 at 4:33 PM, Lisandro Dalcin  wrote:
> Disclaimer: I have almost no experience with Fortran, nor I'm needing
> this, but anyway (perhaps just as a reminder for you) :-)...
>
> Provided that:
>
> 1) Open MPI exposes MPI_LOGICAL{1|2|4|8}, and they are not (AFAIK)
> listed in the MPI standard (I cannot found them in MPI-2.2)
>
> 2) The MPI-2.2 standard DO list MPI COMPLEX4 (at least in 2.2) ...
>
> would it make sense that you add MPI_COMPLEX4 support ASAP, even
> before full MPI-2.2 support?
>
> --
> Lisandro Dalcín
> ---
> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> Tel/Fax: +54-(0)342-451.1594
>



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] ompi-trunk: have MPI_REAL2 (if available) but missing MPI_COMPLEX4

2009-09-26 Thread Lisandro Dalcin
On Sat, Sep 26, 2009 at 2:07 PM, George Bosilca  wrote:
> These two issues are completely orthogonal.
>
> On one side, MPI_LOGICALx is not defined by the MPI standard but it is
> supported by several others MPI libraries and as it match a C type it is
> easy to implement.
>

OK, so we agree here.. It is good to have the MPI_LOGICALx despite not
being listed in the standard.

> On the other side, MPI_COMPLEX4 is defined by the MPI standard as an
> optional datatype (MPI 2.2 page 451) which doesn't make it mandatory (sic!),
> is supported by few others MPI libraries but as it doesn't match any C type
> it is really difficult to implement.
>

OK, you are right, it is really difficult to implement, at least in
pure-C code... BTW, this also applies to MPI_REAL2, right ? Then...

$ nm /usr/local/openmpi/dev-trunk/lib/libmpi.so | grep real2
000fe0e0 D ompi_mpi_real2

So if you have support for real(kind=2) in "ompi_mpi_real2" ... Do you
still think that it is so hard to support complex(kind=4) ??

Anyway, I see that MPI_REAL2 is never #define'd to &ompi_mpi_real2 .



>  george.
>
> On Sep 26, 2009, at 11:04 , Lisandro Dalcin wrote:
>
>> Jeff, about ticket #2032 ...
>>
>> I was not asking you to eliminate MPI_LOGICALx, just asked for
>> MPI_COMPLEX4 to be added...
>>
>> If you still think the MPI_LOGICALx should be removed from Open MPI,
>> what about rename the macros to OMPI_LOGICALx ? IMHO, the MPI_LOGICALx
>> (provided that Fortran compilers do support them) are an omission in
>> the 2.2 standard.
>>
>>
>> On Wed, Sep 23, 2009 at 4:33 PM, Lisandro Dalcin 
>> wrote:
>>>
>>> Disclaimer: I have almost no experience with Fortran, nor I'm needing
>>> this, but anyway (perhaps just as a reminder for you) :-)...
>>>
>>> Provided that:
>>>
>>> 1) Open MPI exposes MPI_LOGICAL{1|2|4|8}, and they are not (AFAIK)
>>> listed in the MPI standard (I cannot found them in MPI-2.2)
>>>
>>> 2) The MPI-2.2 standard DO list MPI COMPLEX4 (at least in 2.2) ...
>>>
>>> would it make sense that you add MPI_COMPLEX4 support ASAP, even
>>> before full MPI-2.2 support?
>>>
>>> --
>>> Lisandro Dalcín
>>> ---
>>> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
>>> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
>>> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
>>> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
>>> Tel/Fax: +54-(0)342-451.1594
>>>
>>
>>
>>
>> --
>> Lisandro Dalcín
>> ---
>> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
>> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
>> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
>> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
>> Tel/Fax: +54-(0)342-451.1594
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] MPI_Group_{incl|exc} with nranks=0 and ranks=NULL

2009-10-21 Thread Lisandro Dalcin
Currently (trunk, just svn update'd), the following call fails
(because of the ranks=NULL pointer)

MPI_Group_{incl|excl}(group, 0, NULL, &newgroup)

BTW, MPI_Group_translate_ranks() has similar issues...


Provided that Open MPI accepts the combination (int_array_size=0,
int_array_ptr=NULL) in other calls, I think it should also accept the
NULLs in the calls above... What do you think?
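
A minimal sketch of what I am exercising (error checking omitted; with the
default fatal handlers the first failing call aborts):

#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Group group, newgroup;
  MPI_Init(&argc, &argv);
  MPI_Comm_group(MPI_COMM_WORLD, &group);
  /* n=0 with ranks=NULL: no rank entries are ever read, yet the call fails */
  MPI_Group_incl(group, 0, NULL, &newgroup);
  if (newgroup != MPI_GROUP_EMPTY) MPI_Group_free(&newgroup);
  MPI_Group_excl(group, 0, NULL, &newgroup);
  MPI_Group_free(&newgroup);
  /* similar issue: zero ranks to translate, NULL arrays */
  MPI_Group_translate_ranks(group, 0, NULL, group, NULL);
  MPI_Group_free(&group);
  MPI_Finalize();
  return 0;
}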


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] possible bugs and unexpected values in returned errors classes

2009-12-09 Thread Lisandro Dalcin
It seems that this issue got lost.

On Thu, Feb 12, 2009 at 9:02 PM, Jeff Squyres  wrote:
> On Feb 11, 2009, at 8:24 AM, Lisandro Dalcin wrote:
>
>> Below a list of stuff that I've got by running mpi4py testsuite.
>>
>> 4)  When passing MPI_WIN_NULL, MPI_Win_get_errhandler() and
>> MPI_Win_set_errhandler()  DO NOT fail.
>
> I was a little more dubious here; the param checking code was specifically
> checking for MPI_WIN_NULL and not classifying it as an error.  Digging to
> find out why we did that, the best that I can come up with is that it is
> *not* an error to call MPI_File_set|get_errhandler on MPI_FILE_NULL (to set
> behavior for what happens when FILE_OPEN fails); I'm *guessing* that we
> simply copied the _File_ code to the _Win_ code and forgot to remove that
> extra check.
>
> I can't find anything in MPI-2.1 that says it is legal to call set|get
> errhandler on MPI_WIN_NULL.  I checked LAM as well; LAM errors in this case.
>  So I made this now be an error in OMPI as well.
>
> Do you need these in the 1.3 series?  Or are you ok waiting for 1.4
> (assuming 1.4 takes significantly less time to release than 1.3 :-) ).
>

In short:

When passing MPI_WIN_NULL, MPI_Win_get_errhandler() and
MPI_Win_set_errhandler()  DO NOT fail.
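
(A minimal reproducer, sketched here for convenience; with the default
fatal error handlers either call would abort if the null-handle check
were in place:)

#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Errhandler eh;
  MPI_Init(&argc, &argv);
  MPI_Win_get_errhandler(MPI_WIN_NULL, &eh);                /* does not fail */
  MPI_Win_set_errhandler(MPI_WIN_NULL, MPI_ERRORS_RETURN);  /* does not fail */
  MPI_Finalize();
  return 0;
}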

Jeff, you promised this for 1.4 ;-). Any chance for 1.4.1 ?

-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] failure with zero-length Reduce() and both sbuf=rbuf=NULL

2009-12-10 Thread Lisandro Dalcin
See the code below. The commented-out combinations for sbuf,rbuf do
work, but the one passing sbuf=rbuf=NULL (i.e., the uncommented one
shown below) makes the call fail with MPI_ERR_ARG.

#include <mpi.h>

int main( int argc, char ** argv ) {
  int ierr;
  int sbuf,rbuf;
  MPI_Init(&argc, &argv);
  ierr = MPI_Reduce(/*&sbuf, &rbuf,*/
                    /*&sbuf, NULL,*/
                    /*NULL, &rbuf,*/
                    NULL, NULL,
                    0, MPI_INT,
                    MPI_SUM, 0, MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] failure with zero-length Reduce() and both sbuf=rbuf=NULL

2009-12-11 Thread Lisandro Dalcin
On Thu, Dec 10, 2009 at 4:26 PM, George Bosilca  wrote:
> Lisandro,
>
> This code is not correct from the MPI standard perspective. The reason is 
> independent of the datatype or count, it is solely related to the fact that 
> the MPI_Reduce cannot accept a sendbuf equal to the recvbuf (or one has to 
> use MPI_IN_PLACE).
>

George, I have to disagree. Zero-length buffers are a very special
case, and the MPI std is not very explicit about this limiting case. Try
the code pasted at the end.

1) In Open MPI, the only one of these failing for sbuf=rbuf=NULL is MPI_Reduce()

2) For reference, all the calls succeed in MPICH2.



#include <mpi.h>
#include <stdio.h>

int main( int argc, char ** argv ) {
  int ierr;
  MPI_Init(&argc, &argv);
  ierr = MPI_Scan(NULL, NULL, 0, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  ierr = MPI_Exscan(NULL, NULL, 0, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  ierr = MPI_Allreduce(NULL, NULL, 0, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
#if 1
  ierr = MPI_Reduce(NULL, NULL, 0, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
#endif
  MPI_Finalize();
  return 0;
}



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] failure with zero-length Reduce() andbothsbuf=rbuf=NULL

2010-02-10 Thread Lisandro Dalcin
er 
> when paired with an appropriate datatype.
>
> As such, NULL is *not* a special case.  It's a potentially valid buffer, just 
> like any other value.
>

Are you assuming here that MPI_BOTTOM is exactly the same as NULL,
at least in Open MPI?

How can (ptr=NULL, count>0, MPI_INT) or other predefined datatypes be a
valid buffer? With (ptr=MPI_BOTTOM, count>0, user-defined datatype),
however, that's another story...

>> Special casing Open MPI in my testsuite to disable these tests is just
>> a matter of adding two lines,  but before that I would like to have
>> some sort of final pronouncement on all this from your side.
>
> What is the purpose of testing 0-length reductions?
>

I'm testing zero-length reductions because MPI implementations can
potentially support them. My Python wrappers should support as many
features of the underlying MPI implementation as possible. Then I
should support zero-length reductions if possible.

In Python land (especially when third-party extension modules written
in C are involved) and likely other places, a zero-length array is
something not very well defined... Instances could be singletons (then
pointers could alias, which should not be an issue as the array
length is zero), pointers could be non-NULL and always different (i.e.
what malloc(0) returns on some platforms), or the pointer could be NULL
(because that's what malloc(0) returns, or because the implementation
code special-cases things by enforcing ptr=NULL, len=0 for zero-length
array instances).

As there are different ways to represent a zero-length array using a
(ptr,len) pair, I tried to make sure by exhaustive testing that all
the possibilities were working... Such testing of corner cases is not
easy in general :-). Some things fail depending on the MPI
implementation; some other things work, but likely by accident. You
see, I'm suffering the usual nightmares of platform/implementation-defined
behavior :-( ...


-- 
Lisandro Dalcin
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] failure with zero-length Reduce() andbothsbuf=rbuf=NULL

2010-02-10 Thread Lisandro Dalcin
On 10 February 2010 13:42, Eugene Loh  wrote:
> Here is a related case.
>
> If I remember correctly, the HPCC pingpong test synchronizes occasionally by
> having one process send a zero-byte broadcast to all other processes.
>  What's a zero-byte broadcast?  Well, some MPIs apparently send no data, but
> do have synchronization semantics.  (No non-root process can exit before the
> root process has entered.)  Other MPIs treat the zero-byte broadcasts as
> no-ops;  there is no synchronization and then timing results from the HPCC
> pingpong test are very misleading.  So far as I can tell, the MPI standard
> doesn't address which behavior is correct.

Yep... for p2p communication things are more clear (and behavior more
consistent in the MPIs out there) regarding zero-length messages...
IMHO, collectives should be no-ops only in the sense that no actual
reduction is made because there are no elements to operate on. I mean,
if Reduce(count=1) implies a sync, Reduce(count=0) should also imply a
sync...


> The test strikes me as
> deficient:  it would have been just as easy to have a single-word broadcast
> to implement the synchronization they were looking for.
>

Or use MPI_Barrier() ...


-- 
Lisandro Dalcin
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] failure with zero-length Reduce()andbothsbuf=rbuf=NULL

2010-02-10 Thread Lisandro Dalcin
On 10 February 2010 14:19, Jeff Squyres  wrote:
> On Feb 10, 2010, at 11:59 AM, Lisandro Dalcin wrote:
>
>> > If I remember correctly, the HPCC pingpong test synchronizes occasionally 
>> > by
>> > having one process send a zero-byte broadcast to all other processes.
>> >  What's a zero-byte broadcast?  Well, some MPIs apparently send no data, 
>> > but
>> > do have synchronization semantics.  (No non-root process can exit before 
>> > the
>> > root process has entered.)  Other MPIs treat the zero-byte broadcasts as
>> > no-ops;  there is no synchronization and then timing results from the HPCC
>> > pingpong test are very misleading.  So far as I can tell, the MPI standard
>> > doesn't address which behavior is correct.
>>
>> Yep... for p2p communication things are more clear (and behavior more
>> consistens in the MPI's out there) regarding zero-length messages...
>> IMHO, collectives should be non-op only in the sense that no actual
>> reduction is made because there are no elements to operate on. I mean,
>> if Reduce(count=1) implies a sync, Reduce(count=0) should also imply a
>> sync...
>
> Sorry to disagree again.  :-)
>
> The *only* MPI collective operation that guarantees a synchronization is 
> barrier.  The lack of synchronization guarantee for all other collective 
> operations is very explicit in the MPI spec.

Of course.

> Hence, it is perfectly valid for an MPI implementation to do something like a 
> no-op when no data transfer actually needs to take place
>

So you say that an MPI implementation is free to make a sync in the
case of Bcast(count=1), but not in the case of Bcast(count=0)? I
could agree that such behavior is technically correct regarding the
MPI standard... But it makes me feel a bit uncomfortable... OK, in the
end, the change in semantics depending on message size is comparable
to the blocking/nonblocking one for MPI_Send(count=10^8) versus
Send(count=1).

>
> (except, of course, the fact that Reduce(count=1) isn't defined ;-) ).
>

You likely meant Reduce(count=0) ... Good catch ;-)


PS: The following question is unrelated to this thread, but my
curiosity+laziness cannot resist... Does Open MPI have some MCA
parameter to add a synchronization at every collective call?

-- 
Lisandro Dalcin
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] Request_free() and Cancel() with REQUEST_NULL

2010-02-11 Thread Lisandro Dalcin
Why Request_free() and Cancel() do not fail when REQUEST_NULL is
passed? Am I missing something?

#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Request req;
  MPI_Init(&argc, &argv);
  req = MPI_REQUEST_NULL;
  MPI_Request_free(&req);
  req = MPI_REQUEST_NULL;
  MPI_Cancel(&req);
  MPI_Finalize();
  return 0;
}


PS: The code above was tested with 1.4.1

-- 
Lisandro Dalcin
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] failure with zero-length Reduce() and both sbuf=rbuf=NULL

2010-02-11 Thread Lisandro Dalcin
On 11 February 2010 12:04, George Bosilca  wrote:
>
> Therefore, we can argue as much as you want about what the correct arguments 
> of a reduce call should be, a reduce(count=0) is one of the meaningless MPI 
> calls and as such should not be tolerated.
>

Well, I have to disagree... I understand you (as an MPI implementor)
think that Reduce(count=0) could be meaningless and add complexity to
the implementation of MPI_Reduce()... But Reduce(count=0) could save
user code from special-casing the count==0 situation... after all,
zero-length arrays/sequences/containers do appear in actual codes...

> Anyway, this discussion diverged from its original subject. The standard is 
> pretty clear on what set of arguments are valid, and the fact that the send 
> and receive buffers should be different is one of the strongest requirement 
> (and this independent on what count is).

Sorry... If count=0, why is sendbuf!=recvbuf SO STRONGLY required? I
cannot figure out the answer...

> As a courtesy, Open MPI accepts the heresy of a count = zero, but there is 
> __absolutely__ no reason to stop checking the values of the other arguments 
> when this is true. If the user really want to base the logic of his 
> application on such a useless and non-standard statement (reduce(0)) at least 
> he has to have the courtesy to provide a valid set of arguments.

I'm still not convinced that reduce(0) is non-standard; as Jeff
pointed out, the standard says "non-negative integer". The latter
comment is IMHO not saying that count=0 is invalid; such a
conclusion is a misinterpretation. What would be the rationale of
making Reduce(count=0) invalid, when all other
(communication+reduction) collective calls do not explicitly say that
count=0 is invalid, and "count" arguments are always described as
"non-negative integer" ??

>
> PS: If I can suggest a correct approach to fix the python bindings I would 
> encourage you to go for the strongest and more meaningful approach, sendbuf 
> should always be different that recvbuf (independent on the value of count).
>

I have the feeling that you think I'm bikeshedding because I'm lazy or
I have nothing more useful to do :-)... That's not the case... I'm the
developer of an MPI wrapper, and it is not my business to impose
arbitrary restrictions on users... and I would like MPI implementations
to follow the same rule... if count=0, I cannot see why I should force
users to pass sendbuf!=recvbuf ... moreover, in a dynamic language like
Python, things are not always obvious...

Let me show you a little Python experiment... Enter you python prompt,
and type this:

$ python
>>> from array import array
>>> a = array('i', []) # one zero-length array of integers (C-int)
>>> b = array('i', []) # other zero-length array
>>> a is b # are 'a' and 'b' the same object instance?
False
>>>

So far, so good.. we have two different arrays of integers, and their
length is zero...
Let's see the values of the (pointer, length), where the pointer is
represented as its integer value:

>>> a.buffer_info()
(0, 0)
>>> b.buffer_info()
(0, 0)
>>>

Now, suppose I do this:

>>> from mpi4py import MPI
>>> MPI.COMM_WORLD.Reduce(a, b, op=MPI.SUM, root=0)
Traceback (most recent call last):
  File "", line 1, in 
  File "Comm.pyx", line 534, in mpi4py.MPI.Comm.Reduce (src/mpi4py.MPI.c:52115)
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
>>>

Then an mpi4py user mails me asking: WTF? 'a' and 'b' were different
arrays, what's going on? why did my call fail? And then I have to say:
this fails because of two implementation details... Built-in Python's
'array.array' instances have pointer=NULL when length=0, and your MPI
implementation requires sendbuf!=recvbuf, even if count=0 and
sendbuf=recvbuf=NULL... Again, you may still think that
Reduce(count=0), or any other collective with count=0, is nonsense,
I may even agree with you... But IMHO that's not what the standard
says, and again, imposing restrictions on user code should not be our
business...

George, what could I do here? Should I forcibly pass a different, fake
value enforcing sendbuf!=recvbuf myself when count=0? Would this be
portable? What if some other MPI implementation on some platform
decides to complain because the fake value I'm passing does not
represent a valid address?
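
To make the dilemma concrete, the kind of workaround I'm contemplating
would be roughly the following (a hypothetical C sketch, not what mpi4py
currently does; the dummy variables are made up):

#include <mpi.h>
int main(int argc, char *argv[])
{
  static int dummy_send, dummy_recv;  /* fake, distinct addresses */
  void *sbuf = NULL, *rbuf = NULL;    /* stands for the user's zero-length buffers */
  int count = 0;
  MPI_Init(&argc, &argv);
  if (count == 0) {                   /* work around the sendbuf != recvbuf check */
    sbuf = &dummy_send;
    rbuf = &dummy_recv;
  }
  MPI_Reduce(sbuf, rbuf, count, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}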


PS: Maintaining an MPI-2 binding for Python requires a lot of care
and attention to little details. And I have to support Python>=2.3
and the new Python 3; on Windows, Linux and OS X, with many of the
MPI-1 and MPI-2 implementations out there... Consistent behavior and
standard compliance in MPI implementations is FUNDAMENTAL to developing
portable wrappers for

[OMPI devel] MPI_Win_get_errhandler() and MPI_Win_set_errhandler() do not fail when passing MPI_WIN_NULL

2010-02-11 Thread Lisandro Dalcin
I've reported this long ago (alongside other issues now fixed)...

I can see that this is fixed in trunk and branches/v1.5, but not
backported to branches/v1.4

Any chance to get this for 1.4.2? Or should it wait until 1.5?


-- 
Lisandro Dalcin
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] failure with zero-length Reduce() and both sbuf=rbuf=NULL

2010-02-11 Thread Lisandro Dalcin
On 11 February 2010 15:06, George Bosilca  wrote:
> This is absolutely not true. Open MPI supports zero length collective 
> operations (all of them actually), but if their arguments are correctly 
> shaped.
>

OK, you are right here ...

> What you're asking for is a free ticket to write MPI calls that do not follow 
> the MPI requirements when a special value for count is given.
>

But you did not answer my previous question... What's the rationale
for requiring sendbuf!=recvbuf when count=0? I would argue you want a
free ticket :-) to put restrictions on user code (without an actual
rationale) in order to simplify your implementation.

> While zero-length arrays/sequence/containers do appears in real code, they 
> are not equal to NULL. If they are NULL, that means they do not contain any 
> useful data, and they don't need to be source or target of any kind of 
> [collective or point-to-point] communications.
>

Yes, I know. Moreover, I agree with you. NULL should be reserved for
invalid pointers, not for zero-length arrays... The problem is that
people out there seem to disagree or just do not pay any attention to
this, so (pointer=NULL, length=0) pairs DO APPEAR in real life (like the
Python example I previously showed you)... Additionally, some time ago
(while discussing MPI_Alloc_mem(size=0)) we commented on the different
return values for malloc(0) depending on the platform...

Well, this discussion got too far... In the end, I agree that
representing zero-length arrays with (pointer=NULL,length=0) should be
regarded as bad practice...



-- 
Lisandro Dalcin
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] RFC: ABI break between 1.4 and 1.5 / .so versioning

2010-02-19 Thread Lisandro Dalcin
On 18 February 2010 10:53, Jeff Squyres  wrote:
> On Feb 18, 2010, at 1:53 AM, Ralf Wildenhues wrote:
>
>> You could probably create fake empty libopen-rte and libopen-pal stub
>> libraries with 0:0:0 purely for the sake of allowing such an a.out to
>> still work (on systems with versioned sonames[1]).  Since this doesn't
>> actually use any of the APIs from those libraries, there is no problem
>> here, and your 1.5 libmpi will pull in the 1:0:0 versions of the other
>> two libraries.
>
> You get 10 "evil genius" points for a nifty-yet-icky solution.  :-)
>
> I don't really want to continue carrying forward empty libraries just to 
> maintain ABI.  I'm (mostly) ok with breaking ABI at a major series change 
> (i.e., 1.5.0).
>

And you could add a FAQ entry or document somewhere how to do this
trick, just in case a sysadmin desperately needs the hack because of
pressure from some user with ABI issues.


-- 
Lisandro Dalcin
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] malloc(0) warnings

2010-05-05 Thread Lisandro Dalcin
After building 1.4.2 with debug flags passed to configure, I get this (I've
got these warnings in previous releases, too):

malloc debug: Request for 0 bytes (coll_inter_gatherv.c, 94)
malloc debug: Request for 0 bytes (coll_inter_gatherv.c, 94)
malloc debug: Request for 0 bytes (coll_inter_gatherv.c, 94)
malloc debug: Request for 0 bytes (coll_inter_gatherv.c, 94)

malloc debug: Request for 0 bytes (coll_inter_scatterv.c, 82)
malloc debug: Request for 0 bytes (coll_inter_scatterv.c, 82)
malloc debug: Request for 0 bytes (coll_inter_scatterv.c, 82)
malloc debug: Request for 0 bytes (coll_inter_scatterv.c, 82)


-- 
Lisandro Dalcin
---
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169


[OMPI devel] MPI_Type_free(MPI_BYTE) not failing after MPI_Win_create()

2010-06-18 Thread Lisandro Dalcin
See the code below. As expected, you DO get an error (though the error
is ERR_INTERN, which is somewhat uninformative). However, if you first
create and destroy a window, you DO NOT get any error. This is VERY
strange, right? I understand this issue is going to be very low
priority for you, but I'm wondering whether this is actually related
to some deeper, nasty bug that can cause actual trouble in valid code.


#include <mpi.h>
int main(int argc, char *argv[])
{
  MPI_Init(&argc, &argv);
#if 0
  {
MPI_Win win;
MPI_Win_create(MPI_BOTTOM,0,1,MPI_INFO_NULL,MPI_COMM_SELF,&win);
MPI_Win_free(&win);
  }
#endif
  {
MPI_Datatype byte = MPI_BYTE;
MPI_Type_free(&byte);
  }
  MPI_Finalize();
  return 0;
}


-- 
Lisandro Dalcin
---
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169


[OMPI devel] VampirTrace and MPI_Init_thread()

2010-08-10 Thread Lisandro Dalcin
Below you have a C program that calls MPI_Init_thread().

$ cat demo/helloworld.c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  int provided;
  int size, rank, len;
  char name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(name, &len);

  printf("Hello, World! I am process %d of %d on %s.\n", rank, size, name);

  MPI_Finalize();
  return 0;
}


Now I build like this:

$ mpicc-vt demo/helloworld.c

and then try to run it:

$ ./a.out
Hello, World! I am process 0 of 1 on trantor.
[trantor:18854] *** An error occurred in MPI_Group_free
[trantor:18854] *** on communicator MPI_COMM_WORLD
[trantor:18854] *** MPI_ERR_GROUP: invalid group
[trantor:18854] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

However, if MPI_Init() is used, it succeeds.

It seems the MPI_Init_thread() wrapper to PMPI_Init_thread() is
missing, see this:

$ nm a.out | grep MPI_Init
0805c4ef T MPI_Init
 U MPI_Init_thread
 U PMPI_Init


PS: Sorry if this is actually a VT bug. I'm not a VT user, I'm just
reporting this issue (related to an mpi4py bug report that arrived in
my inbox months ago).

-- 
Lisandro Dalcin
---
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169


Re: [OMPI devel] VampirTrace and MPI_Init_thread()

2010-08-10 Thread Lisandro Dalcin
On 10 August 2010 22:59, George Bosilca  wrote:
> Lisandro,
>
> Thanks for the report. I quickly checked the Open MPI source code and the .so 
> library and both show the existence of both MPI_Init_thread and 
> PMPI_Init_thread symbols.
>
> 00031b60 T _MPI_Init_thread
> 0005e7c0 T _PMPI_Init_thread
>
> I CC'ed the VT folks.
>

OK. Now I'm more confident that the problem is in VT:

nm /usr/local/openmpi/1.4.2/lib/libvt.mpi.a | grep MPI_Init
00ab T MPI_Init
 U PMPI_Init

I would expect a "  T MPI_Init_thread" line to appear, but it is
not the case.

Many thanks,


-- 
Lisandro Dalcin
---
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169


Re: [OMPI devel] VampirTrace and MPI_Init_thread()

2010-08-11 Thread Lisandro Dalcin
On 11 August 2010 03:12, Matthias Jurenz  wrote:
> Hello Lisandro,
>
> this problem will be fixed in the next Open MPI release. There was an obsolete
> preprocessor condition around the MPI_Init_thread wrapper, so the source code
> could never be compiled :-(
>
> Thanks for the hint.
>
> Matthias
>

OK. Many thanks for your clarification.

BTW, I have an additional issue. I'm trying to build a shared
library from libvt*.a by passing --whole-archive to the linker.
The idea behind this is to use that library with LD_PRELOAD to get MPI
tracing of a binary compiled with plain mpicc (i.e., not mpicc-vt). For
example, I managed to get this trick working with MPE. Moreover, I can
enable MPI profiling at runtime in a Python script using mpi4py by
dlopen'ing the shared lib with profiling symbols before loading the
mpi4py.MPI Python extension module. Being able to profile without a
recompile is nice ;-)

However, see this:

$ pwd
/usr/local/openmpi/1.4.2/lib

$ ll libvt*
-rw-r--r--. 1 root root 410784 2010-05-05 20:40 libvt.a
-rw-r--r--. 1 root root 197618 2010-05-05 20:40 libvt.fmpi.a
-rw-r--r--. 1 root root 569128 2010-05-05 20:40 libvt.mpi.a
-rw-r--r--. 1 root root 503514 2010-05-05 20:40 libvt.omp.a
-rw-r--r--. 1 root root 661466 2010-05-05 20:40 libvt.ompi.a

$ nm libvt* | grep pomp_rd_table
 U pomp_rd_table
 U pomp_rd_table
 U pomp_rd_table
 U pomp_rd_table
 U pomp_rd_table
 U pomp_rd_table
 U pomp_rd_table
 U pomp_rd_table
 U pomp_rd_table
 U pomp_rd_table

That symbol (and possibly others) are undefined and I cannot found
them elsewhere. Is there any easy way to build a shared lib with the
MPI_xxx symbols?


-- 
Lisandro Dalcin
---
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169


Re: [OMPI devel] VampirTrace and MPI_Init_thread()

2010-08-13 Thread Lisandro Dalcin
On 13 August 2010 05:22, Matthias Jurenz  wrote:
> On Wednesday 11 August 2010 23:16:50 Lisandro Dalcin wrote:
>> On 11 August 2010 03:12, Matthias Jurenz 
> wrote:
>> > Hello Lisandro,
>> >
>> > this problem will be fixed in the next Open MPI release. There was an
>> > obsolete preprocessor condition around the MPI_Init_thread wrapper, so
>> > the source code could never be compiled :-(
>> >
>> > Thanks for the hint.
>> >
>> > Matthias
>>
>> OK. Many thanks for you clarification.
>>
>> BTW, I have and additional issue. I'm trying to build as shared
>> library from libvt*.a using by passing -whole-archive to the linker.
>> The idea behind this is to use that library with LD_PRELOAD to get MPI
>> tracing of a binary compiled with plain mpicc (i.e, not mpicc-vt). For
>> example, I managed to get this trick working with MPE. Moreover, I can
>> enable MPI profiling at runtime in a Python script using mpi4pt by
>> dlopen'ing the shared lib with profiling symbols before loading the
>> mpi4py.MPI Python extension module. Being able to profile without a
>> recompile is nice ;-)
>>
>> However, see this:
>>
>> $ pwd
>> /usr/local/openmpi/1.4.2/lib
>>
>> $ ll libvt*
>> -rw-r--r--. 1 root root 410784 2010-05-05 20:40 libvt.a
>> -rw-r--r--. 1 root root 197618 2010-05-05 20:40 libvt.fmpi.a
>> -rw-r--r--. 1 root root 569128 2010-05-05 20:40 libvt.mpi.a
>> -rw-r--r--. 1 root root 503514 2010-05-05 20:40 libvt.omp.a
>> -rw-r--r--. 1 root root 661466 2010-05-05 20:40 libvt.ompi.a
>>
>> $ nm libvt* | grep pomp_rd_table
>>          U pomp_rd_table
>>          U pomp_rd_table
>>          U pomp_rd_table
>>          U pomp_rd_table
>>          U pomp_rd_table
>>          U pomp_rd_table
>>          U pomp_rd_table
>>          U pomp_rd_table
>>          U pomp_rd_table
>>          U pomp_rd_table
>>
>> That symbol (and possibly others) are undefined and I cannot found
>> them elsewhere. Is there any easy way to build a shared lib with the
>> MPI_xxx symbols?
>>
>
> Actually, the symbols above will be defined at compile/link time of the
> application by the OpenMP instrumentor "OPARI".
> However, while your application doesn't use OpenMP it should work to define 
> the
> missing symbols in a separate source file (see attachment) when building the
> shared library:
>
> gcc -fPIC -I/vampirtrace -shared missing_syms.c -o
> libvt.mpi.so -Wl,--whole-archive /libvt.mpi.a  libdir>/libotf.a -Wl,--no-whole-archive -ldl -lz -L -lmpi
>

OK. Many thanks for the hint.

I was able to build a shared lib, dlopen() it at runtime and get MPI
traces from Python scripts without the need to recompile with mpicc-vt.
Sweet!

> FYI, the next Open MPI 1.5 will come with a newer VampirTrace which provides
> shared libraries by default.
>

Nice! ... Perhaps Open MPI's mpiexec could gain a -vt flag to enable
traces at runtime (it should be easy to implement with LD_PRELOAD,
right?)...

BTW, I understand Open MPI 1.5 VT will have the MPI_Init_thread()
issue fixed. Any chance for v1.4 series?


-- 
Lisandro Dalcin
---
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169



[OMPI devel] Barrier() after Finalize() when a file handle is leaked.

2010-09-15 Thread Lisandro Dalcin
I've tested this with (--enable-debug --enable-picky
--enable-mem-debug) 1.4.2 and 1.5rc6. Despite being debug builds, an
mpi4py user got the same with (likely release) builds in both Ubuntu
and OS X.

$ cat open.c
#include <mpi.h>
int main(int argc, char *argv[]) {
  MPI_File f;
  MPI_Init(&argc, &argv);
  MPI_File_open(MPI_COMM_WORLD, "test.plt", MPI_MODE_RDONLY, MPI_INFO_NULL, &f);
  /* MPI_File_close(&f); */
  MPI_Finalize();
  return 0;
}

$ mpicc open.c

$ ./a.out
*** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[trantor:15145] Abort after MPI_FINALIZE completed successfully; not
able to guarantee that all other processes were killed!


So if you open a file but never close it, an MPI_Barrier() gets called
after MPI_Finalize(). Could that come from a finalizer ROMIO callback?
However, I do not get this failure with MPICH2, and Open MPI seems to
behave just fine regarding MPI_Finalized(); the code below works as
expected:

#include <mpi.h>
#include <stdio.h>

static int atexitmpi(MPI_Comm comm, int k, void *v, void *xs) {
  int flag;
  MPI_Finalized(&flag);
  printf("atexitmpi: finalized=%d\n", flag);
  MPI_Barrier(MPI_COMM_WORLD);
  return MPI_SUCCESS;
}

int main(int argc, char *argv[]) {
  int keyval = MPI_KEYVAL_INVALID;
  MPI_Init(&argc, &argv);
  MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, atexitmpi, &keyval, 0);
  MPI_Comm_set_attr(MPI_COMM_SELF, keyval, 0);
  MPI_Finalize();
  return 0;
}



-- 
Lisandro Dalcin
---
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169


[OMPI devel] C type of MPI_UNWEIGHTED and MPI_WEIGHTS_EMPTY

2016-03-13 Thread Lisandro Dalcin
Currently, from mpi.h

#define MPI_UNWEIGHTED   ((void *) 2)  /* unweighted graph */
#define MPI_WEIGHTS_EMPTY((void *) 3)  /* empty weights */

However, according to the MPI-3.1 standard (page 680), they should be

#define MPI_UNWEIGHTED   ((int *) 2)  /* unweighted graph */
#define MPI_WEIGHTS_EMPTY((int *) 3)  /* empty weights */

PS: While the current definition is kind of harmless for C, it is
likely wrong for C++.

-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


[OMPI devel] Issue with 2.0.0rc3, singleton init

2016-06-16 Thread Lisandro Dalcin
omm_base_select (grpcomm_base_select.c:87)
==31396==by 0x57E9AFD: orte_ess_base_app_setup (ess_base_std_app.c:223)
==31396==by 0x74B67E1: rte_init (ess_singleton_module.c:323)
==31396==by 0x57A2B26: orte_init (orte_init.c:226)
==31396==by 0x4E8CECE: ompi_mpi_init (ompi_mpi_init.c:501)
==31396==by 0x4EC0EAD: PMPI_Init_thread (pinit_thread.c:69)
==31396==by 0x4008F2: main (in /home/dalcinl/Devel/mpi4py-dev/demo/a.out)
==31396==  If you believe this happened as a result of a stack
==31396==  overflow in your program's main thread (unlikely but
==31396==  possible), you can try to increase the size of the
==31396==  main thread stack using the --main-stacksize= flag.
==31396==  The main thread stack size used in this run was 8720384.
Killed

-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


[OMPI devel] MPI_Group_intersection: malloc(0) warning with 2.0.0rc3

2016-06-16 Thread Lisandro Dalcin
Trivial Python code checking the intersection of the empty group with itself.

$ cat tmp.py
from mpi4py import MPI
empty = MPI.Group.Intersection(MPI.GROUP_EMPTY, MPI.GROUP_EMPTY)
assert MPI.Group.Compare(empty, MPI.GROUP_EMPTY) in [MPI.IDENT, MPI.CONGRUENT]

$ mpiexec -n 1 python tmp.py
malloc debug: Request for 0 bytes (group/group.c, 456)

-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


[OMPI devel] 2.0.0rc3 MPI_Comm_split_type()

2016-06-16 Thread Lisandro Dalcin
Could you please check/confirm you are supporting passing
split_type=MPI_UNDEFINED to MPI_Comm_split_type() ? IIRC, this is a
regression from 2.0.0rc2.

$ cat test-comm-split-type.py
from mpi4py import MPI
subcomm = MPI.COMM_WORLD.Split_type(MPI.UNDEFINED)
assert subcomm == MPI.COMM_NULL

$ mpiexec -n 1 python test-comm-split-type.py
Traceback (most recent call last):
  File "test-comm-split-type.py", line 2, in 
subcomm = MPI.COMM_WORLD.Split_type(MPI.UNDEFINED)
  File "MPI/Comm.pyx", line 214, in mpi4py.MPI.Comm.Split_type
(src/mpi4py.MPI.c:95252)
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
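
For reference, an equivalent plain C reproducer would be something like
this minimal sketch (per MPI-3, split_type=MPI_UNDEFINED is valid and
newcomm should come back as MPI_COMM_NULL):

#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
  MPI_Comm subcomm;
  MPI_Init(&argc, &argv);
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_UNDEFINED, 0,
                      MPI_INFO_NULL, &subcomm);
  printf("got MPI_COMM_NULL: %d\n", subcomm == MPI_COMM_NULL);
  MPI_Finalize();
  return 0;
}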


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


[OMPI devel] some possible bugs

2006-09-26 Thread Lisandro Dalcin

I'm developing mpi4py, an MPI port for Python. I've written many
unittest scripts for my wrappers, which also aim to test MPI
implementations.

Below, I list some issues I've found when building my wrappers with
Open MPI 1.1.1. Please let me know your opinions.

- MPI_Group_translate_ranks(group1, n, ranks1, group2, ranks2) fails
  (with MPI_ERR_GROUP) if n != size(group1). According to the standard,
  I understand this routine should work for any value of n, if
  ranks1 contains values (even if some are duplicated) in a valid
  range according to size(group1).

- MPI_Info_get_nthkey(INFO, 0, key) does not fail when INFO is
  empty, i.e., when MPI_Info_get_nkeys(info, &nkeys) returns nkeys==0.

- Usage of MPI_IN_PLACE is broken in some collectives, below the
  reasons I've found:

  + MPI_Gather:    with sendbuf=MPI_IN_PLACE, sendcount is not ignored.
  + MPI_Scatter:   with recvbuf=MPI_IN_PLACE, recvcount is not ignored.
  + MPI_Allgather: with sendbuf=MPI_IN_PLACE, sendcount is not ignored.

  The standard says that [send|recv]count and [send|recv]type are
  ignored. I've not tested vector variants, perhaps they suffer the
  same problem. (A sketch of the MPI_Gather usage I mean is shown
  after this list.)

- Some extended collective communications failed (not by raising
  errors, but by aborting and printing a trace to stdout) when using
  intercommunicators. Sometimes, the problems appeared when
  size(local_group) != size(remote_group). However, MPI_Barrier and
  MPI_Bcast worked well. I still could not get the reasons for those
  failures. I've found a similar problem in MPICH2 when configured
  with error-checking enabled (they had a bug in some error-checking
  macros, I reported this issue and they then told me I was right).
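
As mentioned in the MPI_IN_PLACE item above, here is a minimal sketch of
the MPI_Gather usage I mean (written against the standard's wording, not
against any particular implementation):

#include <mpi.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
  int rank, size, *buf = NULL;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  if (rank == 0) {
    buf = (int *) malloc(size * sizeof(int));
    buf[0] = rank;  /* the root's contribution is already "in place" */
    /* sendcount=0 and sendtype=MPI_DATATYPE_NULL should be ignored here */
    MPI_Gather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
               buf, 1, MPI_INT, 0, MPI_COMM_WORLD);
    free(buf);
  } else {
    /* the recv arguments are ignored at non-root processes */
    MPI_Gather(&rank, 1, MPI_INT, NULL, 0, MPI_INT, 0, MPI_COMM_WORLD);
  }
  MPI_Finalize();
  return 0;
}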


--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] Fwd: MPI_INPLACE problem

2006-09-27 Thread Lisandro Dalcin

Here is an example of the problems I have with MPI_INPLACE in OMPI.
Hoping this can be useful. Perhaps the problem is not in the OMPI sources,
but in my particular build. I've configured with:

$ head -n 7 config.log | tail -n 1
 $ ./configure --disable-dlopen --prefix /usr/local/openmpi/1.1.1

First I present a very simple program that gives correct results with
OMPI, then a small modification changing the sendcount argument, which
now gives wrong results.

Using MPICH2, both versions give the same, right result.

My environment:
--

$ echo $PATH
/usr/local/openmpi/1.1.1/bin:/usr/kerberos/bin:/usr/lib/ccache/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:.

$ echo $LD_LIBRARY_PATH
/usr/local/openmpi/1.1.1/lib:/usr/local/openmpi/1.1.1/lib/openmpi

First test program
-

This stupid program gathers the values of comm.rank at a root process
with rank = comm.size/2 and prints the gathered values.

$ cat gather.c
#include 
#include 

int main() {
 int size, rank, root;
 MPI_Init(NULL, NULL);
 MPI_Comm_size(MPI_COMM_WORLD, &size);
 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 root = size/2;
 if (rank == root) {
   int i;
   int *buf = (int *) malloc(size * sizeof(int));
   for (i=0; i
#include 

int main() {
 int size, rank, root;
 MPI_Init(NULL, NULL);
 MPI_Comm_size(MPI_COMM_WORLD, &size);
 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 root = size/2;
 if (rank == root) {
   int i;
   int *buf = (int *) malloc(size * sizeof(int));
   for (i=0; i

[OMPI devel] problem with MPI_[Pack|Unpack]_external

2006-09-29 Thread Lisandro Dalcin

I've just caught a problem with packing/unpacking using 'external32'
in Linux. The problem seems to be byte ordering; I believe you forgot
to make the little-endian <-> big-endian conversion somewhere. Below,
an interactive session with ipython (sorry, no time to write in C)
showing the problem. Please, ignore me if this has been already
reported.

In [1]: import numpy

In [2]: from mpi4py import MPI

In [3]: print numpy.dtype('i').itemsize, MPI.INT.extent
4 4

In [4]: print numpy.dtype('b').itemsize, MPI.BYTE.extent
1 1

In [5]:

In [5]: arr1 = numpy.array([256], dtype='i') # one int, for input

In [6]: print arr1
[256]

In [7]: buf = numpy.array([0,0,0,0], dtype='b') # four bytes, auxiliar

In [8]: print buf
[0 0 0 0]

In [9]: p = MPI.INT.Pack_external('external32', arr1, buf, 0)

In [10]: print buf, repr(buf.tostring())
[0 1 0 0] '\x00\x01\x00\x00'

In [11]: arr2 = numpy.array([0], dtype='i') # one int, for output

In [12]: print arr2
[0]

In [13]: p = MPI.INT.Unpack_external('external32', buf, 0, arr2)

In [14]: print arr2
[65536]

In [15]: print arr2.byteswap()
[256]
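
For completeness, a plain C reproducer would look roughly like the
following minimal sketch (for the int value 256, the 'external32'
representation should be the big-endian byte sequence 00 00 01 00):

#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
  int in = 256, out = 0;
  unsigned char buf[4] = {0, 0, 0, 0};
  MPI_Aint pos = 0;
  MPI_Init(&argc, &argv);
  MPI_Pack_external("external32", &in, 1, MPI_INT, buf, sizeof(buf), &pos);
  printf("packed bytes: %02x %02x %02x %02x\n",
         buf[0], buf[1], buf[2], buf[3]);
  pos = 0;
  MPI_Unpack_external("external32", buf, sizeof(buf), &pos, &out, 1, MPI_INT);
  printf("unpacked: %d (expected 256)\n", out);
  MPI_Finalize();
  return 0;
}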


--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] MPI_XXX_{get|set}_errhandler in general , and for files in particular

2006-10-09 Thread Lisandro Dalcin

Looking at the MPI-2 errata document,
http://www.mpi-forum.org/docs/errata-20-2.html, it says:

Page 61, after line 36. Add the following (paralleling the errata to MPI-1.1):

MPI_{COMM,WIN,FILE}_GET_ERRHANDLER behave as if a new error handler
object is created. That is, once the error handler is no longer
needed, MPI_ERRHANDLER_FREE should be called with the error handler
returned from MPI_ERRHANDLER_GET or MPI_{COMM,WIN,FILE}_GET_ERRHANDLER
to mark the error handler for deallocation. This provides behavior
similar to that of MPI_COMM_GROUP and MPI_GROUP_FREE.

Well, it seems that OMPI does not currently follow this specification.
Any plans to change this? Or will it not go in?

Additionally, I've noted that MPI_File_get_errhandler fails with
MPI_ERR_FILE if the passed file handle is MPI_FILE_NULL. However, I
understand (regarding the standard) this is the handle to query to
get/set/reset the default error handler for new files... I think
MPI_File_{get|set}_errhandler should accept the MPI_FILE_NULL handle. Am I
right?
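
For the record, the usage I have in mind is roughly the following
(a minimal sketch; the file name is made up):

#include <mpi.h>
int main(int argc, char *argv[])
{
  MPI_Errhandler errh;
  MPI_File fh;
  MPI_Init(&argc, &argv);
  /* query (and free) the default error handler for files opened later */
  MPI_File_get_errhandler(MPI_FILE_NULL, &errh);
  MPI_Errhandler_free(&errh);
  /* reset the default: files opened afterwards should inherit it */
  MPI_File_set_errhandler(MPI_FILE_NULL, MPI_ERRORS_RETURN);
  if (MPI_File_open(MPI_COMM_SELF, "does-not-exist.dat", MPI_MODE_RDONLY,
                    MPI_INFO_NULL, &fh) == MPI_SUCCESS)
    MPI_File_close(&fh);
  MPI_Finalize();
  return 0;
}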

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] Something broken using Persistent Requests

2006-10-12 Thread Lisandro Dalcin

I am getting errors using persistent communications (OMPI 1.1.1). I am
trying to implement (in Python) example 2.32 from page 107 of MPI - The
Complete Reference (Vol. 1, 2nd edition).

I think the problem is not in my wrappers (my script works fine with
MPICH2). Below are the two issues:

1 - MPI_Startall fails (returning a negative error code, -105, which
in fact seems to be out of the range [MPI_SUCCESS...MPI_LASTCODE]).
However, doing 'for r in reqlist: r.Start()' works.

2 - And then, calling MPI_Waitall (or even iterating over the request
array and calling MPI_Wait), the requests seem to be deallocated (I
get MPI_REQUEST_NULL upon return), so I cannot start them again. I
understand this is wrong: the request handles should be marked as
inactive, but not marked for deallocation.
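
For reference, the C-level pattern being exercised is roughly the
following minimal sketch (along the lines of that example, not the
actual mpi4py test):

#include <mpi.h>
int main(int argc, char *argv[])
{
  int rank, size, sbuf = 0, rbuf = 0, iter;
  MPI_Request reqs[2];
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Send_init(&sbuf, 1, MPI_INT, (rank + 1) % size, 0,
                MPI_COMM_WORLD, &reqs[0]);
  MPI_Recv_init(&rbuf, 1, MPI_INT, (rank + size - 1) % size, 0,
                MPI_COMM_WORLD, &reqs[1]);
  for (iter = 0; iter < 3; iter++) {
    sbuf = rank + iter;
    MPI_Startall(2, reqs);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    /* after Waitall the requests should become inactive, not
       MPI_REQUEST_NULL, so they can be started again */
  }
  MPI_Request_free(&reqs[0]);
  MPI_Request_free(&reqs[1]);
  MPI_Finalize();
  return 0;
}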

Please, ignore me if this was reported. I am really busy and I have
not found the time to navigate the OMPI sources to get in touch with
its internals, so I am always reporting problems, and never patches.
Sorry!

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] Fwd: MPI_GROUP_TRANSLATE_RANKS (again)

2006-10-19 Thread Lisandro Dalcin

I've successfully installed the just-released 1.1.2. So I go for a new
round of catching bugs, non-standard behavior, or just what could be seen
as convenient features.

The problem I've reported with MPI_GROUP_TRANSLATE_RANKS was
corrected. However, looking at the MPI-2 errata document, it says:

Add to page 36, after 3.2.11 (above)

3.2.12 MPI_GROUP_TRANSLATE_RANKS and MPI_PROC_NULL

MPI_PROC_NULL is a valid rank for input to MPI_GROUP_TRANSLATE_RANKS,
which returns MPI_PROC_NULL as the translated rank.

But it seems it returns MPI_UNDEFINED in this case. Try it yourself:

In [1]: from mpi4py import MPI

In [2]: group = MPI.COMM_WORLD.Get_group()

In [3]: MPI.Group.Translate_ranks(group, [MPI.PROC_NULL], group)
Out[3]: [-32766]

In [4]: MPI.UNDEFINED
Out[4]: -32766


Additionally, OMPI segfaults if the group is MPI_GROUP_EMPTY. Try it yourself:

In [5]: group = MPI.GROUP_EMPTY

In [6]: MPI.Group.Translate_ranks(group, [MPI.PROC_NULL], group)
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0xfff8
[0] func:/usr/local/openmpi/1.1.2/lib/libopal.so.0 [0xba1dfc]
[1] func:[0xe67440]
[2] func:/usr/local/openmpi/1.1.2/lib/libmpi.so.0(MPI_Group_translate_ranks+0xaa
) [0x5f0786]
[3] func:/u/dalcinl/lib/python/mpi4py/_mpi.so [0xa5a6c6]
[4] func:/usr/local/lib/libpython2.4.so.1.0(PyCFunction_Call+0x66) [0x1d5d66]
# more traceback .
[31] func:/usr/local/lib/libpython2.4.so.1.0 [0x20b009]
*** End of error message ***
Segmentation fault
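
A plain C reproducer for both observations would be something like this
minimal sketch:

#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
  MPI_Group world, empty = MPI_GROUP_EMPTY;
  int rin = MPI_PROC_NULL, rout = MPI_UNDEFINED;
  MPI_Init(&argc, &argv);
  MPI_Comm_group(MPI_COMM_WORLD, &world);
  MPI_Group_translate_ranks(world, 1, &rin, world, &rout);
  printf("translated MPI_PROC_NULL -> %d (MPI_PROC_NULL is %d)\n",
         rout, MPI_PROC_NULL);
  MPI_Group_translate_ranks(empty, 1, &rin, empty, &rout); /* segfaults here */
  MPI_Group_free(&world);
  MPI_Finalize();
  return 0;
}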


--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] MPI_BUFFER_ATTACH/DETACH behaviour

2006-10-19 Thread Lisandro Dalcin

As a general idea and following similar MPI concepts, it could be really
useful if MPI_BUFFER_ATTACH/DETACH allowed a layered usage inside
modules. That is, inside a call, a library can do a 'detach' and
cache the result, next 'attach' an internally allocated resource, call
BSEND, 'detach' its own resources, and finally re-'attach' the original
resources. I've already discussed this a bit with Bill Gropp,
regarding MPICH2 behaviour.
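
Concretely, the layered pattern I have in mind would look roughly like
this minimal sketch (whether the initial 'detach' is legal when no
buffer is attached is precisely point 3 below):

#include <mpi.h>
#include <stdlib.h>

static void library_bsend(int value, int dest, MPI_Comm comm)
{
  void *oldbuf = NULL, *mybuf = NULL;
  int oldsize = 0, mysize = MPI_BSEND_OVERHEAD + (int) sizeof(int);
  MPI_Buffer_detach(&oldbuf, &oldsize);  /* cache the caller's buffer */
  mybuf = malloc(mysize);
  MPI_Buffer_attach(mybuf, mysize);      /* attach our own resources */
  MPI_Bsend(&value, 1, MPI_INT, dest, 0, comm);
  MPI_Buffer_detach(&mybuf, &mysize);    /* waits for our buffered message */
  free(mybuf);
  if (oldbuf != NULL && oldsize > 0)
    MPI_Buffer_attach(oldbuf, oldsize);  /* restore the caller's buffer */
}

int main(int argc, char *argv[])
{
  int dummy = -1;
  MPI_Request req;
  MPI_Init(&argc, &argv);
  MPI_Irecv(&dummy, 1, MPI_INT, 0, 0, MPI_COMM_SELF, &req);
  library_bsend(42, 0, MPI_COMM_SELF);   /* message to self, just as a demo */
  MPI_Wait(&req, MPI_STATUS_IGNORE);
  MPI_Finalize();
  return 0;
}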

So I would like to propose the following:

1- MPI_BUFFER_ATTACH should attach the provided buffer, raising an
error if the provided size is less than BSEND_OVERHEAD (why
postpone the error until MPI_Bsend?). Currently, the behavior is:

In [1]: from mpi4py import MPI

In [2]: mem = MPI.Alloc_mem(1)

In [3]: mem
Out[3]: 

In [4]: MPI.Attach_buffer(mem)

In [5]: MPI.BSEND_OVERHEAD
Out[5]: 128

Any subsequent MPI_BSEND is likely to fail for lack of buffer space. Am I right?

2- MPI_BUFFER_ATTACH should raise an error if a previous buffer was
attached. OMPI currently seems to work like this; however, in a second
call to attach I get an error code -104, which I think is internal and
should be remapped to the public range [SUCCESS, LASTCODE). See below, the
error string is generated by MY code, because I assumed as a general
rule that calling MPI_GET_ERROR_STRING is unsafe with an out-of-range
error code.

In [6]: MPI.Attach_buffer(mem)
---
mpi4py.MPI.Exception Traceback (most
recent call last)
# more output 
Exception: unable to retrieve error string, ierr=-104 out of range
[MPI_SUCCESS=0, MPI_ERR_LASTCODE=54)


3 - MPI_BUFFER_DETACH should always succeed, even if there is no
buffer to detach. In that case, it should return a null pointer,
and perhaps a zero size.

This way, inside a library routine we can safely call
MPI_BUFFER_DETACH, MPI_BUFFER_ATTACH/DETACH owned memory, and finally
test whether the original buffer (obtained in the initial call to detach)
is valid by testing the pointer or the size.

Waiting for your comments...

Regards,

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] some stuff defined for Fortran but not for C

2006-10-20 Thread Lisandro Dalcin

in release 1.1.2, the following is included in 'mpif-config.h'

 parameter (OMPI_MAJOR_VERSION=1)
 parameter (OMPI_MINOR_VERSION=1)
 parameter (OMPI_RELEASE_VERSION=2)

Any chance of having this accessible in C?

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] [Open MPI] #529: MPI_START* returning OMPI_* error codes

2006-10-23 Thread Lisandro Dalcin

On 10/22/06, Open MPI  wrote:

#529: MPI_START* returning OMPI_* error codes
-+--
Reporter:  jsquyres  |   Owner:
Type:  defect|  Status:  new
Priority:  major |   Milestone:  Open MPI 1.1.3
 Version:  trunk |Keywords:
-+--
 I sent this on the core list but got no reply, so I'm turning it into a
 ticket.

 A user reported that they were getting back value -105 from MPI_STARTALL
 (which is OMPI_ERR_REQUEST -- assumedly they were using
 MPI_ERRORS_RETURN).  Regardless of what is happening to make this error be
 returned, we should never be returning an OMPI_ERR_* value from an MPI
 function.  Instead, we should be converting this from OMPI_ERR_* to
 MPI_ERR_*.

 Specifically, don't we need to wrap the returns of these MCA_PML_CALLs in
 OMPI_ERRHANDLER_RETURN?  Something like:

 {{{
 rc = MCA_PML_CALL(start());
 OMPI_ERRHANDLER_RETURN(rc, X, rc, FUNC_NAME);
 }}}

 where XXX is some relevant communicator:

  * MPI_START: the communicator of the single request -- easy enough
  * MPI_STARTALL: MPI-1:3.9 says that STARTALL is exactly equivalent to
 calling START n times, so I guess we use the communicator from the request
 that caused the error.  pml_base_module_start_t() doesn't return ''which''
 request caused the error, so I'm guessing that if (OMPI_SUCCESS != rc),
 we'll have to scan through the list of requests to find the first one with
 an error and use the communicator from that one.  Right?

--
Ticket URL: 
Open MPI 




Yes, but... Which error handler will be called?? The one associated with
the communicator involved in the request, or MPI_COMM_WORLD? I do not
remember right now if the standard says anything about this. If not, it
should call the error handler of the WORLD communicator. Am I right?


--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] Problems in Collectives+Intercomms

2006-11-06 Thread Lisandro Dalcin

A user testing my MPI wrappers for Python found a couple of problems
with OMPI-1.1 using valgrind; here are his reports.

http://projects.scipy.org/mpi4py/ticket/9
http://projects.scipy.org/mpi4py/ticket/10

I've investigated this in the OMPI-1.1.2 sources, and found the following
in file ompi/mpi/c/allgatherv.c

  size = ompi_comm_size(comm);
  for (i = 0; i < size; ++i) {
if (recvcounts[i] < 0) {
  return OMPI_ERRHANDLER_INVOKE(comm, MPI_ERR_COUNT, FUNC_NAME);
} else if (MPI_DATATYPE_NULL == recvtype) {
  return OMPI_ERRHANDLER_INVOKE(comm, MPI_ERR_TYPE, FUNC_NAME);
}
  }

Two things to point out in this source file:

- I cannot see any special check for the intercommunicator case, and
checking recvcounts with 'ompi_comm_size' is wrong; it should be checked
against the remote comm size. Am I right?

- Test for recvtype should be done outside the loop.

In file ompi/mpi/c/gatherv.c, there are special checks for intercomms.
However, in the intercomm-specific checks, you are still using
'ompi_comm_size'.


As reference, in ompi/mpi/c/alltoallv.c you have

size = ompi_comm_remote_size(comm);
for (i = 0; i < size; ++i) {
  if (recvcounts[i] < 0) {
err = MPI_ERR_COUNT;
  } else if (MPI_DATATYPE_NULL == recvtype) {
err = MPI_ERR_TYPE;
  } else {
OMPI_CHECK_DATATYPE_FOR_SEND(err, sendtype, sendcounts[i]);
  }
  OMPI_ERRHANDLER_CHECK(err, comm, err, FUNC_NAME);
}

This is OK, but perhaps some things can be moved outside the loop.

Regards,

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] failures running mpi4py testsuite, perhaps Comm.Split()

2007-07-11 Thread Lisandro Dalcin

Oops, sent to the wrong list, forwarded here...

-- Forwarded message --
From: Lisandro Dalcin 
List-Post: devel@lists.open-mpi.org
Date: Jul 11, 2007 8:58 PM
Subject: failures running mpi4py testsuite, perhaps Comm.Split()
To: Open MPI 


Hello all, after a long time I'm here again. I am improving mpi4py in
order to support MPI threads, and I've found some problems with the
latest version, 1.2.3.

I've configured with:

$ ./configure --prefix /usr/local/openmpi/1.2.3 --enable-mpi-threads
--disable-dependency-tracking

However, for the following failure, MPI_Init_thread() was not used. This
test creates an intercommunicator by using Comm.Split() followed by
Intracomm.Create_intercomm(). When running on two or more procs (for
one proc this test is skipped), I (sometimes) get the following trace:

[trantor:06601] *** Process received signal ***
[trantor:06601] Signal: Segmentation fault (11)
[trantor:06601] Signal code: Address not mapped (1)
[trantor:06601] Failing at address: 0xa8
[trantor:06601] [ 0] [0x958440]
[trantor:06601] [ 1]
/usr/local/openmpi/1.2.3/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x1483)
[0x995553]
[trantor:06601] [ 2]
/usr/local/openmpi/1.2.3/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x36)
[0x645d06]
[trantor:06601] [ 3]
/usr/local/openmpi/1.2.3/lib/libopen-pal.so.0(opal_progress+0x58)
[0x1a2c88]
[trantor:06601] [ 4]
/usr/local/openmpi/1.2.3/lib/libmpi.so.0(ompi_request_wait_all+0xea)
[0x140a8a]
[trantor:06601] [ 5]
/usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_sendrecv_actual+0xc8)
[0x22d6e8]
[trantor:06601] [ 6]
/usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_allgather_intra_bruck+0xf2)
[0x231ca2]
[trantor:06601] [ 7]
/usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_allgather_intra_dec_fixed+0x8b)
[0x22db7b]
[trantor:06601] [ 8]
/usr/local/openmpi/1.2.3/lib/libmpi.so.0(ompi_comm_split+0x9d)
[0x12d92d]
[trantor:06601] [ 9]
/usr/local/openmpi/1.2.3/lib/libmpi.so.0(MPI_Comm_split+0xad)
[0x15a53d]
[trantor:06601] [10] /u/dalcinl/lib/python/mpi4py/_mpi.so [0x508500]
[trantor:06601] [11]
/usr/local/lib/libpython2.5.so.1.0(PyCFunction_Call+0x14d) [0xe150ad]
[trantor:06601] [12]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x64af)
[0xe626bf]
[trantor:06601] [13]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x7c4) [0xe63814]
[trantor:06601] [14]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x5a43)
[0xe61c53]
[trantor:06601] [15]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x6130)
[0xe62340]
[trantor:06601] [16]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x7c4) [0xe63814]
[trantor:06601] [17] /usr/local/lib/libpython2.5.so.1.0 [0xe01450]
[trantor:06601] [18]
/usr/local/lib/libpython2.5.so.1.0(PyObject_Call+0x37) [0xddf5c7]
[trantor:06601] [19]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x42eb)
[0xe604fb]
[trantor:06601] [20]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x7c4) [0xe63814]
[trantor:06601] [21] /usr/local/lib/libpython2.5.so.1.0 [0xe0137a]
[trantor:06601] [22]
/usr/local/lib/libpython2.5.so.1.0(PyObject_Call+0x37) [0xddf5c7]
[trantor:06601] [23] /usr/local/lib/libpython2.5.so.1.0 [0xde6de5]
[trantor:06601] [24]
/usr/local/lib/libpython2.5.so.1.0(PyObject_Call+0x37) [0xddf5c7]
[trantor:06601] [25] /usr/local/lib/libpython2.5.so.1.0 [0xe2abc9]
[trantor:06601] [26]
/usr/local/lib/libpython2.5.so.1.0(PyObject_Call+0x37) [0xddf5c7]
[trantor:06601] [27]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalFrameEx+0x1481)
[0xe5d691]
[trantor:06601] [28]
/usr/local/lib/libpython2.5.so.1.0(PyEval_EvalCodeEx+0x7c4) [0xe63814]
[trantor:06601] [29] /usr/local/lib/libpython2.5.so.1.0 [0xe01450]
[trantor:06601] *** End of error message ***


As the problem seems to originate in Comm.Split(), I've written a
small Python script to test it:

from mpi4py import MPI

# true MPI_COMM_WORLD_HANDLE
BASECOMM = MPI.__COMM_WORLD__

BASE_SIZE = BASECOMM.Get_size()
BASE_RANK = BASECOMM.Get_rank()

if BASE_RANK < (BASE_SIZE // 2) :
   COLOR = 0
else:
   COLOR = 1

INTRACOMM = BASECOMM.Split(COLOR, key=0)
print 'Done!!!'

This seems to always work, but running it under valgrind (note
valgrind-py below is just an alias adding a suppression file for
Python) I get the following:

mpiexec -n 3 valgrind-py python test.py

==6727== Warning: set address range perms: large range 134217728 (defined)
==6727== Source and destination overlap in memcpy(0x4C93EA0, 0x4C93EA8, 16)
==6727==at 0x4006CE6: memcpy (mc_replace_strmem.c:116)
==6727==by 0x46C59CA: ompi_ddt_copy_content_same_ddt (in
/usr/local/openmpi/1.2.3/lib/libmpi.so.0.0.0)
==6727==by 0x4BADDCE: ompi_coll_tuned_allgather_intra_bruck (in
/usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so)
==6727==by 0x4BA9B7A: ompi_coll_tuned_allgather_intra_dec_fixed
(in /usr/local/openmpi/1.2.3/lib/openmpi/mca_coll_tuned.so)
==6727==by 0x46A692C: ompi_comm_split (in
/usr/local/

Re: [OMPI devel] failures running mpi4py testsuite, perhaps Comm.Split()

2007-07-11 Thread Lisandro Dalcin

On 7/11/07, George Bosilca  wrote:

The two errors you provide are quite different. The first one has
been addresses few days ago in the trunk (https://svn.open-mpi.org/
trac/ompi/changeset/15291). If instead of the 1.2.3 you use anything
after r15291 you will be safe in a threading case.


Please, take into account that in this case I did not use MPI_Init_thread() ...

In any case, sorry for making noise if this was already reported. I
have other issues to report, but perhaps I should try the svn version.
Please, understand me, I am really too busy with many things to stay
up-to-date with every source code I use. Sorry again.




The second is different. The problem is that memcpy is a lot faster
than memmove, and that's why we use it.


Yes, of course.


The case where the 2 data
overlap are quite minimal. I'll take a look to see exactly what
happened there.


Initially, I thought it was my error, but then realized that this seems
to happen in Comm.Split() internals.



--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] COVERITY STATIC SOURCE CODE ANALYSIS

2007-07-19 Thread Lisandro Dalcin

Have any of you ever considered asking for Open MPI to be included here,
as it is an open source project?

http://scan.coverity.com/index.html


From many sources (mainly related to Python), it seems the results are
impressive.

Regards,

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] MPI_APPNUM value for apps not started through mpiexec

2007-07-23 Thread Lisandro Dalcin

Using a fresh (2 hours ago) update of SVN branch v1.2, I've found
that the attribute MPI_APPNUM returns -1 (minus one) when a 'sequential'
application is not launched through mpiexec.

Reading the MPI standard, I understand it should return a non-negative
integer if defined, or it should not be defined at all.

http://www.mpi-forum.org/docs/mpi-20-html/node113.htm#Node113
"""
If an application was not spawned with MPI_COMM_SPAWN or
MPI_COMM_SPAWN_MULTIPLE, and MPI_APPNUM doesn't make sense in the
context of the implementation-specific startup mechanism, MPI_APPNUM
is not set.
"""

I'm not sure if this is intended, but I report it anyway; sorry if
this issue was already reported.
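
For reference, the check I am doing boils down to this minimal sketch:

#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
  int *appnum = NULL, flag = 0;
  MPI_Init(&argc, &argv);
  MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_APPNUM, &appnum, &flag);
  if (flag)
    printf("MPI_APPNUM is set: %d\n", *appnum);
  else
    printf("MPI_APPNUM is not set\n");
  MPI_Finalize();
  return 0;
}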

Regards,

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] MPI_ALLOC_MEM warning when requesting 0 (zero) bytes

2007-07-23 Thread Lisandro Dalcin

If I understand the standard correctly,
http://www.mpi-forum.org/docs/mpi-20-html/node54.htm#Node54

MPI_ALLOC_MEM with size=0 is valid ('size' is a nonnegative integer)

Then, using branch v1.2, I've got the following warning at runtime:

malloc debug: Request for 0 bytes (base/mpool_base_alloc.c, 194)
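
The trigger is essentially this minimal sketch (whether MPI_Free_mem
must accept whatever Alloc_mem returns for size=0 is part of the
question):

#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
  void *mem = NULL;
  MPI_Init(&argc, &argv);
  MPI_Alloc_mem(0, MPI_INFO_NULL, &mem);  /* size=0 should be valid */
  printf("MPI_Alloc_mem(0) returned %p\n", mem);
  MPI_Free_mem(mem);
  MPI_Finalize();
  return 0;
}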

As always, forgive me if this was already reported.

Regards,

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] Fwd: [Open MPI] #1101: MPI_ALLOC_MEM with 0 size must be valid

2007-07-24 Thread Lisandro Dalcin

On 7/23/07, Jeff Squyres  wrote:

Does anyone have any opinions on this?  If not, I'll go implement
option #1.


Sorry, Jeff... just reading this. I think your option #1 is the
better one. However, I want to warn you about two issues:

* In my Linux FC6 box, malloc(0) returns different pointers for each
call. In fact, I believe this is a requirement for malloc; in the case
of MPI_Alloc_mem, this could be relaxed, but it could cause problems
(suppose some code building a hash table using pointers as keys, or
even an std::map). Just a warning.

* malloc(0) returns an aligned pointer; here I really think
MPI_Alloc_mem should return a pointer with the same alignment a
malloc(1) would return. So I am not sure your global char[1] is OK.

As a reference, I can mention the approach used in the Python memory
allocator to ensure portability across platforms. It always allocates at
least 1 byte. This is not so important in an environment like Python,
but perhaps this approach is wrong for an MPI implementation.


Regards,

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] Fwd: [Open MPI] #1101: MPI_ALLOC_MEM with 0 size must be valid

2007-07-24 Thread Lisandro Dalcin

Per Lisandro's comments: I think that if you need a random/valid
value for an STL map (or similar), malloc(0) is not a good idea to
use as a key.


OK, regarding the comments in this thread, you are completely right. I am
fine with returning NULL.

BTW, shouldn't this issue be commented on in the standard? Perhaps in the
errata document? I think there is no strong need to make it
implementation-dependent.

MPI-2 could mandate/suggest that if size=0, the returned pointer is
NULL, but then MPI_Free_mem with a NULL pointer should succeed.

Now a question: What about Fortran?

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] MPI_ALLOC_MEM warning when requesting 0 (zero) bytes

2007-07-25 Thread Lisandro Dalcin

On 7/23/07, Jeff Squyres  wrote:

I think that this will require a little tomfoolery to fix properly
because we can't simply return NULL (you can't expect to use the
pointer that we return to store anything, but you should be able to
expect to be able to dereference it without seg faulting).


Excellent! As a reference, MPICH2 seems to return different pointers for
size=0, but perhaps this happens because it falls back to using the system
malloc, and in my box this always returns different, non-null pointers.

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] MPI_ALLOC_MEM warning when requesting 0 (zero) bytes

2007-07-26 Thread Lisandro Dalcin

On 7/25/07, Jeff Squyres  wrote:

Be sure to read this thread in order -- the conclusion of the thread
was that we now actually *do* return NULL, per POSIX advice.


OK, I got confused. And now, MPI_Free_mem is going to fail with a NULL
pointer? Not sure what POSIX says, but then OMPI should also follow its
advice, right?

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] MPI_Win_get_group

2007-07-27 Thread Lisandro Dalcin
The MPI-2 standard says (see bottom of
)

MPI_WIN_GET_GROUP returns a duplicate of the group of the communicator
used to create the window. associated with win. The group is returned
in group.

Please, note the 'duplicate' ...

Well, it seems OMPI (v1.2 svn) is not returning a duplicate: comparing
the handles with the C == operator gives true. Can you confirm this?
Should the word 'duplicate' be interpreted as 'a new reference to' ?

As reference, MPICH2 seems to return different handles.

Anyway, I think the standard needs to be corrected/clarified. Perhaps
the strict 'duplication' does not make any sense.
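
The check I am doing is essentially the following minimal sketch (the
comments describe what I observe, not what the standard mandates):

#include <mpi.h>
int main(int argc, char *argv[])
{
  MPI_Win win;
  MPI_Group wgroup, cgroup;
  int result;
  MPI_Init(&argc, &argv);
  MPI_Win_create(MPI_BOTTOM, 0, 1, MPI_INFO_NULL, MPI_COMM_SELF, &win);
  MPI_Win_get_group(win, &wgroup);
  MPI_Comm_group(MPI_COMM_SELF, &cgroup);
  MPI_Group_compare(wgroup, cgroup, &result);  /* MPI_IDENT either way */
  /* in OMPI (v1.2 svn) wgroup == cgroup also compares true as handles;
     MPICH2 seems to give a distinct handle; both accept MPI_Group_free */
  MPI_Group_free(&wgroup);
  MPI_Group_free(&cgroup);
  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}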

Regards, and sorry for raising such low-level corner cases again ...


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] MPI_Win_get_group

2007-07-28 Thread Lisandro Dalcin
On 7/28/07, Brian Barrett  wrote:
> In my opinion, we conform to the standard.  We reference count the
> group, it's incremented on call to MPI_WIN_GROUP, and you can safely
> call MPI_GROUP_FREE on the group returned from MPI_WIN_GROUP.  Groups
> are essentially immutable, so there's no way I can think of that we
> violate the MPI standard.

Your reasoning makes a lot of sense. There seems to be no point in
requiring a true 'duplicate' to be returned by Win.Get_group().

> Others are, of course, free to disagree with me.

I do not strictly disagree with you, but the standard does (if it is
read like a lawyer). In any case, this is another corner case that
should be clarified in the standard...

In the meantime, I would prefer to follow the standard as closely as
possible. If not, some external, stupid test suite (like the one I
have for mpi4py) would report that OMPI is wrong on this point.

BTW, where is the place to discuss these standard issues? Is there a
mailing list at mpi-forum.org?

Regards,

-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] MPI_Comm_free with MPI_COMM_SELF

2007-07-28 Thread Lisandro Dalcin
I tried to free COMM_SELF, and it seems to call the error handler
attached to COMM_WORLD. Is this intended? Shouldn't OMPI use the error
handler attached to COMM_SELF?

As a reference, I tried this with MPICH2, and of course the call fails,
but it uses the error handler attached to COMM_SELF.

Again, this is a new corner case AFAIK not taken into account in the standard.
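
A minimal reproducer (a sketch; to really see which handler fires one would
attach two distinct user handlers, here both are simply set to
MPI_ERRORS_RETURN so the returned code can be printed):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Comm self;
  int ierr;
  MPI_Init(&argc, &argv);
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);
  self = MPI_COMM_SELF;
  ierr = MPI_Comm_free(&self);  /* freeing COMM_SELF is erroneous */
  printf("MPI_Comm_free(MPI_COMM_SELF) returned %d\n", ierr);
  MPI_Finalize();
  return 0;
}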

Regards,

-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] freeing GROUP_EMPTY

2007-07-28 Thread Lisandro Dalcin
A simple test trying to free GROUP_EMPTY failed with the following trace.

a.out: ../opal/class/opal_object.h:403: opal_obj_run_destructors:
Assertion `((void *)0) != object->obj_class' failed.
[trantor:19821] *** Process received signal ***
[trantor:19821] Signal: Aborted (6)
[trantor:19821] Signal code:  (-6)
[trantor:19821] [ 0] [0xcf5440]
[trantor:19821] [ 1] /lib/libc.so.6(abort+0x101) [0x4fe3c591]
[trantor:19821] [ 2] /lib/libc.so.6(__assert_fail+0xfb) [0x4fe3438b]
[trantor:19821] [ 3] /usr/local/openmpi/dev/lib/libmpi.so.0 [0xe554e2]
[trantor:19821] [ 4]
/usr/local/openmpi/dev/lib/libmpi.so.0(ompi_group_finalize+0x66)
[0xe55b69]
[trantor:19821] [ 5]
/usr/local/openmpi/dev/lib/libmpi.so.0(ompi_mpi_finalize+0x37a)
[0xe62ab6]
[trantor:19821] [ 6]
/usr/local/openmpi/dev/lib/libmpi.so.0(PMPI_Finalize+0x5f) [0xe9ca6f]
[trantor:19821] [ 7] a.out(main+0x2f) [0x804877d]
[trantor:19821] [ 8] /lib/libc.so.6(__libc_start_main+0xdc) [0x4fe27f2c]
[trantor:19821] [ 9] a.out [0x8048661]
[trantor:19821] *** End of error message ***
Aborted
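
The trigger is essentially the following (a sketch; as the trace shows, the
assertion then fires inside MPI_Finalize):

#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Group group;
  MPI_Init(&argc, &argv);
  group = MPI_GROUP_EMPTY;
  MPI_Group_free(&group);  /* "frees" the predefined empty group */
  MPI_Finalize();          /* ompi_group_finalize then trips the assertion */
  return 0;
}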


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] MPI_Win_get_group

2007-07-30 Thread Lisandro Dalcin
On 7/29/07, Jeff Squyres  wrote:
> On Jul 28, 2007, at 4:41 PM, Lisandro Dalcin wrote:
>
> > In the mean time, I would prefer to follow the standard as close as
> > possible. If not, some external, stupid test suite (like the one I
> > have for mip4py) would report that OMPI is wrong about this point.
>

What exactly are you testing for?

Equality with the '==' C operator (i.e. handle equality). Using
Group.Compare() yields IDENT, as expected. But for groups, I
understand IDENT means either equal handles (in the C/C++ '==' sense)
or groups with the same size and rank order.


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] looking up service

2007-07-30 Thread Lisandro Dalcin
Is MPI_Lookup_name() supposed to work on the v1.2 branch? I cannot get it
working (it fails with MPI_ERR_NAME).
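
The pattern being exercised is the usual publish/lookup pair (a sketch with
both sides in a single process; "my-service" is just a placeholder name):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  char port[MPI_MAX_PORT_NAME], found[MPI_MAX_PORT_NAME];
  MPI_Init(&argc, &argv);
  MPI_Open_port(MPI_INFO_NULL, port);
  MPI_Publish_name("my-service", MPI_INFO_NULL, port);
  /* looking up the name just published should return the same port */
  MPI_Lookup_name("my-service", MPI_INFO_NULL, found);
  printf("published '%s'\nlooked up '%s'\n", port, found);
  MPI_Unpublish_name("my-service", MPI_INFO_NULL, port);
  MPI_Close_port(port);
  MPI_Finalize();
  return 0;
}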

-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] MPI_Win_get_group

2007-07-30 Thread Lisandro Dalcin
On 7/30/07, George Bosilca  wrote:
> In the data-type section there is an advice to implementors that
> state that a copy can simply increase the reference count if
> applicable. So, we might want to apply the same logic here ...

BTW, you just mentioned another obscure case. Does this apply to NAMED
datatypes? This issue is really cumbersome in File.Get_view().

-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] MPI_Win_get_group

2007-07-31 Thread Lisandro Dalcin
On 7/31/07, Dries Kimpe  wrote:
> The MPI_File_get_view description in the standard has some issues related
> to copies and named datatypes:
>
> see
> http://www-unix.mcs.anl.gov/~gropp/projects/parallel/MPI/mpi-errata/discuss/fileview/fileview-1-clean.txt

Indeed, your comment was exactly the source of my comment (BTW, thank
you, this helped me to fix my Python wrappers)

In general, I think the MPI standard should be fixed/clarified in many
places regarding the handling of returned references. Testing for
predefined Comm and Group handles is rather easy, but for Datatypes it is
really cumbersome. Perhaps an MPI_Type_is_named(MPI_Datatype, int *flag)
would help a lot. What do you think?
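
Lacking such a call, the only portable way I know of is to look at the
envelope and check for MPI_COMBINER_NAMED (a sketch):

#include <stdio.h>
#include <mpi.h>

/* returns 1 if the handle refers to a predefined (named) datatype */
static int type_is_named(MPI_Datatype datatype)
{
  int ni, na, nd, combiner;
  MPI_Type_get_envelope(datatype, &ni, &na, &nd, &combiner);
  return combiner == MPI_COMBINER_NAMED;
}

int main(int argc, char *argv[])
{
  MPI_Datatype vector;
  MPI_Init(&argc, &argv);
  MPI_Type_vector(2, 1, 3, MPI_INT, &vector);
  printf("MPI_INT named: %d\n", type_is_named(MPI_INT)); /* 1 */
  printf("vector named:  %d\n", type_is_named(vector));  /* 0 */
  MPI_Type_free(&vector);
  MPI_Finalize();
  return 0;
}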



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] MPI_Win_get_group

2007-07-31 Thread Lisandro Dalcin
On 7/31/07, Jeff Squyres  wrote:
> Just curious -- why do you need to know if a handle refers to a
> predefined object?

If I understand correctly, new handles should be freed in order not to
leak things, to follow good programming practices, and to be completely
sure a valgrind run does not report any problem.

I am working on the development of MPI for Python, a port of MPI to
Python, a high-level language with automatic memory management. That
said, in such an environment, having to call XXX.Free() for every
object I get from a call like XXX.Get_something() is really an
unnecessary pain.

Many things in MPI are LOCAL (datatypes, groups, predefined
operations) and in general destroying them from user space is
guaranteed by MPI not to conflict with system (MPI) space and
communication (i.e. if you create a derived datatype for use in
the construction of another derived datatype, you can safely free the
first).

Well, for all those LOCAL objects, I could implement automatic
deallocation of handles for Python (for Comm, Win, and File, that is
not so easy, as freeing them is a collective operation AFAIK, and
automatically freeing them can lead to deadlocks).

My Python wrappers (mpi4py) are intended to be used on any platform
with any MPI implementation. But things are not so easy, as there are
many corner cases in the MPI standard.

Python is a wonderful, powerful language, very friendly for writing
things. Proof of that is the many bug reports I provided here. By
using Python, I can run all my unittest scripts in a single MPI run,
so they have the potential to find interaction problems between all
parts of MPI. If any of you OMPI developers have some knowledge of
Python, I invite you to try mpi4py, as you would be able to write
many, many tests very quickly, not only for things that should work,
but also for things that should fail.

Sorry for the long mail. In short, many things in MPI are not clearly
designed for languages other than C and Fortran. Even in the C++
specification there are things that are unacceptable, like the open
door to the problem of having dangling references, which could be
avoided at negligible cost. Anyway, all those issues are minor for
me, and the MPI specification is just great. I hope I can find the
time to contribute to the MPI-2.1 effort to better define MPI behavior
in the corner cases (fortunately, there is a really small number of
them).

Regards,

-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] MPI_Win_get_group

2007-08-06 Thread Lisandro Dalcin
On 8/1/07, Jeff Squyres  wrote:
> On Jul 31, 2007, at 6:43 PM, Lisandro Dalcin wrote:
>> having to call XXX.Free() for  every
> > object i get from a call like XXX.Get_something() is really an
> > unnecesary pain.
>
> Gotcha.
>
> But I don't see why this means that you need to know if an MPI handle
> points to an intrinsic object or not...?

Because many predefined, intrinsic objects cannot (or should not be
able to) be freed, according to the standard.

> > Many things in MPI are LOCAL (datatypes, groups, predefined
> > operations) and in general destroying them for user-space is
> > guaranteed by MPI to not conflict with system(MPI)-space and
> > communication (i.e. if you create a derived datatype four using it in
> > a construction of another derived datatype, you can safely free the
> > first).
> >
> > Well, for all those LOCAL objects, I could implement automatic
> > deallocation of handles for Python (for Comm, Win, and File, that is
> > not so easy, at freeing them is a collective operation AFAIK, and
> > automaticaly freeing them can lead to deadlocks).
>
> This is a difficult issue -- deadlocks for removing objects that are
> collective actions.  It's one of the reasons the Forum decided not to
> have the C++ bindings automatically free handles when they go out of
> scope.

And that was a really good and natural decision.

> > Sorry for the long mail. In short, many things in MPI are not clearly
> > designed for languages other than C and Fortran. Even in C++
> > specification, there are things that are unnaceptable, like the
> > open-door to the problem of having dangling references, which could be
> > avoided with negligible cost.
>
> Yes and no.  As the author of the C++ bindings chapter in MPI-2, I
> have a pretty good idea why we didn't do this.  :-)

Please do not misunderstand me. The C++ bindings are almost perfect for
me. The only thing I object to a bit is the open door for dangling
references. Anyway, this is a minor problem. And the C++ bindings are
my source of inspiration for my Python wrappers, as they are really
good for me.

> The standard is meant to be as simple, straightforward,
> and cross-language as possible (and look where it is!  Imagine if we
> had tried to make a real class library -- it would have led to even
> more corner cases and imprecision in the official standard).

Well, I have to completely agree with you. And as I said before, the
corner cases are really few, compared to the number of (rather
orthogonal) features provided in MPI. And I guess all this is going
to be solved with minor clarifications/corrections in MPI-2.1.



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] MPI_Win_get_group

2007-08-07 Thread Lisandro Dalcin
On 8/6/07, Jeff Squyres  wrote:
> On Aug 6, 2007, at 2:42 PM, Lisandro Dalcin wrote:
> > Because many predefined, intrinsic objects cannot (or should not be
> > able to) be freed, acording to the standard.
>
> I understand that.  :-)  But why would you call XXX.Free() on an
> intrinsic object?  If you're instantiating an MPI handle, you know
> that it's a user-created object and therefore you should MPI free it,
> right?  If you didn't instantiate it, then it's not a user-defined
> object, and therefore you shouldn't MPI free it.

Well, let's take two examples:

- GROUP_EMPTY is a predefined handle. The standard says that some group
operations may return it (for example, Group::Excl). So you get a
handle which may (or may not) be GROUP_EMPTY. Then you are going to
free the return of Group::Excl(). If it is an empty group, but not
equal (in the C == operator sense) to GROUP_EMPTY, then you can safely
free it; but if the return is exactly GROUP_EMPTY, what should I do as
a user? Should I free it? Well, the standard says nothing about this,
and MPI implementations can do what they want. Discussing this issue
with the MPICH2 developers, they first decided to generate an error if
Group::Free() is called with GROUP_EMPTY; then I updated my code,
released a new version of mpi4py, and after all this they sent me a
mail saying they reverted the change (i.e., users can free GROUP_EMPTY)
because some external code (the Intel MPI Benchmark) was failing. And take
into account that checking for GROUP_EMPTY is an easy task (just use
==), but even in that case you do not know if you can safely free
the result of a group operation (see the sketch below these examples).

- Another example is File::Get_view(). This function returns datatype
handles. If the returned handle is predefined, you cannot free it (not
sure, but I believe the standard explicitly says that); if it is not,
you should free it. But in order to know that, you have to go through
datatype decoding.
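
In code, the guard described in the first example reduces to something like
this (a sketch; with a single process, excluding rank 0 yields the empty
group):

#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Group world_group, result;
  int ranks[1] = {0};
  MPI_Init(&argc, &argv);
  MPI_Comm_group(MPI_COMM_WORLD, &world_group);
  MPI_Group_excl(world_group, 1, ranks, &result);
  /* the guard one has to write today: is it safe to free the result? */
  if (result != MPI_GROUP_EMPTY)
    MPI_Group_free(&result);
  MPI_Group_free(&world_group);
  MPI_Finalize();
  return 0;
}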

IMHO, all those issues could be corrected in the standard as follows:

- Group operations NEVER return GROUP_EMPTY, so the user is in charge
of always calling Group::Free() on the result. GROUP_EMPTY cannot be
freed, and it is just provided for convenience (i.e. you can use
Group::Compare() to know if a group has no members). The performance
implications of doing this (that is, returning a true duplicate of the
empty group) seem negligible.

- File::Get_view() should also never return a predefined datatype, but
a duplicate of it (in the Datatype::Dup() sense). Again, I cannot
see any performance penalty in this.

> If it's a question of trying to have a generic destructor (pardon me
> -- I know next to nothing about python) for your MPI handle classes,
> you can have a private member flag in your handle class indicating
> whether the underlying MPI handle is intrinsic or not.  Have a
> special communicator for instantiating the global / intrinsic objects
> (e.g., for MPI_INT) that sets this flag for "true"; have all other
> constructors set it to "false".  In the destructor, you check this
> flag and know whether you should call the corresponding MPI free
> function (assuming you solve issues surrounding deadlock, etc.).

I am currently doing this, but only for issuing a warning if a
non-predefined object is 'leaked'. For all local objects, like groups
and datatypes, I believe I could go further and enable automatic
destruction, but for global objects, like comms, wins, and files, the
deadlock problem is almost impossible to avoid. So for global objects
the user is still in charge of doing the destruction. But I am
completely fine with this. The only big problem is always knowing (in
an implementation-independent way) whether you are able to free a handle.

Please, let's continue this thread!!! We need to agree on the way things
should be done, and surely we should include the MPICH2 people in this
discussion. This way, we have more chances to correct/clarify things
for MPI-2.1 (or even MPI 2.0).

Or do you think I should raise all this stuff on the MPI-2.1 mailing list?
Judging from the archives, the list does not seem to be really active.
That's the reason I am always shooting here (and on mpich-maint) with
the hope of attracting the attention of experienced people like all of you
(after all, I was in high school by the time MPI was born!!).

Regards,

-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] MPI_Win_get_group

2007-08-07 Thread Lisandro Dalcin
On 8/1/07, Jeff Squyres  wrote:
> BTW, I totally forgot to mention a notable C++ MPI bindings project
> that is the next-generation/successor to OMPI: the Boost C++ MPI
> bindings (boost.mpi).
>
>  http://www.generic-programming.org/~dgregor/boost.mpi/doc/
>
> I believe there's also python bindings included...?

Well, after taking a look, let me say that (in their current state) these
Boost bindings are (of course, in my biased view) far, far inferior to
OOMPI and even to the standard C++ bindings. They seem a mixture of C style
and some STL patterns. I really prefer Jeff's MPI-object-centric
interface.


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] [OMPI users] Possible Memcpy bug in MPI_Comm_split

2007-08-17 Thread Lisandro Dalcin
On 8/16/07, George Bosilca  wrote:
> Well, finally someone discovered it :) I know about this problem for
> quite a while now, it pop up during our own valgrind test of the
> collective module in Open MPI. However, it never create any problems
> in the applications, at least not as far as I know. That's why I'm
> reticent to replace the memcpy by a memmove (where the arguments are
> allowed to overlap) as there is a performance penalty.

George, I believe I also reported this some time ago, and your
comments were the same :-).

No time to dive into the internals, but for me the question is: what is
going on in Comm::Split() that makes it copy overlapping memory? Is that
expected, or is it perhaps a bug?
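
Just to make the trade-off concrete, a guarded copy that pays the memmove
cost only when the ranges actually overlap could look like this (a sketch,
not the Open MPI code):

#include <stdio.h>
#include <string.h>

/* copy len bytes, falling back to memmove only when the ranges overlap */
static void copy_maybe_overlapping(void *dst, const void *src, size_t len)
{
  char *d = (char *)dst;
  const char *s = (const char *)src;
  if (d < s + len && s < d + len)
    memmove(dst, src, len);
  else
    memcpy(dst, src, len);
}

int main(void)
{
  char buf[] = "0123456789";
  copy_maybe_overlapping(buf, buf + 2, 8);  /* overlapping, as in the report */
  printf("%s\n", buf);                      /* prints "2345678989" */
  return 0;
}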

Regards,

>
>george.
>
> On Aug 16, 2007, at 9:31 AM, Allen Barnett wrote:
>
> > Hi:
> > I was running my OpenMPI 1.2.3 application under Valgrind and I
> > observed
> > this error message:
> >
> > ==14322== Source and destination overlap in memcpy(0x41F5BD0,
> > 0x41F5BD8,
> > 16)
> > ==14322==at 0x49070AD: memcpy (mc_replace_strmem.c:116)
> > ==14322==by 0x4A45CF4: ompi_ddt_copy_content_same_ddt
> > (in /home/scratch/DMP/RHEL4-GCC4/lib/libmpi.so.0.0.0)
> > ==14322==by 0x7A6C386: ompi_coll_tuned_allgather_intra_bruck
> > (in /home/scratch/DMP/RHEL4-GCC4/lib/openmpi/mca_coll_tuned.so)
> > ==14322==by 0x4A29FFE: ompi_comm_split
> > (in /home/scratch/DMP/RHEL4-GCC4/lib/libmpi.so.0.0.0)
> > ==14322==by 0x4A4E322: MPI_Comm_split
> > (in /home/scratch/DMP/RHEL4-GCC4/lib/libmpi.so.0.0.0)
> > ==14322==by 0x400A26: main
> > (in /home/scratch/DMP/severian_tests/ompi/a.out)
> >
> > Attached is a reduced code example. I run it like:
> >
> > mpirun -np 3 valgrind ./a.out
> >
> > I only see this error if there are an odd number of processes! I don't
> > know if this is really a problem or not, though. My OMPI application
> > seems to work OK. However, the linux man page for memcpy says
> > overlapping range copying is undefined.
> >
> > Other details: x86_64 (one box, two dual-core opterons), RHEL 4.5,
> > OpenMPI-1.2.3 compiled with the RHEL-supplied GCC 4 (gcc4 (GCC) 4.1.1
> > 20070105 (Red Hat 4.1.1-53)), valgrind 3.2.3.
> >
> > Thanks,
> > Allen
> >
> >
> > --
> > Allen Barnett
> > Transpire, Inc.
> > e-mail: al...@transpireinc.com
> > Ph: 518-887-2930
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] MPI_GROUP_EMPTY and MPI_Group_free()

2007-12-04 Thread Lisandro Dalcin
Dear all,

As I see some activity on a related ticket, below are some comments I
sent to Bill Gropp some days ago about this subject. Bill did not
write me back; I know he is really busy.

Group operations are supposed to return new groups, so the user has to
free the result. Additionally, the standard says that those operations
may return the empty group. Hence the issue: if the empty group is
returned, should the user call MPI_Group_free() or not? I
could not find any part of the standard about freeing MPI_GROUP_EMPTY.

This issue is very similar to the one in MPI-1 related to error handlers.

I believe the standard should be a bit stricter here; two
possibilities are:

* MPI_GROUP_EMPTY must be freed if it is the result of a group
operation. This is similar to the management of predefined error
handlers.

* MPI_GROUP_EMPTY cannot be freed, as it is a predefined handle. Users
always have to check whether the result of a group operation is
MPI_GROUP_EMPTY to know if they can free it or not. This is
similar to the current management of predefined datatypes.



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] valgrind warnings (uninited mem passed to syscall)

2007-12-17 Thread Lisandro Dalcin
Dear all,

I'm getting valgrind warnings related to syscalls with uninitialized
memory (with release 1.2.4).

Before providing more details and code reproducing the problem, I
would like to know if there is any configure option I should take care
of that enables extra memory initialization (is --enable-debug
enough? I ask because MPICH2 has a specific configure option for
this; perhaps you also have something similar).

-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] some possible bugs after trying 1.2.6

2008-04-14 Thread Lisandro Dalcin
Hi all, I've just downloaded and installed release 1.2.6.
Additionally, I'm reimplementing from scratch my Python wrappers for
MPI using some more advanced tools than manual C coding. Now I do not
try in any way to do argument checking as I did before. Then I ran
all my unittest machinery. The comments follow.


MPI_Comm_get_errhandler() if called with MPI_COMM_NULL raises error
class MPI_ERR_ARG. I believe it should be MPI_ERR_COMM.


MPI_Abort(), if called with MPI_COMM_NULL, directly aborts the process
instead of calling the error handler set on MPI_COMM_WORLD. I do not
know what is correct here; this is just for your information.


MPI_Cancel() and MPI_Request_free() succeed if they are called with
MPI_REQUEST_NULL. At first sight this seems erroneous (at least in
MPI-1), as TestXXX and WaitXXX should be the only calls accepting the
null handle, but I cannot remember now whether MPI-2 clarified/modified
this (I believe not).
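
A reproducer for the last point (a sketch; MPI_ERRORS_RETURN is set so the
expected errors can be observed instead of aborting):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Request request;
  int ierr;
  MPI_Init(&argc, &argv);
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  request = MPI_REQUEST_NULL;
  ierr = MPI_Cancel(&request);        /* should this raise MPI_ERR_REQUEST? */
  printf("MPI_Cancel(MPI_REQUEST_NULL)       -> %d\n", ierr);
  ierr = MPI_Request_free(&request);  /* likewise */
  printf("MPI_Request_free(MPI_REQUEST_NULL) -> %d\n", ierr);
  MPI_Finalize();
  return 0;
}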


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] Envelope of HINDEXED_BLOCK

2014-08-26 Thread Lisandro Dalcin
I've just installed 1.8.2, something is still wrong with
HINDEXED_BLOCK datatypes.

Please note the example below, it should print "ni=2" but I'm getting "ni=7".

$ cat type_hindexed_block.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
  MPI_Datatype datatype;
  MPI_Aint disps[] = {0,2,4,6,8};
  int ni,na,nd,combiner;
  MPI_Init(&argc, &argv);
  MPI_Type_create_hindexed_block(5, 2, disps, MPI_BYTE, &datatype);
  MPI_Type_get_envelope(datatype, &ni, &na, &nd, &combiner);
  printf("ni=%d na=%d nd=%d combiner=%d\n", ni, na, nd, combiner);
  MPI_Type_free(&datatype);
  MPI_Finalize();
  return 0;
}

$ mpicc type_hindexed_block.c

$ ./a.out
ni=7 na=5 nd=1 combiner=18


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


[OMPI devel] Comm_split_type(COMM_SELF, MPI_UNDEFINED, ...)

2014-08-26 Thread Lisandro Dalcin
While I agree that the code below is rather useless, I'm not
sure it should actually fail:

$ cat comm_split_type.c
#include <mpi.h>
#include <assert.h>
int main(int argc, char *argv[])
{
  MPI_Comm comm;
  MPI_Init(&argc, &argv);
  MPI_Comm_split_type(MPI_COMM_SELF,MPI_UNDEFINED,0,MPI_INFO_NULL,&comm);
  assert(comm == MPI_COMM_NULL);
  MPI_Finalize();
  return 0;
}

$ mpicc comm_split_type.c
$ ./a.out
[kw2060:9865] *** An error occurred in MPI_Comm_split_type
[kw2060:9865] *** reported by process [140735368986625,140071768424448]
[kw2060:9865] *** on communicator MPI_COMM_SELF
[kw2060:9865] *** MPI_ERR_ARG: invalid argument of some other kind
[kw2060:9865] *** MPI_ERRORS_ARE_FATAL (processes in this communicator
will now abort,
[kw2060:9865] ***and potentially your MPI job)

-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


[OMPI devel] MPI calls in callback functions during MPI_Finalize()

2014-08-26 Thread Lisandro Dalcin
Another issue while testing 1.8.2 (./configure --enable-debug
--enable-mem-debug).

Please look at the following code. I'm duplicating COMM_WORLD and
caching the dup on it as an attribute. The attribute delete function
is written to Comm_free the duped comm and deallocate memory. However,
the run fails with the error you can see at the end.

IMHO, this is a bug. This way of managing duplicated communicator
contexts is quite common in parallel libraries. Moreover, in the MPI 3
standard, page 364, lines 11 and 12, it says:
"""
For example, MPI is “active” in callback functions that are invoked
during MPI_FINALIZE.
"""

Interestingly, if I replace WORLD -> SELF in the code below, I do not
get the error.

$ cat finalize.c
#include <mpi.h>
#include <stdlib.h>

static int free_comm(MPI_Comm comm, int k, void *v, void *xs)
{
  MPI_Comm_free((MPI_Comm *)v);
  free(v);
  return MPI_SUCCESS;
}


int main(int argc, char *argv[])
{
  int keyval;
  MPI_Comm base,*comm;
  MPI_Init(&argc, &argv);

  MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, free_comm, &keyval, NULL);

  base = MPI_COMM_WORLD;
  comm = (MPI_Comm *)malloc(sizeof(MPI_Comm));
  MPI_Comm_dup(base, comm);
  MPI_Comm_set_attr(base, keyval, comm);

  MPI_Finalize();
  return 0;
}

$ mpicc finalize.c
$ ./a.out
*** The MPI_Comm_free() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[kw2060:14603] Local abort after MPI_FINALIZE completed successfully;
not able to aggregate error messages, and not able to guarantee that
all other processes were killed!


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


[OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-08-26 Thread Lisandro Dalcin
I finally managed to track down some issues in mpi4py's test suite
using Open MPI 1.8+. The code below should be enough to reproduce the
problem. Run it under valgrind to make sense of my following
diagnostics.

In this code I'm creating a 2D, periodic Cartesian topology out of
COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out
links to itself. So we have size=1 but indegree=outdegree=4. However,
in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" requests are
being allocated to manage communication:

if (OMPI_COMM_IS_INTER(comm)) {
size = ompi_comm_remote_size(comm);
} else {
size = ompi_comm_size(comm);
}
basic_module->mccb_num_reqs = size * 2;
basic_module->mccb_reqs = (ompi_request_t**)
malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);

I guess you have to also special-case for topologies and allocate
indegree+outdegree requests (not sure about this number, just
guessing).


#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  MPI_Comm comm;
  int ndims = 2, dims[2] = {1,1}, periods[2] = {1,1};
  int sendbuf = 7, recvbuf[5] = {0,0,0,0,0};
  MPI_Init(&argc, &argv);

  MPI_Cart_create(MPI_COMM_SELF, ndims, dims, periods, 0, &comm);

  MPI_Neighbor_allgather(&sendbuf, 1, MPI_INT,
 recvbuf,  1, MPI_INT,
 comm);

  {int i; for (i=0;i<5;i++) printf("%d ",recvbuf[i]); printf("\n");}

  MPI_Finalize();
  return 0;
}


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


[OMPI devel] malloc 0 warnings

2014-08-26 Thread Lisandro Dalcin
I'm getting a bunch of the following messages. Are they signaling some
easy-to-fix internal issue? Do you need code to reproduce each one?

malloc debug: Request for 0 bytes (coll_libnbc_ireduce_scatter_block.c, 67)
...
malloc debug: Request for 0 bytes (nbc_internal.h, 496)
...
malloc debug: Request for 0 bytes (osc_rdma_active_target.c, 74)


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


Re: [OMPI devel] Envelope of HINDEXED_BLOCK

2014-08-26 Thread Lisandro Dalcin
On 26 August 2014 19:27, Ralph Castain  wrote:
> Do you know if this works in the trunk? If so, then it may just be a missing 
> commit that should have come across to 1.8.2 and we can chase it down
>

$ ./autogen.pl
Open MPI autogen (buckle up!)

1. Checking tool versions

   Searching for autoconf
 Found autoconf version 2.69; checking version...
   Found version component 2 -- need 2
   Found version component 69 -- need 69
 ==> ACCEPTED
   Searching for libtoolize
 Found libtoolize version 2.4.2; checking version...
   Found version component 2 -- need 2
   Found version component 4 -- need 4
   Found version component 2 -- need 2
 ==> ACCEPTED
   Searching for automake
 Found automake version 1.13.4; checking version...
   Found version component 1 -- need 1
   Found version component 13 -- need 12
 ==> ACCEPTED
...
libtoolize: putting libltdl files in LT_CONFIG_LTDL_DIR, `opal/libltdl'.
libtoolize: `COPYING.LIB' not found in `/usr/share/libtool/libltdl'
autoreconf: libtoolize failed with exit status: 1
Command failed: autoreconf -ivf --warnings=all,no-obsolete,no-override -I config


Could it be related to automake 1.13 instead of 1.12?


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


Re: [OMPI devel] MPI calls in callback functions during MPI_Finalize()

2014-08-26 Thread Lisandro Dalcin
On 26 August 2014 21:29, George Bosilca  wrote:
> The MPI standard clearly states (in 8.7.1 Allowing User Functions at Process
> Termination) that the mechanism you describe is only allowed on
> MPI_COMM_SELF. The most relevant part starts at line 14.
>

IMHO, you are misinterpreting the standard. Please note that the
"callbacks" I'm talking about are the ones registered for freeing
cached attributes; their invocation is tied to the lifetime of the MPI
handle. The callbacks you are talking about are a different kind of
beast: they are callbacks you want to run specifically at
MPI_Finalize().

Caching duplicated communicators is a key feature in many libraries.
How do you propose to handle the deallocation of the duped
communicators when COMM_WORLD is involved?




-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


Re: [OMPI devel] MPI calls in callback functions during MPI_Finalize()

2014-08-27 Thread Lisandro Dalcin
On 26 August 2014 23:59, George Bosilca  wrote:
> Lisandro,
>
> You rely on a feature clearly prohibited by the MPI standard. Please read
> the entire section I pinpointed you to (8.7.1).
>
> There are 2 key sentences in the section.
>
> 1. When MPI_FINALIZE is called, it will first execute the equivalent of an
> MPI_COMM_FREE on MPI_COMM_SELF.
>
> 2. The freeing of MPI_COMM_SELF occurs before any other parts of MPI are
> affected. Thus, for example, calling MPI_FINALIZED will return false in any
> of these callback functions. Once done with MPI_COMM_SELF, the order and
> rest of the actions taken by MPI_FINALIZE is not specified.
>
> Thus when MPI is calling the equivalent of MPI_COMM_FREE on your
> communicator, it is too late the MPI is already considered as finalized.
> Moreover, relying on MPI to cleanup your communicators is already bad habit,
> which is rightfully punished by Open MPI.
>

After much thinking about it, I must surrender :-), you were right.
Sorry for the noise.
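
For the record, the mechanism the standard does sanction is to hang the
cleanup off MPI_COMM_SELF, whose attributes are deleted at the very
beginning of MPI_FINALIZE (a sketch adapting the earlier finalize.c along
those lines):

#include <stdlib.h>
#include <mpi.h>

static int free_comm(MPI_Comm comm, int keyval, void *attr, void *xs)
{
  /* runs while MPI is still "active", at the start of MPI_Finalize */
  MPI_Comm_free((MPI_Comm *)attr);
  free(attr);
  return MPI_SUCCESS;
}

int main(int argc, char *argv[])
{
  int keyval;
  MPI_Comm *comm;
  MPI_Init(&argc, &argv);
  MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, free_comm, &keyval, NULL);
  comm = (MPI_Comm *)malloc(sizeof(MPI_Comm));
  MPI_Comm_dup(MPI_COMM_WORLD, comm);
  /* cache on COMM_SELF, not on the communicator being duplicated */
  MPI_Comm_set_attr(MPI_COMM_SELF, keyval, comm);
  MPI_Finalize();
  return 0;
}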


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


Re: [OMPI devel] Envelope of HINDEXED_BLOCK

2014-08-27 Thread Lisandro Dalcin
On 26 August 2014 22:28, Paul Hargrove  wrote:
>> libtoolize: putting libltdl files in LT_CONFIG_LTDL_DIR, `opal/libltdl'.
>> libtoolize: `COPYING.LIB' not found in `/usr/share/libtool/libltdl'
>> autoreconf: libtoolize failed with exit status: 1
>
>
> The error message is from libtoolize about a file missing from the libtool
> installation directory.
> So, this looks (to me) like a mis-installation of libtool.
>

Of course, after
$ sudo yum install libtool-ltdl-devel
on my Fedora 20 box, everything went fine. Sorry for the noise.



-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


Re: [OMPI devel] malloc 0 warnings

2014-08-27 Thread Lisandro Dalcin
On 27 August 2014 02:38, Jeff Squyres (jsquyres)  wrote:
> If you have reproducers, yes, that would be most helpful -- thanks.
>

OK, here you have something to start with. To be fair, this is a reduction
with zero count. I have many other tests for reductions with zero
count that are failing.

Does Open MPI ban zero-count reduction calls, or is any failure actually a bug?

$ cat ireduce_scatter_block.c
#include <mpi.h>
int main(int argc, char *argv[])
{
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Ireduce_scatter_block(NULL, NULL, 0, MPI_INT,
MPI_SUM, MPI_COMM_SELF, &request);
  MPI_Wait(&request, MPI_STATUS_IGNORE);
  MPI_Finalize();
  return 0;
}

$ mpicc ireduce_scatter_block.c
$ ./a.out
malloc debug: Request for 0 bytes (coll_libnbc_ireduce_scatter_block.c, 67)


Re: [OMPI devel] malloc 0 warnings

2014-08-27 Thread Lisandro Dalcin
On 27 August 2014 02:38, Jeff Squyres (jsquyres)  wrote:
> If you have reproducers, yes, that would be most helpful -- thanks.
>

Here you have another one...

$ cat igatherv.c
#include <mpi.h>
int main(int argc, char *argv[])
{
  signed char a=1,b=2;
  int rcounts[1] = {0};
  int rdispls[1] = {0};
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Igatherv(&a, 0, MPI_SIGNED_CHAR,
   &b, rcounts, rdispls, MPI_SIGNED_CHAR,
   0, MPI_COMM_SELF, &request);
  MPI_Wait(&request, MPI_STATUS_IGNORE);
  MPI_Finalize();
  return 0;
}

$ mpicc igatherv.c
$ ./a.out
malloc debug: Request for 0 bytes (nbc_internal.h, 496)


[OMPI devel] Valgrind warning in MPI_Win_allocate[_shared]()

2014-09-28 Thread Lisandro Dalcin
Just built 1.8.3 for another round of testing with mpi4py. I'm getting
the following valgrind warning:

==4718== Conditional jump or move depends on uninitialised value(s)
==4718==at 0xD0D9F4C: component_select (osc_sm_component.c:333)
==4718==by 0x4CF44F6: ompi_osc_base_select (osc_base_init.c:73)
==4718==by 0x4C68B69: ompi_win_allocate (win.c:182)
==4718==by 0x4CBB8C2: PMPI_Win_allocate (pwin_allocate.c:79)
==4718==by 0x400898: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)

The offending code is in ompi/mca/osc/sm/osc_sm_component.c; it seems
you forgot to initialize "blocking_fence" to a default true or
false value.

bool blocking_fence;
int flag;

if (OMPI_SUCCESS != ompi_info_get_bool(info, "blocking_fence",
   &blocking_fence, &flag)) {
goto error;
}

    if (blocking_fence) {
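
A possible fix (just a sketch, not necessarily the actual Open MPI patch) is
to give the variable a default and only trust it when the info key was
actually present:

    bool blocking_fence = false;  /* default when the info key is absent */
    int flag = 0;

    if (OMPI_SUCCESS != ompi_info_get_bool(info, "blocking_fence",
                                           &blocking_fence, &flag)) {
        goto error;
    }

    if (flag && blocking_fence) {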


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


Re: [OMPI devel] Different behaviour with MPI_IN_PLACE in MPI_Reduce_scatter() and MPI_Ireduce_scatter()

2014-09-28 Thread Lisandro Dalcin
On 22 April 2014 03:02, George Bosilca  wrote:
> Btw, the proposed validator was incorrect the first printf instead of
>
>  printf(“[%d] rbuf[%d]=%2d  expected:%2d\n”, rank, 0, recvbuf[i], size);
>
> should be
>
>  printf(“[%d] rbuf[%d]=%2d  expected:%2d\n”, rank, 0, recvbuf[0], size);
>

I'm testing this with 1.8.3 after fixing my incorrect printf, and I
still get different results (and the nbcoll one is wrong) when using one
process (for two or more everything is OK).

$ mpicc -DNBCOLL=0 ireduce_scatter.c && mpiexec -n 1 ./a.out
[0] rbuf[0]= 1  expected: 1

$ mpicc -DNBCOLL=1 ireduce_scatter.c && mpiexec -n 1 ./a.out
[0] rbuf[0]=60  expected: 1


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-28 Thread Lisandro Dalcin
On 25 September 2014 20:50, Nathan Hjelm  wrote:
> On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote:
>> I finally managed to track down some issues in mpi4py's test suite
>> using Open MPI 1.8+. The code below should be enough to reproduce the
>> problem. Run it under valgrind to make sense of my following
>> diagnostics.
>>
>> In this code I'm creating a 2D, periodic Cartesian topology out of
>> COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out
>> links to itself. So we have size=1 but indegree=outdegree=4. However,
>> in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" request are
>> being allocated to manage communication:
>>
>> if (OMPI_COMM_IS_INTER(comm)) {
>> size = ompi_comm_remote_size(comm);
>> } else {
>> size = ompi_comm_size(comm);
>> }
>> basic_module->mccb_num_reqs = size * 2;
>> basic_module->mccb_reqs = (ompi_request_t**)
>> malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
>>
>> I guess you have to also special-case for topologies and allocate
>> indegree+outdegree requests (not sure about this number, just
>> guessing).
>>
>
> I wish this was possible but the topology information is not available
> at that point. We may be able to change that but I don't see the work
> completing anytime soon. I committed an alternative fix as r32796 and
> CMR'd it to 1.8.3. I can confirm that the attached reproducer no longer
> produces a SEGV. Let me know if you run into any more issues.
>

Did your fix get in for 1.8.3? I'm still getting the segfault.



-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


Re: [OMPI devel] Different behaviour with MPI_IN_PLACE in MPI_Reduce_scatter() and MPI_Ireduce_scatter()

2014-12-23 Thread Lisandro Dalcin
On 28 September 2014 at 19:13, George Bosilca  wrote:
> Lisandro,
>
> Good catch. Indeed the MPI_Ireduce_scatter was not covering the case where
> MPI_IN_PLACE was used over a communicator with a single participant. I
> pushed a patch and schedule it for 1.8.4. Check
> https://svn.open-mpi.org/trac/ompi/ticket/4924 for more info.
>

While your change fixed the issues when using MPI_IN_PLACE, now 1.8.4
seems to fail when in-place is not used.

Please try the attached example:

$ mpicc -DNBCOLL=0 ireduce_scatter.c
$ mpiexec -n 2 ./a.out
[0] rbuf[0]= 2  expected: 2
[0] rbuf[1]= 0  expected: 0
[1] rbuf[0]= 2  expected: 2
[1] rbuf[1]= 0  expected: 0
$ mpiexec -n 1 ./a.out
[0] rbuf[0]= 1  expected: 1


$ mpicc -DNBCOLL=1 ireduce_scatter.c
$ mpiexec -n 2 ./a.out
[0] rbuf[0]= 2  expected: 2
[0] rbuf[1]= 0  expected: 0
[1] rbuf[0]= 2  expected: 2
[1] rbuf[1]= 0  expected: 0
$ mpiexec -n 1 ./a.out
[0] rbuf[0]= 0  expected: 1

The last one is wrong. Not sure what's going on. Am I missing something?


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
  int i,size,rank;
  int sendbuf[] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
  int recvbuf[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
  int rcounts[] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (size > 16) MPI_Abort(MPI_COMM_WORLD,1);
#ifndef NBCOLL
#define NBCOLL 1
#endif
#if NBCOLL
  {
MPI_Request request;
MPI_Ireduce_scatter(sendbuf, recvbuf, rcounts, MPI_INT,
MPI_SUM, MPI_COMM_WORLD, &request);
MPI_Wait(&request,MPI_STATUS_IGNORE);
  }
#else
  MPI_Reduce_scatter(sendbuf, recvbuf, rcounts, MPI_INT,
 MPI_SUM, MPI_COMM_WORLD);
#endif
  printf("[%d] rbuf[%d]=%2d  expected:%2d\n", rank, 0, recvbuf[0], size);
  for (i=1; i<size; i++)
    printf("[%d] rbuf[%d]=%2d  expected:%2d\n", rank, i, recvbuf[i], 0);
  MPI_Finalize();
  return 0;
}

[OMPI devel] Warnings about malloc(0) in debug build

2015-05-07 Thread Lisandro Dalcin
Folks, I've just built 1.8.5 to test with mpi4py. My configure line was:

$ ./configure --prefix=/home/devel/mpi/openmpi/1.8.5 --enable-debug
--enable-mem-debug

While running the tests, my terminal was flooded with malloc(0)
warnings; below is a list of the unique lines.

malloc debug: Request for 0 bytes (coll_libnbc_ireduce_scatter_block.c, 67)
malloc debug: Request for 0 bytes (nbc_internal.h, 505)
malloc debug: Request for 0 bytes (osc_rdma_active_target.c, 74)
malloc debug: Request for 0 bytes (osc_rdma_active_target.c, 76)


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


[OMPI devel] Issues with MPI_Type_create_f90_{real|complex}

2015-05-07 Thread Lisandro Dalcin
This is with a debug build of 1.8.5

I'm getting segfaults with tests related to the use of
MPI_Type_create_f90_{real|complex}. See below the attached test case
and the valgrind output (BTW, MPI_Type_create_f90_integer seems to be
OK).

$ cat type_f90.c
#include <mpi.h>
int main(int argc, char *argv[])
{
  MPI_Datatype datatype;
  MPI_Init(&argc, &argv);

  MPI_Type_create_f90_integer(4, &datatype);

  MPI_Type_create_f90_real( 6,  30, &datatype);
  MPI_Type_create_f90_real(15, 300, &datatype);

  MPI_Type_create_f90_complex( 6,  30, &datatype);
  MPI_Type_create_f90_complex(15, 300, &datatype);

  MPI_Finalize();
  return 0;
}

$ mpicc type_f90.c

$ valgrind -q ./a.out
==1025== Invalid write of size 4
==1025==at 0x4C740BF: ompi_datatype_set_args (ompi_datatype_args.c:206)
==1025==by 0x4CC91CE: PMPI_Type_create_f90_real
(ptype_create_f90_real.c:108)
==1025==by 0x400878: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
==1025==  Address 0x8e703cc is 0 bytes after a block of size 60 alloc'd
==1025==at 0x4A0645D: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==1025==by 0x53236F0: opal_malloc (malloc.c:101)
==1025==by 0x4C739E3: ompi_datatype_set_args (ompi_datatype_args.c:121)
==1025==by 0x4CC91CE: PMPI_Type_create_f90_real
(ptype_create_f90_real.c:108)
==1025==by 0x400878: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
==1025==
==1025== Invalid write of size 4
==1025==at 0x4C740BF: ompi_datatype_set_args (ompi_datatype_args.c:206)
==1025==by 0x4CC91CE: PMPI_Type_create_f90_real
(ptype_create_f90_real.c:108)
==1025==by 0x40088E: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
==1025==  Address 0x8e7073c is 0 bytes after a block of size 60 alloc'd
==1025==at 0x4A0645D: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==1025==by 0x53236F0: opal_malloc (malloc.c:101)
==1025==by 0x4C739E3: ompi_datatype_set_args (ompi_datatype_args.c:121)
==1025==by 0x4CC91CE: PMPI_Type_create_f90_real
(ptype_create_f90_real.c:108)
==1025==by 0x40088E: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
==1025==
==1025== Invalid write of size 4
==1025==at 0x4C740BF: ompi_datatype_set_args (ompi_datatype_args.c:206)
==1025==by 0x4CC8636: PMPI_Type_create_f90_complex
(ptype_create_f90_complex.c:110)
==1025==by 0x4008A4: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
==1025==  Address 0x8e70aac is 0 bytes after a block of size 60 alloc'd
==1025==at 0x4A0645D: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==1025==by 0x53236F0: opal_malloc (malloc.c:101)
==1025==by 0x4C739E3: ompi_datatype_set_args (ompi_datatype_args.c:121)
==1025==by 0x4CC8636: PMPI_Type_create_f90_complex
(ptype_create_f90_complex.c:110)
==1025==by 0x4008A4: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
==1025==
==1025== Invalid write of size 4
==1025==at 0x4C740BF: ompi_datatype_set_args (ompi_datatype_args.c:206)
==1025==by 0x4CC8636: PMPI_Type_create_f90_complex
(ptype_create_f90_complex.c:110)
==1025==by 0x4008BA: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
==1025==  Address 0x8e70e1c is 0 bytes after a block of size 60 alloc'd
==1025==at 0x4A0645D: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==1025==by 0x53236F0: opal_malloc (malloc.c:101)
==1025==by 0x4C739E3: ompi_datatype_set_args (ompi_datatype_args.c:121)
==1025==by 0x4CC8636: PMPI_Type_create_f90_complex
(ptype_create_f90_complex.c:110)
==1025==by 0x4008BA: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
==1025==


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459

