ML with OpenMPI
On Sat 2008-03-22 10:19, Lisandro Dalcin wrote:
> Give it a try. When using MPICH2, PETSc just passes --with-mpi=PATH_TO_MPI and
> ML gets it right. Perhaps ML has some trouble with OpenMPI; I've never tried.
> If you built OpenMPI yourself with shared libs, do not forget to set
> LD_LIBRARY_PATH to point to the dir with the OpenMPI libs. If not, some
> configure tests of ML could fail, and then MPI is assumed to be absent.

It turns out I was chasing this in entirely the wrong direction. ML was configured just fine and was correctly using MPI, but we had not defined HAVE_CONFIG_H, so ml_common.h was not setting all the variables that depend on ml_config.h. In particular, ml_config.h sets HAVE_MPI correctly, but the following is in ml_common.h:

  #ifdef HAVE_CONFIG_H
  ...
  #ifdef HAVE_MPI
  #ifndef ML_MPI
  #define ML_MPI
  #endif
  #endif
  ...
  #endif /* ifdef HAVE_CONFIG_H */

Indeed, adding -DHAVE_CONFIG_H to CFLAGS in src/ksp/pc/impls/ml/makefile fixes the problem (and the manual include of ml_config.h in ml.c becomes unnecessary). That is, the patch below makes everything work correctly.

Jed

diff -r 2ae11e456aa7 src/ksp/pc/impls/ml/makefile
--- a/src/ksp/pc/impls/ml/makefile  Fri Mar 21 17:33:24 2008 -0500
+++ b/src/ksp/pc/impls/ml/makefile  Tue Mar 25 08:35:12 2008 +0100
@@ -5,7 +5,7 @@
 ALL: lib

-CFLAGS   = ${ML_INCLUDE}
+CFLAGS   = ${ML_INCLUDE} -DHAVE_CONFIG_H
 FFLAGS   =
 SOURCEC  = ml.c
 SOURCEF  =
diff -r 2ae11e456aa7 src/ksp/pc/impls/ml/ml.c
--- a/src/ksp/pc/impls/ml/ml.c  Fri Mar 21 17:33:24 2008 -0500
+++ b/src/ksp/pc/impls/ml/ml.c  Tue Mar 25 08:35:12 2008 +0100
@@ -10,7 +10,6 @@
 #include <math.h>
 EXTERN_C_BEGIN
-#include <ml_config.h>
 #include <ml_include.h>
 EXTERN_C_END
ML with OpenMPI
However, I still do not understand why I never had this problem. Jed, did you build ML yourself, or were you letting PETSc automatically download and build it? Or perhaps I did not notice the problem because of MPICH2?

On 3/25/08, Barry Smith <bsmith at mcs.anl.gov> wrote:
> I have pushed this fix to petsc-dev. Thank you for figuring this out,
>
>    Barry
>
> On Mar 25, 2008, at 2:38 AM, Jed Brown wrote:
>> On Sat 2008-03-22 10:19, Lisandro Dalcin wrote:
>>> Give it a try. When using MPICH2, PETSc just passes
>>> --with-mpi=PATH_TO_MPI and ML gets it right. Perhaps ML has some
>>> trouble with OpenMPI; I've never tried. If you built OpenMPI yourself
>>> with shared libs, do not forget to set LD_LIBRARY_PATH to point to the
>>> dir with the OpenMPI libs. If not, some configure tests of ML could
>>> fail, and then MPI is assumed to be absent.
>>
>> It turns out I was chasing this in entirely the wrong direction. ML was
>> configured just fine and was correctly using MPI, but we had not defined
>> HAVE_CONFIG_H, so ml_common.h was not setting all the variables that
>> depend on ml_config.h. In particular, ml_config.h sets HAVE_MPI
>> correctly, but that only takes effect under #ifdef HAVE_CONFIG_H.
>> Indeed, adding -DHAVE_CONFIG_H to CFLAGS in src/ksp/pc/impls/ml/makefile
>> fixes the problem (and the manual include of ml_config.h in ml.c becomes
>> unnecessary).
>>
>> Jed

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
ML with OpenMPI
Well, then that would mean that I was using ML through PETSc in PARALLEL runs with no MPI support!!! Do you believe that scenario is possible?

Looking at the ML configure script and generated makefiles, there is a line in them saying

  DEFS = -DHAVE_CONFIG_H

Do you have that line? Next, this $(DEFS) is included in the compiler command definition. Additionally, I did

  $ nm -C libml.a | grep MPI

and undefined references to the MPI functions appeared, as expected. Sorry about my insistence, but I believe we need to figure out what exactly is going on.

On 3/25/08, Jed Brown <jed at 59a2.org> wrote:
> I let PETSc build it for me. I think you did not notice the problem
> because MPICH2 defines MPI_Comm to be an int, which happens to be the
> same as used by ML in their dummy MPI, so there is no type mismatch.
> From the contents of ml_common.h, it looks like you would still run into
> trouble if you were using optional features of ML. The reason I like
> OpenMPI is exactly this stronger type checking and that it seems to
> crash sooner when I have a bug.
>
> Jed

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
ML with OpenMPI
On Tue 2008-03-25 12:11, Lisandro Dalcin wrote:
> Well, then that would mean that I was using ML through PETSc in PARALLEL
> runs with no MPI support!!! Do you believe that scenario is possible?

No, ML was built correctly. The build output has -DHAVE_CONFIG_H on every build line. What *is* happening is that the headers included by ksp/pc/impls/ml/ml.c were essentially for a non-MPI build because HAVE_CONFIG_H was not defined. That is, including ml_config.h defines the autoconf'd macros (like HAVE_MPI), and ml_common.h uses them to set ML-local macros (like ML_MPI) *only* if HAVE_CONFIG_H is defined. So when we include ml_include.h without defining HAVE_CONFIG_H, we see the interface for a default (non-MPI) build. This interface is (apparently) the same as for an MPI build with MPICH2, but not with OpenMPI. Since the library was built with MPI there was no dangerous type casting, and since you were using it with MPI, there was no problem. When using OpenMPI, the compiler sees a conflict between ML's dummy MPI interface and OpenMPI's, because ML and MPICH2 use MPI_Comm = int while OpenMPI uses an opaque pointer value.

> Looking at the ML configure script and generated makefiles, there is a
> line in them saying DEFS = -DHAVE_CONFIG_H. Do you have that line? Next,
> this $(DEFS) is included in the compiler command definition.
> Additionally, I did `nm -C libml.a | grep MPI` and undefined references
> to the MPI functions appeared, as expected. Sorry about my insistence,
> but I believe we need to figure out what exactly is going on.

No problem. I agree it is important. Does my explanation above make sense to you?

Jed
ML with OpenMPI
OK. Now all is clear to me. Sorry about my confusion. So I have to conclude that ML's machinery for including headers is a bit broken, I think. Many thanks for your explanation.

On 3/25/08, Jed Brown <jed at 59a2.org> wrote:
> No, ML was built correctly. The build output has -DHAVE_CONFIG_H on every
> build line. What *is* happening is that the headers included by
> ksp/pc/impls/ml/ml.c were essentially for a non-MPI build because
> HAVE_CONFIG_H was not defined.

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
ML with OpenMPI
I think our usage of ML is broken. We use

  #include <ml_config.h>

in ml.c. However, it looks like ML requires HAVE_CONFIG_H to be defined for other things as well. So the correct fix is to change the above line to:

  #define HAVE_CONFIG_H

Satish

On Tue, 25 Mar 2008, Lisandro Dalcin wrote:
> OK. Now all is clear to me. Sorry about my confusion. So I have to
> conclude that ML's machinery for including headers is a bit broken, I
> think. Many thanks for your explanation.
ML with OpenMPI
The MPI standard does not specify that MPI_Comm = int, and in fact OpenMPI uses a pointer value, which lets the compiler do slightly more type checking. This type checking recently caused me trouble when building --with-download-ml. There is a line in ml_comm.h which defines their communicator to be an int when ML_MPI is not defined. It was not immediately clear why this is not defined, but the result is that the compiler chokes when building PETSc. This patch fixes the problem as long as sizeof(int) = sizeof(MPI_Comm), but this is *not* the case on x86_64 with OpenMPI. It's not clear to me how much of this is an upstream issue and how much is a configuration issue.

Jed

diff -r c074838b79ed src/ksp/pc/impls/ml/ml.c
--- a/src/ksp/pc/impls/ml/ml.c  Thu Mar 20 17:04:05 2008 -0500
+++ b/src/ksp/pc/impls/ml/ml.c  Fri Mar 21 22:50:32 2008 +0100
@@ -815,7 +815,7 @@ PetscErrorCode MatWrapML_SHELL(ML_Operat
   MLcomm = mlmat->comm;
   ierr = PetscNew(Mat_MLShell,&shellctx);CHKERRQ(ierr);
-  ierr = MatCreateShell(MLcomm->USR_comm,m,n,PETSC_DETERMINE,PETSC_DETERMINE,shellctx,newmat);CHKERRQ(ierr);
+  ierr = MatCreateShell((MPI_Comm)MLcomm->USR_comm,m,n,PETSC_DETERMINE,PETSC_DETERMINE,shellctx,newmat);CHKERRQ(ierr);
   ierr = MatShellSetOperation(*newmat,MATOP_MULT,(void(*)(void))MatMult_ML);CHKERRQ(ierr);
   ierr = MatShellSetOperation(*newmat,MATOP_MULT_ADD,(void(*)(void))MatMultAdd_ML);CHKERRQ(ierr);
   shellctx->A = *newmat;
@@ -844,7 +844,7 @@ PetscErrorCode MatWrapML_MPIAIJ(ML_Opera
   n = mlmat->invec_leng;
   if (m != n) SETERRQ2(PETSC_ERR_ARG_OUTOFRANGE,"m %d must equal to n %d",m,n);
-  ierr = MatCreate(mlmat->comm->USR_comm,&A);CHKERRQ(ierr);
+  ierr = MatCreate((MPI_Comm)mlmat->comm->USR_comm,&A);CHKERRQ(ierr);
   ierr = MatSetSizes(A,m,n,PETSC_DECIDE,PETSC_DECIDE);CHKERRQ(ierr);
   ierr = MatSetType(A,MATMPIAIJ);CHKERRQ(ierr);
   ierr = PetscMalloc3(m,PetscInt,&nnzA,m,PetscInt,&nnzB,m,PetscInt,&nnz);CHKERRQ(ierr);
ML with OpenMPI
On Fri 2008-03-21 19:31, Lisandro Dalcin wrote:
> Mmm... I believe this is a configuration issue... if ML_MPI were
> defined, then ML_USR_COMM would be MPI_Comm. But the problem is perhaps
> on the ML side, not the PETSc side. ml_common.h does #define ML_MPI if
> the macro HAVE_MPI is defined. In turn, HAVE_MPI is in ml_config.h, and
> that file is surely generated by ML's configure script. For some reason
> ML's configure failed to find MPI with the command-line stuff PETSc
> passes to it. Look at the 'config.log' file inside the ml-5.0 dir to
> find out what happened.
ML with OpenMPI
Give it a try. When using MPICH2, PETSc just passes --with-mpi=PATH_TO_MPI and ML gets it right. Perhaps ML has some trouble with OpenMPI; I've never tried. If you built OpenMPI yourself with shared libs, do not forget to set LD_LIBRARY_PATH to point to the dir with the OpenMPI libs. If not, some configure tests of ML could fail, and then MPI is assumed to be absent.

On 3/21/08, Jed Brown <jed at 59a2.org> wrote:
> On Fri 2008-03-21 19:31, Lisandro Dalcin wrote:
>> Mmm... I believe this is a configuration issue... if ML_MPI were
>> defined, then ML_USR_COMM would be MPI_Comm. But the problem is perhaps
>> on the ML side, not the PETSc side. ml_common.h does #define ML_MPI if
>> the macro HAVE_MPI is defined. In turn, HAVE_MPI is in ml_config.h, and
>> that file is surely generated by ML's configure script. For some reason
>> ML's configure failed to find MPI with the command-line stuff PETSc
>> passes to it. Look at the 'config.log' file inside the ml-5.0 dir to
>> find out what happened.
>
> From the ML docs, it looks like ML's configure expects
> --with-mpi-compilers if it is building with MPI. I modified
> config/PETSc/packages/ml.py to include this option and I'll let you know
> if it fixes the problem.
>
> Jed

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
ML with OpenMPI
Jed,

You can take a look at config/PETSc/packages/ml.py; essentially we call their configure with a given set of compilers (and MPI information). So I would say you have to report the bug to those folks; their configure should handle that issue, shouldn't it?

   Barry

On Mar 21, 2008, at 5:09 PM, Jed Brown wrote:
> The MPI standard does not specify that MPI_Comm = int, and in fact
> OpenMPI uses a pointer value, which lets the compiler do slightly more
> type checking. This type checking recently caused me trouble when
> building --with-download-ml. There is a line in ml_comm.h which defines
> their communicator to be an int when ML_MPI is not defined. It was not
> immediately clear why this is not defined, but the result is that the
> compiler chokes when building PETSc. This patch fixes the problem as
> long as sizeof(int) = sizeof(MPI_Comm), but this is *not* the case on
> x86_64 with OpenMPI. It's not clear to me how much of this is an
> upstream issue and how much is a configuration issue.
>
> Jed
ML with OpenMPI
Mmm... I believe this is a configuration issue... if ML_MPI were defined, then ML_USR_COMM would be MPI_Comm. But the problem is perhaps on the ML side, not the PETSc side. ml_common.h does #define ML_MPI if the macro HAVE_MPI is defined. In turn, HAVE_MPI is in ml_config.h, and that file is surely generated by ML's configure script. For some reason ML's configure failed to find MPI with the command-line stuff PETSc passes to it. Look at the 'config.log' file inside the ml-5.0 dir to find out what happened.

On 3/21/08, Jed Brown <jed at 59a2.org> wrote:
> The MPI standard does not specify that MPI_Comm = int, and in fact
> OpenMPI uses a pointer value, which lets the compiler do slightly more
> type checking. This type checking recently caused me trouble when
> building --with-download-ml. There is a line in ml_comm.h which defines
> their communicator to be an int when ML_MPI is not defined. It was not
> immediately clear why this is not defined, but the result is that the
> compiler chokes when building PETSc. This patch fixes the problem as
> long as sizeof(int) = sizeof(MPI_Comm), but this is *not* the case on
> x86_64 with OpenMPI. It's not clear to me how much of this is an
> upstream issue and how much is a configuration issue.
>
> Jed

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594