Re: [OMPI users] btl_openib_cpc_include rdmacm questions
I am pretty sure MTLs and BTLs are very different, but just as a note: this
user's code (CRASH) hangs at MPI_Allreduce() on openib but runs on tcp and on
psm (an MTL, different hardware). Putting it out there in case it has any
bearing; otherwise ignore.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734) 936-1985

On May 12, 2011, at 10:20 AM, Brock Palen wrote:

> On May 12, 2011, at 10:13 AM, Jeff Squyres wrote:
>
>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>>
>>> We can reproduce it with IMB. We could provide access, but we'd have to
>>> negotiate with the owners of the relevant nodes to give you interactive
>>> access to them. Maybe Brock's would be more accessible? (If you
>>> contact me, I may not be able to respond for a few days.)
>>
>> Brock has replied off-list that he, too, is able to reliably reproduce the
>> issue with IMB, and is working to get access for us. Many thanks for your
>> offer; let's see where Brock's access takes us.
>
> I should also note that, as far as I know, I have three codes (CRASH, NAMD
> (some cases), and another user code) that lock up on a collective on openib
> but run with the same library on Gig-E.
>
> So I am not sure it is limited to IMB, or I could be crossing errors;
> normally I would assume unmatched eager recvs for this sort of problem.
>
>>>> -- we have not closed this issue,
>>>
>>> Which issue? I couldn't find a relevant-looking one.
>>
>> https://svn.open-mpi.org/trac/ompi/ticket/2714
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Windows: MPI_Allreduce() crashes when using MPI_DOUBLE_PRECISION
Hello Hiral,

In the ompi_info output you attached, the Fortran size detection did not work
correctly (on viscluster -- i.e., this shows that you used the standard
installation package):

... Fort dbl prec size: 4 ...

This most probably does not match your compiler's setting for DOUBLE
PRECISION, which probably considers this to be 8. Does REAL work for you?
Shiqing is currently away; I will ask when he returns.

With best regards,
Rainer

On Wednesday 11 May 2011 09:29:03 hi wrote:
> Hi Jeff,
>
>> Can you send the info listed on the help page?
>
> From the HELP page...
>
> ***For run-time problems:
> 1) Check the FAQ first. Really. This can save you a lot of time; many
> common problems and solutions are listed there.
> I couldn't find a reference in the FAQ.
>
> 2) The version of Open MPI that you're using.
> I am using pre-built openmpi-1.5.3 64-bit and 32-bit binaries on Windows 7.
> I also tried with locally built openmpi-1.5.2 using Visual Studio 2008
> 32-bit compilers.
> I tried various compilers: VS-9 32-bit and VS-10 64-bit and the
> corresponding Intel ifort compiler.
>
> 3) The config.log file from the top-level Open MPI directory, if
> available (please compress!).
> Don't have it.
>
> 4) The output of the "ompi_info --all" command from the node where
> you're invoking mpirun.
> See the output of the pre-built "openmpi-1.5.3_x64/bin/ompi_info --all" in
> the attachments.
>
> 5) If running on more than one node --
> I am running the test program on a single node.
>
> 6) A detailed description of what is failing.
> Already described in this post.
>
> 7) Please include information about your network:
> As I am running the test program on a local, single machine, this might
> not be required.
>
>> You forgot ierr in the call to MPI_Finalize. You also paired
>> DOUBLE_PRECISION data with MPI_INTEGER in the call to allreduce. And
>> you mixed sndbuf and rcvbuf in the call to allreduce, meaning that when
>> you print rcvbuf afterwards, it'll always still be 0.
>
> As I am not a Fortran programmer, this is my mistake!
>
>>    program Test_MPI
>>    use mpi
>>    implicit none
>>
>>    DOUBLE PRECISION rcvbuf(5), sndbuf(5)
>>    INTEGER nproc, rank, ierr, n, i, ret
>>
>>    n = 5
>>    do i = 1, n
>>       sndbuf(i) = 2.0
>>       rcvbuf(i) = 0.0
>>    end do
>>
>>    call MPI_INIT(ierr)
>>    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>>    call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
>>    write(*,*) "size=", nproc, ", rank=", rank
>>    write(*,*) "start --, rcvbuf=", rcvbuf
>>    CALL MPI_ALLREDUCE(sndbuf, rcvbuf, n,
>>   &     MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)
>>    write(*,*) "end --, rcvbuf=", rcvbuf
>>
>>    CALL MPI_Finalize(ierr)
>>    end
>>
>> (you could use "include 'mpif.h'", too -- I tried both)
>>
>> This program works fine for me.
>
> I am observing the same crash as described in this thread (when executing
> as "mpirun -np 2 mar_f_dp.exe"), even with the above correct and simple
> test program. I commented out 'use mpi' as it gave me an "Error in compiled
> module file" error, so I used the 'include "mpif.h"' statement (see
> attachment).
>
> It seems to be a Windows-specific issue (I could run this test program
> on Linux with openmpi-1.5.1).
>
> Can anybody try this test program on Windows?
>
> Thank you in advance.
> -Hiral

--
Dr.-Ing. Rainer Keller  http://www.hlrs.de/people/keller
HLRS                    Tel: ++49 (0)711-685 6 5858
Nobelstrasse 19         Fax: ++49 (0)711-685 6 5832
70550 Stuttgart         email: kel...@hlrs.de
Germany                 AIM/Skype: rusraink
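Rainer's point about the mis-detected size can be illustrated outside MPI: if
a reduction walks a buffer of 8-byte DOUBLE PRECISION values using a 4-byte
element size, it reinterprets the bytes and produces garbage. A minimal
Python sketch of that reinterpretation (not Open MPI code, purely an
illustration of the size mismatch):

```python
import struct

# Five Fortran DOUBLE PRECISION values (8 bytes each), as in the test program.
values = [2.0] * 5
buf = struct.pack('<5d', *values)   # 40 bytes total

# Correct interpretation: five 8-byte doubles.
correct = struct.unpack('<5d', buf)

# Wrong interpretation, as if "Fort dbl prec size: 4" were true:
# only the first 5 * 4 = 20 bytes are read, as 4-byte floats.
wrong = struct.unpack('<5f', buf[:20])

print(correct)  # (2.0, 2.0, 2.0, 2.0, 2.0)
print(wrong)    # (0.0, 2.0, 0.0, 2.0, 0.0) -- halves of IEEE-754 doubles
```

The wrong interpretation scrambles every element, which is consistent with a
size-detection bug causing crashes or nonsense results rather than a subtle
error.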
Re: [OMPI users] Windows: MPI_Allreduce() crashes when using MPI_DOUBLE_PRECISION
Can someone please try this test program in a Windows environment? Thank you.
-Hiral

On Thu, May 12, 2011 at 10:59 AM, hi wrote:
> Any comment or suggestion on my below update?
>
> On Wed, May 11, 2011 at 12:59 PM, hi wrote:
>> Hi Jeff,
>>
>>> Can you send the info listed on the help page?
>>
>> [...]
>>
>> I am observing the same crash as described in this thread (when executing
>> as "mpirun -np 2 mar_f_dp.exe"), even with the above correct and simple
>> test program. I commented out 'use mpi' as it gave me an "Error in
>> compiled module file" error, so I used the 'include "mpif.h"' statement
>> (see attachment).
>>
>> It seems to be a Windows-specific issue (I could run this test program
>> on Linux with openmpi-1.5.1).
>>
>> Can anybody try this test program on Windows?
>>
>> Thank you in advance.
>> -Hiral
Re: [OMPI users] Windows: MPI_Allreduce() crashes when using MPI_DOUBLE_PRECISION
Shiqing --

Got any ideas here? Are you able to run this sample program with mpif.h or
"use mpi" on Windows?

On May 11, 2011, at 12:29 AM, hi wrote:
> Hi Jeff,
>
>> Can you send the info listed on the help page?
>
> [...]
>
>> You forgot ierr in the call to MPI_Finalize. You also paired
>> DOUBLE_PRECISION data with MPI_INTEGER in the call to allreduce. And you
>> mixed sndbuf and rcvbuf in the call to allreduce, meaning that when you
>> print rcvbuf afterwards, it'll always still be 0.
>
> As I am not a Fortran programmer, this is my mistake!
>
> [...]
>
> I am observing the same crash as described in this thread (when executing
> as "mpirun -np 2 mar_f_dp.exe"), even with the above correct and simple
> test program. I commented out 'use mpi' as it gave me an "Error in compiled
> module file" error, so I used the 'include "mpif.h"' statement (see
> attachment).
>
> It seems to be a Windows-specific issue (I could run this test program
> on Linux with openmpi-1.5.1).
>
> Can anybody try this test program on Windows?
>
> Thank you in advance.
> -Hiral

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] btl_openib_cpc_include rdmacm questions
On May 12, 2011, at 10:13 AM, Jeff Squyres wrote:

> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>
>> We can reproduce it with IMB. We could provide access, but we'd have to
>> negotiate with the owners of the relevant nodes to give you interactive
>> access to them. Maybe Brock's would be more accessible? (If you
>> contact me, I may not be able to respond for a few days.)
>
> Brock has replied off-list that he, too, is able to reliably reproduce the
> issue with IMB, and is working to get access for us. Many thanks for your
> offer; let's see where Brock's access takes us.

I should also note that, as far as I know, I have three codes (CRASH, NAMD
(some cases), and another user code) that lock up on a collective on openib
but run with the same library on Gig-E.

So I am not sure it is limited to IMB, or I could be crossing errors;
normally I would assume unmatched eager recvs for this sort of problem.

>>> -- we have not closed this issue,
>>
>> Which issue? I couldn't find a relevant-looking one.
>
> https://svn.open-mpi.org/trac/ompi/ticket/2714
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] btl_openib_cpc_include rdmacm questions
On May 11, 2011, at 3:21 PM, Dave Love wrote:

> We can reproduce it with IMB. We could provide access, but we'd have to
> negotiate with the owners of the relevant nodes to give you interactive
> access to them. Maybe Brock's would be more accessible? (If you
> contact me, I may not be able to respond for a few days.)

Brock has replied off-list that he, too, is able to reliably reproduce the
issue with IMB, and is working to get access for us. Many thanks for your
offer; let's see where Brock's access takes us.

>> -- we have not closed this issue,
>
> Which issue? I couldn't find a relevant-looking one.

https://svn.open-mpi.org/trac/ompi/ticket/2714

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
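For anyone following along, "reproducing with IMB" here means running the
Intel MPI Benchmarks Allreduce test under mpirun. A hypothetical invocation
(hostfile name, rank count, and paths are examples, not from this thread) as
a command-line sketch:

```shell
# Sketch: run only the Allreduce benchmark from IMB-MPI1 across IB nodes,
# restricting Open MPI to the openib and self BTLs to exercise the path
# that hangs. Hostfile and np are placeholders.
mpirun -np 16 --hostfile ib_hosts \
    --mca btl openib,self \
    ./IMB-MPI1 Allreduce
```

Selecting a single benchmark by name on the IMB-MPI1 command line keeps the
reproduction run short and focused on the failing collective.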
Re: [OMPI users] btl_openib_cpc_include rdmacm questions
On May 11, 2011, at 4:27 PM, Dave Love wrote:

> Ralph Castain writes:
>
>> I'll go back to my earlier comments. Users always claim that their
>> code doesn't have the sync issue, but it has proved to help more often
>> than not, and costs nothing to try.
>
> Could you point to that post, or tell us what to try exactly, given
> we're running IMB? Thanks.

http://www.open-mpi.org/community/lists/users/2011/04/16243.php

> (As far as I know, this isn't happening with real codes, just IMB, but
> only a few have been in use.)

Interesting -- my prior experience was with real codes, typically "legacy"
codes that worked fine until you loaded the node.

> --
> Excuse the typing -- I have a broken wrist
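As I read the post Ralph links, the "sync" suggestion refers to Open MPI's
coll/sync component, which periodically inserts a barrier into collectives so
unmatched eager sends cannot pile up. A hedged sketch of enabling it follows;
the exact parameter names and values are assumptions and should be verified
against your installation with `ompi_info --param coll sync`:

```shell
# Sketch (parameter names assumed -- verify with: ompi_info --param coll sync):
# raise the coll/sync priority so it wraps the other collective modules,
# and inject a barrier before every Nth collective operation.
mpirun -np 16 \
    --mca coll_sync_priority 100 \
    --mca coll_sync_barrier_before 1000 \
    ./IMB-MPI1 Allreduce
```

If the hang disappears with this enabled, that points at eager-receive
resource exhaustion rather than a transport bug.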
Re: [OMPI users] Sorry! You were supposed to get help about: But couldn't open help-orterun.txt
Hi,

Clarifications:
- I have downloaded the pre-built OpenMPI_v1.5.3-x64 from open-mpi.org
- installed it on Windows 7
- and then copied the OpenMPI_v1.5.3-x64 directory from Windows 7 to Windows
  Server 2008, into a different directory and also into the same directory

Now on Windows Server 2008, I am observing these errors...

c:\ompi_tests\win64>mpirun mar_f_i_op.exe
[nbld-w08:04820] [[30632,0],0] ORTE_ERROR_LOG: Error in file
..\..\..\openmpi-1.5.3\orte\mca\ras\base\ras_base_allocate.c at line 147
[nbld-w08:04820] [[30632,0],0] ORTE_ERROR_LOG: Error in file
..\..\..\openmpi-1.5.3\orte\mca\plm\base\plm_base_launch_support.c at line 99
[nbld-w08:04820] [[30632,0],0] ORTE_ERROR_LOG: Error in file
..\..\..\openmpi-1.5.3\orte\mca\plm\ccp\plm_ccp_module.c at line 186

As suggested, I tried the following, but nothing worked...
- copied to the same directory as it was on the previous machine
- executed "mpirun -mca orte_headnode_name HEADNODE_NAME" and
  "mpirun -mca orte_headnode_name MYHOSTNAME"
- set OPENMPI_HOME and the other OPAL_* env variables as follows...

set OPENMPI_HOME=C:\MPIs\OpenMPI_v1.5.3-x64
set OPAL_PREFIX=C:\MPIs\OpenMPI_v1.5.3-x64
set OPAL_EXEC_PREFIX=C:\MPIs\OpenMPI_v1.5.3-x64
set OPAL_BINDIR=C:\MPIs\OpenMPI_v1.5.3-x64\bin
set OPAL_SBINDIR=C:\MPIs\OpenMPI_v1.5.3-x64\sbin
set OPAL_LIBEXECDIR=C:\MPIs\OpenMPI_v1.5.3-x64\libexec
set OPAL_DATAROOTDIR=C:\MPIs\OpenMPI_v1.5.3-x64\share
set OPAL_DATADIR=C:\MPIs\OpenMPI_v1.5.3-x64\share
set OPAL_SYSCONFDIR=C:\MPIs\OpenMPI_v1.5.3-x64\etc
set OPAL_LOCALSTATEDIR=C:\MPIs\OpenMPI_v1.5.3-x64\etc
set OPAL_LIBDIR=C:\MPIs\OpenMPI_v1.5.3-x64\lib
set OPAL_INCLUDEDIR=C:\MPIs\OpenMPI_v1.5.3-x64\include
set OPAL_INFODIR=C:\MPIs\OpenMPI_v1.5.3-x64\share\info
set OPAL_MANDIR=C:\MPIs\OpenMPI_v1.5.3-x64\share\man
set OPAL_PKGDATADIR=C:\MPIs\OpenMPI_v1.5.3-x64\share\openmpi
set OPAL_PKGLIBDIR=C:\MPIs\OpenMPI_v1.5.3-x64\lib\openmpi
set OPAL_PKGINCLUDEDIR=C:\MPIs\OpenMPI_v1.5.3-x64\include\openmpi

Please correct me if I missed any other env variable.

Thank you.
-Hiral

On Wed, May 11, 2011 at 8:56 PM, Shiqing Fan wrote:
> Hi,
>
> The error message means that Open MPI couldn't allocate any compute node.
> It might be because the headnode wasn't discovered. You could try the
> option "-mca orte_headnode_name HEADNODE_NAME" on the mpirun command line
> (mpirun --help will show how to use it).
>
> And Jeff is also right: special care should be taken with the executable
> paths, and it's better to use a UNC path.
>
> To clarify the path issue: if you just copy the OMPI dir to another
> computer, there might also be the problem that OMPI couldn't load the
> registry entries, as the registry entries were set during the installation
> phase on that specific computer. In 1.5.3, an overall env variable
> "OPENMPI_HOME" will do the work.
>
> Regards,
> Shiqing
>
> ----- Original Message -----
> From: Jeff Squyres
> To: Open MPI Users
> Sent: Wed, 11 May 2011 15:21:26 +0200 (CEST)
> Subject: Re: [OMPI users] Sorry! You were supposed to get help about: But
> couldn't open help-orterun.txt
>
> On May 11, 2011, at 5:50 AM, Ralph Castain wrote:
>
>>> Clarification: I installed pre-built OpenMPI_v1.5.3-x64 on Windows 7
>>> and copied this directory into Windows Server 2008.
>
> Did you copy OMPI to the same directory tree that you built it in?
>
> OMPI hard-codes some directory names when it builds, and it expects to
> find that directory structure when it runs. If you build OMPI with a
> --prefix of /foo, but then move it to /bar, various things may not work
> (like finding help messages, etc.) unless you set the OMPI/OPAL
> environment variables that tell OMPI where the files are actually located.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Windows: MPI_Allreduce() crashes when using MPI_DOUBLE_PRECISION
Any comment or suggestion on my below update?

On Wed, May 11, 2011 at 12:59 PM, hi wrote:
> Hi Jeff,
>
>> Can you send the info listed on the help page?
>
> [...]
>
>> You forgot ierr in the call to MPI_Finalize. You also paired
>> DOUBLE_PRECISION data with MPI_INTEGER in the call to allreduce. And you
>> mixed sndbuf and rcvbuf in the call to allreduce, meaning that when you
>> print rcvbuf afterwards, it'll always still be 0.
>
> As I am not a Fortran programmer, this is my mistake!
>
> [...]
>
> I am observing the same crash as described in this thread (when executing
> as "mpirun -np 2 mar_f_dp.exe"), even with the above correct and simple
> test program. I commented out 'use mpi' as it gave me an "Error in compiled
> module file" error, so I used the 'include "mpif.h"' statement (see
> attachment).
>
> It seems to be a Windows-specific issue (I could run this test program
> on Linux with openmpi-1.5.1).
>
> Can anybody try this test program on Windows?
>
> Thank you in advance.
> -Hiral
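For anyone re-running the corrected test: with each of np ranks contributing
sndbuf(i) = 2.0, MPI_SUM should leave every rcvbuf element equal to 2.0 * np.
A tiny Python sketch of the expected reduction (plain arithmetic, no MPI
involved), so the program's output can be checked at a glance:

```python
# Expected MPI_Allreduce(MPI_SUM) result for the Fortran test program:
# every rank contributes the same sndbuf, so each rcvbuf element is the
# per-rank value multiplied by the number of ranks.
def expected_allreduce_sum(sndbuf, nproc):
    # element-wise sum across nproc identical ranks
    return [x * nproc for x in sndbuf]

sndbuf = [2.0] * 5
print(expected_allreduce_sum(sndbuf, 2))  # [4.0, 4.0, 4.0, 4.0, 4.0]
```

So for "mpirun -np 2 mar_f_dp.exe", the final "end --, rcvbuf=" line should
print 4.0 five times; anything else indicates the reduction went wrong.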