[OMPI users] problem with mpiexec/mpirun
Hello everyone!

I am trying to run 11 instances of my program on 6 dual-core Opterons (it is not a time-consuming application anyway; it takes about 10 seconds on a single-core laptop :)). So, when I type:

mpiexec -machinefile hostfile -n 11 ./program

nothing happens!

The output of the "mpdtrace -l" command (from the machine I type the command at) is:

lx64a171_41469 (10.156.70.171)
lx64a176_47945 (10.156.70.176)
lx64a175_42990 (10.156.70.175)
lx64a174_39601 (10.156.70.174)
lx64a173_45387 (10.156.70.173)
lx64a172_55297 (10.156.70.172)

(it seems that all 6 machines are there)

Does anyone have any idea what the reason could be?

Thanks in advance!

Regards,
Jovana
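For reference, an Open MPI machinefile for these six hosts might look like the sketch below. The hostnames are taken from the mpdtrace output above, while the slots=2 entries are only an assumption based on the dual-core hardware:

# hypothetical hostfile for the six dual-core Opteron nodes
lx64a171 slots=2
lx64a172 slots=2
lx64a173 slots=2
lx64a174 slots=2
lx64a175 slots=2
lx64a176 slots=2

With such a file, "mpiexec -machinefile hostfile -n 11 ./program" would start at most two ranks per node. (As the reply below points out, mpdtrace itself belongs to MPICH rather than Open MPI, so this only applies once the Open MPI mpiexec is actually the one being invoked.)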
Re: [OMPI users] problem with mpiexec/mpirun
I'm afraid you're right... I was testing it with Open MPI on my laptop, but later on the cluster I had some problems... Probably a colleague has installed MPICH... But I thought the behaviour I see might be "implementation-independent". Probably sounds stupid... :) Thanks anyway :)

On 12 Oct 2009, Ralph Castain wrote:
> Hate to say this, but you don't appear to be using Open MPI.
> "mpdtrace" is an MPICH command, last I checked.
>
> You might try their mailing list, or check which mpiexec you are using
> and contact them.
[OMPI users] MPI_Bsend vs. MPI_Ibsend
Dear all,

Could anyone please clarify for me the difference between MPI_Bsend and MPI_Ibsend? Or, in other words, what exactly is "blocking" in MPI_Bsend, when the data is stored in the buffer and we "return"? :-)

Another, but similar, question: what about the data buffer - when can it be reused in each of the cases? Simple examples:

for (i=0; i<...; i++) {
    MPI_Bsend(&data_buffer[0], ..., slave_id1...);
} // Can any problem occur here, since we send the data_buffer several times?

for (i=0; i<...; i++) {
    MPI_Ibsend(&data_buffer[0], ..., slave[i]..., &request);
    MPI_Test(&request...);
} // Any difference to the previous example? Concerning the re-use of data_buffer?

Thank you a lot in advance.

Regards,
Jovana
[OMPI users] MPI_Bsend vs. MPI_Ibsend (2)
> Actually the 'B' in MPI_Bsend() specifies that it is a blocking *buffered*
> send. So if I remember my standards correctly, this call requires:
>
> 1) you will have to explicitly manage the send buffers via
> MPI_Buffer_[attach|detach](), and
>
> 2) the send blocks only until the message has been copied into that
> attached buffer; it does not wait for a corresponding receive to be
> posted, so your data buffer can be reused as soon as the call returns.
>
> MPI_Ibsend() is the immediate (non-blocking) version of the above and
> returns without any requirement for a corresponding receive - possibly
> even before the local data copy has completed. There is no guarantee that
> the message has been copied out of your buffer yet, so you should not
> reuse the send buffer until after verifying the completion of the send
> via MPI_Wait() or similar.
>
> In your example, since MPI_Test() won't block, you can have a problem.
> Use MPI_Wait() instead, or change your send buffer to one that is not
> being used.
>
> -bill
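The points above can be condensed into a short, self-contained sketch (not part of the original exchange). The buffer size, message count, datatype and ranks are arbitrary illustration choices, and it is meant to be run with two processes:

/* Minimal sketch: MPI_Bsend vs. MPI_Ibsend with an explicitly attached buffer. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 4

int main(int argc, char **argv)
{
    int rank, i;
    double data[8] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Attach a buffer large enough for all 2*N pending messages,
         * plus the per-message bookkeeping overhead. */
        int bufsize = 2 * N * ((int)sizeof(data) + MPI_BSEND_OVERHEAD);
        void *bsend_buf = malloc(bufsize);
        MPI_Buffer_attach(bsend_buf, bufsize);

        /* MPI_Bsend returns once 'data' has been copied into the attached
         * buffer, so 'data' may be modified again right after each call. */
        for (i = 0; i < N; i++) {
            data[0] = i;
            MPI_Bsend(data, 8, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        }

        /* MPI_Ibsend returns immediately; 'data' must not be modified until
         * the request completes, even though completion only requires the
         * local copy, not a matching receive. */
        for (i = 0; i < N; i++) {
            MPI_Request req;
            data[0] = i;
            MPI_Ibsend(data, 8, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }

        /* Detach blocks until all buffered messages have been delivered. */
        MPI_Buffer_detach(&bsend_buf, &bufsize);
        free(bsend_buf);
    } else if (rank == 1) {
        for (i = 0; i < 2 * N; i++)
            MPI_Recv(data, 8, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

The difference is visible in the two loops: after MPI_Bsend the application buffer is immediately reusable, while after MPI_Ibsend it only becomes reusable once MPI_Wait (or an MPI_Test whose flag came back true) has completed the request.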
Re: [OMPI users] MPI_Bsend vs. MPI_Ibsend
Thank you very much, now I get it! :-) Cheers, Jovana
Re: [OMPI users] Behaviour of MPI_Cancel when using 'large' messages
Hello Gijsbert,

I had the same problem a few months ago. I could not even cancel messages for which I did not have a matching receive on the other side (thus, they could not have been received! :-)). I was really wondering what was going on... I have some experience with MPI, but I am not an expert, and I would really appreciate an explanation from the developers.

While googling for a potential solution, I found out that some implementations (not Open MPI) do not allow cancelling at all, so I think one cannot rely on MPI_Cancel(). If I am right, the question is then: why implement it?! Is the logic behind it "better ever than never"? :-) So, use it when it is better to do the cancellation, but don't really rely on it... ?! As I said, I am not an expert, but it would be great to hear about this from the developers.

If, however, YOU find any solution, it would be great if you wrote about it on this list!

Thanks in advance.

Regards,
Jovana Knezevic
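For reference, the pattern the MPI standard itself prescribes for cancelling a pending send is sketched below; the message size, tag and peer rank are invented for illustration, the program assumes at least two processes, and - as the discussion above suggests - whether the cancellation actually succeeds is left to the implementation:

/* Hypothetical sketch of the standard MPI_Cancel pattern for a pending send. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    double payload[1024] = {0};   /* an arbitrary "large-ish" message */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Request req;
        MPI_Status status;
        int cancelled;

        /* Post a send that may never be matched by a receive. */
        MPI_Isend(payload, 1024, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD, &req);

        /* Mark it for cancellation, then complete the request either way;
         * the standard guarantees MPI_Wait returns after MPI_Cancel. */
        MPI_Cancel(&req);
        MPI_Wait(&req, &status);

        /* Only MPI_Test_cancelled says whether the cancel really happened. */
        MPI_Test_cancelled(&status, &cancelled);
        printf("send %s cancelled\n", cancelled ? "was" : "was NOT");
    }

    MPI_Finalize();
    return 0;
}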
[OMPI users] mpirun problem
The description of the problem is included in the Problem text file inside the zipped attachment. Thank you.

ompi_support.tar.gz
Description: GNU Zip compressed data
[OMPI users] mpirun problem
Hello,

I'm new to MPI, so I'm going to explain my problem in detail. I'm trying to compile a simple application using mpicc (on SUSE 10.0) and run it - compilation passes well, but mpirun is the problem. So, let's say the program is called 1.c. I tried the following:

mpicc -o 1 1.c

(and, just in case, after the problems with mpirun, I tried the following, too)

mpicc --showme:compile
mpicc --showme:link
mpicc -I/include -pthread 1.c -pthread -I/lib -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl -o 1

Both versions (with or without flags) produced executables as expected (so, when I write ./1 it executes in the expected manner), but when I try this:

mpirun -np 4 ./1

it terminates giving the following message:

ssh: (none): Name or service not known
--------------------------------------------------------------------------
A daemon (pid 6877) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

These are the PATH and LD_LIBRARY_PATH variables (I changed the profile file manually and these are the results):

/* note: the openmpi-1.3 folder with the installation is in /root/Desktop; I configured it with --prefix /root/Desktop/openmpi-1.3, so the bin and lib folders are in /root/Desktop/openmpi-1.3 */

echo $PATH
/root/Desktop/openmpi-1.3/bin:/usr/lib/qt3/bin:/root/Desktop/openmpi-1.3/bin:/usr/lib/qt3/bin:/sbin:/usr/sbin:/usr/local/sbin:/opt/kde3/sbin:/opt/gnome/sbin:/root/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin

echo $LD_LIBRARY_PATH
/root/Desktop/openmpi-1.3/lib:/usr/lib/qt3/lib:/root/Desktop/openmpi-1.3/lib:/usr/lib/qt3/lib:

I have included in the attachment the ompi_info and ifconfig command results in separate files, as well as the compressed config.log file from the root folder of the installation.

Thank you very much in advance for your help.

Regards,
Jovana K.

ompi_support.tar
Description: Unix tar archive
[OMPI users] Problem with MPI_File_read()
Hello everyone!

I have a problem using MPI_File_read() in C. The simple code below, trying to read an integer, prints the wrong result to the standard output (instead of 1 it prints 825307441). I tried this function with the MPI_CHAR datatype and it works. Probably I'm not using it properly for MPI_INT, but I can't find anywhere in the literature what the problem could be, so I would really appreciate it if any of you could quickly check out the code below and maybe give me some advice, or tell me what's wrong with it.

Thanks a lot in advance.

Regards,
Jovana Knezevic

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

void read_file(MPI_File *infile)
{
    MPI_Status status;
    int *buf;
    int i;

    buf = (int *)malloc(5 * sizeof(int));
    for (i = 0; i < 5; i++)
        buf[i] = 0;

    MPI_File_read(*infile, buf, 1, MPI_INT, &status);
    printf("%d\n", buf[0]);
}

int main(int argc, char **argv)
{
    MPI_File infile1;
    int procID, nproc;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &procID);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    printf("begin\n");
    MPI_File_open(MPI_COMM_WORLD, "first.dat", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &infile1);

    if (procID == 0) {
        printf("proc0\n");
        read_file(&infile1);
    } else {
        printf("proc1\n");
    }

    MPI_File_close(&infile1);
    printf("end\n");
    MPI_Finalize();
    return EXIT_SUCCESS;
}
[OMPI users] Problem with MPI_File_read() (2)
> Hi Jovana,
>
> 825307441 is 0x31313131 in base 16 (hexadecimal), which is the string
> `1111' in ASCII. MPI_File_read reads in binary values (not ASCII), just
> as the standard functions read(2) and fread(3) do.
>
> So, your program is fine; however, your data file (first.dat) is not.
>
> Cheers,
> Shaun

Thank you very much, Shaun!

Ok, now I realise it's really stupid that I was trying so hard to get the result that I wanted :-) Well, it seems it's not a problem if I'm just reading with MPI_File_read and writing with MPI_File_write, but if I try to do some calculations with the data I read, it doesn't work... Do you maybe have some idea how one can deal with this? I have an input file for my project - a much larger code than the sample I gave last time - consisting of integers, doubles, characters and so on... Maybe it's a silly question, but can I somehow convert my input file into something that works? :-) Any ideas would help.

Thanks again.

Cheers,
Jovana
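Shaun's observation can be reproduced with a tiny stand-alone snippet (not from the original thread): reinterpreting the four ASCII bytes '1','1','1','1' as a 4-byte integer yields exactly the value the program printed, independently of byte order, because all four bytes are identical.

/* The ASCII bytes "1111" reinterpreted as a 4-byte int give 825307441
 * (0x31313131), which is what MPI_File_read(..., MPI_INT, ...) returns
 * when the data file is ASCII text rather than binary. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char text[4] = {'1', '1', '1', '1'};   /* what an ASCII file holds */
    int value;

    memcpy(&value, text, sizeof value);          /* what the MPI_INT read sees */
    printf("%d\n", value);                       /* prints 825307441 */
    return 0;
}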
Re: [OMPI users] Problem with MPI_File_read() (2)
> In general, files written by MPI_File_write (and friends) are only
> guaranteed to be readable by MPI_File_read (and friends). So if you
> have an ASCII input file, or even a binary input file, you might need
> to read it in with traditional/unix file read functions and then write
> it out with MPI_File_write. Then your parallel application will be
> able to use the various MPI_File_* functions to read the file at
> run-time. Hence, there's no real generic converter; you'll need to
> write your own that is specific to your data.
>
> Make sense?

Hello Jeff!

Thanks a lot! Yes, sure, what you say makes sense. On the other hand, it seems I will have to "traditionally" open the input file n times - once for each process - since all of my processes have to collect their data from it anyway (each parsing it from the beginning to the end), don't you think so? I wanted to take advantage of MPI to read the data from one file in each process... Or have I misunderstood something?
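A hypothetical sketch of the workflow Jeff describes, which also touches on the concern about opening the file n times: a single rank parses the ASCII file traditionally and writes it out once with MPI_File_write, after which every rank reads only its own portion of the shared binary file. The file names, the number of integers (COUNT) and the even split across ranks are all invented for illustration:

/* Convert once on rank 0, then read in parallel with MPI-IO. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define COUNT 8   /* assumed number of integers in the input file */

int main(int argc, char **argv)
{
    int rank, nproc, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    if (rank == 0) {
        /* Parse the ASCII file with traditional I/O... */
        int values[COUNT];
        FILE *ascii = fopen("input.txt", "r");
        if (!ascii)
            MPI_Abort(MPI_COMM_WORLD, 1);
        for (i = 0; i < COUNT; i++)
            fscanf(ascii, "%d", &values[i]);
        fclose(ascii);

        /* ...and write it out in the binary form MPI_File_read expects. */
        MPI_File out;
        MPI_File_open(MPI_COMM_SELF, "input.bin",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &out);
        MPI_File_write(out, values, COUNT, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&out);
    }
    MPI_Barrier(MPI_COMM_WORLD);   /* ensure the binary file exists */

    /* Every rank reads only its own slice of the one shared file. */
    int chunk = COUNT / nproc;
    int *mine = malloc(chunk * sizeof(int));
    MPI_File in;
    MPI_File_open(MPI_COMM_WORLD, "input.bin", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &in);
    MPI_File_read_at(in, (MPI_Offset)rank * chunk * sizeof(int),
                     mine, chunk, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&in);

    printf("rank %d: first value = %d\n", rank, chunk > 0 ? mine[0] : -1);

    free(mine);
    MPI_Finalize();
    return 0;
}

This way only one process does the traditional parsing, and the MPI_File_* calls then let each process pull just its own part of the single shared file rather than re-reading it from the beginning.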