Re: [OMPI users] MPI and C++ (Boost)
Hi Luis,

Luis Vitorio Cargnini wrote: Your suggestion is a great and interesting idea. I only have the fear of getting used to Boost and not being able to get rid of it anymore, because one thing is sure: the abstraction added by Boost is impressive...

I should add that I fully understand what you are saying. Despite all the good things being said about Boost, I avoided it for a very long time, for two reasons: the dependency issue for myself (exactly like what you said), and the fact that distributing it means users have to do an extra step (regardless of how easy or hard that step is, it's an extra step). I finally switched over :-) and the "prototype" idea was just a way to ease you into it. MPI programs are hard to get right, and Boost aside, it is a good idea to have something working that was easy to write; you can then remove the parts you don't like later.

By the way, it seems that less-used parts of MPI do not have equivalents in Boost.MPI, so just using Boost won't solve all of your problems. There is a list here (the table entries that say "unsupported"): http://www.boost.org/doc/libs/1_39_0/doc/html/mpi/tutorial.html#mpi.c_mapping

Good luck!

Ray
Re: [OMPI users] MPI and C++ (Boost)
Hi Luis,

Luis Vitorio Cargnini wrote: Thanks, but I really do not want to use Boost. Is it easier? Certainly it is, but I want to write this using only MPI itself, without being dependent on a library or templates -- the majority of Boost is a huge set of templates and wrappers for different libraries, implemented in C, supplying a wrapper for C++. I admit Boost is a valuable tool, but in my case, the more independent I can be from additional libs, the better.

I've used Boost.MPI before and it really isn't that bad, and it shouldn't be seen as "just another library". Many parts of Boost are on their way to becoming part of the standard and are openly discussed and debated. So it isn't the same as going to some random person's web page and downloading their library/template. Of course, it takes time to make it into the standard and I'm not entirely sure if everything will (probably not). (One "annoying" thing about Boost.MPI is that you have to compile it...if you are distributing your code, end-users might find that bothersome...oh, and its serialization library as well.)

One suggestion might be to make use of Boost and, once you have your code working, start changing it back. At least you will have a working program to compare against. Kind of like writing a prototype first...

Ray
Re: [OMPI users] Error connecting to nodes ?
Hi Ashika,

Ashika Umanga Umagiliya wrote: In my MPI environment I have 3 Debian machines, all with Open MPI set up in /usr/local/openMPI, and PATH and LD_LIBRARY_PATH configured correctly. I have also configured passwordless SSH login on each node. But when I execute my application, it gives the following error; what seems to be the problem?

Have you checked whether or not mpirun works on a single machine (i.e., mpirun -np 4 -host localhost mandel)? Did you install Open MPI from source or via the apt-get package manager? I used the package manager and orted is located at /usr/bin/orted -- do you have this file on all 3 systems? And is this Debian stable?

Ray
Re: [OMPI users] Compiling ompi for use on another machine
Hi Ben,

ben rodriguez wrote: I have compiled ompi and another program for use on another rhel5/x86_64 machine. After transferring the binaries and setting up environment variables, is there anything else I need to do for ompi to run properly? When executing my prog I get: -- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 1. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. --

Just a few thoughts about your problem... Are the two machines identical in architecture and RH installation? Is there any reason why you cannot compile on the other machine too? (Sometimes the location of dynamic libraries, etc. changes, so I make a point of always recompiling on each machine.) Are you having problems running your program on each node individually? If you haven't tried that, you might do so first (i.e., with "--np 1").

Ray
[OMPI users] Can't start program across network
Hi all,

I'm having a problem running mpirun and I was wondering if there are suggestions on how to find out the cause. I have 3 machines that I can use: X, Y, and Z. The important thing is that X is different from Y and Z (the software installed, the version of Linux, etc.; Y and Z are identical software installations).

All of this works:

[On X] mpirun --host Y,Z --np 2 uname -a
[On X] mpirun --host X,Y,Z --np 3 uname -a
[On Y] mpirun --host Y --np 2 uname -a
(and likewise, other combinations)

What doesn't work is:

[On Y] mpirun --host Y,Z --np 2 uname -a
[On Y] mpirun --host X,Y,Z --np 3 uname -a

...and similarly for machine Z. I can confirm that from any of the 3 machines, I can ssh to the others without typing in a password. I set up the RSA keys correctly [I think]. When I run the above commands, they just hang. Adding "--verbose" doesn't produce any information...I don't know what it's doing. I had a longer-running program than "uname" and I didn't see it appear on any of the machines. In fact [since it hangs], I don't see uname in "top", either. I do, however, see "mpirun" and "orted" in top. I guess some setup is missing that X has and the other two do not have. Any suggestions on how to find out the cause of this problem?

Thank you!

Ray

PS: It has been a long time since I got X working...I might have done something that I no longer remember; but I don't remember seeing this problem before.
Re: [OMPI users] round-robin scheduling question [hostfile]
Hi Ralph, Ralph Castain wrote: ... The man page will describe all the various options. Which one is best for your app really depends on what the app is doing, the capabilities and topology of your cluster, etc. A little experimentation can help you get a feel for when to use which one. Thank you for the explanation! So far, I've only been using -np and letting the rest work itself through magic :-) -- but, I'll try the options you suggested and also other options in the man page of mpirun to see what works for my application... Thanks again! Ray
[OMPI users] round-robin scheduling question [hostfile]
Hi all,

According to FAQ 14 ("How do I control how my processes are scheduled across nodes?") [http://www.open-mpi.org/faq/?category=running#mpirun-scheduling], the default scheduling policy is by slot and not by node. I'm curious why the default is "by slot", since I am thinking of explicitly specifying "by node", but I'm wondering if there is an issue which I haven't considered. I would think that one reason for "by node" is to distribute HDD access across machines [as is the case for me, since my program is HDD-access intensive]. Or perhaps I am mistaken? I'm now thinking that "by slot" is the default because processes with ranks that are close together might do similar tasks and you would want them on the same node? Is that the reason?

Also, at the end of this FAQ, it says "NOTE: This is the scheduling policy in Open MPI because of a long historical precedent..." -- does this "This" refer to "the fact that there are two scheduling policies" or "the fact that 'by slot' is the default"? If the latter, then that explains why "by slot" is the default, I guess...

Thank you!

Ray
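To make the difference between the two policies concrete, here is a sketch with a hypothetical two-node hostfile (the node names, hostfile name, and program name are made up for illustration; --byslot and --bynode are the mpirun options the FAQ describes):

```shell
# my_hosts (hypothetical contents):
#   nodeA slots=2
#   nodeB slots=2

# By slot (the default): fill nodeA's slots first.
# Ranks 0,1 land on nodeA; ranks 2,3 on nodeB.
mpirun --hostfile my_hosts -np 4 --byslot ./my_prog

# By node: round-robin across nodes.
# Ranks 0,2 land on nodeA; ranks 1,3 on nodeB.
mpirun --hostfile my_hosts -np 4 --bynode ./my_prog
```

So for an HDD-intensive job, --bynode would spread consecutive ranks (and their disk access) across machines, which matches the reasoning above.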
Re: [OMPI users] running as rank 0 of 1 for 2 processor
Hi Ramya,

Ramya Narasimhan wrote: Hi, I have installed openmpi-1.3. When I checked the example programs, the output shows only rank 0 of size 1 for 2 processors. When I gave the command *mpirun -hostfile node -np 2 hello_c*, the output is:

Hello, world, I am 0 of 1
Hello, world, I am 0 of 1

In my node file, I have *IP address* slots=2 max_slots=2. I don't know why it is not giving 0 of 2 and 1 of 2.

Other than the text output, can you confirm that this is happening? I'm no expert, but as a sanity check, I would put an infinite loop after printing "hello, world". Then open another terminal and run "top". I'm not sure if all tops are the same, but on mine I would hit "f" to include more fields and then "J" to include "Last used cpu" (check the help for your top if it's different for you; on my top, this information is not included by default). Then return to the previous screen. Ideally, you should see two processes running on your computer, one on processor "0" and the other on "1". Can you confirm this is not the case and both are set to "0"?

Ray
Re: [OMPI users] Compilers
Hi Amos,

Amos Leffler wrote: I want to compile Open MPI using the Intel compilers. Unfortunately the Series 10 C compiler (icc) license has expired. I downloaded and looked at the Series 11 C++ compiler (no C compiler listed) and would like to know if you can use this together with an enclosed or obtained C compiler from Intel. The release notes are a bit overwhelming! Is it possible to use the standard Linux gcc instead?

Yes, you can use gcc/g++, as that is what I use. I do not know about Intel's compilers, though, as I don't use them. However, this answer in the FAQ seems to address your query: http://www.open-mpi.org/faq/?category=building#build-compilers ...and the answer seems to be "yes" (in fact, the Intel compilers are the example used).

Ray
Re: [OMPI users] Only the root can run mpirun? other users how to do ?
Hi Chong,

Chong Su wrote: Now MPI can be used normally, but we need to let non-root users run MPI programs too. How can we do that?

What type of operating system? Generally, anyone can run mpirun and mpicc/mpic++/etc. Are you unable to do that? What kind of error message are you getting? And are you sure that it was installed properly? i.e., can root run it but not normal users?

Ray
Re: [OMPI users] OpenMPI on a LAN
Hi Heitor,

Heitor Florido wrote: I have installed OpenMPI on both computers and my application works on both of them, but when I try to communicate between them, the method MPI_Lookup_name can't resolve the name published by the other machine. I've tried to run the example from mpi-forum that uses MPI_Open_port too, but it didn't work either. After reading about it in some FAQs and some other threads from the forum, I believe that I need to configure my ssh options.

Honestly, when I installed Open MPI, I didn't perform any configuration of the ssh options, as far as I can remember. Perhaps someone else can help you there. I can imagine networks being set up incorrectly, but I can't imagine what incorrect ssh option there would be to prevent one computer from finding another.

In addition to some FAQs, Gus suggested running a simple example called hello_c.c. Have you tried that? It is hard to give any suggestions unless you give more information, such as a shortened version of your source code and the command line you ran mpirun with. It might help if you ran some existing code (such as http://mpi.deino.net/mpi_functions/MPI_Lookup_name.html), too.

Ray
Re: [OMPI users] OpenMPI on a LAN
Hi Heitor,

Heitor Florido wrote: Hello, I have built an application using openmpi 1.2.8; it is a client/server application that uses MPI_Publish_name and MPI_Lookup_name to start the communication. This application works fine on a single computer. However, I'd like to run it on 2 PCs using Linux (Ubuntu) connected by a LAN. If I start the server on one computer, my client can't find the name published by the server and crashes. What should I do to make this work? I believe that my application is correct, so I think I need to install some other service to make MPI see the other computer. []s

Have you installed Open MPI on both computers? And can you run your application on each computer on its own? (i.e., each one working independently) If you are concerned about your Open MPI installation, you can try some simple code from the web that has been shown to work, to confirm that the problem is not with your application (I'm not saying it is :-), just suggesting that you should remove it from the equation, if you can).

Ray
Re: [OMPI users] timing + /usr/bin/time
Hi Reuti,

I have to admit that I'm not so familiar with SGE, but I'll take a look at it so that I'll learn something. In my current situation, I don't /need/ to report a user time. I was just wondering if it has any meaning, and what people mean when they show numbers or a graph and just say "time". But thank you for pointing this out!

Ray

Reuti wrote: Hi Ray, with the Tight Integration of Open MPI into SGE (http://gridengine.sunsource.net/) you will get correct accounting. Every process created with qrsh (a replacement for ssh) will have an additional group id attached, and SGE will accumulate them all. Depending on the size of the cluster, you might want to look into a batch queuing system. In fact, we use it even locally on some machines to serialize the workflow. -- Reuti
Re: [OMPI users] timing + /usr/bin/time
Hi Fabian,

Thank you for clarifying things and confirming some of the things that I thought. I guess I have a clearer understanding now.

Fabian Hänsel wrote:

Hmm, I guess user time does not matter since it is real time that we are interested in reducing.

Right. Even if we *could* measure the user time of every MPI worker process correctly, that is not what you are interested in: depending on the algorithm, a significant amount of time could get spent waiting for MPI messages to arrive, and that time would not count as user time, but it also isn't 'wasted', as something important happens.

The reason why I was wondering is that some people in research papers compare their algorithm (system) with another one by measuring user time, since it removes some of the effects of what the system does on behalf of the user's process. And some people, I guess, see this as a fairer comparison. On the other hand, I guess I've realized the obvious -- that Open MPI doesn't reduce the efficiency of the algorithm. If anything, an increase in user time is an artifact of Open MPI, so user time is somewhat misleading if we are analyzing an algorithm. What MPI should do (if properly used) is reduce the real time, and that's what we should be reporting...even if it includes other things that we did not want previously, like the time spent by the OS in swapping memory, etc.

[Papers I've read with graphs that have "time" on the y-axis and "processors" on the x-axis rarely mention which time they are measuring...but it seems obvious now that it must be real time, since user time should [???] increase with more processors. I think...of course, assuming we can total the user time across machines accurately.]

Thank you for your message(s)! I think I've got it now... :-)

Ray
Re: [OMPI users] timing + /usr/bin/time
Hi Fabian,

Fabian Hänsel wrote:

On a separate topic, but related to your post here, how did you do the timing? [Especially to so many digits of accuracy. :-) ]

Two things to consider: i) What do I actually (want to) measure? ii) How accurately can I do that?

i) Option iA) execution time of the whole program. One could use /usr/bin/time. Simple, but not that accurate. If you do not need microsecond accuracy and you measure all the things you want to compare in the same fashion, a run like "/usr/bin/time mpirun -np X myprog" should perfectly suit your needs.

So, to make sure I understand what happens... This command:

mpirun -np 2 myprog

starts the program "mpirun" and two processes of "myprog". So, the "real time" that /usr/bin/time reports is the wall clock time for mpirun. Does the user time have any meaning here? I'm not very good with the theory behind multi-processor programming...but Perl (for example) has a "times" function (http://perldoc.perl.org/functions/times.html) which "Returns a ... list ... for this process and the children of this process". Are the two instances of myprog considered children of mpirun? Hmm, I guess user time does not matter since it is real time that we are interested in reducing.

Option iB) time my 'crunching core' runs. Something in rank 0 like "time1=gettime(); crunch_alot(); time2=gettime(); time_used=time2-time1;" is much better suited to measuring only the important parts, especially if they run briefly compared to the overall program execution time. (This assumes that crunch_alot() has barrier character, e.g., by collecting values from all processes at the end.)

ii) MPI_Wtime() is generally considered the best way, as it is platform independent and MPI libraries usually try to use the most accurate measurement method available on each platform.

Yes, gettime() would work also. I didn't know there was an MPI_Wtime() function, though. Thanks!
I've just used gettimeofday(), because some of my demo apps are intended to run independently of MPI. So my accuracy is at most microseconds (on some platforms it might be different, e.g., only 10 ms steps).

[Especially to so many digits of accuracy. :-) ]

By unsuitably using printf("%.9f", time) ;-) -- I just never cleaned up that part. (But still, the thing I wanted to demonstrate became quite apparent.)

Ah! You demonstrated that well -- thanks! However, I saw all those digits of accuracy and I thought you were doing some magic with timing MPI programs that I'd like to know about -- it was only a "%.9f", after all... :-)

Thank you!

Ray
Re: [OMPI users] dual cores --> timing + /usr/bin/time
Hi Fabian,

On a separate topic, but related to your post here, how did you do the timing? [Especially to so many digits of accuracy. :-) ] I will have to time my program and I don't think /usr/bin/time would do it. Are the numbers it reports accurate [for an MPI program]? I think the "user time" would be inaccurate, since I would need the user time of all the processes...but the "real time" of the main process should be OK?

Ray

Fabian Hänsel wrote: Be warned that, at least in the default config, running more MPI threads than you have cores results in dog-slow code. Single-core machine:

$ cat my-hosts
localhost slots=1
$ mpirun -np 1 -hostfile my-hosts ./sort selectionsort 1024 1024
0.009905000 seconds
$ mpirun -np 2 -hostfile my-hosts ./sort selectionsort 1024 1024
4.113605000 seconds

(On a dual core, both -np 1 and -np 2 run almost equally fast -- only a slight speedup, due to the poor algorithm (developed for demonstration purposes).)
Re: [OMPI users] Open MPI programs with autoconf/automake?
Hi Jeff,

Jeff Squyres wrote:

On Nov 10, 2008, at 6:41 AM, Jed Brown wrote: With #define's and compiler flags, I think that can be easily done -- I was wondering if this is something that developers using MPI do and whether AC/AM supports it.

AC will allow you to #define whatever you want -- look at the documentation for AC_DEFINE and AC_DEFINE_UNQUOTED. You can also tell your configure script to accept various --with- and --enable- arguments; see the docs for AC_ARG_WITH and AC_ARG_ENABLE.

Thanks for this! I know "it's in the documentation", but I've been going through it with much difficulty. Definitely complete, but hard to get into and to know what it is I need. So, some keywords to search for will definitely help!

If --with-mpi is not specified, the following will happen: 1. You don't set CC (and friends), so AC_PROG_CC will find the default compilers. Hence, your app will not be compiled and linked against the MPI libraries. 2. #define BUILDING_WITH_MPI to 0, so the code above will compile out the call to MPI_Send(). Both of these are valid techniques -- use whichever suits your app best.

I see; thank you for giving me this second option. I guess I'm more attracted to it since it allows me to continue working with Open MPI. As I am writing the system [now], I'll have to keep in mind to make it modular so that parts can be #define'd in and out easily. Thank you for your careful explanation!

Ray
Re: [OMPI users] dual cores
Dear Erin,

I'm nowhere near a guru, so I hope you don't mind what I have to say (it might be wrong...). But what I did was just put a long loop into the program and, while it was running, I opened another window and looked at the output of "top". Obviously, without the loop, the program would terminate too fast. If you have two CPUs and the total across processes exceeds 100% (i.e., if you run with np=2, you might have 98% and 98%), then I would think this is proof enough that both cores are being used. I'm saying this on the list hoping that someone can correct my knowledge of it, too...

Ray

Hodgess, Erin wrote: Dear Open MPI gurus: I have just installed Open MPI this evening. I have a dual-core laptop and I would like to have both cores running. Here is the my-hosts file:

localhost slots=2

and here is the command and output:

mpirun --hostfile my-hosts -np 4 --byslot hello_c | sort
Hello, world, I am 0 of 4
Hello, world, I am 1 of 4
Hello, world, I am 2 of 4
Hello, world, I am 3 of 4
hodgesse@erinstoy:~/Desktop/openmpi-1.2.8/examples>

How do I know if both cores are running, please?
Re: [OMPI users] Open MPI programs with autoconf/automake?
Hi Nuno,

Thank you for the link and your update to it. I definitely don't mind that it isn't "pretty"! :-) Since your post, I've been trying to understand it and how to work it in. But, I think I've been making some progress over the weekend. Thank you!

Ray

Nuno Sucena Almeida wrote:

Hi, see if this macro solves your problem: http://autoconf-archive.cryp.to/acx_mpi.html -- it requires some improvement, but might be a start. Since I only have OpenMPI, I use it in the following way (it's not pretty):

configure.ac:

(...)
dnl Check for MPI
dnl This check will set the MPICC and MPICXX variables to the MPI compiler ones
dnl if the library is found, or to the regular compilers if not.
AC_ARG_WITH(mpi,
    [AC_HELP_STRING([--with-mpi], [enable MPI support [default=yes]])],
    [case "${withval}" in
        yes|no) with_mpi=$withval;;
        *) AC_MSG_ERROR(bad value ${withval} for --with-mpi);;
    esac],
    [with_mpi=yes])
if test "x$with_mpi" = "xyes"; then
    ACX_MPI([], [AC_MSG_ERROR(could not find mpi library for --with-mpi)])
    AC_DEFINE(HAVE_MPI)
    MPI_CXXLIBS=`mpicxx --showme:link`
    MPI_CXXFLAGS=`mpicxx --showme:compile`
    AC_SUBST(MPI_CXXLIBS)
    AC_SUBST(MPI_CXXFLAGS)
else
    MPICC="$CC"
    MPICXX="$CXX"
    AC_SUBST(MPICC)
    AC_SUBST(MPICXX)
fi
AM_CONDITIONAL([WE_HAVE_MPI],[test "x$with_mpi" = "xyes"])
(...)

Makefile.am:

(...)
# MPI headers/libraries:
INCLUDES += $(MPI_CXXFLAGS)
OTHERLIBS += $(MPI_CXXLIBS)
(...)

I would start by improving the mentioned macro with specific support for each MPI implementation...

Nuno

On Thursday 06 November 2008 06:35:33 am Raymond Wan wrote: I'm not sure if this is relevant to this mailing list, but I'm trying to get autoconf/automake working with an Open MPI program I am writing (in
Re: [OMPI users] Open MPI programs with autoconf/automake?
Hi Jeff,

Thank you for your reply! Indeed, I was never going to look at OMPI's use of AC/AM...no doubt that would be far too complex for me. :-) The AC/AM documents I have found so far are quite difficult for me -- I really am starting from zero. Prior to using MPI, I had been writing my own Makefiles, and for the projects I work on [usually alone], that was sufficient. However, your tip did help me; I dropped "MPI" from my search and added "tutorial" instead, and the hits are better.

As a starting point, I probably will only support open source MPIs. One thing I was wondering about was whether it is possible, through the use of #define's, to create code that is both multi-processor (MPI/mpic++) and single-processor (normal g++). That is, if users do not have any MPI installed, it compiles with g++. With #define's and compiler flags, I think that can be easily done -- I was wondering if this is something that developers using MPI do and whether AC/AM supports it. But thanks, and I'll try more phrase combinations in Google...

Ray

Jeff Squyres wrote: OMPI itself uses AC/AM to build itself, but our configure.ac and some of our Makefile.am's are fairly complex -- I wouldn't use these as starting points. You probably want to start with some general AC/AM tutorials (the AM documentation reads somewhat like a tutorial -- you might want to look there?). Just google around for AC/AM tutorials; leave "MPI" out of your searching. Indeed, all you really want to do is build your software -- the only real difference between your app and other apps is that you want to use mpicc and friends to build it (vs. gcc and friends). Most other aspects should be the same. Hence, the big difference for building an MPI application with AC/AM is that you want to set the C, C++, and Fortran compilers to the various "wrapper" MPI compilers (e.g., CC=mpicc, CXX=mpic++, FC=mpif90, F77=mpif77). Then AC_PROG_CC (etc.) will find the wrapper compiler instead of gcc (for example).
It gets tricky, though, because not all MPI implementations have wrapper compilers -- so it's up to you to decide how portable you want to be. The open source MPIs both have wrapper compilers by the same names (mpicc et al.), but some of the vendor/MPP platform-specific MPIs may not. Good luck.

On Nov 6, 2008, at 6:35 AM, Raymond Wan wrote: Hi all, I'm not sure if this is relevant to this mailing list, but I'm trying to get autoconf/automake working with an Open MPI program I am writing (in C++), but unfortunately, I don't know where to begin. I'm new to both tools but have them working well enough for a non-MPI program. When I google for these terms, I end up with results from people who have problems with autoconf/automake when *installing* Open MPI -- which isn't what I am looking for. Or, I get results that are well beyond what I need...I just need something to start with, and I won't be combining programming languages, etc. Does anyone have a brief example of configure.ac and/or Makefile.am to start me off, or know of a tutorial that describes how they can be adapted for Open MPI from a non-MPI program? Thank you -- any help appreciated!

Ray
[OMPI users] Debian MPI -- mpirun missing
Hi all,

I'm very new to MPI and am trying to install it on a Debian Etch system. I had mpich installed and I believe that was causing me problems. I completely uninstalled it and then ran:

update-alternatives --remove-all mpicc

Then, I installed the following packages: libibverbs1, openmpi-bin, openmpi-common, openmpi-libs0, openmpi-dbg, openmpi-dev. And now it says:

>> update-alternatives --display mpicc
mpicc - status is auto.
link currently points to /usr/bin/mpicc.openmpi
/usr/bin/mpicc.openmpi - priority 40
slave mpif90: /usr/bin/mpif90.openmpi
slave mpiCC: /usr/bin/mpic++.openmpi
slave mpic++: /usr/bin/mpic++.openmpi
slave mpif77: /usr/bin/mpif77.openmpi
slave mpicxx: /usr/bin/mpic++.openmpi
Current `best' version is /usr/bin/mpicc.openmpi.

which seems OK to me... So, I tried compiling something (I had sample code from a book I purchased a while back, though it was for mpich). I can run the program as-is, but I think I should be running it with mpirun -- the FAQ suggests there is one? But there is no mpirun anywhere. It's not in /usr/bin. I updated the filename database (updatedb) and tried "locate mpirun", and I get only one hit:

/usr/include/openmpi/ompi/runtime/mpiruntime.h

Is there a package that I neglected to install? I did an "aptitude search openmpi" and installed everything listed... :-) Or perhaps I haven't removed all traces of mpich?

Thank you in advance!

Ray