Re: [OMPI users] Searching the FAQ
On Jan 25, 2010, at 5:38 PM, Gus Correa wrote:

> A) Keep the FAQ, please!

No worries -- I am not asking about removing the FAQ. I was more asking whether the FAQ would be useful in a different *form*.

> B) Add an "ALL FAQ" category, to make keyword search easier on web browsers.

Hmm. Not a bad idea. Probably not hard to do.

> C) Please write the (long overdue) FAQ set about the OpenMPI collectives!
> I asked before, and I beg for it again: Please write a set of FAQs about OpenMPI collectives, and how to tune them up.

George Bosilca is the owner of this one -- he's the guy with all the knowledge...

> Which algorithms are available for each collective? What is the rationale behind these algorithms? What is the default algorithm used by each collective? How do you enforce the use of a certain collective algorithm? What are the pros and cons of hardwiring a choice of collective algorithm? How to tune up the collective algorithms to your application and to your hardware?

George -- if you write something up, I can word-smith it into nice FAQ prose...

-- Jeff Squyres jsquy...@cisco.com
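[Editor's note: until such a FAQ exists, the usual knobs live in the "tuned" collective MCA component. A sketch of the kind of commands involved -- parameter names as reported by ompi_info on recent builds; treat them as a starting point to verify locally, not as documentation:]

```sh
# List the tunables exposed by the "tuned" collective component
ompi_info --param coll tuned

# Enable dynamic rules, then force a particular broadcast algorithm
# (the algorithm numbering is component-specific; check ompi_info --param
# output on your own install before relying on any particular number)
mpirun --mca coll_tuned_use_dynamic_rules 1 \
       --mca coll_tuned_bcast_algorithm 6 \
       -np 16 ./my_app
```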
[OMPI users] Can I start MPI_Spawn child processes early?
Hi All, I am trying to use MPI for scientific High Performance Computing (HPC) applications. I use MPI_Spawn to create child processes. Is there a way to start the child processes earlier than the parent process, using MPI_Spawn? I want this because my experiments showed that the time the parent takes to spawn the children is too long for HPC applications, and it slows down the whole run. If the children are already running when the parent application process looks for them, that initial delay can be avoided. Is there a way to do that? Thanks in advance, Jaison Australian National University
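[Editor's note: MPI itself offers no supported way to have children running before MPI_Comm_spawn is called; the usual workaround is to restructure the program so the spawn cost is paid once, up front, and the children then block waiting for work. A minimal sketch of that pattern -- plain Python threads standing in for spawned MPI processes, so nothing here is an MPI call:]

```python
import queue
import threading

def worker(tasks, results):
    # The child starts early and blocks until the parent sends work --
    # the analogue of spawned ranks sitting in a receive on the
    # parent intercommunicator.
    for item in iter(tasks.get, None):
        results.put(item * item)

# Pay the startup cost once, up front, before any work exists.
tasks, results = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(2)]
for w in workers:
    w.start()

# Much later, handing out work is cheap: the workers already exist.
for x in (3, 4):
    tasks.put(x)
out = sorted(results.get() for _ in range(2))

for _ in workers:
    tasks.put(None)   # sentinel: tell each worker to exit
for w in workers:
    w.join()
print(out)   # [9, 16]
```

The same shape works with real MPI_Comm_spawn: spawn all children during initialization, then reuse the resulting intercommunicator for every batch of work instead of spawning per batch.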
Re: [OMPI users] Searching the FAQ
Hi Jeff

Thanks for your "RFC on FAQs"! Here are my two cents:

A) Keep the FAQ, please!

I am a big fan of the OpenMPI FAQ. I use it all the time. I also recommend it to everybody on this list and on other lists, and I've seen a lot of people do the same. In the absence of more comprehensive documentation, the FAQ is the resource we all count on to fix mistakes, look up forgotten syntax, set up our computers to work properly with OpenMPI, learn a new concept, etc. So, whatever you do, please don't do away with the FAQ unless you already have more comprehensive documentation ready to replace it.

B) Add an "ALL FAQ" category, to make keyword search easier on web browsers.

Keyword search of the FAQ is a bit cumbersome when one has 26 different FAQ categories / web pages to search through. A very simple, minimal-effort way to allow web search of the whole FAQ set would be to add an "ALL FAQ" category to the current FAQ categories list (maybe at the very top of the list). The "ALL FAQ" page would concatenate all of your FAQ HTML files, allowing keyword search across all FAQs in any web browser. One doesn't need to be fancy and stylish to be effective.

C) Please write the (long overdue) FAQ set about the OpenMPI collectives!

I asked before, and I beg for it again: please write a set of FAQs about OpenMPI collectives, and how to tune them up. The current resources available to learn about collectives are several sparse postings on the mailing list archive. Despite the interesting questions posed and the generous answers provided about collectives on the mailing list, they don't form a coherent, elucidating body, and they are not easy to follow. Some questions about collectives that in one way or another have been asked on the list: Which algorithms are available for each collective? What is the rationale behind these algorithms? What is the default algorithm used by each collective? How do you enforce the use of a certain collective algorithm?
What are the pros and cons of hardwiring a choice of collective algorithm? How to tune up the collective algorithms for your application and your hardware? And more ...

Cheers,
Gus Correa

Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA

Jeff Squyres wrote:
> I have some simple questions for all you users out there about the OMPI FAQ. I ask because we see a LOT of you end up on the OMPI FAQ in our web statistics (most users who search either end up on the FAQ and/or on the web archives of the mailing list). Hence, I'd like to know if we can improve the FAQ from a usability standpoint.
> 1. Is the FAQ useful in its current form? More specifically:
> - I personally find it a little difficult to web search for something and then end up on a single FAQ page with a LOT of information on it (e.g., the text for all the questions/answers in that category). I.e., if I'm searching for something specific, it would be useful to end up on a page with *just that one FAQ question/answer*.
> - OTOH, if I don't know exactly what I'm looking for, it is useful to see a whole page of FAQ questions and answers so that I can scan through them all to find what I'm looking for (vs. clicking through a million different individual pages).
> 2. We wrote all the PHP for the OMPI FAQ ourselves (it's not driven by a database; the content is all in individual text files). Back when we started, we surveyed the web FAQ systems and found each of them lacking for one reason or another (I don't remember the details), and therefore wrote our own PHP stuff. Do people have other FAQ web systems that they'd recommend these days?
> 3. Are there other features of a FAQ that you would like to see in the OMPI FAQ?
> I ask these questions because a) the current system has annoyed me a few too many times recently for various limitations, and b) I'm wondering if there is something better out there -- better searching, more web-2.0-ish, ...whatever.
> We're certainly not tied to the existing FAQ system -- the current set of questions and answers is fairly easy to extract from the PHP, so we could move it to another system if it would be desirable.
Re: [OMPI users] ABI stabilization/versioning
On Mon, 25 Jan 2010 15:10:12 -0500, Jeff Squyres wrote:
> Indeed. Our wrapper compilers currently explicitly list all 3 libraries (-lmpi -lopen-rte -lopen-pal) because we don't know if those libraries will be static or shared at link time.

I am suggesting that it is unavoidable for the person doing the linking to be explicit about whether they want static or dynamic libs when they invoke mpicc. Consider the pkg-config model, where you might write

  gcc -static -o my-app main.o `pkg-config --libs --static openmpi fftw3`
  gcc -o my-app main.o `pkg-config --libs openmpi fftw3`

In the MPI world,

  gcc -static -o my-app main.o `mpicc -showme:link-static` `pkg-config --libs --static fftw3`
  gcc -o my-app main.o `mpicc -showme:link` `pkg-config --libs fftw3`

seems tolerable. The trick (as you point out) is to get the option processed when the wrapper is being invoked as the compiler, instead of just for the -showme options. Possible options are defining an OMPI_STATIC environment variable or inspecting argv for --link:static (or some such). This is one of the many reasons why wrappers are a horrible solution, especially when they are expected to be used in nontrivial cases. Ideally, the adopted plan could be done in some coordination with MPICH2 (which lacks a -showme:link analogue) so that it is not so hard to write portable build systems.

> > On the cited bug report, I just wanted to note that collapsing libopen-rte and libopen-pal (even only in production builds) has the undesirable effect that their ABI cannot change without incrementing the soname of libmpi (i.e. user binaries are coupled just as tightly to these libraries as when they were separate but linked explicitly, so this offers no benefit at all).
>
> Indeed -- this is exactly the reason we ended up leaving libopen-* .so versions at 0:0:0.
But not versioning those libs isn't much of a solution either, since it becomes possible to get an ABI mismatch at runtime (consider someone who uses them independently, or if they are packaged separately as in a distribution, so that it becomes possible to update these out from underneath libmpi).

> There's an additional variable -- we had considered collapsing all 3 libraries into libmpi for production builds,

My point was that this is no solution at all, since you have to bump the soname any time you change libopen-*. So even users who NEVER call into libopen-* have to relink any time something happens there, despite their interface not changing. And that is exactly the situation if the wrappers continue to overlink AND libopen-* became versioned, so at least by keeping them separate, you give users the option of not overlinking (albeit manually) and the option of using libopen-* without libmpi.

> Yuck.

It's 2010 and we still don't have a standard way to represent link dependencies (pkg-config might be the closest thing, but it's bad if you have multiple versions of the same library, and the granularity is wrong, e.g. if you want to link some exotic lib statically and the common ones dynamically).

Jed
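[Editor's note: for readers following the 0:0:0 discussion, the coupling comes from how libtool's current:revision:age triple maps onto the on-disk soname. A small sketch of that mapping as libtool behaves on Linux (double-check against the libtool manual's versioning rules):]

```python
def libtool_filename(name, current, revision, age):
    """Map a libtool -version-info triple (current:revision:age) to the
    Linux soname and real filename: the soname major is current - age."""
    major = current - age
    soname = f"lib{name}.so.{major}"
    realname = f"lib{name}.so.{major}.{age}.{revision}"
    return soname, realname

# libopen-pal left at 0:0:0 -> the soname stays libopen-pal.so.0 forever,
# so an ABI break there is invisible to the dynamic linker.
frozen = libtool_filename("open-pal", 0, 0, 0)

# An incompatible change should instead bump `current` (age reset to 0),
# which changes the soname and forces dependents to relink.
bumped = libtool_filename("open-pal", 1, 0, 0)
print(frozen, bumped)
```

This is why both options discussed in the thread hurt: versioning libopen-* forces relinks on users who never call it, while leaving it at 0:0:0 lets mismatched copies load silently.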
Re: [OMPI users] checkpointing multi node and multi process applications
Actually, let me roll that back a bit. I was preparing a custom patch for the v1.4 series, and it seems that the code does not have the bug I mentioned. Only the v1.5 series and the trunk were affected by this; the v1.4 series should be fine. I will still ask that the error message fix be brought over to the v1.4 branch, but it is unlikely to fix your problem. However, it would be useful to know if upgrading to the trunk or the v1.5 series fixes this problem. The v1.4 series has an old version of the file and metadata handling mechanisms, so I am encouraging people to move to the v1.5 series if possible. -- Josh

On Jan 25, 2010, at 3:33 PM, Josh Hursey wrote: So while working on the error message, I noticed that the global coordinator was using the wrong path to investigate the checkpoint metadata. This particular section of code is not often used (which is probably why I could not reproduce the problem). I just committed a fix to the Open MPI development trunk: https://svn.open-mpi.org/trac/ompi/changeset/22479 Additionally, I am asking for this to be brought over to the v1.4 and v1.5 release branches: https://svn.open-mpi.org/trac/ompi/ticket/2195 https://svn.open-mpi.org/trac/ompi/ticket/2196 It seems to solve the problem as far as I could reproduce it. Can you try the trunk (either an SVN checkout or tonight's nightly tarball) and check if this solves your problem? Cheers, Josh

On Jan 25, 2010, at 12:14 PM, Josh Hursey wrote: I am not able to reproduce this problem with the 1.4 branch using a hostfile and node configuration like you mentioned. I suspect that the error is caused by a failed local checkpoint. The error message is triggered when the global coordinator (located in 'mpirun') tries to read the metadata written by the application in the local snapshot. If the global coordinator cannot properly read the metadata, then it will print a variety of error messages depending on what is going wrong.
If these are the only two errors produced, then this typically means that the local metadata file has been found, but is empty/corrupted. Can you send me the contents of the local checkpoint metadata file:

  shell$ cat GLOBAL_SNAPSHOT_DIR/ompi_global_snapshot_YYY.ckpt/0/opal_snapshot_0.ckpt/snapshot_meta.data

It should look something like:

  #
  # PID: 23915
  # Component: blcr
  # CONTEXT: ompi_blcr_context.23915

It may also help to see the following metadata file as well:

  shell$ cat GLOBAL_SNAPSHOT_DIR/ompi_global_snapshot_YYY.ckpt/global_snapshot_meta.data

If there are other errors printed by the process, that would potentially indicate a different problem. So if there are, let me know. This error message should be a bit more specific about which process checkpoint is causing the problem, and what this usually indicates. I filed a bug to clean up the error: https://svn.open-mpi.org/trac/ompi/ticket/2190 -- Josh

On Jan 21, 2010, at 8:27 AM, Jean Potsam wrote: Hi Josh/all, I have upgraded OpenMPI to v1.4 but still get the same error when I try executing the application on multiple nodes: *** Error: expected_component: PID information unavailable! Error: expected_component: Component Name information unavailable! *** I am running my application from the node 'portal11' as follows: mpirun -am ft-enable-cr -np 2 --hostfile hosts myapp. The file 'hosts' contains two host names: portal10, portal11. I am triggering the checkpoint using ompi-checkpoint -v 'PID' from portal11. I configured OpenMPI as follows:

  ./configure --prefix=/home/jean/openmpi/ --enable-picky --enable-debug --enable-mpi-profile --enable-mpi-cxx --enable-pretty-print-stacktrace --enable-binaries --enable-trace --enable-static=yes --enable-debug --with-devel-headers=1 --with-mpi-param-check=always --with-ft=cr --enable-ft-thread --with-blcr=/usr/local/blcr/ --with-blcr-libdir=/usr/local/blcr/lib --enable-mpi-threads=yes

Question: what do you think can be wrong? Please instruct me on how to resolve this problem. Thank you Jean

--- On Mon, 11/1/10, Josh Hursey wrote: From: Josh Hursey Subject: Re: [OMPI users] checkpointing multi node and multi process applications To: "Open MPI Users" Date: Monday, 11 January, 2010, 21:42 On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote:
> Hi Everyone, I am trying to checkpoint an MPI application running on multiple nodes. However, I get some error messages when I trigger the checkpointing process.
> Error: expected_component: PID information unavailable!
> Error: expected_component: Component Name information unavailable!
> I am using Open MPI 1.3 and BLCR 0.8.1

Can you try the v1.4 release and see if the problem persists?

> I execute my application as follows:
> mpirun -am ft-enable-cr -np 3 --hostfile host
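[Editor's note: the snapshot_meta.data sample quoted above is simple enough to sanity-check with a few lines of script before sending it to the list. A throwaway parser for that "# Key: value" shape -- field names taken from the sample in this thread; the real file may carry more fields:]

```python
def parse_snapshot_meta(text):
    """Parse '# Key: value' lines as in the snapshot_meta.data sample."""
    meta = {}
    for line in text.splitlines():
        line = line.lstrip("#").strip()
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

sample = """#
# PID: 23915
# Component: blcr
# CONTEXT: ompi_blcr_context.23915
"""
meta = parse_snapshot_meta(sample)
print(meta)
```

An empty or truncated file parses to an empty dict, which matches the thread's diagnosis that the two expected_component errors usually mean an empty/corrupted local metadata file.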
Re: [OMPI users] checkpointing multi node and multi process applications
So while working on the error message, I noticed that the global coordinator was using the wrong path to investigate the checkpoint metadata. This particular section of code is not often used (which is probably why I could not reproduce the problem). I just committed a fix to the Open MPI development trunk: https://svn.open-mpi.org/trac/ompi/changeset/22479 Additionally, I am asking for this to be brought over to the v1.4 and v1.5 release branches: https://svn.open-mpi.org/trac/ompi/ticket/2195 https://svn.open-mpi.org/trac/ompi/ticket/2196 It seems to solve the problem as far as I could reproduce it. Can you try the trunk (either an SVN checkout or tonight's nightly tarball) and check if this solves your problem? Cheers, Josh

On Jan 25, 2010, at 12:14 PM, Josh Hursey wrote: I am not able to reproduce this problem with the 1.4 branch using a hostfile and node configuration like you mentioned. I suspect that the error is caused by a failed local checkpoint. The error message is triggered when the global coordinator (located in 'mpirun') tries to read the metadata written by the application in the local snapshot. If the global coordinator cannot properly read the metadata, then it will print a variety of error messages depending on what is going wrong. If these are the only two errors produced, then this typically means that the local metadata file has been found, but is empty/corrupted. Can you send me the contents of the local checkpoint metadata file:

  shell$ cat GLOBAL_SNAPSHOT_DIR/ompi_global_snapshot_YYY.ckpt/0/opal_snapshot_0.ckpt/snapshot_meta.data

It should look something like:

  #
  # PID: 23915
  # Component: blcr
  # CONTEXT: ompi_blcr_context.23915

It may also help to see the following metadata file as well:

  shell$ cat GLOBAL_SNAPSHOT_DIR/ompi_global_snapshot_YYY.ckpt/global_snapshot_meta.data

If there are other errors printed by the process, that would potentially indicate a different problem. So if there are, let me know.
This error message should be a bit more specific about which process checkpoint is causing the problem, and what this usually indicates. I filed a bug to clean up the error: https://svn.open-mpi.org/trac/ompi/ticket/2190 -- Josh

On Jan 21, 2010, at 8:27 AM, Jean Potsam wrote: Hi Josh/all, I have upgraded OpenMPI to v1.4 but still get the same error when I try executing the application on multiple nodes: *** Error: expected_component: PID information unavailable! Error: expected_component: Component Name information unavailable! *** I am running my application from the node 'portal11' as follows: mpirun -am ft-enable-cr -np 2 --hostfile hosts myapp. The file 'hosts' contains two host names: portal10, portal11. I am triggering the checkpoint using ompi-checkpoint -v 'PID' from portal11. I configured OpenMPI as follows:

  ./configure --prefix=/home/jean/openmpi/ --enable-picky --enable-debug --enable-mpi-profile --enable-mpi-cxx --enable-pretty-print-stacktrace --enable-binaries --enable-trace --enable-static=yes --enable-debug --with-devel-headers=1 --with-mpi-param-check=always --with-ft=cr --enable-ft-thread --with-blcr=/usr/local/blcr/ --with-blcr-libdir=/usr/local/blcr/lib --enable-mpi-threads=yes

Question: what do you think can be wrong? Please instruct me on how to resolve this problem. Thank you Jean

--- On Mon, 11/1/10, Josh Hursey wrote: From: Josh Hursey Subject: Re: [OMPI users] checkpointing multi node and multi process applications To: "Open MPI Users" Date: Monday, 11 January, 2010, 21:42 On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote:
> Hi Everyone, I am trying to checkpoint an MPI application running on multiple nodes. However, I get some error messages when I trigger the checkpointing process.
> Error: expected_component: PID information unavailable!
> Error: expected_component: Component Name information unavailable!
> I am using Open MPI 1.3 and BLCR 0.8.1

Can you try the v1.4 release and see if the problem persists?
> I execute my application as follows:
> mpirun -am ft-enable-cr -np 3 --hostfile hosts gol.
> My question: does OpenMPI with BLCR support checkpointing of a multi-node execution of an MPI application? If so, can you provide me with some information on how to achieve this?

Open MPI is able to checkpoint a multi-node application (that's what it was designed to do). There are some examples at the link below: http://www.osl.iu.edu/research/ft/ompi-cr/examples.php -- Josh

> Cheers,
> Jean.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] [ompi-1.4.1] compiling without openib, running with openib + ompi141 and gcc3
On Jan 25, 2010, at 11:58 AM, Mathieu Gontier wrote: > I built OpenMPI-1.4.1 without openib support with the following configuration > options: > > ./configure > --prefix=/develop/libs/OpenMPI/openmpi-1.4.1/LINUX_GCC_4_1_tcp_mach > --enable-static --enable-shared --enable-cxx-exceptions --enable-mpi-f77 > --disable-mpi-f90 --enable-mpi-cxx --disable-mpi-cxx-seek --enable-dist > --enable-mpi-profile --enable-binaries --enable-mpi-threads > --enable-memchecker --disable-debug --with-pic --with-threads --with-sge Note that you should not use --enable-dist. --enable-dist is used by the OMPI maintainers ONLY when generating official downloadable tarballs. It is *NOT* guaranteed to make sane / correct builds for general purpose runs. Here's what ./configure --help says about --enable-dist: --enable-dist guarantee that that the "dist" make target will be functional, although may not guarantee that any other make target will be functional. Specifically: --enable-dist allows some configure tests to "pass" even though they shouldn't. For example, I don't have MX installed on my systems. But with --enable-dist, the MX tests in OMPI's configure script will "pass" just enough so that I can "make dist" to generate a tarball and still include all the MX plugin source code. > On my cluster, I run a small test (a broadcast on a 100 integer array) on 12 > processes balanced on 3 nodes, but I asked for using openib. It works with > the following messages: > > mpirun -np 12 -hostfile /tmp/72936.1.64.q/machines --mca btl openib,sm,self > /home/numeca/tmp/gontier/bcast/exe_ompi_cluster -nloop 2 -nbuff 100 Is your PATH and LD_LIBRARY_PATH set correctly such that you'll find the "right" ones (i.e., the ones that you just built/installed in /develop/libs/OpenMPI/openmpi-1.4.1/LINUX_GCC_4_1_tcp_mach)? I.e., is it possible that you're finding some other OMPI install that has OpenFabrics support? 
Further, did you ever previously install Open MPI into that prefix and include OpenFabrics support? I ask because OMPI's OpenFabrics support is in the form of a plugin -- if you simply installed another copy of OMPI into the same prefix without uninstalling first, the OpenFabrics plugin could still have been left in the tree, and therefore used at run time.

Finally, note that you didn't tell Open MPI to *NOT* build OpenFabrics support. In this case, OMPI's configure script looks for OpenFabrics support, and if it finds it, builds it. But if it doesn't find OpenFabrics support (and you didn't specifically ask for it), it just skips it and keeps going. You might want to look through the output of OMPI's configure and see if it found OpenFabrics support and therefore decided to build it.

> I finally ran ompi_info:
> ./ompi_info | grep openib
> MCA btl: openib (MCA v2.0, API v2.0, Component v1.4.1)
> Openib seems to be supported. That is weird because I did not ask for it...

Yep; see above.

> So, assuming the compilation of OpenMPI does not support openib here, what happened? Was tcp selected? How can I check which device has been used (or force an explicit message)?

Unfortunately, OMPI currently lacks a good message indicating which device is used at run time (because it's actually a surprisingly complex issue, since OMPI chooses a communication device based on which peer it's talking to, among other reasons). We hope to have a good message sometime in the OMPI 1.5 series.

> By the way, what is the meaning of this message in my case?

Do you mean this message?

  WARNING: There was an error initializing an OpenFabrics device.
  Local host: node005
  Local device: mthca0

If so, it means that Open MPI was unable to initialize the InfiniBand HCA known as "mthca0" on the server known as node005. The RLIMIT messages are likely symptoms of the issue; you likely need to set your registered memory limits to "unlimited".
See the OpenFabrics section of the OMPI FAQ, in the questions about registered memory limits, for instructions on how to do that.

> By the way, another different thing: must OpenMPI be compiled with gcc-4.1 or later, or can gcc-3.4 (for example) be used?

gcc 3.4 should be fine.

-- Jeff Squyres jsquy...@cisco.com
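[Editor's note: until that better run-time message exists, one way to see which BTLs are being considered is the btl_base_verbose MCA parameter, and openib can be ruled out entirely by restricting the BTL list. Command sketch -- verify parameter names against ompi_info on your own install:]

```sh
# Show BTL selection chatter at run time
mpirun --mca btl_base_verbose 30 -np 2 ./exe_ompi_cluster

# Force TCP (plus shared memory and self) so openib cannot be picked
mpirun --mca btl tcp,sm,self -np 12 -hostfile machines ./exe_ompi_cluster
```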
Re: [OMPI users] Problems building Open MPI 1.4.1 with Pathscale
1 - Do you have problems with openmpi 1.4 too? (I don't; I haven't built 1.4.1 yet)
2 - There is a bug in the Pathscale compiler with -fPIC and -g that generates incorrect dwarf2 data, so debuggers get really confused and will have BIG problems debugging the code. I'm chasing them to get a fix...
3 - Do you have an example code that has problems?

On Mon, 2010-01-25 at 15:01 -0500, Jeff Squyres wrote:
> I'm afraid I don't have any clues offhand. We *have* had problems with the Pathscale compiler in the past that were never resolved by their support crew. However, they were of the "variables weren't initialized and the process generally aborts" kind of failure, not a "persistent hang" kind of failure.
> Can you tell where in MPI_Init the process is hanging? E.g., can you build Open MPI with debugging enabled (such as by passing CFLAGS=-g to OMPI's configure line) and then attach a debugger to a hung process and see what it's stuck on?
> On Jan 25, 2010, at 7:52 AM, Rafael Arco Arredondo wrote:
> > Hello:
> > I'm having some issues with Open MPI 1.4.1 and the Pathscale compiler (version 3.2). Open MPI builds successfully with the following configure arguments:
> > ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64 --with-sge --enable-static CC=pathcc CXX=pathCC F77=pathf90 F90=pathf90 FC=pathf90
> > (we have OpenFabrics 1.2 Infiniband drivers, by the way)
> > However, applications hang on MPI_Init (or maybe MPI_Comm_rank or MPI_Comm_size; a basic hello-world anyway doesn't print 'Hello World from node...'). I tried running them with and without SGE. Same result.
> > This hello-world works flawlessly when I build Open MPI with gcc:
> > ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64 --with-sge --enable-static
> > This successful execution runs on one machine only, so it shouldn't use Infiniband, and it also works when several nodes are used.
> > I was able to build previous versions of Open MPI with Pathscale (1.2.6 and 1.3.2, particularly). I tried building version 1.4.1 both with Pathscale 3.2 and Pathscale 3.1. No difference.
> > Any ideas?
> > Thank you in advance,
> > Rafa
> > --
> > Rafael Arco Arredondo
> > Centro de Servicios de Informática y Redes de Comunicaciones
> > Universidad de Granada
Re: [OMPI users] ABI stabilization/versioning
On Jan 25, 2010, at 12:55 PM, Jed Brown wrote:
> > The short version is that the possibility of static linking really fouls up the scheme, and we haven't figured out a good way around this yet. :-(
>
> So pkg-config addresses this with its Libs.private field and an explicit command-line argument when you want static libs, e.g.
>
> $ pkg-config --libs libavcodec
> -lavcodec
> $ pkg-config --libs --static libavcodec
> -pthread -lavcodec -lz -lbz2 -lfaac -lfaad -lmp3lame -lopencore-amrnb -lopencore-amrwb -ltheoraenc -ltheoradec -lvorbisenc -lvorbis -logg -lx264 -lm -lxvidcore -ldl -lasound -lavutil
>
> There is no way to simultaneously (a) prevent overlinking shared libs and (b) correctly link static libs without an explicit statement from the user about whether to link *your library* statically or dynamically.

Indeed. Our wrapper compilers currently explicitly list all 3 libraries (-lmpi -lopen-rte -lopen-pal) because we don't know if those libraries will be static or shared at link time. If they're shared, then listing -lmpi should be sufficient because its implicit dependencies should be sufficient to pull in the other two (and therefore libopen-rte and libopen-pal can have their own, independent .so version numbers. yay!). But if they're static, then libmpi has no implicit dependencies, and you *have* to list all three libraries (-lmpi -lopen-rte -lopen-pal).

We did not want our wrapper compilers to get into the business of:

- attempting to divine whether the link will be static or dynamic (e.g., could be as "simple" [read: not really] as parsing argv, but could be as difficult as reading compiler config files).
- figuring out shared library filenames (e.g., .so, .dylib, .dll, ...etc.).

Yuck.

> Unfortunately, pkg-config doesn't work well with multiple builds of a package, and doesn't know how to link some libs statically and some dynamically.
> On the cited bug report, I just wanted to note that collapsing libopen-rte and libopen-pal (even only in production builds) has the undesirable effect that their ABI cannot change without incrementing the soname of libmpi (i.e. user binaries are coupled just as tightly to these libraries as when they were separate but linked explicitly, so this offers no benefit at all).

Indeed -- this is exactly the reason we ended up leaving libopen-* .so versions at 0:0:0. There's an additional variable -- we had considered collapsing all 3 libraries into libmpi for production builds, but the problem here is that multiple external projects have started using libopen-rte and libopen-pal independently of libmpi. Hence, we can't just make those libraries disappear. :-\ The developers of those external projects don't want a big monolithic library to link against, particularly when they have nothing to do with MPI. Yuck.

-- Jeff Squyres jsquy...@cisco.com
Re: [OMPI users] Problems building Open MPI 1.4.1 with Pathscale
I'm afraid I don't have any clues offhand. We *have* had problems with the Pathscale compiler in the past that were never resolved by their support crew. However, they were of the "variables weren't initialized and the process generally aborts" kind of failure, not a "persistent hang" kind of failure. Can you tell where in MPI_Init the process is hanging? E.g., can you build Open MPI with debugging enabled (such as by passing CFLAGS=-g to OMPI's configure line) and then attach a debugger to a hung process and see what it's stuck on? On Jan 25, 2010, at 7:52 AM, Rafael Arco Arredondo wrote: > Hello: > > I'm having some issues with Open MPI 1.4.1 and Pathscale compiler > (version 3.2). Open MPI builds successfully with the following configure > arguments: > > ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64 > --with-sge --enable-static CC=pathcc CXX=pathCC F77=pathf90 F90=pathf90 > FC=pathf90 > > (we have OpenFabrics 1.2 Infiniband drivers, by the way) > > However, applications hang on MPI_Init (or maybe MPI_Comm_rank or > MPI_Comm_size, a basic hello-world anyway doesn't print 'Hello World > from node...'). I tried running them with and without SGE. Same result. > > This hello-world works flawlessly when I build Open MPI with gcc: > > ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64 > --with-sge --enable-static > > This successful execution runs in one machine only, so it shouldn't use > Infiniband, and it also works when several nodes are used. > > I was able to build previous versions of Open MPI with Pathscale (1.2.6 > and 1.3.2, particularly). I tried building version 1.4.1 both with > Pathscale 3.2 and Pathscale 3.1. No difference. > > Any ideas? > > Thank you in advance, > > Rafa > > -- > Rafael Arco Arredondo > Centro de Servicios de Informática y Redes de Comunicaciones > Universidad de Granada > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > -- Jeff Squyres jsquy...@cisco.com
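[Editor's note: a concrete way to do the "attach a debugger to a hung process" step suggested above, assuming gdb is available on the compute node; the PID and program name here are placeholders:]

```sh
# Find the hung rank on the node, then attach
ps -ef | grep hello_world
gdb -p 12345
# Inside gdb: get the stack to see where MPI_Init is stuck
#   (gdb) bt
#   (gdb) thread apply all bt
#   (gdb) detach
```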
Re: [OMPI users] ABI stabilization/versioning
On Mon, 25 Jan 2010 09:09:47 -0500, Jeff Squyres wrote:
> The short version is that the possibility of static linking really fouls up the scheme, and we haven't figured out a good way around this yet. :-(

So pkg-config addresses this with its Libs.private field and an explicit command-line argument when you want static libs, e.g.

  $ pkg-config --libs libavcodec
  -lavcodec
  $ pkg-config --libs --static libavcodec
  -pthread -lavcodec -lz -lbz2 -lfaac -lfaad -lmp3lame -lopencore-amrnb -lopencore-amrwb -ltheoraenc -ltheoradec -lvorbisenc -lvorbis -logg -lx264 -lm -lxvidcore -ldl -lasound -lavutil

There is no way to simultaneously (a) prevent overlinking shared libs and (b) correctly link static libs without an explicit statement from the user about whether to link *your library* statically or dynamically. Unfortunately, pkg-config doesn't work well with multiple builds of a package, and doesn't know how to link some libs statically and some dynamically.

On the cited bug report, I just wanted to note that collapsing libopen-rte and libopen-pal (even only in production builds) has the undesirable effect that their ABI cannot change without incrementing the soname of libmpi (i.e. user binaries are coupled just as tightly to these libraries as when they were separate but linked explicitly, so this offers no benefit at all).

Jed
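[Editor's note: the Libs vs Libs.private split described above can be stated precisely: a shared link uses only Libs, while --static additionally appends Libs.private, because a static archive carries no DT_NEEDED entries of its own. A toy model of that rule -- the package and library names below are invented for illustration:]

```python
# Toy model of pkg-config's Libs / Libs.private behaviour for one package.
PC = {
    "mylib": {"libs": ["-lmylib"], "private": ["-lz", "-lm"]},
}

def pc_libs(pkg, static=False):
    entry = PC[pkg]
    flags = list(entry["libs"])
    if static:
        # --static: the consumer must also name the private deps itself,
        # since a static libmylib.a cannot pull them in at load time.
        flags += entry["private"]
    return flags

shared_flags = pc_libs("mylib")                # no overlinking
static_flags = pc_libs("mylib", static=True)   # full link closure
print(shared_flags, static_flags)
```

This is exactly the shape the wrapper-compiler discussion wants for mpicc: a default link line without libopen-*, and an explicit opt-in flag that expands to the full closure.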
[OMPI users] Searching the FAQ
I have some simple questions for all you users out there about the OMPI FAQ. I ask because we see a LOT of you end up on the OMPI FAQ in our web statistics (most users who search either end up on the FAQ and/or on the web archives of the mailing list). Hence, I'd like to know if we can improve the FAQ from a usability standpoint.

1. Is the FAQ useful in its current form? More specifically:

- I personally find it a little difficult to web search for something and then end up on a single FAQ page with a LOT of information on it (e.g., the text for all the questions/answers in that category). I.e., if I'm searching for something specific, it would be useful to end up on a page with *just that one FAQ question/answer*.

- OTOH, if I don't know exactly what I'm looking for, it is useful to see a whole page of FAQ questions and answers so that I can scan through them all to find what I'm looking for (vs. clicking through a million different individual pages).

2. We wrote all the PHP for the OMPI FAQ ourselves (it's not driven by a database; the content is all in individual text files). Back when we started, we surveyed the web FAQ systems and found each of them lacking for one reason or another (I don't remember the details), and therefore wrote our own PHP stuff. Do people have other FAQ web systems that they'd recommend these days?

3. Are there other features of a FAQ that you would like to see in the OMPI FAQ?

I ask these questions because a) the current system has annoyed me a few too many times recently for various limitations, and b) I'm wondering if there is something better out there -- better searching, more web-2.0-ish, ...whatever. We're certainly not tied to the existing FAQ system -- the current set of questions and answers is fairly easy to extract from the PHP, so we could move it to another system if it would be desirable.

-- Jeff Squyres jsquy...@cisco.com
Re: [OMPI users] checkpointing multi node and multi process applications
I am not able to reproduce this problem with the 1.4 branch using a hostfile and node configuration like you mentioned. I suspect that the error is caused by a failed local checkpoint. The error message is triggered when the global coordinator (located in 'mpirun') tries to read the metadata written by the application in the local snapshot. If the global coordinator cannot properly read the metadata, then it will print a variety of error messages depending on what is going wrong. If these are the only two errors produced, then this typically means that the local metadata file has been found, but is empty/corrupted. Can you send me the contents of the local checkpoint metadata file: shell$ cat GLOBAL_SNAPSHOT_DIR/ompi_global_snapshot_YYY.ckpt/0/opal_snapshot_0.ckpt/snapshot_meta.data It should look something like: - # # PID: 23915 # Component: blcr # CONTEXT: ompi_blcr_context.23915 - It may also help to see the following metadata file as well: shell$ cat GLOBAL_SNAPSHOT_DIR/ompi_global_snapshot_YYY.ckpt/global_snapshot_meta.data If there are other errors printed by the process, that would potentially indicate a different problem. So if there are, let me know. This error message should be a bit more specific about which process checkpoint is causing the problem, and what this usually indicates. I filed a bug to clean up the error: https://svn.open-mpi.org/trac/ompi/ticket/2190 -- Josh On Jan 21, 2010, at 8:27 AM, Jean Potsam wrote: Hi Josh/all, I have upgraded the openmpi to v 1.4 but still get the same error when I try executing the application on multiple nodes: *** Error: expected_component: PID information unavailable! Error: expected_component: Component Name information unavailable! *** I am running my application from the node 'portal11' as follows: mpirun -am ft-enable-cr -np 2 --hostfile hosts myapp. The file 'hosts' contains two host names: portal10, portal11. I am triggering the checkpoint using ompi-checkpoint -v 'PID' from portal11. 
I configured Open MPI as follows: # ./configure --prefix=/home/jean/openmpi/ --enable-picky --enable-debug --enable-mpi-profile --enable-mpi-cxx --enable-pretty-print-stacktrace --enable-binaries --enable-trace --enable-static=yes --enable-debug --with-devel-headers=1 --with-mpi-param-check=always --with-ft=cr --enable-ft-thread --with-blcr=/usr/local/blcr/ --with-blcr-libdir=/usr/local/blcr/lib --enable-mpi-threads=yes # Question: what do you think can be wrong? Please instruct me on how to resolve this problem. Thank you Jean --- On Mon, 11/1/10, Josh Hursey wrote: From: Josh Hursey Subject: Re: [OMPI users] checkpointing multi node and multi process applications To: "Open MPI Users" Date: Monday, 11 January, 2010, 21:42 On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote: > Hi Everyone, > I am trying to checkpoint an mpi application running on multiple nodes. However, I get some error messages when I trigger the checkpointing process. > > Error: expected_component: PID information unavailable! > Error: expected_component: Component Name information unavailable! > > I am using open mpi 1.3 and blcr 0.8.1 Can you try the v1.4 release and see if the problem persists? > > I execute my application as follows: > > mpirun -am ft-enable-cr -np 3 --hostfile hosts gol. > > My question: > > Does openmpi with blcr support checkpointing of multi node execution of mpi application? If so, can you provide me with some information on how to achieve this. Open MPI is able to checkpoint a multi-node application (that's what it was designed to do). There are some examples at the link below: http://www.osl.iu.edu/research/ft/ompi-cr/examples.php -- Josh > > Cheers, > > Jean. > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
[OMPI users] [ompi-1.4.1] compiling without openib, running with openib + ompi141 and gcc3
Hello, I built OpenMPI-1.4.1 without openib support with the following configuration options: ./configure --prefix=/develop/libs/OpenMPI/openmpi-1.4.1/LINUX_GCC_4_1_tcp_mach --enable-static --enable-shared --enable-cxx-exceptions --enable-mpi-f77 --disable-mpi-f90 --enable-mpi-cxx --disable-mpi-cxx-seek --enable-dist --enable-mpi-profile --enable-binaries --enable-mpi-threads --enable-memchecker --disable-debug --with-pic --with-threads --with-sge On my cluster, I run a small test (a broadcast on a 100 integer array) on 12 processes balanced on 3 nodes, but I asked for using openib. It works with the following messages: mpirun -np 12 -hostfile /tmp/72936.1.64.q/machines --mca btl openib,sm,self /home/numeca/tmp/gontier/bcast/exe_ompi_cluster -nloop 2 -nbuff 100 libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations. libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations. -- WARNING: There was an error initializing an OpenFabrics device. Local host: node005 Local device: mthca0 -- libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations. libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations. libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations. libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations. libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations. libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations. libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations. libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations. libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations. 
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes. This will severely limit memory registrations. processing... done [node005:04791] 11 more processes have sent help message help-mpi-btl-openib.txt / error in device init [node005:04791] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages I finally ran ompi_info: ./ompi_info | grep openib MCA btl: openib (MCA v2.0, API v2.0, Component v1.4.1) Openib seems to be supported. That is odd, because I did not ask for it... So, given that this Open MPI build was compiled without openib support, what happened? Was tcp selected? How can I check which device has been used (or force an explicit message)? By the way, what is the meaning of this message in my case? And an unrelated question: must OpenMPI be compiled with gcc-4.1 or later, or can gcc-3.4 (for example) be used? Thank you for your help, Mathieu.
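[Editor's note: two hedged suggestions relevant to the questions above, not verified against this particular cluster. The libibverbs warnings usually mean the locked-memory limit on the compute nodes is too low, and the BTL framework can be asked to report which components it actually selects. In the document's shell$ convention:]

```
shell$ ulimit -l unlimited    # raise the limit behind the RLIMIT_MEMLOCK warnings
                              # (persistently: set "memlock unlimited" in
                              # /etc/security/limits.conf on every node, then
                              # restart the daemon that launches MPI processes)
shell$ mpirun -np 12 --mca btl tcp,sm,self ./exe_ompi_cluster -nloop 2 -nbuff 100
                              # rule out InfiniBand entirely by forcing TCP
shell$ mpirun -np 12 --mca btl_base_verbose 30 ./exe_ompi_cluster -nloop 2 -nbuff 100
                              # verbose BTL output shows which components were chosen
```

The binary name and arguments are copied from Mathieu's command line above; adjust to your own setup.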
Re: [OMPI users] Checkpoint/Restart error
I tested the 1.4.1 release, and everything worked fine for me (tested a few different configurations of nodes/environments). The ompi-checkpoint error you cited is usually caused by one of two things: - The PID specified is wrong (which I don't think is the case here) - The session directory cannot be found in /tmp. So I think the problem is the latter. The session directory looks something like: /tmp/openmpi-sessions-USERNAME@LOCALHOST_0 Within this directory the mpirun process places its contact information. ompi-checkpoint uses this contact information to connect to the job. If it cannot find it, then it errors out. (We definitely need a better error message here. I filed a ticket [1].) We usually do not recommend running Open MPI as a root user, so I would strongly recommend that you do not run as root. With a regular user, check the location of the session directory. Make sure that it is in /tmp on the node where 'mpirun' and 'ompi-checkpoint' are run. -- Josh [1] https://svn.open-mpi.org/trac/ompi/ticket/2189 On Jan 25, 2010, at 5:48 AM, Andreea Costea wrote: So? anyone? any clue? Summarize: - installed OpenMPI 1.4.1 on fresh Centos 5 - mpirun works but ompi-checkpoint throws this error: ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 405 - on another VM I have OpenMPI 1.3.3 installed. Checkpointing works fine as a guest but has the previously mentioned error as root. Both root and guest show the same output after "param -all -all" except for the $HOME (which only matters for mca_component_path, mca_param_files, snapc_base_global_snapshot_dir) Thanks, Andreea On Tue, Jan 19, 2010 at 9:01 PM, Andreea Costea > wrote: I noticed one more thing. As I still have some VMs that have OpenMPI version 1.3.3 installed I started to use those machines 'till I fix the problem with 1.4.1 And while checkpointing on one of these VMs I realized that checkpointing as a guest works fine and checkpointing as root outputs the same error as in 1.4.1. 
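[Editor's note: for quick checking, the lookup Josh describes can be sketched in the shell. The /tmp/openmpi-sessions-USER@HOST_0 pattern is assumed from his message above; treat the exact suffix as illustrative.]

```shell
# Reconstruct the session-directory path that ompi-checkpoint searches for.
# The "_0" suffix and the USER@HOST pattern are assumed from the description above.
session_dir="/tmp/openmpi-sessions-${USER:-$(id -un)}@$(uname -n)_0"
echo "$session_dir"
# If an mpirun is running as this user, its contact information should be inside:
if [ -d "$session_dir" ]; then
  ls "$session_dir"
else
  echo "no session directory found (is mpirun running as this user?)"
fi
```

If the directory is missing (or lives somewhere other than /tmp, e.g. because of a TMPDIR override), ompi-checkpoint cannot connect to the job and errors out as described.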
: ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 405 I logged the outputs of "ompi_info --param all all" which I ran for root and for another user, and the only differences were at these parameters: mca_component_path mca_param_files snapc_base_global_snapshot_dir All 3 params differ because of the $HOME. One more thing: I don't have the directory $HOME/.openmpi Ideas? Thanks, Andreea On Tue, Jan 19, 2010 at 12:51 PM, Andreea Costea > wrote: Well... I decided to install a fresh OS to be sure that there is no OpenMPI version conflict. So I formatted one of my VMs, did a fresh CentOS install, installed BLCR 0.8.2 and OpenMPI 1.4.1 and the result: the same. mpirun works but ompi-checkpoint has that error at line 405: [[35906,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 405 As for the files remaining after uninstalling: Jeff, you were right. There is no file left, just some empty directories. What might be the problem with that ORTE_ERROR_LOG error? Thanks, Andreea On Fri, Jan 15, 2010 at 11:47 PM, Andreea Costea > wrote: It's almost midnight here, so I left home, but I will try it tomorrow. There were some directories left after "make uninstall". I will give more details tomorrow. Thanks Jeff, Andreea On Fri, Jan 15, 2010 at 11:30 PM, Jeff Squyres wrote: On Jan 15, 2010, at 8:07 AM, Andreea Costea wrote: > - I wanted to update to version 1.4.1 and I uninstalled the previous version like this: make uninstall, and then manually deleted all the left over files. The directory where I installed was /usr/local I'll let Josh answer your CR questions, but I did want to ask about this point. AFAIK, "make uninstall" removes *all* Open MPI files. 
For example: - [7:25] $ cd /path/to/my/OMPI/tree [7:25] $ make install > /dev/null [7:26] $ find /tmp/bogus/ -type f | wc 646 646 28082 [7:26] $ make uninstall > /dev/null [7:27] $ find /tmp/bogus/ -type f | wc 0 0 0 [7:27] $ - I realize that some *directories* are left in $prefix, but there should be no *files* left. Are you seeing something different? -- Jeff Squyres jsquy...@cisco.com ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Windows CMake build problems ... (cont.)
Yes, it might be necessary. Done in r22473. Thanks, Shiqing Jeff Squyres wrote: Should this kind of info be added to README.windows? On Jan 25, 2010, at 4:34 AM, wrote: Thanks, that second part about the wrappers was what I was looking for. Charlie ... Original Message Subject: Re: [OMPI users] Windows CMake build problems ... (cont.) From: Shiqing Fan Date: Mon, January 25, 2010 2:09 am To: cjohn...@valverdecomputing.com Cc: Open MPI Users Hi Charlie, Actually, to compile and link your application with Open MPI on Windows is similar as on Linux. You have to link your application against the generated Open MPI libraries, e.g. libopen-mpi.lib (don't forget the suffix 'd' if you build debug version of the OMPI libraries, e.g. libopen-mpid.lib). But according to the information you provided, I assume that you only added the search path into the project, that's not enough, you should probably add the library names into "Project Property Pages" -> "Configuration Properties" -> Linker -> Input -> "Additional Dependencies", normally only libopen-mpi.lib (or libopen-mpid.lib) would be enough, so that Visual Studio will know which libraries to link to. Besides, the Open MPI compiler wrappers should also work on Windows, in this case you just need to open a "Visual Studio command prompt" with the Open MPI path env added (e.g. "set PATH=c:\Program Files\OpenMPI_v1.4\bin;%PATH%"), and simply run command like: mpicc app.c and mpirun -np 2 app.exe Please note that, before executing the application, Open MPI has to be installed somewhere either by build the "INSTALL" project or by running the generated installer, so that the correct Open MPI folder structure could be created. Regards, Shiqing cjohn...@valverdecomputing.com wrote: OK, so I'm a little farther on and perplexed. As I said, Visual C++ 2005 (release 8.0.50727.867) build of OpenMPI 1.4, using CMake 2.6.4, built everything and it all linked. 
Went ahead and built the PACKAGE item in the OpenMPI.sln project, which made a zip file and an installer (although it was not obvious where to look for this, what its name was, etc.; I figured it out by dates on files). Another thing that's not obvious is how to shoehorn your code into a VCC project that will successfully build. I created a project from existing files in a place where the include on the mpi.h would be found, and examples, etc. did compile. However, they did not find any of the library routines. Link errors. So, I added the generated libraries' location into the search locations for libraries. No good. So, I added all of the generated libraries into the VCC project I created. No good. How does one do this (aside from rigging up something through CMake, cygwin, minGW, or MS SFU)? Charlie ... Original Message Subject: Re: [OMPI users] Windows CMake build problems ... (cont.) From: Shiqing Fan Date: Fri, January 15, 2010 2:56 am To: cjohn...@valverdecomputing.com Cc: Open MPI Users Hi Charlie, Glad to hear that you compiled it successfully. The error you got with 1.3.4 is a bug: the CMake script didn't set the SVN information correctly, and it has been fixed in 1.4 and later. Thanks, Shiqing cjohn...@valverdecomputing.com wrote: Yes that was it. 
A much improved result now from CMake 2.6.4, no errors from compiling openmpi-1.4: 1>libopen-pal - 0 error(s), 9 warning(s) 2>libopen-rte - 0 error(s), 7 warning(s) 3>opal-restart - 0 error(s), 0 warning(s) 4>opal-wrapper - 0 error(s), 0 warning(s) 5>libmpi - 0 error(s), 42 warning(s) 6>orte-checkpoint - 0 error(s), 0 warning(s) 7>orte-ps - 0 error(s), 0 warning(s) 8>orted - 0 error(s), 0 warning(s) 9>orte-clean - 0 error(s), 0 warning(s) 10>orterun - 0 error(s), 3 warning(s) 11>ompi_info - 0 error(s), 0 warning(s) 12>ompi-server - 0 error(s), 0 warning(s) 13>libmpi_cxx - 0 error(s), 61 warning(s) == Build: 13 succeeded, 0 failed, 1 up-to-date, 0 skipped == And only one failure from compiling openmpi-1.3.4 (the ompi_info project): 1>libopen-pal - 0 error(s), 9 warning(s) 2>libopen-rte - 0 error(s), 7 warning(s) 3>opal-restart - 0 error(s), 0 warning(s) 4>opal-wrapper - 0 error(s), 0 warning(s) 5>orte-checkpoint - 0 error(s), 0 warning(s) 6>libmpi - 0 error(s), 42 warning(s) 7>orte-ps - 0 error(s), 0 warning(s) 8>orted - 0 error(s), 0 warning(s) 9>orte-clean - 0 error(s), 0 warning(s) 10>orterun - 0 error(s), 3 warning(s) 11>ompi_info - 3 error(s), 0 warning(s) 12>ompi-server - 0 error(s), 0 warning(s) 13>libmpi_cxx - 0 error(s), 61 warning(s) == Rebuild All: 13 succeeded, 1 failed, 0 skipped == Here's the listing from the non-linking project: 11>-- Rebuild All started: Project: ompi_info, Configuration: Debug Win32 -- 11>Deleting intermediate and output files for project 'ompi_info', configuration 'Deb
Re: [OMPI users] ABI stabilization/versioning
On Jan 25, 2010, at 7:11 AM, Dave Love wrote: > What's the status of (stabilizing and?) versioning libraries? If I > recall correctly, it was supposed to be defined as fixed for some > release period as of 1.3.something. Correct. We started with 1.3.2 or 1.3.3, IIRC...? I'd have to go back and check to be sure. To be clear, however, we are only versioning the MPI libraries (as you noted, libmpi went to 0.0.1). That is, the hidden sub-libraries (libopen-rte and libopen-pal) are still NOT versioned for complex, icky reasons (see https://svn.open-mpi.org/trac/ompi/ticket/2092 for more details). The short version is that the possibility of static linking really fouls up the scheme, and we haven't figured out a good way around this yet. :-( To be absolutely crystal clear: OMPI's MPI shared libraries now have .so versioning enabled, but you still can't install two copies of Open MPI into the same $prefix (without overriding a bunch of other directory names, that is, like $pkglibdir, etc.). This is because Open MPI has a bunch of files that are not named in relation to OMPI's version number (e.g., $includedir/mpi.h, $mandir/man3/*, $pkgdir/*, libopen-rte.so, etc.). That is, the lack of .so versioning in libopen-rte and libopen-pal are only two of (unfortunately) many reasons that you can't install 2 different versions of Open MPI into the same $prefix. Does that make sense? > I assumed that the libraries would then be versioned (at least for ELF > -- I don't know about other formats) and we could remove a major source > of grief from dynamically linking against the wrong thing, and I think > Jeff said that would happen. Right -- we're using the Libtool shared library versioning scheme. > However, the current sources don't seem to > be trying to set libtool version info, though I'm not sure what > determines them producing .so.0.0.1 instead of .0.0.0 in other binaries > I have. 
The top-level VERSION file has text fields that set what the version numbers will be for each of the .so libraries. These numbers get pasted into various Makefiles in the build process; hence, the LT .so versioning info is included down at the level where each .so library is created (by Libtool). Check out our wiki page about the shared library version numbering: https://svn.open-mpi.org/trac/ompi/wiki/ReleaseProcedures. > This doesn't seem to have been addressed in the Debian or > Fedora packaging, either > > Is that just an oversight or something dropped, so it could be fixed > (modulo historical mess) if someone did the work? It isn't covered > under http://www.open-mpi.org/software/ompi/versions/ or as far as I can > tell in the FAQ, and seems important (like plenty of other things, I'm > sure!), given how much of a problem it's been for users and admins doing > updates. Good point -- I'll take a to-do to add some text about the shared library versioning scheme in the FAQ and the /versions/ page. Probably not today, but I should be able to get to it this week. Do the links and text I provided above give you enough information / rationale? -- Jeff Squyres jsquy...@cisco.com
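[Editor's note: to make the Libtool scheme concrete, a -version-info "current:revision:age" triple maps, on Linux/ELF, to a soname of lib<name>.so.(current-age) and a real file named lib<name>.so.(current-age).(age).(revision). The 0:1:0 triple below is an assumption inferred from the .so.0.0.1 files mentioned in this thread, not taken from Open MPI's VERSION file.]

```shell
# Map a Libtool -version-info "current:revision:age" triple to the
# soname and file name produced on Linux/ELF.
current=0; revision=1; age=0      # assumed triple behind the .so.0.0.1 seen above
major=$((current - age))
echo "soname: libmpi.so.${major}"
echo "file:   libmpi.so.${major}.${age}.${revision}"
# On an installed copy you could verify with: readelf -d libmpi.so.0 | grep SONAME
```

Running this prints `soname: libmpi.so.0` and `file: libmpi.so.0.0.1`, matching the file names Dave observed; the soname only changes when current-age changes, which is why bumping revision alone has no effect on dynamic linking.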
Re: [OMPI users] Windows CMake build problems ... (cont.)
Should this kind of info be added to README.windows? On Jan 25, 2010, at 4:34 AM, wrote: > Thanks, that second part about the wrappers was what I was looking for. > > Charlie ... > > Original Message > Subject: Re: [OMPI users] Windows CMake build problems ... (cont.) > From: Shiqing Fan > Date: Mon, January 25, 2010 2:09 am > To: cjohn...@valverdecomputing.com > Cc: Open MPI Users > > > Hi Charlie, > > Actually, to compile and link your application with Open MPI on Windows > is similar as on Linux. You have to link your application against the > generated Open MPI libraries, e.g. libopen-mpi.lib (don't forget the > suffix 'd' if you build debug version of the OMPI libraries, e.g. > libopen-mpid.lib). > > But according to the information you provided, I assume that you only > added the search path into the project, that's not enough, you should > probably add the library names into "Project Property Pages" -> > "Configuration Properties" -> Linker -> Input -> "Additional > Dependencies", normally only libopen-mpi.lib (or libopen-mpid.lib) would > be enough, so that Visual Studio will know which libraries to link to. > > Besides, the Open MPI compiler wrappers should also work on Windows, in > this case you just need to open a "Visual Studio command prompt" with > the Open MPI path env added (e.g. "set PATH=c:\Program > Files\OpenMPI_v1.4\bin;%PATH%"), and simply run command like: > > > mpicc app.c > > and > > > mpirun -np 2 app.exe > > > Please note that, before executing the application, Open MPI has to be > installed somewhere either by build the "INSTALL" project or by running > the generated installer, so that the correct Open MPI folder structure > could be created. > > > Regards, > Shiqing > > > cjohn...@valverdecomputing.com wrote: > > OK, so I'm a little farther on and perplexed. > > > > As I said, Visual C++ 2005 (release 8.0.50727.867) build > > of OpenMPI 1.4, using CMake 2.6.4, built everything and it all linked. 
> > > > Went ahead and built the PACKAGE item in the OpenMPI.sln project, > > which made a zip file and an installer (although it was not obvious > > where to look for this, what its name was, etc., I figured it out by > > dates on files). > > > > Another thing that's not obvious is how to shoehorn your code into a > > VCC project that will successfully build. > > > > I created a project from existing files in a place where the include > > on the mpi.h would be found and examples, etc. did compile. > > > > However, they did not find any of the library routines. Link errors. > > > > So, I added in the generated libraries location into the search > > locations for libraries. > > > > No good. > > > > So, I added all of the generated libraries into the VCC project I created. > > > > No good. > > > > How does one do this (aside from rigging up something through CMake, > > cygwin, minGW, or MS SFU)? > > > > Charlie ... > > > > > > Original Message > > Subject: Re: [OMPI users] Windows CMake build problems ... (cont.) > > From: Shiqing Fan > > Date: Fri, January 15, 2010 2:56 am > > To: cjohn...@valverdecomputing.com > > Cc: Open MPI Users > > > > > > Hi Charlie, > > > > Glad to hear that you compiled it successfully. > > > > The error you got with 1.3.4 is a bug that the CMake script didn't > > set > > the SVN information correctly, and it has been fixed in 1.4 and later. > > > > > > Thanks, > > Shiqing > > > > > > cjohn...@valverdecomputing.com wrote: > > > Yes that was it. 
> > > > > > A much improved result now from CMake 2.6.4, no errors from > > compiling > > > openmpi-1.4: > > > > > > 1>libopen-pal - 0 error(s), 9 warning(s) > > > 2>libopen-rte - 0 error(s), 7 warning(s) > > > 3>opal-restart - 0 error(s), 0 warning(s) > > > 4>opal-wrapper - 0 error(s), 0 warning(s) > > > 5>libmpi - 0 error(s), 42 warning(s) > > > 6>orte-checkpoint - 0 error(s), 0 warning(s) > > > 7>orte-ps - 0 error(s), 0 warning(s) > > > 8>orted - 0 error(s), 0 warning(s) > > > 9>orte-clean - 0 error(s), 0 warning(s) > > > 10>orterun - 0 error(s), 3 warning(s) > > > 11>ompi_info - 0 error(s), 0 warning(s) > > > 12>ompi-server - 0 error(s), 0 warning(s) > > > 13>libmpi_cxx - 0 error(s), 61 warning(s) > > > == Build: 13 succeeded, 0 failed, 1 up-to-date, 0 skipped > > > == > > > > > > And only one failure from compiling openmpi-1.3.4 (the ompi_info > > project): > > > > > > > 1>libopen-pal - 0 error(s), 9 warning(s) > > > > 2>libopen-rte - 0 error(s), 7 warning(s) > > > > 3>opal-restart - 0 error(s), 0 warning(s) > > > > 4>opal-wrapper - 0 error(s), 0 warning(s) > > > > 5>orte-checkpoint - 0 error(s), 0 warning(s) > > > > 6>libmpi - 0 error(s), 42 warning(s) > > > > 7>orte-ps - 0 error(s), 0 warning(s) > > > > 8>orted - 0 error(s), 0 warning(s) > > > > 9>orte-clean - 0 error(s), 0 warning(s) > > > > 10>orterun - 0 error(s), 3 warning(s) > > > > 11>ompi_info - 3 error(s), 0 warning(s) > > > > 12>ompi-serve
Re: [OMPI users] ABI stabilization/versioning
On Monday, 25.01.2010, at 12:11 +, Dave Love wrote: > I assumed that the libraries would then be versioned (at least for ELF > -- I don't know about other formats) and we could remove a major source > of grief from dynamically linking against the wrong thing, and I think > Jeff said that would happen. However, the current sources don't seem to > be trying to set libtool version info, though I'm not sure what > determines them producing .so.0.0.1 instead of .0.0.0 in other binaries > I have. This doesn't seem to have been addressed in the Debian or > Fedora packaging, either The ABI should be stable since 1.3.2. OMPI 1.4.x does set the libtool version info; versions were bumped to 0.0.1 for libmpi, which has no effect on dynamic linking. Could you please elaborate on what needs to be addressed? Debian does not have 1.4.1 yet, though I'm planning to upload it really soon. The ABI did not change (at least not in an incompatible way, AFAICS). If you know of any issues, I'd be glad if you could tell us, so we can find a solution before any damage is done. Thanks in advance! Best regards Manuel
[OMPI users] Problems building Open MPI 1.4.1 with Pathscale
Hello: I'm having some issues with Open MPI 1.4.1 and Pathscale compiler (version 3.2). Open MPI builds successfully with the following configure arguments: ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64 --with-sge --enable-static CC=pathcc CXX=pathCC F77=pathf90 F90=pathf90 FC=pathf90 (we have OpenFabrics 1.2 Infiniband drivers, by the way) However, applications hang on MPI_Init (or maybe MPI_Comm_rank or MPI_Comm_size, a basic hello-world anyway doesn't print 'Hello World from node...'). I tried running them with and without SGE. Same result. This hello-world works flawlessly when I build Open MPI with gcc: ./configure --with-openib=/usr --with-openib-libdir=/usr/lib64 --with-sge --enable-static This successful execution runs in one machine only, so it shouldn't use Infiniband, and it also works when several nodes are used. I was able to build previous versions of Open MPI with Pathscale (1.2.6 and 1.3.2, particularly). I tried building version 1.4.1 both with Pathscale 3.2 and Pathscale 3.1. No difference. Any ideas? Thank you in advance, Rafa -- Rafael Arco Arredondo Centro de Servicios de Informática y Redes de Comunicaciones Universidad de Granada
[OMPI users] ABI stabilization/versioning
What's the status of (stabilizing and?) versioning libraries? If I recall correctly, it was supposed to be defined as fixed for some release period as of 1.3.something. I assumed that the libraries would then be versioned (at least for ELF -- I don't know about other formats) and we could remove a major source of grief from dynamically linking against the wrong thing, and I think Jeff said that would happen. However, the current sources don't seem to be trying to set libtool version info, though I'm not sure what determines them producing .so.0.0.1 instead of .0.0.0 in other binaries I have. This doesn't seem to have been addressed in the Debian or Fedora packaging, either. Is that just an oversight or something dropped, so it could be fixed (modulo historical mess) if someone did the work? It isn't covered under http://www.open-mpi.org/software/ompi/versions/ or as far as I can tell in the FAQ, and seems important (like plenty of other things, I'm sure!), given how much of a problem it's been for users and admins doing updates.
Re: [OMPI users] Checkpoint/Restart error
So? anyone? any clue? Summarize: - installed OpenMPI 1.4.1 on fresh Centos 5 - mpirun works but ompi-checkpoint throws this error: ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 405 - on another VM I have OpenMPI 1.3.3 installed. Checkpointing works fine as a guest but has the previously mentioned error as root. Both root and guest show the same output after "param -all -all" except for the $HOME (which only matters for mca_component_path, mca_param_files, snapc_base_global_snapshot_dir) Thanks, Andreea On Tue, Jan 19, 2010 at 9:01 PM, Andreea Costea wrote: > I noticed one more thing. As I still have some VMs that have OpenMPI > version 1.3.3 installed I started to use those machines 'till I fix the > problem with 1.4.1 And while checkpointing on one of these VMs I realized > that checkpointing as a guest works fine and checkpointing as root outputs > the same error as in 1.4.1: ORTE_ERROR_LOG: Not found in file > orte-checkpoint.c at line 405 > > I logged the outputs of "ompi_info --param all all" which I ran for root > and for another user, and the only differences were at these parameters: > > mca_component_path > mca_param_files > snapc_base_global_snapshot_dir > > All 3 params differ because of the $HOME. > One more thing: I don't have the directory $HOME/.openmpi > > Ideas? > > Thanks, > Andreea > > > > > > On Tue, Jan 19, 2010 at 12:51 PM, Andreea Costea > wrote: > >> Well... I decided to install a fresh OS to be sure that there is no >> OpenMPI version conflict. So I formatted one of my VMs, did a fresh CentOS >> install, installed BLCR 0.8.2 and OpenMPI 1.4.1 and the result: the same. >> mpirun works but ompi-checkpoint has that error at line 405: >> >> [[35906,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line >> 405 >> >> As for the files remaining after uninstalling: Jeff, you were right. There >> is no file left, just some empty directories. >> >> What might be the problem with that ORTE_ERROR_LOG error? 
>> >> Thanks, >> Andreea >> >> On Fri, Jan 15, 2010 at 11:47 PM, Andreea Costea >> wrote: >> >>> It's almost midnight here, so I left home, but I will try it tomorrow. >>> There were some directories left after "make uninstall". I will give more >>> details tomorrow. >>> >>> Thanks Jeff, >>> Andreea >>> >>> >>> On Fri, Jan 15, 2010 at 11:30 PM, Jeff Squyres wrote: >>> On Jan 15, 2010, at 8:07 AM, Andreea Costea wrote: > - I wanted to update to version 1.4.1 and I uninstalled the previous version like this: make uninstall, and then manually deleted all the left over files. The directory where I installed was /usr/local I'll let Josh answer your CR questions, but I did want to ask about this point. AFAIK, "make uninstall" removes *all* Open MPI files. For example: - [7:25] $ cd /path/to/my/OMPI/tree [7:25] $ make install > /dev/null [7:26] $ find /tmp/bogus/ -type f | wc 646 646 28082 [7:26] $ make uninstall > /dev/null [7:27] $ find /tmp/bogus/ -type f | wc 0 0 0 [7:27] $ - I realize that some *directories* are left in $prefix, but there should be no *files* left. Are you seeing something different? -- Jeff Squyres jsquy...@cisco.com ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >>> >> >
Re: [OMPI users] Windows CMake build problems ... (cont.)
Thanks, that second part about the wrappers was what I was looking for.

Charlie ...

Original Message
Subject: Re: [OMPI users] Windows CMake build problems ... (cont.)
From: Shiqing Fan
Date: Mon, January 25, 2010 2:09 am
To: cjohn...@valverdecomputing.com
Cc: Open MPI Users

Hi Charlie,

Actually, compiling and linking your application with Open MPI on Windows is similar to doing so on Linux. You have to link your application against the generated Open MPI libraries, e.g. libopen-mpi.lib (don't forget the suffix 'd' if you build a debug version of the OMPI libraries, e.g. libopen-mpid.lib).

But according to the information you provided, I assume that you only added the search path into the project; that's not enough. You should probably add the library names into "Project Property Pages" -> "Configuration Properties" -> Linker -> Input -> "Additional Dependencies". Normally only libopen-mpi.lib (or libopen-mpid.lib) would be enough, so that Visual Studio will know which libraries to link to.

Besides, the Open MPI compiler wrappers should also work on Windows. In this case you just need to open a "Visual Studio command prompt" with the Open MPI path added to the environment (e.g. "set PATH=c:\Program Files\OpenMPI_v1.4\bin;%PATH%"), and simply run commands like:

> mpicc app.c

and

> mpirun -np 2 app.exe

Please note that, before executing the application, Open MPI has to be installed somewhere, either by building the "INSTALL" project or by running the generated installer, so that the correct Open MPI folder structure is created.

Regards,
Shiqing

cjohn...@valverdecomputing.com wrote:
> OK, so I'm a little farther on and perplexed.
>
> As I said, the Visual C++ 2005 (release 8.0.50727.867) build of OpenMPI 1.4,
> using CMake 2.6.4, built everything and it all linked.
>
> Went ahead and built the PACKAGE item in the OpenMPI.sln project, which made
> a zip file and an installer (although it was not obvious where to look for
> this, what its name was, etc.; I figured it out by dates on files).
>
> Another thing that's not obvious is how to shoehorn your code into a VC++
> project that will successfully build.
>
> I created a project from existing files in a place where the include of
> mpi.h would be found, and the examples, etc. did compile.
>
> However, they did not find any of the library routines. Link errors.
>
> So, I added the location of the generated libraries into the search
> locations for libraries.
>
> No good.
>
> So, I added all of the generated libraries into the VC++ project I created.
>
> No good.
>
> How does one do this (aside from rigging up something through CMake, cygwin,
> minGW, or MS SFU)?
>
> Charlie ...
>
> Original Message
> Subject: Re: [OMPI users] Windows CMake build problems ... (cont.)
> From: Shiqing Fan
> Date: Fri, January 15, 2010 2:56 am
> To: cjohn...@valverdecomputing.com
> Cc: Open MPI Users
>
> Hi Charlie,
>
> Glad to hear that you compiled it successfully.
>
> The error you got with 1.3.4 is a bug: the CMake script didn't set the SVN
> information correctly, and it has been fixed in 1.4 and later.
>
> Thanks,
> Shiqing
>
> cjohn...@valverdecomputing.com wrote:
> > Yes, that was it.
> >
> > A much improved result now from CMake 2.6.4; no errors from compiling
> > openmpi-1.4:
> >
> > 1>libopen-pal - 0 error(s), 9 warning(s)
> > 2>libopen-rte - 0 error(s), 7 warning(s)
> > 3>opal-restart - 0 error(s), 0 warning(s)
> > 4>opal-wrapper - 0 error(s), 0 warning(s)
> > 5>libmpi - 0 error(s), 42 warning(s)
> > 6>orte-checkpoint - 0 error(s), 0 warning(s)
> > 7>orte-ps - 0 error(s), 0 warning(s)
> > 8>orted - 0 error(s), 0 warning(s)
> > 9>orte-clean - 0 error(s), 0 warning(s)
> > 10>orterun - 0 error(s), 3 warning(s)
> > 11>ompi_info - 0 error(s), 0 warning(s)
> > 12>ompi-server - 0 error(s), 0 warning(s)
> > 13>libmpi_cxx - 0 error(s), 61 warning(s)
> > == Build: 13 succeeded, 0 failed, 1 up-to-date, 0 skipped ==
> >
> > And only one failure from compiling openmpi-1.3.4 (the ompi_info project):
> >
> > 1>libopen-pal - 0 error(s), 9 warning(s)
> > 2>libopen-rte - 0 error(s), 7 warning(s)
> > 3>opal-restart - 0 error(s), 0 warning(s)
> > 4>opal-wrapper - 0 error(s), 0 warning(s)
> > 5>orte-checkpoint - 0 error(s), 0 warning(s)
> > 6>libmpi - 0 error(s), 42 warning(s)
> > 7>orte-ps - 0 error(s), 0 warning(s)
> > 8>orted - 0 error(s), 0 warning(s)
> > 9>orte-clean - 0 error(s), 0 warning(s)
> > 10>orterun - 0 error(s), 3 warning(s)
> > 11>ompi_info - 3 error(s), 0 warning(s)
> > 12>ompi-server - 0 error(s), 0 warning(s)
> > 13>libmpi_cxx - 0 error(s), 61 warning(s)
> > == Rebuild All: 13 succeeded, 1 failed, 0 skipped ==
> >
> > Here's the listing from the non-linking project:
> >
> > 11>-- Rebuild All started: Project: ompi_info, Configuration: Debug Win32 --
> > 11>Deleting intermediate and output files for project 'ompi_info',
> > configuration 'Debug|Win32'
> > 11>Compiling...
> > 11>version.cc
> > 11>..\..\..\..\openmpi-1.3.4\ompi\tools\ompi_info\version.cc(136) : err
Re: [OMPI users] Windows CMake build problems ... (cont.)
Hi Charlie,

Actually, compiling and linking your application with Open MPI on Windows is similar to doing so on Linux. You have to link your application against the generated Open MPI libraries, e.g. libopen-mpi.lib (don't forget the suffix 'd' if you build a debug version of the OMPI libraries, e.g. libopen-mpid.lib).

But according to the information you provided, I assume that you only added the search path into the project; that's not enough. You should probably add the library names into "Project Property Pages" -> "Configuration Properties" -> Linker -> Input -> "Additional Dependencies". Normally only libopen-mpi.lib (or libopen-mpid.lib) would be enough, so that Visual Studio will know which libraries to link to.

Besides, the Open MPI compiler wrappers should also work on Windows. In this case you just need to open a "Visual Studio command prompt" with the Open MPI path added to the environment (e.g. "set PATH=c:\Program Files\OpenMPI_v1.4\bin;%PATH%"), and simply run commands like:

> mpicc app.c

and

> mpirun -np 2 app.exe

Please note that, before executing the application, Open MPI has to be installed somewhere, either by building the "INSTALL" project or by running the generated installer, so that the correct Open MPI folder structure is created.

Regards,
Shiqing

cjohn...@valverdecomputing.com wrote:

OK, so I'm a little farther on and perplexed.

As I said, the Visual C++ 2005 (release 8.0.50727.867) build of OpenMPI 1.4, using CMake 2.6.4, built everything and it all linked.

Went ahead and built the PACKAGE item in the OpenMPI.sln project, which made a zip file and an installer (although it was not obvious where to look for this, what its name was, etc.; I figured it out by dates on files).

Another thing that's not obvious is how to shoehorn your code into a VC++ project that will successfully build.

I created a project from existing files in a place where the include of mpi.h would be found, and the examples, etc. did compile.

However, they did not find any of the library routines. Link errors.

So, I added the location of the generated libraries into the search locations for libraries.

No good.

So, I added all of the generated libraries into the VC++ project I created.

No good.

How does one do this (aside from rigging up something through CMake, cygwin, minGW, or MS SFU)?

Charlie ...

Original Message
Subject: Re: [OMPI users] Windows CMake build problems ... (cont.)
From: Shiqing Fan
Date: Fri, January 15, 2010 2:56 am
To: cjohn...@valverdecomputing.com
Cc: Open MPI Users

Hi Charlie,

Glad to hear that you compiled it successfully.

The error you got with 1.3.4 is a bug: the CMake script didn't set the SVN information correctly, and it has been fixed in 1.4 and later.

Thanks,
Shiqing

cjohn...@valverdecomputing.com wrote:
> Yes, that was it.
>
> A much improved result now from CMake 2.6.4; no errors from compiling
> openmpi-1.4:
>
> 1>libopen-pal - 0 error(s), 9 warning(s)
> 2>libopen-rte - 0 error(s), 7 warning(s)
> 3>opal-restart - 0 error(s), 0 warning(s)
> 4>opal-wrapper - 0 error(s), 0 warning(s)
> 5>libmpi - 0 error(s), 42 warning(s)
> 6>orte-checkpoint - 0 error(s), 0 warning(s)
> 7>orte-ps - 0 error(s), 0 warning(s)
> 8>orted - 0 error(s), 0 warning(s)
> 9>orte-clean - 0 error(s), 0 warning(s)
> 10>orterun - 0 error(s), 3 warning(s)
> 11>ompi_info - 0 error(s), 0 warning(s)
> 12>ompi-server - 0 error(s), 0 warning(s)
> 13>libmpi_cxx - 0 error(s), 61 warning(s)
> == Build: 13 succeeded, 0 failed, 1 up-to-date, 0 skipped ==
>
> And only one failure from compiling openmpi-1.3.4 (the ompi_info project):
>
> > 1>libopen-pal - 0 error(s), 9 warning(s)
> > 2>libopen-rte - 0 error(s), 7 warning(s)
> > 3>opal-restart - 0 error(s), 0 warning(s)
> > 4>opal-wrapper - 0 error(s), 0 warning(s)
> > 5>orte-checkpoint - 0 error(s), 0 warning(s)
> > 6>libmpi - 0 error(s), 42 warning(s)
> > 7>orte-ps - 0 error(s), 0 warning(s)
> > 8>orted - 0 error(s), 0 warning(s)
> > 9>orte-clean - 0 error(s), 0 warning(s)
> > 10>orterun - 0 error(s), 3 warning(s)
> > 11>ompi_info - 3 error(s), 0 warning(s)
> > 12>ompi-server - 0 error(s), 0 warning(s)
> > 13>libmpi_cxx - 0 error(s), 61 warning(s)
> > == Rebuild All: 13 succeeded, 1 failed, 0 skipped ==
>
> Here's the listing from the non-linking project:
>
> 11>-- Rebuild All started: Project: ompi_info, Configuration:
> Debug Win32 --
> 11>Deleting intermediate and output files for project 'ompi_info',
> configuration 'Debug|Win32'
> 11>Compiling...
> 11>version.cc
> 11>..\..\..\..\openmpi-1.3.4\ompi\tools\ompi_info\version.cc(136) :
> error C2059: syntax error : ','
> 11>..\..\..\..\openmpi-1.3.4\ompi\to
Re: [OMPI users] Windows CMake build problems ... (cont.)
OK, so I'm a little farther on and perplexed.

As I said, the Visual C++ 2005 (release 8.0.50727.867) build of OpenMPI 1.4, using CMake 2.6.4, built everything and it all linked.

Went ahead and built the PACKAGE item in the OpenMPI.sln project, which made a zip file and an installer (although it was not obvious where to look for this, what its name was, etc.; I figured it out by dates on files).

Another thing that's not obvious is how to shoehorn your code into a VC++ project that will successfully build.

I created a project from existing files in a place where the include of mpi.h would be found, and the examples, etc. did compile.

However, they did not find any of the library routines. Link errors.

So, I added the location of the generated libraries into the search locations for libraries.

No good.

So, I added all of the generated libraries into the VC++ project I created.

No good.

How does one do this (aside from rigging up something through CMake, cygwin, minGW, or MS SFU)?

Charlie ...

Original Message
Subject: Re: [OMPI users] Windows CMake build problems ... (cont.)
From: Shiqing Fan
Date: Fri, January 15, 2010 2:56 am
To: cjohn...@valverdecomputing.com
Cc: Open MPI Users

Hi Charlie,

Glad to hear that you compiled it successfully.

The error you got with 1.3.4 is a bug: the CMake script didn't set the SVN information correctly, and it has been fixed in 1.4 and later.

Thanks,
Shiqing

cjohn...@valverdecomputing.com wrote:
> Yes, that was it.
>
> A much improved result now from CMake 2.6.4; no errors from compiling
> openmpi-1.4:
>
> 1>libopen-pal - 0 error(s), 9 warning(s)
> 2>libopen-rte - 0 error(s), 7 warning(s)
> 3>opal-restart - 0 error(s), 0 warning(s)
> 4>opal-wrapper - 0 error(s), 0 warning(s)
> 5>libmpi - 0 error(s), 42 warning(s)
> 6>orte-checkpoint - 0 error(s), 0 warning(s)
> 7>orte-ps - 0 error(s), 0 warning(s)
> 8>orted - 0 error(s), 0 warning(s)
> 9>orte-clean - 0 error(s), 0 warning(s)
> 10>orterun - 0 error(s), 3 warning(s)
> 11>ompi_info - 0 error(s), 0 warning(s)
> 12>ompi-server - 0 error(s), 0 warning(s)
> 13>libmpi_cxx - 0 error(s), 61 warning(s)
> == Build: 13 succeeded, 0 failed, 1 up-to-date, 0 skipped ==
>
> And only one failure from compiling openmpi-1.3.4 (the ompi_info project):
>
> > 1>libopen-pal - 0 error(s), 9 warning(s)
> > 2>libopen-rte - 0 error(s), 7 warning(s)
> > 3>opal-restart - 0 error(s), 0 warning(s)
> > 4>opal-wrapper - 0 error(s), 0 warning(s)
> > 5>orte-checkpoint - 0 error(s), 0 warning(s)
> > 6>libmpi - 0 error(s), 42 warning(s)
> > 7>orte-ps - 0 error(s), 0 warning(s)
> > 8>orted - 0 error(s), 0 warning(s)
> > 9>orte-clean - 0 error(s), 0 warning(s)
> > 10>orterun - 0 error(s), 3 warning(s)
> > 11>ompi_info - 3 error(s), 0 warning(s)
> > 12>ompi-server - 0 error(s), 0 warning(s)
> > 13>libmpi_cxx - 0 error(s), 61 warning(s)
> > == Rebuild All: 13 succeeded, 1 failed, 0 skipped ==
>
> Here's the listing from the non-linking project:
>
> 11>-- Rebuild All started: Project: ompi_info, Configuration:
> Debug Win32 --
> 11>Deleting intermediate and output files for project 'ompi_info',
> configuration 'Debug|Win32'
> 11>Compiling...
> 11>version.cc
> 11>..\..\..\..\openmpi-1.3.4\ompi\tools\ompi_info\version.cc(136) :
> error C2059: syntax error : ','
> 11>..\..\..\..\openmpi-1.3.4\ompi\tools\ompi_info\version.cc(147) :
> error C2059: syntax error : ','
> 11>..\..\..\..\openmpi-1.3.4\ompi\tools\ompi_info\version.cc(158) :
> error C2059: syntax error : ','
> 11>param.cc
> 11>output.cc
> 11>ompi_info.cc
> 11>components.cc
> 11>Generating Code...
> 11>Build log was saved at
> "file://c:\prog\mon\ompi\tools\ompi_info\ompi_info.dir\Debug\BuildLog.htm"
> 11>ompi_info - 3 error(s), 0 warning(s)
>
> Thank you Shiqing !
>
> Charlie ...
>
> Original Message
> Subject: Re: [OMPI users] Windows CMake build problems ... (cont.)
> From: Shiqing Fan
> Date: Thu, January 14, 2010 11:20 am
> To: Open MPI Users, cjohn...@valverdecomputing.com
>
> Hi Charlie,
>
> The problem turns out to be the different behavior of one CMake macro in
> different versions of CMake, and it's fixed in the Open MPI trunk with
> r22405. I also created a ticket to move the fix over to the 1.4 branch, see
> #2169: https://svn.open-mpi.org/trac/ompi/ticket/2169 .
>
> So you could either switch to the OMPI trunk or use CMake 2.6 to solve
> the problem. Thanks a lot.
>
> Best Regards,
> Shiqing
>
> cjohn...@valverdecomputing.com wrote:
> > The OpenMPI build problem I'm having occurs in both OpenMPI 1.4 and 1.3.4.
> >
> > I am on a Windows 7 (US) Enterprise (x86) OS on an HP system with an
> > Intel Core 2 Extreme x9000 (4 GB RAM), using the 2005 Visual Studio for
> > S/W Architects (release 8.0.50727.867).
> >
> > [That release has everything the platform SDK would have.]
> >
> > I'm using CMake 2.8 to generate code. I used it correctly, pointing at
> > the root directory where the makelists are located for the source side
> > and to an empty directory for the build side: did configure, _*I did
> > not cli