Re: [OMPI users] Problem with MPI_File_read() (2)
> In general, files written by MPI_File_write (and friends) are only
> guaranteed to be readable by MPI_File_read (and friends). So if you
> have an ASCII input file, or even a binary input file, you might need
> to read it in with traditional/unix file read functions and then write
> it out with MPI_File_write. Then your parallel application will be
> able to use the various MPI_File_* functions to read the file at
> run-time. Hence, there's no real generic convertor; you'll need to
> write your own that is specific to your data.
>
> Make sense?

Hello Jeff!

Thanks a lot! Yes, sure, what you say makes sense. On the other hand, it seems I will have to open the input file in the "traditional" way n times, once per process, since all of my processes have to collect their data from it (each parsing it from beginning to end), don't you think so? I wanted to take advantage of MPI to read (in each process) the data from one file... Or have I misunderstood something?
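A minimal sketch of the kind of one-off converter Jeff describes, assuming a placeholder input file name ("input.dat") and treating the data as raw bytes; a real converter would parse and write typed records specific to the data format, as Jeff notes:

/* Hypothetical serial converter: plain fread() in, MPI_File_write() out.
 * Run with a single process (mpirun -np 1 ./convert). File names are
 * placeholders. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    FILE *in = fopen("input.dat", "rb");   /* traditional Unix-style read */
    if (in == NULL) { perror("fopen"); MPI_Abort(MPI_COMM_WORLD, 1); }

    MPI_File out;
    MPI_File_open(MPI_COMM_SELF, "input.mpiio",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &out);

    char buf[65536];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        /* Rewrite the same bytes through MPI-IO so that MPI_File_read()
         * can consume them later. */
        MPI_File_write(out, buf, (int)n, MPI_BYTE, MPI_STATUS_IGNORE);
    }

    MPI_File_close(&out);
    fclose(in);
    MPI_Finalize();
    return 0;
}

Once the converted file exists, each rank of the parallel application can open it with MPI_File_open and read only its own portion.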
Re: [OMPI users] Problem with MPI_File_read() (2)
On Apr 15, 2009, at 5:06 AM, Jovana Knezevic wrote:

> Yes, sure, what you say makes sense. On the other hand, it seems I will
> have to open the input file in the "traditional" way n times, once per
> process, since all of my processes have to collect their data from it
> (each parsing it from beginning to end), don't you think so? I wanted
> to take advantage of MPI to read (in each process) the data from one
> file... Or have I misunderstood something?

The idea behind MPI I/O is that it can be done in parallel. It usually works best when you have an underlying parallel filesystem. In such cases (typically paired with very large input data), you can exploit the parallelism of the underlying filesystem to efficiently get just the necessary data to each MPI process.

If your input data isn't that large, or if you don't have a parallel filesystem (e.g., you're just using NFS), such schemes can actually be less efficient / slower. It may even be better to have something like MPI_COMM_WORLD rank 0 read in the entire file and MPI_BCAST / MPI_SCATTER / etc. the relevant data to each MPI process as necessary.

It's up to you to decide which is best for your application; it really depends on the requirements of what you are doing, your local setup, etc.

--
Jeff Squyres
Cisco Systems
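A minimal sketch of the rank-0-reads-then-broadcasts alternative, assuming the input file is small enough to fit in memory (and under 2 GB) and using a placeholder file name; MPI_Scatterv would be the analogous choice if each rank only needed its own slice:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    long size = 0;
    char *data = NULL;

    if (rank == 0) {
        /* Only rank 0 touches the filesystem. */
        FILE *f = fopen("input.dat", "rb");
        if (f == NULL) { perror("fopen"); MPI_Abort(MPI_COMM_WORLD, 1); }
        fseek(f, 0, SEEK_END);
        size = ftell(f);
        fseek(f, 0, SEEK_SET);
        data = malloc(size);
        if (fread(data, 1, (size_t)size, f) != (size_t)size)
            MPI_Abort(MPI_COMM_WORLD, 1);
        fclose(f);
    }

    /* Everyone learns the size, then receives the contents. */
    MPI_Bcast(&size, 1, MPI_LONG, 0, MPI_COMM_WORLD);
    if (rank != 0) data = malloc(size);
    MPI_Bcast(data, (int)size, MPI_BYTE, 0, MPI_COMM_WORLD);

    /* ... each rank now parses `data` for the pieces it needs ... */

    free(data);
    MPI_Finalize();
    return 0;
}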
Re: [OMPI users] libnuma issue
On Apr 6, 2009, at 4:24 PM, Prentice Bisbal wrote:

> > I would appreciate help in circumventing the problem.
>
> I believe you need --with-libnuma=/usr.

Sorry for the late reply.

FWIW, the above is correct -- you should use --with-libnuma=/usr, not --with-libnuma=/usr/lib.

Please also note in this thread:

http://www.open-mpi.org/community/lists/users/2009/04/8853.php

We found some obscure logic issues with the handling of --with-libnuma. I doubt that those should affect you (since you're mentioning an explicit directory for libnuma), but I mention it to be complete.

--
Jeff Squyres
Cisco Systems
Re: [OMPI users] Incorrect results with MPI-IO under OpenMPI v1.3.1
Can either of you provide a small example that shows the problem, perchance?

On Apr 6, 2009, at 4:41 PM, Yvan Fournier wrote:

Hello to all,

I have also encountered a similar bug with MPI-IO with Open MPI 1.3.1, reading a Code_Saturne preprocessed mesh file (www.code-saturne.org). Reading the file can be done using 2 MPI-IO modes, or one non-MPI-IO mode.

The first MPI-IO mode uses individual file pointers, and involves a series of MPI_File_read_all calls with all ranks using the same view (for record headers), interlaced with MPI_File_read_all calls with ranks using different views (for record data, successive blocks being read by each rank).

The second MPI-IO mode uses explicit file offsets, with MPI_File_read_at_all instead of MPI_File_read_all.

Both MPI-IO modes seem to work fine with Open MPI 1.2, MPICH2, and variants on IBM Blue Gene/L and P, as well as Bull Novascale, but with Open MPI 1.3.1 the data read seems to be corrupt on at least one file using the individual-file-pointer approach (though it works well using explicit offsets).

The bug does not appear in unit tests, and it only appears after several records are read on the case that does fail (on 2 ranks), so to reproduce it with a simple program I would have to extract the exact file access patterns from the exact case that fails, which would require a few extra hours of work. If the bug is not reproduced in a simpler manner first, I will try to build a simple program reproducing the bug within a week or two, but in the meantime I just want to confirm Scott's observation (hoping it is the same bug).

Best regards,

Yvan Fournier

On Mon, 2009-04-06 at 16:03 -0400, users-requ...@open-mpi.org wrote:
> Date: Mon, 06 Apr 2009 12:16:18 -0600
> From: Scott Collis
> Subject: [OMPI users] Incorrect results with MPI-IO under OpenMPI v1.3.1
> To: us...@open-mpi.org
>
> I have been a user of MPI-IO for 4+ years and have a code that has run
> correctly with MPICH, MPICH2, and OpenMPI 1.2.*
>
> I recently upgraded to OpenMPI 1.3.1 and immediately noticed that my
> MPI-IO generated output files are corrupted. I have not yet had a
> chance to debug this in detail, but it appears that MPI_File_write_all()
> commands are not placing information correctly on their file_view when
> running with more than 1 processor (everything is okay with -np 1).
>
> Note that I have observed the same incorrect behavior on both Linux
> and OS X. I have also gone back and made sure that the same code works
> with MPICH, MPICH2, and OpenMPI 1.2.*, so I'm fairly confident that
> something has been changed or broken as of OpenMPI 1.3.*. Just today,
> I checked out the SVN repository version of OpenMPI, built and tested
> my code with that, and the results are incorrect just as for the 1.3.1
> tarball.
>
> While I plan to continue to debug this and will try to put together a
> small test that demonstrates the issue, I thought that I would first
> send out this message to see if this might trigger a thought within
> the OpenMPI development team as to where this issue might be.
>
> Please let me know if you have any ideas as I would very much
> appreciate it!
>
> Thanks in advance,
>
> Scott
> --
> Scott Collis
> sscol...@me.com

--
Jeff Squyres
Cisco Systems
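For context, a minimal, hypothetical sketch of the two MPI-IO read modes Yvan describes -- individual file pointers (a per-rank view plus MPI_File_read_all) versus explicit offsets (MPI_File_read_at_all). The file name, block size, and layout are illustrative assumptions, not Code_Saturne's actual I/O code:

#include <mpi.h>

#define BLOCK 1024   /* bytes per rank; illustrative only */

int main(int argc, char **argv)
{
    int rank;
    MPI_File fh;
    char buf[BLOCK];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Assumes "mesh.dat" exists and holds at least nprocs * BLOCK bytes. */
    MPI_File_open(MPI_COMM_WORLD, "mesh.dat", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);

    /* Mode 1: individual file pointers. Each rank sets a view whose
     * displacement selects its own block, then all ranks call
     * MPI_File_read_all collectively. */
    MPI_Offset disp = (MPI_Offset)rank * BLOCK;
    MPI_File_set_view(fh, disp, MPI_BYTE, MPI_BYTE, "native", MPI_INFO_NULL);
    MPI_File_read_all(fh, buf, BLOCK, MPI_BYTE, MPI_STATUS_IGNORE);

    /* Mode 2: explicit offsets. Reset the view and pass the offset
     * directly to the collective read call. */
    MPI_File_set_view(fh, 0, MPI_BYTE, MPI_BYTE, "native", MPI_INFO_NULL);
    MPI_File_read_at_all(fh, disp, buf, BLOCK, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}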
Re: [OMPI users] openmpi src rpm and message coalesce
On Apr 10, 2009, at 9:54 AM, vkm wrote:

> I was trying to understand how "btl_openib_use_message_coalescing" is working.

Heh. It's ugly. :-)

It's purely a benchmark optimization; there are very few (if any) real-world apps that will benefit from this feature. I freely admit that we were pressured by marketing types to put in this feature (despite resisting this feature for a year or two).

Basically, if you're sending the exact same message to the same MPI peer repeatedly, and if you run out of networking buffers (e.g., you're waiting for the current set of messages to drain before any more network buffers become available), and if you notice that the last message on the queue is exactly the same as your message, then you can just increment a counter on that last message. This effectively means that when you send that last message, you are sending N (where N == the counter) messages in that one fragment. The receiver knows/understands this optimization and will match N posted MPI receives against that one incoming message.

This is a bunch of logic that was added that benefits benchmarks but not real apps. Yuck. :-(

> For a certain test scenario, IMB-EXT is working if I use
> "btl_openib_use_message_coalescing = 0" and not with
> "btl_openib_use_message_coalescing = 1". No idea who has the bug here --
> Open MPI or the low-level driver?

Could this be related to http://www.open-mpi.org/community/lists/announce/2009/03/0029.php ?

> Howsoever, I have one more concern as well. I added some prints to debug
> Open MPI. I was following the procedure below:
>
> Extract the OFED tarball
> Extract openmpi*.src.rpm
> Go to SOURCE
> Extract openmpi*.tgz
> Modify the code
> Create the tarball
> Create openmpi*.src.rpm
> Build the rpm

It is probably a whole lot simpler / faster to just get a source tarball from www.open-mpi.org and build / install it manually (rather than create a new RPM every time). Particularly if you're adding printf's in Open MPI components -- you can just "make install" directly from the component directory (which will compile and install just that plugin -- not all of OMPI).

Note, too, that you might want to use opal_output(0, "printf-like string with %d, %s, ...etc.", ...printf-like varargs) for debugging output instead of printf.

--
Jeff Squyres
Cisco Systems
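A purely conceptual sketch of the coalescing logic Jeff describes -- not Open MPI's actual openib BTL code; the types and names below are invented for illustration:

#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for a queued network fragment. */
typedef struct pending_frag {
    struct pending_frag *next;
    const void *payload;
    size_t len;
    int peer;
    int coalesce_count;  /* how many logical MPI sends this fragment carries */
} pending_frag_t;

/* Try to coalesce a new send onto the tail of the pending queue.
 * Returns 1 if coalesced (no new fragment needed), 0 otherwise. */
int try_coalesce(pending_frag_t *tail, int peer,
                 const void *payload, size_t len)
{
    if (tail != NULL &&
        tail->peer == peer &&
        tail->len == len &&
        memcmp(tail->payload, payload, len) == 0) {
        /* Same message to the same peer while buffers are exhausted:
         * count it instead of queuing it. The receiver matches
         * coalesce_count posted receives against the single incoming
         * fragment, as described above. */
        tail->coalesce_count++;
        return 1;
    }
    return 0;
}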
Re: [OMPI users] mpirun: symbol lookup error: /usr/local/lib/openmpi/mca_plm_lsf.so: undefined symbol: lsb_init
Chris graciously gave me access to his machines to test this on. With this access, I found the problem and scheduled the fix to be applied to the 1.3 series:

https://svn.open-mpi.org/trac/ompi/attachment/ticket/1885

Thanks Chris!

On Apr 11, 2009, at 11:04 PM, Chris Walker wrote:

> We're having this same problem with 1.3 and 1.3.1. In our case, it looks
> like mca_plm_lsf.so doesn't load libbat or liblsf:
>
> [root@hero0101 openmpi]# ldd mca_plm_lsf.so
>         libnsl.so.1 => /lib64/libnsl.so.1 (0x2adbec183000)
>         libutil.so.1 => /lib64/libutil.so.1 (0x2adbec39b000)
>         libm.so.6 => /lib64/libm.so.6 (0x2adbec59e000)
>         libpthread.so.0 => /lib64/libpthread.so.0 (0x2adbec822000)
>         libc.so.6 => /lib64/libc.so.6 (0x2adbeca3c000)
>         /lib64/ld-linux-x86-64.so.2 (0x003945c0)
> [root@hero0101 openmpi]#
>
> mca_ess_lsf.so and mca_ras_lsf.so both do, however, e.g.:
>
> [root@hero0101 openmpi]# ldd mca_ras_lsf.so
>         libbat.so => /lsf/7.0/linux2.6-glibc2.3-x86_64/lib/libbat.so (0x2b86740ee000)
>         liblsf.so => /lsf/7.0/linux2.6-glibc2.3-x86_64/lib/liblsf.so (0x2b8674384000)
>         libnsl.so.1 => /lib64/libnsl.so.1 (0x2b8674693000)
>         libutil.so.1 => /lib64/libutil.so.1 (0x2b86748ac000)
>         libm.so.6 => /lib64/libm.so.6 (0x2b8674aaf000)
>         libpthread.so.0 => /lib64/libpthread.so.0 (0x2b8674d32000)
>         libc.so.6 => /lib64/libc.so.6 (0x2b8674f4d000)
>         /lib64/ld-linux-x86-64.so.2 (0x003945c0)
> [root@hero0101 openmpi]#
>
> Chris

--
Jeff Squyres
Cisco Systems
Re: [OMPI users] libnuma issue
I used --with-libnuma=/usr per Prentice Bisbal's suggestion and it worked. Unfortunately, I found no way to fix the failure in finding libimf.so when compiling openmpi-1.3.1 with the Intel compilers, as you have seen in my other e-mail. And the GNU compilers (which work well with both Open MPI and the slower code of my application) are defeated by the faster code of my application. With limited hardware resources, I must rely on that 40% speedup.

My post about how to implement the -rpath flag, which should fix the libimf.so problem, has found no answer, and I found no help through Google. If you have a suggestion on that it would be great. I was referring to the following notes:

"However, dynamic linkage is also a headache in that the mechanisms used to find shared libraries during dynamic loading are not all that robust on Linux systems running MPICH or other MPI packages. Typically the LD_LIBRARY_PATH environment variable will be used to find shared libraries during loading, but this variable is not reliably propagated to all processes. For this reason, for the compilers that use compiler shared libraries (ifort, pathscale), we use LD_LIBRARY_PATH during configuration to set an -rpath linkage option, which is reliably available in the executable. This works well as long as you insure that the path is the same for all machines running pmemd. Earlier versions of ifort actually also set -rpath, but this was dropped due to confusing error messages when ifort is executed without arguments."

thanks
francesco

On Wed, Apr 15, 2009 at 1:04 PM, Jeff Squyres wrote:
> On Apr 6, 2009, at 4:24 PM, Prentice Bisbal wrote:
>
>> > I would appreciate help in circumventing the problem.
>>
>> I believe you need --with-libnuma=/usr.
>
> Sorry for the late reply.
>
> FWIW, the above is correct -- you should use --with-libnuma=/usr, not
> --with-libnuma=/usr/lib.
>
> Please also note in this thread:
>
> http://www.open-mpi.org/community/lists/users/2009/04/8853.php
>
> We found some obscure logic issues with the handling of --with-libnuma. I
> doubt that those should affect you (since you're mentioning an explicit
> directory for libnuma), but I mention it to be complete.
>
> --
> Jeff Squyres
> Cisco Systems
Re: [OMPI users] libnuma issue
Francesco Pietra wrote:
> I used --with-libnuma=/usr per Prentice Bisbal's suggestion and it
> worked. Unfortunately, I found no way to fix the failure in finding
> libimf.so when compiling openmpi-1.3.1 with the Intel compilers, as
> you have seen in my other e-mail. And the GNU compilers (which work
> well with both Open MPI and the slower code of my application) are
> defeated by the faster code of my application. With limited hardware
> resources, I must rely on that 40% speedup.

To fix the libimf.so problem you need to include the path to Intel's libimf.so in your LD_LIBRARY_PATH environment variable. On my system, I installed v11.074 of the Intel compilers in /usr/local/intel, so my libimf.so file is located here:

/usr/local/intel/Compiler/11.0/074/lib/intel64/libimf.so

So I just add that to my LD_LIBRARY_PATH:

LD_LIBRARY_PATH=/usr/local/intel/Compiler/11.0/074/lib/intel64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

Now I can run whatever programs need libimf.so without any problems. In your case, you'll want to do that before your make command.

Here's exactly what I use to compile OpenMPI with the Intel Compilers:

export PATH=/usr/local/intel/Compiler/11.0/074/bin/intel64:$PATH

export LD_LIBRARY_PATH=/usr/local/intel/Compiler/11.0/074/lib/intel64:$LD_LIBRARY_PATH

../configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/usr/local/openmpi-1.2.8/intel-11/x86_64 --disable-ipv6 --with-sge --with-openib --enable-static

--
Prentice
Re: [OMPI users] libnuma issue
On Wed, Apr 15, 2009 at 8:39 PM, Prentice Bisbal wrote:
> Francesco Pietra wrote:
>> I used --with-libnuma=/usr per Prentice Bisbal's suggestion and it
>> worked. Unfortunately, I found no way to fix the failure in finding
>> libimf.so when compiling openmpi-1.3.1 with the Intel compilers, as
>> you have seen in my other e-mail. And the GNU compilers (which work
>> well with both Open MPI and the slower code of my application) are
>> defeated by the faster code of my application. With limited hardware
>> resources, I must rely on that 40% speedup.
>
> To fix the libimf.so problem you need to include the path to Intel's
> libimf.so in your LD_LIBRARY_PATH environment variable. On my system, I
> installed v11.074 of the Intel compilers in /usr/local/intel, so my
> libimf.so file is located here:
>
> /usr/local/intel/Compiler/11.0/074/lib/intel64/libimf.so
>
> So I just add that to my LD_LIBRARY_PATH:
>
> LD_LIBRARY_PATH=/usr/local/intel/Compiler/11.0/074/lib/intel64:$LD_LIBRARY_PATH
> export LD_LIBRARY_PATH

Just a clarification: on my system I use the latest version 10 Intel compilers, 10.1.2.024, and MKL 10.1.2.024, because it proved difficult to make a Debian package with version 11.

echo $LD_LIBRARY_PATH gives:

/opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:opt/intel/fce/10.1.022/lib:/usr/local/lib

(that /lib contains libimf.so)

That results from sourcing in my .bashrc:

. /opt/intel/fce/10.1.022/bin/ifortvars.sh
. /opt/intel/cce/10.1.022/bin/iccvars.sh

Did you suppress that sourcing before exporting LD_LIBRARY_PATH to the library at issue? Having turned the problem around so much, it is not unlikely that I am confusing myself.

thanks
francesco

> Now I can run whatever programs need libimf.so without any problems. In
> your case, you'll want to do that before your make command.
>
> Here's exactly what I use to compile OpenMPI with the Intel Compilers:
>
> export PATH=/usr/local/intel/Compiler/11.0/074/bin/intel64:$PATH
>
> export LD_LIBRARY_PATH=/usr/local/intel/Compiler/11.0/074/lib/intel64:$LD_LIBRARY_PATH
>
> ../configure CC=icc CXX=icpc F77=ifort FC=ifort
> --prefix=/usr/local/openmpi-1.2.8/intel-11/x86_64 --disable-ipv6
> --with-sge --with-openib --enable-static
>
> --
> Prentice
Re: [OMPI users] libnuma issue
You could try statically linking the Intel-provided libraries. Use LDFLAGS=-static-intel

--Nysal

On Wed, 2009-04-15 at 21:03 +0200, Francesco Pietra wrote:
> On Wed, Apr 15, 2009 at 8:39 PM, Prentice Bisbal wrote:
> > Francesco Pietra wrote:
> >> I used --with-libnuma=/usr per Prentice Bisbal's suggestion and it
> >> worked. Unfortunately, I found no way to fix the failure in finding
> >> libimf.so when compiling openmpi-1.3.1 with the Intel compilers, as
> >> you have seen in my other e-mail. And the GNU compilers (which work
> >> well with both Open MPI and the slower code of my application) are
> >> defeated by the faster code of my application. With limited hardware
> >> resources, I must rely on that 40% speedup.
> >
> > To fix the libimf.so problem you need to include the path to Intel's
> > libimf.so in your LD_LIBRARY_PATH environment variable. On my system, I
> > installed v11.074 of the Intel compilers in /usr/local/intel, so my
> > libimf.so file is located here:
> >
> > /usr/local/intel/Compiler/11.0/074/lib/intel64/libimf.so
> >
> > So I just add that to my LD_LIBRARY_PATH:
> >
> > LD_LIBRARY_PATH=/usr/local/intel/Compiler/11.0/074/lib/intel64:$LD_LIBRARY_PATH
> > export LD_LIBRARY_PATH
>
> Just a clarification: on my system I use the latest version 10 Intel
> compilers, 10.1.2.024, and MKL 10.1.2.024, because it proved difficult
> to make a Debian package with version 11.
>
> echo $LD_LIBRARY_PATH gives:
>
> /opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:opt/intel/fce/10.1.022/lib:/usr/local/lib
>
> (that /lib contains libimf.so)
>
> That results from sourcing in my .bashrc:
>
> . /opt/intel/fce/10.1.022/bin/ifortvars.sh
> . /opt/intel/cce/10.1.022/bin/iccvars.sh
>
> Did you suppress that sourcing before exporting LD_LIBRARY_PATH to
> the library at issue? Having turned the problem around so much, it is
> not unlikely that I am confusing myself.
>
> thanks
> francesco
>
> > Now I can run whatever programs need libimf.so without any problems. In
> > your case, you'll want to do that before your make command.
> >
> > Here's exactly what I use to compile OpenMPI with the Intel Compilers:
> >
> > export PATH=/usr/local/intel/Compiler/11.0/074/bin/intel64:$PATH
> >
> > export LD_LIBRARY_PATH=/usr/local/intel/Compiler/11.0/074/lib/intel64:$LD_LIBRARY_PATH
> >
> > ../configure CC=icc CXX=icpc F77=ifort FC=ifort
> > --prefix=/usr/local/openmpi-1.2.8/intel-11/x86_64 --disable-ipv6
> > --with-sge --with-openib --enable-static
> >
> > --
> > Prentice