[OMPI devel] Defect from ticket #3079 still present in 1.6.1rc1
Hello,

Three months ago I opened a ticket about an extra local data copy being made in the pairwise alltoallv implementation in the "tuned" module, which can hurt performance in some cases: https://svn.open-mpi.org/trac/ompi/ticket/3079

As far as I can see, the milestone was set to Open MPI 1.6.1, and although the issue is quite trivial to fix (I submitted an appropriate patch with the ticket), the defect is still present in the latest revision of the 1.6 branch and also in trunk. Given that in most cluster setups OMPI ends up using "tuned", and that 1.6.1rc1 makes the pairwise algorithm the default, shouldn't this defect have been fixed by now?

Kind regards,
Hristo Iliev

--
Hristo Iliev, Ph.D. -- High Performance Computing
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367
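For readers unfamiliar with the algorithm in question, the following is a minimal, self-contained sketch of a pairwise all-to-all exchange. It is not the actual coll/tuned source; it only illustrates the point of the ticket: the process's own block can be moved with a single direct copy instead of being routed through the generic send/receive machinery, which stages it in an intermediate buffer (the extra local copy).

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size > 4) MPI_Abort(MPI_COMM_WORLD, 1);   /* keep the toy buffers small */

    int sendbuf[4] = { rank, rank, rank, rank };  /* one int destined for each peer */
    int recvbuf[4] = { -1, -1, -1, -1 };
    int counts[4]  = { 1, 1, 1, 1 };
    int displs[4]  = { 0, 1, 2, 3 };              /* contiguous layout */

    /* Local ("self") step: a single direct copy from sendbuf to recvbuf.
     * Pushing this block through the send/receive path instead would stage
     * it in an intermediate buffer - the extra copy the ticket is about. */
    memcpy(&recvbuf[displs[rank]], &sendbuf[displs[rank]],
           counts[rank] * sizeof(int));

    /* Remaining steps of the pairwise schedule: exchange with rank +/- step. */
    for (int step = 1; step < size; step++) {
        int sendto   = (rank + step) % size;
        int recvfrom = (rank - step + size) % size;
        MPI_Sendrecv(&sendbuf[displs[sendto]],   counts[sendto],   MPI_INT, sendto,   0,
                     &recvbuf[displs[recvfrom]], counts[recvfrom], MPI_INT, recvfrom, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    printf("rank %d received:", rank);
    for (int i = 0; i < size; i++) printf(" %d", recvbuf[i]);
    printf("\n");

    MPI_Finalize();
    return 0;
}

Run with up to four processes; every rank should end up holding one value from each peer, with its own value placed by the direct copy rather than by a send/receive pair.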
[OMPI devel] Broken password recovery functionality of the Trac system
Hello,

The password recovery functionality of the Trac system at svn.open-mpi.org appears to be broken. After providing the required user name and e-mail address, one is presented with the following "Oops":

  Trac detected an internal error:
  UnboundLocalError: local variable 'u_id' referenced before assignment

  There was an internal error in Trac. It is recommended that you notify
  your local Trac administrator with the information needed to reproduce
  the issue. To that end, you could [Create] a ticket.

  The action that triggered the error was:
  POST: /reset_password

I am being sent an e-mail containing the new password but I am unable to log in with it. I've tried it several times - 100% reproducible behaviour.

Kind regards,
Hristo

--
Hristo Iliev, Ph.D. -- High Performance Computing
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367
Re: [OMPI devel] MPI_Reduce() is losing precision
Hi Santhosh,

Numeric differences are to be expected in parallel applications. The basic reason is that on many architectures floating-point operations are performed using a higher internal precision than that of the arguments, and only the final result is rounded back to the lower output precision. When the same operation is performed in parallel, intermediate results are communicated using the lower precision, and thus the final result can differ. How much it differs depends on the stability of the algorithm: it could be a slight difference in the last 1-2 significant bits, or it could be a completely different result (e.g. when integrating chaotic dynamical systems).

In your particular case, with one process the MPI_Reduce effectively becomes a no-op and the summing is done entirely in the preceding loop. With two processes the sum is broken into two parts, which are computed with higher precision but converted to float before being communicated. You could try to "cure" this (non-)problem by telling your compiler not to use higher precision for intermediate results.

Hope that helps,
Hristo

--
Hristo Iliev, Ph.D. -- High Performance Computing
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)

From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Santhosh Kokala
Sent: Monday, October 15, 2012 8:07 AM
To: Open MPI Developers
Subject: [OMPI devel] MPI_Reduce() is losing precision

Hi All,

I am having a strange problem with floating-point precision. I get the correct precision when I launch with one process, but when the same code is launched with 2 or more processes I lose precision in the MPI_Reduce(..., MPI_FLOAT, MPI_SUM, ...) call.

Output from my code:

(admin)host:~$ mpirun -np 1 string 10 0.1 0.9 10 3
sum = 1
sum = 0.92
sum = 1.00043

(admin)host:~$ mpirun -np 2 string 10 0.1 0.9 10 3
sum = 1
sum = 1
sum = 1.00049

As you can see, I am losing precision. Can someone help me fix this code? The last parameter to my code is the number of iterations. I am attaching the source code to this email.

Santhosh
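To make the effect concrete, here is a small, self-contained C program (not from the thread; the input values are deliberately contrived) comparing the two situations described above: carrying the whole sum in double and rounding to float once at the end, versus rounding two partial sums to float first and then adding them, which is effectively what happens when the data is split across two ranks and combined with MPI_SUM on MPI_FLOAT.

#include <stdio.h>

int main(void)
{
    /* Four values chosen so the rounding difference shows up deterministically
     * under the default round-to-nearest-even mode: 0x1p-24 is 2^-24, i.e.
     * half a float ulp at 1.0. */
    const double data[4] = { 1.0, 0x1p-24, 0x1p-24, 0x1p-24 };

    /* "One process": the whole sum is carried in higher precision and rounded
     * to float only once at the end. */
    double full = 0.0;
    for (int i = 0; i < 4; i++) full += data[i];
    float serial = (float)full;

    /* "Two processes": each partial sum is rounded to float before the
     * reduction sees it. */
    float part0 = (float)(data[0] + data[1]);
    float part1 = (float)(data[2] + data[3]);
    float parallel = part0 + part1;

    printf("serial   = %.8f\n", serial);
    printf("parallel = %.8f\n", parallel);
    return 0;
}

On an IEEE-754 system with the default rounding mode this should print 1.00000024 for the serial sum and 1.00000012 for the split sum: the same numbers, added in the same order, differing only in where the rounding to float happens.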
Re: [OMPI devel] FOSS for scientists devroom at FOSDEM 2013
I might attend out of curiosity - Brussels is just an hour or so from here.

Kind regards,
Hristo

--
Hristo Iliev, Ph.D. -- High Performance Computing
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)

> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
> On Behalf Of Jeff Squyres
> Sent: Tuesday, November 20, 2012 4:17 PM
> To: Open MPI Developers
> Cc: foss4scientists-devr...@lists.fosdem.org
> Subject: Re: [OMPI devel] FOSS for scientists devroom at FOSDEM 2013
>
> Cool! Thanks for the invite.
>
> Do we have any European friends who would be able to attend this
> conference?
>
>
> On Nov 20, 2012, at 10:02 AM, Sylwester Arabas wrote:
>
> > Dear Open MPI Team,
> >
> > A day-long session ("devroom") on Free/Libre and Open Source Software
> > (FLOSS) for scientists will be held during the next FOSDEM conference,
> > Brussels, 2-3 February 2013 (http://fosdem.org/2013).
> >
> > We aim at having a dozen or two short talks introducing projects,
> > advertising brand new features of established tools, discussing issues
> > relevant to the development of software for scientific computing, and
> > touching on the interdependence of FLOSS and open science.
> >
> > You can find more info on the call for talks at:
> > http://slayoo.github.com/fosdem2013/
> >
> > The deadline for sending talk proposals is December 16th 2012.
> >
> > Please send your submissions or comments to:
> > foss4scientists-devr...@lists.fosdem.org
> >
> > Please do forward this message to anyone potentially interested. Please
> > also let us know if you have any suggestions for what you would like to
> > hear about in the devroom.
> >
> > Looking forward to meeting you in Brussels.
> > Thanks in advance.
> >
> > The conveners,
> > Sylwester Arabas, Juan Antonio Añel, Christos Siopis
> >
> > P.S. There are open calls for main-track talks, lightning talks, and
> > stands at FOSDEM as well, see: https://www.fosdem.org/2013/
> >
> > --
> > http://www.igf.fuw.edu.pl/~slayoo/
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI devel] KNEM + user-space hybrid for sm BTL
Hello,

Could someone who is more familiar with the architecture of the sm BTL comment on the technical feasibility of the following: is it possible to extend the BTL easily (i.e. without having to rewrite it completely from scratch) so that it can perform transfers using both KNEM (or another kernel-assisted copying mechanism) for messages over a given size and the normal user-space mechanism for smaller messages, with the switch-over point being a user-tunable parameter? From what I've seen, both implementations have something in common, e.g. both use FIFOs to communicate control information.

The motivation behind this is our effort to become greener by extracting the best possible out-of-the-box performance from our systems without having to profile each and every user application that runs on them. We have already determined that activating KNEM really benefits some collective operations on big shared-memory systems, but the increased latency significantly slows down small-message transfers, which also hits the pipelined implementations.

sm's code doesn't seem to be very complex, but I've decided to ask first before diving any deeper.

Kind regards,
Hristo

--
Hristo Iliev, PhD - High Performance Computing Team
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)
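As a purely conceptual sketch of the hybrid being asked about: the names below (hybrid_copy, knem_threshold, and the two stand-in helpers) are hypothetical and are not sm BTL symbols, and the stand-ins simply call memcpy so the example compiles. It only shows the shape of the idea, namely a single user-tunable switch-over size deciding whether a fragment takes the kernel-assisted path or stays in user space.

#include <stdio.h>
#include <string.h>

/* Hypothetical tunable; in a real module this would presumably be exposed
 * as an MCA parameter. */
static size_t knem_threshold = 64 * 1024;

/* Stand-in for a kernel-assisted copy (KNEM or similar): a single copy,
 * but with per-call latency from entering the kernel. */
static void kernel_assisted_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Stand-in for the classic user-space path: staged through shared FIFOs,
 * low latency but an extra copy for large messages. */
static void fifo_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* The hybrid dispatch itself: large fragments take the kernel-assisted
 * path, small ones stay in user space. */
static void hybrid_copy(void *dst, const void *src, size_t len)
{
    if (len >= knem_threshold)
        kernel_assisted_copy(dst, src, len);
    else
        fifo_copy(dst, src, len);
}

int main(void)
{
    static char small[256] = "hello", big[128 * 1024];
    static char out_small[256], out_big[128 * 1024];

    hybrid_copy(out_small, small, sizeof small);  /* below threshold: user space   */
    hybrid_copy(out_big, big, sizeof big);        /* above threshold: kernel path  */
    printf("small fragment took the user-space branch, large one the kernel branch\n");
    return 0;
}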
[OMPI devel] Possible bug: rdma OSC does not progress RMA operations
Hi,

It looks like the rdma OSC component does not progress passive RMA operations at the target during calls to MPI_WIN_(UN)LOCK. As a sample case, take a master-worker program where each worker writes to an entry in an array exposed in the master's window:

MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank == 0) {
   // Master code
   MPI_Alloc_mem(size * sizeof(int), MPI_INFO_NULL, &array);
   MPI_Win_create(array, size * sizeof(int), sizeof(int),
                  MPI_INFO_NULL, MPI_COMM_WORLD, &win);
   do {
      MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
      nonzeros = /* count non-zero elements of array */;
      MPI_Win_unlock(0, win);
   } while (nonzeros < size-1);
   MPI_Win_free(&win);
   MPI_Free_mem(array);
} else {
   // Worker code
   int one = 1;
   MPI_Win_create(NULL, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
   // Postpone the RMA with a rank-specific time
   sleep(rank);
   MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
   MPI_Put(&one, 1, MPI_INT, 0, rank, 1, MPI_INT, win);
   MPI_Win_unlock(0, win);
   MPI_Win_free(&win);
}

Attached is a complete sample program. The program hangs when run with the default MCA settings:

$ mpirun -n 3 ./rma.x
[1379003818.571960] 0 workers checked in
[1379003819.571317] Worker 1 acquired lock
[1379003819.571374] Worker 1 unlocking the window
[1379003820.571342] Worker 2 acquired lock
[1379003820.571384] Worker 2 unlocking the window

On the other hand, it works as expected if pt2pt is forced:

$ mpirun --mca osc pt2pt -n 3 ./rma.x | sort
[1379003926.000442] 0 workers checked in
[1379003926.998981] Worker 1 acquired lock
[1379003926.999027] Worker 1 unlocking the window
[1379003926.999076] Worker 1 synched
[1379003926.999078] 1 workers checked in
[1379003927.998917] Worker 2 acquired lock
[1379003927.998940] Worker 2 unlocking the window
[1379003927.998962] Worker 2 synched
[1379003927.998964] 2 workers checked in
[1379003927.998973] All workers checked in
[1379003927.998996] Worker 1 done
[1379003927.998996] Worker 2 done
[1379003927.999099] Master finished

All processes are started on the same host. Open MPI is 1.6.4 without a progression thread. The output from ompi_info is attached. The same behaviour (hang with rdma, success with pt2pt) is observed when the tcp BTL is used and when all processes run on separate cluster nodes and talk via the openib BTL.

Is this a bug in the rdma OSC component, or does the sample program violate the MPI correctness requirements for RMA operations?
Kind regards,
Hristo

--
Hristo Iliev, PhD - High Performance Computing Team
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)

                 Package: Open MPI pk224...@linuxbmc0601.rz.rwth-aachen.de Distribution
                Open MPI: 1.6.4
   Open MPI SVN revision: r28081
   Open MPI release date: Feb 19, 2013
                Open RTE: 1.6.4
   Open RTE SVN revision: r28081
   Open RTE release date: Feb 19, 2013
                    OPAL: 1.6.4
       OPAL SVN revision: r28081
       OPAL release date: Feb 19, 2013
                 MPI API: 2.1
            Ident string: 1.6.4
                  Prefix: /opt/MPI/openmpi-1.6.4/linux/intel
 Configured architecture: x86_64-unknown-linux-gnu
          Configure host: linuxbmc0601.rz.RWTH-Aachen.DE
           Configured by: pk224850
           Configured on: Wed May 22 17:01:57 CEST 2013
          Configure host: linuxbmc0601.rz.RWTH-Aachen.DE
                Built by: pk224850
                Built on: Wed May 22 17:18:51 CEST 2013
              Built host: linuxbmc0601.rz.RWTH-Aachen.DE
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: icc
     C compiler absolute: /opt/intel/Compiler/11.1/080/bin/intel64/icc
  C compiler family name: INTEL
      C compiler version: 1110.20101201
            C++ compiler: icpc
   C++ compiler absolute: /opt/intel/Compiler/11.1/080/bin/intel64/icpc
      Fortran77 compiler: ifort -nofor-main -f77rtl -fpconstant -intconstant
  Fortran77 compiler abs: /opt/intel/Compiler/11.1/080/bin/intel64/ifort
      Fortran90 compiler: ifort -nofor-main
  Fortran90 compiler abs: /opt/intel/Compiler/11.1/080/bin/intel64/ifort
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: yes
          Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: no
   Heterogeneous support: yes
 mpirun default --prefix: yes
         MPI I/O support: yes
       MPI_WTIME support: gettimeofday S