[OMPI devel] MPI_Ireduce_scatter_block hangs
Hello,

With a single process, MPI_Ireduce_scatter_block segfaults with v1.9a1r26786.
In the other cases (commsize >= 2), processes hang in MPI_Ireduce_scatter_block:
NBC_Progress hangs while processing its rounds.

The following example illustrates this problem:

$ cat ireduce_scatter_block_test.c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Request req;
    MPI_Status status;
    double *sbuf, *rbuf;
    int commsize, i, j, count = 10;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &commsize);

    sbuf = malloc(sizeof(*sbuf) * count * commsize);
    rbuf = malloc(sizeof(*rbuf) * count);
    for (i = 0; i < commsize; i++) {
        for (j = 0; j < count; j++) {
            sbuf[i * count + j] = 1.0;
        }
    }

    MPI_Ireduce_scatter_block(sbuf, rbuf, count, MPI_DOUBLE,
                              MPI_SUM, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, &status);

    free(rbuf);
    free(sbuf);
    MPI_Finalize();

    return 0;
}

--
Mikhail Kurnosov
Computer Systems Department
Siberian State University of Telecommunications and Information Sciences
Address: 630102, 86 Kirova str., Novosibirsk, Russia
Email: mkurno...@gmail.com
http://cpct.sibsutis.ru/~mkurnosov
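A useful way to narrow the failure to the non-blocking (libNBC) code path is to run the same reduction with the blocking MPI_Reduce_scatter_block, which is expected to complete. The sketch below is not part of the original report; it only mirrors the reproducer's buffers and parameters.

/* Cross-check sketch (not from the original report): the blocking variant of
 * the same collective on the same buffers; each rank receives 'count' reduced
 * doubles, so rbuf[0] should equal commsize. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    double *sbuf, *rbuf;
    int commsize, rank, i, count = 10;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &commsize);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    sbuf = malloc(sizeof(*sbuf) * count * commsize);
    rbuf = malloc(sizeof(*rbuf) * count);
    for (i = 0; i < count * commsize; i++)
        sbuf[i] = 1.0;

    /* Blocking reduce-scatter with a fixed block size per rank. */
    MPI_Reduce_scatter_block(sbuf, rbuf, count, MPI_DOUBLE,
                             MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("blocking reduce_scatter_block: rbuf[0] = %.1f (expected %d.0)\n",
               rbuf[0], commsize);

    free(rbuf);
    free(sbuf);
    MPI_Finalize();
    return 0;
}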
[OMPI devel] Still bothered / cannot run an application
(cross-post to the 'users' and 'devel' mailing lists)

Dear Open MPI developers,

a long time ago I reported an error in Open MPI:
http://www.open-mpi.org/community/lists/users/2012/02/18565.php

In 1.6 the behaviour has changed: the test case no longer hangs forever and
blocks an InfiniBand interface, but seems to run through, and now this error
message is printed:

--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to register memory in the driver.
Please check /var/log/messages or dmesg for driver specific failure
reason.
The failure occured here:

  Local host:
  Device:      mlx4_0
  Function:    openib_reg_mr()
  Errno says:  Cannot allocate memory

You may need to consult with your system administrator to get this
problem fixed.
--------------------------------------------------------------------------

Looking into the FAQ
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
gives no hint about what is wrong. The locked memory is unlimited:

--------------------------------------------------------------------------
pk224850@linuxbdc02:~[502]$ cat /etc/security/limits.conf | grep memlock
#        - memlock - max locked-in-memory address space (KB)
*               hard    memlock         unlimited
*               soft    memlock         unlimited
--------------------------------------------------------------------------

Could it still be an Open MPI issue? Are you interested in reproducing this?

Best,
Paul Kapinos

P.S. The same test with Intel MPI cannot run over DAPL, but runs fine over
'ofa' (= native verbs, as Open MPI uses it). So I believe the problem is
rooted in the communication pattern of the program: it sends very LARGE
messages to a lot of (or all) other processes. (The program performs a
matrix transposition of a distributed matrix.)

--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
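One thing worth ruling out is that the limits.conf settings are not what the MPI processes actually inherit, since processes launched through a daemon may not pick up login-shell limits. A small check (not from the original mail) is to query RLIMIT_MEMLOCK from inside an MPI program:

/* Sketch (not from the original report): print the locked-memory limit each
 * MPI process actually sees at runtime, independent of limits.conf. */
#include <stdio.h>
#include <sys/resource.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    struct rlimit rl;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("rank %d: RLIMIT_MEMLOCK is unlimited\n", rank);
        else
            printf("rank %d: RLIMIT_MEMLOCK soft limit = %llu bytes\n",
                   rank, (unsigned long long) rl.rlim_cur);
    }

    MPI_Finalize();
    return 0;
}

If every rank reports "unlimited", the registration failure is more likely a driver or registered-memory exhaustion issue than a ulimit problem.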
Re: [OMPI devel] [EXTERNAL] MPI_Ireduce_scatter_block hangs
Hello -

Thank you for the bug report. This has been fixed in the trunk.

Brian

On 7/12/12 1:46 AM, "Mikhail Kurnosov" wrote:
> Hello,
>
> With a single process, MPI_Ireduce_scatter_block segfaults with
> v1.9a1r26786. In the other cases (commsize >= 2), processes hang in
> MPI_Ireduce_scatter_block: NBC_Progress hangs while processing its rounds.
>
> [...]
[OMPI devel] Summary of the problem with r26626
After some digging, Terry and I discovered the problem with r26626.

To perform an RDMA transaction, the PMLs used to explicitly promote the
seg_addr from prepare_src/dst to 64 bits before sending it over the wire. The
other end would then (inconsistently) use the lval to perform the get/put.
Segments are now opaque objects, so the PMLs simply memcpy the segments into
the RDMA header (without promoting seg_addr). So right now we have a mixture
of lvals and pvals in the put and get paths, which will not work in two cases:
32-bit environments and mixed 32/64-bit environments.

I can think of a few ways to fix this:

 - Require the PMLs to explicitly promote seg_addr to 64 bits after the
   memcpy. This is a band-aid fix, but I can implement/commit it very quickly
   (it will work fine until a more permanent solution is found). A sketch of
   what this promotion might look like follows below.

 - Require prepare_src/dst to return segments with 64-bit addresses for all
   RDMA fragments (0 == reserve). This is relatively simple for most BTLs but
   a little more complicated for openib. The openib BTL may pack data for a
   get/put into a send segment. The obvious way to handle this case is to set
   the lval in prepare_src and restore the pval when the send fragment is
   returned.

 - Change the BTL interface in a way that allows the BTL to prepare segments
   specifically to be sent to another machine. This is a bit more complicated
   and would require lots of discussion and an RFC.

I am open to suggestions.

-Nathan
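For readers unfamiliar with the segment layout, here is a rough sketch of the first (band-aid) option. The struct and field names are illustrative assumptions modeled on OMPI's BTL base segment, whose address is a union of a 64-bit lval and a pointer-sized pval; they are not the exact trunk code.

/* Illustrative sketch only: type and field names are assumptions, not the
 * actual OMPI trunk definitions. */
#include <stdint.h>
#include <string.h>

typedef union {
    uint64_t lval;   /* wire representation: always 64 bits      */
    void    *pval;   /* local representation: 32 or 64 bits wide */
} seg_addr_t;

typedef struct {
    seg_addr_t seg_addr;
    uint64_t   seg_len;
} segment_t;

/* Band-aid fix: after memcpy'ing the opaque segment into the RDMA header,
 * promote the local pointer to a 64-bit value so that 32-bit and 64-bit
 * peers agree on what is on the wire. */
static void pack_segment_for_wire(segment_t *dst, const segment_t *src)
{
    memcpy(dst, src, sizeof(*dst));
    dst->seg_addr.lval = (uint64_t)(uintptr_t) src->seg_addr.pval;
}

The receiving peer would then always read seg_addr.lval, matching the pre-r26626 behavior described above.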
Re: [OMPI devel] RFC: enable the use of source in platform files
I wouldn't consider sourced variables being overwritten by the sourcing
platform file a problem. I can update the platform file documentation to make
sure this behavior is clear. Can you point me at the right FAQ page?

-Nathan

On Mon, Jul 09, 2012 at 08:36:41PM -0700, Ralph Castain wrote:
> Okay, it took me awhile to grok thru all this, and I now understand how it
> is working. You do have a question, though, with duplicated entries. At the
> moment, we ignore any entry that is duplicated on the configure cmd line -
> i.e., if you set something in a platform file, and then attempt to also set
> it on the cmd line, we ignore the cmd line (without warning). In this case,
> an entry in the first file that is duplicated in the second file gets
> overwritten, also without warning.
>
> Dunno if that's an issue or not, but something to be aware of.
>
> On Jul 9, 2012, at 4:52 PM, Ralph Castain wrote:
>
> > I keep scratching my head over this, and I just cannot figure out how
> > this is going to do what you think. We "source" the platform file solely
> > to execute any envar settings that are in it - i.e., to capture the
> > CFLAGS=... and other such directives. We then read/parse the platform
> > file to get all the configure directives - e.g., enable_debug=yes.
> > Sourcing the platform file will set the envars, but won't capture the
> > rest of the directives.
> >
> > Am I missing something here? It doesn't sound like you've really even
> > tried this yet - sure, chaining "source" commands will work, but do you
> > actually get the desired configuration?
> >
> > Hence my comment about needing to modify the parser so it ALSO follows
> > the "source" directive.
> >
> > On Jul 9, 2012, at 3:58 PM, Nathan Hjelm wrote:
> >
> >> On Mon, Jul 09, 2012 at 03:31:33PM -0700, Ralph Castain wrote:
> >>> So if I understand this right, you would have multiple platform files,
> >>> each "sourcing" a common one that contains the base directives? It
> >>> sounds to me like you need more than the change below to make that
> >>> work - you would need to interpret the platform file itself to read
> >>> and execute a "source" directive inside it, wouldn't you?
> >>
> >> That is exactly what I want to do. The change in the RFC is the only
> >> one needed, because platform files are sourced by ompi_load_platform.m4.
> >> This means platform files can contain arbitrary m4/shell code (including
> >> the source directive)! I tried my patch with a one-line platform file
> >> that sourced an existing platform file, and it worked as expected.
> >>
> >>> It would really help if your change (either comments or the RFC)
> >>> actually explained what the heck you are doing so I wouldn't have to
> >>> waste hours trying to figure out the impact of this patch :-/
> >>
> >> The RFC does explain what the patch does, but I guess I could have
> >> elaborated on the implications.
> >>
> >> Before the patch, we source the platform file and then cd into the
> >> platform directory to find the MCA parameters file. If a platform file
> >> had a source directive, the sourced path would have to be relative to
> >> the build directory (or absolute). By cd'ing into the platform file's
> >> directory before sourcing it, source directives become relative to the
> >> platform file's directory (or absolute). There is no impact outside of
> >> m4/shell commands within the platform file that read/write/stat files.
> >>
> >> I will add some additional comments before the commit (if there are no
> >> objections, of course) elaborating on the change.
> >>
> >> -Nathan
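To make the "one-line platform file that sources an existing platform file" concrete, here is a hypothetical example. The file and option names are illustrative only; the directives shown (source, enable_debug, CFLAGS) are the kinds of entries discussed in the thread, not an actual file from the tree.

# contrib/platform/<site>/debug-build   (hypothetical file)
# With the RFC applied, this relative path is resolved against this file's
# own directory rather than the build directory.
source ./common-base

# Entries here override anything the sourced file already set, silently,
# as noted above in the thread.
enable_debug=yes
CFLAGS="-g -O0"

The override order matters: whatever the sourced common file sets is taken first, and any duplicated entry in the outer file wins without a warning.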