[OMPI devel] MPI_Ireduce_scatter_block hangs

2012-07-12 Thread Mikhail Kurnosov

Hello,

With a single process, MPI_Ireduce_scatter_block segfaults with v1.9a1r26786.


With two or more processes (commsize >= 2), the processes hang in
MPI_Ireduce_scatter_block: NBC_Progress gets stuck while processing the rounds.


The following example illustrates this problem:

$ cat ireduce_scatter_block_test.c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Request req;
    MPI_Status status;
    double *sbuf, *rbuf;
    int commsize, i, j, count = 10;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &commsize);

    /* one block of 'count' doubles per process in the send buffer */
    sbuf = malloc(sizeof(*sbuf) * count * commsize);
    rbuf = malloc(sizeof(*rbuf) * count);
    for (i = 0; i < commsize; i++) {
        for (j = 0; j < count; j++) {
            sbuf[i * count + j] = 1.0;
        }
    }

    MPI_Ireduce_scatter_block(sbuf, rbuf, count, MPI_DOUBLE,
                              MPI_SUM, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, &status);

    free(rbuf);
    free(sbuf);
    MPI_Finalize();

    return 0;
}
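
For reference, a typical way to build and run the reproducer (the wrapper and
launcher names below are the standard Open MPI ones; the binary name is arbitrary):

$ mpicc ireduce_scatter_block_test.c -o ireduce_scatter_block_test
$ mpirun -np 1 ./ireduce_scatter_block_test    # segfaults
$ mpirun -np 2 ./ireduce_scatter_block_test    # hangs in NBC_Progress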

--
Mikhail Kurnosov
Computer Systems Department
Siberian State University of Telecommunications and Information Sciences
Address: 630102, 86 Kirova str., Novosibirsk, Russia
Email: mkurno...@gmail.com
http://cpct.sibsutis.ru/~mkurnosov


[OMPI devel] Still bothered / cannot run an application

2012-07-12 Thread Paul Kapinos

(cross-post to 'users' and 'devel' mailing lists)

Dear Open MPI developer,
a long time ago, I reported about an error in Open MPI:
http://www.open-mpi.org/community/lists/users/2012/02/18565.php

Well, in 1.6 the behaviour has changed: the test case no longer hangs forever and
blocks an InfiniBand interface, but seems to run through, and now this error
message is printed:

--
The OpenFabrics (openib) BTL failed to register memory in the driver.
Please check /var/log/messages or dmesg for driver specific failure
reason.
The failure occured here:

  Local host:
  Device:      mlx4_0
  Function:    openib_reg_mr()
  Errno says:  Cannot allocate memory

You may need to consult with your system administrator to get this
problem fixed.
--



Looking into the FAQ
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
gives us no hint about what is wrong. The locked memory is unlimited:
--
pk224850@linuxbdc02:~[502]$ cat /etc/security/limits.conf | grep memlock
#        - memlock - max locked-in-memory address space (KB)
*    hard    memlock    unlimited
*    soft    memlock    unlimited
--
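
For orientation only (this is not Open MPI code): at the verbs level, the
registration that fails here is an ibv_reg_mr() call, roughly as in the sketch
below. The function and access flags are from libibverbs; everything else is
made up for illustration.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <infiniband/verbs.h>

/* Illustration only: pinning a buffer with the HCA.  When the kernel cannot
 * pin the pages, ibv_reg_mr() returns NULL and errno is typically ENOMEM --
 * "Cannot allocate memory", as reported in the message above. */
static struct ibv_mr *register_buffer(struct ibv_pd *pd, void *buf, size_t len)
{
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (NULL == mr) {
        fprintf(stderr, "ibv_reg_mr failed: %s\n", strerror(errno));
    }
    return mr;
}

For what it's worth, limits.conf entries do not always propagate to processes
started by a resource manager or sshd, so it can be worth checking the effective
limit (ulimit -l) in the environment the MPI processes actually run in.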


Could it still be an Open MPI issue? Are you interested in reproducing this?

Best,
Paul Kapinos

P.S.: The same test with Intel MPI cannot run using DAPL, but runs very fine over
'ofa' (= native verbs, as Open MPI uses it). So I believe the problem is rooted in
the communication pattern of the program; it sends very LARGE messages to a lot
of/all other processes. (The program performs a matrix transposition of a
distributed matrix.)


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915





Re: [OMPI devel] [EXTERNAL] MPI_Ireduce_scatter_block hangs

2012-07-12 Thread Barrett, Brian W
Hello -

Thank you for the bug report.  This has been fixed in the trunk.

Brian


[OMPI devel] Summary of the problem with r26626

2012-07-12 Thread Nathan Hjelm
After some digging, Terry and I discovered the problem with r26626. To perform
an rdma transaction, pmls used to explicitly promote the seg_addr from
prepare_src/dst to 64 bits before sending it over the wire. The other end would
then (inconsistently) use the lval to perform the get/put. Segments are now
opaque objects, so the pmls simply memcpy the segments into the rdma header
(without promoting seg_addr). So, right now we have a mixture of lvals and
pvals in the put and get paths, which will not work in two cases: 32-bit
environments, and mixed 32/64-bit environments.

I can think of a few ways to fix this:

 - Require the pmls to explicitly promote seg_addr to 64 bits after the memcpy
(see the sketch below). This is a band-aid fix, but I can implement/commit it
very quickly (this will work fine until a more permanent solution is found).
 - Require prepare_src/dst to return segments with 64-bit addresses for all 
rdma fragments (0 == reserve). This is relatively simple for most btls but a 
little more complicated for openib. The openib btl may pack data for a get/put 
into a send segment. The obvious way to handle this case is to set the lval in 
prepare_src and restore the pval when the send fragment is returned.
 - Change the btl interface in a way that allows the btl to prepare segments 
specifically to be sent to another machine. This is a bit more complicated and 
would require lots of discussion and an RFC.
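
A minimal sketch of the first option, assuming the usual segment layout with a
pval/lval union for seg_addr (names are illustrative, not the actual Open MPI
structures):

#include <stdint.h>
#include <string.h>

/* Hypothetical segment layout: seg_addr is a union of a local pointer (pval)
 * and a 64-bit wire representation (lval). */
typedef struct segment {
    union {
        void     *pval;
        uint64_t  lval;
    } seg_addr;
    uint64_t seg_len;
} segment_t;

/* Copy a segment into the rdma header and promote the address so the peer can
 * always read lval, regardless of either side's word size. */
static void pack_rdma_segment(segment_t *dst, const segment_t *src)
{
    memcpy(dst, src, sizeof(*dst));
    dst->seg_addr.lval = (uint64_t)(uintptr_t) src->seg_addr.pval;
}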

I am open to suggestions.

-Nathan


Re: [OMPI devel] RFC: enable the use of source in platform files

2012-07-12 Thread Nathan Hjelm
I wouldn't consider sourced variables being overwritten by the sourcing platform
file a problem. I can update the platform file documentation to make sure this
behavior is clear. Can you point me at the right FAQ page?
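
For illustration, the kind of chaining under discussion is a platform file that
sources a shared base and then overrides a few directives; a minimal hypothetical
example (file names made up):

# my-cluster-debug: pull in the common settings, then override a few of them
source ./common-base
enable_debug=yes
CFLAGS="-O0 -g"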

-Nathan

On Mon, Jul 09, 2012 at 08:36:41PM -0700, Ralph Castain wrote:
> Okay, it took me awhile to grok thru all this, and I now understand how it is 
> working. You do have a question, though, with duplicated entries. At the 
> moment, we ignore any entry that is duplicated on the configure cmd line - 
> i.e., if you set something in a platform file, and then attempt to also set 
> it on the cmd line, we ignore the cmd line (without warning). In this case, 
> an entry in the first file that is duplicated in the second file gets 
> overwritten, also without warning.
> 
> Dunno if that's an issue or not, but something to be aware of.
> 
> 
> On Jul 9, 2012, at 4:52 PM, Ralph Castain wrote:
> 
> > I keep scratching my head over this, and I just cannot figure out how this 
> > is going to do what you think. We "source" the platform file solely to 
> > execute any envar settings that are in it - i.e., to capture the CFLAGS=... 
> > and other such directives. We then read/parse the platform file to get all 
> > the configure directives - e.g., enable_debug=yes. Sourcing the platform 
> > file will set the envars, but won't capture the rest of the directives.
> > 
> > Am I missing something here? It doesn't sound like you've really even tried 
> > this yet - sure, chaining "source" commands will work, but do you actually 
> > get the desired configuration??
> > 
> > Hence my comment about needing to modify the parser so it ALSO follows the 
> > "source" directive.
> > 
> > On Jul 9, 2012, at 3:58 PM, Nathan Hjelm wrote:
> > 
> >> On Mon, Jul 09, 2012 at 03:31:33PM -0700, Ralph Castain wrote:
> >>> So if I understand this right, you would have multiple platform files, 
> >>> each "sourcing" a common one that contains the base directives? It sounds 
> >>> to me like you need more than the change below to make that work - you 
> >>> would need to interpret the platform file itself to read and execute a 
> >>> "source" directive inside it, wouldn't you?
> >> 
> >> That is exactly what I want to do. The change in the RFC is the only one
> >> needed, since platform files are sourced by ompi_load_platform.m4. This means
> >> platform files can contain arbitrary m4/shell code (including the source
> >> directive)! I tried my patch with a one-line platform file that sourced an
> >> existing platform file, and it worked as expected.
> >> 
> >>> It would really help if your change (either comments or the RFC) actually 
> >>> explained what the heck you are doing so I wouldn't have to waste hours 
> >>> trying to figure out the impact of this patch :-/
> >>> 
> >> 
> >> The RFC does explain what the patch does but I guess I could have 
> >> elaborated on the implications.
> >> 
> >> Before the patch we source the platform file, then cd into the platform
> >> directory to find the mca parameters file. If a platform file had a source
> >> directive, it would have to be relative to the build directory (or
> >> absolute). By cd'ing into the platform file's directory before sourcing it,
> >> any source directives become relative to the platform file's directory (or
> >> absolute). There is no impact outside of m4/shell commands within the
> >> platform file that read/write/stat files.
> >> 
> >> I will add some additional comments elaborating on the change before the
> >> commit (if there are no objections, of course).
> >> 
> >> -Nathan