Re: [OMPI users] ORTE_ERROR: orte_ess_base_open failed

2012-08-26 Thread Ralph Castain
I have no further ideas, I'm afraid. The only thing I can see is that your 
directory tree doesn't look right - if /usr/local is your prefix, then there 
should be a /usr/local/lib/openmpi directory, and the .la's should be in there.

You might try reinstalling it to somewhere other than /usr/local - perhaps put 
it somewhere under your home directory instead so you don't need root 
permissions to do the install. See if the directory tree looks any different.

It would also help to see your configure line, and know something more about 
your system. It looks like you have slurm, so I assume this is some kind of 
Linux box?


On Aug 26, 2012, at 7:23 PM, Shanthini Kannan  wrote:

> Hello Ralph,
> /usr/local/lib is in my LD_LIBRARY_PATH.
> I am running the right version of mpirun and I do have all permissions on 
> them.
> 
> Thanks!
> Shanthini
> 
> On Fri, Aug 24, 2012 at 7:30 PM, Ralph Castain  wrote:
> And just to be sure - /usr/local/lib is in your ld_lib_path, yes?
> 
> You might also check the permissions to ensure you can read them. Also, check 
> "which mpirun" - let's make sure you are running the one you think!
> 
> On Aug 24, 2012, at 4:22 PM, Shanthini Kannan  wrote:
> 
>> Thanks Ralph.
>> My prefix is /usr/local and I see that mca_ess_env.la is present in 
>> /usr/local/lib directory.
>>  
>> -bash-4.2# pwd
>> /usr/local/lib
>> -bash-4.2# ls mca_ess*
>> mca_ess_env.la  mca_ess_singleton.la  mca_ess_slurm.la   mca_ess_tool.la
>> mca_ess_env.so  mca_ess_singleton.so  mca_ess_slurm.so   mca_ess_tool.so
>> mca_ess_hnp.la  mca_ess_slave.la  mca_ess_slurmd.la
>> mca_ess_hnp.so  mca_ess_slave.so  mca_ess_slurmd.so
>> -bash-4.2#
>> 
>> On Fri, Aug 24, 2012 at 7:13 PM, Ralph Castain  wrote:
>> Check your <prefix>/lib directory - there should be an openmpi directory 
>> under it, and that should have a bunch of libs in it. One of those should be 
>> mca_ess_env.la
>> 
>> Is it there?
>> 
>> On Aug 24, 2012, at 3:27 PM, Shanthini Kannan  wrote:
>> 
>>> I had the OMPI lib directory added in /etc/ld.so.conf.
>>> I also added it in LD_LIBRARY_PATH, but it made no difference.
>>> When I call mpirun, should I specify the MCA in command-line?
>>> Thanks!
>>> 
>>> On Fri, Aug 24, 2012 at 2:07 PM, Ralph Castain  wrote:
>>> I suspect your LD_LIBRARY_PATH doesn't include the OMPI lib location
>>> 
>>> On Aug 24, 2012, at 10:58 AM, Shanthini Kannan  
>>> wrote:
>>> 
>>>> Hello,
>>>> I am running mpptest over Open MPI (v1.5.4). 
>>>> I get the following error saying component "env" in Framework "ess" is not 
>>>> found. Am I missing something?  I am new to MPI and any help you can offer 
>>>> is appreciated.
>>>> 
>>>> A requested component was not found, or was unable to be opened.  This
>>>> means that this component is either not installed or is unable to be
>>>> used on your system (e.g., sometimes this means that shared libraries
>>>> that the component requires are unable to be found/loaded).  Note that
>>>> Open MPI stopped checking at the first component that it did not find.
>>>> 
>>>> Host:  AV8
>>>> Framework: ess
>>>> Component: env
>>>> --
>>>> [AV8:05354] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file 
>>>> runtime/orte_init.c at line 120
>>>> --
>>>> It looks like orte_init failed for some reason; your parallel process is
>>>> likely to abort.  There are many reasons that a parallel process can
>>>> fail during orte_init; some of which are due to configuration or
>>>> environment problems.  This failure appears to be an internal failure;
>>>> here's some additional information (which may only be relevant to an
>>>> Open MPI developer):
>>>> 
>>>>   orte_ess_base_open failed
>>>>   --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>> 
>>>> Thanks!
>>>> Shanthini
>>>> ___
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

Re: [OMPI users] ORTE_ERROR: orte_ess_base_open failed

2012-08-26 Thread Shanthini Kannan
Hello Ralph,
/usr/local/lib is in my LD_LIBRARY_PATH.
I am running the right version of mpirun and I do have all permissions on
them.

Thanks!
Shanthini

On Fri, Aug 24, 2012 at 7:30 PM, Ralph Castain  wrote:

> And just to be sure - /usr/local/lib is in your ld_lib_path, yes?
>
> You might also check the permissions to ensure you can read them. Also,
> check "which mpirun" - let's make sure you are running the one you think!
>
> On Aug 24, 2012, at 4:22 PM, Shanthini Kannan 
> wrote:
>
> Thanks Ralph.
> My prefix is /usr/local and I see that mca_ess_env.la is present in
> /usr/local/lib directory.
>
> -bash-4.2# pwd
> /usr/local/lib
> -bash-4.2# ls mca_ess*
> mca_ess_env.la  mca_ess_singleton.la  mca_ess_slurm.la   mca_ess_tool.la
> mca_ess_env.so  mca_ess_singleton.so  mca_ess_slurm.so   mca_ess_tool.so
> mca_ess_hnp.la  mca_ess_slave.la  mca_ess_slurmd.la
> mca_ess_hnp.so  mca_ess_slave.so  mca_ess_slurmd.so
> -bash-4.2#
>
> On Fri, Aug 24, 2012 at 7:13 PM, Ralph Castain  wrote:
>
>> Check your <prefix>/lib directory - there should be an openmpi directory
>> under it, and that should have a bunch of libs in it. One of those should
>> be mca_ess_env.la
>>
>> Is it there?
>>
>> On Aug 24, 2012, at 3:27 PM, Shanthini Kannan 
>> wrote:
>>
>> I had the OMPI lib directory added in /etc/ld.so.conf.
>> I also added it in LD_LIBRARY_PATH, but it made no difference.
>> When I call mpirun, should I specify the MCA in command-line?
>> Thanks!
>>
>> On Fri, Aug 24, 2012 at 2:07 PM, Ralph Castain  wrote:
>>
>>> I suspect your LD_LIBRARY_PATH doesn't include the OMPI lib location
>>>
>>> On Aug 24, 2012, at 10:58 AM, Shanthini Kannan 
>>> wrote:
>>>
>>> Hello,
>>> I am running mpptest over Open MPI (v1.5.4).
>>> I get the following error saying component "env" in Framework "ess" is
>>> not found. Am I missing something?  I am new to MPI and any help you can
>>> offer is appreciated.
>>>
>>> A requested component was not found, or was unable to be opened.  This
>>> means that this component is either not installed or is unable to be
>>> used on your system (e.g., sometimes this means that shared libraries
>>> that the component requires are unable to be found/loaded).  Note that
>>> Open MPI stopped checking at the first component that it did not find.
>>>
>>> Host:  AV8
>>> Framework: ess
>>> Component: env
>>>
>>> --
>>> [AV8:05354] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
>>> runtime/orte_init.c at line 120
>>>
>>> --
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>>   orte_ess_base_open failed
>>>   --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>
>>> Thanks!
>>> Shanthini


Re: [OMPI users] MPI_Irecv: Confusion with <> inputy parameter

2012-08-26 Thread devendra rai
Hello Hristo, Jeff,

Thanks a lot for your note. I understand the concept much better now. In fact, 
I now understand what the phrase "maximum number of elements in the receive 
buffer" means throughout the documentation.

However, I still think that the online documentation is confusing (and a 
little vague) and could be worded better. It does not help that all other 
sites simply copy the description verbatim.

Thanks a lot anyway!

Devendra





From: "Iliev, Hristo"
To: devendra rai
Cc: Open MPI Users
Sent: Tuesday, 21 August 2012, 10:37
Subject: Re: [OMPI users] MPI_Irecv: Confusion with <> inputy parameter
 

Hello, Devendra,

Sending and receiving messages in MPI are atomic operations - they complete 
only when the whole message has been sent or received. MPI_Test only tells you 
whether the operation has completed - there is no indication like "30% of the 
message was sent/received, stay tuned for more".

On the sender side, the message is constructed by taking bytes from various 
locations in memory, specified by the type map of the MPI datatype used. Then 
on the receiver side the message is deconstructed back into memory by placing 
the received bytes according to the type map of the MPI datatype provided. The 
combination of receive datatype and receive count gives you a certain number of 
bytes (the type size, obtainable with MPI_Type_size, times "count"). If the 
message is shorter, some elements of the receive buffer will not be filled, 
which is OK - you can find out exactly how many elements were filled by calling 
MPI_Get_count on the status of the receive operation. If the message is 
longer, however, there won't be enough room for all the data that the message 
is carrying and an overflow error will occur.

This works best by example. Imagine that in one process you issue:

MPI_Send(data, 80, MPI_BYTE, ...);

This will send a message containing 80 elements of type byte. Now on the 
receiver side you issue:

MPI_Irecv(data, 160, MPI_BYTE, ..., &request);

What will happen is that the message will be received in its entirety since 80 
times the size of MPI_BYTE is less than or equal to 160 times the size of 
MPI_BYTE. Calling MPI_Test on "request" will produce true in the completion 
flag and you will get back a status variable (unless you provided 
MPI_STATUS_IGNORE) and then you can call:

MPI_Get_count(&status, MPI_BYTE, &count);

Now "count" will contain 80 - the actual number of elements received.

But if the receive operation was instead:

MPI_Irecv(data, 40, MPI_BYTE, ..., &request);

then, since 40 times the size of MPI_BYTE is less than the size of the message, 
there will not be enough space to receive the entire message and an overflow 
error will occur. The MPI_Irecv itself only initiates the receive operation and 
will not return an error. Rather, you will obtain the overflow error in the 
MPI_ERROR field of the status argument returned by MPI_Test (the test call 
itself will return MPI_SUCCESS).

Since MPI operations are atomic, you cannot send a message of 160 elements and 
then receive it with two separate receives of size 80 - this is very important, 
and it is initially difficult to grasp for people who come to MPI from 
traditional Unix network programming.
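
The size check described above can be modeled in a few lines of plain C. This 
is only a sketch of the rule, not MPI code: model_receive and recv_outcome are 
hypothetical names invented for illustration, elem_size stands in for what 
MPI_Type_size would report, and *filled stands in for what MPI_Get_count would 
report. It assumes elem_size is greater than zero.

```c
#include <stddef.h>

/* Outcome of matching ONE incoming message against ONE posted receive. */
typedef enum { RECV_OK, RECV_TRUNCATED } recv_outcome;

/*
 * Hypothetical model of the rule, not an MPI call.
 *   msg_bytes  : size of the incoming message in bytes
 *   elem_size  : size in bytes of one element of the receive datatype (> 0)
 *   recv_count : the "count" argument passed to the receive
 *   filled     : on RECV_OK, the number of whole elements actually received
 */
recv_outcome model_receive(size_t msg_bytes, size_t elem_size,
                           size_t recv_count, size_t *filled)
{
    /* The receive can hold at most elem_size * recv_count bytes. */
    size_t capacity = elem_size * recv_count;

    if (msg_bytes > capacity) {
        /* The whole message does not fit: overflow (truncation) error.
           The model reports no usable elements in this case. */
        *filled = 0;
        return RECV_TRUNCATED;
    }

    /* A shorter message is fine; only part of the buffer is filled. */
    *filled = msg_bytes / elem_size;
    return RECV_OK;
}
```

Under this model, model_receive(80, 1, 160, &n) succeeds with n == 80, 
model_receive(80, 1, 40, &n) reports truncation, and 
model_receive(160, 1, 80, &n) also reports truncation, since one 160-byte 
message is never split across two 80-byte receives.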

I would recommend that you head to http://www.mpi-forum.org/ and download the 
PDF of the latest MPI 2.2 standard from there (or order the printed book). 
Unlike many other standards documents, this one is actually readable by normal 
people and contains many useful explanations and examples. Read through the 
entire section 3.2 to get a better idea of how messaging works in MPI.

Hope that helps to clarify things,

Hristo


On 21.08.2012, at 10:01, devendra rai  wrote:

Hello Jeff and Hristo,
>
>Now I am completely confused:
>
>So, let's say, the complete reception requires 8192 bytes. And, I have:
>
>MPI_Irecv(
>    (void*)this->receivebuffer,  /* the receive buffer */
>    this->receive_packetsize,    /* 80 */
>    MPI_BYTE,                    /* The data type expected */
>    this->transmittingnode,      /* The node from which to receive */
>    this->uniquetag,             /* Tag */
>    MPI_COMM_WORLD,              /* Communicator */
>    _request                     /* request handle */
>    );
>
>
>
>
>
>That means MPI_Test will tell me that the reception is complete when I have 
>received the first 80 bytes. Correct?
>
>Next, let's say that I have a receive buffer with a capacity of 160 bytes. 
>Will an overflow error occur here, even if I have decided to receive a 
>large payload in chunks of 80 bytes?
>
>I am sorry, the manual and the API reference were too vague for me.
>
>
>Thanks a lot
>
>
>Devendra
>
>
>
> 

Re: [OMPI users] openmpi 1.6.1 Questions

2012-08-26 Thread Brock Palen
Thanks and super cool.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Aug 25, 2012, at 7:06 AM, Jeff Squyres wrote:

> On Aug 24, 2012, at 10:45 AM, Brock Palen wrote:
> 
>>> Right now we should be just warning if we can't register 3/4 of your 
>>> physical memory (we can't really test for anything more than that).  But it 
>>> doesn't abort.
>> Ok
>> 
>>> We could add a tunable that makes it abort in this case, if you think that 
>>> would be useful.
>> I think so. In my case that would mean a node is misconfigured, and rather 
>> than have it run slowly I want it brought to my attention.
> 
> 
> Ok, this is easy enough to add.  Due to a PGI compilation issue, it looks 
> like we're probably going to roll a 1.6.2 in the immediate future; we can 
> include this in there.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 