Re: [OMPI users] Open MPI 1.5.4/Intel XE make check failure (CentOS-5.6 x86_64) [FIXED]

2011-09-07 Thread Tru Huynh
On Fri, Sep 02, 2011 at 04:35:51PM +0200, Tru Huynh wrote:
> hi
> 
> same issue reported previously 
> http://www.open-mpi.org/community/lists/users/2011/03/15915.php
> 
> updated to version 1.5.4 for OpenMPI and Intel XE.
> I tried with 12.0.3.137, 12.0.4.191 and 12.0.5.220 (latest).
> 
> -> same failure for opal_datatype_test.

Fixed with 12.1.0.233 Build 20110811

Compiler issue.

Tru


[OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Blosch, Edwin L
The mpirun command is invoked when the user's group is 'set group' to group 
650.  When the rank 0 process creates files, they have group ownership 650.  
But the user's login group is group 1040. The child processes that get started 
on other nodes run with group 1040, and the files they create have group 
ownership 1040.

Is there a way to tell mpirun to start the child processes with the same uid 
and gid as the rank 0 process?
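For reference, here is a minimal diagnostic sketch (a hypothetical standalone
C program, not part of our application) that prints the uid and gid each rank
actually runs with; the effective gid is what determines the group ownership
of files a rank creates:

/* gidcheck.c -- print the uid/gid each rank runs with */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    int rank;
    char host[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gethostname(host, sizeof(host));

    /* effective gid governs the group ownership of newly created files */
    printf("rank %d on %s: uid=%d gid=%d egid=%d\n",
           rank, host, (int)getuid(), (int)getgid(), (int)getegid());

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched across two or more nodes, this should make
the local-vs-remote group difference described above directly visible.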

Thanks


Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Ralph Castain
On Sep 7, 2011, at 7:38 AM, Blosch, Edwin L wrote:

> The mpirun command is invoked when the user’s group is ‘set group’ to group 
> 650.  When the rank 0 process creates files, they have group ownership 650.  
> But the user’s login group is group 1040. The child processes that get 
> started on other nodes run with group 1040, and the files they create have 
> group ownership 1040.
>  
> Is there a way to tell mpirun to start the child processes with the same uid 
> and gid as the rank 0 process?

I'm afraid not - never came up before. Could be done, but probably not right 
away. What version are you using?

>  
> Thanks



Re: [OMPI users] Open MPI 1.5.4/Intel XE make check failure (CentOS-5.6 x86_64) [FIXED]

2011-09-07 Thread Jeff Squyres
Whew.  

Thanks for letting us know.  :-)


On Sep 7, 2011, at 6:34 AM, Tru Huynh wrote:

> On Fri, Sep 02, 2011 at 04:35:51PM +0200, Tru Huynh wrote:
>> hi
>> 
>> same issue reported previously 
>> http://www.open-mpi.org/community/lists/users/2011/03/15915.php
>> 
>> updated to version 1.5.4 for OpenMPI and Intel XE.
>> I tried with 12.0.3.137, 12.0.4.191 and 12.0.5.220 (latest).
>> 
>> -> same failure for opal_datatype_test.
> 
> Fixed with 12.1.0.233 Build 20110811
> 
> Compiler issue.
> 
> Tru


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Blosch, Edwin L
Ralph,

Thanks for the reply.   I'm using 1.4.2.

We have a job queueing system with a prioritization scheme where the priorities 
of jobs are in part a function of the group id.  This is why, for us, it is 
common that the initial mpirun command executes with a group other than the 
user's default group.   We also have some applications where each process 
writes data to disk, and the resulting collection of output files has mixed 
group permissions.  This creates problems --- mostly just inconvenience --- but 
I could imagine some security-conscious folks might be more concerned about it. 
  Also, if it's relevant, the OpenMPI we are using is built without support for 
the job-queueing system (our preference for various reasons).

Ed

From: Ralph Castain [mailto:r...@open-mpi.org]
Sent: Wednesday, September 07, 2011 8:53 AM
To: Open MPI Users
Subject: Re: [OMPI users] Can you set the gid of the processes created by 
mpirun?

On Sep 7, 2011, at 7:38 AM, Blosch, Edwin L wrote:


The mpirun command is invoked when the user's group is 'set group' to group 
650.  When the rank 0 process creates files, they have group ownership 650.  
But the user's login group is group 1040. The child processes that get started 
on other nodes run with group 1040, and the files they create have group 
ownership 1040.

Is there a way to tell mpirun to start the child processes with the same uid 
and gid as the rank 0 process?

I'm afraid not - never came up before. Could be done, but probably not right 
away. What version are you using?



Thanks



Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Ralph Castain
I see - I'll try to devise a patch for you shortly.


On Sep 7, 2011, at 10:58 AM, Blosch, Edwin L wrote:

> Ralph,
>  
> Thanks for the reply.   I’m using 1.4.2.
>  
> We have a job queueing system with a prioritization scheme where the 
> priorities of jobs are in part a function of the group id.  This is why, for 
> us, it is common that the initial mpirun command executes with a group other 
> than the user’s default group.   We also have some applications where each 
> process writes data to disk, and the resulting collection of output files has 
> mixed group permissions.  This creates problems --- mostly just inconvenience 
> --- but I could imagine some security-conscious folks might be more concerned 
> about it.   Also, if it’s relevant, the OpenMPI we are using is built without 
> support for the job-queueing system (our preference for various reasons).
>  
> Ed
>  
> From: Ralph Castain [mailto:r...@open-mpi.org] 
> Sent: Wednesday, September 07, 2011 8:53 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Can you set the gid of the processes created by 
> mpirun?
>  
> On Sep 7, 2011, at 7:38 AM, Blosch, Edwin L wrote:
> 
> 
> The mpirun command is invoked when the user’s group is ‘set group’ to group 
> 650.  When the rank 0 process creates files, they have group ownership 650.  
> But the user’s login group is group 1040. The child processes that get 
> started on other nodes run with group 1040, and the files they create have 
> group ownership 1040.
>  
> Is there a way to tell mpirun to start the child processes with the same uid 
> and gid as the rank 0 process?
>  
> I'm afraid not - never came up before. Could be done, but probably not right 
> away. What version are you using?
> 
> 
>  
> Thanks



Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Reuti
Hi,

Am 07.09.2011 um 18:58 schrieb Blosch, Edwin L:

> Ralph,
>  
> Thanks for the reply.   I’m using 1.4.2.
>  
> We have a job queueing system with a prioritization scheme where the 
> priorities of jobs are in part a function of the group id.  This is why, for 
> us, it is common that the initial mpirun command executes with a group other 
> than the user’s default group.   We also have some applications where each 
> process writes data to disk, and the resulting collection of output files has 
> mixed group permissions.  This creates problems --- mostly just inconvenience 
> --- but I could imagine some security-conscious folks might be more concerned 
> about it.   Also, if it’s relevant, the OpenMPI we are using is built without 
> support for the job-queueing system (our preference for various reasons).

you mean you change the group id of the user before you submit the job? In
GridEngine you can specify whether the actual group id or the default login
group id should be used for the job.

With a tight integration, the slave processes will also run with the same
group id.

-- Reuti


>  Ed
>  
> From: Ralph Castain [mailto:r...@open-mpi.org] 
> Sent: Wednesday, September 07, 2011 8:53 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Can you set the gid of the processes created by 
> mpirun?
>  
> On Sep 7, 2011, at 7:38 AM, Blosch, Edwin L wrote:
> 
> 
> The mpirun command is invoked when the user’s group is ‘set group’ to group 
> 650.  When the rank 0 process creates files, they have group ownership 650.  
> But the user’s login group is group 1040. The child processes that get 
> started on other nodes run with group 1040, and the files they create have 
> group ownership 1040.
>  
> Is there a way to tell mpirun to start the child processes with the same uid 
> and gid as the rank 0 process?
>  
> I'm afraid not - never came up before. Could be done, but probably not right 
> away. What version are you using?
> 
> 
>  
> Thanks




Re: [OMPI users] MPI_Spawn error: Data unpack would read past end of buffer" (-26) instead of "Success"

2011-09-07 Thread Simone Pellegrini

On 09/06/2011 06:11 PM, Ralph Castain wrote:

Hmmm...well, nothing definitive there, I'm afraid.

All I can suggest is to remove/reduce the threading. Like I said, we aren't 
terribly thread safe at this time. I suspect you're stepping into one of those 
non-safe areas here.

Hopefully we will do better in later releases.

Hi again,
I made some progress on this problem myself. It looks like it is not
related to threading and/or race conditions, but rather to the behavior
of MPI_Finalize as invoked by the spawned processes. Apparently, even
though the spawned processes all invoke MPI_Finalize, they remain alive,
blocked on a semaphore. Therefore, by spawning more and more processes I
end up with hundreds of lingering processes that slowly exhaust all the
available file descriptors.


I got this hint by running my code with MPICH2. After a while I also got
an error there related to file descriptors, and from that point it was easy
to understand what was going on (it would help if Open MPI's error messages
were semantically more meaningful).


By the way, I solved the problem by invoking MPI_Comm_disconnect on the
inter-communicator received from the spawning task (MPI_Finalize alone is
not enough). This makes the spawned tasks close the parent communicator
and terminate.
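A minimal sketch of the child-side pattern (hypothetical, stripped down from
my code):

/* child.c -- spawned process: disconnect from the parent before finalizing */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm parent;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    /* ... do the real work, communicating with the parent over 'parent' ... */

    if (parent != MPI_COMM_NULL) {
        /* MPI_Finalize alone leaves the child "connected" to the parent job;
           disconnecting lets the child actually terminate */
        MPI_Comm_disconnect(&parent);
    }
    MPI_Finalize();
    return 0;
}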


After this small change the system is now more stable and that specific
error is gone. Unfortunately, a different message showed up:


[arch-moto][[530,1],0][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] 
mca_btl_tcp_frag_recv: readv failed: Bad file descriptor (9)


This error does not cause the program to terminate.

At other times I get a hard error, which is:
[err] event_queue_remove: 0x7fb5fc008c58(fd 14) not on queue 8
[arch-moto][[14492,46],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] 
connect() to 192.168.88.1 failed: Connection refused (111)
[arch-moto:09536] [[14492,0],0] ORTE_ERROR_LOG: A message is attempting 
to be sent to a process whose contact information is unknown in file 
rml_oob_send.c at line 145

[arch-moto:09536] [[14492,0],0] attempted to send to [[14492,1],0]: tag 6
[arch-moto:09536] [[14492,0],0] ORTE_ERROR_LOG: A message is attempting 
to be sent to a process whose contact information is unknown in file 
base/plm_base_receive.c at line 278

--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 9538 on
node arch-moto exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

any hints from this?

cheers, Simone



On Sep 6, 2011, at 1:20 PM, Simone Pellegrini wrote:


On 09/06/2011 04:58 PM, Ralph Castain wrote:

On Sep 6, 2011, at 12:49 PM, Simone Pellegrini wrote:


On 09/06/2011 02:57 PM, Ralph Castain wrote:

Hi Simone

Just to clarify: is your application threaded? Could you please send the OMPI 
configure cmd you used?

yes, it is threaded. There are basically 3 threads: 1 for outgoing messages
(MPI_Send), 1 for incoming messages (MPI_Iprobe / MPI_Recv), and one for spawning.

I am not sure what you mean by the OMPI configure cmd I used... I simply do
mpirun --np 1 ./executable

How was OMPI configured when it was installed? If you didn't install it, then 
provide the output of ompi_info - it will tell us.

[@arch-moto tasksys]$ ompi_info
                 Package: Open MPI nobody@alderaan Distribution
                Open MPI: 1.5.3
   Open MPI SVN revision: r24532
   Open MPI release date: Mar 16, 2011
                Open RTE: 1.5.3
   Open RTE SVN revision: r24532
   Open RTE release date: Mar 16, 2011
                    OPAL: 1.5.3
       OPAL SVN revision: r24532
       OPAL release date: Mar 16, 2011
            Ident string: 1.5.3
                  Prefix: /usr
 Configured architecture: x86_64-unknown-linux-gnu
          Configure host: alderaan
           Configured by: nobody
           Configured on: Thu Jul  7 13:21:35 UTC 2011
          Configure host: alderaan
                Built by: nobody
                Built on: Thu Jul  7 13:27:08 UTC 2011
              Built host: alderaan
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
  C compiler family name: GNU
      C compiler version: 4.6.1
C++ co

Re: [OMPI users] MPI_Spawn error: Data unpack would read past end of buffer" (-26) instead of "Success"

2011-09-07 Thread Jeff Squyres
On Sep 7, 2011, at 4:03 PM, Simone Pellegrini wrote:

> By the way, I solved the problem by invoking MPI_Comm_disconnect on the
> inter-communicator received from the spawning task (MPI_Finalize alone is
> not enough). This makes the spawned tasks close the parent communicator and
> terminate.

This is correct MPI behavior.

Just having spawned processes call Finalize is not sufficient, because they are 
still "connected" to the parent(s) who spawned them, meaning that you can 
eventually run out of resources.

Having your children disconnect before finalizing is definitely a good idea.
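For completeness, the parent side should do the same once it is finished
talking to the children: disconnect the intercommunicator returned by
MPI_Comm_spawn. A minimal sketch (hypothetical executable name and counts):

/* parent.c -- spawn children, then disconnect the intercommunicator */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm children;
    int errcodes[4];

    MPI_Init(&argc, &argv);

    /* spawn 4 copies of a (hypothetical) child executable */
    MPI_Comm_spawn("./child", MPI_ARGV_NULL, 4, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &children, errcodes);

    /* ... communicate with the children over 'children' ... */

    /* after both sides disconnect, the two jobs are no longer "connected" */
    MPI_Comm_disconnect(&children);

    MPI_Finalize();
    return 0;
}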

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Can you set the gid of the processes created by mpirun?

2011-09-07 Thread Ed Blosch
Typically it is something like 'qsub -W group_list=groupB myjob.sh'.
Ultimately myjob.sh runs with gid groupB on some host in the cluster. When
that script reaches the mpirun command, mpirun and the processes started on
the same host all run with gid groupB, but any of the spawned processes that
start on other hosts run with the user's default group, say groupA.

It did occur to me that the launching technique might have some ability to
influence this behavior, as you indicated. I don't know what launcher is
being used in our case; I guess it's rsh/ssh.

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Reuti
Sent: Wednesday, September 07, 2011 12:24 PM
To: Open MPI Users
Subject: Re: [OMPI users] Can you set the gid of the processes created by
mpirun?

Hi,

you mean you change the group id of the user before you submit the job? In
GridEngine you can specify whether the actual group id or the default login
group id should be used for the job.

With a tight integration, the slave processes will also run with the same
group id.

-- Reuti


>  Ed
>  
> From: Ralph Castain [mailto:r...@open-mpi.org] 
> Sent: Wednesday, September 07, 2011 8:53 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Can you set the gid of the processes created by
mpirun?
>  
> On Sep 7, 2011, at 7:38 AM, Blosch, Edwin L wrote:
> 
> 
> The mpirun command is invoked when the user's group is 'set group' to
group 650.  When the rank 0 process creates files, they have group ownership
650.  But the user's login group is group 1040. The child processes that get
started on other nodes run with group 1040, and the files they create have
group ownership 1040.
>  
> Is there a way to tell mpirun to start the child processes with the same
uid and gid as the rank 0 process?
>  
> I'm afraid not - never came up before. Could be done, but probably not
right away. What version are you using?
> 
> 
>  
> Thanks

