Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-06 Thread nadia . derbey
Resending, as I didn't get an answer...

Regards,
Nadia
 
-- 
Nadia Derbey

 


devel-boun...@open-mpi.org wrote on 01/27/2012 05:38:34 PM:

> From: "nadia.derbey" 
> To: Open MPI Developers 
> Date: 01/27/2012 05:35 PM
> Subject: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see 
> processes as bound if the job has been launched by srun
> Sent by: devel-boun...@open-mpi.org
> 
> Hi,
> 
> If a job is launched using "srun --resv-ports --cpu_bind:..." and slurm
> is configured with:
>TaskPlugin=task/affinity
>TaskPluginParam=Cpusets
> 
> each rank of that job is in a cpuset that contains a single CPU.
> 
> Now, if we use carto on top of this, the following happens in
> get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
>. opal_paffinity_base_get_processor_info() is called to get the
>  number of logical processors (we get 1 due to the singleton cpuset)
>. we loop over that number of processors to check whether our process
>  is bound to one of them. In our case the loop is executed only once,
>  so we never get the correct binding information.
>. only if the process is found to be bound do we actually get the
>  distance to the device; in our case that part of the code is never
>  executed.
> 
> The attached patch is a proposal to fix the issue.
> 
> Regards,
> Nadia
> [attachment "get_ib_dev_distance.patch" deleted by Nadia Derbey/FR/
> BULL] ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] problem running mpirun and orted on same machine

2012-02-06 Thread Maurice Feskanich

Hi Ralph,

Is it always the case that a proc launched local to mpirun uses mpirun 
as the daemon?  Our engine is not local to the host that mpirun is on, 
it just happens to send the task back to that same host, and the grid 
engine system handles all the process starting.  If it is the case, is 
there any particular flag or option I should be using with the orted on 
the local host to indicate that it is local?  Should I even be starting 
an orted in this case, and if not, how would I start the proc?  Also, 
would it be safe to always decrease by one the maximum vpid used with 
the orteds for the other tasks?


Thanks,

Maury


On 02/03/12 11:24, Ralph Castain wrote:

No brilliant suggestion - it sounds like your plugin isn't accurately computing 
the number of daemons. When a proc is launched local to mpirun, it uses mpirun 
as the daemon - it doesn't start another daemon on the same node. If your 
plugin is doing so, or you are computing an extra daemon vpid that doesn't 
truly exist, then you will have problems.

On Feb 3, 2012, at 11:27 AM, Maurice Feskanich wrote:


Hi Folks,

I'm having a problem with running mpirun when one of the tasks winds up running 
on the same machine as mpirun.

A little background: our system uses a plugin to send tasks to grid engine.  We 
are currently using version 1.3.4 (we are not able to move to a newer version 
because of the requirements of the tools that use our system.)  Our code runs 
on Solaris (both Sparc and X86), and Linux.

What we are seeing is that sometimes mpirun gets a segmentation violation at 
line 342 of plm_base_launch_support.c:

pdatorted[mev->sender.vpid]->state = ORTE_PROC_STATE_RUNNING;

Investigation has found that mev->sender.vpid is a number that is one greater 
than the number of non-nil elements in the pdatorted array.

Here is the dbx stacktrace:

t@1 (l@1) program terminated by signal SEGV (no mapping at the fault address)
Current function is process_orted_launch_report (optimized)
  342   pdatorted[mev->sender.vpid]->state = ORTE_PROC_STATE_RUNNING;
(dbx) where
current thread: t@1
=>[1] process_orted_launch_report(fd = ???, opal_event = ???, data = ???) (optimized), at 
0x7f491e60 (line ~342) in "plm_base_launch_support.c"
  [2] event_process_active(base = ???) (optimized), at 0x7f241d04 (line ~651) in 
"event.c"
  [3] opal_event_base_loop(base = ???, flags = ???) (optimized), at 0x7f242178 
(line ~823) in "event.c"
  [4] opal_event_loop(flags = ???) (optimized), at 0x7f241f98 (line ~730) in 
"event.c"
  [5] opal_progress() (optimized), at 0x7f21d484 (line ~189) in 
"opal_progress.c"
  [6] orte_plm_base_daemon_callback(num_daemons = ???) (optimized), at 0x7f492388 
(line ~459) in "plm_base_launch_support.c"  [7] orte_plm_dream_spawn(0x8f0ac, 
0x478560, 0x47868c, 0x12c, 0x7d305198, 0x8a8c), at 0x7d304a5c
  [8] orterun(argc = 11, argv = 0x7fffede8), line 748 in "orterun.c"
  [9] main(argc = 11, argv = 0x7fffede8), line 13 in "main.c"


The vpids we use when we start the orteds are 1-based, but the pdatorted array 
is zero-based.

Any help anyone can provide would be much appreciated.  Please don't hesitate 
to ask questions.

Thanks,

Maury Feskanich
  Oracle, Inc.


Re: [OMPI devel] problem running mpirun and orted on same machine

2012-02-06 Thread Ralph Castain

On Feb 6, 2012, at 10:35 AM, Maurice Feskanich wrote:

> Hi Ralph,
> 
> Is it always the case that a proc launched local to mpirun uses mpirun as the 
> daemon?

Yes - mpirun knows which procs are local and launches them itself. You can, 
however, tell mpirun -not- to use the local node for MPI procs by adding 
-no-use-local to the cmd line (or by setting the equivalent MCA param); with 
that flag set, no MPI proc will be launched on the node where mpirun resides.
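As a concrete example, the flag mentioned above would appear on the command line like this (illustrative only: the host names and application name are made up, and note that later Open MPI releases spell the option --nolocal):

```shell
# Launch 4 ranks but keep them all off the node running mpirun itself
# (flag spelling as given in the message above; newer releases use --nolocal)
mpirun -no-use-local -np 4 -host node1,node2 ./my_mpi_app
```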

>  Our engine is not local to the host that mpirun is on, it just happens to 
> send the task back to that same host, and the grid engine system handles all 
> the process starting.

Guess I'm confused. If SGE is starting the MPI procs, then why are you starting 
orteds at all? What purpose can they serve if the orteds aren't starting the 
procs themselves?

>  If it is the case, is there any particular flag or option I should be using 
> with the orted on the local host to indicate that it is local?

No orted can be local to mpirun - it will confuse the system.

>  Should I even be starting an orted in this case, and if not, how would I 
> start the proc?  Also, would it be safe to always decrease by one the maximum 
> vpid used with the orteds for the other tasks?

You just need to accurately compute the number of orteds being launched. There 
is one per node, minus the node where mpirun is executing.

I'm willing to chat about this on the phone, if it would help. Contact me 
off-list if you want to do so.


> 
> Thanks,
> 
> Maury
> 
> 
> On 02/03/12 11:24, Ralph Castain wrote:
>> No brilliant suggestion - it sounds like your plugin isn't accurately 
>> computing the number of daemons. When a proc is launched local to mpirun, it 
>> uses mpirun as the daemon - it doesn't start another daemon on the same 
>> node. If your plugin is doing so, or you are computing an extra daemon vpid 
>> that doesn't truly exist, then you will have problems.
>> 
>> On Feb 3, 2012, at 11:27 AM, Maurice Feskanich wrote:
>> 
>>> Hi Folks,
>>> 
>>> I'm having a problem with running mpirun when one of the tasks winds up 
>>> running on the same machine as mpirun.
>>> 
>>> A little background: our system uses a plugin to send tasks to grid engine. 
>>>  We are currently using version 1.3.4 (we are not able to move to a newer 
>>> version because of the requirements of the tools that use our system.)  Our 
>>> code runs on Solaris (both Sparc and X86), and Linux.
>>> 
>>> What we are seeing is that sometimes mpirun gets a segmentation violation at 
>>> line 342 of plm_base_launch_support.c:
>>> 
>>>pdatorted[mev->sender.vpid]->state = ORTE_PROC_STATE_RUNNING;
>>> 
>>> Investigation has found that mev->sender.vpid is a number that is one 
>>> greater than the number of non-nil elements in the pdatorted array.
>>> 
>>> Here is the dbx stacktrace:
>>> 
>>> t@1 (l@1) program terminated by signal SEGV (no mapping at the fault 
>>> address)
>>> Current function is process_orted_launch_report (optimized)
>>>  342   pdatorted[mev->sender.vpid]->state = ORTE_PROC_STATE_RUNNING;
>>> (dbx) where
>>> current thread: t@1
>>> =>[1] process_orted_launch_report(fd = ???, opal_event = ???, data = ???) 
>>> (optimized), at 0x7f491e60 (line ~342) in 
>>> "plm_base_launch_support.c"
>>>  [2] event_process_active(base = ???) (optimized), at 0x7f241d04 
>>> (line ~651) in "event.c"
>>>  [3] opal_event_base_loop(base = ???, flags = ???) (optimized), at 
>>> 0x7f242178 (line ~823) in "event.c"
>>>  [4] opal_event_loop(flags = ???) (optimized), at 0x7f241f98 (line 
>>> ~730) in "event.c"
>>>  [5] opal_progress() (optimized), at 0x7f21d484 (line ~189) in 
>>> "opal_progress.c"
>>>  [6] orte_plm_base_daemon_callback(num_daemons = ???) (optimized), at 
>>> 0x7f492388 (line ~459) in "plm_base_launch_support.c"  [7] 
>>> orte_plm_dream_spawn(0x8f0ac, 0x478560, 0x47868c, 0x12c, 
>>> 0x7d305198, 0x8a8c), at 0x7d304a5c
>>>  [8] orterun(argc = 11, argv = 0x7fffede8), line 748 in "orterun.c"
>>>  [9] main(argc = 11, argv = 0x7fffede8), line 13 in "main.c"
>>> 
>>> 
>>> The vpids we use when we start the orteds are 1-based, but the pdatorted 
>>> array is zero-based.
>>> 
>>> Any help anyone can provide would be very appreciated.  Please don't 
>>> hesitate to ask questions.
>>> 
>>> Thanks,
>>> 
>>> Maury Feskanich
>>>  Oracle, Inc.