Re: [OMPI devel] binding with MCA parameters: broken or user error?

2009-10-12 Thread Terry Dontje
Regarding the "-mca XXX" option not overriding the file setting: I 
thought I had seen this working for v1.3.  However, I just retested and 
I am seeing the same issue of the "-mca" option not affecting 
orte_process_binding or rmaps_base_schedule_policy.
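
(To spell out the scenario -- the file contents below are only an
illustration, not the actual settings: the expectation is that a
command-line "-mca" value overrides an environment variable, which in
turn overrides the params file.

% cat $HOME/.openmpi/mca-params.conf
rmaps_base_schedule_policy = slot
orte_process_binding = none

% mpirun -np 4 --mca rmaps_base_schedule_policy socket \
         --mca orte_process_binding socket -report-bindings hostname

With this bug, the command-line values simply do not seem to take
effect.)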


This does seem to work under the trunk.  I wonder if the issue might be 
something we did in r22050, where we stopped calling orte_register_params 
twice?  I'm not sure exactly why that would have prevented the MCA option 
from taking effect the first time.


--td

Ralph Castain wrote:
Try adding -display-devel-map to your cmd line so you can see what 
OMPI thinks the binding and mapping policy is set to - that'll tell 
you if the problem is in the mapping or in the daemon binding.
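
For example, something like this (just Eugene's reproducer with the
extra flag added):

% mpirun -np 5 -display-devel-map --mca rmaps_base_schedule_policy socket \
         --mca orte_process_binding socket hostname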


Also, it might help to know something about this node - like how many 
sockets, cores/socket.


On Oct 8, 2009, at 11:17 PM, Eugene Loh wrote:


Here are two problems with openmpi-1.3.4a1r22051

# Here, I try to run the moral equivalent of -bysocket -bind-to-socket,
# using the MCA parameter form specified on the mpirun command line.
# No binding results.  THIS IS PROBLEM 1.
% mpirun -np 5 --mca rmaps_base_schedule_policy socket --mca 
orte_process_binding socket -report-bindings hostname

saem9
saem9
saem9
saem9
saem9

# Same thing with the "core" form.
% mpirun -np 5 --mca rmaps_base_schedule_policy core --mca 
orte_process_binding core -report-bindings hostname

saem9
saem9
saem9
saem9
saem9

# Now, I set the MCA parameters as environment variables.
# I then check the spellings and confirm all is set using ompi_info.
% setenv OMPI_MCA_rmaps_base_schedule_policy socket
% setenv OMPI_MCA_orte_process_binding socket
% ompi_info -a | grep rmaps_base_schedule_policy
 MCA rmaps: parameter "rmaps_base_schedule_policy" 
(current value: "socket", data source: environment)

% ompi_info -a | grep orte_process_binding
  MCA orte: parameter "orte_process_binding" (current 
value: "socket", data source: environment)


# So, now I run a simple program.
# I get binding now, but I'm filling up the first socket before going 
to the second.

# THIS IS PROBLEM 2.
% mpirun -np 5 -report-bindings hostname
[saem9:23947] [[29741,0],0] odls:default:fork binding child 
[[29741,1],0] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child 
[[29741,1],1] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child 
[[29741,1],2] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child 
[[29741,1],3] to socket 0 cpus 000f
[saem9:23947] [[29741,0],0] odls:default:fork binding child 
[[29741,1],4] to socket 1 cpus 00f0

saem9
saem9
saem9
saem9
saem9

# Adding -bysocket to the command line fixes things.
% mpirun -np 5 -bysocket -report-bindings hostname
[saem9:23953] [[29751,0],0] odls:default:fork binding child 
[[29751,1],0] to socket 0 cpus 000f
[saem9:23953] [[29751,0],0] odls:default:fork binding child 
[[29751,1],1] to socket 1 cpus 00f0
[saem9:23953] [[29751,0],0] odls:default:fork binding child 
[[29751,1],2] to socket 0 cpus 000f
[saem9:23953] [[29751,0],0] odls:default:fork binding child 
[[29751,1],3] to socket 1 cpus 00f0
[saem9:23953] [[29751,0],0] odls:default:fork binding child 
[[29751,1],4] to socket 0 cpus 000f

saem9
saem9
saem9
saem9
saem9

Bug?  Or am I doing something wrong?




[OMPI devel] segv in coll tuned

2009-10-12 Thread Lenny Verkhovsky
Hi,
I am seeing the following error with the current trunk, r22090. It also occurs
on the 1.3 branch.
#~/work/svn/ompi/branches/1.3//build_x86-64/install/bin/mpirun -H witch21
-np 4 -mca coll_tuned_use_dynamic_rules 1 ./IMB-MPI1
Sometimes it's an error, and sometimes it's a segv. It also reproduces with np > 4.
[witch21:26540] *** An error occurred in MPI_Barrier
[witch21:26540] *** on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
[witch21:26540] *** MPI_ERR_ARG: invalid argument of some other kind
[witch21:26540] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--
mpirun has exited due to process rank 0 with PID 26540 on
node witch21 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--
3 total processes killed (some possibly by mpirun during cleanup)

thanks
Lenny.


Re: [OMPI devel] segv in coll tuned

2009-10-12 Thread Terry Dontje
Does that test also pass sometimes?  I am seeing some random set of 
tests segv'ing in the SM btl, using a v1.3 derivative.


--td
Lenny Verkhovsky wrote:

Hi,
I am seeing the following error with the current trunk, r22090. It also 
occurs on the 1.3 branch.
#~/work/svn/ompi/branches/1.3//build_x86-64/install/bin/mpirun -H 
witch21 -np 4 -mca coll_tuned_use_dynamic_rules 1 ./IMB-MPI1 
Sometimes it's an error, and sometimes it's a segv. It also reproduces with np > 4.

[witch21:26540] *** An error occurred in MPI_Barrier
[witch21:26540] *** on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
[witch21:26540] *** MPI_ERR_ARG: invalid argument of some other kind
[witch21:26540] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--
mpirun has exited due to process rank 0 with PID 26540 on
node witch21 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--
3 total processes killed (some possibly by mpirun during cleanup)

thanks
Lenny.


  




Re: [OMPI devel] segv in coll tuned

2009-10-12 Thread Lenny Verkhovsky
Not since I started testing it :)
It fails somewhere in the ompi_coll_tuned_get_target_method_params function; I
am taking a look right now.

On Mon, Oct 12, 2009 at 3:33 PM, Terry Dontje  wrote:

> Does that test also pass sometimes?  I am seeing some random set of tests
> segv'ing in the SM btl, using a v1.3 derivative.
>
> --td
> Lenny Verkhovsky wrote:
>
>> Hi,
>> I am seeing the following error with the current trunk, r22090. It also
>> occurs on the 1.3 branch.
>> #~/work/svn/ompi/branches/1.3//build_x86-64/install/bin/mpirun -H witch21
>> -np 4 -mca coll_tuned_use_dynamic_rules 1 ./IMB-MPI1
>> Sometimes it's an error, and sometimes it's a segv. It also reproduces with np > 4.
>> [witch21:26540] *** An error occurred in MPI_Barrier
>> [witch21:26540] *** on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
>> [witch21:26540] *** MPI_ERR_ARG: invalid argument of some other kind
>> [witch21:26540] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> --
>> mpirun has exited due to process rank 0 with PID 26540 on
>> node witch21 exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --
>> 3 total processes killed (some possibly by mpirun during cleanup)
>>
>> thanks
>> Lenny.
>> 
>>


Re: [OMPI devel] segv in coll tuned

2009-10-12 Thread Lenny Verkhovsky
Well, I see that it returns 0 at this line, since
base_com_rule->n_msg_sizes == 0 (coll_tuned_dynamic_rules.c, line 359):

if( (NULL == base_com_rule) || (0 == base_com_rule->n_msg_sizes) ) {
    return (0);
}

Sometimes it passes if I tell IMB -npmin 4.
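
One thing worth double-checking (parameter names from memory, so please
verify them with ompi_info on your build) is whether a dynamic-rules
file is being picked up at all:

% ompi_info -a | grep coll_tuned_use_dynamic_rules
% ompi_info -a | grep coll_tuned_dynamic_rules_filename

If no filename is set, I would expect every per-communicator rule table
to come up empty (n_msg_sizes == 0), which is exactly the case that
early return handles.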
On Mon, Oct 12, 2009 at 3:37 PM, Lenny Verkhovsky <
lenny.verkhov...@gmail.com> wrote:

> Not since I started testing it :)
> It fails somewhere in the ompi_coll_tuned_get_target_method_params function; I
> am taking a look right now.
>
> On Mon, Oct 12, 2009 at 3:33 PM, Terry Dontje wrote:
>
>> Does that test also pass sometimes?  I am seeing some random set of tests
>> segv'ing in the SM btl, using a v1.3 derivative.
>>
>> --td
>> Lenny Verkhovsky wrote:
>>
>>> Hi,
>>> I am seeing the following error with the current trunk, r22090. It also
>>> occurs on the 1.3 branch.
>>> #~/work/svn/ompi/branches/1.3//build_x86-64/install/bin/mpirun -H witch21
>>> -np 4 -mca coll_tuned_use_dynamic_rules 1 ./IMB-MPI1
>>> Sometimes it's an error, and sometimes it's a segv. It also reproduces with np > 4.
>>> [witch21:26540] *** An error occurred in MPI_Barrier
>>> [witch21:26540] *** on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
>>> [witch21:26540] *** MPI_ERR_ARG: invalid argument of some other kind
>>> [witch21:26540] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>>
>>> --
>>> mpirun has exited due to process rank 0 with PID 26540 on
>>> node witch21 exiting without calling "finalize". This may
>>> have caused other processes in the application to be
>>> terminated by signals sent by mpirun (as reported here).
>>>
>>> --
>>> 3 total processes killed (some possibly by mpirun during cleanup)
>>>
>>> thanks
>>> Lenny.
>>> 
>>>


Re: [OMPI devel] binding with MCA parameters: broken or user error?

2009-10-12 Thread Ralph Castain
I fixed the process schedule issue on the trunk over the weekend (not  
moved to 1.3 yet while it "soaked") - the binding issue was working  
fine on the trunk.


I believe I applied the fix to stop calling register_params twice to  
1.3 already, but I can check.






Re: [OMPI devel] binding with MCA parameters: broken or user error?

2009-10-12 Thread Terry Dontje

Ralph Castain wrote:
I fixed the process schedule issue on the trunk over the weekend (not 
moved to 1.3 yet while it "soaked") - the binding issue was working 
fine on the trunk.

So there was an issue of "-mca orte_process_binding" not being interpreted?


I believe I applied the fix to stop calling register_params twice to 
1.3 already, but I can check.
No, I was asking whether that fix might be causing the 
orte_process_binding MCA param not to be interpreted.  But from what 
you say in the first paragraph, I guess I was probably wrong.


--td







Re: [OMPI devel] binding with MCA parameters: broken or user error?

2009-10-12 Thread Ralph Castain


On Oct 12, 2009, at 9:19 AM, Terry Dontje wrote:


Ralph Castain wrote:
I fixed the process schedule issue on the trunk over the weekend  
(not moved to 1.3 yet while it "soaked") - the binding issue was  
working fine on the trunk.
So there was an issue of "-mca orte_process_binding" not being  
interpreted?


I could not replicate the binding problem on the trunk. I haven't  
explored it further just yet.




I believe I applied the fix to stop calling register_params twice  
to 1.3 already, but I can check.
No, I was asking whether that fix might be causing the
orte_process_binding MCA param not to be interpreted.  But from what
you say in the first paragraph, I guess I was probably wrong.


I don't see how, but I will look at it later.



--td







Re: [OMPI devel] [OMPI users] cartofile

2009-10-12 Thread Eugene Loh




This e-mail was on the users alias... see
http://www.open-mpi.org/community/lists/users/2009/09/10710.php

There wasn't much response, so let me ask another question.  How about
if we remove the cartofile section from the DESCRIPTION section of the
OMPI mpirun man page?  It's a lot of text that illustrates how to
create a cartofile without saying anything about why one would want to
go to the trouble.  What does this impact?  What does it change? 
What's the motivation for doing this stuff?  What's this stuff good for?

Another alternative could be to move the cartofile description to a FAQ
page.

The mpirun man page is rather long and I was thinking that if we could
remove some "low impact" stuff out, we could improve the overall
signal-to-noise ratio of the page.

In any case, I personally would like to know what cartofiles are good
for.

Eugene Loh wrote:

  
  
Thank you, but I don't understand who is consuming this information for
what.  E.g., the mpirun man page describes the carto file, but doesn't
give users any indication whether they should be worrying about this.
  
Lenny Verkhovsky wrote:
  

Hi Eugene,

A carto file is a file with a static graph topology of your node;
you can see an example in opal/mca/carto/file/carto_file.h.
(Yes, I know this should go to the help/man list :) )
Basically it describes a map of your node and its internal
interconnections.
Hopefully it will be discovered automatically someday,
but for now you can describe your node manually.
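
If you want to try it, something along these lines should select the
file-based carto component and point it at your description (the
parameter names here are from memory -- please check them with
ompi_info, and treat the example in carto_file.h as the authoritative
syntax for the file itself):

% ompi_info -a | grep carto
% mpirun -np 4 --mca carto file --mca carto_file_path /path/to/my_node.carto ./a.out

The file is just a static description of the node: declarations for the
memory banks, sockets and interconnect adapters, plus weighted edges
between them.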
Best regards 
Lenny.

On Thu, Sep 17, 2009 at 12:38 AM, Eugene Loh wrote:
I feel like I should know, but what's a cartofile?  I guess you supply
"topological" information about a host, but I can't tell how this
information is used by, say, mpirun.