Re: [OMPI devel] [OMPI users] cartofile

2009-10-13 Thread Ralph Castain
We agreed on today's telecon to leave the code in the OMPI code base  
for now, but to remove the option from the mpirun man page since  
nobody can explain how to use it anyway.


Then we will wait in hope that someone(s) complete the coding of this  
"feature" and document its use.


On Oct 13, 2009, at 12:09 PM, Kenneth Lloyd wrote:

I agree with Terry and Eugene, but now what are we going to do about  
it?

This is a potentially very powerful feature.

Ken



-----Original Message-----
From: devel-boun...@open-mpi.org
[mailto:devel-boun...@open-mpi.org] On Behalf Of Terry Dontje
Sent: Tuesday, October 13, 2009 7:08 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] [OMPI users] cartofile

After rereading the manpage for the umpteenth time I agree with Eugene
that the information provided on cartofile is next to useless.  Ok, so
you describe what your node looks like but what does mpirun or libmpi do
with that information?  Other than the option to provide the cartofile
it isn't obvious how a user or libmpi uses this information.

I've looked on the faq and wiki and have not found anything yet on how
one "currently" uses cartofile.

--td

Eugene Loh wrote:

This e-mail was on the users alias... see
http://www.open-mpi.org/community/lists/users/2009/09/10710.php

There wasn't much response, so let me ask another question.  How about
if we remove the cartofile section from the DESCRIPTION section of the
OMPI mpirun man page?  It's a lot of text that illustrates how to
create a cartofile without saying anything about why one would want to
go to the trouble.  What does this impact?  What does it change?
What's the motivation for doing this stuff?  What's this stuff good for?

Another alternative could be to move the cartofile description to a
FAQ page.

The mpirun man page is rather long and I was thinking that if we could
remove some "low impact" stuff out, we could improve the overall
signal-to-noise ratio of the page.

In any case, I personally would like to know what cartofiles are good for.

Eugene Loh wrote:

Thank you, but I don't understand who is consuming this information
for what.  E.g., the mpirun man page describes the carto file, but
doesn't give users any indication whether they should be worrying
about this.

Lenny Verkhovsky wrote:

Hi Eugene,
A carto file is a file with a static graph topology of your node.
You can see an example in opal/mca/carto/file/carto_file.h.
(Yes, I know it should be in the help/man list :) )
Basically it describes a map of your node and its internal interconnect.
Hopefully it will be discovered automatically someday,
but for now you can describe your node manually.
Best regards
Lenny.

On Thu, Sep 17, 2009 at 12:38 AM, Eugene Loh wrote:

   I feel like I should know, but what's a cartofile?  I guess you
   supply "topological" information about a host, but I can't tell
   how this information is used by, say, mpirun.




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





Re: [OMPI devel] [OMPI users] cartofile

2009-10-13 Thread Kenneth Lloyd
I agree with Terry and Eugene, but now what are we going to do about it?
This is a potentially very powerful feature.

Ken


> -----Original Message-----
> From: devel-boun...@open-mpi.org
> [mailto:devel-boun...@open-mpi.org] On Behalf Of Terry Dontje
> Sent: Tuesday, October 13, 2009 7:08 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [OMPI users] cartofile
>
> After rereading the manpage for the umpteenth time I agree with Eugene
> that the information provided on cartofile is next to useless.  Ok, so
> you describe what your node looks like but what does mpirun or libmpi
> do with that information?  Other than the option to provide the
> cartofile it isn't obvious how a user or libmpi uses this information.
>
> I've looked on the faq and wiki and have not found anything yet on how
> one "currently" uses cartofile.
>
> --td
>
> Eugene Loh wrote:
> > This e-mail was on the users alias... see
> > http://www.open-mpi.org/community/lists/users/2009/09/10710.php
> >
> > There wasn't much response, so let me ask another question.  How about
> > if we remove the cartofile section from the DESCRIPTION section of the
> > OMPI mpirun man page?  It's a lot of text that illustrates how to
> > create a cartofile without saying anything about why one would want to
> > go to the trouble.  What does this impact?  What does it change?
> > What's the motivation for doing this stuff?  What's this stuff good for?
> >
> > Another alternative could be to move the cartofile description to a
> > FAQ page.
> >
> > The mpirun man page is rather long and I was thinking that if we could
> > remove some "low impact" stuff out, we could improve the overall
> > signal-to-noise ratio of the page.
> >
> > In any case, I personally would like to know what cartofiles are good for.
> >
> > Eugene Loh wrote:
> >> Thank you, but I don't understand who is consuming this information
> >> for what.  E.g., the mpirun man page describes the carto file, but
> >> doesn't give users any indication whether they should be worrying
> >> about this.
> >>
> >> Lenny Verkhovsky wrote:
> >>> Hi Eugene,
> >>> A carto file is a file with a static graph topology of your node.
> >>> You can see an example in opal/mca/carto/file/carto_file.h.
> >>> (Yes, I know it should be in the help/man list :) )
> >>> Basically it describes a map of your node and its internal interconnect.
> >>> Hopefully it will be discovered automatically someday, but for now
> >>> you can describe your node manually.
> >>> Best regards
> >>> Lenny.
> >>>
> >>> On Thu, Sep 17, 2009 at 12:38 AM, Eugene Loh wrote:
> >>>
> >>> I feel like I should know, but what's a cartofile?  I guess you
> >>> supply "topological" information about a host, but I can't tell
> >>> how this information is used by, say, mpirun.
> >>>



Re: [OMPI devel] [OMPI users] cartofile

2009-10-13 Thread Terry Dontje
I guess my problem with the manpage or any info on carto in general is 
that there is no text that describes what happens if you have a 
cartofile and how it affects a job when you pass it in. 


--td

Sylvain Jeaugey wrote:

We worked a bit on it and yes, there is some work to do :

* The syntax used to describe the various components is far from
consistent from one usage to another ("SOCKET", "NODE", ...). We managed
to make things work by reading the various out-of-date example files,
but mainly by reading the code.

* The auto-detect component does not seem to do anything. We
implemented it and planned to release it. For now the code relies
heavily on Linux kernel functionality, but is missing the needed ifdefs.


Also, we did a patch to dump in graphviz format the detected (or read) 
topology.


Not much time to work on this right now, but if anyone wants to work 
on it, we may help.


Sylvain

On Tue, 13 Oct 2009, Ralph Castain wrote:


Here is where OMPI uses it:

ompi/mca/btl/openib/btl_openib_component.c:1918: static opal_carto_graph_t *host_topo;
ompi/mca/btl/openib/btl_openib_component.c:1923: opal_carto_base_node_t *device_node;
ompi/mca/btl/openib/btl_openib_component.c:1931: device_node = opal_carto_base_find_node(host_topo, device);
ompi/mca/btl/openib/btl_openib_component.c:1941: opal_carto_base_node_t *slot_node;
ompi/mca/btl/openib/btl_openib_component.c:1951: slot_node = opal_carto_base_find_node(host_topo, slot);
ompi/mca/btl/openib/btl_openib_component.c:1958: distance = opal_carto_base_spf(host_topo, slot_node, device_node);
ompi/mca/btl/openib/btl_openib_component.c:1989: opal_carto_base_get_host_graph(_topo, "Infiniband");
ompi/mca/btl/openib/btl_openib_component.c:1998: opal_carto_base_free_graph(host_topo);

ompi/mca/btl/sm/btl_sm.c:118: opal_carto_graph_t *topo;
ompi/mca/btl/sm/btl_sm.c:123: opal_carto_node_distance_t *dist;
ompi/mca/btl/sm/btl_sm.c:124: opal_carto_base_node_t *slot_node;
ompi/mca/btl/sm/btl_sm.c:129: if (OMPI_SUCCESS != opal_carto_base_get_host_graph(, "Memory")) {
ompi/mca/btl/sm/btl_sm.c:134: opal_value_array_init(, sizeof(opal_carto_node_distance_t));
ompi/mca/btl/sm/btl_sm.c:157: slot_node = opal_carto_base_find_node(topo, myslot);
ompi/mca/btl/sm/btl_sm.c:163: opal_carto_base_get_nodes_distance(topo, slot_node, "Memory", );
ompi/mca/btl/sm/btl_sm.c:168: dist = (opal_carto_node_distance_t *) opal_value_array_get_item(, 0);
ompi/mca/btl/sm/btl_sm.c:175: opal_carto_base_free_graph(topo);

No idea if it is of any value or not. I don't know of anyone who has 
ever written a carto file for a system, has any idea how to do so, or 
why they should. Looking at the code, it wouldn't appear to have any 
value on any of the machines at LANL, but I may be missing something 
- not a lot of help around to understand it.


On Oct 13, 2009, at 7:08 AM, Terry Dontje wrote:

After rereading the manpage for the umpteenth time I agree with 
Eugene that the information provided on cartofile is next to 
useless.   Ok, so you describe what your node looks like but what 
does mpirun or libmpi do with that information?  Other than the 
option to provide the cartofile it isn't obvious how a user or 
libmpi uses this information.


I've looked on the faq and wiki and have not found anything yet on
how one "currently" uses cartofile.


--td

Eugene Loh wrote:
This e-mail was on the users alias... see 
http://www.open-mpi.org/community/lists/users/2009/09/10710.php


There wasn't much response, so let me ask another question.  How 
about if we remove the cartofile section from the DESCRIPTION 
section of the OMPI mpirun man page?  It's a lot of text that 
illustrates how to create a cartofile without saying anything about 
why one would want to go to the trouble.  What does this impact?  
What does it change?  What's the motivation for doing this stuff?  
What's this stuff good for?


Another alternative could be to move the cartofile description to a 
FAQ page.


The mpirun man page is rather long and I was thinking that if we 
could remove some "low impact" stuff out, we could improve the 
overall signal-to-noise ratio of the page.


In any case, I personally would like to know what cartofiles are 
good for.


Eugene Loh wrote:
Thank you, but I don't understand who is consuming this 
information for what.  E.g., the mpirun man page describes the 
carto file, but doesn't give users any indication whether they 
should be worrying about this.


Lenny Verkhovsky wrote:

Hi Eugene,
A carto file is a file with a static graph topology of your node.
You can see an example in opal/mca/carto/file/carto_file.h.
(Yes, I know it should be in the help/man list :) )
Basically it describes a map of your node and its internal
interconnect.

Hopefully it will be discovered automatically someday,
but for now you can describe your node manually.
Best regards Lenny.

On Thu, Sep 17, 2009 at 12:38 AM, Eugene Loh 

Re: [OMPI devel] [OMPI users] cartofile

2009-10-13 Thread Sylvain Jeaugey

We worked a bit on it and yes, there is some work to do :

* The syntax used to describe the various components is far from
consistent from one usage to another ("SOCKET", "NODE", ...). We managed
to make things work by reading the various out-of-date example files,
but mainly by reading the code.

* The auto-detect component does not seem to do anything. We implemented
it and planned to release it. For now the code relies heavily on Linux
kernel functionality, but is missing the needed ifdefs.


Also, we did a patch to dump in graphviz format the detected (or read) 
topology.
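
For readers unfamiliar with the graphviz format mentioned above, here is a minimal sketch in Python of what dumping a topology to DOT text can look like. It is not the patch referred to in this message; the `to_dot` helper, the node names, and the weights are hypothetical and only illustrate the output format.

```python
def to_dot(types, links, name="host_topology"):
    """Render an undirected weighted graph as Graphviz DOT text.

    types: dict mapping node name -> node type (hypothetical names).
    links: list of (node_a, node_b, weight) tuples.
    """
    lines = [f"graph {name} {{"]
    for node in sorted(types):
        # Label each node with its name and, on a second line, its type.
        lines.append(f'    "{node}" [label="{node}\\n{types[node]}"];')
    for a, b, weight in links:
        # "--" is the DOT operator for an undirected edge.
        lines.append(f'    "{a}" -- "{b}" [label="{weight}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    example_types = {"slot0": "slot", "mem0": "Memory", "hca0": "Infiniband"}
    example_links = [("slot0", "mem0", 0), ("slot0", "hca0", 1)]
    print(to_dot(example_types, example_links))
```

Saving the printed text to a file and running it through Graphviz (for example `dot -Tpng`) would produce a picture of the described node.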


Not much time to work on this right now, but if anyone wants to work on 
it, we may help.


Sylvain

On Tue, 13 Oct 2009, Ralph Castain wrote:


Here is where OMPI uses it:

ompi/mca/btl/openib/btl_openib_component.c:1918: static opal_carto_graph_t *host_topo;
ompi/mca/btl/openib/btl_openib_component.c:1923: opal_carto_base_node_t *device_node;
ompi/mca/btl/openib/btl_openib_component.c:1931: device_node = opal_carto_base_find_node(host_topo, device);
ompi/mca/btl/openib/btl_openib_component.c:1941: opal_carto_base_node_t *slot_node;
ompi/mca/btl/openib/btl_openib_component.c:1951: slot_node = opal_carto_base_find_node(host_topo, slot);
ompi/mca/btl/openib/btl_openib_component.c:1958: distance = opal_carto_base_spf(host_topo, slot_node, device_node);
ompi/mca/btl/openib/btl_openib_component.c:1989: opal_carto_base_get_host_graph(_topo, "Infiniband");
ompi/mca/btl/openib/btl_openib_component.c:1998: opal_carto_base_free_graph(host_topo);

ompi/mca/btl/sm/btl_sm.c:118: opal_carto_graph_t *topo;
ompi/mca/btl/sm/btl_sm.c:123: opal_carto_node_distance_t *dist;
ompi/mca/btl/sm/btl_sm.c:124: opal_carto_base_node_t *slot_node;
ompi/mca/btl/sm/btl_sm.c:129: if (OMPI_SUCCESS != opal_carto_base_get_host_graph(, "Memory")) {
ompi/mca/btl/sm/btl_sm.c:134: opal_value_array_init(, sizeof(opal_carto_node_distance_t));
ompi/mca/btl/sm/btl_sm.c:157: slot_node = opal_carto_base_find_node(topo, myslot);
ompi/mca/btl/sm/btl_sm.c:163: opal_carto_base_get_nodes_distance(topo, slot_node, "Memory", );
ompi/mca/btl/sm/btl_sm.c:168: dist = (opal_carto_node_distance_t *) opal_value_array_get_item(, 0);
ompi/mca/btl/sm/btl_sm.c:175: opal_carto_base_free_graph(topo);

No idea if it is of any value or not. I don't know of anyone who has ever 
written a carto file for a system, has any idea how to do so, or why they 
should. Looking at the code, it wouldn't appear to have any value on any of 
the machines at LANL, but I may be missing something - not a lot of help 
around to understand it.


On Oct 13, 2009, at 7:08 AM, Terry Dontje wrote:

After rereading the manpage for the umpteenth time I agree with Eugene that 
the information provided on cartofile is next to useless.   Ok, so you 
describe what your node looks like but what does mpirun or libmpi do with 
that information?  Other than the option to provide the cartofile it isn't 
obvious how a user or libmpi uses this information.


I've looked on the faq and wiki and have not found anything yet on how one
"currently" uses cartofile.


--td

Eugene Loh wrote:
This e-mail was on the users alias... see 
http://www.open-mpi.org/community/lists/users/2009/09/10710.php


There wasn't much response, so let me ask another question.  How about if 
we remove the cartofile section from the DESCRIPTION section of the OMPI 
mpirun man page?  It's a lot of text that illustrates how to create a 
cartofile without saying anything about why one would want to go to the 
trouble.  What does this impact?  What does it change?  What's the 
motivation for doing this stuff?  What's this stuff good for?


Another alternative could be to move the cartofile description to a FAQ 
page.


The mpirun man page is rather long and I was thinking that if we could 
remove some "low impact" stuff out, we could improve the overall 
signal-to-noise ratio of the page.


In any case, I personally would like to know what cartofiles are good for.

Eugene Loh wrote:
Thank you, but I don't understand who is consuming this information for 
what.  E.g., the mpirun man page describes the carto file, but doesn't 
give users any indication whether they should be worrying about this.


Lenny Verkhovsky wrote:

Hi Eugene,
A carto file is a file with a static graph topology of your node.
You can see an example in opal/mca/carto/file/carto_file.h.
(Yes, I know it should be in the help/man list :) )
Basically it describes a map of your node and its internal interconnect.
Hopefully it will be discovered automatically someday,
but for now you can describe your node manually.
Best regards Lenny.

On Thu, Sep 17, 2009 at 12:38 AM, Eugene Loh wrote:


  I feel like I should know, but what's a cartofile?  I guess you
  supply "topological" information about a host, but I can't tell
  how this information is used by, say, mpirun.



Re: [OMPI devel] [OMPI users] cartofile

2009-10-13 Thread Ralph Castain

Here is where OMPI uses it:

ompi/mca/btl/openib/btl_openib_component.c:1918: static opal_carto_graph_t *host_topo;
ompi/mca/btl/openib/btl_openib_component.c:1923: opal_carto_base_node_t *device_node;
ompi/mca/btl/openib/btl_openib_component.c:1931: device_node = opal_carto_base_find_node(host_topo, device);
ompi/mca/btl/openib/btl_openib_component.c:1941: opal_carto_base_node_t *slot_node;
ompi/mca/btl/openib/btl_openib_component.c:1951: slot_node = opal_carto_base_find_node(host_topo, slot);
ompi/mca/btl/openib/btl_openib_component.c:1958: distance = opal_carto_base_spf(host_topo, slot_node, device_node);
ompi/mca/btl/openib/btl_openib_component.c:1989: opal_carto_base_get_host_graph(_topo, "Infiniband");
ompi/mca/btl/openib/btl_openib_component.c:1998: opal_carto_base_free_graph(host_topo);

ompi/mca/btl/sm/btl_sm.c:118: opal_carto_graph_t *topo;
ompi/mca/btl/sm/btl_sm.c:123: opal_carto_node_distance_t *dist;
ompi/mca/btl/sm/btl_sm.c:124: opal_carto_base_node_t *slot_node;
ompi/mca/btl/sm/btl_sm.c:129: if (OMPI_SUCCESS != opal_carto_base_get_host_graph(, "Memory")) {
ompi/mca/btl/sm/btl_sm.c:134: opal_value_array_init(, sizeof(opal_carto_node_distance_t));
ompi/mca/btl/sm/btl_sm.c:157: slot_node = opal_carto_base_find_node(topo, myslot);
ompi/mca/btl/sm/btl_sm.c:163: opal_carto_base_get_nodes_distance(topo, slot_node, "Memory", );
ompi/mca/btl/sm/btl_sm.c:168: dist = (opal_carto_node_distance_t *) opal_value_array_get_item(, 0);
ompi/mca/btl/sm/btl_sm.c:175: opal_carto_base_free_graph(topo);
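
Judging only from the function names in the listing (opal_carto_base_spf, presumably "shortest path first", and opal_carto_base_get_nodes_distance), both BTLs appear to ask for graph distances between a process's slot and some other node: an InfiniBand device in openib, a "Memory" node in sm. The following Python sketch illustrates that idea with a plain Dijkstra search. It is not Open MPI's implementation; the graph, node names, weights, and helper names are all hypothetical.

```python
import heapq

# Hypothetical host graph: node -> {neighbor: weight}. Names are made up.
HOST_GRAPH = {
    "slot0": {"mem0": 0, "slot1": 1, "hca0": 1},
    "slot1": {"mem1": 0, "slot0": 1},
    "mem0": {"slot0": 0},
    "mem1": {"slot1": 0},
    "hca0": {"slot0": 1},
}
# Hypothetical node types, mirroring the type strings seen in the listing.
NODE_TYPES = {
    "slot0": "slot",
    "slot1": "slot",
    "mem0": "Memory",
    "mem1": "Memory",
    "hca0": "Infiniband",
}


def distances_from(graph, source):
    """Dijkstra: shortest-path distance from source to every reachable node."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


def nearest_of_type(graph, types, source, wanted):
    """Return (distance, name) of the closest node of the wanted type, or None."""
    dist = distances_from(graph, source)
    candidates = [(d, n) for n, d in dist.items() if types[n] == wanted]
    return min(candidates) if candidates else None
```

With such distances, a process bound to a given slot could prefer the device or memory node that is closest to it, which seems to be the kind of decision the listed call sites are making.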

No idea if it is of any value or not. I don't know of anyone who has  
ever written a carto file for a system, has any idea how to do so, or  
why they should. Looking at the code, it wouldn't appear to have any  
value on any of the machines at LANL, but I may be missing something -  
not a lot of help around to understand it.


On Oct 13, 2009, at 7:08 AM, Terry Dontje wrote:

After rereading the manpage for the umpteenth time I agree with  
Eugene that the information provided on cartofile is next to  
useless.   Ok, so you describe what your node looks like but what  
does mpirun or libmpi do with that information?  Other than the  
option to provide the cartofile it isn't obvious how a user or  
libmpi uses this information.


I've looked on the faq and wiki and have not found anything yet on
how one "currently" uses cartofile.


--td

Eugene Loh wrote:

This e-mail was on the users alias... see 
http://www.open-mpi.org/community/lists/users/2009/09/10710.php

There wasn't much response, so let me ask another question.  How  
about if we remove the cartofile section from the DESCRIPTION  
section of the OMPI mpirun man page?  It's a lot of text that  
illustrates how to create a cartofile without saying anything about  
why one would want to go to the trouble.  What does this impact?   
What does it change?  What's the motivation for doing this stuff?   
What's this stuff good for?


Another alternative could be to move the cartofile description to a  
FAQ page.


The mpirun man page is rather long and I was thinking that if we  
could remove some "low impact" stuff out, we could improve the  
overall signal-to-noise ratio of the page.


In any case, I personally would like to know what cartofiles are  
good for.


Eugene Loh wrote:
Thank you, but I don't understand who is consuming this  
information for what.  E.g., the mpirun man page describes the  
carto file, but doesn't give users any indication whether they  
should be worrying about this.


Lenny Verkhovsky wrote:

Hi Eugene,
A carto file is a file with a static graph topology of your node.
You can see an example in opal/mca/carto/file/carto_file.h.
(Yes, I know it should be in the help/man list :) )
Basically it describes a map of your node and its internal
interconnect.

Hopefully it will be discovered automatically someday,
but for now you can describe your node manually.
Best regards Lenny.

On Thu, Sep 17, 2009 at 12:38 AM, Eugene Loh wrote:


   I feel like I should know, but what's a cartofile?  I guess you
   supply "topological" information about a host, but I can't tell
   how this information is used by, say, mpirun.








Re: [OMPI devel] [OMPI users] cartofile

2009-10-13 Thread Terry Dontje
After rereading the manpage for the umpteenth time I agree with Eugene 
that the information provided on cartofile is next to useless.   Ok, so 
you describe what your node looks like but what does mpirun or libmpi do 
with that information?  Other than the option to provide the cartofile 
it isn't obvious how a user or libmpi uses this information.


I've looked on the faq and wiki and have not found anything yet on how
one "currently" uses cartofile.


--td

Eugene Loh wrote:
This e-mail was on the users alias... see 
http://www.open-mpi.org/community/lists/users/2009/09/10710.php


There wasn't much response, so let me ask another question.  How about 
if we remove the cartofile section from the DESCRIPTION section of the 
OMPI mpirun man page?  It's a lot of text that illustrates how to 
create a cartofile without saying anything about why one would want to 
go to the trouble.  What does this impact?  What does it change?  
What's the motivation for doing this stuff?  What's this stuff good for?


Another alternative could be to move the cartofile description to a 
FAQ page.


The mpirun man page is rather long and I was thinking that if we could 
remove some "low impact" stuff out, we could improve the overall 
signal-to-noise ratio of the page.


In any case, I personally would like to know what cartofiles are good for.

Eugene Loh wrote:
Thank you, but I don't understand who is consuming this information 
for what.  E.g., the mpirun man page describes the carto file, but 
doesn't give users any indication whether they should be worrying 
about this.


Lenny Verkhovsky wrote:

Hi Eugene,
A carto file is a file with a static graph topology of your node.
You can see an example in opal/mca/carto/file/carto_file.h.
(Yes, I know it should be in the help/man list :) )
Basically it describes a map of your node and its internal interconnect.
Hopefully it will be discovered automatically someday,
but for now you can describe your node manually.
Best regards 
Lenny.
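
To make "a static graph topology of your node" a little more concrete, here is a minimal Python sketch of such a description held in memory: typed nodes plus weighted connections. It does not reproduce the carto file syntax or Open MPI's internal representation; the node names, types, weights, and the `build_topology` and `nodes_of_type` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical description of one host: (node type, node name).
NODES = [
    ("Memory", "mem0"),
    ("Memory", "mem1"),
    ("slot", "slot0"),
    ("slot", "slot1"),
    ("Infiniband", "hca0"),
]

# Hypothetical bidirectional connections: (node a, node b, weight).
LINKS = [
    ("slot0", "mem0", 0),
    ("slot1", "mem1", 0),
    ("slot0", "slot1", 1),
    ("slot0", "hca0", 1),
]


def build_topology(nodes, links):
    """Return (types, adjacency) for a static, undirected weighted graph."""
    types = {name: node_type for node_type, name in nodes}
    adjacency = defaultdict(dict)
    for a, b, weight in links:
        # Reject connections that mention a node that was never declared.
        for name in (a, b):
            if name not in types:
                raise ValueError(f"undeclared node: {name}")
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return types, dict(adjacency)


def nodes_of_type(types, wanted):
    """List the names of all nodes of one type, e.g. every "Memory" node."""
    return sorted(name for name, node_type in types.items() if node_type == wanted)
```

The graph is "static" in the sense that it is written down once by hand and never probed from the hardware, which matches the manual description mentioned above.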


On Thu, Sep 17, 2009 at 12:38 AM, Eugene Loh wrote:


I feel like I should know, but what's a cartofile?  I guess you
supply "topological" information about a host, but I can't tell
how this information is used by, say, mpirun.



