[hwloc-devel] Create success (hwloc r1.7.2rc1r5763)

2013-08-27 Thread MPI Team
Creating nightly hwloc snapshot SVN tarball was a success.

Snapshot:   hwloc 1.7.2rc1r5763
Start time: Tue Aug 27 21:05:33 EDT 2013
End time:   Tue Aug 27 21:09:42 EDT 2013

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc r1.8a1r5764)

2013-08-27 Thread MPI Team
Creating nightly hwloc snapshot SVN tarball was a success.

Snapshot:   hwloc 1.8a1r5764
Start time: Tue Aug 27 21:01:01 EDT 2013
End time:   Tue Aug 27 21:05:33 EDT 2013

Your friendly daemon,
Cyrador


Re: [OMPI devel] ompi_info

2013-08-27 Thread Jeff Squyres (jsquyres)
On Aug 27, 2013, at 3:13 PM, Nathan Hjelm  wrote:

>>>  1a. ompi_info has a *very long-standing precedent* behavior of using 
>>>  MCA params to exclude the display of components (and their 
>>> params). Users have come to rely on this behavior to test that OMPI is 
>>> honoring their $HOME/.openmpi-mca-params.conf file (for example) because -- 
>>> at least prior to new ompi_info -- there was no other way to verify that.
> 
> Please take a look @ r29070. I changed the default behavior of ompi_info
> -a when --level is not specified to assume level 9. I also added an
> option (--selected-only/-s) that limits the output to components that
> may be selected. Let me know if this fix is ok.


I don't think it's going to be enough.

George's point is that the *default behavior* for ompi_info for years has been 
to do what --selected-only does.  So adding a non-default option to get that 
same behavior... I think George will hate that.  Right, George?  :-)

I think your option 2b) from your previous mail was the compromise:

-
To summarize what will be done:

1) --all without a --level will assume --level 9
2) Either a) add an option to ompi_info to suppress registering all
components when a component selection parameter is set (ie. --mca btl
self,sm) or b) somehow mark the parameters of unused components as such.
-

I.e., show all components, but somehow mark those that are not selected.

Sorry.  :-\

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] ompi_info

2013-08-27 Thread Nathan Hjelm
On Tue, Aug 27, 2013 at 06:57:09PM +0200, George Bosilca wrote:
> 
> On Jul 19, 2013, at 17:57 , Jeff Squyres (jsquyres)  
> wrote:
> 
> > I've now talked to both George and Nathan.  Let me summarize for the web 
> > archives / community:
> > 
> > 1. There are two main points of contention:
> > 
> >   1a. ompi_info has a *very long-standing precedent* behavior of using 
> >  MCA params to exclude the display of components (and their 
> > params). Users have come to rely on this behavior to test that OMPI is 
> > honoring their $HOME/.openmpi-mca-params.conf file (for example) because -- 
> > at least prior to new ompi_info -- there was no other way to verify that.
> > 
> >   1b. It seems meaningless for MPI_T_Init to open *all* components when 
> > we're just going to be exposing a bunch of components/parameters that will 
> > not be used.  Easy example: MPI_T_Init will open all the PMLs, but we'll 
> > only end up using one of them.  Why have all the rest?
> 
> Any progress on this?


Please take a look @ r29070. I changed the default behavior of ompi_info
-a when --level is not specified to assume level 9. I also added an
option (--selected-only/-s) that limits the output to components that
may be selected. Let me know if this fix is ok.

-Nathan


Re: [OMPI devel] Quick observation - component ignored for 7 years

2013-08-27 Thread Nathan Hjelm
I agree. Unless anyone is using this it should go away.

-Nathan

On Tue, Aug 27, 2013 at 10:49:03AM -0700, Rolf vandeVaart wrote:
>The ompi/mca/rcache/rb component has been .ompi_ignored for almost 7
>years.  Should we delete it?



Re: [OMPI devel] Quick observation - component ignored for 7 years

2013-08-27 Thread Jeff Squyres (jsquyres)
+1

On Aug 27, 2013, at 1:49 PM, Rolf vandeVaart 
 wrote:

> The ompi/mca/rcache/rb component has been .ompi_ignored for almost 7 years.  
> Should we delete it?


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [hwloc-devel] hwloc-distrib - please add the option to distribute the jobs in the reverse direction

2013-08-27 Thread Brice Goglin
On 27/08/2013 18:15, Jiri Hladky wrote:
> using the weights looks like a good solution to me. However, we need
> to think about if and how we should propagate the weight via the upper
> levels of the hierarchy. So, for example, if you have a socket with 4
> cores and cores 0&1 and 2&3 share an L2 cache, then the question is if and
> how the weight 1 for core 0 should propagate via the L2 cache to core 1
> (with some reduced factor).

You just explained why I don't like weights. Some people will want to
ignore L2, some won't. Specifying all this on the command line would be
horrible, and implementing it would be horrible too.

> I think the --reverse option is much easier to implement, and its
> requirement and expected output are clearer.

Implementing reverse bitmap_singlify() isn't so easy.

Also "--reverse" would have a semantics that no users ever requested,
it's only a workaround for your actual need ("ignore core0 if
possible"). What if somebody laer comes with a machine where he wants to
preferably ignore core 7 and maybe ignore core 11 too, because some
special daemons are running there? We'd need to add
--dont-reverse-but-ignore-some-cores-if-possible. Or what if somebody
wants to ignore the first core but still get other cores in the normal
order?

I tend to think we should let the application handle these specific
cases (finding what can be ignored while still having enough objects,
and then calling distribute accordingly).

Brice
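
As an illustration of that application-side approach, here is a minimal
sketch (an illustration only, not hwloc-distrib code; it assumes the hwloc
1.x API with HWLOC_OBJ_SOCKET and the hwloc_distribute() helper, a hardcoded
job count, and no error checking): drop core 0 of each socket when enough
PUs remain, then distribute.

#include <hwloc.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned i, njobs = 4;   /* number of jobs to place (example value) */
    int j, nsockets;
    hwloc_topology_t topology;
    hwloc_bitmap_t allowed;
    hwloc_cpuset_t sets[4];

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    /* Start from the whole machine and remove core 0 of each socket. */
    allowed = hwloc_bitmap_alloc();
    hwloc_bitmap_copy(allowed, hwloc_get_root_obj(topology)->cpuset);
    nsockets = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_SOCKET);
    for (j = 0; j < nsockets; j++) {
        hwloc_obj_t socket = hwloc_get_obj_by_type(topology, HWLOC_OBJ_SOCKET, j);
        hwloc_obj_t core0 = hwloc_get_obj_inside_cpuset_by_type(topology, socket->cpuset,
                                                                HWLOC_OBJ_CORE, 0);
        if (core0)
            hwloc_bitmap_andnot(allowed, allowed, core0->cpuset);
    }

    /* Only ignore those cores if enough PUs remain for our jobs. */
    if ((unsigned) hwloc_bitmap_weight(allowed) >= njobs)
        hwloc_topology_restrict(topology, allowed, 0);

    /* Distribute the jobs over whatever is left, down to the deepest level. */
    for (i = 0; i < njobs; i++)
        sets[i] = hwloc_bitmap_alloc();
    hwloc_distribute(topology, hwloc_get_root_obj(topology), sets, njobs, INT_MAX);

    for (i = 0; i < njobs; i++) {
        char *s;
        hwloc_bitmap_singlify(sets[i]);
        hwloc_bitmap_asprintf(&s, sets[i]);
        printf("job %u -> PU set %s\n", i, s);
        free(s);
        hwloc_bitmap_free(sets[i]);
    }
    hwloc_bitmap_free(allowed);
    hwloc_topology_destroy(topology);
    return 0;
}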



[OMPI devel] Quick observation - component ignored for 7 years

2013-08-27 Thread Rolf vandeVaart
The ompi/mca/rcache/rb component has been .ompi_ignored for almost 7 years.  
Should we delete it?






Re: [OMPI devel] ompi_info

2013-08-27 Thread Nathan Hjelm
On Tue, Aug 27, 2013 at 06:57:09PM +0200, George Bosilca wrote:
> 
> On Jul 19, 2013, at 17:57 , Jeff Squyres (jsquyres)  
> wrote:
> 
> > I've now talked to both George and Nathan.  Let me summarize for the web 
> > archives / community:
> > 
> > 1. There are two main points of contention:
> > 
> >   1a. ompi_info has a *very long-standing precedent* behavior of using 
> >  MCA params to exclude the display of components (and their 
> > params). Users have come to rely on this behavior to test that OMPI is 
> > honoring their $HOME/.openmpi-mca-params.conf file (for example) because -- 
> > at least prior to new ompi_info -- there was no other way to verify that.
> > 
> >   1b. It seems meaningless for MPI_T_Init to open *all* components when 
> > we're just going to be exposing a bunch of components/parameters that will 
> > not be used.  Easy example: MPI_T_Init will open all the PMLs, but we'll 
> > only end up using one of them.  Why have all the rest?
> 
> Any progress on this?

There was, until a bad puppet script wiped out all my data on my work
computer. I will work on it today and should have something ready to
push tomorrow.

To summarize what will be done:

1) --all without a --level will assume --level 9
2) Either a) add an option to ompi_info to suppress registering all
components when a component selection parameter is set (ie. --mca btl
self,sm) or b) somehow mark the parameters of unused components as such.

1 and 2a are easy. 2b is a little harder.

-Nathan


Re: [OMPI devel] ompi_info

2013-08-27 Thread George Bosilca

On Jul 19, 2013, at 17:57 , Jeff Squyres (jsquyres)  wrote:

> I've now talked to both George and Nathan.  Let me summarize for the web 
> archives / community:
> 
> 1. There are two main points of contention:
> 
>   1a. ompi_info has a *very long-standing precedent* behavior of using 
>  MCA params to exclude the display of components (and their 
> params). Users have come to rely on this behavior to test that OMPI is 
> honoring their $HOME/.openmpi-mca-params.conf file (for example) because -- 
> at least prior to new ompi_info -- there was no other way to verify that.
> 
>   1b. It seems meaningless for MPI_T_Init to open *all* components when we're 
> just going to be exposing a bunch of components/parameters that will not be 
> used.  Easy example: MPI_T_Init will open all the PMLs, but we'll only end up 
> using one of them.  Why have all the rest?

Any progress on this?

  George.


> 
> 2. After talking to Nathan, here's our answers:
> 
>   2a. Fair enough.  The long-standing ompi_info behavior precedent alone is 
> probably enough to warrant re-thinking the new ompi_info behavior.  Nathan 
> will implement a compromise (that George was ok with when I talked on the 
> phone with him).  If you have a  parameter somewhere that disables 
> components (e.g., $HOME/.openmpi-mca-params.conf contains "btl = 
> tcp,sm,self"), then ompi_info will somehow mark those components' parameters 
> as "inactive" in the prettyprint and parseable outputs
> 
>   2b. Nathan reminded me why we chose to do this.  It requires a little 
> explanation...
> 
> One thing to remember: MPI_T parameters *are* MCA parameters.  Hence, the 
> btl_tcp_if_include MCA parameter is also the btl_tcp_if_include MPI_T 
> parameter.  Put differently: MPI_T and MCA are two interfaces to the same 
> back-end variables.
> 
> Something to note: if you call MPI_Init and then later call MPI_T_init, the 
> latter is effectively a no-op.
> 
> The interesting case is when you call MPI_T_init before you call MPI_Init.  
> In this case, as has been noted in this thread: we open all components in all 
> frameworks.
> 
> However: what hasn't been noted is that during the subsequent MPI_Init, we do 
> normal selection *and will close unused components* (which also un-registers 
> all their corresponding MPI_T/MCA parameters).  For example:
> 
> 1. During MPI_T_init, we'll open all the PMLs: CM, OB1, etc.
> 
> 2. Subsequent calls to MPI_T functions can *set* MPI_T/MCA params.  For 
> example, you can use MPI_T to pick the ob1 PML.
> 
> 3. When MPI_Init is finally invoked, normal MPI_Init flow is observed; if an 
> MCA param was set to, for example, pick the PML, it will be honored and the 
> non-selected PMLs will be closed.  Consequently, all the MPI_T/MCA params for 
> the closed components will disappear from MPI_T (which is allowed by the 
> spec).  Continuing the example from #2, the CM PML will be closed, and all of 
> its MPI_T/MCA params will disappear.
> 
> Put simply: the point of opening *all* frameworks/components is to find out 
> what all the params are so that the window of time represented by #2 can be 
> utilized to examine/set MCA params as you want before you go into the 
> "normal" MPI process 
> 
> So I think we're ok from this standpoint.
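
To make the flow above concrete, a minimal sketch (an illustration only, not
Open MPI test code; it assumes the PML selection is exposed as a writable
string control variable named "pml" and omits all error checking):

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int provided, ncvar, i;

    /* Step 1: MPI_T_init_thread opens all frameworks/components, so every
     * MCA parameter is visible as a control variable. */
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);

    MPI_T_cvar_get_num(&ncvar);
    for (i = 0; i < ncvar; i++) {
        char name[256], desc[256];
        int namelen = sizeof(name), desclen = sizeof(desc);
        int verbosity, binding, scope, count;
        MPI_Datatype dtype;
        MPI_T_enum enumtype;

        MPI_T_cvar_get_info(i, name, &namelen, &verbosity, &dtype, &enumtype,
                            desc, &desclen, &binding, &scope);
        if (strcmp(name, "pml") == 0) {
            /* Step 2: use the window before MPI_Init to set the MCA param. */
            MPI_T_cvar_handle handle;
            MPI_T_cvar_handle_alloc(i, NULL, &handle, &count);
            MPI_T_cvar_write(handle, "ob1");
            MPI_T_cvar_handle_free(&handle);
            break;
        }
    }

    /* Step 3: MPI_Init honors the value set above; the non-selected PMLs are
     * closed and their MPI_T/MCA variables disappear. */
    MPI_Init(&argc, &argv);

    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}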
> 
> 
> 
> 
> On Jul 19, 2013, at 10:29 AM, "Jeff Squyres (jsquyres)"  
> wrote:
> 
>> George and I talked about this on the phone; I understand his questions 
>> better now.
>> 
>> Nathan and I will get together and work through his questions and come back 
>> to everyone with some answers / proposals.
>> 
>> 
>> On Jul 19, 2013, at 9:27 AM, George Bosilca  wrote:
>> 
>>> 
>>> My point is the following. I favor consistent behaviors and I'm always in 
>>> favor of respecting the configuration files. ALWAYS, as in the word that 
>>> means all cases without exception. Thus, by default, ompi_info, like any other 
>>> component of the Open MPI infrastructure, MUST read the configuration files 
>>> and respect all options provided there. And here was another inconsistency 
>>> in the "new" approach: ompi_info reports some of the values (like the eager 
>>> size and friends), while deciding to ignore others (like the list of the 
>>> active PML and BTL).
>>> 
>>> I do concede that there are cases where we need something slightly 
>>> different, maybe as a convenience. If there is a need for a special case 
>>> for ompi_info to ignore the configuration files, let's add it. But don't 
>>> make it the default; instead, require a special command-line argument for it.
>>> 
>>> There were several mentions of MPI_T in this discussion. If I 
>>> understand what was said about it, all components are loaded, they register 
>>> their parameters, and then, based on user selection, some of them are 
>>> unloaded. Thus my question is: from the tools perspective, what is the 
>>> interest of knowing that a special MPI_T parameter exists but not be able 
>>> 

Re: [hwloc-devel] lstopo - please add the information about the kernel to the graphical output

2013-08-27 Thread Brice Goglin
The problem I have while playing with this is that it takes a lot of
space. Putting the entire uname output on a single line means it gets
truncated when the topology drawing isn't large (on machines with 2 cores,
for instance), and using multiple lines would make the legend huge.
We could make it optional. But if you have to remember to manually
enable this new option, why not just remember to export to XML instead?
You already have all the uname info in there.

Brice



On 26/08/2013 15:11, Jiri Hladky wrote:
> Hi Brice,
> hi all,
>
> I run hwloc on different versions of the Linux kernel when testing
> RHEL. Since the kernel is sometimes buggy and does not detect the topology
> correctly, it would be useful to have at the bottom of the graphical
> output of lstopo not only the host name but also the version of the
> kernel.
>
> Example of C code on Linux to write this info:
> ===
> #include <stdio.h>
> #include <sys/utsname.h>
>
> struct utsname uname_buf;
> if (uname(&uname_buf) == -1)
> printf("uname call failed!");
> else {
> printf("Nodename:\t%s\n", uname_buf.nodename);
> printf("Sysname:\t%s\n", uname_buf.sysname);
> printf("Release:\t%s\n", uname_buf.release);
> printf("Version:\t%s\n", uname_buf.version);
> printf("Machine:\t%s\n", uname_buf.machine);
> }
>
> Nodename:   localhost.localdomain
> Sysname:Linux
> Release:3.10.7-100.fc18.x86_64
> Version:#1 SMP Thu Aug 15 22:21:29 UTC 2013
> Machine:x86_64
> ===
>
>
> Suggestion: on the graphical output of lstopo please add the line
>
> System: Linux, x86_64, 3.10.7-100.fc18.x86_64 #1 SMP Thu Aug 15
> 22:21:29 UTC 2013
>
>
> printf("System: %s, %s, %s %s\n", uname_buf.sysname,
> uname_buf.machine, uname_buf.release, uname_buf.version);
>
> Would it be possible? Any further ideas, suggestions?
>
> Thanks a lot!
> Jirka
>
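
For reference, the suggested legend line as a self-contained sketch (a
hypothetical standalone program for illustration, not lstopo code):

#include <stdio.h>
#include <sys/utsname.h>

int main(void)
{
    struct utsname u;

    if (uname(&u) == -1) {
        perror("uname");
        return 1;
    }
    /* e.g. "System: Linux, x86_64, 3.10.7-100.fc18.x86_64 #1 SMP ..." */
    printf("System: %s, %s, %s %s\n", u.sysname, u.machine, u.release, u.version);
    return 0;
}
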
> On Tue, Jun 18, 2013 at 5:17 PM, Jiri Hladky wrote:
>
> Hi Brice,
>
> we test the Linux kernel job scheduler. To do so, we run for example
> 1, 2, ..., 16, or 32 linpack benchmarks simultaneously
> (up to the number of cores).
>
> For each group of jobs we have this output:
>
> ===2 simultaneous jobs
> PID #CPU #CPU #CPU #CPU
> PID #CPU #CPU #CPU
> ==
>
> where the first column is the PID of the linpack benchmark and the other
> columns are the CPUs on which the process was running, with a granularity
> of 1 second.
>
> I would like to explore visualizing these results in an output similar
> to lstopo --top (see the attachment). I would like to write a simple
> utility which will
>  * parse the above file
>  * for each timestep, create an output similar to the lstopo --top
> output, showing where each job was running
>
> How difficult would this be? Could you please provide some hints on
> what API functionality to use?
>
> Thanks!
> Jirka
>
>



Re: [hwloc-devel] hwloc-distrib - please add the option to distribute the jobs in the reverse direction

2013-08-27 Thread Jiri Hladky
Hi Brice,
hi Chris,

using the weights looks like a good solution to me. However, we need to
think about if and how we should propagate the weight via the upper levels of
the hierarchy. So, for example, if you have a socket with 4 cores and cores 0&1
and 2&3 share an L2 cache, then the question is if and how the weight 1 for
core 0 should propagate via the L2 cache to core 1 (with some reduced factor).


  L2  L2
L1   L1   L1   L1
C#0 C#1  C#2 C#3

hwloc-distrib --weight 1 socket:all.core:0  --single N

should return

N=1 => CORE=2 or 3 (definitely not CORE=1 because of common L2 cache
between CORE#0 and CORE#1 )
N=2 => CORE=2,1 (not CORE#3 because of L2 cache topology)
N=3 => CORE=2,1,3
N=4 => CORE=2,1,3,0

Implementing weights is definitely a very nice feature, but IMHO, due to
the need to propagate them through the upper levels, the implementation can
be quite complicated, especially when multiple weights are specified on the
command line. For example, consider this case

hwloc-distrib --weight 10 socket:all.core:0  --single 2

for the topology described above. Should it return
COREs 2 and 3
or
COREs 2 and 1 ?

(in other words, which weight should CORE#1 get when CORE#0 has weight 10
and they are connected via the L2 cache?)


I think the --reverse option is much easier to implement, and its
requirement and expected output are clearer.

Any thoughts, comments on that?

Thanks!
Jirka
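
As a starting point for prototyping such a propagation rule, a small sketch
(an illustration only, assuming the hwloc 1.x API where all caches use the
HWLOC_OBJ_CACHE type) of the basic test it needs: do two cores share a cache
of a given depth?

#include <hwloc.h>

/* Return 1 if the two cores (by logical index) share a cache of the given
 * depth (e.g. 2 for L2), 0 otherwise. */
static int cores_share_cache(hwloc_topology_t topology,
                             unsigned idx1, unsigned idx2, unsigned depth)
{
    hwloc_obj_t c1 = hwloc_get_obj_by_type(topology, HWLOC_OBJ_CORE, idx1);
    hwloc_obj_t c2 = hwloc_get_obj_by_type(topology, HWLOC_OBJ_CORE, idx2);
    hwloc_obj_t ancestor;

    if (!c1 || !c2)
        return 0;
    /* A cache is shared by both cores iff it is an ancestor (or self) of
     * their lowest common ancestor. */
    for (ancestor = hwloc_get_common_ancestor_obj(topology, c1, c2);
         ancestor != NULL; ancestor = ancestor->parent)
        if (ancestor->type == HWLOC_OBJ_CACHE && ancestor->attr->cache.depth == depth)
            return 1;
    return 0;
}

With the 4-core topology above, cores_share_cache(topo, 0, 1, 2) would
return 1, while cores_share_cache(topo, 1, 2, 2) would return 0.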






On Tue, Aug 27, 2013 at 9:14 AM, Brice Goglin  wrote:

> On 27/08/2013 05:07, Christopher Samuel wrote:
> > On 27/08/13 00:07, Brice Goglin wrote:
> >
> > > But there's a more general problem here, some people may want
> > > something similar for other cases. I need to think about it.
> >
> > Something like a sort order perhaps, combined with some method to
> > exclude or weight PUs based on some metrics (including a user defined
> > weight)?
>
> Excluding is already supported with --restrict.
>
>
> If you want to (if possible) avoid core 0 on each socket, and (at least)
> avoid core 0 on the entire machine, you'd need a command-line like this:
>
>hwloc-distrib --weight 1 socket:all.core:0 --weight 2 core:0 ...
>
> Instead of doing
>
>    if [ $(hwloc-calc -N pu all ~socket:all.core:0) -le $jobs ]; then
>       hwloc-distrib --restrict $(hwloc-calc all ~socket:all.core:0) ...
>    elif [ $(hwloc-calc -N pu all ~core:0) -le $jobs ]; then
>       hwloc-distrib --restrict $(hwloc-calc all ~core:0) ...
>    else
>       hwloc-distrib ...
>    fi
>
>
> Another solution would be to have hwloc-distrib error out when there are
> not enough objects to distribute jobs. You'd do:
>
>    hwloc-distrib --new-option --restrict $(hwloc-calc all ~socket:all.core:0) ...
>    || hwloc-distrib --new-option --restrict $(hwloc-calc all ~core:0) ...
>    || hwloc-distrib ...
>
>
> And if you want to use entire cores instead of individual PUs, you can
> still use "--to core" to stop distributing once you reach the core level.
>
>
> > I had a quick poke around looking at /proc/irq/*/ and it would appear
> > you can gather info about which CPUs are eligible to handle IRQs from
> > the smp_affinity bitmask (or smp_affinity_list).
>
> smp_affinity_list is unfortunately only accessible to root; that's why
> we never used it in hwloc.
>
> > The node file there just "shows the node to which the device using the
> > IRQ reports itself as being attached. This hardware locality
> > information does not include information about any possible driver
> > locality preference."
>
> Ah, I missed the addition of the "node" file. This one is world-accessible.
>
> Brice
>


Re: [hwloc-devel] hwloc-distrib - please add the option to distribute the jobs in the reverse direction

2013-08-27 Thread Brice Goglin
On 27/08/2013 05:07, Christopher Samuel wrote:
> On 27/08/13 00:07, Brice Goglin wrote:
>
> > But there's a more general problem here, some people may want
> > something similar for other cases. I need to think about it.
>
> Something like a sort order perhaps, combined with some method to
> exclude or weight PUs based on some metrics (including a user defined
> weight)?

Excluding is already supported with --restrict.


If you want to (if possible) avoid core 0 on each socket, and (at least)
avoid core 0 on the entire machine, you'd need a command-line like this:

   hwloc-distrib --weight 1 socket:all.core:0 --weight 2 core:0 ...

Instead of doing

   if [ $(hwloc-calc -N pu all ~socket:all.core:0) -le $jobs ]; then
      hwloc-distrib --restrict $(hwloc-calc all ~socket:all.core:0) ...
   elif [ $(hwloc-calc -N pu all ~core:0) -le $jobs ]; then
      hwloc-distrib --restrict $(hwloc-calc all ~core:0) ...
   else
      hwloc-distrib ...
   fi


Another solution would be to have hwloc-distrib error out when there are
not enough objects to distribute jobs. You'd do:

   hwloc-distrib --new-option --restrict $(hwloc-calc all ~socket:all.core:0) ...
   || hwloc-distrib --new-option --restrict $(hwloc-calc all ~core:0) ...
   || hwloc-distrib ...


And if you want to use entire cores instead of individual PUs, you can
still use "--to core" to stop distributing once you reach the core level.


> I had a quick poke around looking at /proc/irq/*/ and it would appear
> you can gather info about which CPUs are eligible to handle IRQs from
> the smp_affinity bitmask (or smp_affinity_list).

smp_affinity_list is unfortunately only accessible to root; that's why
we never used it in hwloc.

> The node file there just "shows the node to which the device using the
> IRQ reports itself as being attached. This hardware locality
> information does not include information about any possible driver
> locality preference."

Ah, I missed the addition of the "node" file. This one is world-accessible.

Brice
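
A minimal sketch of using that "node" file (a hypothetical helper for
illustration, not existing hwloc code; it assumes the hwloc 1.x
HWLOC_OBJ_NODE type):

#include <hwloc.h>
#include <stdio.h>

/* Read the world-readable /proc/irq/<irq>/node file and return the matching
 * hwloc NUMA node object, or NULL if unknown. */
static hwloc_obj_t irq_numa_node(hwloc_topology_t topology, unsigned irq)
{
    char path[64];
    int node = -1;
    FILE *f;
    hwloc_obj_t obj = NULL;

    snprintf(path, sizeof(path), "/proc/irq/%u/node", irq);
    f = fopen(path, "r");
    if (!f)
        return NULL;
    if (fscanf(f, "%d", &node) != 1)
        node = -1;
    fclose(f);
    if (node < 0)
        return NULL;

    /* Walk the NUMA node objects until one matches this OS index. */
    while ((obj = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_NODE, obj)) != NULL)
        if (obj->os_index == (unsigned) node)
            return obj;
    return NULL;
}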



Re: [hwloc-devel] hwloc-distrib - please add the option to distribute the jobs in the reverse direction

2013-08-27 Thread Christopher Samuel

On 27/08/13 00:07, Brice Goglin wrote:

> But there's a more general problem here, some people may want
> something similar for other cases. I need to think about it.

Something like a sort order perhaps, combined with some method to
exclude or weight PUs based on some metrics (including a user defined
weight)?

I had a quick poke around looking at /proc/irq/*/ and it would appear
you can gather info about which CPUs are eligible to handle IRQs from
the smp_affinity bitmask (or smp_affinity_list).

The node file there just "shows the node to which the device using the
IRQ reports itself as being attached. This hardware locality
information does not include information about any possible driver
locality preference."

cheers,
Chris
-- 
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci
