Re: [hwloc-users] Netloc feature suggestion

2019-08-16 Thread Jeff Squyres (jsquyres) via hwloc-users
Don't forget that network topologies can also be complex -- it's not always a 
simple, single-path hierarchy.  There can be multiple paths between any pair of 
hosts on the network.  Sometimes the hosts are aware of the multiple paths, 
sometimes they are not (e.g., sometimes the fabric routing changes during the 
course of a single MPI job, and the hosts/MPI applications are unaware).

Meaning: the information about which network paths are taken for a given 
host-A-to-host-B traversal may be both distributed and transient.
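
To make the multi-path point concrete, here is a minimal sketch in Python. The 
small two-leaf, two-spine fabric and all of its names (FABRIC, hostA, leaf1, 
spine1, ...) are made up for illustration; nothing here comes from Netloc or 
from a real cluster.

```python
from collections import deque

# Hypothetical fabric: two hosts, two leaf switches, two spine switches.
FABRIC = {
    "hostA": ["leaf1"],
    "hostB": ["leaf2"],
    "leaf1": ["hostA", "spine1", "spine2"],
    "leaf2": ["hostB", "spine1", "spine2"],
    "spine1": ["leaf1", "leaf2"],
    "spine2": ["leaf1", "leaf2"],
}


def shortest_paths(graph, src, dst):
    """Return every minimum-length path from src to dst (breadth-first)."""
    best = None
    found = []
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # everything left in the queue is longer than the shortest
        node = path[-1]
        if node == dst:
            best = len(path)
            found.append(path)
            continue
        for neighbor in graph[node]:
            if neighbor not in path:  # never revisit a node within one path
                queue.append(path + [neighbor])
    return found


for path in shortest_paths(FABRIC, "hostA", "hostB"):
    print(" -> ".join(path))
```

Even in this tiny fabric there are two equally short routes between the two 
hosts, one through each spine switch, and a static description of the topology 
alone does not say which one a given message takes.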


On Aug 14, 2019, at 11:05 AM, Rigel Falcao do Couto Alves 
<rigel.al...@tu-dresden.de> wrote:

Hi,

I am doing a PhD in performance analysis of highly parallel CFD codes and would 
like to suggest a feature for Netloc: from topic Build Scotch sub-architectures 
(at https://www.open-mpi.org/projects/hwloc/doc/v2.0.3/a00329.php), create a 
function-version of netloc_get_resources, which could retrieve at runtime the 
network details of the available cluster resources (i.e. the nodes allocated to 
the job). I am mostly interested in how many switches (the gray circles in 
the figure below) need to be traversed in order for any pair of allocated nodes 
to communicate with each other:



For example, suppose my job is running on 4 nodes in the cluster, 
illustrated by the numbers above. All I would love to get from Netloc - at 
runtime - is some sort of classification of the nodes, like:

1: aa
2: ab
3: ba
4: ca

The labels of nodes 1 and 2 differ only in the last character, which means 
their MPI communications only need to traverse 1 switch; however, between 
either of them and nodes 3 or 4, the difference starts at the second-to-last 
character, which means their communications need to traverse two switches. 
More characters may be prepended to the string as necessary, e.g. if the 
central gray circle in the above figure is connected to another switch, which 
in turn leads to another part of the cluster's structure (with its own 
switches, nodes etc.). For me, it is at present irrelevant whether e.g. nodes 
1 and 2 are physically - or logically - consecutive to each other: a, b, c 
etc. would be just arbitrary identifiers.
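
The labelling scheme above can be sketched in a few lines of Python. This is 
only an illustration of the requested classification, following the counting 
used in this message; the function name switches_between is hypothetical and 
is not part of Netloc's API.

```python
from itertools import combinations


def switches_between(label_a, label_b):
    """Switch count implied by two equal-length hierarchical labels."""
    if len(label_a) != len(label_b):
        raise ValueError("labels must have the same length")
    common = 0
    for ch_a, ch_b in zip(label_a, label_b):
        if ch_a != ch_b:
            break
        common += 1
    # Every character from the first difference onwards counts as one switch.
    # Identical labels mean the same node, hence zero switches.
    return len(label_a) - common


# Node labels taken from the example in the message.
labels = {1: "aa", 2: "ab", 3: "ba", 4: "ca"}
for (n1, l1), (n2, l2) in combinations(labels.items(), 2):
    print(f"nodes {n1} and {n2}: {switches_between(l1, l2)} switch(es)")
```

With such labels, a plotting tool would only need string comparisons to group 
processes by how far apart their nodes are.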

I would then use this data to plot the process placement, using open-source 
tools developed here in the University of Dresden (Germany); i.e. Scotch is not 
an option for me. The results of my study will be open-source as well and I can 
gladly share them with you once the thesis is finished.

I hope I have clearly explained what I have in mind; please let me know if 
there are any questions. Finally, it is important that this feature is part of 
Netloc's API (as it is supposed to be integrated with the tools we develop 
here), works at runtime and doesn't require root privileges (as those tools are 
used by our cluster's customers in their everyday job submissions).

Kind regards,


--
Dipl.-Ing. Rigel Alves
researcher

Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
Zellescher Weg 12 A 218, 01069 Dresden | Germany

+49 (351) 463.42418
https://tu-dresden.de/zih/die-einrichtung/struktur/rigel-alves


___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users


--
Jeff Squyres
jsquy...@cisco.com




Re: [hwloc-users] Hwloc command not working

2017-03-02 Thread Jeff Squyres (jsquyres)
Jeyaraj --

I think we need more specific information in order to help you.  
Everyone's system is set up differently; we don't know how yours is set up.  For 
example:

- What version of hwloc did you install?
- Where did you get the RPM for hwloc?
- How exactly are you testing?
- You mentioned that you installed an hwloc RPM -- did you look at the RPM 
contents to see if it includes the hwloc-dump-hwdata command?  (e.g., "rpm -ql 
hwloc")
- If you're using hwloc from that RPM and something is not there that you 
expect to be there, you might want to contact the maintainer of that RPM (e.g., 
look at "rpm -qi hwloc")
- ...etc.

When you check the contents of the hwloc RPM, you may find that 
hwloc-dump-hwdata is installed in the sbin directory rather than the bin 
directory.  A wild guess: perhaps your PATH does not include the sbin 
directory...?
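
One way to test that guess is sketched below in Python. The list of sbin 
directories is an assumption about a typical Linux layout, and find_tool is a 
hypothetical helper; adjust both to your system.

```python
import os
import shutil

# Directories that commonly hold admin tools (an assumption, not a rule).
SBIN_DIRS = ["/usr/sbin", "/sbin", "/usr/local/sbin"]


def find_tool(name, extra_dirs=SBIN_DIRS):
    """Look for name on PATH first, then in the extra sbin directories.

    Returns (location or None, True if it was found via PATH).
    """
    on_path = shutil.which(name)
    if on_path:
        return on_path, True
    extended = os.pathsep.join(extra_dirs)
    return shutil.which(name, path=extended), False


location, was_on_path = find_tool("hwloc-dump-hwdata")
if location is None:
    print("hwloc-dump-hwdata not found; it may not be installed")
elif was_on_path:
    print(f"found on PATH: {location}")
else:
    print(f"found at {location}, but that directory is not on PATH")
```

If the last message is printed, adding that directory to PATH (or calling the 
tool by its full path) should be enough.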


> On Mar 2, 2017, at 8:25 AM, Marco Atzeri wrote:
> 
> On 02/03/2017 11:56, jeyaraj wrote:
>> Hi,
>> All rpm file and tar file also installed. Not working
>> 
> 
> That is a bit vague, and we have no crystal ball to look into your system.
> 
> Are you sure the package is properly installed?
> How did you verify it?
> Shouldn't you ask for help on the mailing list for your system,
> whatever it is, instead of asking here?
> 


-- 
Jeff Squyres
jsquy...@cisco.com



Re: [hwloc-users] hello world can't run in Ubuntu 12.04

2015-04-15 Thread Jeff Squyres (jsquyres)
+1

Please try upgrading to Open MPI v1.8.x and see if that solves your problem.


> On Apr 15, 2015, at 12:06 AM, Christopher Samuel wrote:
> 
> On 15/04/15 12:19, Li Li wrote:
> 
>> I installed Open MPI 1.5 and tested it with a simple program
> 
> Umm, Open-MPI 1.5 is ancient!
> 
> Open-MPI 1.8.x is the current stable release branch, 1.6 was the
> previous stable release branch (we're still on that here).
> 
> 1.5 was the old feature branch that led up to the 1.6 stable series.
> 
> All the best,
> Chris
> -- 
> Christopher Samuel    Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/  http://twitter.com/vlsci
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/hwloc-users/2015/04/1163.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [hwloc-users] Selecting real cores vs HT cores

2014-12-11 Thread Jeff Squyres (jsquyres)
On Dec 11, 2014, at 2:03 PM, Brice Goglin wrote:

> By the way, if you can't in the BIOS, you may want to disable the
> hyperthread in the kernel:
> 
> for i in $(hwloc-calc --whole-system --po -I pu core:all.pu:0) ; do echo 0 > 
> /sys/devices/system/cpu/cpu$i/online ; done
> 
> (write 1 instead of 0 to reenable them).

But keep in mind that this is the semantic equivalent of using hwloc-bind to 
bind to the first HT in each core.

I.e., disabling HT in the Linux kernel just disables scheduling on the 2nd HT.
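
For readers without hwloc at hand, the "first HT in each core" selection can be 
approximated directly from Linux sysfs. The sketch below is only an 
illustration under stated assumptions, not how hwloc-bind is implemented: it 
assumes a Linux system that exposes topology/thread_siblings_list, and all 
function names are hypothetical.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,4" or "0-1" into sorted integers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def first_thread_per_core(sibling_lists):
    """Given one sibling list per logical CPU, keep the lowest CPU of each core."""
    return sorted({parse_cpu_list(text)[0] for text in sibling_lists if text.strip()})


def read_sibling_lists():
    """Read thread_siblings_list for every online CPU (Linux sysfs only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as handle:
            lists.append(handle.read())
    return lists


def bind_current_process(cpus):
    """Restrict the calling process to the given CPUs that it may already use."""
    allowed = os.sched_getaffinity(0)
    wanted = set(cpus) & allowed
    if wanted:
        os.sched_setaffinity(0, wanted)


if __name__ == "__main__":
    print("first hardware thread of each core:",
          first_thread_per_core(read_sibling_lists()))
```

Calling bind_current_process(first_thread_per_core(read_sibling_lists())) 
before starting the workload leaves the second hardware thread of each core 
idle for this process, which is the scheduling effect described above.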

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [hwloc-users] Selecting real cores vs HT cores

2014-12-11 Thread Jeff Squyres (jsquyres)
I'm not sure you're asking a well-formed question.

When the BIOS is set to enable hyperthreading, several resources on the 
core are split when the machine is booted up (e.g., some of the queue depths 
for various processing units in the core are half the length that they are when 
hyperthreading is disabled in the BIOS).

Hence, running a process on a core that only uses a single hyperthread (when HT 
is enabled) is not quite the same thing as booting up with HT disabled and 
running that same job on the core.

Make sense?

Meaning: if you want to test HT vs. non-HT performance, you really need to 
change the BIOS settings and reboot, sorry.

Also, note that if you have HT enabled and you run a single-threaded app bound 
to a core, it will only use 1 of those HTs -- the other HT will be largely 
dormant. Meaning: don't expect that running a single-threaded app on a core 
that has HT enabled will magically take advantage of some performance benefit 
of aggressive automatic parallelization.  You really need multiple threads in a 
process to get performance advantages out of HT.



On Dec 11, 2014, at 12:51 PM, Brock Palen wrote:

> When a system has HT enabled, is one of the presented cores the real one and 
> the other its fake partner?  Or is that not the case?
> 
> If I want to test behavior without messing with the BIOS, how do I select 
> just the 'real cores', if this is the case?
> 
> I am looking for the equivalent of 
> 
> hwloc-bind ALLREALCORES  my.exe
> 
> Doing some performance study type things.
> 
> Thanks,
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/hwloc-users/2014/12/1126.php


-- 
Jeff Squyres
jsquy...@cisco.com



Re: [hwloc-users] Using hwloc to detect Hard Disks

2014-09-22 Thread Jeff Squyres (jsquyres)
On Aug 28, 2014, at 7:27 PM, Samuel Thibault wrote:

>> I am not able to figure out how to read hard drive details, e.g.,
>> the content provided by the hdparm application.
>> 
>> My first question is, is it possible to read this using hwloc? If yes, can
>> anyone direct me to the documentation which describes how to use it?
> 
> Well, hwloc's goal is to describe the hardware _locality_, not its
> precise content.  So we don't provide that level of detail, we only
> provide where the pieces of hardware reside.

Can you be a bit more specific about what information you want to query?

I ask because it strikes me that hwloc does gather some kinds of hardware 
information and put them as attributes on existing hwloc topology objects.

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [hwloc-users] CPU info on ARM

2014-01-28 Thread Jeff Squyres (jsquyres)
I passed this on to my OMPI ARM contact (Leif Lindholm).  Here's what he said:

   "It gets a bit trickier on ARM... since we may also have (implementation
time) configurable cache sizes and also big.LITTLE (different processor
models executing in the same SMP system)."

He passed the question on to another ARM guy, asking for further detail.  I'll 
pass on what he says.



On Jan 28, 2014, at 3:39 AM, Brice Goglin wrote:

> Hello,
> 
> Is anybody familiar with ARM CPUs?
> 
> I am adding more CPU information because Intel needs more:
> CPUVendor=GenuineIntel
> CPUModel=Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> CPUModelNumber=45
> CPUFamilyNumber=6
> 
> Would something similar be useful for ARM? What are the fields below
> from /proc/cpuinfo on ARM that would be useful to developers?
> Processor: Marvell PJ4Bv7 Processor rev 1 (v7l)
> BogoMIPS: 1196.85
> Features: swp half thumb fastmult vfp edsp vfpv3 vfpv3d16 tls
> CPU implementer: 0x56
> CPU architecture: 7
> CPU variant: 0x1
> CPU part: 0x581
> CPU revision: 1
> Hardware: Marvell Armada-370
> Revision: 
> Serial: 
> 
> thanks
> Brice
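
As a rough illustration of how the fields quoted above could be collected, 
here is a minimal Python sketch. It is not hwloc's implementation; the helper 
names and the choice of which fields to convert are assumptions, and the 
sample text is copied from the /proc/cpuinfo excerpt in this message.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into dicts, one per blank-line-separated block."""
    blocks, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                blocks.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        blocks.append(current)
    return blocks


# Fields that the excerpt reports in hexadecimal.
HEX_FIELDS = ("CPU implementer", "CPU variant", "CPU part")


def arm_identity(block):
    """Pick out the hexadecimal identification fields and convert them to ints."""
    return {name: int(block[name], 16) for name in HEX_FIELDS if name in block}


SAMPLE = """\
Processor: Marvell PJ4Bv7 Processor rev 1 (v7l)
CPU implementer: 0x56
CPU architecture: 7
CPU variant: 0x1
CPU part: 0x581
CPU revision: 1
Hardware: Marvell Armada-370
"""

info = parse_cpuinfo(SAMPLE)[0]
print(info["Processor"], arm_identity(info))
```

On a real system the same parser could be fed the contents of /proc/cpuinfo; 
on a big.LITTLE machine the blocks may then differ from one processor to the 
next, which is the complication mentioned in the reply.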
> 


-- 
Jeff Squyres
jsquy...@cisco.com



[hwloc-users] hwloc problem on SGI machine

2014-01-10 Thread Jeff Squyres (jsquyres)
Jeff Becker (CC'ed) reported to me a failure with hwloc 1.7.2 (in OMPI trunk).  
I had him verify this with a standalone hwloc 1.7.2, and then had him try 
standalone hwloc 1.8 as well -- all got the same failure.

Here's what he's seeing in 1.7.2:

$ lstopo
Different OS indexes
lstopo: topology-linux.c:2731: look_sysfsnode: Assertion `node == res_obj' 
failed.
Aborted (core dumped)

In 1.8, the failure is the same, but at a different line number (2741).

It's an SGI x86_64 server, running SLES 11.

Is this an hwloc issue, or a hardware issue?

-- 
Jeff Squyres
jsquy...@cisco.com



[hwloc-users] Migrating www.open-mpi.org

2013-08-05 Thread Jeff Squyres (jsquyres)
All --

Our hosting provider will be migrating www.open-mpi.org to a new machine on 
Wednesday.  See the message below for details.


Begin forwarded message:

From: DongInn Kim
Subject: Migrating www.open-mpi.org from milliways to lion
List-Post: hwloc-users@lists.open-mpi.org
Date: August 5, 2013 11:53:38 AM PDT

Dear Open MPI developers and users,

We are planning to move all the services under www.open-mpi.org to the new 
server on Wednesday, Aug 7th, 2013.
This migration may require some downtime for web services (e.g., 
http://www.open-mpi.org) and mailing list services (e.g., us...@open-mpi.org, 
de...@open-mpi.org, …).

The migration schedule is as follows:
- Date: Wednesday, Aug 7th, 2013
- Time:
  6:00am-8:00am Pacific US time
  7:00am-9:00am Mountain US time
  8:00am-10:00am Central US time
  9:00am-11:00am Eastern US time
  1:00pm-3:00pm GMT

The following services will not be available during the migration.

- Web services (e.g., www.open-mpi.org)
- mailing lists:
  ad...@open-mpi.org
  announce
  bugs
  devel
  devel-core
  docs
  ft
  hwloc-announce
  hwloc-bugs
  hwloc-devel
  hwloc-svn
  hwloc-users
  llamas
  mtt-announce
  mtt-bugs
  mtt-devel
  mtt-devel-core
  mtt-results
  mtt-svn
  mtt-users
  ompi-user-docs-bugs
  ompi-user-docs-svn
  svn
  svn-docs
  svn-docs-full
  svn-full
  svn-private
  svn-private-full
  users
- Mail archives
  http://www.open-mpi.org/community/lists/
- Mercurial mirror
  Will disappear (it has long-since moved out to Bitbucket)

I hope that we will not lose any mail sent to the above mailing lists, even 
during the migration, but it would be much appreciated if you could hold off 
on sending emails and making svn commits until the migration is done.

Please let me know if you have any questions or issues about this migration.

Regards,

--
- DongInn
---
CREST System administrator
Indiana University
Bloomington, IN


--
Jeff Squyres
jsquy...@cisco.com