Re: [hwloc-users] hwloc with ScaleMP

2010-04-07 Thread Brice Goglin
Brock Palen wrote:
> [brockp@nyx0809 INTEL]$ lstopo -
> System(79GB)
>   Misc0
> Node#0(10GB) + Socket#1 + L3(8192KB)
>   L2(256KB) + L1(32KB) + Core#0 + P#0
>   L2(256KB) + L1(32KB) + Core#1 + P#1
>   L2(256KB) + L1(32KB) + Core#2 + P#2
>   L2(256KB) + L1(32KB) + Core#3 + P#3
> Node#1(10GB) + Socket#0 + L3(8192KB)
>   L2(256KB) + L1(32KB) + Core#0 + P#4
>   L2(256KB) + L1(32KB) + Core#1 + P#5
>   L2(256KB) + L1(32KB) + Core#2 + P#6
>   L2(256KB) + L1(32KB) + Core#3 + P#7
>   Misc0
> Node#2(10GB) + Socket#3 + L3(8192KB)
>   L2(256KB) + L1(32KB) + Core#0 + P#8
>   L2(256KB) + L1(32KB) + Core#1 + P#9
>   L2(256KB) + L1(32KB) + Core#2 + P#10
>   L2(256KB) + L1(32KB) + Core#3 + P#11
> Node#3(10GB) + Socket#2 + L3(8192KB)
>   L2(256KB) + L1(32KB) + Core#0 + P#12
>   L2(256KB) + L1(32KB) + Core#1 + P#13
>   L2(256KB) + L1(32KB) + Core#2 + P#14
>   L2(256KB) + L1(32KB) + Core#3 + P#15
>   Misc0
> Node#4(10GB) + Socket#5 + L3(8192KB)
>   L2(256KB) + L1(32KB) + Core#0 + P#16
>   L2(256KB) + L1(32KB) + Core#1 + P#17
>   L2(256KB) + L1(32KB) + Core#2 + P#18
>   L2(256KB) + L1(32KB) + Core#3 + P#19
> Node#5(10GB) + Socket#4 + L3(8192KB)
>   L2(256KB) + L1(32KB) + Core#0 + P#20
>   L2(256KB) + L1(32KB) + Core#1 + P#21
>   L2(256KB) + L1(32KB) + Core#2 + P#22
>   L2(256KB) + L1(32KB) + Core#3 + P#23
>   Misc0
> Node#6(10GB) + Socket#7 + L3(8192KB)
>   L2(256KB) + L1(32KB) + Core#0 + P#24
>   L2(256KB) + L1(32KB) + Core#1 + P#25
>   L2(256KB) + L1(32KB) + Core#2 + P#26
>   L2(256KB) + L1(32KB) + Core#3 + P#27
> Node#7(10GB) + Socket#6 + L3(8192KB)
>   L2(256KB) + L1(32KB) + Core#0 + P#28
>   L2(256KB) + L1(32KB) + Core#1 + P#29
>   L2(256KB) + L1(32KB) + Core#2 + P#30
>   L2(256KB) + L1(32KB) + Core#3 + P#31
>
> I don't know why they are all labeled Misc0, but it does see the extra
> layer.
>
> If you want other information let me know.

Great, there is probably some distance information in sysfs.

Can you send the output of
cat /sys/devices/system/node/node*/distance

Brice



Re: [hwloc-users] Creating a D wrapper around hwloc

2010-04-16 Thread Brice Goglin
Jim Burnes wrote:
> I can make these available to D in several different ways, but I need
> to know the true intent of marking them as "static __inline".
>
> 1. Are they marked that way simply to increase performance?
>   

No.

> 2. Are they marked that way to avoid some sort of thread safety issue?
>   

No.

> If the answer is (1) then I can safely remove their "static __inline"
> markup and compile them into the library.
>   

In the beginning, one reason was to have examples of easy traversal
routines in the headers, so as to improve the documentation a bit. It
offers more features and shows developers how to implement them, without
increasing the core library size or the ABI surface.
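
As an illustration (not the exact hwloc source), such a header-only
helper can be as small as this sketch, modeled on
hwloc_get_next_obj_by_type():

  static __inline hwloc_obj_t
  my_get_next_obj_by_type(hwloc_topology_t topology, hwloc_obj_type_t type,
                          hwloc_obj_t prev)
  {
    if (!prev)
      return hwloc_get_obj_by_type(topology, type, 0);
    /* objects of the same type at the same depth are linked as cousins */
    return prev->next_cousin;
  }

Compiling such helpers into the library instead should work too; nothing
here depends on being inlined.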

> This is a cool library.  Thanks for the extensive work.
>   

Thanks!

Brice



Re: [hwloc-users] hwloc RPM spec file

2010-04-26 Thread Brice Goglin
On 23/04/2010 18:09, Jirka Hladky wrote:
> Hello,
>
> I have written hwloc RPM spec file. It's attached.
>
> Thanks
> Jirka
>   
>

Thanks Jirka, but don't you need some BuildRequires such as the following?

libX11-devel
libxml2-devel
cairo-devel
ncurses-devel


Tony (CCed) also worked on RPMs for Fedora in the past (see
http://koji.fedoraproject.org/koji/taskinfo?taskID=1815736). I don't
know which one is better. It would be good to have somebody upload hwloc
in Redhat and Fedora repos at some point.

Maybe adding the spec file to the SVN could be good too? IIRC, you can
build RPM packages with a single command line from the tarball thanks to
this.

Brice


Re: [hwloc-users] hwloc on systems with more than 64 cpus?

2010-05-16 Thread Brice Goglin
No, there is no such limit. If you have 128 cores, the cpuset string will
be 0xffffffff,0xffffffff,0xffffffff,0xffffffff

As long as you have less than 1024 cores, everything should work fine.
For more than 1024, you'll need to rebuild with a manual change in the
source code, or wait for hwloc 1.1.
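
A hedged sketch of how such a string is produced with the newer bitmap
API (hwloc >= 1.1); the mask value is just an illustration:

  hwloc_bitmap_t set = hwloc_bitmap_alloc();
  char str[128];
  hwloc_bitmap_set_range(set, 0, 127);          /* CPUs 0-127 allowed */
  hwloc_bitmap_snprintf(str, sizeof(str), set); /* four 32-bit chunks, comma-separated */
  hwloc_bitmap_free(set);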

Brice




On 14/05/2010 23:51, Jirka Hladky wrote:
> Thanks Samuel!!
>
> The data looks fine. hwloc rocks.
>
> I assume the --cpuset option (of the lstopo command) is not supported
> on such systems, right?
>
> My understanding is that cpuset masks work only up to 64 cores. Is that correct?
>
> Thanks
> Jirka
>
> On Friday 14 May 2010 08:06:12 pm Samuel Thibault wrote:
>   
>> Jeff Squyres, le Fri 14 May 2010 09:09:44 -0400, a écrit :
>> 
>>> I believe that Brice / Samuel (the two main developers) have tested hwloc
>>> on an old Altix 4700 with 256 itanium cores.
>>>
>>> I don't have their exact results, and I don't see them on IM right now,
>>> so I don't know if they're around today or not...
>>>   
>> It was tested on a 256 core itanium machine, see
>> tests/linux/256ia64-64n2s2c.tar.gz.output
>>
>> Samuel
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>> 
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>   



Re: [hwloc-users] hwloc on cray

2010-06-23 Thread Brice Goglin
Hello Norman,

I don't think anybody ever tried. But we have an entry in the TODO list
saying "port to cray catamount" :)
If anybody wants to port hwloc to Cray, we'd be happy to help. Getting
us access to a Cray machine might also help :)

Brice



Le 23/06/2010 04:05, Norman Lo a écrit :
> Hi,
>
> Has anyone tried building hwloc on a Cray machine ?
>
> Thank you very much for your help in advance.
>
> Norman
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users



Re: [hwloc-users] hwloc sockets support on solaris

2010-06-23 Thread Brice Goglin
I see this in the Solaris binding code:

  if (hwloc_cpuset_weight(hwloc_set) != 1) {
errno = EXDEV;
return -1;
  }

OMPI doesn't get this error ?

Brice




Le 23/06/2010 21:56, Terry Dontje a écrit :
> Does hwloc think it supports binding processes to sockets or multiple
> cpus?  I am asking because I believe there are no current Solaris
> accessors that support this (processor_bind only binds a pid or a set
> of pids to a *single* processor). 
>
> I bring this up because in testing OMPI with hwloc support it looks
> like -bind-to-socket is acting like -bind-to-core on Solaris.  I
> believe the issue is hwloc should be returning an error to tell OMPI
> it cannot bind-to-socket or multiple cpus at that.
>
> -- 
> Oracle
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.650.633.7054
> Oracle * - Performance Technologies*
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com 
> -- 
> Oracle
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.650.633.7054
> Oracle * - Performance Technologies*
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com 
>
>
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>   



Re: [hwloc-users] hwloc sockets support on solaris

2010-06-23 Thread Brice Goglin
Le 23/06/2010 22:27, Jeff Squyres a écrit :
> Hm.  We should be.  Here's the hwloc plugin code for setting CPU affinity 
> (it's static because it's invoked by function pointer):
>
> static int module_set(opal_paffinity_base_cpu_set_t mask)
> {
> int i, ret = OPAL_SUCCESS;
> hwloc_cpuset_t set;
> hwloc_topology_t *t = &mca_paffinity_hwloc_component.topology;
>
> set = hwloc_cpuset_alloc();
> hwloc_cpuset_zero(set);
> for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_T_NUM_BITS; ++i) {
> if (OPAL_PAFFINITY_CPU_ISSET(i, mask) &&
> i < mca_paffinity_hwloc_component.cpuset_max_size) {
> hwloc_cpuset_cpu(set, i);
>   

Don't you want hwloc_cpuset_set(set, i) instead ?
hwloc_cpuset_cpu(set, i) changes the cpuset into a single CPU, i.e. it's
zero(set) + set(set, i).
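
A small illustration of the difference (hwloc 1.0 cpuset API; "set" is
assumed already allocated):

  hwloc_cpuset_zero(set);
  hwloc_cpuset_set(set, 3);  /* adds bit 3: set = {3} */
  hwloc_cpuset_set(set, 5);  /* adds bit 5: set = {3,5} */
  hwloc_cpuset_cpu(set, 7);  /* resets to a single bit: set = {7} */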

Brice



Re: [hwloc-users] Getting a graphics view for anon graphic system...

2010-07-02 Thread Brice Goglin
Le 09/06/2010 21:52, Brice Goglin a écrit :
> Le 09/06/2010 21:41, Jeff Squyres a écrit :
>   
>> On Jun 6, 2010, at 4:03 PM, Olivier Cessenat wrote:
>>
>>   
>> 
>>> What you write is clear to computer scientists, but I failed to figure
>>> out what it meant. Sorry, it is clear now !
>>> 
>>>   
>> FWIW, there's a section about "output formats" in the hwloc-ls.1 man page.  
>> It's probably worth adding a sentence in there that the list in the man page 
>> applies to the filenames; i.e., that the filename determines the output 
>> format.
>>   
>> 
> By the way, I wonder if we should have explicit --input and --output
> options to clarify the lstopo command-line.
> For instance
> lstopo --xml file.xml file2.png
> would be better written as
> lstopo --input file.xml --output file2.png
>
> So basically, I am saying that --input/-i would be a superset of our
> current --xml, --synthetic and --fsroot. And --output/-o would be
> implicit when passing a filename argument on the command line. And we
> could have --output-format/--of to enforce the output without looking at
> the filename extension (and maybe --input-format/--if to enforce
> xml/synthetic/fsroot).
>   

Most of the above (except the addition of an explicit --output before
the filename) is available in the lstopo branch.

Brice



Re: [hwloc-users] hwloc_set/get_thread_cpubind

2010-07-15 Thread Brice Goglin
Le 14/07/2010 20:28, Αλέξανδρος Παπαδογιαννάκης a écrit :
> hwloc_set_thread_cpubind and hwloc_get_thread_cpubind are missing from the 
> html documentation
> http://www.open-mpi.org/projects/hwloc/doc/v1.0.1/group__hwlocality__binding.php
> 
>   

It may be related to the way doxygen handles #ifdef, but we seem to have the
right PREDEFINED variable in the config, and the html doc seems properly
generated on my machine (Debian with doxygen 1.6.3).

Brice



Re: [hwloc-users] xmlbuffer test failure

2010-11-05 Thread Brice Goglin
Looks like there's something specific to your machine. Can you send the
XML output of lstopo ?

thanks
Brice



Le 05/11/2010 05:41, ryuuta a écrit :
> Hi,
>
> I'd like to report the failure of one of the tests run by 'make
> check':
>
> exported to buffer 0x8546408 length 3070
> re-exported to buffer 0x854ce70 length 3047
> lt-xmlbuffer: ../../hwloc/tests/xmlbuffer.c:36: main: Assertion `size1
> == size2' failed.
> /bin/sh: line 5: 14531 Aborted ${dir}$tst
> FAIL: xmlbuffer
> 
> 1 of 19 tests failed
> Please report to http://www.open-mpi.org/community/help/
> 
>
> I'm using gcc-4.5.1, libxml2-2.7.7, and zlib-1.2.5.
> The revision of hwloc checked out from svn is: 2702
>
> Regards,
> Ryuta
>
>
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>   



Re: [hwloc-users] xmlbuffer test failure

2010-11-05 Thread Brice Goglin
Here's the patch :)


Le 05/11/2010 08:10, Brice Goglin a écrit :
> Interesting, you don't have any hugepage information, it's probably
> disabled in the kernel. Can you apply the attached patch and check that
> the XML output only contains a single "page_type" line and that it
> looks valid ? It should be something like
>   <page_type size="4096" count="..."/>
> If so, then rerun make check again.
>
> thanks,
> Brice
>
>
>
> Le 05/11/2010 07:55, ryuuta a écrit :
>> Here they are.
>> Thanks for the diagnosis.
>>
>> Regards,
>> Ryuta
>>
>> On Fri, Nov 5, 2010 at 3:46 PM, Brice Goglin <brice.gog...@inria.fr
>> <mailto:brice.gog...@inria.fr>> wrote:
>>
>> Looks like you have some unexpected hugepage information. Not
>> sure it's the cause of the XML problem, but we need to debug this
>> too. Can you send the .tar.bz2 and .output file that
>> hwloc-gather-topology.sh generates ?
>>
>> In the meantime, I need to change this test so as to show the
>> difference between the XML exports when this make check test fails.
>>
>> thanks,
>> Brice
>>
>>
>>
>> Le 05/11/2010 07:35, ryuuta a écrit :
>>> Here you go.
>>>
>>> On Fri, Nov 5, 2010 at 3:07 PM, Brice Goglin
>>> <brice.gog...@inria.fr <mailto:brice.gog...@inria.fr>> wrote:
>>>
>>> Looks like there's something specific to your machine. Can
>>> you send the XML output of lstopo ?
>>>
>>> thanks
>>> Brice
>>>
>>>
>>>
>>> Le 05/11/2010 05:41, ryuuta a écrit :
>>>> Hi,
>>>>
>>>> I'd like to report the failure of one of the tests run
>>>> by 'make check':
>>>>
>>>> exported to buffer 0x8546408 length 3070
>>>> re-exported to buffer 0x854ce70 length 3047
>>>> lt-xmlbuffer: ../../hwloc/tests/xmlbuffer.c:36: main:
>>>> Assertion `size1 == size2' failed.
>>>> /bin/sh: line 5: 14531 Aborted ${dir}$tst
>>>> FAIL: xmlbuffer
>>>> 
>>>> 1 of 19 tests failed
>>>> Please report to http://www.open-mpi.org/community/help/
>>>> 
>>>>
>>>> I'm using gcc-4.5.1, libxml2-2.7.7, and zlib-1.2.5.
>>>> The revision of hwloc checked out from svn is: 2702
>>>>
>>>> Regards,
>>>> Ryuta
>>>>
>>>>
>>>> ___
>>>> hwloc-users mailing list
>>>> hwloc-us...@open-mpi.org <mailto:hwloc-us...@open-mpi.org>
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>>>>   
>>>
>>>
>>
>>
>
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>   

Index: src/topology-linux.c
===
--- src/topology-linux.c	(révision 2703)
+++ src/topology-linux.c	(copie de travail)
@@ -1635,7 +1635,7 @@
   uint64_t meminfo_hugepages_count, meminfo_hugepages_size;
   struct stat st;
   int has_sysfs_hugepages = 0;
-  int types = 2;
+  int types = 1;
   int err;

   err = hwloc_stat("/sys/kernel/mm/hugepages", &st, topology->backend_params.sysfs.root_fd);
@@ -1662,6 +1662,7 @@

   if (memory->page_types) {
 uint64_t remaining_local_memory = memory->local_memory;
+#if 0
 if (has_sysfs_hugepages) {
   /* read from node%d/hugepages/hugepages-%skB/nr_hugepages */
   hwloc_parse_hugepages_info(topology, "/sys/kernel/mm/hugepages", memory, &remaining_local_memory);
@@ -1671,6 +1672,7 @@
   memory->page_types[1].count = meminfo_hugepages_count;
   remaining_local_memory -= meminfo_hugepages_count * meminfo_hugepages_size;
 }
+#endif
 memory->page_types[0].count = remaining_local_memory / memory->page_types[0].size;
   }
 }


Re: [hwloc-users] hwloc@SC10

2010-11-12 Thread Brice Goglin
My talk at the Cisco booth is Wednesday at 11am. Now I need to find out
what is the right way to pronounce "hwloc" :)

Brice



Le 12/11/2010 14:46, Jeff Squyres a écrit :
> Brice will also be giving a ~10 min short talk on hwloc in the Cisco booth; 
> stop by and say hello!  You can hear the "right" way to pronounce "hwloc".  
> :-)
>
> Cisco is also hosting some Open MPI/MPI-related short talks in our booth; I 
> just posted about this on the Open MPI lists:
>
> http://www.open-mpi.org/community/lists/users/2010/11/14741.php
>
> Drop by the Cisco booth for the exact schedule; we're right next to the main 
> SciNet NOC.
>
> See you there!
>
>
>
> On Nov 8, 2010, at 11:22 AM, Brice Goglin wrote:
>
>   
>> Hello,
>> For those of you going to SC10 @ New Orleans next week, you should know
>> that hwloc will be there too. I will be hanging around the INRIA Booth
>> (#2751, between TACC and Microsoft) and Jeff Squyres will be near the
>> Cisco Booth (#3247, on the other side of Microsoft). Feel free to visit
>> us and request new features for hwloc 1.2 :)
>> Brice
>>
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>> 
>
>   



Re: [hwloc-users] Identifying NIC in a topology using HWLOC

2010-12-27 Thread Brice Goglin
Hello Saktheesh,

NICs do not appear in the topology yet. This is under development in the
libpci branch. You can take a look at
https://svn.open-mpi.org/svn/hwloc/branches/libpci
and tell us what you think of the interface. If you're talking about
infiniband NICs, hwloc/openfabrics-verbs.h might help you in the meantime.

By the way, what do you mean by "distance" ? There is also some work
about exposing NUMA distances (and more) in the API, see
https://svn.open-mpi.org/svn/hwloc/branches/distances

Brice




Le 27/12/2010 02:32, S.A.Saktheesh a écrit :
> Hi all,
>
> I am trying to identify a NIC in a topology with the help of HWLOC.
> Is a NIC considered an object in the topology? I am able to find other
> objects mentioned in the mailing list, such as CORE, PU, CACHE, etc.
>
> I am trying to find this because I am interested in finding the distance
> between a Socket and a NIC in a given topology.
>
> -- 
> With kind regards,
> Saktheesh.
>
>
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>   



Re: [hwloc-users] Problem getting cpuset of MPI task

2011-02-09 Thread Brice Goglin
Le 09/02/2011 16:53, Hendryk Bockelmann a écrit :
> Since I am new to hwloc there might be a misunderstanding from my
> side, but I have a problem getting the cpuset of MPI tasks. I just
> want to run a simple MPI program to see on which cores (or CPUs in
> case of hyperthreading or SMT) the tasks run, so that I can arrange my
> MPI communicators.
>
> For the program below I get the following output:
>
> Process 0 of 2 on tide
> Process 1 of 2 on tide
> --> cpuset of process 0 is 0x000f
> --> cpuset of process 0 after singlify is 0x0001
> --> cpuset of process 1 is 0x000f
> --> cpuset of process 1 after singlify is 0x0001
>
> So why do both MPI tasks report the same cpuset?

Hello Hendryk,

Your processes are not bound; they may run anywhere they want.
hwloc_get_cpubind() tells you where they are bound. That's why the
cpuset is 0xf first (all the existing logical processors in the
machine).

You want to know where they actually run. That's different from where
they are bound: the former is included in the latter. The former is a
single processor, while the latter may be any combination of processors.

hwloc cannot tell you where a task runs. But I am looking at implementing
it. I actually sent a patch to hwloc-devel about it yesterday [1]. You
would just have to replace get_cpubind with get_cpuexec (or whatever the
final function name is).

You should note that such a function would not be guaranteed to return
something true since the process may migrate to another processor in the
meantime.

Also note that hwloc_bitmap_singlify is usually used to "simplify" a
cpuset (to avoid migration between multiple SMT for instance) before
binding a task (calling set_cpubind). It's useless in your code above.
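
For reference, the usual singlify pattern looks like this sketch
(assuming a loaded topology; error handling omitted):

  hwloc_bitmap_t set = hwloc_bitmap_alloc();
  hwloc_get_cpubind(topology, set, 0);  /* where we are allowed to run */
  hwloc_bitmap_singlify(set);           /* keep a single PU of the binding */
  hwloc_set_cpubind(topology, set, 0);  /* re-bind to that single PU */
  hwloc_bitmap_free(set);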

Brice

[1] http://www.open-mpi.org/community/lists/hwloc-devel/2011/02/1915.php



> Here is the program (attached you find the output of
> hwloc-gather-topology.sh):
>
> #include <stdio.h>
> #include <stdlib.h>
> #include "hwloc.h"
> #include "mpi.h"
>
> int main(int argc, char* argv[]) {
>
>hwloc_topology_t topology;
>hwloc_bitmap_t cpuset;
>char *str = NULL;
>int myid, numprocs, namelen;
>char procname[MPI_MAX_PROCESSOR_NAME];
>
>MPI_Init(&argc, &argv);
>MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>MPI_Comm_rank(MPI_COMM_WORLD, &myid);
>MPI_Get_processor_name(procname, &namelen);
>
>printf("Process %d of %d on %s\n", myid, numprocs, procname);
>
>hwloc_topology_init(&topology);
>hwloc_topology_load(topology);
>
>/* get native cpuset of this process */
>cpuset = hwloc_bitmap_alloc();
>hwloc_get_cpubind(topology, cpuset, 0);
>hwloc_bitmap_asprintf(&str, cpuset);
>printf("--> cpuset of process %d is %s\n", myid, str);
>free(str);
>hwloc_bitmap_singlify(cpuset);
>hwloc_bitmap_asprintf(&str, cpuset);
>printf("--> cpuset of process %d after singlify is %s\n", myid, str);
>free(str);
>
>hwloc_bitmap_free(cpuset);
>hwloc_topology_destroy(topology);
>
>MPI_Finalize();
>return 0;
> }
>
>
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>   



Re: [hwloc-users] hwloc-ps output - how to verify process binding on the core level?

2011-02-14 Thread Brice Goglin
Le 14/02/2011 07:43, Siew Yin Chan a écrit :
>
>>
>>
>
> No. Each hwloc-bind command in the mpirun above doesn't know that
> there are other hwloc-bind instances on the same machine. All of
> them bind their process to all cores in the first socket.
>
> => Agree. For socket:0.core:0-3 , hwloc will bind the MPI processes to
> all cores in the first socket. But how are the individual processes
> mapped on these cores? Will it be in this order:
>
>
> rank 0 -> core 0
>
> rank 1 -> core 1
>
> rank 2 -> core 2
>
> rank 3 -> core 3
>
>
> Or in this *arbitrary* order:
>
>
> rank 0 -> core 1
>
> rank 1 -> core 3
>
> rank 2 -> core 0
>
> rank 3 -> core 2
>

The operating system decides where each process runs (according to the
binding). It usually has no knowledge of MPI ranks. And I don't think it
looks at the PID numbers during the scheduling. So it's very likely random.


Please distinguish your replies from the text you quote. Otherwise, it's
hard to tell where your reply is. I hope I didn't miss anything.

Brice




Re: [hwloc-users] on using hwloc_get_area_membind_nodeset

2011-07-05 Thread Brice Goglin
Le 05/07/2011 20:13, Alfredo Buttari a écrit :
> Hi all,
> if I understand correctly this routine can tell on which NUMA node(s)
> a specific memory area resides, is this correct?
> Will this routine work on any memory area allocated with any
> allocation routine other than those provided by hwloc?
>
> Can anybody provide a simple example of usage of this routine?
>
> I tried something simple like this
>
> hwloc_topology_t topology;
> int *a, ret;
> hwloc_membind_policy_t policy;
> hwloc_nodeset_t nodeset;
>
> hwloc_topology_init(&topology);
> hwloc_topology_load(topology);
>
> a = (int *) malloc(1000*sizeof(int));
> nodeset = hwloc_bitmap_alloc();
> ret = hwloc_get_area_membind_nodeset(topology, a,
> 1000*sizeof(int), nodeset, &policy, HWLOC_MEMBIND_STRICT);
> printf("---> %d\n", ret);
>
> hwloc_topology_destroy(topology);
>
>
>
> but I'm always getting a -1 in ret. What's wrong?

Hello,

You're running Linux and errno is ENOSYS, right? From what I remember,
it's not supported on Linux because getting memory binding is very
poorly supported. I think we could implement it but it would be very
slow (one get_mempolicy syscall per virtual page or something like that).
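
A hedged sketch of detecting this unimplemented case portably (fragment;
assumes <errno.h> and <stdio.h> are included):

  ret = hwloc_get_area_membind_nodeset(topology, a, 1000*sizeof(int),
                                       nodeset, &policy, HWLOC_MEMBIND_STRICT);
  if (ret < 0 && errno == ENOSYS)
    fprintf(stderr, "get_area_membind is not supported on this OS\n");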

Brice



Re: [hwloc-users] on using hwloc_get_area_membind_nodeset

2011-07-06 Thread Brice Goglin
I created ticket #46 about this. We'll try to implement this in 1.3.

https://svn.open-mpi.org/trac/hwloc/ticket/46

Brice



Le 06/07/2011 08:48, Alfredo Buttari a écrit :
> Brice, Samuel,
> thanks for your very quick replies. Yes you're right, errno is set to
> ENOSYS. No luck.
> Maybe I can get away with a single call to get_mempolicy (no need to
> check for all the pages in the memory area).
> Thanks again
>
> best regards
> alfredo
>
>
> On Tue, Jul 5, 2011 at 8:34 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>> Hello,
>>
>> You're running Linux and errno is ENOSYS, right? From what I remember,
>> it's not supported on Linux because getting memory binding is very
>> poorly supported. I think we could implement it but it would be very
>> slow (one get_mempolicy syscall per virtual page or something like that).
>>
>> Brice
>
>



Re: [hwloc-users] Thread core affinity

2011-08-01 Thread Brice Goglin
Le 01/08/2011 12:16, Gabriele Fatigati a écrit :
> Hi,
>
> reading the hwloc v1.2 manual (A4 version), on page 15, I see an example
> with a 4-socket 2-core machine with hyperthreading.
>
> Core ids are not exclusive, as said before. PU ids are exclusive but
> not physically sequential (I suppose)
>
> PU P#0 is in socket P#0 on Core P#0. PU P#1 is in another socket!

These indexes are "physical indexes" (that's the default in the
graphical lstopo output). But we may want to make that clearer in the doc.

Brice



Re: [hwloc-users] Thread core affinity

2011-08-01 Thread Brice Goglin
"PU P#0" means "PU object with physical index 0".
"P#" prefix means "physical index".
"L#" prefix means "logical index" (the one you want to use in
get_obj_by_type).
Use -l or -p to switch from one to the other in lstopo.
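
In code, both indexes are available on every object; a minimal sketch:

  /* third PU in logical order */
  hwloc_obj_t pu = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 2);
  printf("PU L#%u is P#%u\n", pu->logical_index, pu->os_index);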

Brice



Le 01/08/2011 14:47, Gabriele Fatigati a écrit :
> Hi Brice,
>
> so, if I understand well, PU P# numbers are not the same as those
> specified with the HWLOC_OBJ_PU flag?
>
> 2011/8/1 Brice Goglin <brice.gog...@inria.fr
> <mailto:brice.gog...@inria.fr>>
>
> Le 01/08/2011 12:16, Gabriele Fatigati a écrit :
> > Hi,
> >
> > reading a hwloc-v1.2-a4 manual, on page 15, i look an example
> > with 4-socket 2-core machine with hyperthreading.
> >
> > Core id's are not exclusive as said before. PU's id are
> exclusive but
> > not physically sequential (I suppose)
> >
> > PU P#0 is in socket P#0 on Core P#0. PU P#1 is in another socket!
>
> These indexes are "physical indexes" (that's the default in the
> graphical lstopo output). But we may want to make that clearer in
> the doc.
>
> Brice
>
>
>
>
> -- 
> Ing. Gabriele Fatigati
>
> Parallel programmer
>
> CINECA Systems & Tecnologies Department
>
> Supercomputing Group
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it <http://www.cineca.it>Tel:   +39 051
> 6171722
>
> g.fatigati [AT] cineca.it <http://cineca.it>  



Re: [hwloc-users] Thread core affinity

2011-08-01 Thread Brice Goglin
You're confusing object types with index types.

PU is an object type, like Core, Socket, ... "logical processor" is a
generic name for cores when there's no SMT, hardware threads when
there's SMT/Hyperthreading, ... PU is basically the smallest thing that
can run a software thread.

"P#" is just the way you're numbering object, it works for PU and for
other object types.

Any object of any type can be identified through a unique logical index,
and possibly non-unique physical index.

We don't often use the name "logical processor" because it's indeed
confusing. "Processing Unit" is less confusing, that's why it's the
official name for the smallest objects in hwloc.

Brice







Le 01/08/2011 15:04, Gabriele Fatigati a écrit :
> Hi Brice,
>
> you said:
>
> "PU P#0" means "PU object with physical index 0".
> "P#" prefix means "physical index".
>
> But from the hwloc manual, page 58:
>
>
> HWLOC_OBJ_PU: Processing Unit, or (Logical) Processor..
>
>
> but it is in conflict with what you said :(
>
>
> 2011/8/1 Brice Goglin <brice.gog...@inria.fr
> <mailto:brice.gog...@inria.fr>>
>
> "PU P#0" means "PU object with physical index 0".
> "P#" prefix means "physical index".
> "L#" prefix means "logical index" (the one you want to use in
> get_obj_by_type).
> Use -l or -p to switch from one to the other in lstopo.
>
> Brice
>
>
>
> Le 01/08/2011 14:47, Gabriele Fatigati a écrit :
>> Hi Brice,
>>
>> so, if I inderstand well, PU P# numbers are not  the same
>> specified  as HWLOC_OBJ_PU flag?
>>
>> 2011/8/1 Brice Goglin <brice.gog...@inria.fr
>> <mailto:brice.gog...@inria.fr>>
>>
>> Le 01/08/2011 12:16, Gabriele Fatigati a écrit :
>> > Hi,
>> >
>> > reading a hwloc-v1.2-a4 manual, on page 15, i look an example
>> > with 4-socket 2-core machine with hyperthreading.
>> >
>> > Core id's are not exclusive as said before. PU's id are
>> exclusive but
>> > not physically sequential (I suppose)
>> >
>> > PU P#0 is in socket P#0 on Core P#0. PU P#1 is in another
>> socket!
>>
>> These indexes are "physical indexes" (that's the default in the
>> graphical lstopo output). But we may want to make that
>> clearer in the doc.
>>
>> Brice
>>
>>
>>
>>
>> -- 
>> Ing. Gabriele Fatigati
>>
>> Parallel programmer
>>
>> CINECA Systems & Tecnologies Department
>>
>> Supercomputing Group
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it <http://www.cineca.it>Tel:  
>> +39 051 6171722
>>
>> g.fatigati [AT] cineca.it <http://cineca.it>  
>
>
>
>
> -- 
> Ing. Gabriele Fatigati
>
> Parallel programmer
>
> CINECA Systems & Tecnologies Department
>
> Supercomputing Group
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it <http://www.cineca.it>Tel:   +39 051
> 6171722
>
> g.fatigati [AT] cineca.it <http://cineca.it>  



Re: [hwloc-users] Thread core affinity

2011-08-01 Thread Brice Goglin
It's just a coincidence. Most modern machines (many of them are NUMA)
have non-sequential numbers (to maximize memory bandwidth in the dumb
cases).

Brice




Le 01/08/2011 15:29, Gabriele Fatigati a écrit :
> Ok,
>
> now it's clearer. Just a little question: why, on the NUMA machine,
> are PU numbers sequential (page 17), while on the non-NUMA machine
> they are not? (page 16)
>
> 2011/8/1 Brice Goglin <brice.gog...@inria.fr
> <mailto:brice.gog...@inria.fr>>
>
> You're confusing object types with index types.
>
> PU is an object type, like Core, Socket, ... "logical processor"
> is a generic name for cores when there's no SMT, hardware threads
> when there's SMT/Hyperthreading, ... PU is basically the smallest
> thing that can run a software thread.
>
> "P#" is just the way you're numbering object, it works for PU and
> for other object types.
>
> Any object of any type can be identified through a unique logical
> index, and possibly non-unique physical index.
>
> We don't often use the name "logical processor" because it's
> indeed confusing. "Processing Unit" is less confusing, that's why
> it's the official name for the smallest objects in hwloc.
>
> Brice
>
>
>
>
>
>
>
> Le 01/08/2011 15:04, Gabriele Fatigati a écrit :
>> Hi Brice,
>>
>> you said:
>>
>> "PU P#0" means "PU object with physical index 0".
>> "P#" prefix means "physical index".
>>
>> But from the hwloc manual, page 58:
>>
>>
>> HWLOC_OBJ_PU: Processing Unit, or (Logical) Processor..
>>
>>
>> but it is in conflict with what you said :(
>>
>>
>> 2011/8/1 Brice Goglin <brice.gog...@inria.fr
>> <mailto:brice.gog...@inria.fr>>
>>
>> "PU P#0" means "PU object with physical index 0".
>> "P#" prefix means "physical index".
>> "L#" prefix means "logical index" (the one you want to use in
>> get_obj_by_type).
>> Use -l or -p to switch from one to the other in lstopo.
>>
>> Brice
>>
>>
>>
>> Le 01/08/2011 14:47, Gabriele Fatigati a écrit :
>>> Hi Brice,
>>>
>>> so, if I inderstand well, PU P# numbers are not  the same
>>> specified  as HWLOC_OBJ_PU flag?
>>>
>>> 2011/8/1 Brice Goglin <brice.gog...@inria.fr
>>> <mailto:brice.gog...@inria.fr>>
>>>
>>> Le 01/08/2011 12:16, Gabriele Fatigati a écrit :
>>> > Hi,
>>> >
>>> > reading a hwloc-v1.2-a4 manual, on page 15, i look an
>>> example
>>> > with 4-socket 2-core machine with hyperthreading.
>>> >
>>> > Core id's are not exclusive as said before. PU's id
>>> are exclusive but
>>> > not physically sequential (I suppose)
>>> >
>>> > PU P#0 is in socket P#0 on Core P#0. PU P#1 is in
>>> another socket!
>>>
>>> These indexes are "physical indexes" (that's the default
>>> in the
>>> graphical lstopo output). But we may want to make that
>>> clearer in the doc.
>>>
>>> Brice
>>>
>>>
>>>
>>>
>>> -- 
>>> Ing. Gabriele Fatigati
>>>
>>> Parallel programmer
>>>
>>> CINECA Systems & Tecnologies Department
>>>
>>> Supercomputing Group
>>>
>>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>>
>>> www.cineca.it <http://www.cineca.it>   
>>> Tel:   +39 051 6171722
>>>
>>> g.fatigati [AT] cineca.it <http://cineca.it>  
>>
>>
>>
>>
>> -- 
>> Ing. Gabriele Fatigati
>>
>> Parallel programmer
>>
>> CINECA Systems & Tecnologies Department
>>
>> Supercomputing Group
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it <http://www.cineca.it>Tel:  
>> +39 051 6171722
>>
>> g.fatigati [AT] cineca.it <http://cineca.it>  
>
>
>
>
> -- 
> Ing. Gabriele Fatigati
>
> Parallel programmer
>
> CINECA Systems & Tecnologies Department
>
> Supercomputing Group
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it <http://www.cineca.it>Tel:   +39 051
> 6171722
>
> g.fatigati [AT] cineca.it <http://cineca.it>  



Re: [hwloc-users] hwloc warning flag

2011-08-13 Thread Brice Goglin
I think I am seeing this too on a custom program, so probably not your
application's fault.
Brice



Le 13/08/2011 10:37, Gabriele Fatigati a écrit :
>
>
Dear hwloc users and developers,
>
> I'm using the hwloc 1.2 stable version, compiled with Intel 11, and checking my
> little application with valgrind 3.5.
>
> My app calls the hwloc_set_area_membind_nodeset() function from an OpenMP
> thread:
>
> hwloc_set_area_membind_nodeset(topology, mem, 1, nodeset,
> HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_STRICT |
> HWLOC_MEMBIND_NOCPUBIND ) 
>
> membind seems to work well, but valgrind gives me the following warning:
>
> ==2904== Syscall param mbind(nodemask) points to unaddressable byte(s)
> ==2904==at 0x4FF33C1: syscall6 (in /usr/lib64/libnuma.so.1)
> ==2904==by 0x4FF3436: mbind (in /usr/lib64/libnuma.so.1)
> ==2904==by 0x4C208AC: hwloc_linux_set_area_membind
> (topology-linux.c:1071)
> ==2904==by 0x4C1AC3E: hwloc_set_area_membind_nodeset (bind.c:396)
> ==2904==by 0x402165: bind_memory_tonode (main.c:97)
> ==2904==  Address 0x5a64978 is 0 bytes after a block of size 8 alloc'd
> ==2904==at 0x4A05140: calloc (vg_replace_malloc.c:418)
> ==2904==by 0x4C20646: hwloc_linux_membind_mask_from_nodeset
> (topology-linux.c:996)
> ==2904==by 0x4C2083E: hwloc_linux_set_area_membind
> (topology-linux.c:1054)
> ==2904==by 0x4C1AC3E: hwloc_set_area_membind_nodeset (bind.c:396)
> ==2904==by 0x401CBB: bind_memory_tonode (main.c:97)
>
> valgrind has  --tool=memcheck --leak-check=full  exec flags.
>
> It gives me the same warning even with just one byte of memory bound.
>
> Is it a hwloc warning or my application's warning?
>
> Thanks in advance.
>
>
>
>
>
> -- 
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it Tel:   +39 051
> 6171722
>
> g.fatigati [AT] cineca.it   
>
>
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users



Re: [hwloc-users] hwloc warning flag

2011-08-14 Thread Brice Goglin
FWIW, it's a "bug" in valgrind. The manpage of mbind does not
exactly match the kernel requirements on mbind parameters. And valgrind
fails at respecting the manpage anyway. See
https://bugs.kde.org/show_bug.cgi?id=280083 for the mess...

Brice



Le 13/08/2011 15:07, Brice Goglin a écrit :
> I think I am seeing this too on a custom program, so probably not your
> application's fault.
> Brice
>
>
>
> Le 13/08/2011 10:37, Gabriele Fatigati a écrit :
>>
>>
>> Dear hwloc users and developers,
>>
>> I'm using hwloc 1.2 stable version Intel 11 compiled and checking my
>> little application with valgrind 3.5.
>>
>> My app calls hwloc_set_area_membind_nodeset() function from a OpenMP
>> thread:
>>
>> hwloc_set_area_membind_nodeset(topology, mem, 1, nodeset,
>> HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_STRICT |
>> HWLOC_MEMBIND_NOCPUBIND ) 
>>
>> membind seems to work well, but valgrind give me a follow warning:
>>
>> ==2904== Syscall param mbind(nodemask) points to unaddressable byte(s)
>> ==2904==at 0x4FF33C1: syscall6 (in /usr/lib64/libnuma.so.1)
>> ==2904==by 0x4FF3436: mbind (in /usr/lib64/libnuma.so.1)
>> ==2904==by 0x4C208AC: hwloc_linux_set_area_membind
>> (topology-linux.c:1071)
>> ==2904==by 0x4C1AC3E: hwloc_set_area_membind_nodeset (bind.c:396)
>> ==2904==by 0x402165: bind_memory_tonode (main.c:97)
>> ==2904==  Address 0x5a64978 is 0 bytes after a block of size 8 alloc'd
>> ==2904==at 0x4A05140: calloc (vg_replace_malloc.c:418)
>> ==2904==by 0x4C20646: hwloc_linux_membind_mask_from_nodeset
>> (topology-linux.c:996)
>> ==2904==by 0x4C2083E: hwloc_linux_set_area_membind
>> (topology-linux.c:1054)
>> ==2904==by 0x4C1AC3E: hwloc_set_area_membind_nodeset (bind.c:396)
>> ==2904==by 0x401CBB: bind_memory_tonode (main.c:97)
>>
>> valgrind has  --tool=memcheck --leak-check=full  exec flags.
>>
>> It give me the same warning also with just one byte memory bound.
>>
>> Is it a hwloc warning or my applications warning?
>>
>> Thanks in forward.
>>
>>
>>
>>
>>
>> -- 
>> Ing. Gabriele Fatigati
>>
>> HPC specialist
>>
>> SuperComputing Applications and Innovation Department
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it <http://www.cineca.it>Tel:   +39
>> 051 6171722
>>
>> g.fatigati [AT] cineca.it <http://cineca.it>  
>>
>>
>> ___
>> hwloc-users mailing list
>> hwloc-us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>



[hwloc-users] Re : hwloc warning flag

2011-08-15 Thread Brice Goglin

No, it just means that valgrind couldn't properly check how hwloc uses mbind.
But I checked the hwloc code again: things look OK, and the kernel is happy
with our mbind parameters.
Brice


- Reply message -
De : "Gabriele Fatigati" <g.fatig...@cineca.it>
Pour?: "Brice Goglin" <brice.gog...@inria.fr>
Cc : "Hardware locality user list" <hwloc-us...@open-mpi.org>
Objet : [hwloc-users] hwloc warning flag
Date : lun., août 15, 2011 10:55




Hi Brice,

thanks for the info, but does it mean the mbind() function might not work
in some cases?

2011/8/14 Brice Goglin <brice.gog...@inria.fr>

> **
> FWIW, it's a "bug" in valgrind. The manpage of mbind does not
> exactly match the kernel requirements on mbind parameters. And valgrind
> fails at respecting the manpage anyway. See
> https://bugs.kde.org/show_bug.cgi?id=280083 for the mess...
>
> Brice
>
>
>
Le 13/08/2011 15:07, Brice Goglin a écrit :
>
> I think I am seeing this too on a custom program, so probably not your
> application's fault.
> Brice
>
>
>
Le 13/08/2011 10:37, Gabriele Fatigati a écrit :
>
>
>
> Dear hwloc users and developers,
>
>  I'm using hwloc 1.2 stable version Intel 11 compiled and checking my
> little application with valgrind 3.5.
>
>  My app calls hwloc_set_area_membind_nodeset() function from a OpenMP
> thread:
>
>  hwloc_set_area_membind_nodeset(topology, mem, 1, nodeset,
> HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_STRICT |
> HWLOC_MEMBIND_NOCPUBIND )
>
>  membind seems to work well, but valgrind give me a follow warning:
>
>  ==2904== Syscall param mbind(nodemask) points to unaddressable byte(s)
> ==2904==at 0x4FF33C1: syscall6 (in /usr/lib64/libnuma.so.1)
> ==2904==by 0x4FF3436: mbind (in /usr/lib64/libnuma.so.1)
> ==2904==by 0x4C208AC: hwloc_linux_set_area_membind
> (topology-linux.c:1071)
> ==2904==by 0x4C1AC3E: hwloc_set_area_membind_nodeset (bind.c:396)
> ==2904==by 0x402165: bind_memory_tonode (main.c:97)
> ==2904==  Address 0x5a64978 is 0 bytes after a block of size 8 alloc'd
> ==2904==at 0x4A05140: calloc (vg_replace_malloc.c:418)
> ==2904==by 0x4C20646: hwloc_linux_membind_mask_from_nodeset
> (topology-linux.c:996)
> ==2904==by 0x4C2083E: hwloc_linux_set_area_membind
> (topology-linux.c:1054)
> ==2904==by 0x4C1AC3E: hwloc_set_area_membind_nodeset (bind.c:396)
>  ==2904==by 0x401CBB: bind_memory_tonode (main.c:97)
>
>  valgrind has  --tool=memcheck --leak-check=full  exec flags.
>
>  It give me the same warning also with just one byte memory bound.
>
>  Is it a hwloc warning or my applications warning?
>
>  Thanks in forward.
>
>
>
>
>
>  --
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.itTel:   +39 051 6171722
>
> g.fatigati [AT] cineca.it
>
>
> ___
> hwloc-users mailing 
> listhwloc-users@open-mpi.orghttp://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>
>
>
>


-- 
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.itTel:   +39 051 6171722

g.fatigati [AT] cineca.it


[hwloc-users] Re : lstopo on multiple machines

2011-08-16 Thread Brice Goglin

Hello Seb,
Hwloc only looks at the local machine; there's no support for multinode 
topology detection so far. We are considering adding it, but we don't know yet 
what users want to do with it, whether it should be in the core or not, automatic 
or not. Your feedback is welcome.
Brice

- Reply message -
De : "PULVERAIL S?bastien" 
Pour?: 
Objet : [hwloc-users] lstopo on multiple machines
Date : mar., août 16, 2011 15:04




Hello,



I have two machines I use for running my programs on multiple nodes (with
hydra or slurm).

When I launch my lstopo command, only one machine's characteristics are
printed.

How can I tell HWLOC to look for those two machines ?



--

Seb





Re: [hwloc-users] Bind current thread to a specific cpu

2011-08-18 Thread Brice Goglin
Are you talking about logical ids (the ones given by hwloc) or
physical/OS ids (the ones given by the OS, possibly in a strange
order)? You should avoid using physical ids, but...

If logical, you can use hwloc_get_obj_by_type() to get the corresponding
object, then use its ->cpuset.

If physical, you just need a cpuset that contains the bit corresponding
to this id. You can use hwloc_bitmap_only(set, id) to reset a
(previously allocated) cpuset to nothing but this id.
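
Putting both cases together, a minimal sketch for binding the calling
thread (assuming a loaded topology and in-range ids):

  /* logical id: go through the object */
  hwloc_obj_t pu = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, logical_id);
  hwloc_set_cpubind(topology, pu->cpuset, HWLOC_CPUBIND_THREAD);

  /* physical id: build the cpuset by hand */
  hwloc_bitmap_t set = hwloc_bitmap_alloc();
  hwloc_bitmap_only(set, physical_id);
  hwloc_set_cpubind(topology, set, HWLOC_CPUBIND_THREAD);
  hwloc_bitmap_free(set);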

Brice



Le 18/08/2011 10:10, PULVERAIL Sébastien a écrit :
>
> Hi,
>
>  
>
> I'm looking for a function that allows binding the current thread to a
> specific cpu given by its id.
>
>  
>
> I found the function hwloc_set_thread_cpubind to bind a thread to a
> cpuset.
>
> I also found the function hwloc_bitmap_singlify to keep only one index
> in the cpuset.
>
> But I didn't find anything to keep only the cpu I need in my cpuset...
>
>
> Is it possible ?
>
>  
>
> Best regards
>
> _
>
>  
>
> *Sébastien Pulvérail*| Sogeti High Tech
>
> Phone +33 (0) 5 34 46 92 98 | Mobile +33 (0) 6 84 44 73 26
>
> sebastien.pulver...@sogeti.com 
>
>  
>
> 3 Chemin de Laporte | Bât. AEROPARK | 31300 Toulouse | France
>
> www.sogeti.com / www.sogeti-hightech.fr
>
>  
>
> logo_signature_email_Sogeti High Tech
>
> _
>
> P/Please consider the environment before printing !/
>
>  
>
>
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users



Re: [hwloc-users] Numa availability

2011-08-28 Thread Brice Goglin
Le 28/08/2011 12:14, Gabriele Fatigati a écrit :
> Dear hwloc users, 
>
> what happens if I use hwloc on a non-NUMA machine? I suppose memory
> binding makes no sense because there is no memory locality concept.
> And regarding execution binding, are there any differences on a non-NUMA
> machine?

Hello Gabriele,

Execution binding remains exactly the same.

Memory binding makes no sense on a non-NUMA machine, but it's still available
for application portability (it basically just does nothing).

>  Is there a hwloc routine to check this?

get_nbobjs_by_type(topology, HWLOC_OBJ_NODE) tells how many NUMA node
objects exist.
If you get >1, the machine is NUMA.
In the non-NUMA case, I think you can get 0 or 1 depending on whether
the OS is NUMA-aware or not (not sure we should remove this possible
difference).
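
A minimal sketch of that check (HWLOC_OBJ_NODE is the NUMA node type in
this hwloc generation):

  int nbnodes = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NODE);
  if (nbnodes > 1)
    printf("NUMA machine with %d NUMA nodes\n", nbnodes);
  else
    printf("non-NUMA machine\n"); /* nbnodes is 0 or 1 here */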

Brice



[hwloc-users] Re : Re : hwloc topology check initializing

2011-09-03 Thread Brice Goglin

There's no way to implement this check safely (being non-NULL doesn't mean it 
was properly initialized by the user; it could still point to random memory 
that would cause a segfault when checking).

If you really need something like this, put an integer flag on the side of the 
topology variable, and set it to 0 or 1 depending on whether the topology was 
initialized or not.
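
A minimal sketch of that side flag:

  hwloc_topology_t topology;
  int topology_initialized = 0;

  if (hwloc_topology_init(&topology) == 0)
    topology_initialized = 1;

  /* ... later, before using the topology ... */
  if (!topology_initialized)
    exit(-1);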

Brice


- Reply message -
De : "Gabriele Fatigati" <g.fatig...@cineca.it>
Pour : "Hardware locality user list" <hwloc-us...@open-mpi.org>
Objet : [hwloc-users] Re : hwloc topology check initializing
Date : sam., sept. 3, 2011 15:26




Hi Brice,

but it works only if the user assigns NULL to the topology.

Doesn't hwloc_topology_init() check the argument passed? Is there no way
to check whether the topology is initialized or not?

Thanks.

2011/9/3 Brice Goglin <brice.gog...@inria.fr>

>
> Assign NULL to the topology when declaring the variable. It will be changed
> into something else when init() is called.
>
> Brice
>
> - Reply message -
> De : "Gabriele Fatigati" <g.fatig...@cineca.it>
> Pour : "Hardware locality user list" <hwloc-us...@open-mpi.org>
> Objet : [hwloc-users] hwloc topology check initializing
> Date : sam., sept. 3, 2011 10:56
>
>
>
>
> Dear hwloc users,
>
> how can I check whether my hwloc topology is initialized? Do I have to use
> hwloc_topology_check()? This code does not work:
>
> hwloc_topology_t topology;
>
> if( topology==NULL)
>  exit(-1);
>
>
>
>
>
> --
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.itTel:   +39 051 6171722
>
> g.fatigati [AT] cineca.it
>
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>



-- 
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.itTel:   +39 051 6171722

g.fatigati [AT] cineca.it


Re: [hwloc-users] Process and thread binding

2011-09-12 Thread Brice Goglin
Le 12/09/2011 13:29, Gabriele Fatigati a écrit :
> Hi Brice,
>
> I'm so confused...
>
> I'm binding MPI processes with set_cpubind and it works well. The
> problem is when I try to bind both processes and threads.
>
> It seems that binding a thread influences the binding of the main thread.
>
> And from hwloc manual:
>
>
> hwloc_set_cpubind()
>
> Bind *current process* or thread on cpus given in bitmap set.
>
> Why are you saying that process binding is not possible? I'm using it and
> it works well!

It worked because you never mixed it with single-thread binding. If you
bind process X to core A and then thread Y of process X to core B, what
you should now see with get_cpubind is that process X is now bound to
cores A+B, thread Y to B, and all other threads to A.

Brice



Re: [hwloc-users] Process and thread binding

2011-09-12 Thread Brice Goglin
Le 12/09/2011 14:17, Gabriele Fatigati a écrit :
> Mm, and why? In a hybrid code (MPI + OpenMP), my idea is to bind a
> single MPI process to one core, and its threads to other cores.
> Otherwise all threads run on a single core.
>

The usual way to do that is to first bind the entire process to all the
cores available to its threads, and then bind each thread to a single
core.

For instance, if doing Open MPI + OpenMP with one process per socket
* you pass --bind-to-socket on the mpirun/mpiexec command-line
* when the MPI process starts, the OpenMP runtime calls something like
get_cpubind to find out how many cores were given to it
* it creates the exact same number of OpenMP threads and binds one of
them on each core, as in the sketch below
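
A rough OpenMP sketch of the last two steps (assumes a loaded topology,
an already-bound process, and <omp.h>; error handling omitted):

  hwloc_bitmap_t procset = hwloc_bitmap_alloc();
  hwloc_get_cpubind(topology, procset, HWLOC_CPUBIND_PROCESS);
  int npus = hwloc_bitmap_weight(procset);
  #pragma omp parallel num_threads(npus)
  {
    int id = omp_get_thread_num();
    int pu = hwloc_bitmap_first(procset);
    for (int i = 0; i < id; i++)
      pu = hwloc_bitmap_next(procset, pu);  /* the id-th allowed PU */
    hwloc_bitmap_t mine = hwloc_bitmap_alloc();
    hwloc_bitmap_only(mine, pu);
    hwloc_set_cpubind(topology, mine, HWLOC_CPUBIND_THREAD);
    hwloc_bitmap_free(mine);
  }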

Brice



Re: [hwloc-users] hwloc set membind function

2011-09-22 Thread Brice Goglin
Le 22/09/2011 12:20, Gabriele Fatigati a écrit :
> "Set the default memory binding policy of the current process or thread
> to prefer the NUMA node(s) near the specified cpuset."
>
> What does "nodes near the specified cpuset" mean? The node where the
> specified cpuset lives?

The node near the CPU specified in the cpuset.

> The first thread allocates an array with malloc. The second thread
> (bound on another node) initializes the array.
>
> The free memory decreases only on the node where the
> second thread is. Is that right?

Yes.

>
>  Does hwloc_set_membind affect all future allocations?
>

Yes. And already allocated pages if you add the migrate flag.

Brice



Re: [hwloc-users] hwloc set membind function

2011-09-25 Thread Brice Goglin
Le 25/09/2011 11:14, Gabriele Fatigati a écrit :
>
> Let me restate my questions differently (I made a mistake in the first
> one):
>
>
> 1) I don't understand the meaning of the set_membind() function. Why
> should I allocate on a node "near" my cpuset and not on my local node
> (where the thread or process runs)?

It's exactly the same. Your local node is near the cpuset that contains
the CPUs that are close to this node.

> 2) What is the behaviour of the HWLOC_MEMBIND_BIND flag?
>
> From the manual:
>
> "Allocate memory on the specified nodes."
>
> Does it mean that I can allocate without binding the memory?

It's about physical memory allocation (first touch causing a fault
causing a page to be allocated), not about virtual memory (malloc).

> What happens if one thread allocates and another thread on another node
> reads/writes this memory for the first time? In a little example I see
> the memory is allocated on the second thread's node, not where the first
> thread did malloc(). So when do I have to use the HWLOC_MEMBIND_BIND
> flag? Or has it nothing to do with binding?
>
> If the effective allocation is done when a thread first touches the
> memory, what is the meaning of this flag?

The flag says "when the first touch occurs and the physical memory is
allocated for real, don't allocate on the local node (default), but
rather allocate where specified by set_membind".

> 2) My goal is to replicate the behaviour of set_area_membind_nodeset()
> in some manner for all future allocations without calling this function
> each time I allocate some memory. Is it possible to do this?

set_membind_nodeset() with BIND and the nodeset containing the nodes
where physical memory should be allocated.
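
A minimal sketch, binding all future physical allocations of the calling
thread to the NUMA node with OS index 0:

  hwloc_nodeset_t nodeset = hwloc_bitmap_alloc();
  hwloc_bitmap_set(nodeset, 0);
  hwloc_set_membind_nodeset(topology, nodeset, HWLOC_MEMBIND_BIND,
                            HWLOC_MEMBIND_THREAD);
  hwloc_bitmap_free(nodeset);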

Brice



Re: [hwloc-users] hwloc set membind function

2011-09-25 Thread Brice Goglin
Le 25/09/2011 12:19, Gabriele Fatigati a écrit :
> Hi Brice,
>
> >The flag says "when the first touch occurs and the physical memory is
> allocated for real, don't allocate on the local node (default), but
> >rather allocate where specified by set_membind".
>
> If it is already allocated for real, how can set_membind() allocate on
> another node?

Add the MIGRATE flag.

> So, what's the difference between HWLOC_MEMBIND_BIND and
> HWLOC_MEMBIND_FIRSTTOUCH?

First touch makes the allocation on the node local to the thread that
touches first (default on Linux).
BIND makes the allocation on the node specified in set_membind.

> Doing the following test:
>
> omp parallel region
>
> if(tid==0){
>  malloc(array)...
>  set_area_membind(HWLOC_MEMBIND_BIND, node 0)
> }
>
> if (tid==1){
>  set_area_membind(HWLOC_MEMBIND_BIND, node 1)

If both set_area_membind work on the same array (not on different
halves), this is doubly-wrong:
* you have no guarantee that thread 0 has already finished doing the
malloc before thread 1 does set_area_membind on the buffer.
* doing two set_area_membind on the same entire array is useless, the
second one will overwrite the first one.

Brice



Re: [hwloc-users] hwloc set membind function

2011-09-25 Thread Brice Goglin
Le 25/09/2011 20:27, Gabriele Fatigati a écrit :
> if(tid==0){
>
>  set_membind(HWLOC_MEMBIND_BIND, node 0)
>  malloc(array)...
>
> }
>
> if (tid==1){
>  set_membind(HWLOC_MEMBIND_BIND, node 1)
>
> for(i...)
>   array(i)
> }
>
> end parallel region
>
>
> the array is allocated on node 1, not node 0 as I expected. So it seems
> the second thread's set_membind() somehow influences the array
> allocation through first touch.

Why do you call set_membind() here? Its whole point is to change the
allocation policy of the current thread. If this thread then
first-touches some data, this data will obviously get allocated
according to set_membind().

If you don't want set_membind() to modify the allocation policy of the
current thread, why do you call it?

Brice




Re: [hwloc-users] hwloc set membind function

2011-09-25 Thread Brice Goglin
Le 25/09/2011 21:05, Gabriele Fatigati a écrit :
> Ok,
>
> so, set_membind() combined with HWLOC_MEMBIND_BIND is useless?

It's likely the most useful memory binding case. It's similar to what
numactl --membind does for instance, very common.

> The behaviour I want to set, is it possible?

I just said "you have to touch right after malloc."
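
That is, a sketch of the intended pattern:

  double *array = malloc(len * sizeof(*array));
  memset(array, 0, len * sizeof(*array));  /* first touch: pages get
                                              physically allocated on this
                                              thread's node (or per its
                                              set_membind policy) */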

Brice


>
> 2011/9/25 Brice Goglin <brice.gog...@inria.fr
> <mailto:brice.gog...@inria.fr>>
>
> Le 25/09/2011 20:57, Gabriele Fatigati a écrit :
> > after done this, memory is allocated not in a local node of thread
> > that does set_membind and malloc, but in node of  thread that
> touches
> > it. And I don't understand this behaviour :(
>
> Memory is allocated when first-touched. If there's no area-specific
> policy, it's allocated according to the policy of the thread that
> touches, not according to the policy of the one that did malloc.
> If you
> want to follow the malloc'er thread, you have to touch right after
> malloc.
>
> Brice
>
>
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org <mailto:hwloc-us...@open-mpi.org>
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>
>
>
>
> -- 
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it <http://www.cineca.it>Tel:   +39 051
> 6171722
>
> g.fatigati [AT] cineca.it <http://cineca.it>  
>
>
> ___
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users



Re: [hwloc-users] How to combine hwloc-bind and mpirun

2011-11-10 Thread Brice Goglin
Le 10/11/2011 13:13, Rafael R. Pappalardo a écrit :
> I am trying to send an MPI job to selected cores on a 64-core machine. With 
> taskset I use:
>
> mpirun -np 8 taskset -c 1,3,5,7,9,11,13,15 program
>
> but if I substitute taskset by hwloc-bind doing
>
> mpirun -np 8 hwloc-bind core:1 core:3 core:5 core:7 core:9 core:11 core:13 
> core:15 program
>
> it does not work.

What do you mean by "does not work"? Failure? No binding? Wrong binding?

Note that taskset numbers are very likely different from hwloc-bind core
numbers. If you want to bind on 8 cores on the second socket, it may be
mpirun -np 8 hwloc-bind core:8-15 program


> "Each hwloc-bind command in the mpirun above doesn't know that there
> are other hwloc-bind instances on the same machine. All of them bind
> their process to all cores in the first socket. "

This sentence also applies to taskset.

> Is there something wrong if I do:
>
> hwloc-bind core:1 core:3 core:5 core:7 core:9 core:11 core:13 core:15 mpirun -
> np 8 program

If you don't run the mpirun command on the machine where the final MPI
processes run, it won't work at all.

Otherwise, I would say that it depends on the implementation of mpirun.
And even if it binds the final MPI processes, it won't be better than above.

If you want to bind each individual process on a single and independent
core, you can:
* use a mpirun that can do that
* use a more complex mpiexec line if your MPI implementation supports
it, for instance by binding each process individually:
mpiexec -np 1 hwloc-bind core:8 program : -np 1 hwloc-bind core:9
program : -np 1 hwloc-bind core:10 program : -np 1 hwloc-bind core:11
program : -np 1 hwloc-bind core:12 program : -np 1 hwloc-bind core:13
program : -np 1 hwloc-bind core:14 program : -np 1 hwloc-bind core:15
program

Brice



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Brice Goglin
Hello Stefan,

hwloc 1.3 already has support for PCI device detection. These new
objects contain a "class" field that can help you know if it's a NIC/GPU/...
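
A hedged sketch of walking those PCI objects (hwloc >= 1.3 with I/O
discovery enabled; the values are standard PCI base classes):

  hwloc_obj_t obj = NULL;
  while ((obj = hwloc_get_next_pcidev(topology, obj)) != NULL) {
    unsigned baseclass = obj->attr->pcidev.class_id >> 8;
    if (baseclass == 0x02)      /* network controller */
      printf("NIC %04x:%04x\n", obj->attr->pcidev.vendor_id,
             obj->attr->pcidev.device_id);
    else if (baseclass == 0x03) /* display controller */
      printf("GPU %04x:%04x\n", obj->attr->pcidev.vendor_id,
             obj->attr->pcidev.device_id);
  }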

However it's hard to know which PCI device is eth0 or eth1, so we also
try to add OS device objects inside PCI devices. If you're using Linux, you
will see which network device (eth0, ...), IB device (mlx4_0, ...), or
disk (sda, ...) corresponds to each PCI device (if any). Just run lstopo
on your machine to see what I am talking about. Then you should read the
I/O devices section in the doc.

There's also some work to insert CUDA device information inside those
PCI devices.

Additionally, we have some helpers to retrieve the locality of some custom
library objects (OFED, CUDA, ...). See the interoperability section in
the doc.

How are you using GPUs and NICs in your software? Which libraries or
ways do you use to access them?

hope this helps.
Brice




Le 29/11/2011 09:32, Stefan Eilemann a écrit :
> All,
>
> We have the need to discover which GPUs and NICs are close to which CPUs[1], 
> independent from CUDA. From the overview page there are hints that there is 
> some kind of support planned, but it's unclear to me of how much of this is 
> implemented.
>
> Is there support in hwloc, and in which version, for this? If yes, can you 
> give me a hint/code snippet on how to do this? If no, what does it take to 
> get this support in hwloc?
>
>
> Cheers,
>
> Stefan.
>
> [1] https://github.com/Eyescale/Equalizer/issues/57
>



Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Brice Goglin

> Hwloc optional build support status (more details can be found above):
>
> Probe / display PCI devices: yes
> Graphical output (Cairo):yes
> XML output:  full

"XML output" should be "XML input/output" or "XML support".

> Memory support:  binding, set policy, migrate pages

Looks ok otherwise.

Brice



[hwloc-users] removing old cpuset API?

2012-01-19 Thread Brice Goglin
Dear hwloc users,

The cpuset API (hwloc_cpuset_*) was replaced by the bitmap API
(hwloc_bitmap_*) in v1.1.0, back in December 2010. We kept backward
compatibility by #defin'ing the old API on top of the new one. So you
may still use the old API in your application (but you would get
"deprecated" warnings).

Now, we're thinking of removing this compatiblity layer one day or
another. You would have to upgrade your application to the new API. If
your application must still work with old hwloc too, you may support
both API by #defin'ing the new API on top of the old one as explained at
the end of http://www.open-mpi.org/projects/hwloc/doc/v1.3.1/a00010.php

So, is anybody against removing the hwloc/cpuset.h compatibility layer
in the near future (not before v1.5, which may mean Spring 2012) and
letting applications deal with this if they really need it?
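
A minimal sketch of that forward-compatibility trick (only a few of the
renamed calls are shown; see hwloc/bitmap.h for the complete list):

#include <hwloc.h>

/* On pre-1.1 hwloc, map the new bitmap names onto the old cpuset API. */
#if HWLOC_API_VERSION < 0x00010100
#define hwloc_bitmap_t        hwloc_cpuset_t
#define hwloc_bitmap_alloc    hwloc_cpuset_alloc
#define hwloc_bitmap_free     hwloc_cpuset_free
#define hwloc_bitmap_set      hwloc_cpuset_set
#define hwloc_bitmap_asprintf hwloc_cpuset_asprintf
#endif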

Brice



[hwloc-users] hwloc and HTX device ?

2012-01-27 Thread Brice Goglin
Hello,

I'd like to see what hwloc reports on AMD machines with a HTX card
(hypertransport expansion card). The most widely known case would likely
be a 3-5-year-old AMD cluster with Pathscale Infinipath network cards.
But I think there are also some accelerators such as clearspeed, and the
numaconnect single-image interconnect.

HTX slots do not involve PCI, but AMD may have implemented some glue to
make them appear in lspci anyway. So it's not clear if hwloc will see
them or not.

If anybody has access to such a machine, could you please run lstopo
(>=1.3) there and tell us if the HTX device appears? If not, we'll need
to see if there are some /dev files to look at. If yes, does it appear
close to a single socket? If so, is this the right socket? (feel free
to tell us what model of machine this is; I will check the motherboard
manual to make sure this is the right socket).

Thanks
Brice



Re: [hwloc-users] receive 0x0 from hwloc_cuda_get_device_cpuset

2012-02-21 Thread Brice Goglin
Le 21/02/2012 15:42, Albert Solernou a écrit :
> Hi,
> I have several questions in order to fix this issue from the machine
> side.
>
> 1) I realised that on this machine neither libcpuset nor cpuset-utils
> are installed. Could this be related to the problem?

No, Linux "cpuset" are very different from hwloc "cpuset" and "bitmap"
unfortunately. The former is about reducing the available resources in
the machine so that processes cannot use the entire CPUs for instance.
hwloc detects this feature but it doesn't need libcpuset to do so.
Things just work :)

> 2) Could you specify any BIOS parameter we could tune up

You can look for PCI affinity or PCI NUMA maybe. But I don't think
you'll find anything because your machine isn't NUMA anyway. I/O
affinity doesn't matter on this machine, so there's no reason to
enable/disable it in this BIOS.

> 3) Could this issue be related to the linux kernel?

I think the kernel has been properly detecting this kind of affinity
from the BIOS for a very long time. At least 2.6.18 but likely way earlier.

You should just forget about this problem and use hwloc 1.4.1rc1
(released today, already on the web, to be announced soon, once Windows
zips are ready). It contains the workaround for your problem.

Brice



Re: [hwloc-users] Problems on SMP with 48 cores

2012-03-13 Thread Brice Goglin
Le 13/03/2012 17:04, Hartmut Kaiser a écrit :
>>> But the problems I was seeing were not MSVC specific. It's a
>>> proliferation of arcane (non-POSIX) function use (like strcasecmp,
>>> etc.) missing use of HAVE_UNISTD_H, HAVE_STRINGS_H to wrap
>>> non-standard headers, unsafe mixing of
>>> int32<->int64 data types, reliance on int (and other types) having a
>>> certain bit-size, totally unsafe shift operations, wide use of
>>> (non-C-standard) gcc extensions, etc. Should I go on?
> More investigation shows that the code currently assumes group (and
> processor) masks to be 32 bit, which is not true on 64 bit systems.

No. What it assumes is that you have a sane compiler where ulong is not
32 bits on 64-bit systems :)

Brice



Re: [hwloc-users] Problems on SMP with 48 cores

2012-03-13 Thread Brice Goglin
Le 13/03/2012 17:04, Hartmut Kaiser a écrit :
>>> But the problems I was seeing were not MSVC specific. It's a
>>> proliferation of arcane (non-POSIX) function use (like strcasecmp,
>>> etc.) missing use of HAVE_UNISTD_H, HAVE_STRINGS_H to wrap
>>> non-standard headers, unsafe mixing of
>>> int32<->int64 data types, reliance on int (and other types) having a
>>> certain bit-size, totally unsafe shift operations, wide use of
>>> (non-C-standard) gcc extensions, etc. Should I go on?
> More investigation shows that the code currently assumes group (and
> processor) masks to be 32 bit, which is not true on 64 bit systems. For
> instance this (topology-windows.c: line 643):
>
> hwloc_bitmap_from_ith_ulong(obj->cpuset, GroupMask[i].Group,
> GroupMask[i].Mask);

Try applying something like the patch below. Totally untested obviously,
but we'll see if that starts improving lstopo.

Brice


diff --git a/src/topology-windows.c b/src/topology-windows.c
index 55821a4..94e5073 100644
--- a/src/topology-windows.c
+++ b/src/topology-windows.c
@@ -520,7 +520,8 @@ hwloc_look_windows(struct hwloc_topology *topology)
obj = hwloc_alloc_setup_object(type, id);
 obj->cpuset = hwloc_bitmap_alloc();
hwloc_debug("%s#%u mask %lx\n", hwloc_obj_type_string(type), id, 
procInfo[i].ProcessorMask);
-   hwloc_bitmap_from_ulong(obj->cpuset, procInfo[i].ProcessorMask);
+   hwloc_bitmap_from_ulong(obj->cpuset, procInfo[i].ProcessorMask & 
0xffffffff);
+   hwloc_bitmap_from_ith_ulong(obj->cpuset, 1, procInfo[i].ProcessorMask 
>> 32);

switch (type) {
  case HWLOC_OBJ_NODE:
@@ -622,7 +623,8 @@ hwloc_look_windows(struct hwloc_topology *topology)
  mask = procInfo->Group.GroupInfo[id].ActiveProcessorMask;
  hwloc_debug("group %u %d cpus mask %lx\n", id,
   procInfo->Group.GroupInfo[id].ActiveProcessorCount, mask);
- hwloc_bitmap_from_ith_ulong(obj->cpuset, id, mask);
+ hwloc_bitmap_from_ith_ulong(obj->cpuset, 2*id, mask & 0xffffffff);
+ hwloc_bitmap_from_ith_ulong(obj->cpuset, 2*id+1, mask >> 32);
  hwloc_insert_object_by_cpuset(topology, obj);
}
continue;
@@ -636,7 +638,8 @@ hwloc_look_windows(struct hwloc_topology *topology)
 obj->cpuset = hwloc_bitmap_alloc();
 for (i = 0; i < num; i++) {
   hwloc_debug("%s#%u %d: mask %d:%lx\n", hwloc_obj_type_string(type), 
id, i, GroupMask[i].Group, GroupMask[i].Mask);
-  hwloc_bitmap_from_ith_ulong(obj->cpuset, GroupMask[i].Group, 
GroupMask[i].Mask);
+  hwloc_bitmap_from_ith_ulong(obj->cpuset, 2*GroupMask[i].Group, 
GroupMask[i].Mask & 0xffffffff);
+  hwloc_bitmap_from_ith_ulong(obj->cpuset, 2*GroupMask[i].Group+1, 
GroupMask[i].Mask >> 32);
 }

switch (type) {




Re: [hwloc-users] Problems on SMP with 48 cores

2012-03-14 Thread Brice Goglin
We debugged this in private emails with Hartmut. His 48-core platform is
now detected properly. Everything got fixed with a patch
functionally identical to what Samuel sent earlier. There's a bit of
work before we can commit the fix, but Windows support for more than 32
cores will be officially fixed in the upcoming hwloc v1.4.2.
Thanks a lot to Hartmut for testing all these patches.
Brice



Re: [hwloc-users] Using distances

2012-04-21 Thread Brice Goglin

On 21/04/2012 12:23, Jeffrey Squyres wrote:

I'm trying to use hwloc distances in Open MPI (e.g., find the distance from 
each OpenFabrics device to the PU(s) where this process is bound), and I'm a 
bit confused by the distances documentation.

If I have a WHOLE_SYSTEM topology, and I know that this process is bound to one 
or more PUs (e.g., both PUs in a core), can you summarize how I use the hwloc 
distances functionality to determine the distance from my process to each of 
the OF devices?



I assume you have the entire distance (latency) matrix between all NUMA 
nodes as usually reported by the BIOS.


const struct hwloc_distances_s *distances = 
hwloc_get_whole_distance_matrix_by_type(topology, HWLOC_OBJ_NODE);

assert(distances);
assert(distances->latency);

Now distances->latency[a+b*distances->nbobjs] contains the latency 
between NUMA nodes whose *logical* indexes are a and b (it may be 
asymmetrical).



Now get the NUMA node object close to your PUs and the NUMA objects 
close to each OFED device, take their ->logical_index and you'll get the 
latencies.
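
For instance, a sketch (the helper name numa_latency is mine, and error
handling is minimal; for an I/O device you climb through its bridge parents):

/* Latency between the NUMA nodes above two arbitrary objects, using the
   whole-machine NUMA distance matrix. */
static float numa_latency(hwloc_topology_t topology, hwloc_obj_t a, hwloc_obj_t b)
{
    const struct hwloc_distances_s *d =
        hwloc_get_whole_distance_matrix_by_type(topology, HWLOC_OBJ_NODE);
    hwloc_obj_t na = hwloc_get_ancestor_obj_by_type(topology, HWLOC_OBJ_NODE, a);
    hwloc_obj_t nb = hwloc_get_ancestor_obj_by_type(topology, HWLOC_OBJ_NODE, b);
    if (!d || !d->latency || !na || !nb)
        return -1.0f;
    return d->latency[na->logical_index + nb->logical_index * d->nbobjs];
}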


Brice





Re: [hwloc-users] Using distances

2012-04-21 Thread Brice Goglin

On 21/04/2012 13:15, Jeffrey Squyres wrote:

On Apr 21, 2012, at 7:09 AM, Brice Goglin wrote:


I assume you have the entire distance (latency) matrix between all NUMA nodes 
as usually reported by the BIOS.

const struct hwloc_distances_s *distances = 
hwloc_get_whole_distance_matrix_by_type(topology, HWLOC_OBJ_NODE);
assert(distances);
assert(distances->latency);

Is this stored on the topology object?


No, it's stored in the object that covers all objects connected by the 
distance matrix. So usually in the root object.



Hence, if this distance data is already covered by the XML export/import, then 
I should have this data.


Yes it should be there.


Now distances->latency[a+b*distances->nbobjs] contains the latency between NUMA 
nodes whose *logical* indexes are a and b (it may be asymmetrical).

Now get the NUMA node object close to your PUs and the NUMA objects close to each 
OFED device, take their ->logical_index and you'll get the latencies.

Ah, ok.  This is what I didn't understand from the docs -- is there no distance 
to actual PCI devices?  I.e., distance is only measured between NUMA nodes?

I ask because the functions allow measuring distance by depth and type -- are 
those effectively ignored, and really all you can check is the distance between 
NUMA nodes?


You can have distance matrices between any object sets of any 
type/depth. It depends on what the BIOS reports or what the user adds. The 
BIOS usually only reports NUMA node distances.


We could extend them by saying that the distance between any child of 
NUMA node X and any child of NUMA node Y is equal to the distance 
between NUMA nodes X and Y, but we don't do that.


One reason is that the current distance stuff lets the user add a 
distance matrix between NUMA nodes and another one between sockets, even 
if they are incompatible. When this happens, which one do you use to 
generate the distance between two cores?


There are some tickets open in Trac that will help clarify all this mess.

Brice




Re: [hwloc-users] possible concurrency issue with reading /proc data on Linux

2012-04-21 Thread Brice Goglin

On 21/04/2012 23:08, Vlad wrote:

Greetings,

I use hwloc-1.4.1 stable on Red Hat 5 and am seeing a possible 
concurrency issue not covered by the "Thread Safety" guidelines:


- I start a small number (4) of threads,  each of which does some work 
and periodically executes hwloc_get_last_cpu_location() with 
HWLOC_CPUBIND_PROCESS
- occasionally, one or two of those threads will see the call fail 
with ENOSYS (even though the same call has already executed 
successfully a number of times)


These errors are transient and seem to occur only when some of the 
threads in the group are terminating. I've skimmed through the 
implementation in topology-linux.c and it seems plausible to me that 
the errors could be caused by failure to read /proc state "atomically" 
in the presence of concurrent thread starts/exits.


Of course, the latter is hard (impossible ?) to do because the state 
always changes and a snapshot can only be obtained with a single 
read() (which in turn would require knowing how many thread entries to 
expect in advance). However, returning ENOSYS in such cases does not 
seem intended but rather looks like a flaw in the retry logic. Similar issues may be 
present with other API methods that rely on 
hwloc_linux_foreach_proc_tid() or hwloc_linux_get_proc_tids().


Can you try the attached patch? It doesn't abort the loop immediately on 
per-tid errors anymore. This may work better when threads disappear. I 
don't remember if the retry logic was written while thinking about 
adding threads only or about adding and removing threads.


If the patch doesn't help, can you send your code to help debug things?

An alternative explanation could be that the retry logic is correct 
but the implementation relies on readdir(), which is documented to not 
be thread-safe: 
http://www.gnu.org/software/libc/manual/html_node/Reading_002fClosing-Directory.html




I don't think this can happen. Your threads should not be accessing the same 
DIR stream here.


Thanks
Brice

diff --git a/src/topology-linux.c b/src/topology-linux.c
index e1f46cb..99a6381 100644
--- a/src/topology-linux.c
+++ b/src/topology-linux.c
@@ -475,7 +475,7 @@ hwloc_linux_foreach_proc_tid(hwloc_topology_t topology,
   char taskdir_path[128];
   DIR *taskdir;
   pid_t *tids, *newtids;
-  unsigned i, nr, newnr;
+  unsigned i, nr, newnr, failed;
   int err;

   if (pid)
@@ -497,11 +497,17 @@ hwloc_linux_foreach_proc_tid(hwloc_topology_t topology,

  retry:
   /* apply the callback to all threads */
+  failed=0;
   for(i=0; i<nr; i++) {

Re: [hwloc-users] possible concurrency issue with reading /proc data on Linux

2012-04-21 Thread Brice Goglin

On 21/04/2012 23:36, Vlad wrote:


Will try this within a day or two. At the moment I am simply using a 
retry loop on ENOSYS and usually no more than one retry is needed.


Ok thanks.

You are probably correct. I was thinking of this code from 
https://svn.open-mpi.org/trac/hwloc/browser/trunk/src/topology-linux.c:


445 while ((dirent = readdir(taskdir)) != NULL) {

"taskdir" here is /proc//task, correct? In which case the threads 
will be doing readdir() on the same DIR stream...


taskdir is a different DIR* for each thread here: each thread does its 
own get_last_cpu_location() which calls its own instance of opendir(). 
Even if the directory behind these DIR* descriptors is the same, it 
should be fine; there's no concurrency on the same DIR* descriptor in 
readdir.


Brice













Re: [hwloc-users] possible concurrency issue with reading /proc data on Linux

2012-04-23 Thread Brice Goglin

On 23/04/2012 16:13, Vlad wrote:

This one seems fine, too.

Note that it should always be possible to read at least the current 
thread's /proc data.


This code also works when the task reading the cpubinding/location is 
not part of the process it looks at.


Brice

In my workaround, should I run out of retries I default to 
hwloc_get_last_cpu_location(... HWLOC_CPUBIND_THREAD) -- since 
presumably that can't fail and the result is technically valid given 
hwloc_get_last_cpu_location() semantics (it reads state that's 
inherently transient).


On Apr 23, 2012, at 7:53 AM, Brice Goglin wrote:






Here's a possibly better patch. It lets the retry logic happen before 
checking whether we should return ENOSYS and friends.


Brice





Re: [hwloc-users] hwloc_get_last_cpu_location on AIX

2012-05-08 Thread Brice Goglin
Le 08/05/2012 14:33, Hendryk Bockelmann a écrit :
> Hello,
>
> I just ran into trouble using hwloc_get_last_cpu_location on our
> POWER6 cluster with AIX 6.1.
> My plan is to find out if the binding of the job-scheduler was correct
> for MPI-tasks and OpenMP-threads. This is what I want to use:
>
> support = hwloc_topology_get_support(topology);
> ret = hwloc_get_cpubind(topology, set, HWLOC_CPUBIND_THREAD);
> if (support->cpubind->get_thisthread_cpubind) {
>   hwloc_bitmap_asprintf(&str, set);
>   printf("--> cpuset (thread %d) is %s \n",omp_get_thread_num(),str);
> }
> if (support->cpubind->get_thisthread_last_cpu_location) {
>   ret = hwloc_set_cpubind(topology, set, HWLOC_CPUBIND_THREAD);
>   last = hwloc_bitmap_alloc();
>   ret = hwloc_get_last_cpu_location(topology,last,HWLOC_CPUBIND_THREAD);
>   hwloc_bitmap_asprintf(&str, last);
>   printf("--> cpu_loca (thread %d) is %s \n",omp_get_thread_num(),str);
> }
>
> this is what I found in src/tests/hwloc_get_last_cpu_location.c
>
> Running this on my local linux machine gives e.g.:
>
> --> cpuset (thread 1) is 0x0005
> --> cpuset (thread 0) is 0x0005
> --> cpu_loca (thread 0) is 0x0004
> --> cpu_loca (thread 1) is 0x0001
>
> hence, (support->cpubind->get_thisthread_cpubind) and
> (support->cpubind->get_thisthread_last_cpu_location) are both true
>
> but on the AIX cluster I just get:
>
> --> cpuset (thread 0) is 0x0003
> --> cpuset (thread 1) is 0x0003
>
> hence, (support->cpubind->get_thisthread_last_cpu_location) is false.
> Now the question is whether this is related to my install of
> hwloc-1.4.1 or a general problem on AIX?

Hello,
get_last_cpu_location is currently not implemented on AIX. There's a
TODO in the code saying that we should use AIX "mycpu". The main problem
with hwloc on AIX is that none of us has access to an AIX machine anymore.
Brice



Re: [hwloc-users] hwloc - Build problem.

2012-05-20 Thread Brice Goglin
Hello Anatoly,

You likely need to add libxml2.a to your link command-line. And some
others may be missing later.

Instead of linking with src/.libs/libhwloc.a, you should run "make
install" and use libhwloc.a from there (use
--prefix=<dir> to tell configure where to install).

Once hwloc is installed, pkg-config can tell you which dependency libs
are needed for static linking:
$ pkg-config --static --libs hwloc
-lhwloc -lxml2 -lz -lm -lpci
If you don't install hwloc in prefix=/usr, you may need PKG_CONFIG_PATH
to tell pkg-config where to look.

Anyway, the above list (-lxml2 -lz -lm -lpci) should be correct.

Brice




Le 20/05/2012 15:14, Anatoly G a écrit :
> Dear HWLOC.
> I downloaded 1.4.2 version (tar file).
> Performed 
> 1) *./configure --enable-static*
> 2) *make*
> 3) *Wrote program*
> #include "hwloc.h"
> link with ($hwloc_dir)/src/.libs/libhwloc.a
> 4) *In link stage I get following errors:*
> Linking  EXE:
> /space/home/anatol-g/Grape/release_4.6_FH/core/bin/linux64/rhe6/g++4.4.4/debug/mpi_rcv_waitany
> ...
> /product/grape-data/hwloc-1.4.2/src/.libs/libhwloc.a(topology-xml.o):
> In function `hwloc_libxml2_disable_stderrwarnings':
> topology-xml.c:(.text+0x2d9): undefined reference to `__xmlGenericError'
> topology-xml.c:(.text+0x2f0): undefined reference to
> `xmlSetGenericErrorFunc'
> /product/grape-data/hwloc-1.4.2/src/.libs/libhwloc.a(topology-xml.o):
> In function `hwloc_backend_xml_init':
> topology-xml.c:(.text+0x34f): undefined reference to `xmlCheckVersion'
> topology-xml.c:(.text+0x37c): undefined reference to `xmlReadFile'
> .
> .
>
> Attached program file + error report.
> I use 
> OS: Red Hat 6.0 Santiago  
> gcc 4.4.4-13
>
> Can you please help me?
> Sorry if my question looks stupid.
> Anatoly.
>
>



Re: [hwloc-users] hwloc_get_last_cpu_location on AIX

2012-05-29 Thread Brice Goglin
Thanks to your help, this should now work in hwloc trunk. A tarball will
be available tomorrow morning at
http://www.open-mpi.org/software/hwloc/nightly/trunk/
(you need an SVN revision >= 4528)

I also added instruction cache detection for AIX while I was at it.

I am now looking at get_last_cpu_location() for entire processes instead
of individual threads.

Brice






Re: [hwloc-users] hwloc_get_latency() failures and confusion

2012-08-06 Thread Brice Goglin
Le 06/08/2012 23:47, Wheeler, Kyle Bruce a écrit :
> Hello,
>
> I'm failing to understand what hwloc (v1.5) is doing. I'm trying to use 
> hwloc_get_latency() to determine the distance between two cores.
>
> The two cores are on different sockets. According to libnuma's numactl, the 
> latency between the two sockets is 20, whereas between cores on the same 
> socket is 10. According to hwloc-ls -v, the latency is 2.0, whereas between 
> cores on the same socket is 1.0. Thus, I know that hwloc is getting topology 
> information.
>
> However, programmatically, hwloc_get_latency() just returns -1. I tried using 
> hwloc_get_whole_distance_matrix_by_depth(), and found that the distance 
> matrix is only defined for depth 0

Hello Kyle,
The distance/latency API is indeed difficult to understand because it
tries to be (too) generic.
You should not be getting a distance matrix for depth 0 above. You get
one for depth=1 (the depth of NUMAnodes in your topology).

>  which, according to hwloc_obj_type_string(hwloc_get_depth_type(topology, 0)) 
> is "Machine". Now, the documentation for 
> hwloc_get_whole_distance_matrix_by_depth() says it returns "a distances 
> structure containing a matrix with all distances between all objects at the 
> given depth". Given that I only have one object that depth 0 (just the one 
> machine), what does this mean? If I try with depth 1 (aka "NUMANode" or 
> HWLOC_OBJ_NODE), I get NULL back, suggesting that there is no matrix of 
> distances between NUMANodes. Of course, that's not true; hwloc-ls reports 
> that matrix! So what's going on here?

hwloc-ls uses hwloc_get_whole_distance_matrix_by_depth() :

for (depth = 0; depth < topodepth; depth++) {
  distances = hwloc_get_whole_distance_matrix_by_depth(topology, depth);
  if (!distances || !distances->latency)
continue;
  printf("latency matrix between %ss (depth %u) by %s indexes:\n",
 hwloc_obj_type_string(hwloc_get_depth_type(topology, depth)),
 depth,
 logical ? "logical" : "physical");
  hwloc_utils_print_distance_matrix(topology, hwloc_get_root_obj(topology), 
distances->nbobjs, depth, distances->latency, logical);
}


So I don't see how you could be seeing something different. Can you send
your code and your XML topology?

> I would add that the hwloc_distances_s returned by 
> hwloc_get_whole_distance_matrix_by_depth(topology, 0) is: { 0, 0, 0x0, 0, 0 }

That's strange, I need to look at this.

> And why is hwloc_get_latency() failing?

If you pass some Core objects to get_latency(), it's expected that it
fails because the topology only has latencies between NUMA nodes. You
should walk up the object parent links until you find NUMAnode objects.
We've been thinking of handling this case inside hwloc but we're not
sure it's always a good idea to do so.
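
The walk itself is only a couple of lines (a sketch, using the parent links
mentioned above):

/* Climb from a Core (or any object) to the NUMA node containing it. */
hwloc_obj_t node = core;
while (node && node->type != HWLOC_OBJ_NODE)
    node = node->parent;
/* node is now the containing NUMA node, or NULL on non-NUMA machines;
   hwloc_get_ancestor_obj_by_type() does the same walk. */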


We have several tickets open against the distance code. We know it's not
perfect so we'll be happy to hear your feedback. There are so many
things involved in this case that it's hard to figure out what's
actually important to users.

Brice



Re: [hwloc-users] [EXTERNAL] Re: hwloc_get_latency() failures and confusion

2012-08-06 Thread Brice Goglin
Le 07/08/2012 00:36, Wheeler, Kyle Bruce a écrit :
> Ah, that's key! The documentation currently says "Look at ancestor
> objects from the bottom to the top until one of them contains a
> distance matrix that matches the objects exactly", which suggests to
> me that it will traverse the object hierarchy looking for the
> NUMANodes *for* me.

Ahh, this one is exactly what's really confusing. There are two things here:
1) the object that contains the distance matrix
2) the objects that are covered by the matrix

When the matrix covers the entire machine (usual case), (1) is the root
object and (2) are NUMA nodes.
If you ever have a distance matrix between all cores of the first socket
(and not any other core in the machine), the first socket object would
contain a matrix with distance->relative_depth = depth(core)-depth(socket).

So when you're looking for Core latencies, you check whether the
ancestor immediately above the Core contains a matrix for Core distances,
then its parent, ... up to the root object. So (1) moves up while (2)
keeps the same type (but (2) gets wider as (1) goes up).
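
In code, the search for (1) could look roughly like this (a sketch against
the v1.x obj->distances/distances_count fields):

/* Walk up from "obj" until an ancestor carries a distance matrix that
   covers objects at the requested depth. */
static const struct hwloc_distances_s *
find_matrix(hwloc_obj_t obj, unsigned depth)
{
    hwloc_obj_t anc;
    unsigned i;
    for (anc = obj->parent; anc; anc = anc->parent)
        for (i = 0; i < anc->distances_count; i++)
            if (anc->depth + anc->distances[i]->relative_depth == depth)
                return anc->distances[i];
    return NULL;
}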

Brice



Re: [hwloc-users] lstopo and GPus

2012-08-28 Thread Brice Goglin
Le 28/08/2012 14:23, Samuel Thibault a écrit :
> Gabriele Fatigati, le Tue 28 Aug 2012 14:19:44 +0200, a écrit :
>> I'm using hwloc 1.5. I would to see how GPUs are connected with the processor
>> socket using lstopo command. 
> About connection with the socket, there is indeed no real graphical
> difference between "connected to socket #1" and "connected to all
> sockets". You can use the text output for that:
>
> $ lstopo
>   Socket #0
>   Socket #1
> PCI...
> (connected to socket #1)
>
> vs
>
> $ lstopo
>   Socket #0
>   Socket #1
>   PCI...
> (connected to both sockets)

Fortunately, this won't occur in most cases (including Gabriele's
machines) because there's a NUMAnode object above each socket. Both the
socket and the PCI bus are drawn inside the NUMA box, so things appear
OK in the graphical output too.

I've never seen the problem on a real machine, but a fake topology with
a PCI bus attached to a socket that is not strictly equal to the above
NUMA node is indeed wrongly displayed.


Gabriele, assuming you have a dual Xeon X56xx Westmere machine, there
are plenty of such platforms where the GPU is indeed connected to both
sockets. Or it could be a buggy BIOS.

Brice



Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Brice Goglin
What does errno contain?
Aside from ENOSYS and EXDEV, you may also get the "usual" error codes such
as ENOMEM, EPERM or EINVAL.
We didn't document all of them, it mostly depends on the underlying
kernel and mbind implementations.
Brice



Le 05/09/2012 15:44, Gabriele Fatigati a écrit :
> Hi,
>
> I've noted that hwloc_set_area_membind_nodeset returns -1 but errno is
> not equal to EXDEV or ENOSYS. I supposed that these two cases were the
> only two possibilities.
>
> From the hwloc documentation:
>
> -1 with errno set to ENOSYS if the action is not supported
> -1 with errno set to EXDEV if the binding cannot be enforced
>
>
> Any other binding failure reason? The memory available is enought.
>
> 2012/9/5 Brice Goglin <brice.gog...@inria.fr
> <mailto:brice.gog...@inria.fr>>
>
> Hello Gabriele,
>
> The only limit that I would think of is the available physical
> memory on each NUMA node (numactl -H will tell you how much of
> each NUMA node memory is still available).
> malloc usually only fails (returning NULL) when there is no
> *virtual* memory left; that's different. If you don't allocate
> tons of terabytes of virtual memory, this shouldn't happen easily.
>
> Brice
>
>
>
>
> Le 05/09/2012 14:27, Gabriele Fatigati a écrit :
>> Dear Hwloc users and developers,
>>
>>
>> I'm using hwloc 1.4.1 in a multithreaded program on a Linux
>> platform, where each thread binds many non-contiguous pieces of a
>> big matrix, calling the
>> hwloc_set_area_membind_nodeset function very intensively:
>>
>> hwloc_set_area_membind_nodeset(topology, punt+offset, len,
>> nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD |
>> HWLOC_MEMBIND_MIGRATE);
>>
>> Binding seems to work well, since the return code from the function
>> is 0 for every call.
>>
>> The problem is that after binding, a simple little new malloc
>> fails, without any apparent reason.
>>
>> Disabling memory binding, the allocations work well.  Is there
>> any known problem if hwloc_set_area_membind_nodeset is used
>> intensively?
>>
>> Is there some operating system limit for memory pages binding?
>>
>> Thanks in advance.
>>



Re: [hwloc-users] Thread binding problem

2012-09-05 Thread Brice Goglin
An internal malloc failed then. That would explain why your malloc
failed too.
It looks like you malloc'ed too much memory in your program?

Brice




Le 05/09/2012 15:56, Gabriele Fatigati a écrit :
> An update:
>
> placing strerror(errno) after hwloc_set_area_membind_nodeset  gives:
> "Cannot allocate memory"
>



Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Brice Goglin
Le 06/09/2012 09:56, Gabriele Fatigati a écrit :
> Hi Brice, hi Jeff,
>
> >Can you add some printf inside hwloc_linux_set_area_membind() in
> src/topology-linux.c to see if ENOMEM comes from the mbind >syscall or
> not?
>
> I added printf inside that function, but ENOMEM does not come from there.

Not from hwloc_linux_set_area_membind() at all? Or not from mbind?

> >Have you run your application through valgrind or another
> memory-checking debugger?
>
> I tried with valgrind :
>
> valgrind --track-origins=yes --log-file=output_valgrind
> --leak-check=full --tool=memcheck  --show-reachable=yes
> ./main_hybrid_bind_mem
>
> ==25687== Warning: set address range perms: large range [0x39454040,
> 0x2218d4040) (undefined)
> ==25687== 
> ==25687== Valgrind's memory management: out of memory:
> ==25687==newSuperblock's request for 4194304 bytes failed.
> ==25687==34253180928 bytes have already been allocated.
> ==25687== Valgrind cannot continue.  Sorry.

There's really somebody allocating way too much memory here.

You should reduce your array size so that it doesn't fail, and then run
valgrind again to check if somebody is allocating a lot of memory without
ever freeing it.

Brice



>
>
> I attach the full output. 
>
>
> The code also dies with pure OpenMP code. Very mysteriously.
>
>
> 2012/9/5 Jeff Squyres:
>
> On Sep 5, 2012, at 2:36 PM, Gabriele Fatigati wrote:
>
> > I don't think it is simply out of memory, since the NUMA node has 48
> GB, and I'm allocating just 8 GB.
>
> Mmm.  Probably right.
>
> Have you run your application through valgrind or another
> memory-checking debugger?
>
> I've seen cases of heap corruption lead to malloc incorrectly
> failing with ENOMEM.
>



Re: [hwloc-users] Thread binding problem

2012-09-06 Thread Brice Goglin
Le 06/09/2012 14:51, Gabriele Fatigati a écrit :
> Hi Brice,
>
> the initial grep is:
>
> numa_policy65671  65952 24  1441 : tunables  120   60
>8 : slabdata458458  0
>
> When set_membind fails is:
>
> numa_policy  482   1152 24  1441 : tunables  120   60
>8 : slabdata  8  8288
>
> What does it mean?

The first number is the number of active objects. That means 65000
mempolicy objects were in use on the first line.
(I wonder if you swapped the lines, I expected higher numbers at the end
of the run)

Anyway, having 65000 mempolicies in use is a lot. And that would somehow
correspond to the number of set_area_membind calls that succeeded before one
fails. So the kernel might indeed fail to merge those.

That said, these objects are small (24bytes here if I am reading things
correctly), so we're talking about 1.6MB only here. So there's still
something else eating all the memory. /proc/meminfo (MemFree) and
numactl -H should again help.
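
One way to avoid piling up one kernel mempolicy per page is to issue a
single call per thread over its whole contiguous range. A sketch (buf, a
char *, plus len, nthreads, tid and obj are the names from the earlier
snippets):

/* Bind one large contiguous chunk per thread instead of one call per
   page, so the kernel keeps a single VMA/policy per thread. */
size_t chunk = len / nthreads;
int res = hwloc_set_area_membind_nodeset(topology, buf + tid * chunk,
                                         chunk, obj->nodeset,
                                         HWLOC_MEMBIND_BIND,
                                         HWLOC_MEMBIND_THREAD);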

Brice


>
>
>
> 2012/9/6 Brice Goglin <brice.gog...@inria.fr
> <mailto:brice.gog...@inria.fr>>
>
> Le 06/09/2012 12:19, Gabriele Fatigati a écrit :
>> I didn't find any strange number in /proc/meminfo.
>>
>> I've noted that the program fails after exactly
>> 65479 hwloc_set_area_membind calls. So it sounds like some kernel
>> limit. You can check that with just one thread too.
>>
>> Maybe nobody has noticed it before, because usually we bind a large
>> amount of contiguous memory a few times, instead of small and non-
>> contiguous pieces of memory many, many times.. :(
>
> If you have root access, try (as root)
> watch -n 1 grep numa_policy /proc/slabinfo
> Put a sleep(10) in your program when set_area_membind() fails, and
>     don't let your program exit before you can read the content of
> /proc/slabinfo.
>
> Brice
>
>
>
>
>>
>> 2012/9/6 Brice Goglin <brice.gog...@inria.fr
>> <mailto:brice.gog...@inria.fr>>
>>
>> Le 06/09/2012 10:44, Samuel Thibault a écrit :
>> > Gabriele Fatigati, le Thu 06 Sep 2012 10:12:38 +0200, a écrit :
>> >> mbind hwloc_linux_set_area_membind()  fails:
>> >>
>> >> Error from HWLOC mbind: Cannot allocate memory
>> > Ok. mbind is not really supposed to allocate much memory,
>> but it still
>> > does allocate some, to record the policy
>> >
>> >> //hwloc_obj_t obj =
>> hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, tid);
>> >> hwloc_obj_t obj = hwloc_get_obj_by_type(topology,
>> HWLOC_OBJ_PU, tid);
>> >> hwloc_cpuset_t cpuset = hwloc_bitmap_dup(obj->cpuset);
>> >> hwloc_bitmap_singlify(cpuset);
>> >> hwloc_set_cpubind(topology, cpuset,
>> HWLOC_CPUBIND_THREAD);
>> >>
>> >> for( i = chunk*tid; i < len; i+=PAGE_SIZE) {
>> >> //   res =
>> hwloc_set_area_membind_nodeset(topology, [i],
>> PAGE_SIZE, obj->nodeset, HWLOC_MEMBIND_BIND,
>> HWLOC_MEMBIND_THREAD);
>> >>  res = hwloc_set_area_membind(topology,
>> [i], PAGE_SIZE, cpuset, HWLOC_MEMBIND_BIND,
>> HWLOC_MEMBIND_THREAD);
>> > and I'm afraid that calling set_area_membind for each page
>> might be too
>> > dense: the kernel is probably allocating a memory policy
>> record for each
>> > page, not being able to merge adjacent equal policies.
>> >
>>
>> It's supposed to merge VMAs with the same policy (from what I
>> understand of the code), but I don't know if that actually works.
>> Maybe Gabriele found a kernel bug :)
>>
>> Brice
>>

Re: [hwloc-users] Solaris and hwloc

2012-09-13 Thread Brice Goglin
Le 13/09/2012 00:26, Jeff Squyres a écrit :
> On Sep 12, 2012, at 10:30 AM, Samuel Thibault wrote:
>
>>> Sidenote: if hwloc-bind fails to bind, should we still launch the child 
>>> process?
>> Well, it's up to you to decide :)
>
> Anyone have an opinion?  I'm 60/40 in favor of not letting it run, under the 
> rationale that the user asked for something that we can't deliver, so we 
> shouldn't continue.
>
> Any idea what numactl does if it can't bind?

Let me add taskset to the list of tools to compare to, and distinguish
several cases:

1) invalid command line
taskset (with invalid list "2,") errors out
numactl (with invalid list "2,") errors out
hwloc-bind (with invalid location followed by "-- executable") errors
out (considers the invalid location as the executable name)

2) valid command-line containing *only* non-existing objects:
taskset errors out
numactl errors out
hwloc-bind succeeds, binds to nothing

3) valid command-line containing some existing objects and some
non-existing:
taskset succeeds (ignores unexisting objects, binds to the others)
numactl errors out
hwloc-bind succeeds (ignores unexisting objects, binds to the others)

4) valid command-line with only valid objects but missing OS support
doesn't apply to taskset and numactl afaik
hwloc-bind succeeds (ignores the failure to bind)


We have a --strict option, which translates into the STRICT binding flag
which is documented as
  "Request strict binding from the OS.  The function will fail if the
binding can not be guaranteed / completely enforced."
I usually see "non-strict" as "if you can't do what I want, do something
similar". It wouldn't be too bad to say that this applies to (3) (bind to
less than requested).

But (2) and (4) are different. Not binding at all or binding to nothing
is far from "non-strict". But I wonder if adding a new command-line flag
to exit on such errors would be confusing with respect to the existing
--strict.

We could also change the default to exit on error, and add --force to
launch the process even on failure to bind. But changing defaults isn't
always a good idea.

Brice



Re: [hwloc-users] Solaris and hwloc

2012-09-13 Thread Brice Goglin
I think I am going to agree. Three comments:
* which "binding fails" do you refer to? I assume all cases I listed.
* I was initially against changing the default behavior of hwloc-bind,
but it's not like changing the ABI. There are likely very few scripts
using hwloc-bind out there. Breaking some of them is not too bad as long
as we give a useful error message.
* If we start failing because of invalid inputs in hwloc-bind, we may
have to do the same in hwloc-calc. The parsing code is shared anyway.

Brice



Le 13/09/2012 17:09, Jeff Squyres a écrit :
> These are all good points.
>
> That being said, Brock Palen made another good point on the OMPI list 
> recently.  It was in regards to OpenFabrics registered memory, but the issue 
> is quite analogous.
>
> OMPI used to issue a warning if there wasn't enough registered memory 
> available, but allow the job to run anyway (at lower performance).  Brock was 
> firmly opposed to that (he's an HPC sysadmin): he didn't want jobs to run at 
> all if there wasn't enough registered memory.  
>
> One of the rationales here is that users won't tend to notice a warning at the 
> top of a job's stdout/stderr -- if the job ran, that's good enough (until 
> much later when they realize that they're not getting the right performance, 
> or, worse, this job is impacting other jobs because its affinity is wrong).  
> But if the job doesn't run, that will get noticed immediately, and the 
> problem will be fixed by a human.
>
> Hence, it seems safer to fall back on the "if we can't give the user what 
> they asked for, fail and let a human figure it out" philosophy.  Even if it 
> means changing the default.  Keep in mind that if they run hwloc-bind, 
> they're specifically asking for binding.
>
> I think I'm now 80/20 in the "abort hwloc-bind if it fails to bind" camp. 
>  :-)
>
> After a little more thought, I'm also thinking that having a "it's ok if 
> binding fails" CLI flag is a bad idea.  If the user really wants something to 
> run without binding, then you can just do that in the shell:
>
> -
> hwloc-bind ...whatever... my_executable
> if test "$?" != "0"; then
>   # run without binding
>   my_executable
> fi
> -
>
> My $0.02.  :)
>



Re: [hwloc-users] hwloc 1.5, freebsd and linux output on the same hardware

2012-10-03 Thread Brice Goglin
Le 03/10/2012 17:23, Sebastian Kuzminsky a écrit :
> On Tue, Oct 2, 2012 at 5:14 PM, Samuel Thibault wrote:
>
> There were two bugs which resulted into cpuid not being properly
> compiled. I have fixed them in the trunk, could you try again?
>
>
> I updated my checkout to r4882, reconfigured, rebuilt, and reran it,
> and it made the same output as 1.5.  So that's an improvement over the
> svn trunk yesterday, but it's not all the way fixed yet!
>
> I'll be around all day to run tests if you like ;-)
>

For what it's worth, I tested the x86 code on Linux on a dual E5-2650
machine and got the correct topology (exactly like your Linux on your
server). So the x86 detection code may be OK, but something else may not be.
There's still at least one bug in the FreeBSD code according to our
internal regression tool, stay tuned.

Brice



Re: [hwloc-users] How do I access CPUModel info string

2012-10-25 Thread Brice Goglin
Le 25/10/2012 23:42, Samuel Thibault a écrit :
> Robin Scher, le Thu 25 Oct 2012 23:39:46 +0200, a écrit :
>> Is there a way to get this string (e.g. "Intel(R) Core(TM) i7 CPU M 620 @
>> 2.67GHz") consistently on Windows, Linux, OS-X and Solaris?
> Currently, no.
>
> hwloc itself does not have a table of such strings, and each OS has its
> own table.
>

Actually there's no table on Linux/x86. It uses cpuid to fill the model
name in the vast majority of cases [1]. We could use that to get
consistent names on non-Linux, non-Solaris OSes, and in the x86 backend.

Brice

[1] http://lxr.free-electrons.com/source/arch/x86/kernel/cpu/common.c#L389


Re: [hwloc-users] How do I access CPUModel info string

2012-10-25 Thread Brice Goglin
Le 25/10/2012 23:57, Robin Scher a écrit :
> On OS-X, you can get this string from the sysctlbyname() call:
>
> const char *name = "machdep.cpu.brand_string";
> char buffer[ 64 ];
> size_t size = 64;
> if( !sysctlbyname( name, buffer, &size, NULL, 0 ) )
> memcpy( cpu_model, buffer, 12 * sizeof( int ) );

Thanks.

> If that doesn't work, you can get it from calling system_profiler and
> parsing the output.

I'd rather not do that from inside the hwloc library :)

> On Windows (32 bit), the only way I've found is to actually use the
> cpuid assembly call:

Good to know, that's likely similar to the Linux code I cited in my
other mail.
I'll see if I can put that in some sort of common code.
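
For reference, the brand string comes from CPUID leaves 0x80000002 to
0x80000004 on both 32- and 64-bit x86. A minimal sketch with GCC-style
inline assembly (MSVC would use the __cpuid intrinsic instead; checking
leaf 0x80000000 for support is omitted here):

#include <stdio.h>
#include <string.h>

static void cpuid(unsigned leaf, unsigned regs[4])
{
    __asm__ volatile("cpuid"
                     : "=a"(regs[0]), "=b"(regs[1]),
                       "=c"(regs[2]), "=d"(regs[3])
                     : "a"(leaf));
}

int main(void)
{
    char brand[49] = { 0 };  /* 3 leaves x 16 bytes, NUL-terminated */
    unsigned regs[4], i;
    for (i = 0; i < 3; i++) {
        cpuid(0x80000002 + i, regs);
        memcpy(brand + 16 * i, regs, sizeof(regs));
    }
    printf("%s\n", brand);
    return 0;
}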

> I don't know if that would work on Win64, though. Do you think those
> could be added to hwloc?

If we can make this work without too much pain, sure.

Brice



Re: [hwloc-users] How do I access CPUModel info string

2012-10-27 Thread Brice Goglin
Can you send your lstopo output?
preferably with the latest trunk tarball:

http://www.open-mpi.org/software/hwloc/nightly/trunk/hwloc-1.6a1r4928.tar.gz

One way to solve this problem (which may also occur on old Linux
distribs) would be to store the CPU model in the machine object. But
we'll have to make sure all processors in the machine are indeed of the
same model. On MacOSX, it looks like sysctl reports a single socket
description anyway, so no problem.

Brice




Le 27/10/2012 11:37, Olivier Cessenat a écrit :
> Hello,
>
> Robin Scher indicated how to get the info on a Mac.
>
> At least on mine (OSX 10.4) with darwin 8.11.1
> where
> $ sysctl -a machdep.cpu.brand_string
> machdep.cpu.brand_string: Intel(R) Core(TM)2 CPU T7400  @
> 2.16GHz
> I unfortunately have no socket:
> *** The number of sockets is unknown
> [ from Third example: Print the number of sockets. of
> http://www.open-mpi.org/projects/hwloc/doc/v1.5.1/
> ]
> I see objects of types 1, 2, 4 and 6 only.
>
> So, will there be another (non socket hwloc object based) way to get
> CPUModel or will it find sockets as on Linux ?
>
> Thanks.
>
> Olivier Cessenat.
>
>
> Le jeudi 25 octobre 2012 à 23:42 +0200, Brice Goglin a écrit :
>> Hello,
>>
>> Assuming you found the socket hwloc object whose name you want, do
>> hwloc_obj_get_info_by_name(obj, "CPUModel");
>> you'll get const char * pointing to what you want.
>>
>> However, this info is only available on Linux and Solaris for now. If
>> you have any idea of to discover such info on other OS, please let us
>> know.
>>
>> Brice
>>
>>
>>
>> Le 25/10/2012 23:39, Robin Scher a écrit : 
>>> Is there a way to get this string (e.g. "Intel(R) Core(TM) i7 CPU M
>>> 620 @ 2.67GHz") consistently on Windows, Linux, OS-X and Solaris?
>>>
>>> Thanks,
>>> -robin
>>>
>>> -- 
>>> Robin Scher Uberware
>>> ro...@uberware.net
>>> +1 (213) 448-0443 
>>>
>>>
>>>
>>>



Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups

2012-11-02 Thread Brice Goglin
Le 02/11/2012 21:03, Brock Palen a écrit :
> This isn't a hwloc problem exactly, but maybe you can shed some insight.
>
> We have some 4 socket 10 core = 40 core nodes, HT off:
>
> depth 0:  1 Machine (type #1)
>  depth 1: 4 NUMANodes (type #2)
>   depth 2:4 Sockets (type #3)
>depth 3:   4 Caches (type #4)
> depth 4:  40 Caches (type #4)
>  depth 5: 40 Caches (type #4)
>   depth 6:40 Cores (type #5)
>depth 7:   40 PUs (type #6)
>
>
> We run RHEL 6.3; we use Torque to create cgroups for jobs.  I get the 
> following cgroup for this job; all 12 cores for the job are on one node:
> cat /dev/cpuset/torque/8845236.nyx.engin.umich.edu/cpus 
> 0-1,4-5,8,12,16,20,24,28,32,36
>
> Not all nicely spaced, but 12 cores
>
> I then start a code, even a simple serial code with openmpi 1.6.0 on all 12 
> cores:
> mpirun ./stream
>
> 45521 brockp20   0 1885m 1.8g  456 R 100.0  0.2   4:02.72 stream  
>
> 45522 brockp20   0 1885m 1.8g  456 R 100.0  0.2   1:46.08 stream  
>
> 45525 brockp20   0 1885m 1.8g  456 R 100.0  0.2   4:02.72 stream  
>
> 45526 brockp20   0 1885m 1.8g  456 R 100.0  0.2   1:46.07 stream  
>
> 45527 brockp20   0 1885m 1.8g  456 R 100.0  0.2   4:02.71 stream  
>
> 45528 brockp20   0 1885m 1.8g  456 R 100.0  0.2   4:02.71 stream  
>
> 45532 brockp20   0 1885m 1.8g  456 R 100.0  0.2   1:46.05 stream  
>
> 45529 brockp20   0 1885m 1.8g  456 R 99.2  0.2   4:02.70 stream   
>
> 45530 brockp20   0 1885m 1.8g  456 R 99.2  0.2   4:02.70 stream   
>
> 45531 brockp20   0 1885m 1.8g  456 R 33.6  0.2   1:20.89 stream   
>
> 45523 brockp20   0 1885m 1.8g  456 R 32.8  0.2   1:20.90 stream   
>
> 45524 brockp20   0 1885m 1.8g  456 R 32.8  0.2   1:20.89 stream   
>
> Note the processes that are not running at 100% CPU.
>
> hwloc-bind  --get --pid 45523
> 0x0011,0x1133
> 

Hello Brock,

I don't see anything obviously helpful to answer here :/

Do you know which core is overloaded and which (two?) cores are idle?
Does that change during one run or from one run to another?
Pressing 1 in top should give that information in the very first lines.
Then, you can try binding another process to one of the idle cores,
to see if the kernel accepts that.

You can also press "f" and "j" (or "f" and use arrows and space to
select "last used cpu") to add a "P" line which tells you the last CPU
used by each process.
hwloc-bind --get-last-cpu-location --pid <pid> should give the same info
but it seems broken on my machine right now, going to debug.

One thing to check would be to run more than 12 processes and check where
the kernel puts them. If it keeps ignoring two cores, that would be funny :)

Brice



Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups

2012-11-02 Thread Brice Goglin
Le 02/11/2012 21:22, Brice Goglin a écrit :
> hwloc-bind --get-last-cpu-location --pid <pid> should give the same
> info but it seems broken on my machine right now, going to debug.

Actually, that works fine once you try it on a non-multithreaded program
that uses all cores :)

So you can use top or hwloc-bind --get-last-cpu-location --pid <pid> to
find out where each process runs.

Brice



[hwloc-users] hwloc@SC12

2012-11-07 Thread Brice Goglin
Hello,

If you're attending SC12, feel free to come to the Inria booth (#1209)
and say hello. Samuel and I will be there, happy to meet people in real
life.

Brice



Re: [hwloc-users] [hwloc-announce] Hardware locality (hwloc) v1.6rc1 released

2012-11-15 Thread Brice Goglin
Thanks, that was an old bug in a somewhat rare XML case on a NUIOA machine.
Looks like adding new test cases is indeed useful :)

Brice



Le 15/11/2012 13:14, Samuel Thibault a écrit :
> Hello,
>
> Brice Goglin, le Tue 13 Nov 2012 13:45:28 +0100, a écrit :
>> The Hardware Locality (hwloc) team is pleased to announce the first
>> release candidate for v1.6:
> I'm getting an odd failure in hwloc_pci_backend:
>
> lt-hwloc_pci_backend: hwloc-1.6rc1/tests/hwloc_pci_backend.c:68: main: 
> Assertion `!nb' failed.
>
> It seems that even with flags == 0, pci stuff gets loaded from the xml
> output. It happens on only one of our machines, hannibal. I wonder what
> is special there.
>
> Samuel



Re: [hwloc-users] How do I access CPUModel info string

2012-11-18 Thread Brice Goglin
On 26/10/2012 09:39, Brice Goglin wrote:
> On 26/10/2012 05:22, Robin Scher wrote:
>> I would love to get this by my next release, say in the next 3-6
>> months. Is that something that would be possible? Is there anything I
>> can do to help?
>
> We'll have a v1.6 release before the end of the year, and hopefully a
> first release candidate by mid-november for SC12.
> It'll include the Darwin code (just committed) and the x86 code
> (Samuel committed it). But I need to finish reworking the core
> components before the x86 code becomes available to non-FreeBSD
> backends. Hopefully, I'll have something ready by next week.

By the way, in case you missed it, with hwloc 1.6rc1 (released last
week), you should get CPUModel attributes on all x86 machines, as well
as Mac OS X on all architectures. Obviously this includes Windows/x86 (I
only tested on 32 bits).

I will likely release 1.6rc2 tomorrow or Tuesday, and the final 1.6 will
arrive by the end of the month if nobody complains.
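
In case you want to read the attribute programmatically, the lookup would
look roughly like this (a sketch; whether the info is attached to the root
object or to a socket may vary between versions, so the fallback here is an
assumption):

  #include <hwloc.h>
  #include <stdio.h>

  int main(void)
  {
    hwloc_topology_t topology;
    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    /* CPUModel is a string info attribute; try the root object first,
       then the first socket (assumed fallback) */
    hwloc_obj_t obj = hwloc_get_root_obj(topology);
    const char *model = hwloc_obj_get_info_by_name(obj, "CPUModel");
    if (!model) {
      obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_SOCKET, 0);
      if (obj)
        model = hwloc_obj_get_info_by_name(obj, "CPUModel");
    }
    printf("CPUModel: %s\n", model ? model : "unknown");

    hwloc_topology_destroy(topology);
    return 0;
  }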

Brice



Re: [hwloc-users] Windows api threading functions equivalent to hwloc?

2012-11-19 Thread Brice Goglin
On 19/11/2012 21:01, Andrew Somorjai wrote:
> Below I posted a simple windows thread creation C++ routine which sets
> the processor affinity to two cores.
> What I want is the equivalent code using hwloc. Sorry for being
> somewhat new to this but I'm not sure what 
> api calls are equivalent to the windows calls and I did search hwloc.h
> for "affinity" thinking the function call
> would be easy to find. More specifically I'm wondering what's the
> equivalent of " CreateThread ", " SetThreadAffinityMask ", 
> " GetSystemInfo ", and " WaitForMultipleObjects " in hwloc.

CreateThread() and WaitForMultipleObjects() are not in hwloc since they
have nothing to do with topologies.

> DWORD_PTR m_id = 0;
> DWORD_PTR m_mask = 1 << i;
>
> m_threads[i] = CreateThread(NULL, 0,
> (LPTHREAD_START_ROUTINE)threadMain, (LPVOID)i, NULL, &m_id);
> SetThreadAffinityMask(m_threads[i], m_mask);

This will likely be something such as:

hwloc_bitmap_t bitmap = hwloc_bitmap_alloc();
hwloc_bitmap_set_only(bitmap, i);
hwloc_set_thread_cpubind(topology, m_threads[i], bitmap, 0);
hwloc_bitmap_free(bitmap);


To get the number of processors with hwloc, use something like:
  hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE);
or
  hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PU);
Then it depends whether you want real cores (the former) or hardware
threads (the latter).
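
Putting it all together, a minimal sketch of the hwloc side (not a drop-in
replacement for your routine; the PU index and binding the calling thread
are just examples):

  #include <hwloc.h>
  #include <stdio.h>

  int main(void)
  {
    hwloc_topology_t topology;
    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    /* roughly what you'd get from GetSystemInfo() */
    int nbpus = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PU);
    printf("%d PUs\n", nbpus);

    /* bind the calling thread to the first PU, roughly the equivalent
       of SetThreadAffinityMask(GetCurrentThread(), 1) */
    hwloc_obj_t pu = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
    hwloc_set_cpubind(topology, pu->cpuset, HWLOC_CPUBIND_THREAD);

    hwloc_topology_destroy(topology);
    return 0;
  }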

Brice



Re: [hwloc-users] GPU devices appear as PCI devices

2012-11-20 Thread Brice Goglin
Ask NVIDIA to make their CUDA driver free, and to add the devices to sysfs :)

There's a cuda hwloc branch that will solve this. In the meantime, there are no
NVIDIA osdevs.
Maybe look at hwloc/cuda.h and hwloc/cudart.h; they give CUDA device affinity
without osdevs.
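
For instance, something along these lines with the CUDA runtime helper (a
sketch; it assumes a loaded topology and that CUDA device 0 exists):

  #include <hwloc.h>
  #include <hwloc/cudart.h>
  #include <stdio.h>
  #include <stdlib.h>

  void print_cuda0_locality(hwloc_topology_t topology)
  {
    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    /* fill set with the PUs close to CUDA runtime device 0 */
    if (!hwloc_cudart_get_device_cpuset(topology, 0, set)) {
      char *str;
      hwloc_bitmap_asprintf(&str, set);
      printf("cuda0 is close to PUs %s\n", str);
      free(str);
    }
    hwloc_bitmap_free(set);
  }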

Brice



Guillermo Miranda wrote:

>Hello,
>
>I am trying to detect GPUs when traversing through the topology tree,
>but they appear as PCI devices instead of OS devices, so I can't
>compare
>the OS type against HWLOC_OBJ_OSDEV_GPU.
>
>I have enabled IO device discovery (HWLOC_TOPOLOGY_FLAG_IO_DEVICES) and
>made sure that hwloc's configure properly recognised Cuda (4.1).
>
>Here's what lstopo prints:
>  <object type="PCIDev" pci_busid="0000:83:00.0" pci_type="0302 [10de:1091] [00de:0042] a1"
>   pci_link_speed="0.00">
>  </object>
>
>
>Is this the expected behaviour? What can I do to make that GPU be
>marked as an OSDEV GPU object?
>
>Thanks in advance.
>


Re: [hwloc-users] "-lnuma" missing from pkg-config information?

2013-01-04 Thread Brice Goglin
Hello Erik
I am not a pkgconfig expert but my feeling is that this has been buggy for a
long time. hwloc.pc.in should likely use HWLOC_LIBS instead of LIBS. On
my machine, that makes Libs.private change from -ldl to -lm -lnuma here
(with -lpci -lxml2 depending on the config). We also need to check
whether -ldl should be kept because of plugin support too.
Can you replace LIBS with HWLOC_LIBS in hwloc.pc.in, rerun configure,
and try again?
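
Presumably the template line would change along these lines (a guess from
memory, I don't have hwloc.pc.in in front of me):

  -Libs.private: @LIBS@
  +Libs.private: @HWLOC_LIBS@
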
Brice



On 04/01/2013 04:50, Erik Schnetter wrote:
> I just installed hwloc 1.6 on a Linux Red Hat system. libnuma is
> required for linking -- I receive linker errors if I omit -lnuma, and
> I see that -lnuma is listed in libhwloc.la  under
> "dependency_libs". However, pkgconfig/hwloc.pc does not mention
> libnuma. It does mention libpci, though.
>
> Does this sound like an error when hwloc.pc is generated, or am I
> misunderstanding how pkg-config works? If you give me a pointer, I'd
> be happy to try an provide a patch.
>
> -erik
>
>
> -- 
> Erik Schnetter
> http://www.perimeterinstitute.ca/personal/eschnetter/
>
>



Re: [hwloc-users] hwloc on Blue Gene/Q?

2013-01-08 Thread Brice Goglin
Hello Erik,
We need specific BGQ binding support, the binding API is different. Also
we don't detect the 16 4-way cores properly, we only see 64
identical PUs.
I am supposed to get a BGQ account in the near future so I hope I will
have everything working in v1.7.
Stay tuned
Brice




On 08/01/2013 18:06, Erik Schnetter wrote:
> I am trying to use hwloc on a Blue Gene/Q. Building and installing
> worked fine, and it reports the system configuration fine as well
> (i.e. it shows all PUs). However, when I try to inquire the
> thread/core bindings, hwloc crashes with an error in libc's free().
> This is both with 1.6 and 1.6.1rc1.
>
> The error occurs apparently in CPU_FREE called from
> hwloc_linux_find_kernel_nr_cpus.
>
> Does this ring a bell with anyone? I know this is not enough
> information to debug things, but do you have any pointers for things
> to look at?
>
> I remember reading somewhere that the last bit in a cpu_set_t cannot
> be used. A Blue Gene/Q has 64 PUs, and may be using 64-bit integers to
> hold cpu_set_t data. Could this be an issue?
>
> My goal is to examine and experiment with thread/core bindings with
> OpenMP to improve performance.
>
> -erik
>
> -- 
> Erik Schnetter
> http://www.perimeterinstitute.ca/personal/eschnetter/
>
>



Re: [hwloc-users] Mapping a GPU to a pci local CPU on Windows

2013-01-08 Thread Brice Goglin
Is your machine NUMA? Maybe Windows returns an error when requesting
NUMA info on a non-NUMA machine?

Brice



On 08/01/2013 18:44, Ashley Reid wrote:
> OS says DEVPKEY_Numa_Proximity_Domain does not exist. Neither does 
> DEVPKEY_Device_Numa_Node . For all devices.
>
> Lame :/
>
> Thanks,
> Ash
>
> -Original Message-
> From: Ashley Reid 
> Sent: Tuesday, January 08, 2013 5:40 PM
> To: 'Samuel Thibault'
> Cc: hwloc-us...@open-mpi.org
> Subject: RE: [hwloc-users] Mapping a GPU to a pci local CPU on Windows
>
> It appears that DEVPKEY_Numa_Proximity_Domain with SetupDiGetDeviceProperty
> should work. I found this hidden way down in
>
> http://blogs.technet.com/b/winserverperformance/archive/2008/09/13/getting-system-topology-information-on-windows.aspx
>
> I am looking into seeing if this works.
>
> -Original Message-
> From: Samuel Thibault [mailto:samuel.thiba...@inria.fr] 
> Sent: Tuesday, January 08, 2013 5:39 PM
> To: Ashley Reid
> Cc: hwloc-us...@open-mpi.org
> Subject: Re: [hwloc-users] Mapping a GPU to a pci local CPU on Windows
>
Ashley Reid, on Tue 08 Jan 2013 16:53:20 +0100, wrote:
>> Does anyone know if this is possible with OS APIs?
> I don't know.
>
>> It looks like this is not supported on Windows yet by hwloc
> Indeed. I hadn't found anything when I had a look some years ago, I don't know if 
> there is something available nowadays.
>
> Samuel



Re: [hwloc-users] Segmentation fault in collect_proc_cpuset, topology.c line 1074

2013-01-15 Thread Brice Goglin
Hello
Indeed, there's a big cgroup crash in 1.6.  Can you verify that 1.6.1rc2 works 
fine?
Thanks
Brice



cesse...@free.fr wrote:

>Hello,
>
>When updating from 1.5.1 to 1.6 I get a segfault when inside a
>cgroup/cpuset in collect_proc_cpuset, file topology.c line 1074.
>
>It appears that an HWLOC_OBJ_CORE has a child that is its
>HWLOC_OBJ_GROUP's father!
>
>$ cat /proc/self/cgroup
>2: cpuset: /slurm/test
>1: freezer: /
>$ lssubsys -m cpuset
>cpuset /cgroup/cpuset
>$ cat /cgroup/cpuset/slurm/test/cpuset.cpus
>31
>$ hwloc-1.6/bis/lstopo
>Segmentation fault (core dumped)
>$ gdb...
>Program terminated with signal 11, Segmentation fault.
>#0 0x7ffd758d225e in collect_proc_cpuset (obj=<optimized out>,
>sys=0x1f4dba0) at topology.c:1074
>
>The machine is made of bullx super-node S6010 (CEA Tera 100).
>
>Thanks for your help,
>
>Olivier Cessenat.
>
>


Re: [hwloc-users] hwloc-1.6.1rc2 Build failure with Cray compiler

2013-01-17 Thread Brice Goglin
Did this work in the past? I don't think we changed this code recently.
Can you run "make check" to make sure this change doesn't break anything?
Thanks
Brice


On 17/01/2013 19:19, Erik Schnetter wrote:
> hwloc-1.6.1rc2 fails to build with the Cray compiler
>
> Cray C : Version 8.1.2  Thu Jan 17, 2013  12:18:54
>
> The error message is
>
>   CC   bitmap.lo
> CC-147 craycc: ERROR 
>   Declaration is incompatible with "int ffsl(long)" (declared at line
> 526 of
>   "/opt/cray/xe-sysroot/4.1.20/usr/include/string.h").
>
> (Yes, there is no line number with the error message.)
>
> This seems to be caused by the fact that the Cray compiler
> sets __GNUC__, but is not quite compatible. A work-around is to change
> line 56 of include/private/misc.h from
>
> #elif defined(__GNUC__)
>
> to
>
> #elif defined(__GNUC__) && !defined(_CRAYC)
>
> -erik
>
> -- 
> Erik Schnetter
> http://www.perimeterinstitute.ca/personal/eschnetter/
>
>



Re: [hwloc-users] hwloc-1.6.1rc2 Build failure with Cray compiler

2013-01-17 Thread Brice Goglin
Does Cray usually fix such bugs quickly? If so, no need to change hwloc. If 
not, I'll need somebody to test the change on other Cray platforms and compiler 
versions.
Brice



Jeff Hammond wrote:

>This is a bug in the Cray compiler.  They cannot and should not set
>the __GNUC__ flag unless they are fully compatible with GCC.  There
>are many ways to define "fully compatible" but at a minimum, code that
>compiles with GCC needs to compile with any compiler that elects to
>define __GNUC__.  It is prudent to impose a higher standard in some
>cases but that's not pertinent to this discussion.
>
>Lots of vendor compilers pretend to be __GNUC__ for any number of
>reasons.  I believe that they are all wrong for doing it.
>
>Regarding this specific issue, there is nothing wrong with hwloc and I
>don't know why anyone should bother trying to fix Cray's problem, but
>I suspect that pragmatism will prevail, as it appears to have in the
>case of Boost
>(http://www.boost.org/doc/libs/1_52_0/boost/config/select_platform_config.hpp).
>
>I'll reproduce this locally and contact Cray directly about fixing
>this on their end.
>
>Best,
>
>Jeff
>
>On Thu, Jan 17, 2013 at 12:19 PM, Erik Schnetter 
>wrote:
>> hwloc-1.6.1rc2 fails to build with the Cray compiler
>>
>> Cray C : Version 8.1.2  Thu Jan 17, 2013  12:18:54
>>
>> The error message is
>>
>>   CC   bitmap.lo
>> CC-147 craycc: ERROR
>>   Declaration is incompatible with "int ffsl(long)" (declared at line
>526 of
>>   "/opt/cray/xe-sysroot/4.1.20/usr/include/string.h").
>>
>> (Yes, there is no line number with the error message.)
>>
>> This seems to be caused by the fact that the Cray compiler sets
>__GNUC__,
>> but is not quite compatible. A work-around is to change line 56 of
>> include/private/misc.h from
>>
>> #elif defined(__GNUC__)
>>
>> to
>>
>> #elif defined(__GNUC__) && !defined(_CRAYC)
>>
>> -erik
>>
>> --
>> Erik Schnetter 
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>
>
>
>
>-- 
>Jeff Hammond
>Argonne Leadership Computing Facility
>University of Chicago Computation Institute
>jhamm...@alcf.anl.gov / (630) 252-5381
>http://www.linkedin.com/in/jeffhammond
>https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond


Re: [hwloc-users] OpenGL GPU detection code

2013-01-29 Thread Brice Goglin
On 29/01/2013 10:14, Stefan Eilemann wrote:
> Hi,
>
> On 29. Jan 2013, at 8:13, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> [snip]
>> Could you send a diff against this branch instead?
> I missed this branch. I'll merge this into the current implementation and
> let you know once it's done.
>

FWIW, I finally managed to run the code on our machines (we only have
remote rack servers without displays, so playing with displays isn't easy :)
I am getting things to work for real now, and I am seeing several things
to fix before this can work. Shouldn't be too hard, but I won't touch
the SVN branch until you send your patch.

Brice



Re: [hwloc-users] OpenGL GPU detection code

2013-01-29 Thread Brice Goglin
As far as I know, neither OpenCL nor OpenGL has a standard way to query
GPU affinity.
There is an AMD OpenCL extension and there are some NVIDIA-specific
libraries (nvml, cuda and nvctrl) that can query GPU affinity, but
nothing is portable.

The current plan is to have some OpenCL device info and some OpenGL
device info appear inside GPU PCI devices. That said, I am still not
confident about the current OpenGL thing. The current branch manipulates
what I usually call a display (":0.0"), which seems more X- than OpenGL-
related, but I am not familiar with all this at all anyway.

Since both OpenCL and OpenGL (and maybe CUDA at some point) may end up
containing attributes describing the capabilities of the (same) GPU,
we'll need to think about displaying them only once in a common place,
but we're not there yet.

Brice




On 29/01/2013 16:56, Kenneth A. Lloyd wrote:
> As OpenGL and OpenCL are both under the umbrella of the Khronos Group, is
> the endeavor to inspect GPUs common to both?
>
> Ken Lloyd
>
> -Original Message-
> From: hwloc-users-boun...@open-mpi.org
> [mailto:hwloc-users-boun...@open-mpi.org] On Behalf Of Stefan Eilemann
> Sent: Tuesday, January 29, 2013 7:46 AM
> To: Brice Goglin
> Cc: Hardware locality user list
> Subject: Re: [hwloc-users] OpenGL GPU detection code
>
> Hi Brice,
>
> On 29. Jan 2013, at 15:25, Brice Goglin <brice.gog...@inria.fr> wrote:
>
>> FWIW, I finally managed to run the code on our machines (we only have 
>> remote rack servers without displays, so playing with displays isn't 
>> easy :) I am getting things to work for real now, and I am seeing 
>> several things to fix before this can work. Shouldn't be too hard, but 
>> I won't touch the SVN branch until you send your patch.
> Please go ahead and modify your display svn branch. I'll merge it in using
> our git-svn clone, that should be very easy.
>
>
> Cheers,
>
> Stefan.
> --
> http://www.eyescale.ch
> https://github.com/Eyescale/
> http://www.linkedin.com/in/eilemann
>
>



Re: [hwloc-users] OpenGL GPU detection code

2013-02-01 Thread Brice Goglin
I just committed big changes to the display branch (and I also merged
latest trunk changes).

lstopo will now report things like this:
PCI 10de:06d1
  GPU L#0 ":0.0"
  GPU L#1 "cuda0"
  GPU L#2 "nvml0"


The changes include:

1) We don't have a "display" specific OS device anymore, it's just
another kind of GPU among cuda, opencl and nvml. The name is the X
server display name. There are string attributes in these new GL GPU OS
devices (lstopo -v):
  GPU L#9 (Backend=GL GPUVendor="NVIDIA Corporation" GPUModel="Tesla
C2050") ":0.2"

2) The gl component is now buildable as a plugin

3) Given (2), we can't expose internal GL routines in the public API. So
hwloc/gl.h is just made of inline helpers as any other hwloc/foo.h. It
now contains functions to convert between displays (name or port/device)
and hwloc OS devices:

hwloc_obj_t hwloc_gl_get_display_osdev_by_port_device(hwloc_topology_t
topology, unsigned port, unsigned device)
hwloc_obj_t hwloc_gl_get_display_osdev_by_name(hwloc_topology_t
topology, const char *name)
int hwloc_gl_get_display_by_osdev(hwloc_topology_t topology, hwloc_obj_t
osdev, unsigned *port, unsigned *device)

If you really need the PCI device, just use osdev->parent as documented.
If you need the locality, use hwloc_get_non_io_ancestor(topology,
osdev)->cpuset
See tests/gl.c for examples.
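
Going from a display name to its locality could then look like this (a
sketch against the helpers above; ":0.0" is just an example display name):

  #include <hwloc.h>
  #include <hwloc/gl.h>
  #include <stdio.h>

  int main(void)
  {
    hwloc_topology_t topology;
    hwloc_topology_init(&topology);
    hwloc_topology_set_flags(topology, HWLOC_TOPOLOGY_FLAG_IO_DEVICES);
    hwloc_topology_load(topology);

    hwloc_obj_t osdev = hwloc_gl_get_display_osdev_by_name(topology, ":0.0");
    if (osdev) {
      /* locality comes from the first non-I/O ancestor, as documented above */
      hwloc_obj_t ancestor = hwloc_get_non_io_ancestor(topology, osdev);
      char set[128];
      hwloc_bitmap_snprintf(set, sizeof(set), ancestor->cpuset);
      printf("display %s is close to cpuset %s\n", osdev->name, set);
    }
    hwloc_topology_destroy(topology);
    return 0;
  }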

Please review hwloc/gl.h and let me know if that works for you. I hope I
used the words port/device/server/screen as expected.

The last thing on my TODO list is to decide whether we keep the "GL"
name or switch to something among display/X11/X/... for filenames and
function names.

Brice



Re: [hwloc-users] hwloc-bind --get on Solaris for binding to a single core

2013-02-08 Thread Brice Goglin
On 07/02/2013 18:48, Eugene Loh wrote:
> I'm attaching a patch.

Thanks a lot Eugene. I've tested that and it looks OK to me.
I am committing it, it will be in v1.7 and v1.6.2. I guess now you want
Jeff to include r5295 in OMPI.

Brice



Re: [hwloc-users] hwloc on Blue Gene/Q?

2013-02-11 Thread Brice Goglin
Obviously, I should have mentioned that you must pass
--host=powerpc64-bgq-linux to configure. I will add a FAQ about this.

Brice




On 11/02/2013 01:52, Erik Schnetter wrote:
> Brice
>
> I tried using this tarball. Things didn't work. (This particular run
> used 2 MPI processes with 32 OpenMP threads each.)
>
> In my application, I first output the topology in a tree structure. (I
> do this in my application instead of via one of hwloc's tools because
> I don't want to call out to shell code.) Then I output thread
> bindings, then modify the thread bindings, then output them again.
>
> (1) The topology I find consists of 32 PUs and nothing else. I would
> have expected to find two cache levels, 16 cores, and 64 PUs.
>
> (2) When outputting the thread bindings, I received a segfault. The
> lightweight core file says this was signal 6 (SIGABRT) in a routine
> called ".raise".
>
> I'd be happy to help debug this. How?
>
> -erik
>
>
>
>
> On Sat, Feb 9, 2013 at 5:46 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> The new "bgq" branch now contains proper topology for BG/Q nodes
> (including cores and caches, except the prefetching cache) as well
> as support for set/get binding of the current thread or of another
> thread. No process-wide binding since I don't know how to iterate
> over all threads of a process.
>
> A tarball is available at:
>
> 
> https://ci.inria.fr/hwloc/job/hwloc-zcustom-tarball/lastSuccessfulBuild/artifact/hwloc-1.7a1r5312.tar.gz
> (this is our new regression testing tool, I hope the tarball won't
> disappear too soon)
>
> I don't expect a lot more features so this branch will likely go
> into trunk very soon. But if you can look at it, that'll be great.
>
>
> Brice
>
>
>
> On 08/01/2013 18:06, Erik Schnetter wrote:
>> I am trying to use hwloc on a Blue Gene/Q. Building and
>> installing worked fine, and it reports the system configuration
>> fine as well (i.e. it shows all PUs). However, when I try to
>> inquire the thread/core bindings, hwloc crashes with an error in
>> libc's free(). This is both with 1.6 and 1.6.1rc1.
>>
>> The error occurs apparently in CPU_FREE called from
>> hwloc_linux_find_kernel_nr_cpus.
>>
>> Does this ring a bell with anyone? I know this is not enough
>> information to debug things, but do you have any pointers for
>> things to look at?
>>
>> I remember reading somewhere that the last bit in a cpu_set_t
>> cannot be used. A Blue Gene/Q has 64 PUs, and may be using 64-bit
>> integers to hold cpu_set_t data. Could this be an issue?
>>
>> My goal is to examine and experiment with thread/core bindings
>> with OpenMP to improve performance.
>>
>> -erik
>>
>> -- 
>> Erik Schnetter <schnet...@gmail.com>
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>
>>
>
>
>
>
> -- 
> Erik Schnetter <schnet...@gmail.com>
> http://www.perimeterinstitute.ca/personal/eschnetter/



Re: [hwloc-users] Zero cache line size on Power7?

2013-02-27 Thread Brice Goglin
I think I've seen cases where the device-tree contains 0 for such line
sizes.
I guess we should document that a line size of 0 means unknown.

Can you send the tarball generated by hwloc-gather-topology? (Send it
only to me, in a private email.)

Brice



On 27/02/2013 23:11, Erik Schnetter wrote:
> I am running hwloc 1.7a1r5312 on a Power7 architecture. I find there a
> level 2 cache with a cacheline size of 0. Is this to be expected? The
> documentation doesn't say that determining the cacheline size may fail.
>
> I query the cache parameters from my application with these results:
>
> Cache (unknown name) has type "data" depth 1
>   size 32768 linesize 128 associativity 8 stride 4096
> Cache (unknown name) has type "unified" depth 2
>   size 262144 linesize 0 associativity 8 stride 32768
>
> -erik
>
> --
> Erik Schnetter
> http://www.perimeterinstitute.ca/personal/eschnetter/
>
>



Re: [hwloc-users] Zero cache line size on Power7?

2013-03-04 Thread Brice Goglin
You should run hwloc-gather-topology with one parameter:
  hwloc-gather-topology foo
This should generate foo.tar.bz2

We don't explicitly check the number of parameters, we directly look at
$1. My bash makes $1 empty when not given; looks like your bash doesn't
accept that. The attached patch should fix the error.

Brice

Index: tests/linux/hwloc-gather-topology.in
===
--- tests/linux/hwloc-gather-topology.in (revision 5403)
+++ tests/linux/hwloc-gather-topology.in (working copy)
@@ -34,6 +34,11 @@
echo "  $0 /tmp/\$(uname -n)"
 }

+if test x$# = x0 ; then
+  usage
+  exit 1
+fi
+
 name="$1"; shift
 if [ -z "$name" -o x`echo $name | cut -c1` = x- ] ; then
  [ x$name != x -a x$name != x-h -a x$name != x--help ] && echo "Unrecognized option: $name"





On 04/03/2013 14:47, Erik Schnetter wrote:
> Brice
>
> bash-3.2$
> configs/sim-debug/scratch/build/hwloc/hwloc-1.7a1r5312/tests/linux/hwloc-gather-topology
> configs/sim-debug/scratch/build/hwloc/hwloc-1.7a1r5312/tests/linux/hwloc-gather-topology[37]:
> shift: bad number
>
> No tarball is generated.
>
> -erik
>
>
>
> On Wed, Feb 27, 2013 at 5:19 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> I think I've seen cases where the device-tree contains 0 for such
> line sizes.
> I guess we should document that the line size is 0 means unknown.
>
> Can you send the tarball generated by hwloc-gather-topology?
> (Send it only to me, in a private email.)
>
> Brice
>
>
>
>> On 27/02/2013 23:11, Erik Schnetter wrote:
>> I am running hwloc 1.7a1r5312 on a Power7 architecture. I find
>> there a level 2 cache with a cacheline size of 0. Is this to be
>> expected? The documentation doesn't say that determining the
>> cacheline size may fail.
>>
>> I query the cache parameters from my application with these results:
>>
>> Cache (unknown name) has type "data" depth 1
>>   size 32768 linesize 128 associativity 8 stride 4096
>> Cache (unknown name) has type "unified" depth 2
>>   size 262144 linesize 0 associativity 8 stride 32768
>>
>> -erik
>>
>> --
>> Erik Schnetter <schnet...@cct.lsu.edu>
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>
>>
>
>
>
>
>
>
> -- 
> Erik Schnetter <schnet...@cct.lsu.edu>
> http://www.perimeterinstitute.ca/personal/eschnetter/
>
>



Re: [hwloc-users] Many queries creating slow performance

2013-03-05 Thread Brice Goglin
Hello Simon,

I don't think anybody ever benchmarked this, but people have been
complaining about this problem appearing on large machines at some point. I
have a large SGI machine at work, I'll see if I can reproduce this.

One solution is to export the topology to XML once and then have all
your MPI process read from XML. Basically, do "lstopo /tmp/foo.xml" and
then export HWLOC_XMLFILE=/tmp/foo.xml in the environment before
starting your MPI job.

If the topology doesn't change (and that's likely the case), the XML
file could even be stored by the administrator in a "standard" location
(not in /tmp)
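
From the application side, pointing hwloc at such a file instead of
rediscovering everything would look roughly like this (a sketch; the path
is the example from above):

  #include <hwloc.h>

  /* Sketch: load from a pre-generated XML file, falling back to normal
     discovery if the file is missing or invalid. */
  static int load_topology_from_xml(hwloc_topology_t *topology, const char *path)
  {
    hwloc_topology_init(topology);
    if (hwloc_topology_set_xml(*topology, path) < 0) {
      /* fall back to real discovery */
    }
    return hwloc_topology_load(*topology);
  }

called as load_topology_from_xml(&topo, "/tmp/foo.xml") for instance.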

Brice



On 05/03/2013 20:23, Simon Hammond wrote:
> Hi HWLOC users,
>
> We are seeing some significant performance problems using HWLOC 1.6.2
> on Intel's MIC products. In one of our configurations we create 56 MPI
> ranks, each rank then queries the topology of the MIC card before
> creating threads. We are noticing that if we run 56 MPI ranks as
> opposed to one, the calls to query the topology in HWLOC are very slow;
> runtime goes from seconds to minutes (and upwards).
>
> We guessed that this might be caused by the kernel serializing access
> to the /proc filesystem but this is just a hunch. 
>
> Has anyone had this problem and found an easy way to change the
> library / calls to HWLOC so that the slow down is not experienced?
> Would you describe this as a bug?
>
> Thanks for your help.
>
>
> --
> Simon Hammond
>
> 1-(505)-845-7897 / MS-1319
> Scalable Computer Architectures
> Sandia National Laboratories, NM
>
>
>
>
>
>



Re: [hwloc-users] Trying to get last cpu location

2013-03-05 Thread Brice Goglin
I don't see any obvious problem in the code.
Are you sure you're not using an old hwloc by mistake?
Can you add this after #include <hwloc.h> and compile again?
#if HWLOC_API_VERSION < 0x00010200
#error Too old
#endif

Brice



On 05/03/2013 21:58, Fabio Tordini wrote:
> I'm using release 1.6.1 and, as I said, it is the only weird behaviour
> I'm experiencing.
>
> the test program is actually very straightforward: I load and
> initialise the topology, perform some checks on current bindings and
> then try to get the current thread's location, so that I can bind it to
> the nodeset it belongs to:
>
>  hwloc_topology_init(&topo);
>  hwloc_topology_load(topo);
>
>  ...
>
>  hwloc_bitmap_t cpuset  = hwloc_bitmap_alloc();
>  hwloc_bitmap_t nodeset = hwloc_bitmap_alloc();
>  char *str;
>
>  ...
>
>  if( hwloc_get_last_cpu_location(topology, cpuset,
> HWLOC_CPUBIND_THREAD) < 0 )
>  abort();
>  hwloc_bitmap_asprintf(&str, cpuset);
>  printf("current thread running on cpuset %s\n", str);
>  free(str);
>
>  hwloc_cpuset_from_nodeset(topology, cpuset, nodeset);
>  if( hwloc_set_membind_nodeset(topology, nodeset, HWLOC_MEMBIND_BIND,
> HWLOC_MEMBIND_THREAD) < 0 )
>  abort();
>
>  ...
>
>
>  if( hwloc_get_last_cpu_location(topology, cpuset,
> HWLOC_CPUBIND_THREAD) < 0 )
>  abort();
>  hwloc_bitmap_asprintf(&str, cpuset);
>  printf("current thread running on cpuset %s\n", str);
>  free(str);
>
>  hwloc_cpuset_from_nodeset(topology, cpuset, nodeset);
>  if( hwloc_set_membind_nodeset(topology, nodeset, HWLOC_MEMBIND_BIND,
> HWLOC_MEMBIND_THREAD) < 0 )
>  abort();
>
>  ...
>
> I omitted some out-of-the-scope parts, but this is mostly it: is there
> something wrong?
> Fabio
>
>
> On 05/03/13 18:25, Brice Goglin wrote:
>> Hello Fabio,
>> Which hwloc release are you using ? get_last_cpu_location() was only
>> added in hwloc v1.2. It has always been available since then, even on
>> when not supported (it will return -1 with errno=ENOSYS in this case).
>> If this doesn't help, can you send your test program?
>> Brice
>>
>>
>>
>> On 05/03/2013 18:01, Fabio Tordini wrote:
>>> Hello,
>>>
>>> I'm experiencing a problem using the function
>>> 'hwloc_get_last_cpu_location(...)': when compiling I first get a
>>> warning about an implicit declaration of the function, and then it
>>> gives an "undefined reference" error.
>>> Everything else works just fine and I was thinking whether I have to
>>> link some other libraries or perform some other actions in order to be
>>> able to use the function.
>>>
>>> The tests I'm executing are run on a x86_64 GNU/Linux machine, and as
>>> far as I know that function should be totally supported on Linux
>>> systems.
>>>
>>> thanks,
>>> Fabio
>>>
>



Re: [hwloc-users] Many queries creating slow performance

2013-03-05 Thread Brice Goglin
Just tested on a 96-core shared-memory machine. Running OpenMPI 1.6
mpiexec lstopo, here's the execution time (mpiexec launch time is 0.2-0.4s)

1 rank  :  0.2s
8 ranks :  0.3-0.5s depending on binding (packed or scatter)
24 ranks:  0.8-3.7s depending on binding
48 ranks:  2.8-8.0s depending on binding
96 ranks: 14.2s

96 ranks from a single XML file: 0.4s (negligible against mpiexec launch time)


Brice



On 05/03/2013 20:23, Simon Hammond wrote:
> Hi HWLOC users,
>
> We are seeing some significant performance problems using HWLOC 1.6.2
> on Intel's MIC products. In one of our configurations we create 56 MPI
> ranks, each rank then queries the topology of the MIC card before
> creating threads. We are noticing that if we run 56 MPI ranks as
> opposed to one, the calls to query the topology in HWLOC are very slow;
> runtime goes from seconds to minutes (and upwards).
>
> We guessed that this might be caused by the kernel serializing access
> to the /proc filesystem but this is just a hunch. 
>
> Has anyone had this problem and found an easy way to change the
> library / calls to HWLOC so that the slow down is not experienced?
> Would you describe this as a bug?
>
> Thanks for your help.
>
>
> --
> Simon Hammond
>
> 1-(505)-845-7897 / MS-1319
> Scalable Computer Architectures
> Sandia National Laboratories, NM
>
>
>
>
>
>



Re: [hwloc-users] [EXTERNAL] Re: Many queries creating slow performance

2013-03-06 Thread Brice Goglin
Aside from the idea of saving the topology to an XML file before running
the job, you could also:
* rank 0 loads the topology as usual
* rank 0 saves it to an XML buffer (hwloc_topology_export_xmlbuffer())
  and MPI_Bcast()s it to the other ranks
* those ranks just load their hwloc topology from the received XML
  buffer (hwloc_topology_set_xmlbuffer()).
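
A rough sketch of that scheme (assumes MPI is already initialized; error
handling omitted):

  #include <hwloc.h>
  #include <mpi.h>
  #include <stdlib.h>

  /* Rank 0 performs the (expensive) discovery once, broadcasts the XML,
     and every rank then loads its topology from that buffer. */
  static void load_topology_shared(hwloc_topology_t *topology, MPI_Comm comm)
  {
    int rank, len;
    char *xml;
    hwloc_topology_t tmp = NULL;

    MPI_Comm_rank(comm, &rank);
    if (rank == 0) {
      hwloc_topology_init(&tmp);
      hwloc_topology_load(tmp);
      hwloc_topology_export_xmlbuffer(tmp, &xml, &len);
    }
    MPI_Bcast(&len, 1, MPI_INT, 0, comm);
    if (rank != 0)
      xml = malloc(len);
    MPI_Bcast(xml, len, MPI_CHAR, 0, comm);

    /* loading from XML is cheap compared to actual discovery */
    hwloc_topology_init(topology);
    hwloc_topology_set_xmlbuffer(*topology, xml, len);
    hwloc_topology_load(*topology);

    if (rank == 0) {
      hwloc_free_xmlbuffer(tmp, xml);
      hwloc_topology_destroy(tmp);
    } else {
      free(xml);
    }
  }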

Brice



On 06/03/2013 03:53, Hammond, Simon David (-EXP) wrote:
> Hey Jeff,
>
> It's not in OpenMPI or MPICH :(. It's a custom library which is not
> MPI-aware, making it difficult to share the topology query. I'll see if
> we can get a stand-alone piece of code.
>
> From earlier posts it sounds like OpenMPI queries once per physical
> node so probably won't have this problem. I'm guessing MPICH would do
> something similar?
>
> S.
>
>
>
> Sent with Good (www.good.com)
>
>
> -Original Message-
> From: Jeff Hammond [jhamm...@alcf.anl.gov]
> Sent: Tuesday, March 05, 2013 07:17 PM Mountain Standard Time
> To: Hardware locality user list
> Subject: [EXTERNAL] Re: [hwloc-users] Many queries creating slow
> performance
>
> Si - Is your code that calls hwloc part of MPICH or OpenMPI or
> something that can be made standalone and shared?
>
> Brice - Do you have access to a MIC system for testing?  Write me
> offline if you don't and I'll see what I can do to help.
>
> If this affects MPICH i.e. Hydra, then I'm sure Intel will be
> committed to helping fix it since Intel MPI is using Hydra as the
> launcher on systems like Stampede.
>
> Best,
>
> Jeff
>
> On Tue, Mar 5, 2013 at 3:05 PM, Brice Goglin <brice.gog...@inria.fr>
> wrote:
> > Just tested on a 96-core shared-memory machine. Running OpenMPI 1.6
> mpiexec
> > lstopo, here's the execution time (mpiexec launch time is 0.2-0.4s)
> >
> > 1 rank :  0.2s
> > 8 ranks:  0.3-0.5s depending on binding (packed or scatter)
> > 24ranks:  0.8-3.7s depending on binding
> > 48ranks:  2.8-8.0s depending on binding
> > 96ranks: 14.2s
> >
> > 96ranks from a single XML file: 0.4s (negligible against mpiexec launch
> > time)
> >
> > Brice
> >
> >
> >
> > Le 05/03/2013 20:23, Simon Hammond a écrit :
> >
> > Hi HWLOC users,
> >
> > We are seeing some significant performance problems using HWLOC 1.6.2 on
> > Intel's MIC products. In one of our configurations we create 56 MPI
> ranks,
> > each rank then queries the topology of the MIC card before creating
> threads.
> > We are noticing that if we run 56 MPI ranks as opposed to one the
> calls to
> > query the topology in HWLOC are very slow, runtime goes from seconds to
> > minutes (and upwards).
> >
> > We guessed that this might be caused by the kernel serializing
> access to the
> > /proc filesystem but this is just a hunch.
> >
> > Has anyone had this problem and found an easy way to change the
> library /
> > calls to HWLOC so that the slow down is not experienced? Would you
> describe
> > this as a bug?
> >
> > Thanks for your help.
> >
> >
> > --
> > Simon Hammond
> >
> > 1-(505)-845-7897 / MS-1319
> > Scalable Computer Architectures
> > Sandia National Laboratories, NM
> >
> >
> >
> >
> >
> >
>
>
>
> --
> Jeff Hammond
> Argonne Leadership Computing Facility
> University of Chicago Computation Institute
> jhamm...@alcf.anl.gov / (630) 252-5381
> http://www.linkedin.com/in/jeffhammond
> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>



Re: [hwloc-users] Windows binaries miss lib file

2013-05-20 Thread Brice Goglin
Thanks, there was indeed an issue on the machine that builds the Windows
zipballs. I am fixing this. Should be fixed in 1.7.1. If anybody needs
updated earlier Windows zipballs, please let me know.

Brice



On 20/05/2013 14:19, Hartmut Kaiser wrote:
> Hey all,
>
> The V1.7 (and V1.7.1-rc1) Win64 binaries distributed from the website miss
> the libhwloc.lib file. Same is true for the 32bit binaries (I just checked).
> Sure, it's possible to create those by hand, but I think it's more desirable
> to have it available as part of the downloads.
>
> Thanks!
> Regards Hartmut
> ---
> http://boost-spirit.com
> http://stellar.cct.lsu.edu
>
>




Re: [hwloc-users] Windows binaries miss lib file

2013-05-20 Thread Brice Goglin
I took the opportunity to make the Windows build pretty much automatic,
so that fixing earlier zipballs would be easy. And it also checks that
we don't miss the .lib file now.
v1.5.2, v1.6.2, v1.7 and v1.7.1rc1 have been fixed online, you should
have the missing .lib in both 32 and 64 bit releases now. All earlier
releases (except v0.9) were already OK.
Final v1.7.1 expected today or wednesday.

Brice



On 20/05/2013 18:45, Brice Goglin wrote:
> Thanks, there was indeed an issue on the machine that builds the Windows
> zipballs. I am fixing this. Should be fixed in 1.7.1. If anybody needs
> updated earlier Windows zipballs, please let me know.
>
> Brice
>
>
>
> On 20/05/2013 14:19, Hartmut Kaiser wrote:
>> Hey all,
>>
>> The V1.7 (and V1.7.1-rc1) Win64 binaries distributed from the website miss
>> the libhwloc.lib file. Same is true for the 32bit binaries (I just checked).
>> Sure, it's possible to create those by hand, but I think it's more desirable
>> to have it available as part of the downloads.
>>
>> Thanks!
>> Regards Hartmut
>> ---
>> http://boost-spirit.com
>> http://stellar.cct.lsu.edu
>>
>>
>



Re: [hwloc-users] hwloc on Xeon Phi

2013-06-18 Thread Brice Goglin
On 18/06/2013 08:52, pinak panigrahi wrote:
> Hi, how do I use hwloc on the Intel Xeon Phi? I have written codes that
> use it on Sandy Bridge.

Hello,

If you really mean "inside the Xeon Phi", it should just work and report
all available Phi cores.

If you mean managing the Phi internal topology from the host, it's a bit
harder, we currently only report the Phi location within the host.

What did you try, what didn't work, and what would you like to do exactly?

Brice



Re: [hwloc-users] Open-mpi + hwloc ...

2013-06-21 Thread Brice Goglin
Hello,
hwloc can only tell where CPUs/devices are, and place programs on the
right CPUs. hwloc isn't going to convert your parallel program into a
GPU program. If you want to use NVIDIA GPUs, you have to rewrite your
program using CUDA, OpenCL, or a high-level heterogeneous language.
Brice



On 21/06/2013 12:04, Solibakke Per Bjarte wrote:
> Hello,
>
> I have been using Open MPI for several years now on 8-16 CPU/core
> machines. I want to extend the usage to graphics-card devices
> (NVIDIA cards). Therefore:
>
> I use the Open MPI implementation on x number of CPUs, working well
> (Linux/Ubuntu).
>
> The CPU installation:
>
> 1) The makefile looks like this:
>
> CC   = mpic++
> SDIR = ./
> IMPI = /usr/lib/openmpi/include
> LMPI = /usr/lib/openmpi/lib
> ISCL = $(HOME)/applik-libscl/libscl/gpp
> LSCL = $(HOME)/applik-libscl/libscl/gpp
> IDIRS= -I. -I$(SDIR) -I$(IMPI) -I$(ISCL)
> LDIRS= -L$(LMPI) -L$(LSCL)
> CFLAGS   = -O -Wall -c  $(IDIRS)
> LFLAGS   = $(LDIRS)  -lscl -lm
>
> hello : hello.o
>   $(CC) -o hello hello.o $(LFLAGS)
>
> hello.o : $(SDIR)/hello.cpp $(HEADERS)
>   $(CC) $(CFLAGS) $(SDIR)/hello.cpp
>
> clean :
>   rm -f *.o core core.*
>
> veryclean :
>   rm -f *.o core core.*
>   rm -f  hello
>
> *
>
> 2) and I simultaneously compile and execute with the sh-file:
>
> *
> echo "localhost cpu=24" > OpenMPIhosts
>
> test -f hello.err  && mv -f hello.err  hello.err.bak
> test -f hello.out  && mv -f hello.out  hello.out.bak
>
> make -f makefile.mpi.OpenMPI_1.4 >hello.out 2>&1 && \
>   mpirun --hostfile OpenMPIhosts ${PWD}/hello >>hello.out 2>hello.err
>
> RC=$?
>
> case $RC in
>   0) exit 0 ;;
>   esac
> exit 1;
>
>
>
> I now have some questions:
>
> Can this parallel program (hello) be extended to also use graphics
> processor cards (i.e. NVIDIA cards) through the hwloc that is internal
> to the Open MPI 1.7.1 installation?
>
> If yes: any changes in makefiles? Execute files? Program files?
>
> Suggestions for implementations are appreciated!
> The graphic card devices should be extensions of a machine's CPUs.
>
> Regards
>
> PBSolibakke
>
> Professor
>
>
>
>



Re: [hwloc-users] [hwloc-announce] Hardware locality (hwloc) v1.7.2rc1 released

2013-08-29 Thread Brice Goglin
There will be a v1.8rc1 in the near future (within a month) so I'd
rather put such a change there.

Brice



On 29/08/2013 15:48, Jiri Hladky wrote:
> Hi Brice,
>
> is there any chance to add the release tag
>
> $uname -r
> 3.10.7-100.fc18.x86_64
>
> to the graphical output in 1.7.2 ? (see also my other email I sent to
> you 2 minutes ago).
>
> Jirka
>
>
>
>
> On Thu, Aug 29, 2013 at 11:32 AM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> The Hardware Locality (hwloc) team is pleased to announce the first
> release candidate of v1.7.2:
>
>http://www.open-mpi.org/projects/hwloc/
>
> v1.7.2 is a bug fix release which addresses all known bugs in the
> v1.7 series.
>
> The following is a summary of the changes since v1.7.1:
>
> * Do not create invalid block OS devices on very old Linux kernel such
>   as RHEL4 2.6.9.
> * Fix PCI subvendor/device IDs.
> * Fix the management of Misc objects inserted by parent.
>   Thanks to Jirka Hladky for reporting the problem.
> * Add a PortState info attribute to OpenFabrics OS devices.
> * Add a MICSerialNumber info attribute to Xeon PHI/MIC OS devices.
> * Improve verbose error messages when failing to load from XML.
>
> --
> Brice
>
>
>
>
>



Re: [hwloc-users] CPU binding

2013-10-03 Thread Brice Goglin
On 03/10/2013 02:56, Panos Labropoulos wrote:
> Hallo,
>
>
> I initially posted this at us...@open-mpi.org .
>
> We seem to be unable to set the CPU binding on a cluster consisting
> of Dell M420/M610 systems:
>
> [jallan@hpc21 ~]$ cat report-bindings.sh
> #!/bin/sh
>
> bitmap=`hwloc-bind --get -p`
> friendly=`hwloc-calc -p -H socket.core.pu $bitmap`
>
> echo "MCW rank $OMPI_COMM_WORLD_RANK (`hostname`): $friendly"
> exit 0
>
>
> [jallan@hpc27 ~]$ hwloc-bind -v  socket:0.core:0 -l ./report-bindings.sh
> using object #0 depth 2 below cpuset 0x00ff
> using object #0 depth 6 below cpuset 0x0080
> adding 0x0080 to 0x0
> adding 0x0080 to 0x0
> assuming the command starts at ./report-bindings.sh
> binding on cpu set 0x0080
> MCW rank  (hpc27): Socket:0.Core:10.PU:7
> [jallan@hpc27 ~]$ hwloc-bind -v  socket:1.core:0 -l ./report-bindings.sh
> object #1 depth 2 (type socket) below cpuset 0x00ff does not exist
> adding 0x0 to 0x0
> assuming the command starts at ./report-bindings.sh
> MCW rank  (hpc27): Socket:0.Core:10.PU:7
>
>
> The topology of this system looks a bit strange:
>
> [jallan@hpc21 ~]$ lstopo --no-io
> Machine (24GB)
>  NUMANode L#0 (P#0 24GB)
>  NUMANode L#1 (P#1) + Socket L#0 + L3 L#0 (15MB) + L2 L#0 (256KB) + L1
> L#0 (32KB) + Core L#0 + PU L#0 (P#11)
> [jallan@hpc21 ~]$


You likely have some Linux cpuset that restricts the available CPUs.
That's why the first socket object doesn't appear in lstopo above. And
that's why "socket:1" fails in other commands: there's no socket with
logical index 1.

If you're allocating jobs with a batch scheduler, the problem will go
away if you reserve all cores of the node instead of a single one.

If you really want to play with manual binding on that restricted
platform, you also have to manually play with the unavailable resources.

Otherwise you can generate the entire topology with "lstopo
--whole-system foo.xml" and then use it with "normal" socket numbers:
"hwloc-bind -i foo.xml socket:1.core:0 etc". You won't get errors about
objects anymore, but you may get new errors about failures to bind if
you try to bind to objects outside the restricted topology.
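
If you load the topology from the API rather than through lstopo, the
equivalent of --whole-system is the corresponding topology flag (a short
sketch):

  #include <hwloc.h>

  static hwloc_topology_t load_whole_system(void)
  {
    hwloc_topology_t topology;
    hwloc_topology_init(&topology);
    /* also keep the CPUs that the cpuset made unavailable */
    hwloc_topology_set_flags(topology, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
    hwloc_topology_load(topology);
    return topology;
  }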

Brice



Re: [hwloc-users] [WARNING: A/V UNSCANNABLE]Re: [OMPI users] SIGSEGV in opal_hwlock152_hwlock_bitmap_or.A // Bug in 'hwlock" ?

2013-11-04 Thread Brice Goglin
Thanks. That's indeed the same bug that you got in Open MPI (reuse of a
hwloc cpuset structure that was freed earlier). It's a nasty bug that
happens when reloading from XML on big machines like yours (that
explains why lstopo works while xmlbuffer and OMPI fail). It was fixed
in hwloc v1.7.1 (hence will be fixed in Open MPI 1.7.4 from what I
understand) but the fix was too big to be backported to older hwloc/OMPI.

You should be able to work around the problem for now by setting
HWLOC_GROUPING=0 in your environment.

I re-added hwloc-users to CC so that the bug is officially "closed".

Brice




On 04/11/2013 22:33, Paul Kapinos wrote:
> Hello again,
> I'm not allowed to publish to Hardware locality user list so I omit it
> now.
>
> On 11/04/13 14:19, Brice Goglin wrote:
>> On 04/11/2013 11:44, Paul Kapinos wrote:
>>> Hello all,
>>> I.
>>> sorry for this paleontological excursion. (The 4-year-old 'lstopo'
>>> binary was just in my private bin folder and still runnable.)
>>>
>>> Attached output of newer version 1.5 (Linux-Default one on RHEL/6.4
>>> (SL/6.4).
>>>
>>> II.
>>> I've also tested hwloc-1.5.2 (could not find v.1.5.3) and hwloc-1.7.2
>>> as Brice suggested, by 'confugure' + 'make test' - logs attached.
>>>
>>> 1.5.2 fails:
>>>> /bin/sh: line 5: 20677 Segmentation fault (core dumped) ${dir}$tst
>>>> FAIL: xmlbuffer
>>
>> Can you give more details about this segfault?
>>
>> Try (from the build tree):
>> $ libtool --mode=execute gdb xmlbuffer
>> then type 'run'
>> when it crashes, type 'bt full' and send the output.
>
> see attached file trace_1.5.2.txt
>
>
>
>
>
>>
>> Then please also run from hwloc 1.5.2:
>> * "lstopo foo.xml" and send "foo.xml"
>> * "hwloc-gather-topology foo" and send "foo.tar.bz2"
>
> also attached but with non-empty names :o)
>
>
>
> Best
>
> Paul
>>
>>> whereby 1.7.2 seem to be OK.
>>>
>>> AFAIK in OpenMPI 1.7.4 the version of 'hwlock' has to be updated?
>>> If so, the original issue should be fixed by this, huh?
>>
>> Hard to say before we get details about the crash in xmlbuffer above.
>>
>> Brice
>>
>>
>>>
>>> Many thanks for your help!
>>> Best
>>>
>>> Paul
>>>
>>> pk224850@linuxitvc00:~/SVN/mpifasttest/trunk[511]lstopo 1.5
>>> $ lstopo lstopo_linuxitvc00_1.5.txt
>>> $ lstopo lstopo_linuxitvc00_1.5.xml
>>>
>>>
>>>
>>>
>>>
>>> On 11/01/13 15:37, Brice Goglin wrote:
>>>> Sorry, I missed the mail on OMPI-users.
>>>>
>>>> This hwloc looks very old. We don't have Misc objects
>>>> instead of
>>>> Groups since we switched from 0.9 to 1.0. You should regenerate the
>>>> XML file
>>>> with a hwloc version that came out after the big bang (or better,
>>>> after the
>>>> asteroid killed the dinosaurs). Please resend that XML from a recent
>>>> hwloc so
>>>> that we can get a better clue of the problem.
>>>>
>>>> Assuming there's a bug in OMPI's hwloc, I would suggest downloading
>>>> hwloc 1.5.3
>>>> and running make check on that machine. And try again with hwloc
>>>> 1.7.2 in case
>>>> that's already fixed.
>>>>
>>>> thanks
>>>> Brice
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> On 01/11/2013 15:24, Jeff Squyres (jsquyres) wrote:
>>>>> Paul Kapinos originally reported this issue on the OMPI users list.
>>>>>
>>>>> He is showing a stack trace from OMPI-1.7.3, which uses hwloc 1.5.2
>>>>> (note that
>>>>> OMPI 1.7.4 will use hwloc 1.7.2).
>>>>>
>>>>> I tried to read the xml file he provided with the git hwloc master
>>>>> HEAD, and
>>>>> it fails:
>>>>>
>>>>> -
>>>>> ❯❯❯ ./utils/lstopo -i lstopo_linuxitvc00.xml
>>>>> ignoring depth attribute for object type without depth
>>>>> ignoring depth attribute for object type without depth
>>>>> XML component discovery failed.
>>>>> hwloc_topology_load() failed (Invalid argument).
>>>>> -
>>>>>
>>>>> Any idea what's happening here?
>>>>>
>>>>> BTW, I can apply th

Re: [hwloc-users] [hwloc-announce] Hardware locality (hwloc) v1.8rc1 released

2013-11-09 Thread Brice Goglin
On 09/11/2013 01:33, Jiri Hladky wrote:
> Hi Brice,
>
> I have bumped into the following issue:
>
> hwloc-1.7.2:
> This works as expected:
> utils/hwloc-calc core:0-1 -H pu
> PU:0 PU:1 PU:2 PU:3
>
> Now intuitively one would expect this to work as well (as supported by
> taskset/numactl commands)
> utils/hwloc-calc core:0,1 -H pu
> PU:0 PU:1
> Unfortunately, ",1" is silently ignored.
>
> hwloc-1.8rc1 does better:
> $ ./hwloc-calc core:0,1 -H pu
> invalid character at `,1' after index at `0,1'
> ignored unrecognized argument core:0,1

Somebody got the same issue a couple months ago. That's why I've added
these explicit warnings in 1.8. NEWS says:

  + hwloc-calc and friends have a more robust parsing of locations given
on the command-line and they report useful error messages about it.


> However, I would vote for the format
> object:index,index1,index2
>
> to be supported and being equivalent to
>
> object:index object:index1 object:index2
>
> What do you think about it?

It's annoying to implement because the current code was designed for
(nested) loops only. Given that object:index1 object:index2 is easy to
write, I'd vote for not making the code too complex.
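
In the meantime, spelling the objects out does the same thing, if I
remember correctly how multiple locations are combined on the command
line (they should be OR'ed):

  utils/hwloc-calc core:0 core:1 -H pu
  utils/hwloc-calc core:0-1 -H pu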

Brice



>
> Thanks a lot!
> Jirka
>
>
> On Wed, Nov 6, 2013 at 3:06 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> The Hardware Locality (hwloc) team is pleased to announce the first
> release candidate for v1.8:
>
>http://www.open-mpi.org/projects/hwloc/
>
> v1.8rc1 is the first milestone of a major feature release.
> It adds PCI discovery on Linux without dependencies on external libs,
> a new API to manipulate differences between very similar topologies,
> multiple improvements to command-line tools, and more.
>
> * New components
>   + Add the "linuxpci" component that always works on Linux even when
> libpciaccess and libpci aren't available (and even with a modified
> file-system root). By default the old "pci" component runs first
> because "linuxpci" lacks device names (obj->name is always NULL).
> * API
>   + Add the topology difference API in hwloc/diff.h for manipulating
> many similar topologies.
>   + Add hwloc_topology_dup() for duplicating an entire topology.
>   + hwloc.h and hwloc/helper.h have been reorganized to clarify the
> documentation sections. The actual inline code has moved out
> of hwloc.h
> into the new hwloc/inlines.h.
>   + Deprecated functions are now in hwloc/deprecated.h, and not in the
> official documentation anymore.
> * Tools
>   + Add hwloc-diff and hwloc-patch tools together with the new
> diff API.
>   + Add hwloc-compress-dir to (de)compress an entire directory of
> XML files
> using hwloc-diff and hwloc-patch.
>   + Object colors in the graphical output of lstopo may be changed
> by adding
> a "lstopoStyle" info attribute. See CUSTOM COLORS in the
> lstopo(1) manpage
> for details. Thanks to Jirka Hladky for discussing the idea.
>   + hwloc-gather-topology may now gather I/O-related files on
> Linux when
> --io is given. Only the linuxpci component supports
> discovering I/O
> objects from these extended tarballs.
>   + hwloc-annotate now supports --ri to remove/replace info
> attributes with
> a given name.
>   + hwloc-info supports "root" and "all" special locations for dumping
> information about the root object.
>   + lstopo now supports --append-legend to append custom lines of text
> to the legend in the graphical output. Thanks to Jirka Hladky for
> discussing the idea.
>   + hwloc-calc and friends have a more robust parsing of locations
> given
> on the command-line and they report useful error messages
> about it.
>   + Add --whole-system to hwloc-bind, hwloc-calc, hwloc-distances and
> hwloc-distrib, and add --restrict to hwloc-bind for uniformity
> among
> tools.
> * Misc
>   + Calling hwloc_topology_load() or hwloc_topology_set_*() on an
> already
> loaded topology now returns an error (deprecated since release
> 1.6.1).
>   + Fix the initialisation of cpusets and nodesets in Group
> objects added
> when inserting PCI hostbridges.
>   + Never merge Group objects that were added explicitly by the
> user with
> hwloc_custom_insert_group_object_by_parent().
>   + Add a sanity check during dynamic plug

Re: [hwloc-users] DELL 8 core machine + Quadro K5000 GPU Card...

2013-11-18 Thread Brice Goglin


On 18/11/2013 02:14, Solibakke Per Bjarte wrote:
> Hello
>
> I recently got access to a very interesting and powerful machine: Dell
> 8 core + GPU Quadro K5000 (96 cores).
> A total of 1536 cores in the original machine configuration.

Hello

GPU cores are not real cores so I am not sure your count makes sense :)

> I installed first HWLOC 1.7 version and I also installed the newly
> released beta 1.8. The final installation lines report PCI (linux) CUDA.
> However, the commands:
>
> lstopo --whole-system and lstopo --whole-io
>
> report only the 8 CPU-cores. No reference to PCI-Bridges, eth0, seas
> +++ and the GPUs.
>
> Is the installation of the machine the problem or is my 
> ./configure --prefix=/usr/local/hwloc
>
> missing some vital elements?

What kind of Linux distribution and what kernel version is this?

Please run "hwloc-gather-topology --io myname" (from hwloc 1.8) and send
the corresponding myname.tar.bz2 that it will create.
Note that this command may be a bit slow. Also the generated tar.bz2 may
be big, so feel free to send it to me only since it may well be rejected
by the mailing list.

Brice


>
> Regards
> PBSolibakke
>
> Dr.econ Per Bjarte Solibakke
> Professor
> per.b.soliba...@himolde.no
> Cell phone: 004790035606
> Phone: 004771214238
>
>



Re: [hwloc-users] Regarding the Dell 8 core machine with GPUs

2013-11-18 Thread Brice Goglin
(readding the list to CC).

OK, good to know it actually works.
v1.7.2 doesn't get any PCI device (and therefore no CUDA devices)
because you don't have PCI libraries/headers installed (see
"Installation" in http://www.open-mpi.org/projects/hwloc/doc/v1.8rc1/ ).
v1.8 has a new Linux-specific way to discover PCI devices, that's why it
works there.

Note that --whole-system isn't relevant here (it's only about showing
processors/cores that have been reserved by the administrator).
--whole-io isn't strictly needed either (GPUs are always shown by
default in lstopo).

cpuinfo.txt doesn't contain the kernel version ("uname -a" would be more
useful) but I don't need this information anymore anyway.

Looks like I am ready to release the final hwloc v1.8 now :)

Brice



On 18/11/2013 04:17, Solibakke Per Bjarte wrote:
> Dear Brice Goglin
>
> Sorry, there must have been a version problem on my machine from 1.7.2
> to 1.8rc1. The lstopo --whole-system now reports the bridges and the
> CoProc L#2 "cuda0" + PCIs +++
>
> Here are the answers to your questions and the attached bz2 file (if
> you are interested).
>
> First question: it is Ubuntu 12.04 with CUDA 5.5 installed.
> Open MPI 1.7.3 for parallel processing and hwloc for the mpirun command.
> Kernel: cpu_info.txt
>
> The hwloc 1.8rc1 version and the "--io" option worked well.
> The "lstopo --whole-io solibakke" command also worked well.
>
> The bz2-file is attached (68K)
>
> I will now use "--use-hwthread-cpus" in the mpirun command.
>
> Thank you.
>
> Regards
> PBSolibakke
>
> Dr.econ Per Bjarte Solibakke
> Professor
> per.b.soliba...@himolde.no
> Cell phone: 004790035606
> Phone: 004771214238
>
> From: Brice Goglin <brice.gog...@inria.fr>
> Date: Monday 18 November 2013 11:14
> To: Per B Solibakke <per.b.soliba...@himolde.no>
> Subject: Re: Regarding the Dell 8 core machine with GPUs
>
> --io is only in hwloc v1.8, did you use this one?
> I can't do anything with the tarball unless you pass this option.
> Also you forgot to reply to the other questions :)
>
> Brice
>
>
>
> Le 18/11/2013 03:12, Solibakke Per Bjarte a écrit :
>> Dear Brice Goglin
>>
>> The "--io" option did not work. I therefore produced the command
>> without the "--io" option.
>> It is attached to this e-mail.
>>
>> (the file size is 13K)
>>
>> Regards
>> PBSolibakke
>>
>> Dr.econ Per Bjarte Solibakke
>> Professor
>> per.b.soliba...@himolde.no
>> Cell phone: 004790035606
>> Phone: 004771214238
>



Re: [hwloc-users] windows PCI locality (was: DELL 8 core machine + Quadro K5000 GPU Card...)

2013-11-19 Thread Brice Goglin
>   }
> }
> }
> }
> }
> }
>
> if (dataSet &&
>     (bus == cudaBus) &&
>     (subdevice == cudaSubdevice) &&
>     (function == cudaFunction))
> {
>     /* Note: the archive stripped the "&..." arguments from the calls
>      * below; devInfoData (an SP_DEVINFO_DATA filled by an earlier
>      * SetupDiEnumDeviceInfo loop) is a reconstructed name. */
>     ret = SetupDiGetDeviceRegistryPropertyA(hNvDevInfo,
>             &devInfoData, SPDRP_HARDWAREID, NULL,
>             (PBYTE)locinfo, sizeof(locinfo), NULL);
>
>     printf("locinfo %s\n", locinfo);
>
>     int data[20];
>     data[0] = 0;
>     DEVPROPTYPE type;
>     DEVPROPKEY key = DEVPKEY_Numa_Proximity_Domain;
>
>     lastError = 0;
>     ret = SetupDiGetDeviceProperty(hNvDevInfo, &devInfoData,
>             &key, &type, (PBYTE)&data[0], 20*sizeof(int), NULL, 0);
>     if (!ret)
>     {
>         lastError = GetLastError();
>     }
>
>     printf("DEVPKEY_Numa_Proximity_Domain %d err %d\n",
>            data[0], lastError);
>
>     key = DEVPKEY_Device_Numa_Node;
>     lastError = 0;
>     ret = SetupDiGetDeviceProperty(hNvDevInfo, &devInfoData,
>             &key, &type, (PBYTE)&data[0], 20*sizeof(int), NULL, 0);
>     if (!ret)
>     {
>         lastError = GetLastError();
>     }
>
>     printf("DEVPKEY_Device_Numa_Node %d err %d\n", data[0],
>            lastError);
>
>     return data[0];
> }
>
> return -1;
> }
>
>  
>
> From: hwloc-users [mailto:hwloc-users-boun...@open-mpi.org] On Behalf Of Brice Goglin
> Sent: Monday, November 18, 2013 11:09 AM
> To: Hardware locality user list
> Subject: Re: [hwloc-users] windows PCI locality (was: DELL 8 core
> machine + Quadro K5000 GPU Card...)
>
>  
>
> This seems unrelated since he seems to be running Linux anyway.
>
> We got that information a while ago but I couldn't do anything with it
> because (I think) I didn't have access to a Windows release that
> supported this. And, bigger problem, I don't have access to a Windows
> machine with more than one socket. I can't actually test the code
> anywhere.
>
> Are you volunteering to write some code? I am not saying that you
> should write the entire hwloc support, but some example would help a lot.
>
> Once we have the device locality, we'll need the devices too. The
> windows code misses the entire device listing code. Do you have any
> idea how to list PCI devices, match them with CUDA GPUs, etc ?
>
> Brice
>
>
>
>
> Le 18/11/2013 02:52, Ashley Reid a écrit :
>
> Maybe not completely related to your issue, but the windows code
> misses the correct enumeration to see where the GPU is in a NUMA
> system. The code needs to look at:
>
>  
>
> Use "DEVPKEY_Numa_Proximity_Domain" and "DEVPKEY_Device_Numa_Node"
> when calling SetupDiGetDeviceProperty.
>
> Links:
>
>  
>
> http://msdn.microsoft.com/en-us/library/windows/hardware/ff543536(v=vs.85).aspx
>
>"Windows Server 2003, Windows XP, and Windows 2000 do not
> support this property." -- So should be fine on win7 and win8?
>
> 
> http://blogs.technet.com/b/winserverperformance/archive/2008/09/13/getting-system-topology-information-on-windows.aspx
>
>  
>
> But this only works if the BIOS has the right ACPI entries; we
> filed a bug and got an update for the z820 from HP. This relies on
> the _PXM value in the ACPI tables.
>
>  
>
> You can use windbg and !nstree to view the tables; inside, there
> should be some _PXM values.
>
>  
>
> Ash
>
>  
>
>  
>
> From: hwloc-users [mailto:hwloc-users-boun...@open-mpi.org] On Behalf Of Solibakke Per Bjarte
> Sent: Monday, November 18, 2013 10:15 AM
> To: hwloc-us...@open-mpi.org
> Subject: [hwloc-users] DELL 8 core machine + Quadro K5000 GPU
> Card...
>
>  
>
> Hello
>
>  
>
> I recently got access to a very interesting and powerful machine:
> Dell 8 core + GPU Quadro K5000 (96 cores).
>
> A total of 1536 cores in the original machine configuration.
>
>  
>
>
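
As for the question above about matching enumerated PCI devices with CUDA
GPUs: a sketch along these lines (standard CUDA runtime API, untested here)
reports each CUDA device's PCI domain/bus/device so it can be compared
against whatever the SetupDi enumeration finds:

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int count = 0, i;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return 1;
    for (i = 0; i < count; i++) {
        struct cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess)
            continue;
        /* pciDomainID/pciBusID/pciDeviceID locate the GPU on the PCI bus */
        printf("cuda%d: %s at PCI %04x:%02x:%02x\n",
               i, prop.name, prop.pciDomainID, prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}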
