Re: [hwloc-devel] Fwd: BGQ empty topology with MPI

2012-03-26 Thread Christopher Samuel

On 26/03/12 17:14, Brice Goglin wrote:

> Thanks, that would explain such strange behavior.

Not a problem.

> For the record, you can run "lstopo -v" or even "lstopo -.xml" to
> get more info, especially machine attributes.

OK, please find attached both lstopo -v (with debug enabled) and also
the XML file requested.  This is BG/P, not BG/Q of course!

cheers!
Chris
-- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/

could not open /proc/cpuinfo


 * CPU cpusets *

cpu 0 (os 0) has cpuset 0x0001
cpu 1 (os 1) has cpuset 0x0002
cpu 2 (os 2) has cpuset 0x0004
cpu 3 (os 3) has cpuset 0x0008
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0xf...f complete 0x000f online 0xf...f allowed 0xf...f nodeset 0x0 completeN 0x0 allowedN 0xf...f
  PU#0 cpuset 0x0001
  PU#1 cpuset 0x0002
  PU#2 cpuset 0x0004
  PU#3 cpuset 0x0008

Restrict topology cpusets to existing PU and NODE objects

Propagate offline and disallowed cpus down and up

Propagate nodesets
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0x000f complete 0x000f online 0x000f allowed 0x000f
  PU#0 cpuset 0x0001 complete 0x0001 online 0x0001 allowed 0x0001
  PU#1 cpuset 0x0002 complete 0x0002 online 0x0002 allowed 0x0002
  PU#2 cpuset 0x0004 complete 0x0004 online 0x0004 allowed 0x0004
  PU#3 cpuset 0x0008 complete 0x0008 online 0x0008 allowed 0x0008

Removing unauthorized and offline cpusets from all cpusets

Removing disallowed memory according to nodesets
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0x000f complete 0x000f online 0x000f allowed 0x000f
  PU#0 cpuset 0x0001 complete 0x0001 online 0x0001 allowed 0x0001
  PU#1 cpuset 0x0002 complete 0x0002 online 0x0002 allowed 0x0002
  PU#2 cpuset 0x0004 complete 0x0004 online 0x0004 allowed 0x0004
  PU#3 cpuset 0x0008 complete 0x0008 online 0x0008 allowed 0x0008

Removing ignored objects
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0x000f complete 0x000f online 0x000f allowed 0x000f
  PU#0 cpuset 0x0001 complete 0x0001 online 0x0001 allowed 0x0001
  PU#1 cpuset 0x0002 complete 0x0002 online 0x0002 allowed 0x0002
  PU#2 cpuset 0x0004 complete 0x0004 online 0x0004 allowed 0x0004
  PU#3 cpuset 0x0008 complete 0x0008 online 0x0008 allowed 0x0008

Removing empty objects except numa nodes and PCI devices
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0x000f complete 0x000f online 0x000f allowed 0x000f
  PU#0 cpuset 0x0001 complete 0x0001 online 0x0001 allowed 0x0001
  PU#1 cpuset 0x0002 complete 0x0002 online 0x0002 allowed 0x0002
  PU#2 cpuset 0x0004 complete 0x0004 online 0x0004 allowed 0x0004
  PU#3 cpuset 0x0008 complete 0x0008 online 0x0008 allowed 0x0008

Removing objects whose type has HWLOC_IGNORE_TYPE_KEEP_STRUCTURE and have only one child or are the only child
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0x000f complete 0x000f online 0x000f allowed 0x000f
  PU#0 cpuset 0x0001 complete 0x0001 online 0x0001 allowed 0x0001
  PU#1 cpuset 0x0002 complete 0x0002 online 0x0002 allowed 0x0002
  PU#2 cpuset 0x0004 complete 0x0004 online 0x0004 allowed 0x0004
  PU#3 cpuset 0x0008 complete 0x0008 online 0x0008 allowed 0x0008

Add default object sets

Ok, finished tweaking, now connect
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0x000f complete 0x000f online 0x000f allowed 0x000f nodeset 0xf...f completeN 0xf...f allowedN 0xf...f arity 4
  PU#0 cpuset 

Re: [hwloc-devel] Fwd: BGQ empty topology with MPI

2012-03-26 Thread Brice Goglin
On 26/03/2012 05:16, Christopher Samuel wrote:
> On 25/03/12 09:04, Daniel Ibanez wrote:
>
> > Additional printfs confirm that with MPI in the code,
> > hwloc_accessat succeeds on the various /sys/ directories, but the
> > overall procedure for getting PUs from these fails. Without MPI,
> > access to /sys/ directories fails but the fallback
> > hwloc_setup_pu_level works.
>
> Sounds like your I/O with MPI is getting redirected to the I/O node
> (and hence finding /sys from the Linux kernel there) but when you're
> running without MPI it's trying to open files on the compute node and
> the CNK isn't presenting the /sys directories, causing it to fall back.
>
> I've run lstopo on our BG/P and I get to see the 4 cores there whether
> it's the stock code or if I add an MPI_Init() to the start.  The
> output from lstopo when built with --enable-debug confirms it's
> reporting kernel and hostname info from the I/O node associated with
> the block:
>
> Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1
> HostName=r00-m1-n04.pcf.vlsci.unimelb.edu.au Architecture=BGP) [...]

Thanks, that would explain such strange behavior.

For the record, you can run "lstopo -v" or even "lstopo -.xml" to get
more info, especially machine attributes.

Brice



Re: [hwloc-devel] Fwd: BGQ empty topology with MPI

2012-03-26 Thread Christopher Samuel

On 25/03/12 17:43, Brice Goglin wrote:

> But it'd be good to understand what's going on in /sys on this
> machine. And I still don't understand why MPI changes things here.

My guess (looking at the BG/P CNK kernel code) is that /sys is not
present on a BG/Q compute node, only on its I/O nodes (which run a
Linux kernel), and so the code is only picking them up when the I/O is
being redirected via an I/O node (i.e. when MPI is in play).

Now I'd have thought that would happen with or without MPI, but who
knows..

cheers,
Chris
-- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/


Re: [hwloc-devel] Fwd: BGQ empty topology with MPI

2012-03-26 Thread Christopher Samuel

On 25/03/12 09:04, Daniel Ibanez wrote:

> Additional printfs confirm that with MPI in the code, 
> hwloc_accessat succeeds on the various /sys/ directories, but the
> overall procedure for getting PUs from these fails. Without MPI,
> access to /sys/ directories fails but the fallback
> hwloc_setup_pu_level works.

Sounds like your I/O with MPI is getting redirected to the I/O node
(and hence finding /sys from the Linux kernel there) but when you're
running without MPI it's trying to open files on the compute node and
the CNK isn't presenting the /sys directories, causing it to fall back.
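
To make the two code paths concrete, here is a minimal sketch of the
"probe /sys, else fall back" pattern. It is not hwloc's actual code:
count_pus_from_sysfs() and count_pus() are hypothetical names, the
sysfs directory is passed in by the caller, and the fallback assumes
sysconf(_SC_NPROCESSORS_ONLN) is available (it is on Linux/glibc, but
it is not guaranteed everywhere). The sketch also falls back when the
directory opens but yields no PUs, which is the situation described
for the MPI case.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical sysfs probe: count entries named "cpu<number>" in cpu_dir.
 * Returns -1 if the directory cannot be opened at all. */
int count_pus_from_sysfs(const char *cpu_dir)
{
    assert(cpu_dir != NULL);

    DIR *dir = opendir(cpu_dir);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        unsigned index;
        char trailing;
        /* Accept "cpu0", "cpu12", ...; reject "cpufreq", "cpuidle", "cpu0x". */
        if (sscanf(entry->d_name, "cpu%u%c", &index, &trailing) == 1)
            count++;
    }
    closedir(dir);
    return count;
}

/* Use the sysfs count when it found at least one PU; otherwise fall back
 * to the number of online processors reported by the C library. */
int count_pus(const char *cpu_dir)
{
    int found = count_pus_from_sysfs(cpu_dir);
    if (found > 0)
        return found;

    /* Fallback path: /sys is missing, unreadable, or produced no PUs. */
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    return online > 0 ? (int)online : 1;
}
```

On a regular Linux host one would call count_pus("/sys/devices/system/cpu").
Treating "opened but empty" the same as "not present" is a design choice
made for this sketch only.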

I've run lstopo on our BG/P and I get to see the 4 cores there whether
it's the stock code or if I add an MPI_Init() to the start.  The
output from lstopo when built with --enable-debug confirms it's
reporting kernel and hostname info from the I/O node associated with
the block:

Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1
HostName=r00-m1-n04.pcf.vlsci.unimelb.edu.au Architecture=BGP) [...]

It might be interesting to build something like ls with the BG/Q
compilers to see if you can run it on a compute node to see what /proc
or /sys look like in each case.
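
As a rough sketch of such a test, the following lists a directory
with nothing but opendir()/readdir(), so it should build with most C
toolchains. list_dir() and probe_proc_and_sys() are hypothetical names
made up for this example, and whether CNK supports these calls on a
compute node is an assumption that would need checking.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Print the entries of `path` to stdout, one per line.
 * Returns the number of entries printed, or -1 if the directory
 * cannot be opened (the reason is reported on stderr). */
int list_dir(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL) {
        fprintf(stderr, "%s: cannot open: %s\n", path, strerror(errno));
        return -1;
    }

    int count = 0;
    struct dirent *entry;
    printf("%s:\n", path);
    while ((entry = readdir(dir)) != NULL) {
        /* Skip the "." and ".." entries. */
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;
        printf("  %s\n", entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}

/* Hypothetical helper: probe the two trees discussed in this thread. */
void probe_proc_and_sys(void)
{
    list_dir("/proc");
    list_dir("/sys");
}
```

Calling probe_proc_and_sys() from main(), once as-is and once after an
MPI_Init(), would show whether the two runs see different trees.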

cheers,
Chris
-- 
Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/
