Re: [hwloc-devel] Fwd: BGQ empty topology with MPI
On 26/03/12 17:14, Brice Goglin wrote:

> Thanks, that would explain such a strange behavior.

Not a problem.

> For the record, you can run "lstopo -v" or even "lstopo -.xml" to
> get more info, especially machine attributes.

OK, please find attached both lstopo -v (with debug enabled) and also
the XML file requested.

This is BG/P, not BG/Q of course!

cheers!
Chris
-- 
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/

could not open /proc/cpuinfo

 * CPU cpusets *

cpu 0 (os 0) has cpuset 0x0001
cpu 1 (os 1) has cpuset 0x0002
cpu 2 (os 2) has cpuset 0x0004
cpu 3 (os 3) has cpuset 0x0008
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0xf...f complete 0x000f online 0xf...f allowed 0xf...f nodeset 0x0 completeN 0x0 allowedN 0xf...f
  PU#0 cpuset 0x0001
  PU#1 cpuset 0x0002
  PU#2 cpuset 0x0004
  PU#3 cpuset 0x0008

Restrict topology cpusets to existing PU and NODE objects

Propagate offline and disallowed cpus down and up

Propagate nodesets
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0x000f complete 0x000f online 0x000f allowed 0x000f
  PU#0 cpuset 0x0001 complete 0x0001 online 0x0001 allowed 0x0001
  PU#1 cpuset 0x0002 complete 0x0002 online 0x0002 allowed 0x0002
  PU#2 cpuset 0x0004 complete 0x0004 online 0x0004 allowed 0x0004
  PU#3 cpuset 0x0008 complete 0x0008 online 0x0008 allowed 0x0008

Removing unauthorized and offline cpusets from all cpusets

Removing disallowed memory according to nodesets
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0x000f complete 0x000f online 0x000f allowed 0x000f
  PU#0 cpuset 0x0001 complete 0x0001 online 0x0001 allowed 0x0001
  PU#1 cpuset 0x0002 complete 0x0002 online 0x0002 allowed 0x0002
  PU#2 cpuset 0x0004 complete 0x0004 online 0x0004 allowed 0x0004
  PU#3 cpuset 0x0008 complete 0x0008 online 0x0008 allowed 0x0008

Removing ignored objects
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0x000f complete 0x000f online 0x000f allowed 0x000f
  PU#0 cpuset 0x0001 complete 0x0001 online 0x0001 allowed 0x0001
  PU#1 cpuset 0x0002 complete 0x0002 online 0x0002 allowed 0x0002
  PU#2 cpuset 0x0004 complete 0x0004 online 0x0004 allowed 0x0004
  PU#3 cpuset 0x0008 complete 0x0008 online 0x0008 allowed 0x0008

Removing empty objects except numa nodes and PCI devices
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0x000f complete 0x000f online 0x000f allowed 0x000f
  PU#0 cpuset 0x0001 complete 0x0001 online 0x0001 allowed 0x0001
  PU#1 cpuset 0x0002 complete 0x0002 online 0x0002 allowed 0x0002
  PU#2 cpuset 0x0004 complete 0x0004 online 0x0004 allowed 0x0004
  PU#3 cpuset 0x0008 complete 0x0008 online 0x0008 allowed 0x0008

Removing objects whose type has HWLOC_IGNORE_TYPE_KEEP_STRUCTURE and have only one child or are the only child
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0x000f complete 0x000f online 0x000f allowed 0x000f
  PU#0 cpuset 0x0001 complete 0x0001 online 0x0001 allowed 0x0001
  PU#1 cpuset 0x0002 complete 0x0002 online 0x0002 allowed 0x0002
  PU#2 cpuset 0x0004 complete 0x0004 online 0x0004 allowed 0x0004
  PU#3 cpuset 0x0008 complete 0x0008 online 0x0008 allowed 0x0008

Add default object sets

Ok, finished tweaking, now connect
Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1 HostName=r00-m1-n02.pcf.vlsci.unimelb.edu.au Architecture=BGP) cpuset 0x000f complete 0x000f online 0x000f allowed 0x000f nodeset 0xf...f completeN 0xf...f allowedN 0xf...f arity 4
  PU#0 cpuset
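
The cpuset values in the debug output above read as plain bitmasks: each PU has the single bit for its OS index set, and the Machine cpuset (0x000f) is the OR of its four PUs. The sketch below only illustrates that arithmetic with an unsigned long; it is not how hwloc stores cpusets (hwloc has its own bitmap type), and the names pu_cpuset, machine_cpuset and print_cpusets are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Mask with only the bit for one PU set, e.g. os_index 3 -> 0x0008.
 * Illustration only: limited to the width of unsigned long. */
unsigned long pu_cpuset(unsigned os_index)
{
    assert(os_index < sizeof(unsigned long) * CHAR_BIT);
    return 1UL << os_index;
}

/* OR of the masks of PUs 0..npus-1, e.g. npus 4 -> 0x000f. */
unsigned long machine_cpuset(unsigned npus)
{
    unsigned long set = 0;

    assert(npus <= sizeof(unsigned long) * CHAR_BIT);
    for (unsigned i = 0; i < npus; i++)
        set |= pu_cpuset(i);
    return set;
}

/* Prints one line per PU plus the combined mask, loosely following
 * the layout of the debug log. */
void print_cpusets(unsigned npus)
{
    for (unsigned i = 0; i < npus; i++)
        printf("PU#%u cpuset 0x%04lx\n", i, pu_cpuset(i));
    printf("Machine cpuset 0x%04lx\n", machine_cpuset(npus));
}
```

Calling print_cpusets(4) reproduces the four per-PU masks and the combined 0x000f seen for the BG/P node.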
Re: [hwloc-devel] Fwd: BGQ empty topology with MPI
On 26/03/2012 05:16, Christopher Samuel wrote:
> On 25/03/12 09:04, Daniel Ibanez wrote:
>
> > Additional printfs confirm that with MPI in the code,
> > hwloc_accessat succeeds on the various /sys/ directories, but the
> > overall procedure for getting PUs from these fails. Without MPI,
> > access to /sys/ directories fails but the fallback
> > hwloc_setup_pu_level works.
>
> Sounds like your I/O with MPI is getting redirected to the I/O node
> (and hence finding /sys from the Linux kernel there) but when you're
> running without MPI it's trying to open files on the compute node and
> the CNK isn't presenting the /sys directories, causing it to fall back.
>
> I've run lstopo on our BG/P and I get to see the 4 cores there whether
> it's the stock code or if I add an MPI_Init() to the start. The
> output from lstopo when built with --enable-debug confirms it's
> reporting kernel and hostname info from the I/O node associated with
> the block:
>
> Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1
> HostName=r00-m1-n04.pcf.vlsci.unimelb.edu.au Architecture=BGP)

[...]

Thanks, that would explain such strange behavior.

For the record, you can run "lstopo -v" or even "lstopo -.xml" to
get more info, especially machine attributes.

Brice
Re: [hwloc-devel] Fwd: BGQ empty topology with MPI
On 25/03/12 17:43, Brice Goglin wrote:

> But it'd be good to understand what's going on in /sys on this
> machine. And I still don't understand why MPI changes things here.

My guess (looking at the BG/P CNK kernel code) is that /sys is not
present on a BG/Q compute node, only on its I/O nodes (which run a
Linux kernel), and so the code is only picking them up when the I/O is
being redirected via an I/O node (i.e. when MPI is in play).

Now I'd have thought that would happen with or without MPI, but who
knows..

cheers,
Chris
-- 
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
Re: [hwloc-devel] Fwd: BGQ empty topology with MPI
On 25/03/12 09:04, Daniel Ibanez wrote:

> Additional printfs confirm that with MPI in the code,
> hwloc_accessat succeeds on the various /sys/ directories, but the
> overall procedure for getting PUs from these fails. Without MPI,
> access to /sys/ directories fails but the fallback
> hwloc_setup_pu_level works.

Sounds like your I/O with MPI is getting redirected to the I/O node
(and hence finding /sys from the Linux kernel there) but when you're
running without MPI it's trying to open files on the compute node and
the CNK isn't presenting the /sys directories, causing it to fall back.

I've run lstopo on our BG/P and I get to see the 4 cores there whether
it's the stock code or if I add an MPI_Init() to the start. The
output from lstopo when built with --enable-debug confirms it's
reporting kernel and hostname info from the I/O node associated with
the block:

Machine#0(Backend=Linux OSName=CNK OSRelease=2.6.16.60-304 OSVersion=1
HostName=r00-m1-n04.pcf.vlsci.unimelb.edu.au Architecture=BGP)

[...]

It might be interesting to build something like ls with the BG/Q
compilers to see if you can run it on a compute node to see what /proc
or /sys look like in each case.

cheers,
Chris
-- 
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
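
As a rough sketch of the "something like ls" idea, the helpers below use only POSIX calls to report whether a path is readable and how many cpuN entries a directory holds. The function names probe_path and count_cpu_dirs are hypothetical, and the choice of /proc/cpuinfo and /sys/devices/system/cpu as paths worth probing is an assumption based on what a stock Linux kernel exposes, not something taken from the hwloc source. Whether CNK function-ships these calls to the I/O node is exactly what such a test would have to show.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Reports whether `path` is readable by this process.
 * Returns 1 if it is, 0 otherwise (the errno text is printed). */
int probe_path(const char *path)
{
    assert(path != NULL);
    if (access(path, R_OK) == 0) {
        printf("%-28s accessible\n", path);
        return 1;
    }
    printf("%-28s NOT accessible (%s)\n", path, strerror(errno));
    return 0;
}

/* Counts directory entries named "cpu<digit>..." inside `dir`.
 * Returns -1 if the directory cannot be opened. */
int count_cpu_dirs(const char *dir)
{
    assert(dir != NULL);
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strncmp(e->d_name, "cpu", 3) == 0 &&
            isdigit((unsigned char)e->d_name[3]))
            n++;
    }
    closedir(d);
    return n;
}
```

Calling probe_path() on /proc/cpuinfo and /sys, and count_cpu_dirs() on /sys/devices/system/cpu, from a small main() built once with and once without MPI_Init() would show which view of the filesystem each run gets.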