Re: [OMPI users] affinity issues under cpuset torque 1.8.1
Yes, ompi_info --all works. ompi_info --param all all gives:

[brockp@flux-login1 34241]$ ompi_info --param all all
Error getting SCIF driver version
    MCA btl: parameter "btl_tcp_if_include" (current value: "",
             data source: default, level: 1 user/basic, type: string)
             Comma-delimited list of devices and/or CIDR notation of networks
             to use for MPI communication (e.g., "eth0,192.168.0.0/16").
             Mutually exclusive with btl_tcp_if_exclude.
    MCA btl: parameter "btl_tcp_if_exclude" (current value: "127.0.0.1/8,sppp",
             data source: default, level: 1 user/basic, type: string)
             Comma-delimited list of devices and/or CIDR notation of networks
             to NOT use for MPI communication -- all devices not matching
             these specifications will be used (e.g., "eth0,192.168.0.0/16").
             If set to a non-default value, it is mutually exclusive with
             btl_tcp_if_include.

[brockp@flux-login1 34241]$ ompi_info --param all all --level 9

gives me what I expect. Thanks,

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

On Jun 24, 2014, at 10:22 AM, Jeff Squyres (jsquyres) wrote:

> Brock --
>
> Can you run with "ompi_info --all"?
>
> With "--param all all", ompi_info in v1.8.x is defaulting to only showing
> level 1 MCA params. It's showing you all possible components and variables,
> but only level 1.
>
> Or you could also use "--level 9" to show all 9 levels. Here's the relevant
> section from the README:
>
> -----
> The following options may be helpful:
>
> --all       Show a *lot* of information about your Open MPI
>             installation.
> --parsable  Display all the information in an easily
>             grep/cut/awk/sed-able format.
> --param <framework> <component>
>             A <framework> of "all" and a <component> of "all" will
>             show all parameters to all components. Otherwise, the
>             parameters of all the components in a specific framework,
>             or just the parameters of a specific component, can be
>             displayed by using an appropriate <framework> and/or
>             <component> name.
> --level <level>
>             By default, ompi_info only shows "Level 1" MCA parameters
>             -- parameters that can affect whether MPI processes can
>             run successfully or not (e.g., determining which network
>             interfaces to use). The --level option will display all
>             MCA parameters from level 1 to <level> (the max <level>
>             value is 9). Use "ompi_info --param <framework>
>             <component> --level 9" to see *all* MCA parameters for a
>             given component. See "The Modular Component Architecture
>             (MCA)" section, below, for a fuller explanation.
>
> On Jun 24, 2014, at 5:19 AM, Ralph Castain wrote:
>
>> That's odd - it shouldn't truncate the output. I'll take a look later today
>> - we're all gathered for a developers' conference this week, so I'll be
>> able to poke at this with Nathan.
>>
>> On Mon, Jun 23, 2014 at 3:15 PM, Brock Palen wrote:
>> Perfection, flexible, extensible, so nice.
>>
>> BTW this doesn't happen in older versions:
>>
>> [brockp@flux-login2 34241]$ ompi_info --param all all
>> Error getting SCIF driver version
>> MCA btl: parameter "btl_tcp_if_include" (current value: "",
>>          data source: default, level: 1 user/basic, type: string)
>>          Comma-delimited list of devices and/or CIDR notation of networks
>>          to use for MPI communication (e.g., "eth0,192.168.0.0/16").
>>          Mutually exclusive with btl_tcp_if_exclude.
>> MCA btl: parameter "btl_tcp_if_exclude" (current value: "127.0.0.1/8,sppp",
>>          data source: default, level: 1 user/basic, type: string)
>>          Comma-delimited list of devices and/or CIDR notation of networks
>>          to NOT use for MPI communication -- all devices not matching
>>          these specifications will be used (e.g., "eth0,192.168.0.0/16").
>>          If set to a non-default value, it is mutually exclusive with
>>          btl_tcp_if_include.
>>
>> This is normally much longer. And yes, we don't have the PHI stuff
>> installed on all nodes; strange that 'all all' is now very short.
>> ompi_info -a still works, though.
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
Brock --

Can you run with "ompi_info --all"?

With "--param all all", ompi_info in v1.8.x is defaulting to only showing level 1 MCA params. It's showing you all possible components and variables, but only level 1.

Or you could also use "--level 9" to show all 9 levels. Here's the relevant section from the README:

-----
The following options may be helpful:

--all       Show a *lot* of information about your Open MPI
            installation.
--parsable  Display all the information in an easily
            grep/cut/awk/sed-able format.
--param <framework> <component>
            A <framework> of "all" and a <component> of "all" will
            show all parameters to all components. Otherwise, the
            parameters of all the components in a specific framework,
            or just the parameters of a specific component, can be
            displayed by using an appropriate <framework> and/or
            <component> name.
--level <level>
            By default, ompi_info only shows "Level 1" MCA parameters
            -- parameters that can affect whether MPI processes can
            run successfully or not (e.g., determining which network
            interfaces to use). The --level option will display all
            MCA parameters from level 1 to <level> (the max <level>
            value is 9). Use "ompi_info --param <framework>
            <component> --level 9" to see *all* MCA parameters for a
            given component. See "The Modular Component Architecture
            (MCA)" section, below, for a fuller explanation.

On Jun 24, 2014, at 5:19 AM, Ralph Castain wrote:

> That's odd - it shouldn't truncate the output. I'll take a look later today -
> we're all gathered for a developers' conference this week, so I'll be able to
> poke at this with Nathan.
>
> On Mon, Jun 23, 2014 at 3:15 PM, Brock Palen wrote:
> Perfection, flexible, extensible, so nice.
>
> BTW this doesn't happen in older versions:
>
> [brockp@flux-login2 34241]$ ompi_info --param all all
> Error getting SCIF driver version
> MCA btl: parameter "btl_tcp_if_include" (current value: "",
>          data source: default, level: 1 user/basic, type: string)
>          Comma-delimited list of devices and/or CIDR notation of networks
>          to use for MPI communication (e.g., "eth0,192.168.0.0/16").
>          Mutually exclusive with btl_tcp_if_exclude.
> MCA btl: parameter "btl_tcp_if_exclude" (current value: > "127.0.0.1/8,sppp", data source: default, level: 1 > user/basic, type: string) > Comma-delimited list of devices and/or CIDR > notation of networks to NOT use for MPI > communication -- all devices not matching these > specifications will be used (e.g., > "eth0,192.168.0.0/16"). If set to a non-default > value, it is mutually exclusive with > btl_tcp_if_include. > > > This is normally much longer. And yes we don't have the PHI stuff installed > on all nodes, strange that 'all all' is now very short, ompi_info -a still > works though. > > > > Brock Palen > www.umich.edu/~brockp > CAEN Advanced Computing > XSEDE Campus Champion > bro...@umich.edu > (734)936-1985 > > > > On Jun 20, 2014, at 1:48 PM, Ralph Castain wrote: > > > Put "orte_hetero_nodes=1" in your default MCA param file - uses can > > override by setting that param to 0 > > > > > > On Jun 20, 2014, at 10:30 AM, Brock Palen wrote: > > > >> Perfection! That appears to do it for our standard case. > >> > >> Now I know how to set MCA options by env var or config file. How can I > >> make this the default, that then a user can override? > >> > >> Brock Palen > >> www.umich.edu/~brockp > >> CAEN Advanced Computing > >> XSEDE Campus Champion > >> bro...@umich.edu > >> (734)936-1985 > >> > >> > >> > >> On Jun 20, 2014, at 1:21 PM, Ralph Castain wrote: > >> > >>> I think I begin to grok at least part of the problem. If you are > >>> assigning different cpus on each node, then you'll need to tell us that > >>> by setting --hetero-nodes otherwise we won't have any way to report that > >>> back to mpirun for its binding calculation. > >>> > >>> Otherwise, we expect that the cpuset of the first node we launch a daemon > >>> onto (or where mpirun is executing, if we are only launching local to > >>> mpirun) accurately represents the cpuset on every node in the allocation. 
> >>> > >>> We still might well have a bug in our binding computation - but the above > >>> will definitely impact what you said the user did. > >>> > >>> On Jun 20, 2014, at 10:06 AM, Brock Palen wrote: > >>> > Extra data point if I do: > > [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname >
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
That's odd - it shouldn't truncate the output. I'll take a look later today - we're all gathered for a developers' conference this week, so I'll be able to poke at this with Nathan.

On Mon, Jun 23, 2014 at 3:15 PM, Brock Palen wrote:

> Perfection, flexible, extensible, so nice.
>
> BTW this doesn't happen in older versions:
>
> [brockp@flux-login2 34241]$ ompi_info --param all all
> Error getting SCIF driver version
> MCA btl: parameter "btl_tcp_if_include" (current value: "",
>          data source: default, level: 1 user/basic, type: string)
>          Comma-delimited list of devices and/or CIDR notation of networks
>          to use for MPI communication (e.g., "eth0,192.168.0.0/16").
>          Mutually exclusive with btl_tcp_if_exclude.
> MCA btl: parameter "btl_tcp_if_exclude" (current value: "127.0.0.1/8,sppp",
>          data source: default, level: 1 user/basic, type: string)
>          Comma-delimited list of devices and/or CIDR notation of networks
>          to NOT use for MPI communication -- all devices not matching
>          these specifications will be used (e.g., "eth0,192.168.0.0/16").
>          If set to a non-default value, it is mutually exclusive with
>          btl_tcp_if_include.
>
> This is normally much longer. And yes, we don't have the PHI stuff installed
> on all nodes; strange that 'all all' is now very short. ompi_info -a still
> works, though.
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
>
> On Jun 20, 2014, at 1:48 PM, Ralph Castain wrote:
>
> > Put "orte_hetero_nodes=1" in your default MCA param file - users can
> > override by setting that param to 0
> >
> > On Jun 20, 2014, at 10:30 AM, Brock Palen wrote:
> >
> >> Perfection! That appears to do it for our standard case.
> >>
> >> Now I know how to set MCA options by env var or config file. How can I
> >> make this the default, so that a user can then override it?
> >> > >> Brock Palen > >> www.umich.edu/~brockp > >> CAEN Advanced Computing > >> XSEDE Campus Champion > >> bro...@umich.edu > >> (734)936-1985 > >> > >> > >> > >> On Jun 20, 2014, at 1:21 PM, Ralph Castain wrote: > >> > >>> I think I begin to grok at least part of the problem. If you are > assigning different cpus on each node, then you'll need to tell us that by > setting --hetero-nodes otherwise we won't have any way to report that back > to mpirun for its binding calculation. > >>> > >>> Otherwise, we expect that the cpuset of the first node we launch a > daemon onto (or where mpirun is executing, if we are only launching local > to mpirun) accurately represents the cpuset on every node in the allocation. > >>> > >>> We still might well have a bug in our binding computation - but the > above will definitely impact what you said the user did. > >>> > >>> On Jun 20, 2014, at 10:06 AM, Brock Palen wrote: > >>> > Extra data point if I do: > > [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core > hostname > > -- > A request was made to bind to that would result in binding more > processes than cpus on a resource: > > Bind to: CORE > Node:nyx5513 > #processes: 2 > #cpus: 1 > > You can override this protection by adding the "overload-allowed" > option to your binding directive. > > -- > > [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime > 13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, > 12.38 > 13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, > 12.38 > [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind > --get > 0x0010 > 0x1000 > [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513 > nyx5513 > nyx5513 > > Interesting, if I force bind to core, MPI barfs saying there is only > 1 cpu available, PBS says it gave it two, and if I force (this is all > inside an interactive job) just on that node hwloc-bind --get I get what I > expect, > > Is there a way to get a map of what MPI thinks it has on each host? 
> > Brock Palen > www.umich.edu/~brockp > CAEN Advanced Computing > XSEDE Campus Champion > bro...@umich.edu > (734)936-1985 > > > > On Jun 20, 2014, at 12:38 PM, Brock Palen wrote: > > > I was able to produce it in my test. > > > > orted affinity set by cpuset: > >
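Brock's `cat $PBS_NODEFILE | grep nyx5513` check generalizes to counting scheduler-granted slots per host, which can then be compared against what mpirun ends up binding. A minimal sketch (the nodefile contents below are made up for illustration; in a real job you would read `$PBS_NODEFILE` itself):

```shell
# Count slots per host as the scheduler granted them: each repetition of a
# hostname in a PBS nodefile is one slot.
cat > nodefile.example <<'EOF'
nyx5513
nyx5513
nyx5508
EOF
sort nodefile.example | uniq -c
# Prints one line per host, prefixed with its slot count (nyx5513 shows 2).
```

If mpirun's --report-bindings output disagrees with these counts, that discrepancy is exactly the kind of evidence worth posting to the list.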
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
Let's say that the downside is an unknown at this time. The only real impact of setting that param is that each daemon now reports its topology at startup; without the param, only the daemon on the first node does so. The concern expressed when we first added that report was that the volume of data being sent on a very large system might impact launch time. However, the amount of data from each node isn't very much, so we don't know whether there really would be a downside, or how significant it might be.

Sadly, we haven't had access to machines of any real size to test this, so that we would have real numbers for the decision. Absent that data, we took the conservative approach of setting the default so as to preserve the pre-existing behavior.

So everyone out there: please consider this an appeal for data. If you are interested and willing, just send me (or the list - your option) any data you are willing to share regarding launch time with and without the --hetero-nodes option. A simple "time mpirun --map-by ppr:1:node /bin/true" (or equivalent) run at various numbers of nodes would suffice.

On Mon, Jun 23, 2014 at 3:17 PM, Maxime Boissonneault <maxime.boissonnea...@calculquebec.ca> wrote:

> Hi,
> I've been following this thread because it may be relevant to our setup.
>
> Is there a drawback of having orte_hetero_nodes=1 as a default MCA
> parameter? Is there a reason why the most generic case is not assumed?
>
> Maxime Boissonneault
>
> On 2014-06-20 13:48, Ralph Castain wrote:
>
>> Put "orte_hetero_nodes=1" in your default MCA param file - users can
>> override by setting that param to 0
>>
>> On Jun 20, 2014, at 10:30 AM, Brock Palen wrote:
>>
>>> Perfection! That appears to do it for our standard case.
>>>
>>> Now I know how to set MCA options by env var or config file. How can I
>>> make this the default, so that a user can then override it?
>>> >>> Brock Palen >>> www.umich.edu/~brockp >>> CAEN Advanced Computing >>> XSEDE Campus Champion >>> bro...@umich.edu >>> (734)936-1985 >>> >>> >>> >>> On Jun 20, 2014, at 1:21 PM, Ralph Castain wrote: >>> >>> I think I begin to grok at least part of the problem. If you are assigning different cpus on each node, then you'll need to tell us that by setting --hetero-nodes otherwise we won't have any way to report that back to mpirun for its binding calculation. Otherwise, we expect that the cpuset of the first node we launch a daemon onto (or where mpirun is executing, if we are only launching local to mpirun) accurately represents the cpuset on every node in the allocation. We still might well have a bug in our binding computation - but the above will definitely impact what you said the user did. On Jun 20, 2014, at 10:06 AM, Brock Palen wrote: Extra data point if I do: > > [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core > hostname > > -- > A request was made to bind to that would result in binding more > processes than cpus on a resource: > > Bind to: CORE > Node:nyx5513 > #processes: 2 > #cpus: 1 > > You can override this protection by adding the "overload-allowed" > option to your binding directive. > > -- > > [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime > 13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, > 12.38 > 13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, > 12.38 > [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind > --get > 0x0010 > 0x1000 > [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513 > nyx5513 > nyx5513 > > Interesting, if I force bind to core, MPI barfs saying there is only 1 > cpu available, PBS says it gave it two, and if I force (this is all inside > an interactive job) just on that node hwloc-bind --get I get what I > expect, > > Is there a way to get a map of what MPI thinks it has on each host? 
> > Brock Palen > www.umich.edu/~brockp > CAEN Advanced Computing > XSEDE Campus Champion > bro...@umich.edu > (734)936-1985 > > > > On Jun 20, 2014, at 12:38 PM, Brock Palen wrote: > > I was able to produce it in my test. >> >> orted affinity set by cpuset: >> [root@nyx5874 ~]# hwloc-bind --get --pid 103645 >> 0xc002 >> >> This mask (1, 14,15) which is across sockets, matches the cpu set >> setup by the batch system. >> [root@nyx5874 ~]# cat /dev/cpuset/torque/12719806. >> nyx.engin.umich.edu/cpus >> 1,14-15 >> >> The ranks though were then all set to the same core: >> >> [root@nyx5874 ~]# hwloc-bind --get --pid 103871 >> 0x8000 >> [root
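Ralph's request is easy to script. A sketch of a helper that collects the comparison he asks for (the `--hetero-nodes` spelling is taken from his note; run the generated script inside allocations of varying size and post the timings):

```shell
# Write a small benchmark script: time a one-process-per-node launch with and
# without --hetero-nodes, so the two can be compared at several node counts.
cat > hetero_timing.sh <<'EOF'
#!/bin/sh
for opt in "" "--hetero-nodes"; do
  echo "== mpirun $opt --map-by ppr:1:node /bin/true =="
  time mpirun $opt --map-by ppr:1:node /bin/true
done
EOF
chmod +x hetero_timing.sh
```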
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
Hi,
I've been following this thread because it may be relevant to our setup.

Is there a drawback of having orte_hetero_nodes=1 as a default MCA parameter? Is there a reason why the most generic case is not assumed?

Maxime Boissonneault

On 2014-06-20 13:48, Ralph Castain wrote:

Put "orte_hetero_nodes=1" in your default MCA param file - users can override by setting that param to 0

On Jun 20, 2014, at 10:30 AM, Brock Palen wrote:

Perfection! That appears to do it for our standard case.

Now I know how to set MCA options by env var or config file. How can I make this the default, so that a user can then override it?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

On Jun 20, 2014, at 1:21 PM, Ralph Castain wrote:

I think I begin to grok at least part of the problem. If you are assigning different cpus on each node, then you'll need to tell us that by setting --hetero-nodes; otherwise we won't have any way to report that back to mpirun for its binding calculation.

Otherwise, we expect that the cpuset of the first node we launch a daemon onto (or where mpirun is executing, if we are only launching local to mpirun) accurately represents the cpuset on every node in the allocation.

We still might well have a bug in our binding computation - but the above will definitely impact what you said the user did.

On Jun 20, 2014, at 10:06 AM, Brock Palen wrote:

Extra data point if I do:

[brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname
--
A request was made to bind to that would result in binding more processes than cpus on a resource:

Bind to: CORE
Node: nyx5513
#processes: 2
#cpus: 1

You can override this protection by adding the "overload-allowed" option to your binding directive.
--

[brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime
13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, 12.38
13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, 12.38
[brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get
0x0010
0x1000
[brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513
nyx5513
nyx5513

Interesting: if I force bind to core, MPI barfs saying there is only 1 cpu available, PBS says it gave it two, and if I run hwloc-bind --get (this is all inside an interactive job) just on that node, I get what I expect.

Is there a way to get a map of what MPI thinks it has on each host?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

On Jun 20, 2014, at 12:38 PM, Brock Palen wrote:

I was able to produce it in my test.

orted affinity set by cpuset:
[root@nyx5874 ~]# hwloc-bind --get --pid 103645
0xc002

This mask (cpus 1, 14, and 15), which spans sockets, matches the cpu set set up by the batch system:
[root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus
1,14-15

The ranks, though, were then all set to the same core:

[root@nyx5874 ~]# hwloc-bind --get --pid 103871
0x8000
[root@nyx5874 ~]# hwloc-bind --get --pid 103872
0x8000
[root@nyx5874 ~]# hwloc-bind --get --pid 103873
0x8000

Which is core 15. report-bindings gave me the following; you can see how a few nodes were bound to all the same core, the last one in each case. I only gave you the results for the host nyx5874.
[nyx5526.engin.umich.edu:23726] MCW rank 0 is not bound (or bound to all available processors)
[nyx5878.engin.umich.edu:103925] MCW rank 8 is not bound (or bound to all available processors)
[nyx5533.engin.umich.edu:123988] MCW rank 1 is not bound (or bound to all available processors)
[nyx5879.engin.umich.edu:102808] MCW rank 9 is not bound (or bound to all available processors)
[nyx5874.engin.umich.edu:103645] MCW rank 41 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B]
[nyx5874.engin.umich.edu:103645] MCW rank 42 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B]
[nyx5874.engin.umich.edu:103645] MCW rank 43 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B]
[nyx5888.engin.umich.edu:117400] MCW rank 11 is not bound (or bound to all available processors)
[nyx5786.engin.umich.edu:30004] MCW rank 19 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B]
[nyx5786.engin.umich.edu:30004] MCW rank 18 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B]
[nyx5594.engin.umich.edu:33884] MCW rank 24 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B]
[nyx5594.engin.umich.edu:33884] MCW rank 25 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B]
[nyx5594.engin.umich.edu:33884] MCW rank 26 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B]
[nyx5798.engin.umich.edu:53026] MCW rank 59
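The hex masks that hwloc-bind prints (0xc002, 0x8000, and so on) are plain cpu bitmasks, so they can be decoded without hwloc. A quick sketch using shell arithmetic (the helper name is just for illustration):

```shell
# Decode a hwloc-style cpu bitmask into cpu indices: bit i set means cpu i.
# 0xc002 has bits 1, 14, and 15 set -- matching the Torque cpuset 1,14-15.
decode_mask() {
  mask=$(( $1 ))
  cpus=""
  i=0
  while [ "$i" -lt 64 ]; do
    [ $(( (mask >> i) & 1 )) -eq 1 ] && cpus="$cpus $i"
    i=$(( i + 1 ))
  done
  echo "cpus:$cpus"
}
decode_mask 0xc002   # -> cpus: 1 14 15
decode_mask 0x8000   # -> cpus: 15 (every rank pinned to core 15)
```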
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
Perfection: flexible, extensible, so nice.

BTW this doesn't happen in older versions:

[brockp@flux-login2 34241]$ ompi_info --param all all
Error getting SCIF driver version
MCA btl: parameter "btl_tcp_if_include" (current value: "",
         data source: default, level: 1 user/basic, type: string)
         Comma-delimited list of devices and/or CIDR notation of networks to
         use for MPI communication (e.g., "eth0,192.168.0.0/16"). Mutually
         exclusive with btl_tcp_if_exclude.
MCA btl: parameter "btl_tcp_if_exclude" (current value: "127.0.0.1/8,sppp",
         data source: default, level: 1 user/basic, type: string)
         Comma-delimited list of devices and/or CIDR notation of networks to
         NOT use for MPI communication -- all devices not matching these
         specifications will be used (e.g., "eth0,192.168.0.0/16"). If set to
         a non-default value, it is mutually exclusive with
         btl_tcp_if_include.

This is normally much longer. And yes, we don't have the PHI stuff installed on all nodes; strange that 'all all' is now very short. ompi_info -a still works, though.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

On Jun 20, 2014, at 1:48 PM, Ralph Castain wrote:

> Put "orte_hetero_nodes=1" in your default MCA param file - users can override
> by setting that param to 0
>
> On Jun 20, 2014, at 10:30 AM, Brock Palen wrote:
>
>> Perfection! That appears to do it for our standard case.
>>
>> Now I know how to set MCA options by env var or config file. How can I make
>> this the default, so that a user can then override it?
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>>
>> On Jun 20, 2014, at 1:21 PM, Ralph Castain wrote:
>>
>>> I think I begin to grok at least part of the problem.
If you are assigning >>> different cpus on each node, then you'll need to tell us that by setting >>> --hetero-nodes otherwise we won't have any way to report that back to >>> mpirun for its binding calculation. >>> >>> Otherwise, we expect that the cpuset of the first node we launch a daemon >>> onto (or where mpirun is executing, if we are only launching local to >>> mpirun) accurately represents the cpuset on every node in the allocation. >>> >>> We still might well have a bug in our binding computation - but the above >>> will definitely impact what you said the user did. >>> >>> On Jun 20, 2014, at 10:06 AM, Brock Palen wrote: >>> Extra data point if I do: [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname -- A request was made to bind to that would result in binding more processes than cpus on a resource: Bind to: CORE Node:nyx5513 #processes: 2 #cpus: 1 You can override this protection by adding the "overload-allowed" option to your binding directive. -- [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime 13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, 12.38 13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, 12.38 [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get 0x0010 0x1000 [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513 nyx5513 nyx5513 Interesting, if I force bind to core, MPI barfs saying there is only 1 cpu available, PBS says it gave it two, and if I force (this is all inside an interactive job) just on that node hwloc-bind --get I get what I expect, Is there a way to get a map of what MPI thinks it has on each host? Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Jun 20, 2014, at 12:38 PM, Brock Palen wrote: > I was able to produce it in my test. 
> > orted affinity set by cpuset: > [root@nyx5874 ~]# hwloc-bind --get --pid 103645 > 0xc002 > > This mask (1, 14,15) which is across sockets, matches the cpu set setup > by the batch system. > [root@nyx5874 ~]# cat > /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus > 1,14-15 > > The ranks though were then all set to the same core: > > [root@nyx5874 ~]# hwloc-bind --get --pid 103871 > 0x000
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
Put "orte_hetero_nodes=1" in your default MCA param file - users can override by setting that param to 0.

On Jun 20, 2014, at 10:30 AM, Brock Palen wrote:

> Perfection! That appears to do it for our standard case.
>
> Now I know how to set MCA options by env var or config file. How can I make
> this the default, so that a user can then override it?
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
>
> On Jun 20, 2014, at 1:21 PM, Ralph Castain wrote:
>
>> I think I begin to grok at least part of the problem. If you are assigning
>> different cpus on each node, then you'll need to tell us that by setting
>> --hetero-nodes; otherwise we won't have any way to report that back to
>> mpirun for its binding calculation.
>>
>> Otherwise, we expect that the cpuset of the first node we launch a daemon
>> onto (or where mpirun is executing, if we are only launching local to
>> mpirun) accurately represents the cpuset on every node in the allocation.
>>
>> We still might well have a bug in our binding computation - but the above
>> will definitely impact what you said the user did.
>>
>> On Jun 20, 2014, at 10:06 AM, Brock Palen wrote:
>>
>>> Extra data point if I do:
>>>
>>> [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname
>>> --
>>> A request was made to bind to that would result in binding more
>>> processes than cpus on a resource:
>>>
>>> Bind to: CORE
>>> Node: nyx5513
>>> #processes: 2
>>> #cpus: 1
>>>
>>> You can override this protection by adding the "overload-allowed"
>>> option to your binding directive.
>>> -- >>> >>> [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime >>> 13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, 12.38 >>> 13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, 12.38 >>> [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get >>> 0x0010 >>> 0x1000 >>> [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513 >>> nyx5513 >>> nyx5513 >>> >>> Interesting, if I force bind to core, MPI barfs saying there is only 1 cpu >>> available, PBS says it gave it two, and if I force (this is all inside an >>> interactive job) just on that node hwloc-bind --get I get what I expect, >>> >>> Is there a way to get a map of what MPI thinks it has on each host? >>> >>> Brock Palen >>> www.umich.edu/~brockp >>> CAEN Advanced Computing >>> XSEDE Campus Champion >>> bro...@umich.edu >>> (734)936-1985 >>> >>> >>> >>> On Jun 20, 2014, at 12:38 PM, Brock Palen wrote: >>> I was able to produce it in my test. orted affinity set by cpuset: [root@nyx5874 ~]# hwloc-bind --get --pid 103645 0xc002 This mask (1, 14,15) which is across sockets, matches the cpu set setup by the batch system. [root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus 1,14-15 The ranks though were then all set to the same core: [root@nyx5874 ~]# hwloc-bind --get --pid 103871 0x8000 [root@nyx5874 ~]# hwloc-bind --get --pid 103872 0x8000 [root@nyx5874 ~]# hwloc-bind --get --pid 103873 0x8000 Which is core 15: report-bindings gave me: You can see how a few nodes were bound to all the same core, the last one in each case. I only gave you the results for the hose nyx5874. 
[nyx5526.engin.umich.edu:23726] MCW rank 0 is not bound (or bound to all available processors) [nyx5878.engin.umich.edu:103925] MCW rank 8 is not bound (or bound to all available processors) [nyx5533.engin.umich.edu:123988] MCW rank 1 is not bound (or bound to all available processors) [nyx5879.engin.umich.edu:102808] MCW rank 9 is not bound (or bound to all available processors) [nyx5874.engin.umich.edu:103645] MCW rank 41 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5874.engin.umich.edu:103645] MCW rank 42 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5874.engin.umich.edu:103645] MCW rank 43 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5888.engin.umich.edu:117400] MCW rank 11 is not bound (or bound to all available processors) [nyx5786.engin.umich.edu:30004] MCW rank 19 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5786.engin.umich.edu:30004] MCW rank 18 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5594.engin.umich.edu:33884] MCW rank 24 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5594.engin.umich.edu:33884] MCW rank 25 bound to socket 1[core 15[hwt >>
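The default-plus-override scheme Ralph describes can be sketched as follows. The real params file lives in the install tree (normally `$prefix/etc/openmpi-mca-params.conf`); a local file stands in for it here:

```shell
# Site-wide default: one "name = value" line in the system params file.
PARAMS=./openmpi-mca-params.conf   # stand-in for $prefix/etc/openmpi-mca-params.conf
echo "orte_hetero_nodes = 1" >> "$PARAMS"

# Users override per-job through the environment (OMPI_MCA_<name> takes
# precedence over the file) ...
export OMPI_MCA_orte_hetero_nodes=0
# ... or per-invocation on the command line:
#   mpirun --mca orte_hetero_nodes 0 ...
```

This ordering (command line beats environment beats file) is what makes the file a true default rather than a mandate.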
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
Perfection! That appears to do it for our standard case.

Now I know how to set MCA options by env var or config file. How can I make this the default, so that a user can then override it?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

On Jun 20, 2014, at 1:21 PM, Ralph Castain wrote:

> I think I begin to grok at least part of the problem. If you are assigning
> different cpus on each node, then you'll need to tell us that by setting
> --hetero-nodes; otherwise we won't have any way to report that back to mpirun
> for its binding calculation.
>
> Otherwise, we expect that the cpuset of the first node we launch a daemon
> onto (or where mpirun is executing, if we are only launching local to mpirun)
> accurately represents the cpuset on every node in the allocation.
>
> We still might well have a bug in our binding computation - but the above
> will definitely impact what you said the user did.
>
> On Jun 20, 2014, at 10:06 AM, Brock Palen wrote:
>
>> Extra data point if I do:
>>
>> [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname
>> --
>> A request was made to bind to that would result in binding more
>> processes than cpus on a resource:
>>
>> Bind to: CORE
>> Node: nyx5513
>> #processes: 2
>> #cpus: 1
>>
>> You can override this protection by adding the "overload-allowed"
>> option to your binding directive.
>> -- >> >> [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime >> 13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, 12.38 >> 13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, 12.38 >> [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get >> 0x0010 >> 0x1000 >> [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513 >> nyx5513 >> nyx5513 >> >> Interesting, if I force bind to core, MPI barfs saying there is only 1 cpu >> available, PBS says it gave it two, and if I force (this is all inside an >> interactive job) just on that node hwloc-bind --get I get what I expect, >> >> Is there a way to get a map of what MPI thinks it has on each host? >> >> Brock Palen >> www.umich.edu/~brockp >> CAEN Advanced Computing >> XSEDE Campus Champion >> bro...@umich.edu >> (734)936-1985 >> >> >> >> On Jun 20, 2014, at 12:38 PM, Brock Palen wrote: >> >>> I was able to produce it in my test. >>> >>> orted affinity set by cpuset: >>> [root@nyx5874 ~]# hwloc-bind --get --pid 103645 >>> 0xc002 >>> >>> This mask (1, 14,15) which is across sockets, matches the cpu set setup by >>> the batch system. >>> [root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus >>> 1,14-15 >>> >>> The ranks though were then all set to the same core: >>> >>> [root@nyx5874 ~]# hwloc-bind --get --pid 103871 >>> 0x8000 >>> [root@nyx5874 ~]# hwloc-bind --get --pid 103872 >>> 0x8000 >>> [root@nyx5874 ~]# hwloc-bind --get --pid 103873 >>> 0x8000 >>> >>> Which is core 15: >>> >>> report-bindings gave me: >>> You can see how a few nodes were bound to all the same core, the last one >>> in each case. I only gave you the results for the hose nyx5874. 
>>> >>> [nyx5526.engin.umich.edu:23726] MCW rank 0 is not bound (or bound to all >>> available processors) >>> [nyx5878.engin.umich.edu:103925] MCW rank 8 is not bound (or bound to all >>> available processors) >>> [nyx5533.engin.umich.edu:123988] MCW rank 1 is not bound (or bound to all >>> available processors) >>> [nyx5879.engin.umich.edu:102808] MCW rank 9 is not bound (or bound to all >>> available processors) >>> [nyx5874.engin.umich.edu:103645] MCW rank 41 bound to socket 1[core 15[hwt >>> 0]]: [./././././././.][./././././././B] >>> [nyx5874.engin.umich.edu:103645] MCW rank 42 bound to socket 1[core 15[hwt >>> 0]]: [./././././././.][./././././././B] >>> [nyx5874.engin.umich.edu:103645] MCW rank 43 bound to socket 1[core 15[hwt >>> 0]]: [./././././././.][./././././././B] >>> [nyx5888.engin.umich.edu:117400] MCW rank 11 is not bound (or bound to all >>> available processors) >>> [nyx5786.engin.umich.edu:30004] MCW rank 19 bound to socket 1[core 15[hwt >>> 0]]: [./././././././.][./././././././B] >>> [nyx5786.engin.umich.edu:30004] MCW rank 18 bound to socket 1[core 15[hwt >>> 0]]: [./././././././.][./././././././B] >>> [nyx5594.engin.umich.edu:33884] MCW rank 24 bound to socket 1[core 15[hwt >>> 0]]: [./././././././.][./././././././B] >>> [nyx5594.engin.umich.edu:33884] MCW rank 25 bound to socket 1[core 15[hwt >>> 0]]: [./././././././.][./././././././B] >>> [nyx5594.engin.umich.edu:33884] MCW rank 26 bound to socket 1[core 15[hwt >>> 0]]: [./././././././.][./././././././B] >>> [nyx5798.engin.umich.edu:53026] MCW rank 59 bound to socket 1[core 15[hwt >>> 0]]: [./././././././.][./././././././B] >>>
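Open MPI resolves MCA parameters in a fixed precedence order: command line beats environment variable beats the user's parameter file beats the system-wide file. That ordering gives exactly the behavior asked about above, a site default that any user can override. A minimal sketch; the parameter shown is only an example, and the paths assume a standard prefix install:

```shell
# Site-wide default (set by admins) in $prefix/etc/openmpi-mca-params.conf:
#   hwloc_base_binding_policy = core
#
# A user can override that in ~/.openmpi/mca-params.conf, or per-shell:
export OMPI_MCA_hwloc_base_binding_policy=socket
# or per-run on the command line, which takes precedence over everything else:
mpirun --mca hwloc_base_binding_policy socket -np 4 ./a.out
```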
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
I think I begin to grok at least part of the problem. If you are assigning different cpus on each node, then you'll need to tell us that by setting --hetero-nodes otherwise we won't have any way to report that back to mpirun for its binding calculation. Otherwise, we expect that the cpuset of the first node we launch a daemon onto (or where mpirun is executing, if we are only launching local to mpirun) accurately represents the cpuset on every node in the allocation. We still might well have a bug in our binding computation - but the above will definitely impact what you said the user did.
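Ralph's suggestion above, written out as a command sketch. --hetero-nodes is a real mpirun option in the 1.8 series; the rank count and application name are placeholders:

```shell
# Tell mpirun that nodes in the allocation may expose different cpusets or
# topologies, so it collects topology info from every node instead of
# assuming the first node it touches is representative
mpirun --hetero-nodes --report-bindings -np 64 ./app
```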
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
Extra data point if I do: [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname -- A request was made to bind to that would result in binding more processes than cpus on a resource: Bind to: CORE Node: nyx5513 #processes: 2 #cpus: 1 You can override this protection by adding the "overload-allowed" option to your binding directive. -- [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime 13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, 12.38 13:01:37 up 31 days, 23:06, 0 users, load average: 10.13, 10.90, 12.38 [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get 0x0010 0x1000 [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513 nyx5513 nyx5513 Interesting: if I force bind-to core, MPI barfs saying there is only 1 cpu available, while PBS says it gave the job two; and if I run hwloc-bind --get directly on that node (this is all inside an interactive job), I get what I expect. Is there a way to get a map of what MPI thinks it has on each host? Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985
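On the question of getting a map of what MPI thinks it has on each host: mpirun can print its own view before launching anything. A sketch; the host name comes from the transcript above, and both display flags are standard mpirun options in this series:

```shell
# Show the allocation as mpirun parsed it (e.g., from the PBS/TM nodefile)
mpirun --display-allocation -H nyx5513 hostname
# Show the computed rank-to-node map together with the actual bindings
mpirun --display-map --report-bindings -H nyx5513 hostname
```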
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
I was able to produce it in my test. orted affinity set by cpuset: [root@nyx5874 ~]# hwloc-bind --get --pid 103645 0xc002 This mask (1, 14, 15), which spans sockets, matches the cpu set setup by the batch system. [root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus 1,14-15 The ranks though were then all set to the same core: [root@nyx5874 ~]# hwloc-bind --get --pid 103871 0x8000 [root@nyx5874 ~]# hwloc-bind --get --pid 103872 0x8000 [root@nyx5874 ~]# hwloc-bind --get --pid 103873 0x8000 Which is core 15: report-bindings gave me: You can see how a few nodes were bound to all the same core, the last one in each case. I only gave you the results for the host nyx5874. [nyx5526.engin.umich.edu:23726] MCW rank 0 is not bound (or bound to all available processors) [nyx5878.engin.umich.edu:103925] MCW rank 8 is not bound (or bound to all available processors) [nyx5533.engin.umich.edu:123988] MCW rank 1 is not bound (or bound to all available processors) [nyx5879.engin.umich.edu:102808] MCW rank 9 is not bound (or bound to all available processors) [nyx5874.engin.umich.edu:103645] MCW rank 41 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5874.engin.umich.edu:103645] MCW rank 42 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5874.engin.umich.edu:103645] MCW rank 43 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5888.engin.umich.edu:117400] MCW rank 11 is not bound (or bound to all available processors) [nyx5786.engin.umich.edu:30004] MCW rank 19 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5786.engin.umich.edu:30004] MCW rank 18 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5594.engin.umich.edu:33884] MCW rank 24 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5594.engin.umich.edu:33884] MCW rank 25 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B]
[nyx5594.engin.umich.edu:33884] MCW rank 26 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5798.engin.umich.edu:53026] MCW rank 59 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5798.engin.umich.edu:53026] MCW rank 60 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5798.engin.umich.edu:53026] MCW rank 56 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5798.engin.umich.edu:53026] MCW rank 57 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5798.engin.umich.edu:53026] MCW rank 58 bound to socket 1[core 15[hwt 0]]: [./././././././.][./././././././B] [nyx5545.engin.umich.edu:88170] MCW rank 2 is not bound (or bound to all available processors) [nyx5613.engin.umich.edu:25229] MCW rank 31 is not bound (or bound to all available processors) [nyx5880.engin.umich.edu:01406] MCW rank 10 is not bound (or bound to all available processors) [nyx5770.engin.umich.edu:86538] MCW rank 6 is not bound (or bound to all available processors) [nyx5613.engin.umich.edu:25228] MCW rank 30 is not bound (or bound to all available processors) [nyx5577.engin.umich.edu:65949] MCW rank 4 is not bound (or bound to all available processors) [nyx5607.engin.umich.edu:30379] MCW rank 14 is not bound (or bound to all available processors) [nyx5544.engin.umich.edu:72960] MCW rank 47 is not bound (or bound to all available processors) [nyx5544.engin.umich.edu:72959] MCW rank 46 is not bound (or bound to all available processors) [nyx5848.engin.umich.edu:04332] MCW rank 33 is not bound (or bound to all available processors) [nyx5848.engin.umich.edu:04333] MCW rank 34 is not bound (or bound to all available processors) [nyx5544.engin.umich.edu:72958] MCW rank 45 is not bound (or bound to all available processors) [nyx5858.engin.umich.edu:12165] MCW rank 35 is not bound (or bound to all available processors) [nyx5607.engin.umich.edu:30380] MCW rank 15 is not bound (or bound to 
all available processors) [nyx5544.engin.umich.edu:72957] MCW rank 44 is not bound (or bound to all available processors) [nyx5858.engin.umich.edu:12167] MCW rank 37 is not bound (or bound to all available processors) [nyx5870.engin.umich.edu:33811] MCW rank 7 is not bound (or bound to all available processors) [nyx5582.engin.umich.edu:81994] MCW rank 5 is not bound (or bound to all available processors) [nyx5848.engin.umich.edu:04331] MCW rank 32 is not bound (or bound to all available processors) [nyx5557.engin.umich.edu:46654] MCW rank 50 is not bound (or bound to all available processors) [nyx5858.engin.umich.edu:12166] MCW rank 36 is not bound (or bound to all available processors) [nyx5799.engin.umich.edu:67802] MCW rank 22 is not bound (or bound to all available processors) [nyx5799.engin.umich.edu:67803] MCW rank 23 is not bound (or bound to all available processors) [nyx5556.engin.umich.edu:50889] MCW rank 3 is not bound (or bound
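The hex values hwloc-bind prints are per-PU bitmasks, so the claims above (0xc002 covers PUs 1, 14, and 15; 0x8000 is PU 15 alone) can be checked with a few lines of Python. This is only an illustration of how the masks decode, not part of hwloc:

```python
def mask_to_pus(mask: int) -> list[int]:
    """Return the PU (logical processor) indices set in a hwloc bitmask."""
    pus = []
    bit = 0
    while mask:
        if mask & 1:
            pus.append(bit)
        mask >>= 1
        bit += 1
    return pus

print(mask_to_pus(0xC002))  # orted's cpuset mask  -> [1, 14, 15]
print(mask_to_pus(0x8000))  # each of the 3 ranks  -> [15]
```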
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
Got it, I have the input from the user and am testing it out. It probably has less to do with Torque and more with cpusets; I'm working on reproducing it myself as well. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Jun 20, 2014, at 12:18 PM, Ralph Castain wrote: > Thanks - I'm just trying to reproduce one problem case so I can look at it. > Given that I don't have access to a Torque machine, I need to "fake" it.
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
Thanks - I'm just trying to reproduce one problem case so I can look at it. Given that I don't have access to a Torque machine, I need to "fake" it. On Jun 20, 2014, at 9:15 AM, Brock Palen wrote: > In this case they are on a single socket, but as you can see they could be > either/or depending on the job.
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
In this case they are on a single socket, but as you can see they could be either/or depending on the job. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Jun 19, 2014, at 2:44 PM, Ralph Castain wrote: > Sorry, I should have been clearer - I was asking if cores 8-11 are all on one > socket, or span multiple sockets
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
Sorry, I should have been clearer - I was asking if cores 8-11 are all on one socket, or span multiple sockets On Jun 19, 2014, at 11:36 AM, Brock Palen wrote: > Ralph, > > It was a large job spread across. Our system allows users to ask for 'procs' > which are laid out in any format. > > The list: > >> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3] >> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11] >> [nyx5409:11][nyx5411:11][nyx5412:3] > > Shows that nyx5406 had 2 cores, nyx5427 also 2, nyx5411 had 11. > > They could be spread across any number of sockets configuration. We start > very lax "user requests X procs" and then the user can request more strict > requirements from there. We support mostly serial users, and users can > colocate on nodes. > > That is good to know, I think we would want to turn our default to 'bind to > core' except for our few users who use hybrid mode. > > Our CPU set tells you what cores the job is assigned. So in the problem case > provided, the cpuset/cgroup shows only cores 8-11 are available to this job > on this node. > > Brock Palen > www.umich.edu/~brockp > CAEN Advanced Computing > XSEDE Campus Champion > bro...@umich.edu > (734)936-1985 > > > > On Jun 18, 2014, at 11:10 PM, Ralph Castain wrote: > >> The default binding option depends on the number of procs - it is bind-to >> core for np=2, and bind-to socket for np > 2. You never said, but should I >> assume you ran 4 ranks? If so, then we should be trying to bind-to socket. >> >> I'm not sure what your cpuset is telling us - are you binding us to a >> socket? Are some cpus in one socket, and some in another? >> >> It could be that the cpuset + bind-to socket is resulting in some odd >> behavior, but I'd need a little more info to narrow it down. 
>> >> On Jun 18, 2014, at 7:48 PM, Brock Palen wrote: >> >>> I have started using 1.8.1 for some codes (meep in this case) and it >>> sometimes works fine, but in a few cases I am seeing ranks being given >>> overlapping CPU assignments, not always though. >>> >>> Example job, default binding options (so by-core right?): >>> >>> Assigned nodes, the one in question is nyx5398, we use torque CPU sets, and >>> use TM to spawn. >>> >>> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3] >>> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11] >>> [nyx5409:11][nyx5411:11][nyx5412:3] >>> >>> [root@nyx5398 ~]# hwloc-bind --get --pid 16065 >>> 0x0200 >>> [root@nyx5398 ~]# hwloc-bind --get --pid 16066 >>> 0x0800 >>> [root@nyx5398 ~]# hwloc-bind --get --pid 16067 >>> 0x0200 >>> [root@nyx5398 ~]# hwloc-bind --get --pid 16068 >>> 0x0800 >>> >>> [root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus >>> 8-11 >>> >>> So torque claims the CPU set setup for the job has 4 cores, but as you can >>> see the ranks were given identical binding. >>> >>> I checked the pids they were part of the correct CPU set, I also checked, >>> orted: >>> >>> [root@nyx5398 ~]# hwloc-bind --get --pid 16064 >>> 0x0f00 >>> [root@nyx5398 ~]# hwloc-calc --intersect PU 16064 >>> ignored unrecognized argument 16064 >>> >>> [root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00 >>> 8,9,10,11 >>> >>> Which is exactly what I would expect. >>> >>> So ummm, I'm lost why this might happen? What else should I check? Like I >>> said not all jobs show this behavior. 
>>> >>> Brock Palen >>> www.umich.edu/~brockp >>> CAEN Advanced Computing >>> XSEDE Campus Champion >>> bro...@umich.edu >>> (734)936-1985 >>> >>> >>> >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>> Link to this post: >>> http://www.open-mpi.org/community/lists/users/2014/06/24672.php >> >> ___ >> users mailing list >> us...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >> Link to this post: >> http://www.open-mpi.org/community/lists/users/2014/06/24673.php > > ___ > users mailing list > us...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users > Link to this post: > http://www.open-mpi.org/community/lists/users/2014/06/24675.php
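As an aside for anyone reading the masks above: the hex values reported by `hwloc-bind --get` are bitmasks of logical PUs, which is why `hwloc-calc --intersect PU 0x0f00` prints 8,9,10,11. The same decoding can be done by hand; here is a minimal Python sketch (not part of hwloc; the function name is illustrative):

```python
def mask_to_pus(mask):
    """Return the list of PU (logical CPU) indices set in a
    hwloc-style hex cpumask string such as '0x0f00'."""
    value = int(mask, 16)
    pus = []
    bit = 0
    while value:
        if value & 1:       # bit N set => PU N is in the binding
            pus.append(bit)
        value >>= 1
        bit += 1
    return pus

# orted's mask from the report above covers the whole cpuset:
print(mask_to_pus("0x0f00"))  # [8, 9, 10, 11]
# the two masks the ranks actually got:
print(mask_to_pus("0x0200"))  # [9]
print(mask_to_pus("0x0800"))  # [11]
```

Decoding the four rank masks this way makes the problem obvious: only PUs 9 and 11 are ever used, even though the cpuset offers 8-11.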
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
Ralph,

It was a large job spread across nodes. Our system allows users to ask for 'procs', which are laid out in any format.

The list:

> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
> [nyx5409:11][nyx5411:11][nyx5412:3]

shows that nyx5406 had 2 cores, nyx5427 also 2, and nyx5411 had 11.

They could be spread across any number of socket configurations. We start very lax ("user requests X procs") and the user can then request stricter requirements from there. We support mostly serial users, and users can colocate on nodes.

That is good to know. I think we would want to turn our default to 'bind to core', except for our few users who use hybrid mode.

Our CPU set tells you what cores the job is assigned. So in the problem case provided, the cpuset/cgroup shows only cores 8-11 are available to this job on this node.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985

On Jun 18, 2014, at 11:10 PM, Ralph Castain wrote:

> The default binding option depends on the number of procs - it is bind-to core for np=2, and bind-to socket for np > 2. You never said, but should I assume you ran 4 ranks? If so, then we should be trying to bind-to socket.
>
> I'm not sure what your cpuset is telling us - are you binding us to a socket? Are some cpus in one socket, and some in another?
>
> It could be that the cpuset + bind-to socket is resulting in some odd behavior, but I'd need a little more info to narrow it down.
>
> On Jun 18, 2014, at 7:48 PM, Brock Palen wrote:
>
>> I have started using 1.8.1 for some codes (meep in this case) and it sometimes works fine, but in a few cases I am seeing ranks being given overlapping CPU assignments - not always, though.
>>
>> Example job, default binding options (so by-core, right?):
>>
>> Assigned nodes; the one in question is nyx5398. We use torque CPU sets, and use TM to spawn.
>>
>> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
>> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
>> [nyx5409:11][nyx5411:11][nyx5412:3]
>>
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16065
>> 0x0200
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16066
>> 0x0800
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16067
>> 0x0200
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16068
>> 0x0800
>>
>> [root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus
>> 8-11
>>
>> So torque claims the CPU set for the job has 4 cores, but as you can see the ranks were given identical bindings.
>>
>> I checked the pids; they were part of the correct CPU set. I also checked orted:
>>
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16064
>> 0x0f00
>> [root@nyx5398 ~]# hwloc-calc --intersect PU 16064
>> ignored unrecognized argument 16064
>>
>> [root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
>> 8,9,10,11
>>
>> Which is exactly what I would expect.
>>
>> So umm, I'm lost why this might happen. What else should I check? Like I said, not all jobs show this behavior.
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
Re: [OMPI users] affinity issues under cpuset torque 1.8.1
The default binding option depends on the number of procs - it is bind-to core for np=2, and bind-to socket for np > 2. You never said, but should I assume you ran 4 ranks? If so, then we should be trying to bind-to socket.

I'm not sure what your cpuset is telling us - are you binding us to a socket? Are some cpus in one socket, and some in another?

It could be that the cpuset + bind-to socket is resulting in some odd behavior, but I'd need a little more info to narrow it down.

On Jun 18, 2014, at 7:48 PM, Brock Palen wrote:

> I have started using 1.8.1 for some codes (meep in this case) and it sometimes works fine, but in a few cases I am seeing ranks being given overlapping CPU assignments - not always, though.
>
> Example job, default binding options (so by-core, right?):
>
> Assigned nodes; the one in question is nyx5398. We use torque CPU sets, and use TM to spawn.
>
> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
> [nyx5409:11][nyx5411:11][nyx5412:3]
>
> [root@nyx5398 ~]# hwloc-bind --get --pid 16065
> 0x0200
> [root@nyx5398 ~]# hwloc-bind --get --pid 16066
> 0x0800
> [root@nyx5398 ~]# hwloc-bind --get --pid 16067
> 0x0200
> [root@nyx5398 ~]# hwloc-bind --get --pid 16068
> 0x0800
>
> [root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus
> 8-11
>
> So torque claims the CPU set for the job has 4 cores, but as you can see the ranks were given identical bindings.
>
> I checked the pids; they were part of the correct CPU set. I also checked orted:
>
> [root@nyx5398 ~]# hwloc-bind --get --pid 16064
> 0x0f00
> [root@nyx5398 ~]# hwloc-calc --intersect PU 16064
> ignored unrecognized argument 16064
>
> [root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
> 8,9,10,11
>
> Which is exactly what I would expect.
>
> So umm, I'm lost why this might happen. What else should I check? Like I said, not all jobs show this behavior.
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
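For reference, with bind-to core and the job's cpuset of cores 8-11, each of the 4 ranks should land on its own core, i.e. four disjoint single-bit masks rather than the repeated 0x0200/0x0800 seen above. A small Python sketch of that expected layout (illustrative only, not how Open MPI computes it internally):

```python
def expected_core_masks(cpuset):
    """Parse a cpuset range string like '8-11' and return one
    single-core hex mask per core, i.e. the disjoint layout that
    bind-to core should produce for one rank per core."""
    lo, hi = (int(x) for x in cpuset.split("-"))
    return [hex(1 << core) for core in range(lo, hi + 1)]

# For the cpuset reported by torque in this thread:
print(expected_core_masks("8-11"))  # ['0x100', '0x200', '0x400', '0x800']
```

Comparing that list with the observed bindings shows cores 8 (0x100) and 10 (0x400) were never assigned, which is the anomaly being reported.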
[OMPI users] affinity issues under cpuset torque 1.8.1
I have started using 1.8.1 for some codes (meep in this case) and it sometimes works fine, but in a few cases I am seeing ranks being given overlapping CPU assignments - not always, though.

Example job, default binding options (so by-core, right?):

Assigned nodes; the one in question is nyx5398. We use torque CPU sets, and use TM to spawn.

[nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
[nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
[nyx5409:11][nyx5411:11][nyx5412:3]

[root@nyx5398 ~]# hwloc-bind --get --pid 16065
0x0200
[root@nyx5398 ~]# hwloc-bind --get --pid 16066
0x0800
[root@nyx5398 ~]# hwloc-bind --get --pid 16067
0x0200
[root@nyx5398 ~]# hwloc-bind --get --pid 16068
0x0800

[root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus
8-11

So torque claims the CPU set for the job has 4 cores, but as you can see the ranks were given identical bindings.

I checked the pids; they were part of the correct CPU set. I also checked orted:

[root@nyx5398 ~]# hwloc-bind --get --pid 16064
0x0f00
[root@nyx5398 ~]# hwloc-calc --intersect PU 16064
ignored unrecognized argument 16064

[root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
8,9,10,11

Which is exactly what I would expect.

So umm, I'm lost why this might happen. What else should I check? Like I said, not all jobs show this behavior.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
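The "overlapping CPU assignments" above can be confirmed mechanically by intersecting the rank masks pairwise; if any pair shares a bit, two ranks are pinned to the same core. A minimal Python sketch of that check (function and variable names are illustrative, not from any Open MPI tool):

```python
def find_overlaps(bindings):
    """Given {pid: hex cpumask string}, return the pairs of pids
    whose bindings share at least one PU (overlapping pins)."""
    pids = sorted(bindings)
    overlaps = []
    for i, a in enumerate(pids):
        for b in pids[i + 1:]:
            # a nonzero AND of the two masks means a shared core
            if int(bindings[a], 16) & int(bindings[b], 16):
                overlaps.append((a, b))
    return overlaps

# The four ranks reported on nyx5398:
ranks = {16065: "0x0200", 16066: "0x0800",
         16067: "0x0200", 16068: "0x0800"}
print(find_overlaps(ranks))  # [(16065, 16067), (16066, 16068)]
```

With correct bind-to core behavior inside the 8-11 cpuset, this check would return an empty list.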