Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Thanks Ralph. I'll check it on next Monday.

Tetsuya

> Should be fixed with r32058
>
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Should be fixed with r32058

On Jun 20, 2014, at 4:13 AM, tmish...@jcity.maeda.co.jp wrote:

> Hi Ralph,
>
> By the way, something is wrong with your latest rmaps_rank_file.c.
> I've got the error below.
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Hi Ralph,

By the way, something is wrong with your latest rmaps_rank_file.c.
I've got the error below. I'm trying to find the problem. But, you
could find it more quickly...

[mishima@manage trial]$ cat rankfile
rank 0=node05 slot=0-1
rank 1=node05 slot=3-4
rank 2=node05 slot=6-7
[mishima@manage trial]$ mpirun -np 3 -rf rankfile -report-bindings demos/myprog
--
Error, invalid syntax in the rankfile (rankfile)
syntax must be the fallowing
rank i=host_i slot=string
Examples of proper syntax include:
    rank 1=host1 slot=1:0,1
    rank 0=host2 slot=0:*
    rank 2=host4 slot=1-2
    rank 3=host3 slot=0:1;1:0-2
--
[manage.cluster:24456] [[20979,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 483
[manage.cluster:24456] [[20979,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 149
[manage.cluster:24456] [[20979,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 287

Regards,
Tetsuya Mishima

> My guess is that the coll/ml component may have problems with binding a
> single process across multiple cores like that - it might be that we'll
> have to have it check for that condition and disqualify itself.
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Hmmm... this is a tough one. It basically comes down to what we mean by
relative locality. Initially, we meant "at what level do these procs share
cpus" - however, coll/ml is using it as "at what level are these procs
commonly bound". Subtle difference, but significant.

Your proposed version implements the second interpretation - even though we
share cpus down to the hwthread level, it correctly reports that we are
only commonly bound to the node.

I'm unclear how the shared memory system (or other areas using that value)
will respond to that change in meaning. Probably requires looking a little
more broadly (just search the ompi layer for anything referencing the
ompi_proc_t locality flag) to ensure everything can handle (or be adjusted
to handle) the revised definition.

If so, then I have no issue with replacing the locality algorithm. Would
also require an RFC as that might impact folks working on branches.

On Jun 19, 2014, at 11:52 PM, Gilles Gouaillardet wrote:

> Here is attached a patch that fixes/works around my issue.
> this is more of a proof of concept, so i did not commit it to the trunk.
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Ralph,

Here is attached a patch that fixes/works around my issue.
this is more of a proof of concept, so i did not commit it to the trunk.

basically:

opal_hwloc_base_get_relative_locality(topo, set1, set2) sets the locality
based on the deepest element that is part of both set1 and set2.
in my case, set2 means "all the available cpus"; that is why the subroutine
will return OPAL_PROC_ON_HWTHREAD.

the patch uses opal_hwloc_base_get_relative_locality2 instead.
if one of the cpusets means "all the available cpus", then the subroutine
will simply return OPAL_PROC_ON_NODE.

i am puzzled whether this is a bug in opal_hwloc_base_get_relative_locality
or in proc.c, which should not call this subroutine because it does not do
what should be expected.

Cheers,

Gilles

On 2014/06/20 13:59, Gilles Gouaillardet wrote:
> my test VM is single socket four cores.
> here is something odd i just found when running mpirun -np 2
> intercomm_create.
> [...]

Index: opal/mca/hwloc/base/base.h
===
--- opal/mca/hwloc/base/base.h  (revision 32056)
+++ opal/mca/hwloc/base/base.h  (working copy)
@@ -1,6 +1,8 @@
 /*
  * Copyright (c) 2011-2012 Cisco Systems, Inc.  All rights reserved.
  * Copyright (c) 2013-2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -86,6 +88,9 @@
 OPAL_DECLSPEC opal_hwloc_locality_t opal_hwloc_base_get_relative_locality(hwloc_topology_t topo,
                                                                           char *cpuset1, char *cpuset2);

+OPAL_DECLSPEC opal_hwloc_locality_t opal_hwloc_base_get_relative_locality2(hwloc_topology_t topo,
+                                                                           char *cpuset1, char *cpuset2);
+
 OPAL_DECLSPEC int opal_hwloc_base_set_binding_policy(opal_binding_policy_t *policy,
                                                      char *spec);

 /**
Index: opal/mca/hwloc/base/hwloc_base_util.c
===
--- opal/mca/hwloc/base/hwloc_base_util.c  (revision 32056)
+++ opal/mca/hwloc/base/hwloc_base_util.c  (working copy)
@@ -13,6 +13,8 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC.
  *                         All rights reserved.
  * Copyright (c) 2013-2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -1419,6 +1421,130 @@
     return locality;
 }

+opal_hwloc_locality_t opal_hwloc_base_get_relative_locality2(hwloc_topo
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Ralph,

my test VM is single socket four cores.
here is something odd i just found when running mpirun -np 2
intercomm_create.

tasks [0,1] are bound on cpus [0,1] => OK
tasks [2,3] (first spawn) are bound on cpus [2,3] => OK
tasks [4,5] (second spawn) are not bound (and cpuset is [0-3]) => OK

in ompi_proc_set_locality (ompi/proc/proc.c:228) on task 0:

    locality = opal_hwloc_base_get_relative_locality(opal_hwloc_topology,
                                                     ompi_process_info.cpuset,
                                                     cpu_bitmap);

where
    ompi_process_info.cpuset is "0"
    cpu_bitmap is "0-3"

and locality is set to OPAL_PROC_ON_HWTHREAD (!)

is this correct?

i would have expected OPAL_PROC_ON_L2CACHE (since there is a single L2
cache on my vm, as reported by lstopo) or even OPAL_PROC_LOCALITY_UNKNOWN.

then in mca_coll_ml_comm_query (ompi/mca/coll/ml/coll_ml_module.c:2899)
the module disqualifies itself if !ompi_rte_proc_bound.
if locality were previously set to OPAL_PROC_LOCALITY_UNKNOWN, coll/ml
could check the flags of all the procs of the communicator and disqualify
itself if at least one of them is OPAL_PROC_LOCALITY_UNKNOWN.

as you wrote, there might be a bunch of other corner cases.
that being said, i'll try to write a simple proof of concept and see if
this specific hang can be avoided.

Cheers,

Gilles

On 2014/06/20 12:08, Ralph Castain wrote:
> It is related, but it means that coll/ml has a higher degree of
> sensitivity to the binding pattern than what you reported (which was that
> coll/ml doesn't work with unbound processes).
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
I'm not sure, but I guess it's related to Gilles's ticket. It's quite a bad
binding pattern as Ralph pointed out, so checking for that condition and
disqualifying coll/ml could be a practical solution as well.

Tetsuya

> It is related, but it means that coll/ml has a higher degree of
> sensitivity to the binding pattern than what you reported (which was that
> coll/ml doesn't work with unbound processes). What we are now seeing is
> that coll/ml also doesn't work when processes are bound across sockets.
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
It is related, but it means that coll/ml has a higher degree of sensitivity
to the binding pattern than what you reported (which was that coll/ml
doesn't work with unbound processes). What we are now seeing is that
coll/ml also doesn't work when processes are bound across sockets.

Which means that Nathan's revised tests are going to have to cover a lot
more corner cases. Our locality flags don't currently include
"bound-to-multiple-sockets", and I'm not sure how he is going to easily
resolve that case.

On Jun 19, 2014, at 8:02 PM, Gilles Gouaillardet wrote:

> is this related to the hang i reported at
> http://www.open-mpi.org/community/lists/devel/2014/06/14975.php ?
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Ralph and Tetsuya,

is this related to the hang i reported at
http://www.open-mpi.org/community/lists/devel/2014/06/14975.php ?

Nathan already replied he is working on a fix.

Cheers,

Gilles

On 2014/06/20 11:54, Ralph Castain wrote:
> My guess is that the coll/ml component may have problems with binding a
> single process across multiple cores like that - it might be that we'll
> have to have it check for that condition and disqualify itself. It is a
> particularly bad binding pattern, though, as shared memory gets
> completely messed up when you split that way.
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
My guess is that the coll/ml component may have problems with binding a
single process across multiple cores like that - it might be that we'll
have to have it check for that condition and disqualify itself. It is a
particularly bad binding pattern, though, as shared memory gets completely
messed up when you split that way.

On Jun 19, 2014, at 3:57 PM, tmish...@jcity.maeda.co.jp wrote:

> Recently I have been seeing a hang with trunk when I specify a
> particular binding by use of rankfile or "-map-by slot".
>
> This can be reproduced by the rankfile which allocates a process
> beyond socket boundary.
> [...]
[OMPI devel] trunk hangs when I specify a particular binding by rankfile
Hi folks,

Recently I have been seeing a hang with trunk when I specify a
particular binding by use of rankfile or "-map-by slot".

This can be reproduced by a rankfile which allocates a process beyond the
socket boundary. For example, on node05, which has 2 sockets with 4 cores
each, rank 1 is allocated across sockets 0 and 1 as shown below. Then it
hangs in the middle of communication.

[mishima@manage trial]$ cat rankfile1
rank 0=node05 slot=0-1
rank 1=node05 slot=3-4
rank 2=node05 slot=6-7

[mishima@manage trial]$ mpirun -rf rankfile1 -report-bindings demos/myprog
[node05.cluster:02342] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
[node05.cluster:02342] MCW rank 1 bound to socket 0[core 3[hwt 0]], socket 1[core 4[hwt 0]]: [./././B][B/././.]
[node05.cluster:02342] MCW rank 2 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
Hello world from process 2 of 3
Hello world from process 1 of 3
<< hang here! >>

If I disable coll_ml or use the 1.8 series, it works, which means it might
be affected by the coll_ml component, I guess. But, unfortunately, I have
no idea how to fix this problem, so I'd appreciate it if somebody could
resolve the issue.

[mishima@manage trial]$ mpirun -rf rankfile1 -report-bindings -mca coll_ml_priority 0 demos/myprog
[node05.cluster:02382] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
[node05.cluster:02382] MCW rank 1 bound to socket 0[core 3[hwt 0]], socket 1[core 4[hwt 0]]: [./././B][B/././.]
[node05.cluster:02382] MCW rank 2 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
Hello world from process 2 of 3
Hello world from process 0 of 3
Hello world from process 1 of 3

In addition, when I use a host with 12 cores, "-map-by slot" causes the
same problem.

[mishima@manage trial]$ mpirun -np 3 -map-by slot:pe=4 -report-bindings demos/myprog
[manage.cluster:02557] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./.][./././././.]
[manage.cluster:02557] MCW rank 1 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [././././B/B][B/B/./././.]
[manage.cluster:02557] MCW rank 2 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][././B/B/B/B]
Hello world from process 1 of 3
Hello world from process 2 of 3
<< hang here! >>

Regards,
Tetsuya Mishima