Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Thanks Ralph. I'll check it on next Monday.

Tetsuya

> Should be fixed with r32058
>
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Should be fixed with r32058

On Jun 20, 2014, at 4:13 AM, tmish...@jcity.maeda.co.jp wrote:

> Hi Ralph,
>
> By the way, something is wrong with your latest rmaps_rank_file.c.
> I've got the error below.
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Hi Ralph,

By the way, something is wrong with your latest rmaps_rank_file.c.
I've got the error below. I'm trying to find the problem. But, you
could find it more quickly...

[mishima@manage trial]$ cat rankfile
rank 0=node05 slot=0-1
rank 1=node05 slot=3-4
rank 2=node05 slot=6-7
[mishima@manage trial]$ mpirun -np 3 -rf rankfile -report-bindings demos/myprog
--
Error, invalid syntax in the rankfile (rankfile)
syntax must be the fallowing
rank i=host_i slot=string
Examples of proper syntax include:
    rank 1=host1 slot=1:0,1
    rank 0=host2 slot=0:*
    rank 2=host4 slot=1-2
    rank 3=host3 slot=0:1;1:0-2
--
[manage.cluster:24456] [[20979,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 483
[manage.cluster:24456] [[20979,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 149
[manage.cluster:24456] [[20979,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 287

Regards,
Tetsuya Mishima

> My guess is that the coll/ml component may have problems with binding a
> single process across multiple cores like that - it might be that we'll
> have to have it check for that condition and disqualify itself.
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Hmmm... this is a tough one. It basically comes down to what we mean by
relative locality. Initially, we meant "at what level do these procs share
cpus" - however, coll/ml is using it as "at what level are these procs
commonly bound". Subtle difference, but significant.

Your proposed version implements the second interpretation - even though we
share cpus down to the hwthread level, it correctly reports that we are
only commonly bound to the node.

I'm unclear how the shared memory system (or other areas using that value)
will respond to that change in meaning. Probably requires looking a little
more broadly (just search the ompi layer for anything referencing the
ompi_proc_t locality flag) to ensure everything can handle (or be adjusted
to handle) the revised definition.

If so, then I have no issue with replacing the locality algorithm. Would
also require an RFC as that might impact folks working on branches.

On Jun 19, 2014, at 11:52 PM, Gilles Gouaillardet wrote:

> Here is attached a patch that fixes/works around my issue.
> this is more of a proof of concept, so i did not commit it to the trunk.
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Ralph,

Here is attached a patch that fixes/works around my issue.
this is more of a proof of concept, so i did not commit it to the trunk.

basically:

opal_hwloc_base_get_relative_locality(topo, set1, set2) sets the locality
based on the deepest element that is part of both set1 and set2.
in my case, set2 means "all the available cpus"; that is why the subroutine
will return OPAL_PROC_ON_HWTHREAD.

the patch uses opal_hwloc_base_get_relative_locality2 instead.
if one of the cpusets means "all the available cpus", then the subroutine
will simply return OPAL_PROC_ON_NODE.

i am puzzled whether this is a bug in opal_hwloc_base_get_relative_locality
or in proc.c, which should not call this subroutine because it does not do
what should be expected.

Cheers,

Gilles

On 2014/06/20 13:59, Gilles Gouaillardet wrote:
> my test VM is single socket four cores.
> here is something odd i just found when running mpirun -np 2
> intercomm_create.
> [...]

Index: opal/mca/hwloc/base/base.h
===
--- opal/mca/hwloc/base/base.h  (revision 32056)
+++ opal/mca/hwloc/base/base.h  (working copy)
@@ -1,6 +1,8 @@
 /*
  * Copyright (c) 2011-2012 Cisco Systems, Inc.  All rights reserved.
  * Copyright (c) 2013-2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -86,6 +88,9 @@
 OPAL_DECLSPEC opal_hwloc_locality_t opal_hwloc_base_get_relative_locality(hwloc_topology_t topo,
                                                                           char *cpuset1, char *cpuset2);

+OPAL_DECLSPEC opal_hwloc_locality_t opal_hwloc_base_get_relative_locality2(hwloc_topology_t topo,
+                                                                           char *cpuset1, char *cpuset2);
+
 OPAL_DECLSPEC int opal_hwloc_base_set_binding_policy(opal_binding_policy_t *policy,
                                                      char *spec);

 /**
Index: opal/mca/hwloc/base/hwloc_base_util.c
===
--- opal/mca/hwloc/base/hwloc_base_util.c  (revision 32056)
+++ opal/mca/hwloc/base/hwloc_base_util.c  (working copy)
@@ -13,6 +13,8 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC.
  *                         All rights reserved.
  * Copyright (c) 2013-2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -1419,6 +1421,130 @@
     return locality;
 }

+opal_hwloc_locality_t opal_hwloc_base_get_relative_locality2(hwloc_topo
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Ralph,

my test VM is single socket four cores.
here is something odd i just found when running mpirun -np 2
intercomm_create.

tasks [0,1] are bound on cpus [0,1] => OK
tasks [2,3] (first spawn) are bound on cpus [2,3] => OK
tasks [4,5] (second spawn) are not bound (and cpuset is [0-3]) => OK

in ompi_proc_set_locality (ompi/proc/proc.c:228) on task 0:

    locality = opal_hwloc_base_get_relative_locality(opal_hwloc_topology,
                                                     ompi_process_info.cpuset,
                                                     cpu_bitmap);

where
    ompi_process_info.cpuset is "0"
    cpu_bitmap is "0-3"

and locality is set to OPAL_PROC_ON_HWTHREAD (!)

is this correct?

i would have expected OPAL_PROC_ON_L2CACHE (since there is a single L2
cache on my vm, as reported by lstopo) or even OPAL_PROC_LOCALITY_UNKNOWN.

then in mca_coll_ml_comm_query (ompi/mca/coll/ml/coll_ml_module.c:2899)
the module disqualifies itself if !ompi_rte_proc_bound.
if locality were previously set to OPAL_PROC_LOCALITY_UNKNOWN, coll/ml
could check the flags of all the procs of the communicator and disqualify
itself if at least one of them is OPAL_PROC_LOCALITY_UNKNOWN.

as you wrote, there might be a bunch of other corner cases.
that being said, i'll try to write a simple proof of concept and see if
this specific hang can be avoided.

Cheers,

Gilles

On 2014/06/20 12:08, Ralph Castain wrote:
> It is related, but it means that coll/ml has a higher degree of
> sensitivity to the binding pattern than what you reported (which was that
> coll/ml doesn't work with unbound processes).
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
I'm not sure, but I guess it's related to Gilles's ticket. It's quite a bad
binding pattern as Ralph pointed out, so checking for that condition and
disqualifying coll/ml could be a practical solution as well.

Tetsuya

> It is related, but it means that coll/ml has a higher degree of
> sensitivity to the binding pattern than what you reported (which was that
> coll/ml doesn't work with unbound processes). What we are now seeing is
> that coll/ml also doesn't work when processes are bound across sockets.
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
It is related, but it means that coll/ml has a higher degree of sensitivity
to the binding pattern than what you reported (which was that coll/ml
doesn't work with unbound processes). What we are now seeing is that
coll/ml also doesn't work when processes are bound across sockets.

Which means that Nathan's revised tests are going to have to cover a lot
more corner cases. Our locality flags don't currently include
"bound-to-multiple-sockets", and I'm not sure how he is going to easily
resolve that case.

On Jun 19, 2014, at 8:02 PM, Gilles Gouaillardet wrote:

> is this related to the hang i reported at
> http://www.open-mpi.org/community/lists/devel/2014/06/14975.php ?
> [...]
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
Ralph and Tetsuya,

is this related to the hang i reported at
http://www.open-mpi.org/community/lists/devel/2014/06/14975.php ?

Nathan already replied he is working on a fix.

Cheers,

Gilles

On 2014/06/20 11:54, Ralph Castain wrote:
> My guess is that the coll/ml component may have problems with binding a
> single process across multiple cores like that - it might be that we'll
> have to have it check for that condition and disqualify itself. It is a
> particularly bad binding pattern, though, as shared memory gets
> completely messed up when you split that way.
Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile
My guess is that the coll/ml component may have problems with binding a
single process across multiple cores like that - it might be that we'll
have to have it check for that condition and disqualify itself. It is a
particularly bad binding pattern, though, as shared memory gets completely
messed up when you split that way.

On Jun 19, 2014, at 3:57 PM, tmish...@jcity.maeda.co.jp wrote:

> Recently I have been seeing a hang with trunk when I specify a
> particular binding by use of rankfile or "-map-by slot".
>
> This can be reproduced by the rankfile which allocates a process
> beyond socket boundary.
> [...]
[OMPI devel] trunk hangs when I specify a particular binding by rankfile
Hi folks,

Recently I have been seeing a hang with trunk when I specify a
particular binding by use of rankfile or "-map-by slot".

This can be reproduced by a rankfile which allocates a process beyond the
socket boundary. For example, on node05, which has 2 sockets with 4 cores
each, rank 1 is allocated across sockets 0 and 1 as shown below. Then it
hangs in the middle of communication.

[mishima@manage trial]$ cat rankfile1
rank 0=node05 slot=0-1
rank 1=node05 slot=3-4
rank 2=node05 slot=6-7

[mishima@manage trial]$ mpirun -rf rankfile1 -report-bindings demos/myprog
[node05.cluster:02342] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
[node05.cluster:02342] MCW rank 1 bound to socket 0[core 3[hwt 0]], socket 1[core 4[hwt 0]]: [./././B][B/././.]
[node05.cluster:02342] MCW rank 2 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
Hello world from process 2 of 3
Hello world from process 1 of 3
<< hang here! >>

If I disable coll_ml or use the 1.8 series, it works, which means it might
be affected by the coll_ml component, I guess. But, unfortunately, I have
no idea how to fix this problem, so I'd appreciate it if somebody could
resolve the issue.

[mishima@manage trial]$ mpirun -rf rankfile1 -report-bindings -mca coll_ml_priority 0 demos/myprog
[node05.cluster:02382] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./.][./././.]
[node05.cluster:02382] MCW rank 1 bound to socket 0[core 3[hwt 0]], socket 1[core 4[hwt 0]]: [./././B][B/././.]
[node05.cluster:02382] MCW rank 2 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][././B/B]
Hello world from process 2 of 3
Hello world from process 0 of 3
Hello world from process 1 of 3

In addition, when I use a host with 12 cores, "-map-by slot" causes the
same problem.

[mishima@manage trial]$ mpirun -np 3 -map-by slot:pe=4 -report-bindings demos/myprog
[manage.cluster:02557] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./.][./././././.]
[manage.cluster:02557] MCW rank 1 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [././././B/B][B/B/./././.]
[manage.cluster:02557] MCW rank 2 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././.][././B/B/B/B]
Hello world from process 1 of 3
Hello world from process 2 of 3
<< hang here! >>

Regards,
Tetsuya Mishima