Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
I see the problem. Yoda is calling bml_get directly without first checking whether the bml_btl supports RDMA operations. If you only have the tcp BTL, then RDMA isn't supported, the bml_get function is NULL, and you segfault.

What you need to do is check for RDMA, and then fall back to message-based transfers if RDMA isn't available. I believe that's what our current PMLs do - you can't just assume RDMA (or any other support) is present.

On Aug 14, 2013, at 4:02 PM, Joshua Ladd wrote:

> Thanks, Ralph. We'll have a look. Admittedly, we've done little testing with
> the tcp BTL - I was under the impression that the yoda interface was capable
> of working with all BTLs; it seems we need more testing. For sure it works
> with SM and OpenIB BTLs.
>
> Josh
>
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Wednesday, August 14, 2013 6:13 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
>
> Here's the backtrace:
>
> (gdb) where
> #0  0x in ?? ()
> #1  0x7fac6b8d8921 in mca_bml_base_get (bml_btl=0x239a130, des=0x220e880)
>     at ../../../../ompi/mca/bml/bml.h:326
> #2  0x7fac6b8db767 in mca_spml_yoda_get (src_addr=0x601500, size=4,
>     dst_addr=0x7fff3b00b370, src=1) at spml_yoda.c:1091
> #3  0x7fac6f1ea56d in shmem_int_g (addr=0x601500, pe=1) at shmem_g.c:47
> #4  0x00400bc7 in main ()
>
> On Aug 14, 2013, at 3:12 PM, Ralph Castain wrote:
>
>> Hmmm...well, it works fine as long as the procs are on the same node.
>> However, if they are on different nodes, it segfaults:
>>
>> [rhc@bend002 shmem]$ shmemrun -npernode 1 ./test_shmem
>> running on bend001
>> running on bend002
>> [bend001:06590] *** Process received signal ***
>> [bend001:06590] Signal: Segmentation fault (11)
>> [bend001:06590] Signal code: Address not mapped (1)
>> [bend001:06590] Failing at address: (nil)
>> [bend001:06590] [ 0] /lib64/libpthread.so.0() [0x307d40f500]
>> [bend001:06590] *** End of error message ***
>> [bend002][[62090,1],1][btl_tcp_frag.c:219:mca_btl_tcp_frag_recv]
>> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
>> --
>> shmemrun noticed that process rank 0 with PID 6590 on node
>> bend001 exited on signal 11 (Segmentation fault).
>> --
>>
>> I would have thought it should work in that situation - yes?
>>
>> On Aug 14, 2013, at 2:52 PM, Joshua Ladd wrote:
>>
>>> The following simple test code will exercise the following:
>>>
>>> start_pes()
>>> shmalloc()
>>> shmem_int_get()
>>> shmem_int_put()
>>> shmem_barrier_all()
>>>
>>> To compile:
>>>
>>> shmemcc test_shmem.c -o test_shmem
>>>
>>> To launch:
>>>
>>> shmemrun -np 2 test_shmem
>>>
>>> or, for those who prefer to launch with SLURM:
>>>
>>> srun -n 2 test_shmem
>>>
>>> Josh
>>>
>>> -----Original Message-----
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
>>> Sent: Wednesday, August 14, 2013 5:32 PM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
>>>
>>> Can you point me to a test program that would exercise it? I'd like to give
>>> it a try first.
>>>
>>> I'm okay with on by default as it builds its own separate library,
>>> and with the RFC
>>>
>>> On Aug 14, 2013, at 2:03 PM, "Barrett, Brian W" wrote:
>>>
>>>> Josh -
>>>>
>>>> In general, I don't have a strong opinion of whether OpenSHMEM is on by
>>>> default or not. It might cause unexpected behavior for some users (like
>>>> on Crays, where one should really use Cray's SHMEM), but maybe it's
>>>> better on other platforms.
>>>>
>>>> I also would have no objection to the RFC, provided the segfaults I
>>>> found get resolved.
>>>>
>>>> Brian
>>>>
>>>> On 8/14/13 2:08 PM, "Joshua Ladd" wrote:
>>>>
>>>>> Ralph, and Brian
>>>>>
>>>>> Thanks a bunch for taking the time to review this. It is extremely
>>>>> helpful. Let me comment on the building of OSHMEM and solicit some
>>>>> feedback from you guys (along with the rest of the community.)
>>>>> Originally we had planned to enable OSHMEM to build only if the
>>>>> '--with-oshmem' flag was passed at configure time. However,
>>>>> (unbeknownst to me) this behavior was changed and now OSHMEM is built
>>>>> by default, i.e. yes, Ralph, this is the intended behavior now. I am
>>>>> wondering if this is such a good idea. Do folks have a strong opinion
>>>>> on this one way or the other? From my perspective I can see arguments
>>>>> for both sides of the coin.
>>>>>
>>>>> Other than cleaning up warnings and resolving the segfault that
>>>>> Brian observed, are we on a good course to getting this upstream? Is
>>>>> it reasonable to file an RFC for three weeks out?
>>>>>
>>>>> Josh
>>>>>
>>>>> -----Original Messag
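The check Ralph describes at the top of this message (test for RDMA support before calling the get function, and use a message-based transfer otherwise) can be sketched as below. This is only an illustration under stated assumptions: `fb_btl_t`, `fb_rdma_get`, `fb_message_get`, and `fb_get` are hypothetical names invented for the sketch, not Open MPI types or functions, and both transfer paths are simulated with a local copy so that the dispatch logic stands on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a transport module. The get pointer is NULL
 * when the transport offers no RDMA get, as described for the tcp BTL.
 * This is NOT the real Open MPI BTL structure. */
typedef struct fb_btl {
    const char *name;
    int (*btl_get)(void *dst, const void *src, size_t size);
} fb_btl_t;

/* Which path a transfer took; returned so callers can observe the choice. */
enum fb_path { FB_PATH_RDMA, FB_PATH_MESSAGE };

/* Simulated one-sided get: copies directly from the "remote" address. */
int fb_rdma_get(void *dst, const void *src, size_t size)
{
    memcpy(dst, src, size);
    return 0;
}

/* Simulated message-based get. A real fallback would send a request and
 * wait for a reply; a local copy stands in for that exchange here. */
int fb_message_get(void *dst, const void *src, size_t size)
{
    memcpy(dst, src, size);
    return 0;
}

/* Check the capability first, and never call through a NULL pointer. */
int fb_get(const fb_btl_t *btl, void *dst, const void *src, size_t size,
           enum fb_path *path)
{
    assert(btl != NULL && path != NULL);
    if (btl->btl_get != NULL) {
        *path = FB_PATH_RDMA;
        return btl->btl_get(dst, src, size);
    }
    *path = FB_PATH_MESSAGE;
    return fb_message_get(dst, src, size);
}
```

With a module whose `btl_get` is NULL, `fb_get` reports `FB_PATH_MESSAGE` and still delivers the data instead of jumping to address zero, which is the failure shown in frame #0 of the backtrace.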
Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
Maybe this is a stupid question, but in this case (I believe this goes all the way back to our initial discussion on OSHMEM), how does one fall back onto send/recv semantics when the call is made at the SHMEM level to do a put? If a BTL doesn't support RDMA, then it doesn't seem reasonable to expect OSHMEM to support it through YODA. It seems more reasonable to check whether or not bml_get is NULL and, if it is, to disqualify YODA and hence SHMEM.

How can you support put/get SHMEM semantics without an RDMA-equipped BTL? Does it even make sense to try to emulate that behavior? I know the SHMEM developers have been going round in circles on this, so any insight you could provide would be greatly appreciated.

Josh
Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
You can always implement a put/get using send/recv semantics. The performance drops, but the functionality is the same - after all, it's still nothing but data movement, and ultimately the application doesn't care how the data got there.

For a first cut, you can certainly do a clean abort if RDMA isn't supported. However, that's pretty limiting, as most systems out there don't have that capability. You may not be any worse than other SHMEM implementations, but it does raise the question of what value was derived from building it on top of OMPI. After all, one of our principles has always been to run anywhere.

Not a killer, but I would think it has limited value as-is.
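The claim that put/get can be layered on send/recv can be illustrated with a small request/reply sketch. Everything here is an assumption made for illustration: the message layout, the `am_` names, and the fixed-size "symmetric heap" are hypothetical, and the network is simulated by calling the target's handler directly. A put becomes one message carrying the data; a get becomes a request message followed by a reply carrying the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AM_HEAP_BYTES   64u   /* size of the simulated symmetric heap */
#define AM_MAX_PAYLOAD  16u   /* largest transfer one message can carry */

enum am_msg_type { AM_MSG_PUT, AM_MSG_GET_REQ, AM_MSG_GET_REPLY };

/* Hypothetical wire format: type, heap offset, length, inline data. */
typedef struct am_msg {
    enum am_msg_type type;
    size_t offset;
    size_t size;
    unsigned char payload[AM_MAX_PAYLOAD];
} am_msg_t;

/* Hypothetical remote processing element owning a symmetric heap. */
typedef struct am_pe {
    unsigned char heap[AM_HEAP_BYTES];
} am_pe_t;

/* Target side: runs when a message is "received".
 * Returns 1 if a reply was written into *reply, 0 otherwise. */
int am_handle(am_pe_t *target, const am_msg_t *in, am_msg_t *reply)
{
    assert(in->size <= AM_MAX_PAYLOAD);
    assert(in->offset + in->size <= AM_HEAP_BYTES);
    if (in->type == AM_MSG_PUT) {
        memcpy(target->heap + in->offset, in->payload, in->size);
        return 0;
    }
    if (in->type == AM_MSG_GET_REQ) {
        reply->type = AM_MSG_GET_REPLY;
        reply->offset = in->offset;
        reply->size = in->size;
        memcpy(reply->payload, target->heap + in->offset, in->size);
        return 1;
    }
    return 0;
}

/* Initiator side: a put is a single send that carries the data. */
void am_put(am_pe_t *target, size_t offset, const void *src, size_t size)
{
    am_msg_t msg = { AM_MSG_PUT, offset, size, {0} };
    am_msg_t unused_reply;
    assert(size <= AM_MAX_PAYLOAD);
    memcpy(msg.payload, src, size);
    (void)am_handle(target, &msg, &unused_reply);  /* stands in for send */
}

/* Initiator side: a get is a request send plus a receive of the reply. */
void am_get(am_pe_t *target, size_t offset, void *dst, size_t size)
{
    am_msg_t req = { AM_MSG_GET_REQ, offset, size, {0} };
    am_msg_t reply;
    int got_reply = am_handle(target, &req, &reply);  /* send, then recv */
    assert(got_reply == 1);
    (void)got_reply;
    memcpy(dst, reply.payload, reply.size);
}
```

The sketch leaves out what makes the real problem harder: the target has to make progress on incoming messages while it is busy elsewhere, and transfers larger than one message need fragmentation. It only shows that the data movement itself does not require RDMA.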
Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
On Aug 15, 2013, at 18:06, Joshua Ladd wrote:

> Maybe this is a stupid question, but in this case (I believe this goes all
> the way back to our initial discussion on OSHMEM), how does one fall back
> onto send/recv semantics when the call is made at the SHMEM level to do a put?

The same way our current OSC (one-sided component) falls back on the pt2pt component when no underlying BTL supports RDMA operations.

> If a BTL doesn't support RDMA, then it doesn't seem reasonable to expect
> OSHMEM to support it through YODA. It seems more reasonable to check whether
> or not the bml_get is NULL and if this is the case, then one must disqualify
> YODA and hence SHMEM. How can you support put/get SHMEM semantics without an
> RDMA equipped BTL? Does it even make sense to try to emulate that behavior? I
> know the SHMEM developers have been going round in circles on this, so any
> insight you could provide would be greatly appreciated.

If you want to provide SHMEM for single-machine runs (for development purposes, for example), you will have to provide SHMEM on top of BTLs without RMA support. Our current SM BTL doesn't support RMA operations if neither KNEM nor CMA is available. Disqualifying YODA whenever RDMA is missing would therefore rule out every machine without CMA/KNEM support as a development machine for SHMEM-based applications (including all Mac OS X laptops).

George.
Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
On 8/15/13 10:30 AM, "George Bosilca" wrote:

> On Aug 15, 2013, at 18:06, Joshua Ladd wrote:
>
>> Maybe this is a stupid question, but in this case (I believe this goes
>> all the way back to our initial discussion on OSHMEM), how does one fall
>> back onto send/recv semantics when the call is made at the SHMEM level
>> to do a put?
>
> The same way our current OSC (one-sided component) is falling back on the
> pt2pt component when no underlying BTL supports RDMA operation.

In general, I agree with everything George said, except that if the OSC rdma component finds BTLs that don't support RDMA, it falls back to the AM interface. The pt2pt component is only used when there are no BTLs (such as when the CM PML is used). Which, by the way, is a case that OSHMEM should figure out how to deal with (if only by initializing the BML/BTLs).

Brian

--
Brian W. Barrett
Scalable System Software Group
Sandia National Laboratories
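The three cases Brian distinguishes (RDMA-capable BTLs, BTLs without RDMA, and no BTLs at all) can be written as a small decision function. This is a simplified sketch, not Open MPI's selection code: `sel_btl_t`, `sel_path`, and `sel_choose` are hypothetical names, "AM" is read here as active messages, and choosing RDMA as soon as any one BTL supports it is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one BTL's capabilities (not an Open MPI type). */
typedef struct sel_btl {
    const char *name;
    int has_rdma;   /* nonzero if the BTL offers RDMA put/get */
} sel_btl_t;

enum sel_path {
    SEL_RDMA,            /* at least one BTL can do RDMA */
    SEL_ACTIVE_MESSAGE,  /* BTLs exist, but none can do RDMA */
    SEL_PT2PT            /* no BTLs at all, e.g. when the CM PML is used */
};

/* Pick a transfer path from the BTLs available for a peer. */
enum sel_path sel_choose(const sel_btl_t *btls, size_t count)
{
    size_t i;

    if (count == 0) {
        return SEL_PT2PT;
    }
    assert(btls != NULL);
    for (i = 0; i < count; i++) {
        if (btls[i].has_rdma) {
            return SEL_RDMA;
        }
    }
    return SEL_ACTIVE_MESSAGE;
}
```

The tcp-only run from earlier in the thread would land in the `SEL_ACTIVE_MESSAGE` case; the `SEL_PT2PT` case is the one Brian notes OSHMEM still has to handle.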