Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2

2013-08-15 Thread Ralph Castain
I see the problem. Yoda is directly calling bml_get without first checking to 
see if the bml_btl supports rdma operations. If you only have the tcp btl, then 
rdma isn't supported, the bml_get function is NULL, and you segfault.

What you need to do is check for rdma, and then fall back to message-based 
transfers if rdma isn't available. I believe that's what our current PMLs do - 
you can't just assume rdma (or any other support) is present.
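
A minimal sketch of that kind of check, assuming a made-up transport struct 
(transport_t, checked_get and sim_copy are hypothetical names for illustration, 
not the real BML/BTL or yoda code). The point is only that the function pointer 
is tested before it is called:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a transport module; NOT the real Open MPI BTL struct. */
typedef struct transport {
    /* One-sided read; NULL when the transport has no RDMA support (e.g. TCP). */
    int (*rdma_get)(void *dst, const void *src, size_t len);
    /* Message-based transfer; assumed to be available as the fallback. */
    int (*msg_get)(void *dst, const void *src, size_t len);
} transport_t;

typedef enum { PATH_RDMA, PATH_MESSAGE, PATH_ERROR } get_path_t;

/* Both "transfers" are simulated with memcpy so the sketch runs in one process. */
int sim_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

/* Check the capability first; never call through a NULL function pointer. */
get_path_t checked_get(const transport_t *t, void *dst, const void *src, size_t len)
{
    assert(NULL != t);
    if (NULL != t->rdma_get) {
        return (0 == t->rdma_get(dst, src, len)) ? PATH_RDMA : PATH_ERROR;
    }
    if (NULL != t->msg_get) {
        return (0 == t->msg_get(dst, src, len)) ? PATH_MESSAGE : PATH_ERROR;
    }
    return PATH_ERROR; /* nothing usable: report an error instead of crashing */
}
```

With a TCP-like transport (rdma_get left NULL) the call takes the message path 
instead of jumping to address 0.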


On Aug 14, 2013, at 4:02 PM, Joshua Ladd  wrote:

> Thanks, Ralph. We'll have a look. Admittedly, we've done little testing with 
> the tcp BTL - I was under the impression that the yoda interface was capable 
> of working with all BTLs, but it seems we need more testing. It definitely 
> works with the SM and OpenIB BTLs.
> 
> Josh
> 
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Wednesday, August 14, 2013 6:13 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
> 
> Here's the backtrace:
> 
> (gdb) where
> #0  0x in ?? ()
> #1  0x7fac6b8d8921 in mca_bml_base_get (bml_btl=0x239a130, des=0x220e880) at ../../../../ompi/mca/bml/bml.h:326
> #2  0x7fac6b8db767 in mca_spml_yoda_get (src_addr=0x601500, size=4, dst_addr=0x7fff3b00b370, src=1) at spml_yoda.c:1091
> #3  0x7fac6f1ea56d in shmem_int_g (addr=0x601500, pe=1) at shmem_g.c:47
> #4  0x00400bc7 in main ()
> 
> On Aug 14, 2013, at 3:12 PM, Ralph Castain  wrote:
> 
>> Hmmm...well, it works fine as long as the procs are on the same node. 
>> However, if they are on different nodes, it segfaults:
>> 
>> [rhc@bend002 shmem]$ shmemrun -npernode 1 ./test_shmem
>> running on bend001
>> running on bend002
>> [bend001:06590] *** Process received signal ***
>> [bend001:06590] Signal: Segmentation fault (11)
>> [bend001:06590] Signal code: Address not mapped (1)
>> [bend001:06590] Failing at address: (nil)
>> [bend001:06590] [ 0] /lib64/libpthread.so.0() [0x307d40f500]
>> [bend001:06590] *** End of error message ***
>> [bend002][[62090,1],1][btl_tcp_frag.c:219:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
>> --
>> shmemrun noticed that process rank 0 with PID 6590 on node bend001 exited on signal 11 (Segmentation fault).
>> --
>> 
>> 
>> I would have thought it should work in that situation - yes?
>> 
>> 
>> On Aug 14, 2013, at 2:52 PM, Joshua Ladd  wrote:
>> 
>>> This simple test code exercises the following calls:
>>> 
>>> start_pes()
>>> 
>>> shmalloc()
>>> 
>>> shmem_int_get()
>>> 
>>> shmem_int_put()
>>> 
>>> shmem_barrier_all()
>>> 
>>> To compile:
>>> 
>>> shmemcc test_shmem.c -o test_shmem
>>> 
>>> To launch:
>>> 
>>> shmemrun -np 2  test_shmem
>>> 
>>> or for those who prefer to launch with SLURM
>>> 
>>> srun -n 2 test_shmem
>>> 
>>> Josh
>>> 
>>> 
>>> -----Original Message-----
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph 
>>> Castain
>>> Sent: Wednesday, August 14, 2013 5:32 PM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
>>> 
>>> Can you point me to a test program that would exercise it? I'd like to give 
>>> it a try first.
>>> 
>>> I'm okay with it being on by default, as it builds its own separate 
>>> library, and I'm okay with the RFC.
>>> 
>>> On Aug 14, 2013, at 2:03 PM, "Barrett, Brian W"  wrote:
>>> 
>>>> Josh -
>>>> 
>>>> In general, I don't have a strong opinion of whether OpenSHMEM is on 
>>>> by default or not.  It might cause unexpected behavior for some 
>>>> users (like on Crays, where one should really use Cray's SHMEM), but 
>>>> maybe it's better on other platforms.
>>>> 
>>>> I also would have no objection to the RFC, provided the segfaults I 
>>>> found get resolved.
>>>> 
>>>> Brian
>>>> 
>>>> On 8/14/13 2:08 PM, "Joshua Ladd"  wrote:
 
>>>>> Ralph and Brian,
>>>>> 
>>>>> Thanks a bunch for taking the time to review this. It is extremely 
>>>>> helpful. Let me comment on the building of OSHMEM and solicit some 
>>>>> feedback from you guys (along with the rest of the community). 
>>>>> Originally we had planned to enable OSHMEM to build only if the 
>>>>> '--with-oshmem' flag was passed at configure time. However, 
>>>>> (unbeknownst to me) this behavior was changed and now OSHMEM is built by 
>>>>> default, i.e., yes, Ralph, this is the intended behavior now. I am 
>>>>> wondering if this is such a good idea. Do folks have a strong opinion 
>>>>> on this one way or the other? From my perspective I can see arguments 
>>>>> for both sides of the coin.
>>>>> 
>>>>> Other than cleaning up warnings and resolving the segfault that 
>>>>> Brian observed, are we on a good course to getting this upstream? Is 
>>>>> it reasonable to file an RFC for three weeks out?
>>>>> 
>>>>> Josh
>>>>> 
>>>>> -Original Messag

Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2

2013-08-15 Thread Joshua Ladd
Maybe this is a stupid question, but in this case (I believe this goes all the 
way back to our initial discussion on OSHMEM), how does one fall back onto 
send/recv semantics when the call is made at the SHMEM level to do a put? If a 
BTL doesn't support RDMA, then it doesn't seem reasonable to expect OSHMEM to 
support it through YODA. It seems more reasonable to check whether or not the 
bml_get is NULL and if this is the case, then one must disqualify YODA and 
hence SHMEM. How can you support put/get SHMEM semantics without an 
RDMA-equipped BTL? Does it even make sense to try to emulate that behavior? I know 
the SHMEM developers have been going round in circles on this, so any insight 
you could provide would be greatly appreciated.

Josh

Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2

2013-08-15 Thread Ralph Castain
You can always implement a put/get using send/recv semantics. The performance 
drops, but the functionality is the same - after all, it's still nothing but 
data movement, and ultimately the application doesn't care how the data got 
there.
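
As a rough sketch of what that emulation could look like (all names here are 
hypothetical, and the "send" is simulated by calling the target's handler 
directly in the same process; a real implementation would go through the 
messaging layer and also has to handle completion and ordering): a put becomes 
one message carrying the data, and a get becomes a request message answered by 
a reply carrying the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEAP_BYTES  64
#define MAX_PAYLOAD 32

/* Hypothetical simulated processing element (PE) with its own "symmetric heap". */
typedef struct pe { unsigned char heap[HEAP_BYTES]; } pe_t;

/* Wire format of the emulated one-sided operations. */
typedef enum { MSG_PUT, MSG_GET_REQ, MSG_GET_REPLY } msg_type_t;
typedef struct msg {
    msg_type_t type;
    size_t offset;                      /* offset into the target's heap */
    size_t len;                         /* payload length in bytes */
    unsigned char payload[MAX_PAYLOAD];
} msg_t;

/* Target side: runs when a message "arrives".
 * Returns 0 if no reply is needed, 1 if *reply was filled in, -1 on error. */
int handle_msg(pe_t *target, const msg_t *in, msg_t *reply)
{
    assert(NULL != target && NULL != in);
    if (in->len > MAX_PAYLOAD || in->offset > HEAP_BYTES - in->len) return -1;
    switch (in->type) {
    case MSG_PUT:
        memcpy(target->heap + in->offset, in->payload, in->len);
        return 0;
    case MSG_GET_REQ:
        reply->type = MSG_GET_REPLY;
        reply->offset = in->offset;
        reply->len = in->len;
        memcpy(reply->payload, target->heap + in->offset, in->len);
        return 1;
    default:
        return -1;
    }
}

/* Initiator side: put = one message that carries the data. */
int emulated_put(pe_t *target, size_t offset, const void *src, size_t len)
{
    msg_t m = { MSG_PUT, offset, len, {0} };
    if (len > MAX_PAYLOAD) return -1;
    memcpy(m.payload, src, len);
    return (0 == handle_msg(target, &m, NULL)) ? 0 : -1;
}

/* Initiator side: get = request message, then copy the data out of the reply. */
int emulated_get(pe_t *target, size_t offset, void *dst, size_t len)
{
    msg_t req = { MSG_GET_REQ, offset, len, {0} };
    msg_t reply;
    if (1 != handle_msg(target, &req, &reply)) return -1;
    memcpy(dst, reply.payload, reply.len);
    return 0;
}
```

The application-visible result is the same as with rdma; the cost is the extra 
copy and the target having to run the handler.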

For a first cut, you can certainly do a clean abort if rdma isn't supported. 
However, that's pretty limiting as most systems out there don't have that 
capability. You may not be any worse than other shmem implementations, but it 
does raise the question of what value was derived from building it on top of 
OMPI. After all, one of our principles has always been to run anywhere.

Not a killer, but I would think it has limited value as-is.


On Aug 15, 2013, at 9:06 AM, Joshua Ladd  wrote:

> Maybe this is a stupid question, but in this case (I believe this goes all 
> the way back to our initial discussion on OSHMEM), how does one fall back 
> onto send/recv semantics when the call is made at the SHMEM level to do a 
> put? If a BTL doesn't support RDMA, then it doesn't seem reasonable to expect 
> OSHMEM to support it through YODA. It seems more reasonable to check whether 
> or not the bml_get is NULL and if this is the case, then one must disqualify 
> YODA and hence SHMEM. How can you support put/get SHMEM semantics without an 
> RDMA-equipped BTL? Does it even make sense to try to emulate that behavior? I 
> know the SHMEM developers have been going round in circles on this, so any 
> insight you could provide would be greatly appreciated.
> 
> Josh

Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2

2013-08-15 Thread George Bosilca

On Aug 15, 2013, at 18:06 , Joshua Ladd  wrote:

> Maybe this is a stupid question, but in this case (I believe this goes all 
> the way back to our initial discussion on OSHMEM), how does one fall back 
> onto send/recv semantics when the call is made at the SHMEM level to do a put?

The same way our current OSC (one-sided component) falls back on the pt2pt 
component when no underlying BTL supports RDMA operations.

> If a BTL doesn't support RDMA, then it doesn't seem reasonable to expect 
> OSHMEM to support it through YODA. It seems more reasonable to check whether 
> or not the bml_get is NULL and if this is the case, then one must disqualify 
> YODA and hence SHMEM. How can you support put/get SHMEM semantics without an 
> RDMA-equipped BTL? Does it even make sense to try to emulate that behavior? I 
> know the SHMEM developers have been going round in circles on this, so any 
> insight you could provide would be greatly appreciated.

If you want to provide SHMEM for single-machine runs (for development purposes, 
for example), you will have to provide SHMEM on top of BTLs without RMA 
support. Our current SM BTL doesn't support RMA operations if KNEM or CMA is 
not available. Thus you would disqualify all machines without CMA/KNEM support 
as development machines for SHMEM-based applications (including all Mac OS X 
laptops).

  George.

Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2

2013-08-15 Thread Barrett, Brian W
On 8/15/13 10:30 AM, "George Bosilca"  wrote:

>
>On Aug 15, 2013, at 18:06 , Joshua Ladd  wrote:
>
>> Maybe this is a stupid question, but in this case (I believe this goes
>>all the way back to our initial discussion on OSHMEM), how does one fall
>>back onto send/recv semantics when the call is made at the SHMEM level
>>to do a put?
>
>The same way our current OSC (one-sided component) is falling back on the
>pt2pt component when no underlying BTL supports RDMA operation.

In general, I agree with everything George said, except that if the OSC
rdma component finds BTLs that don't support RDMA, it falls back to the AM
interface.  The pt2pt component is only used when there are no BTLs (such
as when the CM PML is used).  That, by the way, is a case that OSHMEM
should figure out how to deal with (if only by initializing the BML/BTLs).
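
A simplified sketch of that three-way decision, under the assumption that each 
BTL's capability can be reduced to a single flag (btl_caps_t, xfer_path_t and 
choose_path are hypothetical names, not the actual OSC code, and the real 
selection logic is more involved):

```c
#include <assert.h>
#include <stddef.h>

/* Possible data paths, as described above. */
typedef enum { XFER_RDMA, XFER_ACTIVE_MESSAGE, XFER_PT2PT } xfer_path_t;

/* Hypothetical one-flag summary of a BTL's capabilities. */
typedef struct btl_caps { int supports_rdma; } btl_caps_t;

xfer_path_t choose_path(const btl_caps_t *btls, size_t n_btls)
{
    size_t i;
    assert(0 == n_btls || NULL != btls);
    if (0 == n_btls) {
        return XFER_PT2PT;            /* no BTLs at all, e.g. the CM PML is in use */
    }
    for (i = 0; i < n_btls; i++) {
        if (btls[i].supports_rdma) {
            return XFER_RDMA;         /* at least one BTL can do RDMA */
        }
    }
    return XFER_ACTIVE_MESSAGE;       /* BTLs exist, but none supports RDMA */
}
```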

Brian


--
  Brian W. Barrett
  Scalable System Software Group
  Sandia National Laboratories