Re: [OMPI devel] btl/vader: race condition in finalize on OS X

2018-10-02 Thread Ralph H Castain
We already have the register_cleanup option in master - are you using an older 
version of PMIx that doesn’t support it?

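For anyone following along, the request that option boils down to is roughly the following. This is only an illustrative sketch against the PMIx v3 job-control API, not the actual OPAL glue behind opal_pmix.register_cleanup; the helper name is made up, error handling is trimmed, and the real code may choose different targets/attributes:

#include <stdio.h>
#include <pmix.h>

/* completion callback for the non-blocking job-control request */
static void cleanup_cbfunc(pmix_status_t status, pmix_info_t *info, size_t ninfo,
                           void *cbdata, pmix_release_cbfunc_t release_fn,
                           void *release_cbdata)
{
    (void) info; (void) ninfo; (void) cbdata;
    if (PMIX_SUCCESS != status) {
        fprintf(stderr, "cleanup registration failed: %d\n", (int) status);
    }
    if (NULL != release_fn) {
        release_fn(release_cbdata);
    }
}

/* hypothetical helper: ask the PMIx server to unlink the given file when the
 * job terminates, so no peer has to race over it in finalize */
static pmix_status_t register_segment_cleanup(const pmix_proc_t *myproc,
                                              const char *segment_path)
{
    /* static: the directive should stay valid until the callback fires */
    static pmix_info_t info;

    PMIX_INFO_LOAD(&info, PMIX_REGISTER_CLEANUP, segment_path, PMIX_STRING);

    return PMIx_Job_control_nb(myproc, 1, &info, 1, cleanup_cbfunc, NULL);
}

The point being that the unlink then happens on the server side once the processes are gone, so no application process has to race over the session directory in finalize.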

> On Oct 2, 2018, at 4:05 AM, Jeff Squyres (jsquyres) via devel wrote:
> 
> FYI: https://github.com/open-mpi/ompi/issues/5798 brought up what may be the 
> same issue.
> 
> 
>> On Oct 2, 2018, at 3:16 AM, Gilles Gouaillardet wrote:
>> 
>> Folks,
>> 
>> 
>> When running a simple hello world program on OS X, we can end up with the
>> following error message:
>> 
>> 
>> A system call failed during shared memory initialization that should
>> not have.  It is likely that your MPI job will now either abort or
>> experience performance degradation.
>>
>>   Local host:  c7.kmc.kobe.rist.or.jp
>>   System call: unlink(2) /tmp/ompi.c7.1000/pid.23376/1/vader_segment.c7.17d80001.54
>>   Error:       No such file or directory (errno 2)
>> 
>> 
>> The error does not occur on Linux because, by default, the vader segment
>> is placed in /dev/shm.
>>
>> The patch below can be used to demonstrate the issue on Linux:
>> 
>> 
>> diff --git a/opal/mca/btl/vader/btl_vader_component.c b/opal/mca/btl/vader/btl_vader_component.c
>> index 115bceb..80fec05 100644
>> --- a/opal/mca/btl/vader/btl_vader_component.c
>> +++ b/opal/mca/btl/vader/btl_vader_component.c
>> @@ -204,7 +204,7 @@ static int mca_btl_vader_component_register (void)
>>                                              OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_GROUP, &mca_btl_vader_component.single_copy_mechanism);
>>      OBJ_RELEASE(new_enum);
>>  
>> -    if (0 == access ("/dev/shm", W_OK)) {
>> +    if (0 && 0 == access ("/dev/shm", W_OK)) {
>>          mca_btl_vader_component.backing_directory = "/dev/shm";
>>      } else {
>>          mca_btl_vader_component.backing_directory = opal_process_info.job_session_dir;
>> 
>> 
>> From my analysis, here is what happens:
>>
>> - each rank is supposed to have its own vader_segment unlinked by btl/vader
>> in vader_finalize().
>>
>> - but this file might have already been destroyed by another task in
>> orte_ess_base_app_finalize():
>>
>>     if (NULL == opal_pmix.register_cleanup) {
>>         orte_session_dir_finalize(ORTE_PROC_MY_NAME);
>>     }
>>
>> so *all* the tasks end up calling
>> opal_os_dirpath_destroy("/tmp/ompi.c7.1000/pid.23941/1"), which removes the
>> whole per-job session directory.
>> 
>> 
>> I am not really sure about the best way to fix this.
>> 
>> - one option is to perform an intra-node barrier in vader_finalize()
>>
>> - another option would be to implement opal_pmix.register_cleanup
>> 
>> 
>> Any thoughts?
>> 
>> 
>> Cheers,
>> 
>> 
>> Gilles
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 


Re: [OMPI devel] btl/vader: race condition in finalize on OS X

2018-10-02 Thread Jeff Squyres (jsquyres) via devel
FYI: https://github.com/open-mpi/ompi/issues/5798 brought up what may be the 
same issue.


> On Oct 2, 2018, at 3:16 AM, Gilles Gouaillardet wrote:
> 
> Folks,
> 
> 
> When running a simple hello world program on OS X, we can end up with the
> following error message:
> 
> 
> A system call failed during shared memory initialization that should
> not have.  It is likely that your MPI job will now either abort or
> experience performance degradation.
>
>   Local host:  c7.kmc.kobe.rist.or.jp
>   System call: unlink(2) /tmp/ompi.c7.1000/pid.23376/1/vader_segment.c7.17d80001.54
>   Error:       No such file or directory (errno 2)
> 
> 
> The error does not occur on Linux because, by default, the vader segment
> is placed in /dev/shm.
>
> The patch below can be used to demonstrate the issue on Linux:
> 
> 
> diff --git a/opal/mca/btl/vader/btl_vader_component.c b/opal/mca/btl/vader/btl_vader_component.c
> index 115bceb..80fec05 100644
> --- a/opal/mca/btl/vader/btl_vader_component.c
> +++ b/opal/mca/btl/vader/btl_vader_component.c
> @@ -204,7 +204,7 @@ static int mca_btl_vader_component_register (void)
>                                              OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_GROUP, &mca_btl_vader_component.single_copy_mechanism);
>      OBJ_RELEASE(new_enum);
>  
> -    if (0 == access ("/dev/shm", W_OK)) {
> +    if (0 && 0 == access ("/dev/shm", W_OK)) {
>          mca_btl_vader_component.backing_directory = "/dev/shm";
>      } else {
>          mca_btl_vader_component.backing_directory = opal_process_info.job_session_dir;
> 
> 
> From my analysis, here is what happens:
>
> - each rank is supposed to have its own vader_segment unlinked by btl/vader
> in vader_finalize().
>
> - but this file might have already been destroyed by another task in
> orte_ess_base_app_finalize():
>
>     if (NULL == opal_pmix.register_cleanup) {
>         orte_session_dir_finalize(ORTE_PROC_MY_NAME);
>     }
>
> so *all* the tasks end up calling
> opal_os_dirpath_destroy("/tmp/ompi.c7.1000/pid.23941/1"), which removes the
> whole per-job session directory.
> 
> 
> I am not really sure about the best way to fix this.
> 
> - one option is to perform an intra-node barrier in vader_finalize()
>
> - another option would be to implement opal_pmix.register_cleanup
> 
> 
> Any thoughts?
> 
> 
> Cheers,
> 
> 
> Gilles
> 


-- 
Jeff Squyres
jsquy...@cisco.com



[OMPI devel] btl/vader: race condition in finalize on OS X

2018-10-02 Thread Gilles Gouaillardet

Folks,


When running a simple hello world program on OS X, we can end up with the
following error message:



A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  c7.kmc.kobe.rist.or.jp
  System call: unlink(2) /tmp/ompi.c7.1000/pid.23376/1/vader_segment.c7.17d80001.54
  Error:       No such file or directory (errno 2)


The error does not occur on Linux because, by default, the vader segment
is placed in /dev/shm.


The patch below can be used to demonstrate the issue on Linux:


diff --git a/opal/mca/btl/vader/btl_vader_component.c b/opal/mca/btl/vader/btl_vader_component.c
index 115bceb..80fec05 100644
--- a/opal/mca/btl/vader/btl_vader_component.c
+++ b/opal/mca/btl/vader/btl_vader_component.c
@@ -204,7 +204,7 @@ static int mca_btl_vader_component_register (void)
                                             OPAL_INFO_LVL_3, MCA_BASE_VAR_SCOPE_GROUP, &mca_btl_vader_component.single_copy_mechanism);
     OBJ_RELEASE(new_enum);
 
-    if (0 == access ("/dev/shm", W_OK)) {
+    if (0 && 0 == access ("/dev/shm", W_OK)) {
         mca_btl_vader_component.backing_directory = "/dev/shm";
     } else {
         mca_btl_vader_component.backing_directory = opal_process_info.job_session_dir;



From my analysis, here is what happens:

- each rank is supposed to have its own vader_segment unlinked by
btl/vader in vader_finalize().

- but this file might have already been destroyed by another task in
orte_ess_base_app_finalize():

    if (NULL == opal_pmix.register_cleanup) {
        orte_session_dir_finalize(ORTE_PROC_MY_NAME);
    }

so *all* the tasks end up calling
opal_os_dirpath_destroy("/tmp/ompi.c7.1000/pid.23941/1"), which removes the
whole per-job session directory.

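To make the failure mode concrete, here is a minimal standalone illustration in plain POSIX C (not OMPI code; the directory and file names below are made up) of the sequence described above: one process tears down the shared per-job directory while another one still expects to unlink its own segment file from it.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    const char *dir  = "/tmp/fake_session_dir";
    const char *file = "/tmp/fake_session_dir/vader_segment.0";

    mkdir(dir, 0700);
    close(open(file, O_CREAT | O_WRONLY, 0600));   /* stand-in for the vader segment */

    if (0 == fork()) {
        /* the "other task": tears down the whole session directory, like
         * orte_session_dir_finalize() -> opal_os_dirpath_destroy() */
        unlink(file);
        rmdir(dir);
        _exit(0);
    }

    wait(NULL);   /* let the other task win, so the failure is deterministic */

    /* this task's vader_finalize() equivalent: its segment is already gone */
    if (-1 == unlink(file)) {
        printf("unlink(%s) failed: %s (errno %d)\n", file, strerror(errno), errno);
    }
    return 0;
}

Compiled and run as-is, this prints the same "No such file or directory (errno 2)" that the help message above is complaining about.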


I am not really sure about the best way to fix this.

- one option is to perform an intra-node barrier in vader_finalize() (see the
sketch below)

- another option would be to implement opal_pmix.register_cleanup

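For completeness, the first option would amount to something like the one-shot counting barrier sketched below, written here with plain POSIX shared memory and C11 atomics rather than any OMPI internals (local_barrier(), the shm name and the nlocal argument are all made up for illustration, and error handling is omitted): every local rank checks in after unlinking its own segment, and nobody tears down the shared session directory until everyone has checked in.

#include <fcntl.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <unistd.h>

/* hypothetical one-shot intra-node barrier over a tiny shared-memory counter */
static void local_barrier(const char *shm_name, int nlocal)
{
    /* every local rank opens (or creates) the same shared-memory object */
    int fd = shm_open(shm_name, O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(atomic_int));          /* freshly created object is zero-filled */

    atomic_int *count = mmap(NULL, sizeof(atomic_int), PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);
    close(fd);

    /* check in, then wait until every local rank has checked in */
    atomic_fetch_add(count, 1);
    while (atomic_load(count) < nlocal) {
        usleep(1000);
    }

    munmap(count, sizeof(atomic_int));
    /* note: somebody (the last rank, or the RTE) still has to shm_unlink()
     * the object, otherwise we have just moved the cleanup problem around */
}

The awkward parts are obvious: vader_finalize() would need to know the number of local peers and agree on a name, and one slow or dead rank blocks everyone else's finalize, which is arguably why deferring the cleanup to the RTE via register_cleanup is the more attractive route.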

Any thoughts?


Cheers,


Gilles
