Re: [OMPI devel] Slurm direct-launch is broken on trunk

2014-10-16 Thread Ralph Castain

On Oct 15, 2014, at 9:38 PM, Gilles Gouaillardet wrote:

> Ralph,
> 
> the issue occurs when "pushing" a message that is larger than 255 bytes,
> and i fixed it.

Kewl - thanks

> 
> /* i am not sure anyone broke this, and fwiw, git blame blamed you */

I did the commit from the pmix branch, so it will list me for all those lines. 
However, I only worked on the native component, so I'm not sure who might have 
worked on that section or what testing they might have done prior to the commit 
to the trunk. Hence my request.


> 
> Cheers,
> 
> Gilles
> 
> $ git show 27dcca0bb20d8f42b4d048758ef4ff14ca0d79b9
> commit 27dcca0bb20d8f42b4d048758ef4ff14ca0d79b9
> Author: Gilles Gouaillardet 
> Date:   Thu Oct 16 13:29:32 2014 +0900
> 
>     pmi/s1: fix large keys
> 
>     do not overwrite the PMI key when pushing a message that does
>     not fit within 255 bytes
> 
> diff --git a/opal/mca/pmix/base/pmix_base_fns.c b/opal/mca/pmix/base/pmix_base_fns.c
> index 56609c5..56c13ba 100644
> --- a/opal/mca/pmix/base/pmix_base_fns.c
> +++ b/opal/mca/pmix/base/pmix_base_fns.c
> @@ -144,7 +144,7 @@ int opal_pmix_base_commit_packed( char* buffer_to_put, int data_to_put,
>      for (left = strlen (encoded_data), tmp = encoded_data ; left ; ) {
>          size_t value_size = vallen > left ? left : vallen - 1;
>  
> -        sprintf (tmp_key, "key%d", *pack_key);
> +        sprintf (tmp_key, "key%d", pkey);
>  
>          if (NULL == (pmikey = setup_key(_PROC_MY_NAME, tmp_key, vallen))) {
>              OPAL_ERROR_LOG(OPAL_ERR_BAD_PARAM);
> 
> 
> 
> 
> On 2014/10/16 3:33, Ralph Castain wrote:
>> When attempting to launch via srun:
>> 
>> [bend001:03708] pmix: get all keys for proc 327680 in KVS 5.0
>> [bend001:03708] GETTING KEY 327680-key0
>> [bend001:03708] Read data 
>> AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
>>   -
>> [bend001:03708] UNSUPPORTED TYPE 0
>> [bend001:03708] OPAL ERROR: Error in file pmix_s1.c at line 458
>> [bend001:03709] [[5,0],2] pmix:s1 barrier complete
>> [bend001:03709] pmix: get all keys for proc 327680 in KVS 5.0
>> [bend001:03709] GETTING KEY 327680-key0
>> [bend001:03709] Read data 
>> AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
>>   -
>> [bend001:03709] UNSUPPORTED TYPE 0
>> [bend001:03709] OPAL ERROR: Error in file pmix_s1.c at line 458
>> [bend001:03708] [[5,0],1] pmix:s1 called get for key pmix.hname
>> [bend001:03708] pmix: get all keys for proc 327680 in KVS 5.0
>> [bend001:03708] GETTING KEY 327680-key0
>> [bend001:03708] Read data 
>> AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
>>   -
>> [bend001:03708] UNSUPPORTED TYPE 0
>> [bend001:03708] [[5,0],1] pmix:s1 got key pmix.hname
>> 
>> 
>> Looks like someone broke the common code for decoding keys. Could you please 
>> fix it?
>> Ralph
>> 



Re: [OMPI devel] Slurm direct-launch is broken on trunk

2014-10-16 Thread Gilles Gouaillardet
Ralph,

the issue occurs when "pushing" a message that is larger than 255 bytes,
and i fixed it.

/* i am not sure anyone broke this, and fwiw, git blame blamed you */

Cheers,

Gilles

$ git show 27dcca0bb20d8f42b4d048758ef4ff14ca0d79b9
commit 27dcca0bb20d8f42b4d048758ef4ff14ca0d79b9
Author: Gilles Gouaillardet 
Date:   Thu Oct 16 13:29:32 2014 +0900

    pmi/s1: fix large keys

    do not overwrite the PMI key when pushing a message that does
    not fit within 255 bytes

diff --git a/opal/mca/pmix/base/pmix_base_fns.c b/opal/mca/pmix/base/pmix_base_fns.c
index 56609c5..56c13ba 100644
--- a/opal/mca/pmix/base/pmix_base_fns.c
+++ b/opal/mca/pmix/base/pmix_base_fns.c
@@ -144,7 +144,7 @@ int opal_pmix_base_commit_packed( char* buffer_to_put, int data_to_put,
     for (left = strlen (encoded_data), tmp = encoded_data ; left ; ) {
         size_t value_size = vallen > left ? left : vallen - 1;
 
-        sprintf (tmp_key, "key%d", *pack_key);
+        sprintf (tmp_key, "key%d", pkey);
 
         if (NULL == (pmikey = setup_key(_PROC_MY_NAME, tmp_key, vallen))) {
             OPAL_ERROR_LOG(OPAL_ERR_BAD_PARAM);


On 2014/10/16 3:33, Ralph Castain wrote:
> When attempting to launch via srun:
>
> [bend001:03708] pmix: get all keys for proc 327680 in KVS 5.0
> [bend001:03708] GETTING KEY 327680-key0
> [bend001:03708] Read data 
> AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
>   -
> [bend001:03708] UNSUPPORTED TYPE 0
> [bend001:03708] OPAL ERROR: Error in file pmix_s1.c at line 458
> [bend001:03709] [[5,0],2] pmix:s1 barrier complete
> [bend001:03709] pmix: get all keys for proc 327680 in KVS 5.0
> [bend001:03709] GETTING KEY 327680-key0
> [bend001:03709] Read data 
> AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
>   -
> [bend001:03709] UNSUPPORTED TYPE 0
> [bend001:03709] OPAL ERROR: Error in file pmix_s1.c at line 458
> [bend001:03708] [[5,0],1] pmix:s1 called get for key pmix.hname
> [bend001:03708] pmix: get all keys for proc 327680 in KVS 5.0
> [bend001:03708] GETTING KEY 327680-key0
> [bend001:03708] Read data 
> AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
>   -
> [bend001:03708] UNSUPPORTED TYPE 0
> [bend001:03708] [[5,0],1] pmix:s1 got key pmix.hname
>
>
> Looks like someone broke the common code for decoding keys. Could you please 
> fix it?
> Ralph



[OMPI devel] Slurm direct-launch is broken on trunk

2014-10-15 Thread Ralph Castain
When attempting to launch via srun:

[bend001:03708] pmix: get all keys for proc 327680 in KVS 5.0
[bend001:03708] GETTING KEY 327680-key0
[bend001:03708] Read data 
AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
  -
[bend001:03708] UNSUPPORTED TYPE 0
[bend001:03708] OPAL ERROR: Error in file pmix_s1.c at line 458
[bend001:03709] [[5,0],2] pmix:s1 barrier complete
[bend001:03709] pmix: get all keys for proc 327680 in KVS 5.0
[bend001:03709] GETTING KEY 327680-key0
[bend001:03709] Read data 
AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
  -
[bend001:03709] UNSUPPORTED TYPE 0
[bend001:03709] OPAL ERROR: Error in file pmix_s1.c at line 458
[bend001:03708] [[5,0],1] pmix:s1 called get for key pmix.hname
[bend001:03708] pmix: get all keys for proc 327680 in KVS 5.0
[bend001:03708] GETTING KEY 327680-key0
[bend001:03708] Read data 
AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
  -
[bend001:03708] UNSUPPORTED TYPE 0
[bend001:03708] [[5,0],1] pmix:s1 got key pmix.hname


Looks like someone broke the common code for decoding keys. Could you please 
fix it?
Ralph