Re: [OMPI devel] Slurm direct-launch is broken on trunk
On Oct 15, 2014, at 9:38 PM, Gilles Gouaillardet wrote:

> Ralph,
>
> the issue occurs when "pushing" a message that is larger than 255 bytes,
> and i fixed it.

Kewl - thanks

>
> /* i am not sure anyone broke this, and fwiw, git blames blamed you */

I did the commit from the pmix branch, so it will list me for all those lines. However, I only worked on the native component, so I'm not sure who might have worked on that section or what testing they might have done prior to the commit to the trunk. Hence my request.

>
> Cheers,
>
> Gilles
>
> $ git show 27dcca0bb20d8f42b4d048758ef4ff14ca0d79b9
> commit 27dcca0bb20d8f42b4d048758ef4ff14ca0d79b9
> Author: Gilles Gouaillardet
> Date:   Thu Oct 16 13:29:32 2014 +0900
>
>     pmi/s1: fix large keys
>
>     do not overwrite the PMI key when pushing a message that does
>     not fit within 255 bytes
>
> diff --git a/opal/mca/pmix/base/pmix_base_fns.c b/opal/mca/pmix/base/pmix_base_fns.c
> index 56609c5..56c13ba 100644
> --- a/opal/mca/pmix/base/pmix_base_fns.c
> +++ b/opal/mca/pmix/base/pmix_base_fns.c
> @@ -144,7 +144,7 @@ int opal_pmix_base_commit_packed( char* buffer_to_put, int data_to_put,
>      for (left = strlen (encoded_data), tmp = encoded_data ; left ; ) {
>          size_t value_size = vallen > left ? left : vallen - 1;
>
> -        sprintf (tmp_key, "key%d", *pack_key);
> +        sprintf (tmp_key, "key%d", pkey);
>
>          if (NULL == (pmikey = setup_key(_PROC_MY_NAME, tmp_key, vallen))) {
>              OPAL_ERROR_LOG(OPAL_ERR_BAD_PARAM);
>
>
> On 2014/10/16 3:33, Ralph Castain wrote:
>> When attempting to launch via srun:
>>
>> [bend001:03708] pmix: get all keys for proc 327680 in KVS 5.0
>> [bend001:03708] GETTING KEY 327680-key0
>> [bend001:03708] Read data
>> AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
>> -
>> [bend001:03708] UNSUPPORTED TYPE 0
>> [bend001:03708] OPAL ERROR: Error in file pmix_s1.c at line 458
>> [bend001:03709] [[5,0],2] pmix:s1 barrier complete
>> [bend001:03709] pmix: get all keys for proc 327680 in KVS 5.0
>> [bend001:03709] GETTING KEY 327680-key0
>> [bend001:03709] Read data
>> AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
>> -
>> [bend001:03709] UNSUPPORTED TYPE 0
>> [bend001:03709] OPAL ERROR: Error in file pmix_s1.c at line 458
>> [bend001:03708] [[5,0],1] pmix:s1 called get for key pmix.hname
>> [bend001:03708] pmix: get all keys for proc 327680 in KVS 5.0
>> [bend001:03708] GETTING KEY 327680-key0
>> [bend001:03708] Read data
>> AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
>> -
>> [bend001:03708] UNSUPPORTED TYPE 0
>> [bend001:03708] [[5,0],1] pmix:s1 got key pmix.hname
>>
>>
>> Looks like someone broke the common code for decoding keys. Could you please
>> fix it?
>> Ralph
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/10/16046.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/10/16052.php
Re: [OMPI devel] Slurm direct-launch is broken on trunk
Ralph,

the issue occurs when "pushing" a message that is larger than 255 bytes,
and i fixed it.

/* i am not sure anyone broke this, and fwiw, git blames blamed you */

Cheers,

Gilles

$ git show 27dcca0bb20d8f42b4d048758ef4ff14ca0d79b9
commit 27dcca0bb20d8f42b4d048758ef4ff14ca0d79b9
Author: Gilles Gouaillardet
Date:   Thu Oct 16 13:29:32 2014 +0900

    pmi/s1: fix large keys

    do not overwrite the PMI key when pushing a message that does
    not fit within 255 bytes

diff --git a/opal/mca/pmix/base/pmix_base_fns.c b/opal/mca/pmix/base/pmix_base_fns.c
index 56609c5..56c13ba 100644
--- a/opal/mca/pmix/base/pmix_base_fns.c
+++ b/opal/mca/pmix/base/pmix_base_fns.c
@@ -144,7 +144,7 @@ int opal_pmix_base_commit_packed( char* buffer_to_put, int data_to_put,
     for (left = strlen (encoded_data), tmp = encoded_data ; left ; ) {
         size_t value_size = vallen > left ? left : vallen - 1;

-        sprintf (tmp_key, "key%d", *pack_key);
+        sprintf (tmp_key, "key%d", pkey);

         if (NULL == (pmikey = setup_key(_PROC_MY_NAME, tmp_key, vallen))) {
             OPAL_ERROR_LOG(OPAL_ERR_BAD_PARAM);


On 2014/10/16 3:33, Ralph Castain wrote:
> When attempting to launch via srun:
>
> [bend001:03708] pmix: get all keys for proc 327680 in KVS 5.0
> [bend001:03708] GETTING KEY 327680-key0
> [bend001:03708] Read data
> AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
> -
> [bend001:03708] UNSUPPORTED TYPE 0
> [bend001:03708] OPAL ERROR: Error in file pmix_s1.c at line 458
> [bend001:03709] [[5,0],2] pmix:s1 barrier complete
> [bend001:03709] pmix: get all keys for proc 327680 in KVS 5.0
> [bend001:03709] GETTING KEY 327680-key0
> [bend001:03709] Read data
> AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
> -
> [bend001:03709] UNSUPPORTED TYPE 0
> [bend001:03709] OPAL ERROR: Error in file pmix_s1.c at line 458
> [bend001:03708] [[5,0],1] pmix:s1 called get for key pmix.hname
> [bend001:03708] pmix: get all keys for proc 327680 in KVS 5.0
> [bend001:03708] GETTING KEY 327680-key0
> [bend001:03708] Read data
> AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
> -
> [bend001:03708] UNSUPPORTED TYPE 0
> [bend001:03708] [[5,0],1] pmix:s1 got key pmix.hname
>
>
> Looks like someone broke the common code for decoding keys. Could you please
> fix it?
> Ralph
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/10/16046.php
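For readers following the thread, here is a minimal sketch of the failure mode the commit message describes. It is a toy model, not the Open MPI code: the in-memory store, the names kvs_put, kvs_get, put_chunked, get_chunked and roundtrip_ok, and the tiny VALLEN of 16 (standing in for the 255-byte limit mentioned above) are all assumptions made for illustration. It also assumes that putting a value under an existing key replaces the earlier value, and that the local counter pkey is advanced once per chunk, which the diff hunk does not show.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define VALLEN 16        /* hypothetical per-value limit, NUL included (small for the demo) */
#define MAX_ENTRIES 32   /* hypothetical capacity of the toy store */

struct kv_entry { char key[32]; char value[VALLEN]; };
static struct kv_entry store[MAX_ENTRIES];
static int n_entries;

/* Toy stand-in for a PMI put: a repeated key replaces the earlier value. */
static int kvs_put(const char *key, const char *value)
{
    int i;
    for (i = 0; i < n_entries; i++) {
        if (0 == strcmp(store[i].key, key)) break;
    }
    if (i == n_entries) {
        if (n_entries == MAX_ENTRIES) return -1;
        snprintf(store[i].key, sizeof(store[i].key), "%s", key);
        n_entries++;
    }
    snprintf(store[i].value, sizeof(store[i].value), "%s", value);
    return 0;
}

static const char *kvs_get(const char *key)
{
    for (int i = 0; i < n_entries; i++) {
        if (0 == strcmp(store[i].key, key)) return store[i].value;
    }
    return NULL;
}

/* Push data in chunks of at most VALLEN - 1 bytes.
 * per_chunk_key != 0: each chunk gets its own "key<N>" (modelled fixed behaviour).
 * per_chunk_key == 0: every chunk reuses "key<*pack_key>" (modelled overwrite bug). */
static int put_chunked(const char *data, int *pack_key, int per_chunk_key)
{
    const size_t max_chunk = VALLEN - 1;
    int pkey = *pack_key;
    size_t left = strlen(data);
    const char *tmp = data;

    while (left > 0) {
        size_t n = left < max_chunk ? left : max_chunk;
        char tmp_key[32];
        char chunk[VALLEN];

        snprintf(tmp_key, sizeof(tmp_key), "key%d", per_chunk_key ? pkey : *pack_key);
        memcpy(chunk, tmp, n);
        chunk[n] = '\0';
        if (0 != kvs_put(tmp_key, chunk)) return -1;
        tmp += n;
        left -= n;
        pkey++;               /* assumed: one key index per chunk */
    }
    *pack_key = pkey;
    return 0;
}

/* Read key0, key1, ... until one is missing and concatenate into out. */
static size_t get_chunked(char *out, size_t out_len)
{
    size_t used = 0;
    assert(out_len > 0);
    out[0] = '\0';
    for (int k = 0; ; k++) {
        char tmp_key[32];
        const char *v;
        snprintf(tmp_key, sizeof(tmp_key), "key%d", k);
        v = kvs_get(tmp_key);
        if (NULL == v || used + strlen(v) >= out_len) break;
        memcpy(out + used, v, strlen(v) + 1);
        used += strlen(v);
    }
    return used;
}

/* Reset the store, push data, read it back; returns 1 if it survived intact. */
int roundtrip_ok(const char *data, int per_chunk_key)
{
    char back[256];
    int pack_key = 0;
    n_entries = 0;
    if (0 != put_chunked(data, &pack_key, per_chunk_key)) return 0;
    get_chunked(back, sizeof(back));
    return 0 == strcmp(back, data);
}
```

In this model, roundtrip_ok(data, 0) succeeds only when data fits in a single chunk, because every later chunk lands on the same key and replaces the previous one; roundtrip_ok(data, 1) succeeds for multi-chunk data as well. That is consistent with the report that only messages larger than the value limit were affected.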
[OMPI devel] Slurm direct-launch is broken on trunk
When attempting to launch via srun:

[bend001:03708] pmix: get all keys for proc 327680 in KVS 5.0
[bend001:03708] GETTING KEY 327680-key0
[bend001:03708] Read data
AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
-
[bend001:03708] UNSUPPORTED TYPE 0
[bend001:03708] OPAL ERROR: Error in file pmix_s1.c at line 458
[bend001:03709] [[5,0],2] pmix:s1 barrier complete
[bend001:03709] pmix: get all keys for proc 327680 in KVS 5.0
[bend001:03709] GETTING KEY 327680-key0
[bend001:03709] Read data
AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
-
[bend001:03709] UNSUPPORTED TYPE 0
[bend001:03709] OPAL ERROR: Error in file pmix_s1.c at line 458
[bend001:03708] [[5,0],1] pmix:s1 called get for key pmix.hname
[bend001:03708] pmix: get all keys for proc 327680 in KVS 5.0
[bend001:03708] GETTING KEY 327680-key0
[bend001:03708] Read data
AcG1peC5obmFtZQAwMwAwMDA4AGJlbmQwMDEAcG1peC5scmFuawAwZAAwMDAycG1peC5ucmFuawAwZAAwMA
-
[bend001:03708] UNSUPPORTED TYPE 0
[bend001:03708] [[5,0],1] pmix:s1 got key pmix.hname


Looks like someone broke the common code for decoding keys. Could you please fix it?
Ralph