Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-13 Thread r...@open-mpi.org
I dug into this further, and the simplest solution for now is to do one of the 
following:

* replace the "!=" with "==" in the test, as Jeff indicated; or

* revert the commit Mark identified

Both options will restore the original logic. Given that someone already got it 
wrong, I have clarified the logic in the OMPI master repo. However, I don’t 
know how long it will be before a 2.0.3 release is issued, so GridEngine users 
might want to locally fix things in the interim.
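
For sites that cannot rebuild right away, one possible stopgap is to pass the 
dummy agent only when a job is actually running under GridEngine, so that runs 
outside SGE keep the default ssh agent. The sketch below is not part of Open 
MPI: the function name ompi_sge_workaround_args is made up for illustration, 
and it assumes that SGE exports JOB_ID and PE_HOSTFILE inside a parallel job.

```shell
# Hypothetical helper, not part of Open MPI.
# Prints the workaround flags only inside an SGE parallel job.
# Assumption: SGE exports JOB_ID and PE_HOSTFILE for such jobs.
ompi_sge_workaround_args() {
    if [ -n "${JOB_ID:-}" ] && [ -n "${PE_HOSTFILE:-}" ]; then
        # Any explicitly set agent value gets past the inverted check in
        # 2.0.2, after which qrsh should be auto-detected again.
        printf '%s\n' '-mca plm_rsh_agent foo'
    fi
}

# Usage (my_app is a placeholder):
#   mpirun $(ompi_sge_workaround_args) ./my_app
```

Outside SGE the function prints nothing, so mpirun sees no extra arguments and 
falls back to its normal ssh behaviour.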


> On Feb 12, 2017, at 1:52 PM, r...@open-mpi.org wrote:
> 
> Yeah, I’ll fix it this week. The problem is that you can’t check the source 
> as being default as the default is ssh - so the only way to get the current 
> code to check for qrsh is to specify something other than the default ssh (it 
> doesn’t matter what you specify - anything will get you past the erroneous 
> check so you look for qrsh).
> 
> 
>> On Feb 9, 2017, at 3:21 PM, Jeff Squyres (jsquyres) wrote:
>> 
>> Yes, we can get it fixed.
>> 
>> Ralph is unavailable this week; I don't know offhand what he meant by his 
>> prior remarks.  It's possible that the cause is 
>> https://github.com/open-mpi/ompi/commit/71ec5cfb436977ea9ad409ba634d27e6addf6fae;
>> can you try changing the "!=" in that commit's test to "=="?  I.e., from
>> 
>> if (MCA_BASE_VAR_SOURCE_DEFAULT != source) {
>> 
>> to
>> 
>> if (MCA_BASE_VAR_SOURCE_DEFAULT == source) {
>> 
>> I filed https://github.com/open-mpi/ompi/issues/2947 to track the issue.
>> 
>> 
>>> On Feb 9, 2017, at 6:01 PM, Glenn Johnson  wrote:
>>> 
>>> Will this be fixed in the 2.0.3 release?
>>> 
>>> Thanks.
>>> 
>>> 
>>> Glenn
>>> 
>>> On Mon, Feb 6, 2017 at 10:45 AM, Mark Dixon  wrote:
>>> On Mon, 6 Feb 2017, Mark Dixon wrote:
>>> ...
>>> Ah-ha! "-mca plm_rsh_agent foo" fixes it!
>>> 
>>> Thanks very much - presumably I can stick that in the system-wide 
>>> openmpi-mca-params.conf for now.
>>> ...
>>> 
>>> Except if I do that, it means running ompi outside of the SGE environment 
>>> no longer works :(
>>> 
>>> Should I just revoke the following commit?
>>> 
>>> Cheers,
>>> 
>>> Mark
>>> 
>>> commit d51c2af76b0c011177aca8e08a5a5fcf9f5e67db
>>> Author: Jeff Squyres 
>>> Date:   Tue Aug 16 06:58:20 2016 -0500
>>> 
>>>   rsh: robustify the check for plm_rsh_agent default value
>>> 
>>>   Don't strcmp against the default value -- the default value may change
>>>   over time.  Instead, check to see if the MCA var source is not
>>>   DEFAULT.
>>> 
>>>   Signed-off-by: Jeff Squyres 
>>> 
>>>   (cherry picked from commit 
>>> open-mpi/ompi@71ec5cfb436977ea9ad409ba634d27e6addf6fae)
>>> 
>>> 
>>> ___
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>>> 
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> 

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-12 Thread r...@open-mpi.org
Yeah, I’ll fix it this week. The problem is that you can’t check the source as 
being default as the default is ssh - so the only way to get the current code 
to check for qrsh is to specify something other than the default ssh (it 
doesn’t matter what you specify - anything will get you past the erroneous 
check so you look for qrsh).


> On Feb 9, 2017, at 3:21 PM, Jeff Squyres (jsquyres) wrote:
> 
> Yes, we can get it fixed.
> 
> Ralph is unavailable this week; I don't know offhand what he meant by his 
> prior remarks.  It's possible that the cause is 
> https://github.com/open-mpi/ompi/commit/71ec5cfb436977ea9ad409ba634d27e6addf6fae;
> can you try changing the "!=" in that commit's test to "=="?  I.e., from
> 
> if (MCA_BASE_VAR_SOURCE_DEFAULT != source) {
> 
> to
> 
> if (MCA_BASE_VAR_SOURCE_DEFAULT == source) {
> 
> I filed https://github.com/open-mpi/ompi/issues/2947 to track the issue.
> 
> 
>> On Feb 9, 2017, at 6:01 PM, Glenn Johnson  wrote:
>> 
>> Will this be fixed in the 2.0.3 release?
>> 
>> Thanks.
>> 
>> 
>> Glenn
>> 
>> On Mon, Feb 6, 2017 at 10:45 AM, Mark Dixon  wrote:
>> On Mon, 6 Feb 2017, Mark Dixon wrote:
>> ...
>> Ah-ha! "-mca plm_rsh_agent foo" fixes it!
>> 
>> Thanks very much - presumably I can stick that in the system-wide 
>> openmpi-mca-params.conf for now.
>> ...
>> 
>> Except if I do that, it means running ompi outside of the SGE environment no 
>> longer works :(
>> 
>> Should I just revoke the following commit?
>> 
>> Cheers,
>> 
>> Mark
>> 
>> commit d51c2af76b0c011177aca8e08a5a5fcf9f5e67db
>> Author: Jeff Squyres 
>> Date:   Tue Aug 16 06:58:20 2016 -0500
>> 
>>   rsh: robustify the check for plm_rsh_agent default value
>> 
>>   Don't strcmp against the default value -- the default value may change
>>   over time.  Instead, check to see if the MCA var source is not
>>   DEFAULT.
>> 
>>   Signed-off-by: Jeff Squyres 
>> 
>>   (cherry picked from commit 
>>   open-mpi/ompi@71ec5cfb436977ea9ad409ba634d27e6addf6fae)
>> 
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-09 Thread Jeff Squyres (jsquyres)
Yes, we can get it fixed.

Ralph is unavailable this week; I don't know offhand what he meant by his prior 
remarks.  It's possible that the cause is 
https://github.com/open-mpi/ompi/commit/71ec5cfb436977ea9ad409ba634d27e6addf6fae;
can you try changing the "!=" in that commit's test to "=="?  I.e., from

if (MCA_BASE_VAR_SOURCE_DEFAULT != source) {

to

if (MCA_BASE_VAR_SOURCE_DEFAULT == source) {

I filed https://github.com/open-mpi/ompi/issues/2947 to track the issue.


> On Feb 9, 2017, at 6:01 PM, Glenn Johnson  wrote:
> 
> Will this be fixed in the 2.0.3 release?
> 
> Thanks.
> 
> 
> Glenn
> 
> On Mon, Feb 6, 2017 at 10:45 AM, Mark Dixon  wrote:
> On Mon, 6 Feb 2017, Mark Dixon wrote:
> ...
> Ah-ha! "-mca plm_rsh_agent foo" fixes it!
> 
> Thanks very much - presumably I can stick that in the system-wide 
> openmpi-mca-params.conf for now.
> ...
> 
> Except if I do that, it means running ompi outside of the SGE environment no 
> longer works :(
> 
> Should I just revoke the following commit?
> 
> Cheers,
> 
> Mark
> 
> commit d51c2af76b0c011177aca8e08a5a5fcf9f5e67db
> Author: Jeff Squyres 
> Date:   Tue Aug 16 06:58:20 2016 -0500
> 
> rsh: robustify the check for plm_rsh_agent default value
> 
> Don't strcmp against the default value -- the default value may change
> over time.  Instead, check to see if the MCA var source is not
> DEFAULT.
> 
> Signed-off-by: Jeff Squyres 
> 
> (cherry picked from commit 
> open-mpi/ompi@71ec5cfb436977ea9ad409ba634d27e6addf6fae)
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-09 Thread Glenn Johnson
Will this be fixed in the 2.0.3 release?

Thanks.


Glenn

On Mon, Feb 6, 2017 at 10:45 AM, Mark Dixon  wrote:

> On Mon, 6 Feb 2017, Mark Dixon wrote:
> ...
>
>> Ah-ha! "-mca plm_rsh_agent foo" fixes it!
>>
>> Thanks very much - presumably I can stick that in the system-wide
>> openmpi-mca-params.conf for now.
>>
> ...
>
> Except if I do that, it means running ompi outside of the SGE environment
> no longer works :(
>
> Should I just revoke the following commit?
>
> Cheers,
>
> Mark
>
> commit d51c2af76b0c011177aca8e08a5a5fcf9f5e67db
> Author: Jeff Squyres 
> Date:   Tue Aug 16 06:58:20 2016 -0500
>
> rsh: robustify the check for plm_rsh_agent default value
>
> Don't strcmp against the default value -- the default value may change
> over time.  Instead, check to see if the MCA var source is not
> DEFAULT.
>
> Signed-off-by: Jeff Squyres 
>
> (cherry picked from commit
> open-mpi/ompi@71ec5cfb436977ea9ad409ba634d27e6addf6fae)
>
>

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-06 Thread Mark Dixon

On Mon, 6 Feb 2017, Mark Dixon wrote:
...

Ah-ha! "-mca plm_rsh_agent foo" fixes it!

Thanks very much - presumably I can stick that in the system-wide 
openmpi-mca-params.conf for now.

...

Except if I do that, it means running ompi outside of the SGE environment 
no longer works :(


Should I just revoke the following commit?

Cheers,

Mark

commit d51c2af76b0c011177aca8e08a5a5fcf9f5e67db
Author: Jeff Squyres 
Date:   Tue Aug 16 06:58:20 2016 -0500

rsh: robustify the check for plm_rsh_agent default value

Don't strcmp against the default value -- the default value may change
over time.  Instead, check to see if the MCA var source is not
DEFAULT.

Signed-off-by: Jeff Squyres 

(cherry picked from commit 
open-mpi/ompi@71ec5cfb436977ea9ad409ba634d27e6addf6fae)



Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-06 Thread Mark Dixon

On Fri, 3 Feb 2017, r...@open-mpi.org wrote:

I do see a diff between 2.0.1 and 2.0.2 that might have a related 
impact. The way we handled the MCA param that specifies the launch agent 
(ssh, rsh, or whatever) was modified, and I don’t think the change is 
correct. It basically says that we don’t look for qrsh unless the MCA 
param has been changed from the coded default, which means we are not 
detecting SGE by default.


Try setting "-mca plm_rsh_agent foo" on your cmd line - that will get 
past the test, and then we should auto-detect SGE again

...

Ah-ha! "-mca plm_rsh_agent foo" fixes it!

Thanks very much - presumably I can stick that in the system-wide 
openmpi-mca-params.conf for now.


Cheers,

Mark

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread r...@open-mpi.org
I don’t think so - at least, that isn’t the code I was looking at.

> On Feb 3, 2017, at 9:43 AM, Glenn Johnson  wrote:
> 
> Is this the same issue that was previously fixed in PR-1960?
> 
> https://github.com/open-mpi/ompi/pull/1960/files 
> 
> 
> 
> Glenn
> 
> On Fri, Feb 3, 2017 at 10:56 AM, r...@open-mpi.org wrote:
> I do see a diff between 2.0.1 and 2.0.2 that might have a related impact. The 
> way we handled the MCA param that specifies the launch agent (ssh, rsh, or 
> whatever) was modified, and I don’t think the change is correct. It basically 
> says that we don’t look for qrsh unless the MCA param has been changed from 
> the coded default, which means we are not detecting SGE by default.
> 
> Try setting "-mca plm_rsh_agent foo" on your cmd line - that will get past 
> the test, and then we should auto-detect SGE again
> 
> 
> > On Feb 3, 2017, at 8:49 AM, Mark Dixon wrote:
> >
> > On Fri, 3 Feb 2017, Reuti wrote:
> > ...
> >> SGE on its own is not configured to use SSH? (I mean the entries in `qconf 
> >> -sconf` for rsh_command resp. daemon).
> > ...
> >
> > Nope, everything left as the default:
> >
> > $ qconf -sconf | grep _command
> > qlogin_command   builtin
> > rlogin_command   builtin
> > rsh_command  builtin
> >
> > I have 2.0.1 and 2.0.2 installed side by side. 2.0.1 is happy but 2.0.2 
> > isn't.
> >
> > I'll start digging, but I'd appreciate hearing from any other SGE user who 
> > had tried 2.0.2 and tell me if it had worked for them, please? :)
> >
> > Cheers,
> >
> > Mark

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread Glenn Johnson
Is this the same issue that was previously fixed in PR-1960?

https://github.com/open-mpi/ompi/pull/1960/files


Glenn

On Fri, Feb 3, 2017 at 10:56 AM, r...@open-mpi.org  wrote:

> I do see a diff between 2.0.1 and 2.0.2 that might have a related impact.
> The way we handled the MCA param that specifies the launch agent (ssh, rsh,
> or whatever) was modified, and I don’t think the change is correct. It
> basically says that we don’t look for qrsh unless the MCA param has been
> changed from the coded default, which means we are not detecting SGE by
> default.
>
> Try setting "-mca plm_rsh_agent foo" on your cmd line - that will get past
> the test, and then we should auto-detect SGE again
>
>
> > On Feb 3, 2017, at 8:49 AM, Mark Dixon  wrote:
> >
> > On Fri, 3 Feb 2017, Reuti wrote:
> > ...
> > >> SGE on its own is not configured to use SSH? (I mean the entries in
> > >> `qconf -sconf` for rsh_command resp. daemon).
> > ...
> >
> > Nope, everything left as the default:
> >
> > $ qconf -sconf | grep _command
> > qlogin_command   builtin
> > rlogin_command   builtin
> > rsh_command  builtin
> >
> > I have 2.0.1 and 2.0.2 installed side by side. 2.0.1 is happy but 2.0.2
> > isn't.
> >
> > I'll start digging, but I'd appreciate hearing from any other SGE user
> > who had tried 2.0.2 and tell me if it had worked for them, please? :)
> >
> > Cheers,
> >
> > Mark

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread r...@open-mpi.org
I do see a diff between 2.0.1 and 2.0.2 that might have a related impact. The 
way we handled the MCA param that specifies the launch agent (ssh, rsh, or 
whatever) was modified, and I don’t think the change is correct. It basically 
says that we don’t look for qrsh unless the MCA param has been changed from the 
coded default, which means we are not detecting SGE by default.

Try setting "-mca plm_rsh_agent foo" on your cmd line - that will get past the 
test, and then we should auto-detect SGE again
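
If editing every mpirun command line is inconvenient, the same parameter can 
be set per job through the environment, since Open MPI reads MCA parameters 
from variables prefixed with OMPI_MCA_. A minimal sketch for an SGE job script, 
assuming a POSIX shell; the value foo is the same arbitrary placeholder, the 
mpirun line is illustrative only, and it is an assumption here that a value 
set this way counts as non-default for the check in question:

```shell
# Set this only inside the SGE job script, so that runs outside SGE
# keep the default agent.
export OMPI_MCA_plm_rsh_agent=foo

# Illustrative launch; my_app is a placeholder.
# mpirun ./my_app
```

Keeping the setting in the job script rather than in a system-wide file limits 
its effect to jobs that actually run under SGE.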


> On Feb 3, 2017, at 8:49 AM, Mark Dixon  wrote:
> 
> On Fri, 3 Feb 2017, Reuti wrote:
> ...
>> SGE on its own is not configured to use SSH? (I mean the entries in `qconf 
>> -sconf` for rsh_command resp. daemon).
> ...
> 
> Nope, everything left as the default:
> 
> $ qconf -sconf | grep _command
> qlogin_command   builtin
> rlogin_command   builtin
> rsh_command  builtin
> 
> I have 2.0.1 and 2.0.2 installed side by side. 2.0.1 is happy but 2.0.2 isn't.
> 
> I'll start digging, but I'd appreciate hearing from any other SGE user who 
> had tried 2.0.2 and tell me if it had worked for them, please? :)
> 
> Cheers,
> 
> Mark

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread Mark Dixon

On Fri, 3 Feb 2017, Reuti wrote:
...
SGE on its own is not configured to use SSH? (I mean the entries in 
`qconf -sconf` for rsh_command resp. daemon).

...

Nope, everything left as the default:

$ qconf -sconf | grep _command
qlogin_command   builtin
rlogin_command   builtin
rsh_command  builtin

I have 2.0.1 and 2.0.2 installed side by side. 2.0.1 is happy but 2.0.2 
isn't.


I'll start digging, but I'd appreciate hearing from any other SGE users who 
have tried 2.0.2 and can tell me whether it worked for them, please? :)


Cheers,

Mark


Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread Reuti
Hi,

> On 03.02.2017, at 17:10, Mark Dixon wrote:
> 
> Hi,
> 
> Just tried upgrading from 2.0.1 to 2.0.2 and I'm getting error messages that 
> look like openmpi is using ssh to login to remote nodes instead of qrsh (see 
> below). Has anyone else noticed gridengine integration being broken, or am I 
> being dumb?
> 
> I built with "./configure 
> --prefix=/apps/developers/libraries/openmpi/2.0.2/1/intel-17.0.1 --with-sge 
> --with-io-romio-flags=--with-file-system=lustre+ufs --enable-mpi-cxx 
> --with-cma"

Is SGE itself not configured to use SSH? (I mean the entries in `qconf 
-sconf` for rsh_command and rsh_daemon, respectively.)

-- Reuti


> Can see the gridengine component via:
> 
> $ ompi_info -a | grep gridengine
> MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v2.0.2)
>  MCA ras gridengine: ---
>  MCA ras gridengine: parameter "ras_gridengine_priority" (current value: 
> "100", data source: default, level: 9 dev/all, type: int)
>  Priority of the gridengine ras component
>  MCA ras gridengine: parameter "ras_gridengine_verbose" (current value: 
> "0", data source: default, level: 9 dev/all, type: int)
>  Enable verbose output for the gridengine ras 
> component
>  MCA ras gridengine: parameter "ras_gridengine_show_jobid" (current 
> value: "false", data source: default, level: 9 dev/all, type: bool)
> 
> Cheers,
> 
> Mark
> 
> ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
> Permission denied, please try again.
> ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
> Permission denied, please try again.
> ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password,hostbased).
> --
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> 
> * not finding the required libraries and/or binaries on
>  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>  settings, or configure OMPI with --enable-orterun-prefix-by-default
> 
> * lack of authority to execute on one or more specified nodes.
>  Please verify your allocation and authorities.
> 
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>  Please check with your sys admin to determine the correct location to use.
> 
> *  compilation of the orted with dynamic libraries when static are required
>  (e.g., on Cray). Please check your configure cmd line and consider using
>  one of the contrib/platform definitions for your system type.
> 
> * an inability to create a connection back to mpirun due to a
>  lack of common network interfaces and/or no route found between
>  them. Please check network connectivity (including firewalls
>  and network routing requirements).
> --
> 


