Re: [O-MPI devel] compilation problem

2006-02-09 Thread Gleb Natapov
On Wed, Feb 08, 2006 at 07:08:35AM -0700, Ralph H. Castain wrote:
> Hi Gleb
> 
> I just checked out another copy of the trunk and cannot replicate 
> this problem. Could you take out a fresh trunk and see if it works 
> for you? Could be something just got out of sync on your current 
> checkout (seen that happen before where svn gets files out of sync or 
> can even "lose" them) - I suspect that is the case here.
> 
Fresh checkout works OK, thanks.

> Ralph
> 
> 
> At 01:55 AM 2/8/2006, you wrote:
> >Hello, I have a problem compiling the latest trunk, even after running
> >./autogen.sh.
> >I've got the following error:
> >
> >  gcc -DHAVE_CONFIG_H -I. -I../../../../ompi/orte/mca/ns
> >  -I../../../include -I../../../include -I../../../../ompi/include
> >  -I../../../../ompi -I../../.. -I../../../include
> >  -I../../../../ompi/opal -I../../../../ompi/orte -I../../../../ompi/ompi
> >  -O3 -DNDEBUG -fno-strict-aliasing -pthread -MT
> >  base/ns_base_local_fns.lo -MD -MP -MF base/.deps/ns_base_local_fns.Tpo
> >  -c ../../../../ompi/orte/mca/ns/base/ns_base_local_fns.c  -fPIC -DPIC
> >  -o base/.libs/ns_base_local_fns.o
> >  make[2]: *** No rule to make target
> >  `base/data_type_support/ns_data_type_compare_fns.c', needed by
> >  `base/data_type_support/ns_data_type_compare_fns.lo'.  Stop.
> >  make[2]: Leaving directory
> >  `/export/home/glebn/OpenMPI/build/orte/mca/ns'
> >  make[1]: *** [all-recursive] Error 1
> >  make[1]: Leaving directory `/export/home/glebn/OpenMPI/build/orte'
> >  make: *** [all-recursive] Error 1
> >
> >Thanks,
> >
> >--
> > Gleb.
> >___
> >devel mailing list
> >de...@open-mpi.org
> >http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 

--
Gleb.


Re: [O-MPI devel] Modification to triggers

2006-02-09 Thread Brian Barrett

On Feb 8, 2006, at 12:46 PM, Ralph H. Castain wrote:


In addition, I took advantage of the change to fix something Brian
had flagged in the orte/mca/rmgr/urm/rmgr_urm.c file where he noted
that the wireup of stdin for io forwarding should occur at the LAUNCH
stage (as opposed to the STG1 stage gate where it was occurring).
Given the availability of the new triggers, I changed that to conform
to his noted request.

Brian: please check that code to ensure I did this correctly.


I can't figure out exactly what is going on, but it looks like this  
change broke standard input forwarding.  I currently have it traced  
back (via printf debugging) to the fact that the  
orte_rmgr_urm_wireup_callback() callback never gets triggered in  
mpirun, so the wireup_stdin() function is never called and we never  
start pushing mpirun's standard input into the iof system.


At that point, we fall into parts of the code with which I'm not too  
familiar, so I have to hand this one back to you ;).


Brian


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




[O-MPI devel] required automake version...

2006-02-09 Thread Brian Barrett

Hey all -

A question for the group...  I'm working on the build system changes  
needed for the project split.  I'm currently running into a bug in  
Automake 1.9.5 and older that is causing me to have to do some fairly  
nasty workarounds.  The bug was fixed in AM 1.9.6, which has been out  
for a couple of months now.


I was wondering what people would think about us upgrading our  
requirements for the AM version to 1.9.6 for developers, so that I  
don't have to add a bunch of nasty workarounds into the build  
system.  As usual, this would only affect developers -- users that  
build from tarballs do not have to have AM installed.


Comments?  If not, I'll assume that means you're all cool with me  
making you all upgrade ;).


Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




Re: [O-MPI devel] Modification to triggers

2006-02-09 Thread Ralph H. Castain
Hmmmyuck! I'll take a look - will set it back to what it was 
before in the interim.


Thanks
Ralph

At 07:05 AM 2/9/2006, you wrote:

On Feb 8, 2006, at 12:46 PM, Ralph H. Castain wrote:

> In addition, I took advantage of the change to fix something Brian
> had flagged in the orte/mca/rmgr/urm/rmgr_urm.c file where he noted
> that the wireup of stdin for io forwarding should occur at the LAUNCH
> stage (as opposed to the STG1 stage gate where it was occurring).
> Given the availability of the new triggers, I changed that to conform
> to his noted request.
>
> Brian: please check that code to ensure I did this correctly.

I can't figure out exactly what is going on, but it looks like this
change broke standard input forwarding.  I currently have it traced
back (via printf debugging) to the fact that the
orte_rmgr_urm_wireup_callback() callback never gets triggered in
mpirun, so the wireup_stdin() function is never called and we never
start pushing mpirun's standard input into the iof system.

At that point, we fall into parts of the code with which I'm not too
familiar, so I have to hand this one back to you ;).

Brian


--
   Brian Barrett
   Open MPI developer
   http://www.open-mpi.org/







Re: [O-MPI devel] required automake version...

2006-02-09 Thread Ralph H. Castain

Sounds fine with me - whatever makes the job easier for you.

Ralph

At 07:16 AM 2/9/2006, you wrote:

Hey all -

A question for the group...  I'm working on the build system changes
needed for the project split.  I'm currently running into a bug in
Automake 1.9.5 and older that is causing me to have to do some fairly
nasty workarounds.  The bug was fixed in AM 1.9.6, which has been out
for a couple of months now.

I was wondering what people would think about us upgrading our
requirements for the AM version to 1.9.6 for developers, so that I
don't have to add a bunch of nasty workarounds into the build
system.  As usual, this would only affect developers -- users that
build from tarballs do not have to have AM installed.

Comments?  If not, I'll assume that means you're all cool with me
making you all upgrade ;).

Brian

--
   Brian Barrett
   Open MPI developer
   http://www.open-mpi.org/







Re: [O-MPI devel] Modification to triggers

2006-02-09 Thread Ralph H. Castain
Okay, it turned out that the counters were not being adjusted as 
processes hit the INIT and LAUNCHED stages - just a case where that 
hadn't been implemented yet. I've fixed that now (it was easier to 
fix than go back) and the wireup_stdin function is now being called.
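
The failure mode described above can be sketched in C. This is a minimal illustration, not ORTE code: `stage_trigger_t`, `stage_trigger_init`, `stage_trigger_proc_arrived`, and `count_calls` are hypothetical names, and the assumption is that a trigger fires its callback once a per-stage counter reaches the number of processes in the job. If nothing increments the counter, the callback (standing in here for the path that leads to wireup_stdin()) never runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a stage-gate trigger; not the ORTE API. */
typedef void (*trigger_cb_fn)(void *cbdata);

typedef struct {
    int num_procs;     /* processes in the job */
    int num_at_stage;  /* processes that have reported reaching the stage */
    int fired;         /* nonzero once the callback has run */
    trigger_cb_fn cb;  /* callback to run when all processes have arrived */
    void *cbdata;      /* opaque pointer handed to the callback */
} stage_trigger_t;

/* Prepare a trigger for a job of num_procs processes. */
void stage_trigger_init(stage_trigger_t *t, int num_procs,
                        trigger_cb_fn cb, void *cbdata)
{
    assert(t != NULL && num_procs > 0);
    t->num_procs = num_procs;
    t->num_at_stage = 0;
    t->fired = 0;
    t->cb = cb;
    t->cbdata = cbdata;
}

/* Must be called each time a process reaches the stage. If it is never
 * called, the counter stays at zero and the callback never fires.
 * Returns 1 on the call that fires the callback, 0 otherwise. */
int stage_trigger_proc_arrived(stage_trigger_t *t)
{
    assert(t != NULL);
    t->num_at_stage++;
    if (!t->fired && t->num_at_stage >= t->num_procs) {
        t->fired = 1;
        if (t->cb != NULL) {
            t->cb(t->cbdata);
        }
        return 1;
    }
    return 0;
}

/* Example callback: counts how many times it has been invoked. */
void count_calls(void *cbdata)
{
    ++*(int *)cbdata;
}
```

With a two-process job, the first call to stage_trigger_proc_arrived() returns 0 and the second returns 1 and runs the callback. Skipping those calls leaves the callback uninvoked, which is consistent with the symptom Brian traced.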


Brian: can you verify that things are working correctly now?

Thanks
Ralph


At 07:40 AM 2/9/2006, you wrote:

Hmmmyuck! I'll take a look - will set it back to what it was
before in the interim.

Thanks
Ralph

At 07:05 AM 2/9/2006, you wrote:
>On Feb 8, 2006, at 12:46 PM, Ralph H. Castain wrote:
>
> > In addition, I took advantage of the change to fix something Brian
> > had flagged in the orte/mca/rmgr/urm/rmgr_urm.c file where he noted
> > that the wireup of stdin for io forwarding should occur at the LAUNCH
> > stage (as opposed to the STG1 stage gate where it was occurring).
> > Given the availability of the new triggers, I changed that to conform
> > to his noted request.
> >
> > Brian: please check that code to ensure I did this correctly.
>
>I can't figure out exactly what is going on, but it looks like this
>change broke standard input forwarding.  I currently have it traced
>back (via printf debugging) to the fact that the
>orte_rmgr_urm_wireup_callback() callback never gets triggered in
>mpirun, so the wireup_stdin() function is never called and we never
>start pushing mpirun's standard input into the iof system.
>
>At that point, we fall into parts of the code with which I'm not too
>familiar, so I have to hand this one back to you ;).
>
>Brian
>
>
>--
>Brian Barrett
>Open MPI developer
>http://www.open-mpi.org/
>
>







Re: [O-MPI devel] required automake version...

2006-02-09 Thread Jeff Squyres

On Feb 9, 2006, at 9:56 AM, Ralph H. Castain wrote:


Sounds fine with me - whatever makes the job easier for you.


Ditto.  I think the system is complex enough that adding workarounds  
for known bugs in AM (especially ones that have been fixed and are  
available to us) is not worthwhile.


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/




Re: [O-MPI devel] Modification to triggers

2006-02-09 Thread Brian Barrett

That did the trick for me.  Thanks!

Brian

On Feb 9, 2006, at 10:40 AM, Ralph H. Castain wrote:


Okay, it turned out that the counters were not being adjusted as
processes hit the INIT and LAUNCHED stages - just a case where that
hadn't been implemented yet. I've fixed that now (it was easier to
fix than go back) and the wireup_stdin function is now being called.

Brian: can you verify that things are working correctly now?

Thanks
Ralph


At 07:40 AM 2/9/2006, you wrote:

Hmmmyuck! I'll take a look - will set it back to what it was
before in the interim.

Thanks
Ralph

At 07:05 AM 2/9/2006, you wrote:

On Feb 8, 2006, at 12:46 PM, Ralph H. Castain wrote:


In addition, I took advantage of the change to fix something Brian
had flagged in the orte/mca/rmgr/urm/rmgr_urm.c file where he noted
that the wireup of stdin for io forwarding should occur at the LAUNCH
stage (as opposed to the STG1 stage gate where it was occurring).
Given the availability of the new triggers, I changed that to conform
to his noted request.

Brian: please check that code to ensure I did this correctly.


I can't figure out exactly what is going on, but it looks like this
change broke standard input forwarding.  I currently have it traced
back (via printf debugging) to the fact that the
orte_rmgr_urm_wireup_callback() callback never gets triggered in
mpirun, so the wireup_stdin() function is never called and we never
start pushing mpirun's standard input into the iof system.

At that point, we fall into parts of the code with which I'm not too
familiar, so I have to hand this one back to you ;).

Brian


--
   Brian Barrett
   Open MPI developer
   http://www.open-mpi.org/






Re: [O-MPI devel] Alpha 4 and job state transitions

2006-02-09 Thread Jeff Squyres

Nathan --

Ralph and I talked about this and decided not to bring it over to the  
1.0 branch -- the fix uses new functionality that exists on the trunk  
and not in the 1.0 branch.  The fix could be re-crafted to use  
existing functionality on the 1.0 branch (we're really trying to only  
put bug fixes on the 1.0 branch -- not any new functionality) -- but  
we didn't know if you cared.  :-)


Do you mind if this fix stays on the trunk, or do you need it in the  
v1.0 branch?




On Feb 8, 2006, at 4:36 PM, Nathan DeBardeleben wrote:


Thanks Ralph.

-- Nathan
Correspondence
-
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
-



Ralph H. Castain wrote:

Nathan

This should now be fixed on the trunk. Once it is checked out more
thoroughly, I'll ask that it be moved to the 1.0 branch. For now, you
might want to check out the trunk and verify it meets your needs.

Ralph

At 03:05 PM 2/1/2006, you wrote:

This was happening on Alpha 1 as well but I upgraded today to Alpha 4 to
see if it's gone away - it has not.

I register a callback on a spawn() inside ORTE.  That callback includes
the current state and should be called as the job goes through those states.

I am now noticing that jobs never go through the INIT state.  They may
also not go through others but definitely not ORTE_PROC_STATE_INIT.

I was registering the IOForwarding callback during the INIT phase so,
consequentially, I now do not have IOF.  There are other side effects
such as jobs that I start I think are perpetually in the 'starting'
state and then, suddenly, they're done.

Can someone look into / comment on this please?

Thanks.

--
-- Nathan
Correspondence
 
-

Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
 
-





--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/




Re: [O-MPI devel] Alpha 4 and job state transitions

2006-02-09 Thread Nathan DeBardeleben
I've coded a hacky workaround in our code to get past this.  Basically, 
I capture all of the state transitions, and on the first one fired for a 
job I fire the 'init' state internally in our tool.  Generally this occurs 
for one of the gate transitions, G1 or something.  It'll work this way.
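
A workaround of this kind might look roughly like the following C sketch. It is not Nathan's actual code and does not use ORTE's types: `init_shim_t`, `init_shim_setup`, `init_shim_on_state`, `count_init`, the `job_state_t` values, and the `MAX_JOBS` limit are all assumptions made for illustration. The idea is to remember, per job, whether any state has been seen yet, and to deliver a synthetic INIT ahead of the first real notification if that notification is not itself INIT.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_JOBS 64  /* arbitrary limit chosen for this sketch */

/* Hypothetical state values for illustration; not ORTE's constants. */
typedef enum { STATE_INIT, STATE_GATE1, STATE_RUNNING, STATE_DONE } job_state_t;

typedef void (*state_handler_fn)(int jobid, job_state_t state, void *cbdata);

typedef struct {
    int init_seen[MAX_JOBS];   /* nonzero once a job has had its first state */
    state_handler_fn handler;  /* the tool's real state handler */
    void *cbdata;              /* opaque pointer handed to the handler */
} init_shim_t;

/* Reset the per-job bookkeeping and remember the tool's handler. */
void init_shim_setup(init_shim_t *s, state_handler_fn handler, void *cbdata)
{
    int i;
    assert(s != NULL && handler != NULL);
    for (i = 0; i < MAX_JOBS; i++) {
        s->init_seen[i] = 0;
    }
    s->handler = handler;
    s->cbdata = cbdata;
}

/* Feed every state transition through this function. On the first
 * transition seen for a job, a synthetic INIT is delivered first unless
 * the transition is already INIT. Returns the number of handler calls
 * made (1 or 2), or -1 if jobid is outside the tracked range. */
int init_shim_on_state(init_shim_t *s, int jobid, job_state_t state)
{
    int delivered = 0;
    assert(s != NULL);
    if (jobid < 0 || jobid >= MAX_JOBS) {
        return -1;
    }
    if (!s->init_seen[jobid]) {
        s->init_seen[jobid] = 1;
        if (state != STATE_INIT) {
            s->handler(jobid, STATE_INIT, s->cbdata);
            delivered++;
        }
    }
    s->handler(jobid, state, s->cbdata);
    delivered++;
    return delivered;
}

/* Example handler: counts how many INIT notifications the tool sees. */
void count_init(int jobid, job_state_t state, void *cbdata)
{
    (void)jobid;
    if (state == STATE_INIT) {
        ++*(int *)cbdata;
    }
}
```

Because the shim checks whether the first notification is already INIT, it would deliver exactly one INIT per job both before and after the real INIT state starts firing again.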


Furthermore, we're telling our users to get your 1.0.2a4 (or whatever 
1.0.2 is available at the time).


The way I coded it, when you guys put this into the main branch and the 
INIT state resumes firing, my code will start working that much 
better.  I really only brought it up because I felt it was a bug you 
might not have been aware of.


Thanks all.

-- Nathan
Correspondence
-
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
-



Jeff Squyres wrote:

Nathan --

Ralph and I talked about this and decided not to bring it over to the  
1.0 branch -- the fix uses new functionality that exists on the trunk  
and not in the 1.0 branch.  The fix could be re-crafted to use  
existing functionality on the 1.0 branch (we're really trying to only  
put bug fixes on the 1.0 branch -- not any new functionality) -- but  
we didn't know if you cared.  :-)


Do you mind if this fix stays on the trunk, or do you need it in the  
v1.0 branch?




On Feb 8, 2006, at 4:36 PM, Nathan DeBardeleben wrote:

  

Thanks Ralph.

-- Nathan
Correspondence
-
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
-



Ralph H. Castain wrote:


Nathan

This should now be fixed on the trunk. Once it is checked out more
thoroughly, I'll ask that it be moved to the 1.0 branch. For now, you
might want to check out the trunk and verify it meets your needs.

Ralph

At 03:05 PM 2/1/2006, you wrote:

  
This was happening on Alpha 1 as well but I upgraded today to Alpha 4 to
see if it's gone away - it has not.

I register a callback on a spawn() inside ORTE.  That callback includes
the current state and should be called as the job goes through those states.

I am now noticing that jobs never go through the INIT state.  They may
also not go through others but definitely not ORTE_PROC_STATE_INIT.

I was registering the IOForwarding callback during the INIT phase so,
consequentially, I now do not have IOF.  There are other side effects
such as jobs that I start I think are perpetually in the 'starting'
state and then, suddenly, they're done.

Can someone look into / comment on this please?

Thanks.

--
-- Nathan
Correspondence
 
-

Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
 
-

