Re: [OMPI devel] [O-MPI devel] Alpha 4 and job state transitions

2006-02-13 Thread Ralph H. Castain

I wonder if this is going to create a problem?

Tim/Brian/you I/O forwarding folks: This poses an interesting 
question. We automatically wire up i/o forwarding in our spawn 
routine. What happens when someone sets up their own i/o forwarding 
callback and subsequently wires up stdio themselves? Does this 
overwrite what we did, do processes receive duplicate copies, does it 
generate an error, ...?
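
To make the three possible outcomes in the question concrete, here is a toy model in C. It is only a sketch under stated assumptions: struct stream, stream_subscribe, stream_deliver, count_sink and the dup_policy values are hypothetical names invented for illustration. This is not ORTE's I/O forwarding code, and it says nothing about which behavior ORTE actually has.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SINKS 4

/* Hypothetical callback type: receives forwarded output plus user context. */
typedef void (*sink_fn)(const char *data, void *ctx);

/* What to do when a stream already has a subscriber. */
enum dup_policy {
    POLICY_REPLACE, /* the new subscriber overwrites the earlier one */
    POLICY_APPEND,  /* both stay registered, so each gets a copy     */
    POLICY_REJECT   /* the second registration is an error           */
};

struct stream {
    sink_fn sinks[MAX_SINKS];
    void   *ctx[MAX_SINKS];
    size_t  count;
};

/* Returns 0 on success, -1 if the registration is rejected or the table is full. */
static int stream_subscribe(struct stream *s, sink_fn fn, void *ctx,
                            enum dup_policy policy)
{
    assert(s != NULL && fn != NULL);
    if (s->count > 0) {
        if (policy == POLICY_REJECT)
            return -1;
        if (policy == POLICY_REPLACE)
            s->count = 0; /* drop every earlier subscriber */
    }
    if (s->count == MAX_SINKS)
        return -1;
    s->sinks[s->count] = fn;
    s->ctx[s->count] = ctx;
    s->count++;
    return 0;
}

/* Hand one chunk of output to every registered subscriber. */
static void stream_deliver(const struct stream *s, const char *data)
{
    for (size_t i = 0; i < s->count; i++)
        s->sinks[i](data, s->ctx[i]);
}

/* Example sink: counts deliveries in the int that ctx points to. */
static void count_sink(const char *data, void *ctx)
{
    (void)data;
    ++*(int *)ctx;
}
```

With POLICY_APPEND, a tool that subscribes after the spawn routine has already done so would see every chunk delivered to both sinks. With POLICY_REPLACE only the later subscriber receives output, and with POLICY_REJECT the second call fails.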


I gather this is working for Nathan, and I don't claim to fully 
understand what he is doing, but I'm curious as to what might happen 
since I don't see anything in the system to prevent someone doing 
this (not sure we could anyway).


Ralph


At 02:32 PM 2/9/2006, you wrote:

I've coded a hacky workaround in our code to get past this.  Basically,
I capture all of the state transitions and, on the first one fired for a
job, I fire the 'init' state internally in our tool.  Generally this
occurs for one of the gate transitions, G1 or something.  It'll work this
way.
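
A minimal sketch of this kind of workaround in C, assuming a tool-side callback that is handed every job state transition. The enum values, on_transition, handle_state, and the fixed-size job table are hypothetical stand-ins for illustration; they are not ORTE identifiers and not the actual tool code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical state codes; stand-ins, not ORTE's real values. */
enum job_state {
    STATE_INIT,
    STATE_GATE1,
    STATE_RUNNING,
    STATE_TERMINATED,
    STATE_COUNT
};

#define MAX_JOBS 16

/* Per-job flag: has the tool already processed INIT (real or synthesized)? */
static bool init_done[MAX_JOBS];

/* How many times the tool handled each state per job (kept for inspection). */
static int handled_count[MAX_JOBS][STATE_COUNT];

/* Tool-side handler: the place where INIT work such as setting up
 * I/O forwarding would live. Here it only records and prints. */
static void handle_state(int job, enum job_state s)
{
    handled_count[job][s]++;
    printf("job %d: handling state %d\n", job, (int)s);
}

/* Callback assumed to be invoked once per transition reported by the runtime. */
static void on_transition(int job, enum job_state s)
{
    assert(job >= 0 && job < MAX_JOBS);

    if (!init_done[job]) {
        init_done[job] = true;
        /* The first transition seen was not INIT, so fire INIT ourselves. */
        if (s != STATE_INIT)
            handle_state(job, STATE_INIT);
    }
    handle_state(job, s);
}
```

Because the flag is set on the first transition regardless of which state it is, INIT is handled exactly once per job whether or not the runtime delivers it. The same code therefore keeps working if the INIT transition starts firing again.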

Furthermore, we're telling our users to get your 1.0.2a4 (or whatever
1.0.2 is available at the time).

The way I coded it, when you guys put this into the main branch and the
INIT state resumes firing, my code will start working that much
better.  I really only brought it up because I felt it was a bug you
might not have been aware of.

Thanks all.

-- Nathan
Correspondence
-
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
-



Jeff Squyres wrote:
> Nathan --
>
> Ralph and I talked about this and decided not to bring it over to the
> 1.0 branch -- the fix uses new functionality that exists on the trunk
> and not in the 1.0 branch.  The fix could be re-crafted to use
> existing functionality on the 1.0 branch (we're really trying to only
> put bug fixes on the 1.0 branch -- not any new functionality) -- but
> we didn't know if you cared.  :-)
>
> Do you mind if this fix stays on the trunk, or do you need it in the
> v1.0 branch?
>
>
>
> On Feb 8, 2006, at 4:36 PM, Nathan DeBardeleben wrote:
>
>
>> Thanks Ralph.
>>
>> -- Nathan
>> Correspondence
>> -
>> Nathan DeBardeleben, Ph.D.
>> Los Alamos National Laboratory
>> Parallel Tools Team
>> High Performance Computing Environments
>> phone: 505-667-3428
>> email: ndeb...@lanl.gov
>> -
>>
>>
>>
>> Ralph H. Castain wrote:
>>
>>> Nathan
>>>
>>> This should now be fixed on the trunk. Once it is checked out more
>>> thoroughly, I'll ask that it be moved to the 1.0 branch. For now, you
>>> might want to check out the trunk and verify it meets your needs.
>>>
>>> Ralph
>>>
>>> At 03:05 PM 2/1/2006, you wrote:
>>>
>>>
>>>> This was happening on Alpha 1 as well but I upgraded today to Alpha 4 to
>>>> see if it's gone away - it has not.
>>>>
>>>> I register a callback on a spawn() inside ORTE.  That callback includes
>>>> the current state and should be called as the job goes through those
>>>> states.
>>>>
>>>> I am now noticing that jobs never go through the INIT state.  They may
>>>> also not go through others but definitely not ORTE_PROC_STATE_INIT.
>>>>
>>>> I was registering the IOForwarding callback during the INIT phase so,
>>>> consequently, I now do not have IOF.  There are other side effects
>>>> such as jobs that I start I think are perpetually in the 'starting'
>>>> state and then, suddenly, they're done.
>>>>
>>>> Can someone look into / comment on this please?
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> -- Nathan
>>>> Correspondence
>>>> -
>>>> Nathan DeBardeleben, Ph.D.
>>>> Los Alamos National Laboratory
>>>> Parallel Tools Team
>>>> High Performance Computing Environments
>>>> phone: 505-667-3428
>>>> email: ndeb...@lanl.gov
>>>> -

 ___
 devel mailing list
 de...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/devel







Re: [OMPI devel] [O-MPI devel] Alpha 4 and job state transitions

2006-02-13 Thread Greg Watson
I thought we were wiring up stdio ourselves because it wasn't being  
done in the spawn? If it's now being done by spawn then that's fine,  
but we need to be able to get called back when the I/O becomes  
available. How does this work?


Greg
