Hi Greg

I believe you may have been wiring it up originally because we didn't have that service implemented at that time. We do have it all wired up now - in fact, Brian has done some fairly important cleanup to the system recently.

Since we complete the wiring upon notification of the INIT trigger, I would not advise attaching yourself to that trigger - it could create a race condition as to which callback (yours or ours) is called first. Instead, I would suggest attaching to the LAUNCHED trigger, which occurs next in the sequence. It fires when the procs have all actually been launched, but before they initialize themselves through MPI_Init (assuming they do so).
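
To make the ordering argument concrete, here is a minimal sketch in C. Everything in it (the trigger enum, the registry, the function names) is hypothetical and is not the ORTE API; it only assumes that triggers fire in sequence, INIT first and LAUNCHED next. A callback attached to LAUNCHED then runs after every INIT callback, no matter who subscribed first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical trigger names, fired in this order. Not the real ORTE API. */
typedef enum { TRIG_INIT = 0, TRIG_LAUNCHED = 1, TRIG_COUNT } trigger_t;

typedef void (*trigger_cb)(void *ctx);

#define MAX_CBS 4

/* One small callback list per trigger. */
typedef struct {
    trigger_cb cb[TRIG_COUNT][MAX_CBS];
    void      *ctx[TRIG_COUNT][MAX_CBS];
    size_t     n[TRIG_COUNT];
} registry_t;

/* Shared job state that both callbacks look at. */
typedef struct {
    int io_wired;           /* set by the system's INIT callback */
    int tool_saw_io_wired;  /* what the tool observed when it was called */
} job_t;

void registry_init(registry_t *r)
{
    memset(r, 0, sizeof(*r));
}

void subscribe(registry_t *r, trigger_t t, trigger_cb cb, void *ctx)
{
    assert(r->n[t] < MAX_CBS);
    r->cb[t][r->n[t]]  = cb;
    r->ctx[t][r->n[t]] = ctx;
    r->n[t]++;
}

/* Call every callback attached to one trigger, in registration order. */
void fire(const registry_t *r, trigger_t t)
{
    for (size_t i = 0; i < r->n[t]; i++) {
        r->cb[t][i](r->ctx[t][i]);
    }
}

/* Stand-in for the system's own wiring, done on INIT. */
void system_wire_io(void *ctx)
{
    ((job_t *)ctx)->io_wired = 1;
}

/* Stand-in for the tool's callback, attached to LAUNCHED instead of INIT. */
void tool_on_launched(void *ctx)
{
    job_t *job = (job_t *)ctx;
    job->tool_saw_io_wired = job->io_wired;
}

/* Fire the triggers in sequence: INIT, then LAUNCHED. */
void run_job(const registry_t *r)
{
    for (int t = 0; t < TRIG_COUNT; t++) {
        fire(r, (trigger_t)t);
    }
}
```

In this sketch the tool can subscribe before or after the system does; because its callback sits on the later trigger, it always finds the I/O already wired.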

If that doesn't work for you, I could add a NOTIFY_ME_LAST subscription flag that would ensure your callback is called after any others. This would resolve the race condition and allow you to use the INIT trigger, but it would take a little work on my part to implement before you could use it.
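
As a rough illustration of what such a flag could mean, here is a minimal sketch in C. The flag does not exist yet, and the types and function names below are hypothetical rather than ORTE code. The assumption is simply that firing a trigger makes two passes over its subscribers: ordinary ones first, then those that asked to be notified last.

```c
#include <assert.h>
#include <stddef.h>

#define SUB_DEFAULT        0u
#define SUB_NOTIFY_ME_LAST 1u   /* hypothetical flag, not implemented in ORTE */

#define MAX_SUBS 8

typedef void (*init_cb)(void *ctx);

typedef struct {
    init_cb  cb;
    void    *ctx;
    unsigned flags;
} sub_t;

/* All subscribers of a single trigger (INIT in this sketch). */
typedef struct {
    sub_t  subs[MAX_SUBS];
    size_t count;
} init_trigger_t;

/* Records the order in which callbacks ran, for demonstration only. */
typedef struct {
    char   order[MAX_SUBS + 1];
    size_t len;
} call_log_t;

void subscribe_init(init_trigger_t *t, init_cb cb, void *ctx, unsigned flags)
{
    assert(t->count < MAX_SUBS);
    t->subs[t->count].cb    = cb;
    t->subs[t->count].ctx   = ctx;
    t->subs[t->count].flags = flags;
    t->count++;
}

/* Pass 0 runs ordinary subscribers, pass 1 runs the NOTIFY_ME_LAST ones. */
void fire_init(const init_trigger_t *t)
{
    for (unsigned pass = 0; pass < 2; pass++) {
        for (size_t i = 0; i < t->count; i++) {
            unsigned wants_last = (t->subs[i].flags & SUB_NOTIFY_ME_LAST) != 0u;
            if (wants_last == pass) {
                t->subs[i].cb(t->subs[i].ctx);
            }
        }
    }
}

void log_call(call_log_t *calls, char tag)
{
    assert(calls->len < MAX_SUBS);
    calls->order[calls->len++] = tag;
    calls->order[calls->len]   = '\0';
}

/* Stand-in for the system's wiring callback. */
void system_wire_io_cb(void *ctx) { log_call((call_log_t *)ctx, 'S'); }

/* Stand-in for the tool's callback, which needs the wiring to be done. */
void tool_attach_cb(void *ctx)    { log_call((call_log_t *)ctx, 'T'); }
```

With this scheme, a tool that subscribes with SUB_NOTIFY_ME_LAST runs after the system callback even if the tool registered first. Two subscribers that both ask to be last would still run in registration order relative to each other.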

Ralph


At 03:21 PM 2/13/2006, you wrote:
I thought we were wiring up stdio ourselves because it wasn't being
done in the spawn? If it's now being done by spawn then that's fine,
but we need to be able to get called back when the I/O becomes
available. How does this work?

Greg

On Feb 13, 2006, at 2:16 PM, Ralph H. Castain wrote:

> Hmmmm....I wonder if this is going to create a problem?
>
> Tim/Brian/you io forwarding folks: This poses an interesting
> question. We automatically wire up i/o forwarding in our spawn
> routine. What happens when someone sets up their own i/o forwarding
> callback and subsequently wires up stdio themselves? Does this
> overwrite what we did, do processes receive duplicate copies, does it
> generate an error, ...?
>
> I gather this is working for Nathan, and I don't claim to fully
> understand what he is doing, but I'm curious as to what might happen
> since I don't see anything in the system to prevent someone doing
> this (not sure we could anyway).
>
> Ralph
>
>
> At 02:32 PM 2/9/2006, you wrote:
>> I've coded a hacky workaround in our code to get past this. Basically,
>> I capture all of the state transitions, and on the first one fired for
>> a job I fire the 'init' state internally in our tool.  Generally this
>> occurs for one of the gate transitions, G1 or something.  It'll work
>> this way.
>>
>> Furthermore, we're telling our users to get your 1.0.2a4 (or whatever
>> 1.0.2 is available at the time).
>>
>> The way I coded it, when you guys put this into the main branch and the
>> INIT state resumes firing, my code will start working that much
>> better.  I really only brought it up because I felt it was a bug you
>> might not have been aware of.
>>
>> Thanks all.
>>
>> -- Nathan
>> Correspondence
>> ---------------------------------------------------------------------
>> Nathan DeBardeleben, Ph.D.
>> Los Alamos National Laboratory
>> Parallel Tools Team
>> High Performance Computing Environments
>> phone: 505-667-3428
>> email: ndeb...@lanl.gov
>> ---------------------------------------------------------------------
>>
>>
>>
>> Jeff Squyres wrote:
>>> Nathan --
>>>
>>> Ralph and I talked about this and decided not to bring it over to the
>>> 1.0 branch -- the fix uses new functionality that exists on the trunk
>>> and not in the 1.0 branch.  The fix could be re-crafted to use
>>> existing functionality on the 1.0 branch (we're really trying to only
>>> put bug fixes on the 1.0 branch -- not any new functionality) -- but
>>> we didn't know if you cared.  :-)
>>>
>>> Do you mind if this fix stays on the trunk, or do you need it in the
>>> v1.0 branch?
>>>
>>>
>>>
>>> On Feb 8, 2006, at 4:36 PM, Nathan DeBardeleben wrote:
>>>
>>>
>>>> Thanks Ralph.
>>>>
>>>> -- Nathan
>>>>
>>>>
>>>>
>>>> Ralph H. Castain wrote:
>>>>
>>>>> Nathan
>>>>>
>>>>> This should now be fixed on the trunk. Once it is checked out more
>>>>> thoroughly, I'll ask that it be moved to the 1.0 branch. For now, you
>>>>> might want to check out the trunk and verify it meets your needs.
>>>>>
>>>>> Ralph
>>>>>
>>>>> At 03:05 PM 2/1/2006, you wrote:
>>>>>
>>>>>
>>>>>> This was happening on Alpha 1 as well, but I upgraded today to Alpha 4
>>>>>> to see if it's gone away - it has not.
>>>>>>
>>>>>> I register a callback on a spawn() inside ORTE.  That callback includes
>>>>>> the current state and should be called as the job goes through those
>>>>>> states.
>>>>>>
>>>>>> I am now noticing that jobs never go through the INIT state.  They may
>>>>>> also not go through others, but definitely not ORTE_PROC_STATE_INIT.
>>>>>>
>>>>>> I was registering the IOForwarding callback during the INIT phase so,
>>>>>> consequently, I now do not have IOF.  There are other side effects:
>>>>>> for example, jobs that I start appear to stay in the 'starting' state
>>>>>> perpetually and then, suddenly, they're done.
>>>>>>
>>>>>> Can someone look into / comment on this please?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> --
>>>>>> -- Nathan
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>
>
