Yes, but something is going wrong. Maybe the problem is that the jobid 
is different from the process's jobid; I don't know.

I'm trying to send a signal to another process running under a different job. 
The other process then jumps into an accept_connect on the MPI comm. So I wrote 
code like this (I removed the verification code and comments; this is just a 
summary of the happy path):

ompi_dpm.parse_port(port, &hnp_uri, &rml_uri, &el_tag);
orte_rml_base_parse_uris(rml_uri, &el_proc, NULL);
ompi_dpm.route_to_port(hnp_uri, &el_proc);
orte_plm.signal_job(el_proc.jobid, SIGUSR1);
ompi_dpm.connect_accept(MPI_COMM_SELF, 0, port, true, el_comm);

el_proc is declared as an orte_process_name_t, not a pointer to one, and 
signal.h is included for SIGUSR1's sake. But when the code enters the 
signal_job function, it crashes. I'm trying to debug it right now. The crash 
is the following:

[Fialho-2.local:51377] receiver: looking for: radic_eventlog[0]
[Fialho-2.local:51377] receiver: found port <784793600.0;tcp://192.168.1.200:54071+784793601.0;tcp://192.168.1.200:54072:300>
[Fialho-2.local:51377] receiver: HNP URI <784793600.0;tcp://192.168.1.200:54071>, RML URI <784793601.0;tcp://192.168.1.200:54072>, TAG <300>
[Fialho-2.local:51377] receiver: sending SIGUSR1 <30> to RADIC Event Logger <[[11975,1],0]>
[Fialho-2:51377] *** Process received signal ***
[Fialho-2:51377] Signal: Segmentation fault (11)
[Fialho-2:51377] Signal code: Address not mapped (1)
[Fialho-2:51377] Failing at address: 0x0
[Fialho-2:51377] [ 0] 2   libSystem.B.dylib                   0x00007fff83a6eeaa _sigtramp + 26
[Fialho-2:51377] [ 1] 3   libSystem.B.dylib                   0x00007fff83a210b7 snprintf + 496
[Fialho-2:51377] [ 2] 4   mca_vprotocol_receiver.so           0x000000010065ba0a mca_vprotocol_receiver_send + 177
[Fialho-2:51377] [ 3] 5   libmpi.0.dylib                      0x0000000100077d44 MPI_Send + 734
[Fialho-2:51377] [ 4] 6   ping                                0x0000000100000a97 main + 431
[Fialho-2:51377] [ 5] 7   ping                                0x00000001000008e0 start + 52
[Fialho-2:51377] [ 6] 8   ???                                 0x0000000000000003 0x0 + 3
[Fialho-2:51377] *** End of error message ***

Apart from the signal_job call, the code works: I have tested it by forcing an 
accept on the other process and skipping signal_job. But I want to send 
the signal to wake up the other side, so that it can manage multiple 
connect/accept calls.

Thanks,
Leonardo

On Mar 17, 2010, at 1:33 AM, Ralph Castain wrote:

> Sure! So long as you add the include, you are okay as the ORTE layer is 
> "below" the OMPI one.
> 
> On Mar 16, 2010, at 6:29 PM, Leonardo Fialho wrote:
> 
>> Thanks Ralph, one last question... is orte_plm.signal_job exposed/available 
>> to be called by a PML component? Yes, I have the orte/mca/plm/plm.h include 
>> line.
>> 
>> Leonardo
>> 
>> On Mar 16, 2010, at 11:59 PM, Ralph Castain wrote:
>> 
>>> It's just the orte_process_name_t jobid field. So if you have an 
>>> orte_process_name_t *pname, then it would just be
>>> 
>>> orte_plm.signal_job(pname->jobid, sig)
>>> 
>>> 
>>> On Mar 16, 2010, at 3:23 PM, Leonardo Fialho wrote:
>>> 
>>>> Hmm... and to signal a job the function is probably 
>>>> orte_plm.signal_job(jobid, signal), right?
>>>> 
>>>> Now my naive question is how to obtain the jobid part from an 
>>>> orte_process_name_t variable. Is there any magic function in 
>>>> names_fns.h?
>>>> 
>>>> Thanks,
>>>> Leonardo
>>>> 
>>>> On Mar 16, 2010, at 10:12 PM, Ralph Castain wrote:
>>>> 
>>>>> Afraid not - you can signal a job, but not a specific process. We used to 
>>>>> have such an API, but nobody ever used it. Easy to restore if someone has 
>>>>> a need.
>>>>> 
>>>>> On Mar 16, 2010, at 2:45 PM, Leonardo Fialho wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Is there any function in Open MPI's frameworks to send a signal to another 
>>>>>> ORTE proc?
>>>>>> 
>>>>>> For example, the ORTE process [[1234,1],1] wants to send a signal to 
>>>>>> process [[1234,1],4] located on another node. I'm looking for this kind of 
>>>>>> function, but I only found functions that send a signal to all procs on a 
>>>>>> node.
>>>>>> 
>>>>>> Thanks,
>>>>>> Leonardo
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
