Re: [OMPI devel] RML Send

2008-06-17 Thread Ralph Castain



On 6/17/08 3:35 PM, "Leonardo Fialho"  wrote:

> Hi Ralph,
> 
> 1) Yes, I'm using ORTE_RML_TAG_DAEMON with a new "command" that I
> defined in "odls_types.h".
> 2) I'm packing and unpacking variables like OPAL_INT, OPAL_SIZE, ...
> 3) I'm not blocking the "process_commands" function with long code.
> 4) To know the daemon's vpid and jobid I used the same jobid as the
> app (in this solution, it can be changed) and the vpid is ordered
> sequentially (0 for mpirun and 1 to N for the orteds).

The jobid of the daemons is different from the jobid of the apps. So at the
moment, you are actually sending the message to another app!

You can find the jobid of the daemons by extracting it as
ORTE_PROC_MY_DAEMON->jobid. Please note, though, that the app has no
knowledge of the contact info for that daemon, so this message will have to
route through the local daemon. Happens transparently, but just wanted to be
clear as to how this is working.
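
To be concrete, building the name might look something like this - a
minimal sketch only, where target_vpid and my_buffer are hypothetical
placeholders for the daemon you want and a buffer you have already packed:

orte_process_name_t daemon;

daemon.jobid = ORTE_PROC_MY_DAEMON->jobid;  /* the daemons' jobid */
daemon.vpid = target_vpid;                  /* 0 = mpirun, 1..N = orteds */

/* no direct contact info in the app, so this routes via the local daemon */
orte_rml.send_buffer(&daemon, my_buffer, ORTE_RML_TAG_DAEMON, 0);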

> 
> The problem is: I need to send buffered data, and I don't know the
> type of this data. I'm trying to use OPAL_NULL and OPAL_DATA_VALUE to
> send it, but with no success :(

If I recall correctly, you were trying to archive messages that flowed
through the PML - correct? I would suggest just treating them as bytes and
packing them as an opal_byte_object_t, something like this:

opal_byte_object_t bo;

bo.size = num_bytes;          /* length of your data, in bytes */
bo.data = (uint8_t*)my_data;  /* pointer to the raw bytes */

opal_dss.pack(buffer, &bo, 1, OPAL_BYTE_OBJECT);

Then on the other end:

opal_byte_object_t *bo;
int32_t n = 1;   /* in: max #objects to unpack; out: #actually unpacked */

opal_dss.unpack(buffer, &bo, &n, OPAL_BYTE_OBJECT);

You can then transfer the data into whatever storage you like (note that the
unpack allocates the byte object for you, so remember to free bo->data and bo
when you are done). All this does is pass the #bytes and the bytes as a
collected unit - you could, of course, simply pass the #bytes and bytes with
independent packs if you wanted:

int32_t num_bytes;
uint8_t *my_data;

opal_dss.pack(buffer, &num_bytes, 1, OPAL_INT32);
opal_dss.pack(buffer, my_data, num_bytes, OPAL_BYTE);

...

int32_t n = 1;

opal_dss.unpack(buffer, &num_bytes, &n, OPAL_INT32);
my_data = (uint8_t*)malloc(num_bytes);
n = num_bytes;
opal_dss.unpack(buffer, my_data, &n, OPAL_BYTE);


Up to you.

Hope that helps
Ralph

> 
> Thanks in advance,
> Leonardo Fialho
> 
> 
> Ralph H Castain wrote:
>> I'm not sure exactly how you are trying to do this, but the usual procedure
>> would be:
>> 
>> 1. call opal_dss.pack(*buffer, *data, #data, data_type) for each thing you
>> want to put in the buffer. So you might call this to pack a string:
>> 
>> opal_dss.pack(buffer, &string, 1, OPAL_STRING);
>> 
>> 2. once you have everything packed into the buffer, you send the buffer with
>> 
>> orte_rml.send_buffer(&dest, buffer, dest_tag, 0);
>> 
>> What you will need is a tag that the daemon is listening on that won't
>> interfere with its normal operations - i.e., what you send won't get held
>> forever waiting to get serviced, and your servicing won't block us from
>> responding to a ctrl-c. You can probably use ORTE_RML_TAG_DAEMON, but you
>> need to ensure you don't block anything.
>> 
>> BTW: how is the app figuring out the name of the remote daemon? The proc
>> will have access to the daemon's vpid (assuming it knows the nodename where
>> the daemon is running) in the ESS, but not the jobid - I assume you are
>> using some method to compute the daemon jobid from the apps?
>> 
>> 
>> On 6/17/08 12:08 PM, "Leonardo Fialho"  wrote:
>> 
>>   
>>> Hi All,
>>> 
>>> I'm using RML to send log messages from a PML to an ORTE daemon (located
>>> on another node). I managed to send the message header, but now I
>>> need to send the message data (buffer). How can I do it? The problem is
>>> which data type to use for packing/unpacking. I tried
>>> OPAL_DATA_VALUE but didn't have any success...
>>> 
>>> Thanks,
>>> 
>> 
>> 
>> 
> 





Re: [OMPI devel] RML Send

2008-06-17 Thread Leonardo Fialho

Hi Ralph,

1) Yes, I'm using ORTE_RML_TAG_DAEMON with a new "command" that I 
defined in "odls_types.h".

2) I'm packing and unpacking variables like OPAL_INT, OPAL_SIZE, ...
3) I'm not blocking the "process_commands" function with long code.
4) To know the daemon's vpid and jobid I used the same jobid as the 
app (in this solution, it can be changed) and the vpid is ordered 
sequentially (0 for mpirun and 1 to N for the orteds).


The problem is: I need to send buffered data, and I don't know the 
type of this data. I'm trying to use OPAL_NULL and OPAL_DATA_VALUE to 
send it, but with no success :(


Thanks in advance,
Leonardo Fialho


Ralph H Castain wrote:

I'm not sure exactly how you are trying to do this, but the usual procedure
would be:

1. call opal_dss.pack(*buffer, *data, #data, data_type) for each thing you
want to put in the buffer. So you might call this to pack a string:

opal_dss.pack(buffer, &string, 1, OPAL_STRING);

2. once you have everything packed into the buffer, you send the buffer with

orte_rml.send_buffer(&dest, buffer, dest_tag, 0);

What you will need is a tag that the daemon is listening on that won't
interfere with its normal operations - i.e., what you send won't get held
forever waiting to get serviced, and your servicing won't block us from
responding to a ctrl-c. You can probably use ORTE_RML_TAG_DAEMON, but you
need to ensure you don't block anything.

BTW: how is the app figuring out the name of the remote daemon? The proc
will have access to the daemon's vpid (assuming it knows the nodename where
the daemon is running) in the ESS, but not the jobid - I assume you are
using some method to compute the daemon jobid from the apps?


On 6/17/08 12:08 PM, "Leonardo Fialho"  wrote:

  

Hi All,

I'm using RML to send log messages from a PML to an ORTE daemon (located
on another node). I managed to send the message header, but now I
need to send the message data (buffer). How can I do it? The problem is
which data type to use for packing/unpacking. I tried
OPAL_DATA_VALUE but didn't have any success...

Thanks,








--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478



[OMPI devel] Open MPI v1.2.7rc1 has been posted

2008-06-17 Thread Tim Mattox
Hi All,
The first release candidate of Open MPI v1.2.7 is now available:

 http://www.open-mpi.org/software/ompi/v1.2/

Please run it through its paces as best you can.
-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/


Re: [OMPI devel] RML Send

2008-06-17 Thread Ralph H Castain
I'm not sure exactly how you are trying to do this, but the usual procedure
would be:

1. call opal_dss.pack(*buffer, *data, #data, data_type) for each thing you
want to put in the buffer. So you might call this to pack a string:

opal_dss.pack(buffer, &string, 1, OPAL_STRING);

2. once you have everything packed into the buffer, you send the buffer with

orte_rml.send_buffer(&dest, buffer, dest_tag, 0);

What you will need is a tag that the daemon is listening on that won't
interfere with its normal operations - i.e., what you send won't get held
forever waiting to get serviced, and your servicing won't block us from
responding to a ctrl-c. You can probably use ORTE_RML_TAG_DAEMON, but you
need to ensure you don't block anything.
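
Put together, the send side might look something like the sketch below.
Treat it as a sketch only: MY_NEW_CMD stands for whatever command value you
define, "daemon" is the daemon's process name, and error checks on the
packs are omitted:

opal_buffer_t *buffer;
orte_daemon_cmd_flag_t command = MY_NEW_CMD;  /* hypothetical command */
char *msg = "log data";
int rc;

buffer = OBJ_NEW(opal_buffer_t);

/* step 1: pack the command first so the daemon can dispatch on it,
   then pack the payload */
opal_dss.pack(buffer, &command, 1, ORTE_DAEMON_CMD);
opal_dss.pack(buffer, &msg, 1, OPAL_STRING);

/* step 2: send the buffer to the daemon on its tag */
rc = orte_rml.send_buffer(&daemon, buffer, ORTE_RML_TAG_DAEMON, 0);
if (rc < 0) {
    ORTE_ERROR_LOG(rc);
}
OBJ_RELEASE(buffer);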

BTW: how is the app figuring out the name of the remote daemon? The proc
will have access to the daemon's vpid (assuming it knows the nodename where
the daemon is running) in the ESS, but not the jobid - I assume you are
using some method to compute the daemon jobid from the apps?


On 6/17/08 12:08 PM, "Leonardo Fialho"  wrote:

> Hi All,
> 
> I'm using RML to send log messages from a PML to an ORTE daemon (located
> on another node). I managed to send the message header, but now I
> need to send the message data (buffer). How can I do it? The problem is
> which data type to use for packing/unpacking. I tried
> OPAL_DATA_VALUE but didn't have any success...
> 
> Thanks,





[OMPI devel] RML Send

2008-06-17 Thread Leonardo Fialho

Hi All,

I'm using RML to send log messages from a PML to an ORTE daemon (located 
on another node). I managed to send the message header, but now I 
need to send the message data (buffer). How can I do it? The problem is 
which data type to use for packing/unpacking. I tried 
OPAL_DATA_VALUE but didn't have any success...


Thanks,

--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478



[OMPI devel] iprobe and opal_progress

2008-06-17 Thread Terry Dontje
I've run into an issue while running HPL where a message has been sent 
(in shared memory, in this case) and the receiver calls iprobe but 
doesn't see that message on the first call to iprobe (even though it is 
there), yet does see it on the second call. Looking at the 
mca_pml_ob1_iprobe function and the calls it makes, it looks like it 
checks the unexpected queue for matches and, if it doesn't find one, 
sets the flag to 0 (no matches), then calls opal_progress and returns. 
This seems wrong to me, since the call to opal_progress may well pull 
in the very message that the iprobe is waiting for.


Am I correct in my reading of the code? It seems that some sort of 
re-check needs to be done after the call to opal_progress in 
mca_pml_ob1_iprobe.
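
If that reading is right, the fix might be as simple as re-running the
match after progressing. A rough sketch of the idea only - try_match() is
a hypothetical stand-in for ob1's real unexpected-queue matching logic,
not the actual source:

static int iprobe_sketch(int src, int tag, struct ompi_communicator_t *comm,
                         int *matched, ompi_status_public_t *status)
{
    /* first pass over the unexpected queue */
    if (try_match(src, tag, comm, status)) {
        *matched = 1;
        return OMPI_SUCCESS;
    }
    /* progress may deliver the message we are probing for... */
    opal_progress();
    /* ...so look again before reporting no match */
    *matched = try_match(src, tag, comm, status) ? 1 : 0;
    return OMPI_SUCCESS;
}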


Attached is a simple program that shows the issue I am running into:

#include <stdio.h>
#include <unistd.h>   /* for sleep() */
#include <mpi.h>

int main() {
    int rank, src[2], dst[2], flag = 0;
    int nxfers;
    MPI_Status status;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (0 == rank) {
        for (nxfers = 0; nxfers < 5; nxfers++)
            MPI_Send(src, 2, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (1 == rank) {
        for (nxfers = 0; nxfers < 5; nxfers++) {
            sleep(5);
            flag = 0;
            while (!flag) {
                printf("iprobe...");
                MPI_Iprobe(0, 0, MPI_COMM_WORLD, &flag, &status);
            }
            printf("\n");
            MPI_Recv(dst, 2, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        }
    }
    MPI_Finalize();
    return 0;
}
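
To reproduce, build it and run it on two ranks, e.g. (the file name here is
just a placeholder):

mpicc -o iprobe_test iprobe_test.c
mpirun -np 2 ./iprobe_test

With the behavior described above, rank 1 prints "iprobe..." twice per
message instead of once.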

--td


Re: [OMPI devel] BW benchmark hangs after r 18551

2008-06-17 Thread Lenny Verkhovsky
It seems like we have 2 bugs here.
1. After committing NUMA awareness we see segfaults.
2. Before committing NUMA (r18656) we see application hangs.
3. I checked both with and without sendi; same results.
4. It hangs most of the time, but sometimes large msgs ( >1M ) work.


I will keep investigating :)


VER=TRUNK; //home/USERS/lenny/OMPI_ORTE_${VER}/bin/mpicc -o mpi_p_${VER}
/opt/vltmpi/OPENIB/mpi/examples/mpi_p.c ;
/home/USERS/lenny/OMPI_ORTE_${VER}/bin/mpirun -np 100 -hostfile hostfile_w
./mpi_p_${VER} -t bw -s 400
[witch17:09798] *** Process received signal ***
[witch17:09798] Signal: Segmentation fault (11)
[witch17:09798] Signal code: Address not mapped (1)
[witch17:09798] Failing at address: (nil)
[witch17:09798] [ 0] /lib64/libpthread.so.0 [0x2b1d13530c10]
[witch17:09798] [ 1]
/home/USERS/lenny/OMPI_ORTE_TRUNK/lib/openmpi/mca_btl_sm.so [0x2b1d1557a68a]
[witch17:09798] [ 2]
/home/USERS/lenny/OMPI_ORTE_TRUNK/lib/openmpi/mca_bml_r2.so [0x2b1d14e1b12f]
[witch17:09798] [ 3]
/home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0(opal_progress+0x5a)
[0x2b1d12f6a6da]
[witch17:09798] [ 4] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libmpi.so.0
[0x2b1d12cafd28]
[witch17:09798] [ 5]
/home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libmpi.so.0(PMPI_Waitall+0x91)
[0x2b1d12cd9d71]
[witch17:09798] [ 6] ./mpi_p_TRUNK(main+0xd32) [0x401ca2]
[witch17:09798] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x2b1d13657154]
[witch17:09798] [ 8] ./mpi_p_TRUNK [0x400ea9]
[witch17:09798] *** End of error message ***
[witch1:24955]
--
mpirun noticed that process rank 62 with PID 9798 on node witch17 exited on
signal 11 (Segmentation fault).
--
witch1:/home/USERS/lenny/TESTS/NUMA #
witch1:/home/USERS/lenny/TESTS/NUMA #
witch1:/home/USERS/lenny/TESTS/NUMA #
witch1:/home/USERS/lenny/TESTS/NUMA # VER=18551;
//home/USERS/lenny/OMPI_ORTE_${VER}/bin/mpicc -o mpi_p_${VER}
/opt/vltmpi/OPENIB/mpi/examples/mpi_p.c ;
/home/USERS/lenny/OMPI_ORTE_${VER}/bin/mpirun -np 100 -hostfile hostfile_w
./mpi_p_${VER} -t bw -s 400
BW (100) (size min max avg)  400  654.496755  2121.899985  1156.171067
witch1:/home/USERS/lenny/TESTS/NUMA
#




On Tue, Jun 17, 2008 at 2:10 PM, George Bosilca 
wrote:

> Lenny,
>
> I guess you're running the latest version. If not, please update; Galen and
> I corrected some bugs last week. If you're using the latest (and
> greatest) then ... well, I imagine there is at least one bug left.
>
> There is a quick test you can do. In btl_sm.c, in the module structure
> at the beginning of the file, please replace the sendi function with NULL.
> If this fixes the problem, then at least we know that it's an sm
> send-immediate problem.
>
>  Thanks,
>george.
>
>
> On Jun 17, 2008, at 7:54 AM, Lenny Verkhovsky wrote:
>
> Hi, George,
>>
>> I have a problem running the BW benchmark on a 100-rank cluster after r18551.
>> The BW is mpi_p, which runs mpi_bandwidth with 100K between all pairs.
>>
>>
>> #mpirun -np 100 -hostfile hostfile_w  ./mpi_p_18549 -t bw -s 10
>> BW (100) (size min max avg)  10  576.734030  2001.882416  1062.698408
>> #mpirun -np 100 -hostfile hostfile_w ./mpi_p_18551 -t bw -s 10
>> mpirun: killing job...
>> (it hangs even after 10 hours).
>>
>>
>> It doesn't happen if I run with --bynode or with btl openib,self only.
>>
>>
>> Lenny.
>>
>
>


Re: [OMPI devel] BW benchmark hangs after r 18551

2008-06-17 Thread George Bosilca

Lenny,

I guess you're running the latest version. If not, please update; Galen 
and I corrected some bugs last week. If you're using the latest (and 
greatest) then ... well, I imagine there is at least one bug left.


There is a quick test you can do. In btl_sm.c, in the module structure 
at the beginning of the file, please replace the sendi function with 
NULL. If this fixes the problem, then at least we know that it's an sm 
send-immediate problem.
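
For what it's worth, the same experiment can probably also be done without
editing the struct itself, assuming the base-module member is named
btl_sendi - e.g. a one-liner early in the sm setup path (placement
hypothetical):

/* disable send-immediate for the sm btl */
mca_btl_sm.super.btl_sendi = NULL;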


  Thanks,
george.

On Jun 17, 2008, at 7:54 AM, Lenny Verkhovsky wrote:


Hi, George,

I have a problem running the BW benchmark on a 100-rank cluster after 
r18551.

The BW is mpi_p, which runs mpi_bandwidth with 100K between all pairs.


#mpirun -np 100 -hostfile hostfile_w  ./mpi_p_18549 -t bw -s 10
BW (100) (size min max avg)  10  576.734030  2001.882416  1062.698408

#mpirun -np 100 -hostfile hostfile_w ./mpi_p_18551 -t bw -s 10
mpirun: killing job...
(it hangs even after 10 hours).


It doesn't happen if I run with --bynode or with btl openib,self only.


Lenny.






[OMPI devel] BW benchmark hangs after r 18551

2008-06-17 Thread Lenny Verkhovsky
Hi, George,

I have a problem running the BW benchmark on a 100-rank cluster after r18551.
The BW is mpi_p, which runs mpi_bandwidth with 100K between all pairs.


#mpirun -np 100 -hostfile hostfile_w  ./mpi_p_18549 -t bw -s 10
BW (100) (size min max avg)  10  576.734030  2001.882416  1062.698408
#mpirun -np 100 -hostfile hostfile_w ./mpi_p_18551 -t bw -s 10
mpirun: killing job...
(it hangs even after 10 hours).


It doesn't happen if I run with --bynode or with btl openib,self only.


Lenny.