Re: [OMPI devel] RML Send
On 6/17/08 3:35 PM, "Leonardo Fialho" wrote:

> Hi Ralph,
>
> 1) Yes, I'm using ORTE_RML_TAG_DAEMON with a new "command" that I
> defined in "odls_types.h".
> 2) I'm packing and unpacking variables like OPAL_INT, OPAL_SIZE, ...
> 3) I'm not blocking the "process_commands" function with long code.
> 4) To know the daemon's vpid and jobid I used the same jobid from the
> app (in this solution, it can be changed) and the vpid is ordered
> sequentially (0 for mpirun and 1 to N for the orteds).

The jobid of the daemons is different from the jobid of the apps, so at the
moment you are actually sending the message to another app! You can find the
jobid of the daemons by extracting it as ORTE_PROC_MY_DAEMON->jobid. Please
note, though, that the app has no knowledge of the contact info for that
daemon, so this message will have to route through the local daemon. This
happens transparently, but I just wanted to be clear about how this works.

> The problem is: I need to send buffered data, and I don't know the
> type of this data. I'm trying to use OPAL_NULL and OPAL_DATA_VALUE to
> send it but I got no success :(

If I recall correctly, you were trying to archive messages that flowed
through the PML - correct? I would suggest just treating them as bytes and
packing them as an opal_byte_object_t, something like this:

opal_byte_object_t bo;
bo.size = sizeof(my_data);
bo.data = my_data;
opal_dss.pack(*buffer, &bo, 1, OPAL_BYTE_OBJECT);

Then on the other end:

opal_byte_object_t *bo;
int32_t n;
opal_dss.unpack(*buffer, &bo, &n, OPAL_BYTE_OBJECT);

You can then transfer the data into whatever storage you like. All this does
is pass the #bytes and the bytes as a collected unit - you could, of course,
simply pass the #bytes and the bytes with independent packs if you wanted:

int32_t num_bytes;
uint8_t *my_data;
opal_dss.pack(*buffer, &num_bytes, 1, OPAL_INT32);
opal_dss.pack(*buffer, my_data, num_bytes, OPAL_BYTE);
...
opal_dss.unpack(*buffer, &num_bytes, &n, OPAL_INT32);
my_data = (uint8_t*)malloc(num_bytes);
opal_dss.unpack(*buffer, my_data, &num_bytes, OPAL_BYTE);

Up to you. Hope that helps
Ralph

> Thanks in advance,
> Leonardo Fialho
>
> Ralph H Castain wrote:
>> I'm not sure exactly how you are trying to do this, but the usual
>> procedure would be:
>>
>> 1. Call opal_dss.pack(*buffer, *data, #data, data_type) for each thing
>> you want to put in the buffer. So you might call this to pack a string:
>>
>> opal_dss.pack(*buffer, &my_string, 1, OPAL_STRING);
>>
>> 2. Once you have everything packed into the buffer, you send the buffer
>> with
>>
>> orte_rml.send_buffer(*dest, *buffer, dest_tag, 0);
>>
>> What you will need is a tag that the daemon is listening on that won't
>> interfere with its normal operations - i.e., what you send won't get held
>> forever waiting to get serviced, and your servicing won't block us from
>> responding to a ctrl-c. You can probably use ORTE_RML_TAG_DAEMON, but you
>> need to ensure you don't block anything.
>>
>> BTW: how is the app figuring out the name of the remote daemon? The proc
>> will have access to the daemon's vpid (assuming it knows the nodename
>> where the daemon is running) in the ESS, but not the jobid - I assume you
>> are using some method to compute the daemon jobid from the apps?
>>
>> On 6/17/08 12:08 PM, "Leonardo Fialho" wrote:
>>
>>> Hi All,
>>>
>>> I'm using RML to send log messages from a PML to an ORTE daemon
>>> (located in another node). I got success sending the message header,
>>> but now I need to send the message data (buffer). How can I do it? The
>>> problem is what data type I need to use for packing/unpacking? I tried
>>> OPAL_DATA_VALUE but didn't get success...
>>>
>>> Thanks,

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] RML Send
Hi Ralph,

1) Yes, I'm using ORTE_RML_TAG_DAEMON with a new "command" that I defined in
"odls_types.h".
2) I'm packing and unpacking variables like OPAL_INT, OPAL_SIZE, ...
3) I'm not blocking the "process_commands" function with long code.
4) To know the daemon's vpid and jobid I used the same jobid from the app (in
this solution, it can be changed) and the vpid is ordered sequentially (0 for
mpirun and 1 to N for the orteds).

The problem is: I need to send buffered data, and I don't know the type of
this data. I'm trying to use OPAL_NULL and OPAL_DATA_VALUE to send it but I
got no success :(

Thanks in advance,
Leonardo Fialho

Ralph H Castain wrote:
> I'm not sure exactly how you are trying to do this, but the usual procedure
> would be:
>
> 1. Call opal_dss.pack(*buffer, *data, #data, data_type) for each thing you
> want to put in the buffer. So you might call this to pack a string:
>
> opal_dss.pack(*buffer, &my_string, 1, OPAL_STRING);
>
> 2. Once you have everything packed into the buffer, you send the buffer with
>
> orte_rml.send_buffer(*dest, *buffer, dest_tag, 0);
>
> What you will need is a tag that the daemon is listening on that won't
> interfere with its normal operations - i.e., what you send won't get held
> forever waiting to get serviced, and your servicing won't block us from
> responding to a ctrl-c. You can probably use ORTE_RML_TAG_DAEMON, but you
> need to ensure you don't block anything.
>
> BTW: how is the app figuring out the name of the remote daemon? The proc
> will have access to the daemon's vpid (assuming it knows the nodename where
> the daemon is running) in the ESS, but not the jobid - I assume you are
> using some method to compute the daemon jobid from the apps?
>
> On 6/17/08 12:08 PM, "Leonardo Fialho" wrote:
>
>> Hi All,
>>
>> I'm using RML to send log messages from a PML to an ORTE daemon (located
>> in another node). I got success sending the message header, but now I need
>> to send the message data (buffer). How can I do it? The problem is what
>> data type I need to use for packing/unpacking? I tried OPAL_DATA_VALUE but
>> didn't get success...
>>
>> Thanks,

--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
[OMPI devel] Open MPI v1.2.7rc1 has been posted
Hi All,

The first release candidate of Open MPI v1.2.7 is now available:

http://www.open-mpi.org/software/ompi/v1.2/

Please run it through its paces as best you can.

--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/
Re: [OMPI devel] RML Send
I'm not sure exactly how you are trying to do this, but the usual procedure
would be:

1. Call opal_dss.pack(*buffer, *data, #data, data_type) for each thing you
want to put in the buffer. So you might call this to pack a string:

opal_dss.pack(*buffer, &my_string, 1, OPAL_STRING);

2. Once you have everything packed into the buffer, you send the buffer with

orte_rml.send_buffer(*dest, *buffer, dest_tag, 0);

What you will need is a tag that the daemon is listening on that won't
interfere with its normal operations - i.e., what you send won't get held
forever waiting to get serviced, and your servicing won't block us from
responding to a ctrl-c. You can probably use ORTE_RML_TAG_DAEMON, but you
need to ensure you don't block anything.

BTW: how is the app figuring out the name of the remote daemon? The proc
will have access to the daemon's vpid (assuming it knows the nodename where
the daemon is running) in the ESS, but not the jobid - I assume you are
using some method to compute the daemon jobid from the apps?

On 6/17/08 12:08 PM, "Leonardo Fialho" wrote:

> Hi All,
>
> I'm using RML to send log messages from a PML to an ORTE daemon (located
> in another node). I got success sending the message header, but now I need
> to send the message data (buffer). How can I do it? The problem is what
> data type I need to use for packing/unpacking? I tried OPAL_DATA_VALUE but
> didn't get success...
>
> Thanks,
[OMPI devel] RML Send
Hi All,

I'm using RML to send log messages from a PML to an ORTE daemon (located in
another node). I got success sending the message header, but now I need to
send the message data (buffer). How can I do it? The problem is what data
type I need to use for packing/unpacking? I tried OPAL_DATA_VALUE but didn't
get success...

Thanks,

--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edificio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
[OMPI devel] iprobe and opal_progress
I've run into an issue while running HPL where a message has been sent (in
shared memory in this case) and the receiver calls iprobe but doesn't see
that message on the first call to iprobe (even though it is there) - it does
see it on the second call. Looking at the mca_pml_ob1_iprobe function and
the calls it makes, it checks the unexpected queue for matches and, if it
doesn't find one, sets the flag to 0 (no matches), then calls opal_progress
and returns. This seems wrong to me, since I would expect that the call to
opal_progress would probably pull in the message that the iprobe is waiting
for. Am I correct in my reading of the code? It seems that some sort of
re-check needs to be done after the call to opal_progress in
mca_pml_ob1_iprobe.

Attached is a simple program that shows the issue I am running into:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main()
{
    int rank, src[2], dst[2], flag = 0;
    int nxfers;
    MPI_Status status;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (0 == rank) {
        for (nxfers = 0; nxfers < 5; nxfers++)
            MPI_Send(src, 2, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (1 == rank) {
        for (nxfers = 0; nxfers < 5; nxfers++) {
            sleep(5);
            flag = 0;
            while (!flag) {
                printf("iprobe...");
                MPI_Iprobe(0, 0, MPI_COMM_WORLD, &flag, &status);
            }
            printf("\n");
            MPI_Recv(dst, 2, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        }
    }
    MPI_Finalize();
}

--td
Re: [OMPI devel] BW benchmark hangs after r 18551
It seems like we have two bugs here.

1. After committing NUMA awareness we see segfaults.
2. Before committing NUMA r18656 we see application hangs.
3. I checked it both with and without sendi, same results.
4. It hangs most of the time, but sometimes large msgs ( >1M ) are working.

I will keep investigating :)

VER=TRUNK; //home/USERS/lenny/OMPI_ORTE_${VER}/bin/mpicc -o mpi_p_${VER} /opt/vltmpi/OPENIB/mpi/examples/mpi_p.c ; /home/USERS/lenny/OMPI_ORTE_${VER}/bin/mpirun -np 100 -hostfile hostfile_w ./mpi_p_${VER} -t bw -s 400

[witch17:09798] *** Process received signal ***
[witch17:09798] Signal: Segmentation fault (11)
[witch17:09798] Signal code: Address not mapped (1)
[witch17:09798] Failing at address: (nil)
[witch17:09798] [ 0] /lib64/libpthread.so.0 [0x2b1d13530c10]
[witch17:09798] [ 1] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/openmpi/mca_btl_sm.so [0x2b1d1557a68a]
[witch17:09798] [ 2] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/openmpi/mca_bml_r2.so [0x2b1d14e1b12f]
[witch17:09798] [ 3] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0(opal_progress+0x5a) [0x2b1d12f6a6da]
[witch17:09798] [ 4] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libmpi.so.0 [0x2b1d12cafd28]
[witch17:09798] [ 5] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libmpi.so.0(PMPI_Waitall+0x91) [0x2b1d12cd9d71]
[witch17:09798] [ 6] ./mpi_p_TRUNK(main+0xd32) [0x401ca2]
[witch17:09798] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b1d13657154]
[witch17:09798] [ 8] ./mpi_p_TRUNK [0x400ea9]
[witch17:09798] *** End of error message ***
[witch1:24955] mpirun noticed that process rank 62 with PID 9798 on node witch17 exited on signal 11 (Segmentation fault).

witch1:/home/USERS/lenny/TESTS/NUMA # VER=18551; //home/USERS/lenny/OMPI_ORTE_${VER}/bin/mpicc -o mpi_p_${VER} /opt/vltmpi/OPENIB/mpi/examples/mpi_p.c ; /home/USERS/lenny/OMPI_ORTE_${VER}/bin/mpirun -np 100 -hostfile hostfile_w ./mpi_p_${VER} -t bw -s 400
BW (100) (size min max avg) 400 654.496755 2121.899985 1156.171067
witch1:/home/USERS/lenny/TESTS/NUMA #

On Tue, Jun 17, 2008 at 2:10 PM, George Bosilca wrote:

> Lenny,
>
> I guess you're running the latest version. If not, please update; Galen
> and I corrected some bugs last week. If you're using the latest (and
> greatest) then ... well, I imagine there is at least one bug left.
>
> There is a quick test you can do. In btl_sm.c, in the module structure at
> the beginning of the file, please replace the sendi function by NULL. If
> this fixes the problem, then at least we know that it's an sm send
> immediate problem.
>
> Thanks,
> george.
>
> On Jun 17, 2008, at 7:54 AM, Lenny Verkhovsky wrote:
>
>> Hi, George,
>>
>> I have a problem running the BW benchmark on a 100-rank cluster after
>> r18551. The BW is mpi_p, which runs mpi_bandwidth with 100K between all
>> pairs.
>>
>> #mpirun -np 100 -hostfile hostfile_w ./mpi_p_18549 -t bw -s 10
>> BW (100) (size min max avg) 10 576.734030 2001.882416 1062.698408
>> #mpirun -np 100 -hostfile hostfile_w ./mpi_p_18551 -t bw -s 10
>> mpirun: killing job...
>> ( it hangs even after 10 hours ).
>>
>> It doesn't happen if I run --bynode or btl openib,self only.
>>
>> Lenny.
Re: [OMPI devel] BW benchmark hangs after r 18551
Lenny,

I guess you're running the latest version. If not, please update; Galen and
I corrected some bugs last week. If you're using the latest (and greatest)
then ... well, I imagine there is at least one bug left.

There is a quick test you can do. In btl_sm.c, in the module structure at
the beginning of the file, please replace the sendi function by NULL. If
this fixes the problem, then at least we know that it's an sm send immediate
problem.

Thanks,
george.

On Jun 17, 2008, at 7:54 AM, Lenny Verkhovsky wrote:

> Hi, George,
>
> I have a problem running the BW benchmark on a 100-rank cluster after
> r18551. The BW is mpi_p, which runs mpi_bandwidth with 100K between all
> pairs.
>
> #mpirun -np 100 -hostfile hostfile_w ./mpi_p_18549 -t bw -s 10
> BW (100) (size min max avg) 10 576.734030 2001.882416 1062.698408
> #mpirun -np 100 -hostfile hostfile_w ./mpi_p_18551 -t bw -s 10
> mpirun: killing job...
> ( it hangs even after 10 hours ).
>
> It doesn't happen if I run --bynode or btl openib,self only.
>
> Lenny.
[OMPI devel] BW benchmark hangs after r 18551
Hi, George,

I have a problem running the BW benchmark on a 100-rank cluster after
r18551. The BW is mpi_p, which runs mpi_bandwidth with 100K between all
pairs.

#mpirun -np 100 -hostfile hostfile_w ./mpi_p_18549 -t bw -s 10
BW (100) (size min max avg) 10 576.734030 2001.882416 1062.698408
#mpirun -np 100 -hostfile hostfile_w ./mpi_p_18551 -t bw -s 10
mpirun: killing job...
( it hangs even after 10 hours ).

It doesn't happen if I run --bynode or btl openib,self only.

Lenny.