Okay, I've traced this down. The problem is that DSS-internal functions have been exposed via the API, so people can mistakenly call the wrong ones. You should -never- be using opal_dss.pack_buffer or opal_dss.unpack_buffer. Those were supposed to be internal to the DSS only, and will definitely mess you up if called directly.
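The calls you should be using instead are the top-level opal_dss.pack/opal_dss.unpack. A minimal sketch (assuming the opal_buffer_t class, the OBJ_NEW/OBJ_RELEASE macros, and the usual dss.h header path - not lifted from any particular release):

    #include "opal/dss/dss.h"   /* opal_dss, opal_buffer_t, OPAL_INT32 (header path assumed) */

    static int pack_unpack_example(void)
    {
        opal_buffer_t *buffer = OBJ_NEW(opal_buffer_t);
        int32_t value = 42, restored = 0, n = 1;   /* n is in/out: max values in, count out */

        /* the supported, top-level DSS API */
        opal_dss.pack(buffer, &value, 1, OPAL_INT32);
        opal_dss.unpack(buffer, &restored, &n, OPAL_INT32);

        /* opal_dss.pack_buffer()/unpack_buffer() are internal to the DSS - never call them */
        OBJ_RELEASE(buffer);
        return (restored == value) ? 0 : -1;
    }
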
I'll fix this problem to avoid future issues. There is a comment in dss.h that warns you never to call those functions, but who would remember? I sure wouldn't. I've only avoided the problem because of ignorance - I didn't know those APIs existed!

Should have a fix in later today.
Ralph


On 6/19/08 8:43 AM, "Ralph H Castain" <r...@lanl.gov> wrote:

> WOW! Somebody really screwed up the DSS by adding some new APIs I'd never
> heard of before, but which really can cause the system to break!
>
> I'm going to have to straighten this mess out - it is a total disaster.
> There needs to be just ONE way of packing and unpacking, not two totally
> incompatible methods.
>
> Will let you know when it is fixed - probably early next week.
> Ralph
>
>
> On 6/19/08 8:34 AM, "Leonardo Fialho" <lfia...@aomail.uab.es> wrote:
>
>> Hi Ralph,
>>
>> My mistake - I'm really using ORTE_PROC_MY_DAEMON->jobid.
>>
>> I had success using pack_buffer()/unpack_buffer() and the OPAL_BYTE type,
>> but something strange occurs when I use pack()/unpack(): the value of
>> num_bytes increases. For example, I tried to read num_bytes=5, and after
>> the unpack this variable holds 33! I don't understand it...
>>
>> Thanks,
>> Leonardo Fialho
>>
>> Ralph Castain wrote:
>>>
>>> On 6/17/08 3:35 PM, "Leonardo Fialho" <lfia...@aomail.uab.es> wrote:
>>>
>>>> Hi Ralph,
>>>>
>>>> 1) Yes, I'm using ORTE_RML_TAG_DAEMON with a new "command" that I
>>>> defined in "odls_types.h".
>>>> 2) I'm packing and unpacking variables like OPAL_INT, OPAL_SIZE, ...
>>>> 3) I'm not blocking the "process_commands" function with long code.
>>>> 4) To learn the daemon's vpid and jobid, I used the same jobid as the
>>>> app (in this solution; it can be changed), and the vpid is ordered
>>>> sequentially (0 for mpirun and 1 to N for the orteds).
>>>
>>> The jobid of the daemons is different from the jobid of the apps. So at the
>>> moment, you are actually sending the message to another app!
>>>
>>> You can find the jobid of the daemons by extracting it as
>>> ORTE_PROC_MY_DAEMON->jobid. Please note, though, that the app has no
>>> knowledge of the contact info for that daemon, so this message will have to
>>> route through the local daemon. This happens transparently, but I just
>>> wanted to be clear about how it works.
>>>
>>>> The problem is: I need to send buffered data, and I don't know the
>>>> type of this data. I'm trying to use OPAL_NULL and OPAL_DATA_VALUE to
>>>> send it, but with no success... :(
>>>
>>> If I recall correctly, you were trying to archive messages that flowed
>>> through the PML - correct? I would suggest just treating them as bytes and
>>> packing them as an opal_byte_object_t, something like this:
>>>
>>> opal_byte_object_t bo;
>>>
>>> bo.size = sizeof(my_data);
>>> bo.data = my_data;
>>>
>>> opal_dss.pack(buffer, &bo, 1, OPAL_BYTE_OBJECT);
>>>
>>> Then on the other end:
>>>
>>> opal_byte_object_t *bo;
>>> int32_t n;
>>>
>>> opal_dss.unpack(buffer, &bo, &n, OPAL_BYTE_OBJECT);
>>>
>>> You can then transfer the data into whatever storage you like. All this does
>>> is pass the #bytes and the bytes as a collected unit - you could, of course,
>>> simply pass the #bytes and the bytes with independent packs if you wanted:
>>>
>>> int32_t num_bytes;
>>> uint8_t *my_data;
>>>
>>> opal_dss.pack(buffer, &num_bytes, 1, OPAL_INT32);
>>> opal_dss.pack(buffer, my_data, num_bytes, OPAL_BYTE);
>>>
>>> ...
>>>
>>> opal_dss.unpack(buffer, &num_bytes, &n, OPAL_INT32);
>>> my_data = (uint8_t*)malloc(num_bytes);
>>> opal_dss.unpack(buffer, my_data, &num_bytes, OPAL_BYTE);
>>>
>>> Up to you.
>>>
>>> Hope that helps
>>> Ralph
>>>
>>>> Thanks in advance,
>>>> Leonardo Fialho
>>>>
>>>> Ralph H Castain wrote:
>>>>
>>>>> I'm not sure exactly how you are trying to do this, but the usual
>>>>> procedure would be:
>>>>>
>>>>> 1. Call opal_dss.pack(*buffer, *data, #data, data_type) for each thing you
>>>>> want to put in the buffer. So you might call this to pack a string:
>>>>>
>>>>> opal_dss.pack(buffer, &string, 1, OPAL_STRING);
>>>>>
>>>>> 2. Once you have everything packed into the buffer, you send the buffer
>>>>> with:
>>>>>
>>>>> orte_rml.send_buffer(dest, buffer, dest_tag, 0);
>>>>>
>>>>> What you will need is a tag that the daemon is listening on that won't
>>>>> interfere with its normal operations - i.e., what you send won't get held
>>>>> forever waiting to get serviced, and your servicing won't block us from
>>>>> responding to a ctrl-c. You can probably use ORTE_RML_TAG_DAEMON, but you
>>>>> need to ensure you don't block anything.
>>>>>
>>>>> BTW: how is the app figuring out the name of the remote daemon? The proc
>>>>> will have access to the daemon's vpid (assuming it knows the nodename
>>>>> where the daemon is running) in the ESS, but not the jobid - I assume you
>>>>> are using some method to compute the daemon jobid from the apps?
>>>>>
>>>>> On 6/17/08 12:08 PM, "Leonardo Fialho" <lfia...@aomail.uab.es> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm using RML to send log messages from a PML to an ORTE daemon (located
>>>>>> on another node). I had success sending the message header, but now I
>>>>>> need to send the message data (buffer). How can I do it? The problem is
>>>>>> which data type to use for packing/unpacking. I tried OPAL_DATA_VALUE
>>>>>> but had no success...
>>>>>>
>>>>>> Thanks,
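
Putting the pieces of this thread together - a sketch under the assumptions above (the opal_buffer_t class, OBJ_NEW/OBJ_RELEASE, and the pack/unpack/send_buffer signatures quoted in the messages). The helper names, header paths, and the omitted error handling are made up for illustration, and a real message on ORTE_RML_TAG_DAEMON would also carry the new daemon command defined in odls_types.h:

    #include <stdint.h>
    #include <stdlib.h>
    #include "opal/dss/dss.h"       /* opal_dss, opal_buffer_t, OPAL_INT32, OPAL_BYTE (path assumed) */
    #include "orte/mca/rml/rml.h"   /* orte_rml, ORTE_RML_TAG_DAEMON (path assumed) */
    /* plus whichever ORTE headers define orte_process_name_t, orte_vpid_t and ORTE_PROC_MY_DAEMON */

    /* App side: pack a length-prefixed byte payload and send it to the daemon
     * with the given vpid. send_log_to_daemon is a hypothetical helper name. */
    static int send_log_to_daemon(orte_vpid_t daemon_vpid, uint8_t *my_data, int32_t num_bytes)
    {
        int rc;
        orte_process_name_t daemon;
        opal_buffer_t *buffer = OBJ_NEW(opal_buffer_t);

        /* all daemons share the daemon jobid; the vpid picks the target one */
        daemon.jobid = ORTE_PROC_MY_DAEMON->jobid;
        daemon.vpid  = daemon_vpid;

        /* length first, then the raw bytes - the second variant suggested above */
        opal_dss.pack(buffer, &num_bytes, 1, OPAL_INT32);
        opal_dss.pack(buffer, my_data, num_bytes, OPAL_BYTE);

        /* the app has no contact info for the remote daemon, so the RML routes
         * this through the local daemon transparently */
        rc = orte_rml.send_buffer(&daemon, buffer, ORTE_RML_TAG_DAEMON, 0);
        OBJ_RELEASE(buffer);   /* assuming the blocking send does not take ownership */
        return rc;
    }

    /* Daemon side: called from the command-processing code once the new command
     * has been recognized; the buffer still holds the length and the bytes. */
    static uint8_t *recv_log(opal_buffer_t *buffer, int32_t *num_bytes)
    {
        int32_t n = 1;
        uint8_t *my_data;

        opal_dss.unpack(buffer, num_bytes, &n, OPAL_INT32);
        my_data = (uint8_t*)malloc(*num_bytes);
        n = *num_bytes;                        /* in/out: max values to unpack */
        opal_dss.unpack(buffer, my_data, &n, OPAL_BYTE);
        return my_data;                        /* caller archives it and free()s it */
    }

Either variant from the thread would work here; this sketch uses the separate length + OPAL_BYTE pair because it maps directly onto Leonardo's num_bytes/my_data variables.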