Re: [OMPI devel] [OMPI users] Sophos virus
Thanks for all the helpful off-list replies.

Note that the mail quoted below was sent about 11 hours ago, when we thought that some members of the list had caught a virus. Unfortunately, that mail was itself caught up in the problem: the IU admins apparently just released it from the incorrectly-tagged-as-spam queue. Rest assured that I un-moderated everyone much earlier today. :-)

Thanks to the IU admins for cleaning all of this up!

On Jan 8, 2010, at 6:29 AM, Jeff Squyres (jsquyres) wrote:
> Well it looks like the Sophos virus is making the rounds today. :-)
>
> Thankfully, the Indiana U. virus scanner found and removed the virus before
> it was delivered to our lists. I've moderated the two members who appear to
> have sent the virus messages, so hopefully we won't get those messages again.
>
> (if you've been moderated, please contact me off-list to let me know when/if
> you've cleaned up the virus and I'll un-moderate you)
>
> --
> Jeff Squyres
> jsquy...@cisco.com

--
Jeff Squyres
jsquy...@cisco.com
[OMPI devel] Sophos virus
Well, it looks like the Sophos virus is making the rounds today. :-)

Thankfully, the Indiana U. virus scanner found and removed the virus before it was delivered to our lists. I've moderated the two members who appear to have sent the virus messages, so hopefully we won't get those messages again.

(If you've been moderated, please contact me off-list to let me know when/if you've cleaned up the virus, and I'll un-moderate you.)

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI devel] Howto pause BTL's sending at runtime - sorry for spamming
Hello, please excuse the spamming. The mail filter somehow doesn't like my mail, and I have no clue what the problem could be... I've uploaded the mail as a plain text file; please have a look at it. Thanks.

http://www.sysckon.de/mail.txt

--
Paderborn Center for Parallel Computing - PC2
University of Paderborn - Germany
http://www.pc2.de
Christoph Konersmann
Re: [OMPI devel] Howto pause BTL's sending at runtime
Sorry, but somehow the mail scanner doesn't like the source code... Changed now.

Hi again,

Maybe I should give more specific information with some code snippets...

Currently I have added the definition ORTE_DAEMON_BTL_CTL_CMD (orte_daemon_cmd_flag_t) 26 to odls_types.h to identify when I want to trigger the BTL pause.

In process_commands() of orted/orted_comm.c this flag is processed in two steps: first, it is broadcast to all orteds with the xcast of the grpcomm framework; second, it is forwarded to the local procs with orte_odls.deliver_message. So every process should get the trigger. Or is there another, possibly easier, way of spawning the trigger?

I extended mca_btl_base_module_t in btl/btl.h with a simple indicator (btl_paused) that tells whether the pause is set. I then added a line to the initial values in every BTL component so that btl_paused is false by default, e.g. in self/btl_self.c. Or did I forget something?

So my problem now is: when every process gets the trigger in the ORTE project, how can I set btl->btl_paused to true in the OMPI project? ORTE does not (and I know it should not) have access to the OMPI components. Is there a way of implementing a libevent callback function in the BTL modules? Or is there another way? I have already read the documentation on your wiki, but it's not really trivial for me, as I'm relatively new to this.

One idea for getting the connection to the OMPI project would be to use the ft_event framework. Therefore I added another opal_crs_state_type_t, OPAL_CRS_PAUSE, in crs/crs.h and tried to trigger the event in orted_comm.c with orte_ess.ft_event(OPAL_CRS_PAUSE), but ft_event is NULL and therefore isn't executed...

Any ideas? Any advice?

For me the performance impact of a solution is of no interest.

Thanks, and please excuse me if I bother you with this.

Christoph

Christoph Konersmann wrote:
Hi all,
I'm trying to implement a method to pause all BTLs from sending packets to their destinations.
Currently I added a state variable to orte_process_info which is changed by an external program through process_commands() in orte/orted/orted_comm.c (I hope it's processed globally, not locally). While this state is set to something defined as PAUSE, I want the send methods in the PML layer to be halted so that no network traffic is generated. So far it's not working, because the PML layer does not see the state change.
Another way would be to use a libevent thread at the BML/PML level. I've read that this library is already supported/implemented, or am I wrong? How would I use libevent in this context? Does somebody have an example or hint?
Or should I use the fault tolerance framework for this purpose?
Any help would be appreciated.
thanks

--
Paderborn Center for Parallel Computing - PC2
University of Paderborn - Germany
http://www.pc2.de
Christoph Konersmann
Re: [OMPI devel] Howto pause BTL's sending at runtime
Sorry, but the mail scanner somehow doesn't like the source code... Changed now.

Hi again,

Maybe I should give more specific information with some code snippets...

Currently I have added

    #define ORTE_DAEMON_BTL_CTL_CMD (orte_daemon_cmd_flag_t) 26

to odls_types.h to identify when I want to trigger the BTL pause.

In process_commands() of orted/orted_comm.c this flag is processed in two steps: first, it is broadcast to all orteds with the xcast of the grpcomm framework; second, it is forwarded to the local procs with orte_odls.deliver_message. So every process should get the trigger. Or is there another, possibly easier, way of spawning the trigger?

I extended mca_btl_base_module_t in btl/btl.h with a simple indicator (btl_paused) that tells whether the pause is set. I then added a line to the initial values in every BTL component so that btl_paused is false by default, e.g. in self/btl_self.c. Or did I forget something?

So my problem now is: when every process gets the trigger in the ORTE project, how can I set btl->btl_paused to true in the OMPI project? ORTE does not (and I know it should not) have access to the OMPI components. Is there a way of implementing a libevent callback function in the BTL modules? Or is there another way? I have already read the documentation on your wiki, but it's not really trivial for me, as I'm relatively new to this.

One idea for getting the connection to the OMPI project would be to use the ft_event framework. Therefore I added another opal_crs_state_type_t, OPAL_CRS_PAUSE, in crs/crs.h and tried to trigger the event in orted_comm.c with:

    if (NULL != orte_ess.ft_event) {
        if (ORTE_SUCCESS != (ret = orte_ess.ft_event(OPAL_CRS_PAUSE))) {
            goto CLEANUP;
        }
    }

But ft_event is NULL and therefore isn't executed...

Any ideas? Any advice?

For me the performance impact of a solution is of no interest.

Thanks, and please excuse me if I bother you with this.

Christoph
Re: [OMPI devel] Howto pause BTL's sending at runtime
Hi again,

Maybe I should give more specific information with some code snippets...

Currently I have added

    #define ORTE_DAEMON_BTL_CTL_CMD (orte_daemon_cmd_flag_t) 26

to odls_types.h to identify when I want to trigger the BTL pause.

In process_commands() of orted/orted_comm.c this flag is processed in two steps: first, it is broadcast to all orteds with the xcast of the grpcomm framework; second, it is forwarded to the local procs with orte_odls.deliver_message. So every process should get the trigger. Or is there another, possibly easier, way of spawning the trigger?

I extended mca_btl_base_module_t in btl/btl.h with a simple indicator that tells whether the pause is set:

    struct mca_btl_base_module_t {
        [...]
        bool btl_paused;
        [...]
    };

I then added a line to the initial values in every BTL component so that btl_paused is false by default, e.g. in self/btl_self.c:

    mca_btl_base_module_t mca_btl_self = {
        [...]
        false, /* btl_paused */
        [...]
    };

Or did I forget something?

So my problem now is: when every process gets the trigger in the ORTE project, how can I set btl->btl_paused to true in the OMPI project? ORTE does not (and I know it should not) have access to the OMPI components. Is there a way of implementing a libevent callback function in the BTL modules? Or is there another way? I have already read the documentation on your wiki, but it's not really trivial for me, as I'm relatively new to this.

One idea for getting the connection to the OMPI project would be to use the ft_event framework. Therefore I added another opal_crs_state_type_t, OPAL_CRS_PAUSE, in crs/crs.h and tried to trigger the event in orted_comm.c with:

    if (NULL != orte_ess.ft_event) {
        if (ORTE_SUCCESS != (ret = orte_ess.ft_event(OPAL_CRS_PAUSE))) {
            goto CLEANUP;
        }
    }

But ft_event is NULL and therefore isn't executed...

Any ideas? Any advice?

For me the performance impact of a solution is of no interest.

Thanks, and please excuse me if I bother you with this.
Christoph

Christoph Konersmann wrote:
Hi all,
I'm trying to implement a method to pause all BTLs from sending packets to their destinations.
Currently I added a state variable to orte_process_info which is changed by an external program through process_commands() in orte/orted/orted_comm.c (I hope it's processed globally, not locally). While this state is set to something defined as PAUSE, I want the send methods in the PML layer to be halted so that no network traffic is generated. So far it's not working, because the PML layer does not see the state change.
Another way would be to use a libevent thread at the BML/PML level. I've read that this library is already supported/implemented, or am I wrong? How would I use libevent in this context? Does somebody have an example or hint?
Or should I use the fault tolerance framework for this purpose?
Any help would be appreciated.
thanks

--
Paderborn Center for Parallel Computing - PC2
University of Paderborn - Germany
http://www.pc2.de
Christoph Konersmann
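The btl_paused flag only has an effect if every send path checks it before touching the network. A minimal sketch of such a gate follows, under stated assumptions: toy_btl_module_t, toy_btl_init, toy_btl_send, toy_btl_set_paused and the TOY_* return codes are all hypothetical stand-ins, not Open MPI's real mca_btl_base_module_t or its error constants. The flag is made a C11 atomic_bool because it is written from a control path and read on the send path, which are not guaranteed to run in the same thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes, not Open MPI's error constants. */
enum { TOY_SUCCESS = 0, TOY_ERR_PAUSED = -1 };

/* Hypothetical, heavily reduced stand-in for mca_btl_base_module_t:
 * only the pause flag plus a byte counter used for demonstration. */
typedef struct toy_btl_module {
    const char *name;
    atomic_bool btl_paused;  /* written by the control path, read on send */
    size_t bytes_sent;       /* bookkeeping for this sketch only */
} toy_btl_module_t;

/* Mirrors the initializer described above: btl_paused is false by default. */
static void toy_btl_init(toy_btl_module_t *btl, const char *name)
{
    assert(btl != NULL);
    btl->name = name;
    atomic_init(&btl->btl_paused, false);
    btl->bytes_sent = 0;
}

/* Called by whatever control mechanism delivers the pause/resume trigger. */
static void toy_btl_set_paused(toy_btl_module_t *btl, bool paused)
{
    assert(btl != NULL);
    atomic_store(&btl->btl_paused, paused);
}

/* Check the flag before touching the network. While paused, nothing is
 * sent and the caller is told to retry later instead of being blocked. */
static int toy_btl_send(toy_btl_module_t *btl, size_t nbytes)
{
    assert(btl != NULL);
    if (atomic_load(&btl->btl_paused)) {
        return TOY_ERR_PAUSED;
    }
    btl->bytes_sent += nbytes;  /* a real BTL would post the fragment here */
    return TOY_SUCCESS;
}
```

Returning an error instead of blocking leaves the choice of queueing and retrying the fragment to the caller (the PML, in Christoph's description), so no traffic leaves the process while the flag is set.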
Re: [OMPI devel] Howto pause BTL's sending at runtime - hope mail is working again
You've got this a tad wrong, but that's okay; let me try to clarify a couple of things that may help.

First, you don't want to add this as a separate orted command. As you noted, ORTE has no direct way to tell the OMPI layer to do anything. Instead, you want to pass a message to the process that is received in the OMPI layer. That is easy to do:

1. Add a message tag in ompi/mca/dpm/dpm.h, perhaps something like OMPI_RML_TAG_BTL_CTL.
2. In the BTL, add a call to orte_rml.recv_nb() that identifies the above tag and specifies a callback function to use when such a message arrives.
3. In that callback function, toggle your "paused" flag, or unpack the buffer to get a flag telling you what value to set. Your choice.

Now, when you want to pause the BTL, you do an orte_grpcomm.xcast() to the above message tag. ORTE will deliver that message to every process, which will then have its callback function called.

HTH
Ralph

On Jan 8, 2010, at 9:03 AM, Christoph Konersmann wrote:
> Hi again,
>
> Maybe I should give more specific information with some code snippets...
>
> Currently I added
> #define ORTE_DAEMON_BTL_CTL_CMD (orte_daemon_cmd_flag_t) 26
> to odls_types.h to identify if I want to trigger the BTL pause.
>
> In process_commands() of orted/orted_comm.c this flag is processed first by
> broadcasting to all orteds with xcast of the grpcomm framework. At second
> it's forwarded with orte_odls.deliver_message to the local procs. So every
> process should get the trigger. Or is there another possibly easier way of
> spawning the trigger?
>
> I expanded the mca_btl_base_module_t in btl/btl.h simply with an indicator if
> pause is set.
> struct mca_btl_base_module_t {
> [...]
>    bool btl_paused;
> [...]
> };
>
> I then added a line to the initial values in every BTL component that
> btl_paused should be false by default. E.g. in self/btl_self.c:
> mca_btl_base_module_t mca_btl_self = {
> [...]
>    false, /* btl_paused */
> [...]
> };
> Or did I forget something?
>
> So my problem is now, when every process gets the trigger in the ORTE
> project, how could I set btl->paused to true in OMPI project? ORTE has not
> (and I know it should not) have access to the OMPI components. Is there a way
> of implementing a libevent callback function in the BTL modules? Or is there
> another way? I already read the documentation at your wiki-site, but for me
> it's not really trivial as I'm relatively new to this.
>
> An idea to get the connection to the OMPI project would be to use the
> ft_event framework. Therefore I added another opal_crs_state_type_t
> OPAL_CRS_PAUSE in crs/crs.h and tried to trigger the event in orted_comm.c
> with:
> if( NULL != orte_ess.ft_event ) {
>    if( ORTE_SUCCESS != (ret = orte_ess.ft_event(OPAL_CRS_PAUSE))) {
>        goto CLEANUP;
>    }
> }
> But the ft_event() is NULL and therefore isn't executed...
>
> Any ideas? Any advices?
>
> For me the performance impact of a solution is of no interest.
>
> Thanks, and please excuse me if I bother you with this.
>
> Christoph
>
> Christoph Konersmann wrote:
>> Hi all,
>> I'm trying to implement a method to pause all BTL's sending packets to their
>> destinations.
>> Currently I added a state variable to orte_process_info which will be
>> changed with an external program through process_commands() in
>> orte/orted/orted_comm.c (I hope it's processed globally not locally). While
>> this state is changed to something defined as PAUSE, I want the send_methods
>> in PML-Layer to be halted omitting any network traffic. By now it's not
>> working, cause the PML-Layer does not see the state change.
>> Another way would be to use a libevent thread on the bml/pml-level. I've
>> read that this library is already supported/implemented, or am I wrong? How
>> would I use libevent in this context? Does somebody have an example or hint?
>> Or should I use the fault tolerance framework for this purpose?
>> Any help would be appreciated.
>> thanks
>
> --
> Paderborn Center for Parallel Computing - PC2
> University of Paderborn - Germany
> http://www.pc2.de
>
> Christoph Konersmann
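Ralph's three steps can be modeled without ORTE at all. The following is a self-contained simulation, not ORTE code: toy_rml_recv_nb() stands in for orte_rml.recv_nb(), toy_xcast() for orte_grpcomm.xcast(), and TOY_RML_TAG_BTL_CTL for the suggested OMPI_RML_TAG_BTL_CTL. All toy_* names and the tag value are hypothetical, and the real ORTE functions take different arguments. The sketch only shows the control flow: each process registers a callback on the tag once, a single broadcast to that tag reaches every registered callback, and the callback unpacks a one-byte flag (the "unpack the buffer" variant of step 3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NPROCS 4
#define TOY_MAX_TAGS 8
/* Hypothetical tag value standing in for OMPI_RML_TAG_BTL_CTL. */
#define TOY_RML_TAG_BTL_CTL 3

typedef void (*toy_rml_cb_fn_t)(int proc, int tag, const uint8_t *buf, size_t len);

/* Per simulated process: the pause flag of its BTLs. */
static bool toy_btl_paused[TOY_NPROCS];

/* One persistent receive callback per (process, tag). */
static toy_rml_cb_fn_t toy_rml_callbacks[TOY_NPROCS][TOY_MAX_TAGS];

/* Step 2: post a persistent non-blocking receive on the tag. */
static void toy_rml_recv_nb(int proc, int tag, toy_rml_cb_fn_t cb)
{
    assert(proc >= 0 && proc < TOY_NPROCS);
    assert(tag >= 0 && tag < TOY_MAX_TAGS);
    toy_rml_callbacks[proc][tag] = cb;
}

/* Step 3: the callback unpacks the flag and sets the BTL state. */
static void toy_btl_ctl_callback(int proc, int tag, const uint8_t *buf, size_t len)
{
    (void)tag;
    if (len >= 1) {
        toy_btl_paused[proc] = (buf[0] != 0);
    }
}

/* Stand-in for the broadcast: deliver the buffer to every process that
 * has a receive posted on the tag. Returns the number of deliveries. */
static int toy_xcast(int tag, const uint8_t *buf, size_t len)
{
    assert(tag >= 0 && tag < TOY_MAX_TAGS);
    int delivered = 0;
    for (int p = 0; p < TOY_NPROCS; ++p) {
        toy_rml_cb_fn_t cb = toy_rml_callbacks[p][tag];
        if (cb != NULL) {
            cb(p, tag, buf, len);
            ++delivered;
        }
    }
    return delivered;
}

/* In this sketch, called once per process during BTL initialization. */
static void toy_btl_register_ctl(int proc)
{
    toy_btl_paused[proc] = false;
    toy_rml_recv_nb(proc, TOY_RML_TAG_BTL_CTL, toy_btl_ctl_callback);
}
```

Because the callback is keyed on the tag, the BTL needs no knowledge of orted commands: the broadcasting side only sends bytes to a tag, and only OMPI-side code interprets them, which is the layering Ralph describes.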
[OMPI devel] Fwd: Update on CS mail problem
You may have noticed that some of the messages from this morning were marked as a virus (prefixed with [PMX:VIRUS]). This was caused by the problem described below by Rob. It affected the various mailing lists hosted by IU (including all the Open MPI project lists). The admins at IU think they have resolved the issue and should be resending the quarantined messages sometime today.

-- Josh

Begin forwarded message:

> From: Rob Henderson
> Date: January 8, 2010 10:32:45 AM EST
> To: undisclosed-recipients
> Subject: Update on CS mail problem
>
>
> There was a problem with the CS virus and spam scanning software that
> was causing email sent through @cs.indiana.edu, @extreme.indiana.edu,
> and @osl.iu.edu to be incorrectly tagged as a virus with the virus code:
>
> SOPHOS_SAVI_ERROR_OLD_VIRUS_DATA
>
> We have corrected the underlying problem that caused this but the result
> was that email delivered starting around 1:20am through around 8:25am on
> 1/8/2010 was not delivered properly.
>
> The messages were saved in the software's quarantine queue and I will be
> releasing this email from the queue so you should be receiving the
> original messages intact. I'm working with the vendor on the release of
> the messages from the queue so you should be getting them shortly but I
> don't have an ETA on this yet. But, I'm working to do this as soon as
> possible so please bear with me.
>
> Thanks,
>
> --Rob
Re: [OMPI devel] MALLOC_MMAP_MAX (and MALLOC_MMAP_THRESHOLD)
On Thu, 7 Jan 2010, Eugene Loh wrote:
Could someone tell me how these settings are used in OMPI or give any guidance on how they should or should not be used?

This is a very good question :-) As is this whole e-mail, though it's hard (in my opinion) to give it a Good (TM) answer.

This means that if you loop over the elements of multiple large arrays (which is common in HPC), you can generate a lot of cache conflicts, depending on the cache associativity.

On the other hand, high buffer alignment sometimes gives better performance (e.g. InfiniBand QDR bandwidth).

There are multiple reasons one might want to modify the behavior of the memory allocator, including the high cost of mmap calls, wanting to register memory for faster communications, and now this cache-conflict issue. The usual solution is

setenv MALLOC_MMAP_MAX_ 0
setenv MALLOC_TRIM_THRESHOLD_ -1

or the equivalent mallopt() calls.

But yes, this set of settings is the number one tweak on HPC code that I'm aware of.

This issue becomes an MPI issue for at least three reasons:

*) MPI may care about these settings due to memory registration and pinning. (I invite you to explain to me what I mean. I'm talking over my head here.)

Avoiding mmap is good since it spares us from having to call munmap (a function we need to hack to prevent data corruption).

*) (Related to the previous bullet), MPI performance comparisons may reflect these effects. Specifically, in comparing performance of OMPI, Intel MPI, Scali/Platform MPI, and MVAPICH2, some tests (such as HPCC and SPECmpi) have shown large performance differences between the various MPIs when, it seems, none were actually spending much time in MPI. Rather, some MPI implementations were turning off large-malloc mmaps and getting good performance (and sadly OMPI looked bad in comparison).

I don't think this bullet is related to the previous one. The first one is a good reason; this one is typically the Bad reason.
Bad, but unfortunately true: competitors' MPI libraries are faster because... they do much more than MPI (accelerating malloc being the main difference). I think this is Bad, because all these settings should be left in the developer's hands. You'll always find an application where these settings waste memory and prevent it from running.

*) These settings seem to be desirable for HPC codes since they don't do much allocation/deallocation and they do tend to have loop nests that wade through multiple large arrays at once. For best "out of the box" performance, a software stack should turn these settings on for HPC. Codes don't typically identify themselves as "HPC", but some indicators include Fortran, OpenMP, and MPI.

In practice, I agree. Most HPC codes benefit from it. But I have also run into codes where the memory waste was a problem.

I don't know the full scope of the problem, but I've run into this with at least HPCC STREAM (which shouldn't depend on MPI at all, but OMPI looks much slower than Scali/Platform on some tests) and SPECmpi (primarily one or two codes, though it depends also on problem size).

I also had those codes in mind. That's also why I don't like those MPI "benchmarks": they benchmark much more than MPI. They therefore encourage MPI providers to incorporate into their libraries things that have (more or less) nothing to do with MPI. But again, yes, from the (basic) user's point of view, library X seems faster than library Y. When there is nothing left to improve in MPI, start optimizing the rest... maybe we should reimplement a faster libc inside MPI :-)

Sylvain
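The thread mentions "the equivalent mallopt() calls" for the two environment settings. A minimal sketch of those calls follows, assuming a glibc-based Linux system: mallopt(), M_MMAP_MAX and M_TRIM_THRESHOLD come from glibc's <malloc.h> and are not portable C. The helper name hpc_malloc_tuning is hypothetical. It should run before the application makes its large allocations, and, as discussed above, it can waste memory for applications that allocate and free a lot.

```c
#include <assert.h>
#include <malloc.h>   /* glibc-specific: mallopt(), M_MMAP_MAX, M_TRIM_THRESHOLD */
#include <stdlib.h>

/* Hypothetical helper: programmatic counterpart of
 *   setenv MALLOC_MMAP_MAX_ 0
 *   setenv MALLOC_TRIM_THRESHOLD_ -1
 * mallopt() returns 1 on success and 0 on error; this helper returns 1
 * only if glibc accepted both settings. */
static int hpc_malloc_tuning(void)
{
    /* 0 disables mmap() for large requests: they are served from the heap,
     * so freeing them never leads to munmap(). */
    int ok_mmap = mallopt(M_MMAP_MAX, 0);
    /* -1 disables trimming: freed memory at the top of the heap is kept
     * by the process instead of being returned to the OS. */
    int ok_trim = mallopt(M_TRIM_THRESHOLD, -1);
    assert(ok_mmap == 0 || ok_mmap == 1);
    return ok_mmap == 1 && ok_trim == 1;
}
```

Calling hpc_malloc_tuning() at the top of main() should have roughly the same effect as exporting the two variables before launch. The environment-variable route needs no code change, which fits the view above that these settings belong in the developer's hands rather than inside an MPI library.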
[OMPI devel] [PMX:VIRUS] Sophos virus
The original content of this message part has been replaced by this text because it tested positive for the following virus(es): SOPHOS_SAVI_ERROR_OLD_VIRUS_DATA The original message has been quarantined pending further action by the mail administrator. For further information about the message and its delivery status, please contact the undersigned, and include the full content of this message. The identifier for this message is 'o08BTUqg011523'. This notification is being sent to you and any other original envelope recipient(s). To avoid creating a nuisance and to keep mail traffic under control, the original sender of the message has NOT been notified. However, you may want to notify the sender at your discretion. The Management PureMessage Admin
[OMPI devel] [PMX:VIRUS] Re: Howto pause BTL's sending at runtime - sorry for spamming
The original content of this message part has been replaced by this text because it tested positive for the following virus(es): SOPHOS_SAVI_ERROR_OLD_VIRUS_DATA The original message has been quarantined pending further action by the mail administrator. For further information about the message and its delivery status, please contact the undersigned, and include the full content of this message. The identifier for this message is 'o089RBUv000649'. This notification is being sent to you and any other original envelope recipient(s). To avoid creating a nuisance and to keep mail traffic under control, the original sender of the message has NOT been notified. However, you may want to notify the sender at your discretion. The Management PureMessage Admin
[OMPI devel] [PMX:VIRUS] Re: Howto pause BTL's sending at runtime
The original content of this message part has been replaced by this text because it tested positive for the following virus(es): SOPHOS_SAVI_ERROR_OLD_VIRUS_DATA The original message has been quarantined pending further action by the mail administrator. For further information about the message and its delivery status, please contact the undersigned, and include the full content of this message. The identifier for this message is 'o089J4Io032177'. This notification is being sent to you and any other original envelope recipient(s). To avoid creating a nuisance and to keep mail traffic under control, the original sender of the message has NOT been notified. However, you may want to notify the sender at your discretion. The Management PureMessage Admin
[OMPI devel] [PMX:VIRUS] Re: Howto pause BTL's sending at runtime
The original content of this message part has been replaced by this text because it tested positive for the following virus(es): SOPHOS_SAVI_ERROR_OLD_VIRUS_DATA The original message has been quarantined pending further action by the mail administrator. For further information about the message and its delivery status, please contact the undersigned, and include the full content of this message. The identifier for this message is 'o0894jWP029876'. This notification is being sent to you and any other original envelope recipient(s). To avoid creating a nuisance and to keep mail traffic under control, the original sender of the message has NOT been notified. However, you may want to notify the sender at your discretion. The Management PureMessage Admin