Re: [OMPI devel] Changes: opal_output and opal_show_help
On May 10, 2008, at 9:00 AM, Jeff Squyres wrote:

> Er, no. I thought the group had agreed to the main idea last Tuesday
> (framework for filtering output). We were racing against the
> time-to-branch clock and didn't take the time for an RFC after we
> agreed on the design. Do we need to?

I don't think so. But I'd just kinda like a more formal description of
what this fix is and its implications on how the developers are
expected to use it going forward, since this is altering the coding
standards.

> The side effect of eliminating duplicate error messages is new / was
> not discussed last Tuesday -- I can put out an RFC for that if you'd
> like, but the benefit is so obvious that I didn't think it would be
> controversial.

Don't get me wrong, I'm not arguing the benefit, just that I'd like to
know what is expected of me as a developer after this change. Not
something to hold up the merge, just something I'd like to see.

Cheers,
Josh

> On May 9, 2008, at 8:48 PM, Josh Hursey wrote:
>
>> Is there an RFC telling us when we might expect this?
>>
>> On May 9, 2008, at 5:52 PM, Jeff Squyres wrote:
>>
>>> So when this stuff hits the trunk,
>
> --
> Jeff Squyres
> Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
[OMPI devel] heterogeneous OpenFabrics adapters
I think that this issue has come up before, but I filed a ticket about
it because at least one developer (Jon) has a system with both IB and
iWARP adapters:

https://svn.open-mpi.org/trac/ompi/ticket/1282

My question: do we care about the heterogeneous adapter scenarios? For
v1.3? For v1.4? For ...some version in the future?

I think the first issue I identified in the ticket is grunt work to
fix (annoying and tedious, but not difficult), but the second one will
be a little dicey -- it has scalability issues (e.g., sending around
all info in the modex, etc.).

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Changes: opal_output and opal_show_help
Sorry it took so long for a reply; Ralph and I were working on this
code much of the day in an attempt to have it all complete / tidied up
for the teleconf tomorrow.

On May 12, 2008, at 10:04 AM, Josh Hursey wrote:

>> Er, no. I thought the group had agreed to the main idea last Tuesday
>> (framework for filtering output). We were racing against the
>> time-to-branch clock and didn't take the time for an RFC after we
>> agreed on the design. Do we need to?
>
> I don't think so. But I'd just kinda like a more formal description of
> what this fix is and its implications on how the developers are
> expected to use it going forward, since this is altering the coding
> standards.

Fair enough, will do.

Since this one was kinda weird, do you want an after-the-fact RFC, or
a page on the wiki? I'm partial to the latter; it'll be more durable.

>> The side effect of eliminating duplicate error messages is new / was
>> not discussed last Tuesday -- I can put out an RFC for that if you'd
>> like, but the benefit is so obvious that I didn't think it would be
>> controversial.
>
> Don't get me wrong, I'm not arguing the benefit, just that I'd like to
> know what is expected of me as a developer after this change.

That's perfectly reasonable. In short: s/opal_show_help/orte_show_help/
in the ORTE and OMPI layers, and you're done (which we already did
throughout the code base). Use orte_show_help in the ORTE and OMPI
layers in the future. I think this information should go on the wiki.

Finally, per a conversation that I had with Terry earlier today, I
added a new MCA parameter that will turn off the show_help message
aggregation. It defaults to aggregation enabled, but you can disable
it with:

    ... --mca orte_base_help_aggregation 0 ...

This will show *all* show_help messages, regardless of duplication.
Terry was worried that aggregating the same (filename, tuple) messages
may actually mask different errors because we allow %s expansion in
the message.

Re-examining George's mail in this thread, I think he may have had
similar concerns, but I didn't grok that at the time.

--
Jeff Squyres
Cisco Systems
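Terry's concern about aggregation masking distinct errors can be illustrated with a small, self-contained sketch. This is not the ORTE implementation: the table, the function name `help_should_show`, the file name `help-example.txt`, and the topic string are all hypothetical, and the sketch assumes the aggregation key is the help file name plus a topic string (one reading of the "(filename, tuple)" wording above). The point it shows is that the %s-expanded argument is not part of the key, so two messages that differ only in that argument collapse into one when aggregation is on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HELP_MAX_KEYS 32

/* Hypothetical aggregation record: one per (filename, topic) pair.
 * The table stores the pointers it is given, so the strings must
 * outlive the table (string literals do). */
struct help_key {
    const char *filename;
    const char *topic;
    int hits;               /* how many times this pair was reported */
};

struct help_table {
    struct help_key keys[HELP_MAX_KEYS];
    size_t used;
    int aggregate;          /* nonzero: suppress repeats; zero: show all */
};

/* Returns 1 if the message should be displayed, 0 if it is suppressed
 * as a repeat of an already-seen (filename, topic) pair.
 * expanded_arg stands in for the %s expansion; it is deliberately NOT
 * part of the key, which is exactly the masking risk discussed above. */
int help_should_show(struct help_table *t, const char *filename,
                     const char *topic, const char *expanded_arg)
{
    size_t i;

    assert(t != NULL && filename != NULL && topic != NULL);
    (void)expanded_arg;

    if (!t->aggregate) {
        return 1;           /* aggregation disabled: show everything */
    }
    for (i = 0; i < t->used; i++) {
        if (strcmp(t->keys[i].filename, filename) == 0 &&
            strcmp(t->keys[i].topic, topic) == 0) {
            t->keys[i].hits++;
            return 0;       /* same pair seen before: suppress */
        }
    }
    if (t->used == HELP_MAX_KEYS) {
        return 1;           /* table full: fall back to showing it */
    }
    t->keys[t->used].filename = filename;
    t->keys[t->used].topic = topic;
    t->keys[t->used].hits = 1;
    t->used++;
    return 1;
}
```

With `aggregate` set, a second report for the same pair is suppressed even if `expanded_arg` names a different device or host; with `aggregate` cleared (the analogue of `--mca orte_base_help_aggregation 0`), every report is shown.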
Re: [OMPI devel] Changes: opal_output and opal_show_help
On 5/12/08 3:49 PM, "Jeff Squyres" wrote:

> Sorry it took so long for a reply; Ralph and I were working on this
> code much of the day in an attempt to have it all complete / tidied up
> for the teleconf tomorrow.
>
> On May 12, 2008, at 10:04 AM, Josh Hursey wrote:
>
>>> Er, no. I thought the group had agreed to the main idea last Tuesday
>>> (framework for filtering output). We were racing against the
>>> time-to-branch clock and didn't take the time for an RFC after we
>>> agreed on the design. Do we need to?
>>
>> I don't think so. But I'd just kinda like a more formal description of
>> what this fix is and its implications on how the developers are
>> expected to use it going forward, since this is altering the coding
>> standards.
>
> Fair enough, will do.
>
> Since this one was kinda weird, do you want an after-the-fact RFC, or
> a page on the wiki? I'm partial to the latter; it'll be more durable.
>
>>> The side effect of eliminating duplicate error messages is new / was
>>> not discussed last Tuesday -- I can put out an RFC for that if you'd
>>> like, but the benefit is so obvious that I didn't think it would be
>>> controversial.
>>
>> Don't get me wrong, I'm not arguing the benefit, just that I'd like to
>> know what is expected of me as a developer after this change.
>
> That's perfectly reasonable. In short: s/opal_show_help/orte_show_help/
> in the ORTE and OMPI layers, and you're done (which we already did
> throughout the code base). Use orte_show_help in the ORTE and OMPI
> layers in the future. I think this information should go on the wiki.

Just to complete that, you also should:

    s/opal_output/orte_output
    s/OPAL_OUTPUT/ORTE_OUTPUT
    s/OPAL_OUTPUT_VERBOSE/ORTE_OUTPUT_VERBOSE

throughout ORTE and OMPI layers in the future. This has also been done
in the current code base.

> Finally, per a conversation that I had with Terry earlier today, I
> added a new MCA parameter that will turn off the show_help message
> aggregation. It defaults to aggregation enabled, but you can disable
> it with:
>
>     ... --mca orte_base_help_aggregation 0 ...
>
> This will show *all* show_help messages, regardless of duplication.
> Terry was worried that aggregating the same (filename, tuple) messages
> may actually mask different errors because we allow %s expansion in
> the message.
>
> Re-examining George's mail in this thread, I think he may have had
> similar concerns, but I didn't grok that at the time.
Re: [OMPI devel] Changes: opal_output and opal_show_help
I think a wiki page describing this should be fine. Just wanted to
make sure I use the new functionality properly.

Cheers,
Josh

On May 12, 2008, at 5:59 PM, Ralph Castain wrote:

> On 5/12/08 3:49 PM, "Jeff Squyres" wrote:
>
>> Sorry it took so long for a reply; Ralph and I were working on this
>> code much of the day in an attempt to have it all complete / tidied
>> up for the teleconf tomorrow.
>>
>> On May 12, 2008, at 10:04 AM, Josh Hursey wrote:
>>
>>>> Er, no. I thought the group had agreed to the main idea last
>>>> Tuesday (framework for filtering output). We were racing against
>>>> the time-to-branch clock and didn't take the time for an RFC after
>>>> we agreed on the design. Do we need to?
>>>
>>> I don't think so. But I'd just kinda like a more formal description
>>> of what this fix is and its implications on how the developers are
>>> expected to use it going forward, since this is altering the coding
>>> standards.
>>
>> Fair enough, will do.
>>
>> Since this one was kinda weird, do you want an after-the-fact RFC, or
>> a page on the wiki? I'm partial to the latter; it'll be more durable.
>>
>>>> The side effect of eliminating duplicate error messages is new /
>>>> was not discussed last Tuesday -- I can put out an RFC for that if
>>>> you'd like, but the benefit is so obvious that I didn't think it
>>>> would be controversial.
>>>
>>> Don't get me wrong, I'm not arguing the benefit, just that I'd like
>>> to know what is expected of me as a developer after this change.
>>
>> That's perfectly reasonable. In short:
>> s/opal_show_help/orte_show_help/ in the ORTE and OMPI layers, and
>> you're done (which we already did throughout the code base). Use
>> orte_show_help in the ORTE and OMPI layers in the future. I think
>> this information should go on the wiki.
>
> Just to complete that, you also should:
>
>     s/opal_output/orte_output
>     s/OPAL_OUTPUT/ORTE_OUTPUT
>     s/OPAL_OUTPUT_VERBOSE/ORTE_OUTPUT_VERBOSE
>
> throughout ORTE and OMPI layers in the future. This has also been done
> in the current code base.
>
>> Finally, per a conversation that I had with Terry earlier today, I
>> added a new MCA parameter that will turn off the show_help message
>> aggregation. It defaults to aggregation enabled, but you can disable
>> it with:
>>
>>     ... --mca orte_base_help_aggregation 0 ...
>>
>> This will show *all* show_help messages, regardless of duplication.
>> Terry was worried that aggregating the same (filename, tuple)
>> messages may actually mask different errors because we allow %s
>> expansion in the message.
>>
>> Re-examining George's mail in this thread, I think he may have had
>> similar concerns, but I didn't grok that at the time.
Re: [OMPI devel] heterogeneous OpenFabrics adapters
After looking at the code a bit, I realized that I completely forgot
that the INI file was invented to solve at least the
heterogeneous-adapters-in-a-host problem. So I amended the ticket to
reflect that that problem is already solved. The other part is not,
though -- consider two MPI procs on different hosts, each with an
iWARP NIC, but one NIC supports SRQs and one does not.

On May 12, 2008, at 5:36 PM, Jeff Squyres wrote:

> I think that this issue has come up before, but I filed a ticket about
> it because at least one developer (Jon) has a system with both IB and
> iWARP adapters:
>
> https://svn.open-mpi.org/trac/ompi/ticket/1282
>
> My question: do we care about the heterogeneous adapter scenarios? For
> v1.3? For v1.4? For ...some version in the future?
>
> I think the first issue I identified in the ticket is grunt work to
> fix (annoying and tedious, but not difficult), but the second one will
> be a little dicey -- it has scalability issues (e.g., sending around
> all info in the modex, etc.).
>
> --
> Jeff Squyres
> Cisco Systems

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] heterogeneous OpenFabrics adapters
Short version:
--------------

I propose that we disallow multiple different
mca_btl_openib_receive_queues values (or receive_queues values from
the INI file) in a single MPI job for the v1.3 series.

More details:
-------------

The reason I'm looking into this heterogeneity stuff is to help
Chelsio support their iWARP NIC in OMPI. Their NIC needs a specific
value for mca_btl_openib_receive_queues (specifically: it does not
support SRQ and it has the wireup race condition that we discussed
before).

The major problem is that all the BSRQ information is currently stored
on the openib component -- it is *not* maintained on a per-HCA (or
per-port) basis. We *could* move all the BSRQ info to live on the
hca_t struct (or even the openib module struct), but that has at least
3 big consequences:

1. It would touch a lot of code. But touching all this code is
   relatively low risk; it will be easy to check for correctness
   because the changes will either compile or not.
2. There are functions (some of which are static inline) that read the
   BSRQ data. These functions would have to take an additional
   (hca_t*) (or (btl_openib_module_t*)) parameter.
3. Getting to the BSRQ info will take at least 1 or 2 more
   dereferences (e.g., module->hca->bsrq_info or
   module->bsrq_info...).

I'm not too concerned about #1 (it's grunt work), but I am a bit
concerned about #2 and #3 since at least some of these places are in
the critical performance path.

Given these concerns, I propose the following for v1.3:

- Add a "receive_queues" field to the INI file so that the Chelsio
  adapter can run out of the box (i.e., "mpirun -np 4 a.out" with
  hosts containing Chelsio NICs will get a value for
  btl_openib_receive_queues that will work).
- NetEffect NICs will also require overriding
  btl_openib_receive_queues, but will likely have a different value
  than Chelsio NICs (they don't have the wireup race condition that
  Chelsio does).
- Because the BSRQ info is on the component (i.e., global), we should
  detect when multiple different receive_queues values are specified
  and gracefully abort.

I think it'll be quite uncommon to have a need for two different
receive_queues values, and that this proposal will be fine for v1.3.

Comments?

On May 12, 2008, at 6:44 PM, Jeff Squyres wrote:

> After looking at the code a bit, I realized that I completely forgot
> that the INI file was invented to solve at least the
> heterogeneous-adapters-in-a-host problem. So I amended the ticket to
> reflect that that problem is already solved. The other part is not,
> though -- consider two MPI procs on different hosts, each with an
> iWARP NIC, but one NIC supports SRQs and one does not.
>
> On May 12, 2008, at 5:36 PM, Jeff Squyres wrote:
>
>> I think that this issue has come up before, but I filed a ticket
>> about it because at least one developer (Jon) has a system with both
>> IB and iWARP adapters:
>>
>> https://svn.open-mpi.org/trac/ompi/ticket/1282
>>
>> My question: do we care about the heterogeneous adapter scenarios?
>> For v1.3? For v1.4? For ...some version in the future?
>>
>> I think the first issue I identified in the ticket is grunt work to
>> fix (annoying and tedious, but not difficult), but the second one
>> will be a little dicey -- it has scalability issues (e.g., sending
>> around all info in the modex, etc.).
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>
> --
> Jeff Squyres
> Cisco Systems

--
Jeff Squyres
Cisco Systems
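The "detect multiple different receive_queues values and gracefully abort" step proposed for v1.3 could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from the openib BTL: `struct adapter_cfg`, `check_receive_queues`, and the adapter names are hypothetical, the receive_queues strings used with it are made-up example values, and comparing the strings byte for byte is a simplification (two equivalent specifications written differently would be reported as a conflict).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-adapter record: the receive_queues string that the
 * INI file (or the MCA parameter) yields for one adapter. */
struct adapter_cfg {
    const char *name;
    const char *receive_queues;
};

/* Because the BSRQ info is component-global in this proposal, every
 * adapter in use must agree on a single receive_queues value.
 *
 * Returns 0 and stores the agreed value in *chosen (NULL if n == 0).
 * Returns -1 on the first disagreement and stores the index of the
 * offending adapter in *conflict, so the caller can print a helpful
 * message and abort gracefully. */
int check_receive_queues(const struct adapter_cfg *adapters, size_t n,
                         const char **chosen, size_t *conflict)
{
    size_t i;

    assert(chosen != NULL && conflict != NULL);
    assert(n == 0 || adapters != NULL);

    *chosen = NULL;
    *conflict = 0;
    for (i = 0; i < n; i++) {
        assert(adapters[i].receive_queues != NULL);
        if (*chosen == NULL) {
            /* First adapter seen defines the job-wide value. */
            *chosen = adapters[i].receive_queues;
        } else if (strcmp(*chosen, adapters[i].receive_queues) != 0) {
            *conflict = i;
            return -1;
        }
    }
    return 0;
}
```

Keeping a single job-wide value is what lets the BSRQ data stay on the component, which avoids the extra parameter and the extra dereferences (consequences #2 and #3 above) in the critical path; the cost is that a mixed job is rejected rather than supported.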
Re: [OMPI devel] [RFC] mca_base_select()
I -think- I may have found the problem here, but don't have a real
test case - try r18429 and see if it works.

On 5/11/08 4:32 PM, "Josh Hursey" wrote:

> From the stacktrace, this doesn't look like a problem with
> base_select, but with 'orte_util_encode_pidmap'. You may want to
> start looking there.
>
> -- Josh
>
> On May 11, 2008, at 1:30 PM, Lenny Verkhovsky wrote:
>
>> Hi,
>> I tried r18423 with the rank_file component and got a segfault
>> (I increase the priority of the component if rmaps_rank_file_path exists)
>>
>> /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun -np 4 -hostfile hostfile_ompi -mca rmaps_rank_file_path rankfile -mca paffinity_base_verbose 5 ./mpi_p_SMD -t bw -output 1 -order 1
>> [witch1:25456] mca:base:select: Querying component [linux]
>> [witch1:25456] mca:base:select: Query of component [linux] set priority to 10
>> [witch1:25456] mca:base:select: Selected component [linux]
>> [witch1:25456] *** Process received signal ***
>> [witch1:25456] Signal: Segmentation fault (11)
>> [witch1:25456] Signal code: Invalid permissions (2)
>> [witch1:25456] Failing at address: 0x2b2875530030
>> [witch1:25456] [ 0] /lib64/libpthread.so.0 [0x2b28759dfc10]
>> [witch1:25456] [ 1] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-pal.so.0 [0x2b28753e2bb6]
>> [witch1:25456] [ 2] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-pal.so.0 [0x2b28753e23b6]
>> [witch1:25456] [ 3] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-pal.so.0 [0x2b28753e22fd]
>> [witch1:25456] [ 4] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-rte.so.0(orte_util_encode_pidmap+0x2f4) [0x2b287527f412]
>> [witch1:25456] [ 5] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x989) [0x2b28752934f5]
>> [witch1:25456] [ 6] /home/USERS/lenny/OMPI_ORTE_SMD/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x1a3) [0x2b287529e60b]
>> [witch1:25456] [ 7] /home/USERS/lenny/OMPI_ORTE_SMD/lib/openmpi/mca_plm_rsh.so [0x2b287612f788]
>> [witch1:25456] [ 8] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun [0x4032bf]
>> [witch1:25456] [ 9] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun [0x402b53]
>> [witch1:25456] [10] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b2875b06154]
>> [witch1:25456] [11] /home/USERS/lenny/OMPI_ORTE_SMD/bin/mpirun [0x402aa9]
>> [witch1:25456] *** End of error message ***
>> Segmentation fault
>>
>> On Tue, May 6, 2008 at 9:09 PM, Josh Hursey wrote:
>> This has been committed in r18381
>>
>> Please let me know if you have any problems with this commit.
>>
>> Cheers,
>> Josh
>>
>> On May 5, 2008, at 10:41 AM, Josh Hursey wrote:
>>
>>> Awesome.
>>>
>>> The branch is updated to the latest trunk head. I encourage folks to
>>> check out this repository and make sure that it builds on their
>>> system. A normal build of the branch should be enough to find out if
>>> there are any cut-n-paste problems (though I tried to be careful,
>>> mistakes do happen).
>>>
>>> I haven't heard any problems so this is looking like it will come in
>>> tomorrow after the teleconf. I'll ask again there to see if there are
>>> any voices of concern.
>>>
>>> Cheers,
>>> Josh
>>>
>>> On May 5, 2008, at 9:58 AM, Jeff Squyres wrote:
>>>
>>> This all sounds good to me!
>>>
>>> On Apr 29, 2008, at 6:35 PM, Josh Hursey wrote:
>>>
> What: Add mca_base_select() and adjust frameworks & components to use
> it.
> Why: Consolidation of code for general goodness.
> Where: https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play
> When: Code ready now. Documentation ready soon.
> Timeout: May 6, 2008 (After teleconf) [1 week]
>
> Discussion:
> ---
> For a number of years a few developers have been talking about
> creating a MCA base component selection function. For various reasons
> this was never implemented. Recently I decided to give it a try.
>
> A base select function will allow Open MPI to provide completely
> consistent selection behavior for many of its frameworks (18 of 31 to
> be exact at the moment). The primary goal of this work is to improve
> code maintainability through code reuse. Other benefits also result,
> such as a slightly smaller memory footprint.
>
> The mca_base_select() function represents the most commonly used
> logic for component selection: Select the one component with the
> highest priority and close all of the not selected components. This
> function can be found at the path below in the branch:
> opal/mca/base/mca_base_components_select.c
>
> To support this I had to formalize a query() function in the
> mca_base_component_t of the form:
> int mca_base_query_component_fn(mca_base_module_t **module, int *priority);
>
> This function is specified after the open and close component
> functions in this structure so as to allow compatibility with
> fram