[OMPI devel] ORTE tutorial
Yo folks,

I need to do a little planning, and it would help a bunch to have a preliminary head count. Could you please let me know (a) whether you plan to participate in the tutorial, and (b) whether you will attend in person or remotely?

For an agenda, my thought is that we will start at 7am Mountain time (that's 9am Eastern) and stop around 2pm Mountain (4pm Eastern) both days, so we keep things as manageable as possible for our European participants.

Comments on the agenda are welcome. I hope to put something out a little later today.

Ralph
Re: [OMPI devel] ORTE tutorial
Ralph,

I plan to attend the meeting in person, assuming that I get approval to enter LANL (which I haven't yet). I will return to Houston on Saturday, so I have plenty of time on Friday.

The only difficulty for me is the 7am start (at least on Thursday), since I will reach my hotel in White Rocks between 1 and 2am. (My flight arrives in Albuquerque at around 10:30pm on Wednesday, and by the time I pick up the rental car and drive to White Rocks, it can get quite late.) Could we maybe start a little later that day, e.g. 8am or 9am?

Thanks
Edgar

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Edgar Gabriel
Assistant Professor
Department of Computer Science          email: gabr...@cs.uh.edu
University of Houston                   http://www.cs.uh.edu/~gabriel
Philip G. Hoffman Hall, Room 524        Tel: +1 (713) 743-3857
Houston, TX-77204, USA                  Fax: +1 (713) 743-3335
Re: [OMPI devel] ORTE tutorial
In general, I have no problem with starting a little later on Thursday. However, I have found out that nothing was arranged regarding meals for the tutorial, so we are on our own for lunch and breaks. Given traffic issues etc., we will probably need to break at 11am local time for lunch each day. So if we don't start until 9am, it will be a short morning session... and going later makes it very hard on Europe.

I'm hoping the Europeans will answer my earlier email about their specific areas of interest. It could be that I can creatively schedule those things into the mornings and put other things later on Thursday. Let's see what I can do...

Ralph

On 9/25/06 8:44 AM, "Edgar Gabriel" wrote:

> The only point which is difficult for me is to start at 7am in the
> morning (at least on thursday), since I will reach my hotel at White
> Rocks between 1 and 2am in the morning.
> [...]
> Could we maybe start a little later that day, e.g. 8am or 9am?
Re: [OMPI devel] ORTE tutorial
Hello dear Ralph,

On Saturday 23 September 2006 04:41, Ralph Castain wrote:
> I am sensitive to the fact that you are well ahead of us in terms of time.
> Is there some way we could organize our agenda to make things easier on
> you? If you review the syllabus posted on the OpenRTE web site
> (www.open-rte.org, look at the news column on the right side), are there
> particular topics you definitely want to attend, and others you would
> rather not attend?

Thanks for your consideration. For us, the different startup methods (core-concepts.pdf) and the architectural design (here, the different life cycles, including the planned persistent behavior, and of course the cell design with the interaction of GPR with RAS and RMAP) are the most interesting.

> You will note that I already have the materials for some of the modules
> posted on the site. I will be adding more modules in the next few days to
> complete the materials. This may give you a better idea of what I intend to
> cover, and to what depth. My intent is that this will be an interactive
> tutorial, with design discussions and changes being made as we go.
>
> I need to keep the modules in some order to make them understandable, but
> hate to impose upon you late into your night just so you can hear a topic
> of interest to you.

Well, we already plan to stay until around ten o'clock, so we would also stay for part of the afternoon session.

Thanks,
Rainer

--
Dipl.-Inf. Rainer Keller             http://www.hlrs.de/people/keller
High Performance Computing           Tel: ++49 (0)711-685 6 5858
Center Stuttgart (HLRS)              Fax: ++49 (0)711-685 6 5832
POSTAL:Nobelstrasse 19               email: kel...@hlrs.de
ACTUAL:Allmandring 30, R.O.030       AIM:rusraink
70550 Stuttgart
Re: [OMPI devel] Buffer Overflow Error
Following up on an old thread from the list.

The error was being thrown by the FORTIFY_SOURCE option that Dave had used (actually, RPM added it, I believe), which does some bounds checking on C functions. An erroneous size value, larger than the buffer itself, was being passed to a call to snprintf(). However, the string being processed by snprintf() could not possibly have overflowed the buffer, so there was no possibility of a buffer overflow in this situation.

We've fixed the code so that it passes the correct size to snprintf(), and this error will no longer occur.

Brian

On Thu, 2006-08-31 at 15:56 -0600, Brian Barrett wrote:
> What facilities are you using to detect the buffer overflow? We've seen
> no such issues in our testing and I'd be surprised if there was an issue
> in that code path. Valgrind and friends don't show any issues on our
> test machines, so without more detail, I'm afraid we really can't fix
> the issue you are seeing.
>
> Brian
>
>
> On Thu, 2006-08-24 at 13:53 -0400, Dave Rogers wrote:
> > I just compiled the latest version on my machine and ran a dumb test -
> > mpirun without any arguments.
> > This generated a buffer overflow error!
> >
> > Error message (reproducible with different mem. addr.s):
> > [ /home/dave/rpmbuild ] $ mpirun
> > *** buffer overflow detected ***: mpirun terminated
> > === Backtrace: =
> > /lib64/libc.so.6(__chk_fail+0x2f)[0x31669dee3f]
> > /lib64/libc.so.6[0x31669de69b]
> > /lib64/libc.so.6(__snprintf_chk+0x7b)[0x31669de56b]
> > /usr/lib64/libopal.so.0(opal_cmd_line_get_usage_msg+0x20a)[0x2ac1088a]
> > mpirun[0x403c53]
> > mpirun(orterun+0xa0)[0x402798]
> > mpirun(main+0x1b)[0x4026f3]
> > /lib64/libc.so.6(__libc_start_main+0xf4)[0x316691d084]
> > mpirun[0x402649]
> > === Memory map:
> > 0040-00408000 r-xp 09:01 2697992/usr/bin/orterun
> > ...
> > 7fff20e92000-7fff20ea8000 rw-p 7fff20e92000 00:00 0 [stack]
> > ff60-ffe0 ---p 00:00 0 [vdso]
> > Aborted
> >
> > Installation details: System: FC5 AMD Opteron x86_64
> > downloaded SRPM version 1.1.1
> >
> > rpm -ivh /usr/local/src/dist/libs/openmpi-1.1-1.src.rpm
> > rpmbuild -ba SPECS/openmpi-1.1.spec --target x86_64
> > - generates an error from check-rpaths stating that the /usr/lib64
> > prefix is unnecessary and may cause problems
> > QA_RPATHS=$[ 0x0001|0x0010 ] rpmbuild -ba SPECS/openmpi-1.1.spec --target x86_64
> > - suggested workaround - ignores as warnings
> > rpm -ivh ~dave/rpmbuild/RPMS/x86_64/openmpi-1.1-1.x86_64.rpm
> > - generates a package conflict -- file /usr/lib64/libopal.so from
> > install of openmpi-1.1-1 conflicts with file from package opal-2.2.1-1
> > - apparently, this comes from opal, the open phone abstraction
> > library... so I uninstalled opal
> > rpm -ivh ~dave/rpmbuild/RPMS/x86_64/openmpi-1.1-1.x86_64.rpm
> > - worked!
> >
> > The strange thing is that mpirun with normal arguments works as
> > expected without any sorts of mem. errors.
> > mpirun with flags -h or --help also buffer overflows, but not mpirun
> > with an unrecognized argument, to which it spits out a "you must
> > specify how many processes to launch, via the -np argument." error.
> >
> > I hope this gets fixed soon, buffer overflows are potential security
> > vulnerabilities.
> >
> > ~ David Rogers
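For readers of the archive, the kind of fix Brian describes at the top of this message can be sketched as follows. This is a minimal illustration, not the Open MPI source: `format_usage_line` and its parameters are hypothetical names. The relevant point, as far as the backtrace suggests, is that the fortified snprintf() in glibc (`__snprintf_chk`) aborts when the size argument is larger than the destination buffer the compiler can see, even if the formatted string itself would have fit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from Open MPI): format one usage line into a
 * caller-supplied buffer. Returns 0 on success, -1 if the text was
 * truncated or a formatting error occurred. */
int format_usage_line(char *buf, size_t buf_len,
                      const char *opt, const char *desc)
{
    assert(buf != NULL && buf_len > 0);

    /* Pass the real capacity of buf as the size argument. Passing a
     * larger number is what a fortified build rejects at run time. */
    int n = snprintf(buf, buf_len, "%s: %s", opt, desc);

    /* snprintf returns the length the full string would have had, so a
     * value >= buf_len means the output was cut short. */
    return (n >= 0 && (size_t)n < buf_len) ? 0 : -1;
}
```

A caller would typically write `format_usage_line(line, sizeof(line), "-np", "num procs")` with `line` declared as a local array, so that the size always tracks the actual buffer.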
Re: [OMPI devel] btl_openib_max_btls
What version of Open MPI are you using?

We had a bug with this on the trunk and the [unreleased] v1.2 branch; it was fixed within the last few hours in both places. It should not be a problem in the released v1.1 series.

Can you confirm that you were using the OMPI trunk or the v1.2 branch? If you're seeing this in the v1.1 series, then we need to look at this a bit more closely...

On 9/22/06 1:25 PM, "Nysal Jan" wrote:

> The ompi_info command shows the following description for the
> "btl_openib_max_btls" parameter:
> MCA btl: parameter "btl_openib_max_btls" (current value: "-1") Maximum
> number of HCA ports to use (-1 = use all available, otherwise must be >= 1)
>
> Even though I specify "mpirun --mca btl_openib_max_btls 1 .", 2 openib
> BTLs are created (the HCA has 2 ports).
> When I try to run Open MPI across 2 nodes (one node has an HCA with 2 ports
> and the other has only one port), both endpoints send the QP information
> over to the peer. Only one endpoint exists at the peer, so it prints the
> following error messages:
> [0,1,1][btl_openib_endpoint.c:706:mca_btl_openib_endpoint_recv] can't find suitable endpoint for this peer
>
> [0,1,0][btl_openib_endpoint.c:913:mca_btl_openib_endpoint_connect] error posting receive errno says Operation now in progress
>
> [0,1,0][btl_openib_endpoint.c:737:mca_btl_openib_endpoint_recv] endpoint connect error: -1
>
> Is "btl_openib_max_btls" the maximum number of BTLs or the maximum number of
> BTLs per port (which is what the current implementation, "init_one_hca()",
> looks like)?
>
> -Nysal

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
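Nysal's question in the quoted message comes down to where the limit is checked. The sketch below contrasts the two readings using hypothetical helper names; it is not the openib BTL code and makes no claim about how `init_one_hca()` is written. It only shows why a counter that restarts for each port would give 2 BTLs on a two-port HCA with a limit of 1, whereas a single counter shared across all ports would give 1, which matches the "maximum number of HCA ports to use" wording in the ompi_info description.

```c
#include <assert.h>
#include <stddef.h>

/* Reading 1 (global cap): stop creating BTL modules once max_btls exist,
 * no matter how many HCAs or ports remain. max_btls < 0 means no limit. */
int count_btls_global_cap(const int *ports_per_hca, size_t num_hcas,
                          int max_btls)
{
    assert(ports_per_hca != NULL || num_hcas == 0);
    int created = 0;
    for (size_t h = 0; h < num_hcas; ++h) {
        for (int p = 0; p < ports_per_hca[h]; ++p) {
            if (max_btls >= 0 && created >= max_btls) {
                return created;
            }
            ++created;
        }
    }
    return created;
}

/* Reading 2 (per-port cap): the counter restarts for every port, so a
 * limit of 1 or more never stops a port from getting its BTL module. */
int count_btls_per_port_cap(const int *ports_per_hca, size_t num_hcas,
                            int max_btls)
{
    assert(ports_per_hca != NULL || num_hcas == 0);
    int created = 0;
    for (size_t h = 0; h < num_hcas; ++h) {
        for (int p = 0; p < ports_per_hca[h]; ++p) {
            int created_on_port = 0; /* counter is local to this port */
            if (max_btls < 0 || created_on_port < max_btls) {
                ++created;
            }
        }
    }
    return created;
}
```

With one HCA that has two ports and a limit of 1, the first function yields one module and the second yields two, which is the mismatch that leaves the single-port peer without a matching endpoint.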
[OMPI devel] Tentative OpenRTE tutorial agenda
Hello all,

I have attached a tentative agenda for this week's tutorial, based on inputs received so far from planned participants. I have adjusted things to try to accommodate the needs of a geographically distributed audience, and the fact that, as sole speaker, I cannot possibly talk for hours on end without a break.

Please feel free to comment and I will try to make adjustments. We will have to play this a little loosely on the times: since I haven't given this lecture before, I can't be totally certain of the time required to cover each module. I also intend to skim some of the material (especially in the intro), as most of the audience is already familiar with it.

Thanks
Ralph

agenda.pdf
Description: Binary data