[OMPI devel] ORTE tutorial

2006-09-25 Thread Ralph H Castain
Yo folks

I need to do a little planning, and a preliminary head count would help a
bunch. Could you please let me know (a) whether you plan to participate in
the tutorial, and (b) whether you will attend in person or remotely?

For the agenda, my thought is that we will start at 7am Mountain time (9am
Eastern) and stop around 2pm Mountain (4pm Eastern) on both days, to keep
things as manageable as possible for our European participants.

Comments on the agenda are welcome. I hope to put something out a little
later today.

Ralph




Re: [OMPI devel] ORTE tutorial

2006-09-25 Thread Edgar Gabriel

Ralph,

I plan to attend the meeting in person, assuming that I get approval
to enter LANL (which I haven't yet). I will return to Houston on
Saturday, so I have plenty of time on Friday.

The only difficulty for me is starting at 7am (at least on Thursday),
since I will reach my hotel in White Rock between 1 and 2am. (My flight
arrives in Albuquerque at around 10:30 pm on Wednesday, and by the time
I pick up the rental car and drive to White Rock, it will be quite late.)

Could we maybe start a little later that day, e.g. 8am or 9am?

Thanks
Edgar

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Edgar Gabriel
Assistant Professor
Department of Computer Science  email:gabr...@cs.uh.edu
University of Houston   http://www.cs.uh.edu/~gabriel
Philip G. Hoffman Hall, Room 524    Tel: +1 (713) 743-3857
Houston, TX-77204, USA  Fax: +1 (713) 743-3335


Re: [OMPI devel] ORTE tutorial

2006-09-25 Thread Ralph H Castain
In general, I have no problem with starting a little later on Thursday.
However, I have found out that no meals were arranged for the tutorial, so
we are on our own for lunch and breaks. Given traffic and similar issues, we
will probably need to break for lunch at 11am local time each day.

So if we don't start until 9am, it will be a short morning session... and
running later in the day makes it very hard on Europe.

I'm hoping the Europeans will answer my earlier email about their specific
areas of interest. I may be able to schedule those topics creatively into
the mornings, and put other things later on Thursday.

Let's see what I can do...
Ralph


On 9/25/06 8:44 AM, "Edgar Gabriel"  wrote:

> Ralph,
> 
> I plan to attend the meeting in person, assuming that I get approval
> to enter LANL (which I haven't yet). I will return to Houston on
> Saturday, so I have plenty of time on Friday.
> 
> The only difficulty for me is starting at 7am (at least on Thursday),
> since I will reach my hotel in White Rock between 1 and 2am. (My flight
> arrives in Albuquerque at around 10:30 pm on Wednesday, and by the time
> I pick up the rental car and drive to White Rock, it will be quite late.)
> Could we maybe start a little later that day, e.g. 8am or 9am?
> 
> Thanks
> Edgar




Re: [OMPI devel] ORTE tutorial

2006-09-25 Thread Rainer Keller
Hello dear Ralph,
On Saturday 23 September 2006 04:41, Ralph Castain wrote:
> I am sensitive to the fact that you are well ahead of us in terms of time.
> Is there some way we could organize our agenda to make things easier on
> you? If you review the syllabus posted on the OpenRTE web site
> (www.open-rte.org, look at the news column on the right side), are there
> particular topics you definitely want to attend, and others you would
> rather not attend?
Thanks for your consideration. For us, the most interesting topics are the
different startup methods (core-concepts.pdf) and the architectural design
(here, the different life-cycles, including the planned persistent behavior,
and of course the cell design with the interaction of the GPR with RAS and
RMAP).

> You will note that I already have the materials for some of the modules
> posted on the site. I will be adding more modules in the next few days to
> complete the materials. This may give you a better idea of what I intend to
> cover, and to what depth. My intent is that this will be an interactive
> tutorial, with design discussions and changes being made as we go.
>
> I need to keep the modules in some order to make them understandable, but
> hate to impose upon you late into your night just so you can hear a topic
> of interest to you.
Well, we already plan to stay until around ten o'clock in the evening, so we
would also stay for part of the afternoon session.

Thanks,
Rainer
-- 

Dipl.-Inf. Rainer Keller   http://www.hlrs.de/people/keller
 High Performance Computing   Tel: ++49 (0)711-685 6 5858
   Center Stuttgart (HLRS)   Fax: ++49 (0)711-685 6 5832
 POSTAL:Nobelstrasse 19 email: kel...@hlrs.de 
 ACTUAL:Allmandring 30, R.O.030    AIM: rusraink
 70550 Stuttgart


Re: [OMPI devel] Buffer Overflow Error

2006-09-25 Thread Brian Barrett
Following up on an old thread from the list. The error was being raised by
the _FORTIFY_SOURCE option that Dave's build used (added by RPM, I believe),
which enables some bounds checking on C library functions. A call to
snprintf() was being passed a size value larger than the destination
buffer. However, the string being formatted by snprintf() could not
possibly have overflowed the buffer, so no actual buffer overflow was
possible in this situation.

We've fixed the code so that it will pass the correct value for the size
to snprintf() and this error will no longer occur.

Brian

On Thu, 2006-08-31 at 15:56 -0600, Brian Barrett wrote:
> What facilities are you using to detect the buffer overflow?  We've seen
> no such issues in our testing and I'd be surprised if there was an issue
> in that code path.  Valgrind and friends don't show any issues on our
> test machines, so without more detail, I'm afraid we really can't fix
> the issue you are seeing.
> 
> Brian
> 
> 
> On Thu, 2006-08-24 at 13:53 -0400, Dave Rogers wrote:
> > I just compiled the latest version on my machine and ran a dumb test:
> > mpirun without any arguments.
> > This generated a buffer overflow error!
> > 
> > Error message (reproducible, with different memory addresses each run):
> > [ /home/dave/rpmbuild ] $ mpirun 
> > *** buffer overflow detected ***: mpirun terminated
> > === Backtrace: =
> > /lib64/libc.so.6(__chk_fail+0x2f)[0x31669dee3f]
> > /lib64/libc.so.6[0x31669de69b]
> > /lib64/libc.so.6(__snprintf_chk+0x7b)[0x31669de56b] 
> > /usr/lib64/libopal.so.0(opal_cmd_line_get_usage_msg+0x20a)[0x2ac1088a]
> > mpirun[0x403c53]
> > mpirun(orterun+0xa0)[0x402798]
> > mpirun(main+0x1b)[0x4026f3]
> > /lib64/libc.so.6(__libc_start_main+0xf4)[0x316691d084] 
> > mpirun[0x402649]
> > === Memory map: 
> > 0040-00408000 r-xp  09:01 2697992    /usr/bin/orterun
> > ...
> > 7fff20e92000-7fff20ea8000 rw-p 7fff20e92000 00:00 0    [stack]
> > ff60-ffe0 ---p  00:00 0    [vdso]
> > Aborted
> > 
> > Installation details: System: FC5, AMD Opteron x86_64
> > downloaded SRPM version 1.1.1
> > 
> > rpm -ivh /usr/local/src/dist/libs/openmpi-1.1-1.src.rpm
> > rpmbuild -ba SPECS/openmpi-1.1.spec --target x86_64
> >  - generates an error from check-rpaths stating that the /usr/lib64
> > prefix is unnecessary and may cause problems
> > QA_RPATHS=$[ 0x0001|0x0010 ] rpmbuild -ba SPECS/openmpi-1.1.spec --target x86_64
> >  - suggested workaround; treats these as warnings
> > rpm -ivh ~dave/rpmbuild/RPMS/x86_64/openmpi-1.1-1.x86_64.rpm
> >  - generates a package conflict: file /usr/lib64/libopal.so from
> > install of openmpi-1.1-1 conflicts with file from package opal-2.2.1-1
> >  - apparently this package is opal, the Open Phone Abstraction
> > Library... so I uninstalled opal
> > rpm -ivh ~dave/rpmbuild/RPMS/x86_64/openmpi-1.1-1.x86_64.rpm
> >  - worked!
> > 
> > The strange thing is that mpirun with normal arguments works as
> > expected, without any sort of memory error.
> > mpirun with the -h or --help flag also triggers the buffer overflow, but
> > mpirun with an unrecognized argument does not; it instead prints a "you
> > must specify how many processes to launch, via the -np argument." error.
> > 
> > I hope this gets fixed soon; buffer overflows are potential security
> > vulnerabilities.
> > 
> > ~ David Rogers



Re: [OMPI devel] btl_openib_max_btls

2006-09-25 Thread Jeff Squyres
What version of Open MPI are you using?

We had a bug with this on the trunk and [unreleased] v1.2 branch; it was
just fixed within the last few hours in both places.  It should not be a
problem in the released v1.1 series.

Can you confirm that you were using the OMPI trunk or the v1.2 branch?  If
you're seeing this in the v1.1 series, then we need to look at this a bit
closer...


On 9/22/06 1:25 PM, "Nysal Jan"  wrote:

> The ompi_info command shows the following description for
> "btl_openib_max_btls" parameter
> MCA btl: parameter "btl_openib_max_btls" (current value: "-1")  Maximum
> number of HCA ports to use (-1 = use all available, otherwise must be >= 1)
> 
> Even though I specify "mpirun --mca btl_openib_max_btls 1 .", 2 openib
> BTLs are created (the HCA has 2 ports).
> When I try to run Open MPI across 2 nodes (one node has an HCA with 2 ports
> and the other has only one port), both endpoints send the QP information
> over to the peer. Only one endpoint exists at the peer, so it prints the
> following error messages:
> [0,1,1][btl_openib_endpoint.c:706:mca_btl_openib_endpoint_recv] can't find
> suitable endpoint for this peer
> 
> [0,1,0][btl_openib_endpoint.c:913:mca_btl_openib_endpoint_connect] error
> posting receive errno says Operation now in progress
> 
> [0,1,0][btl_openib_endpoint.c:737:mca_btl_openib_endpoint_recv] endpoint
> connect error: -1
> 
> Is "btl_openib_max_btls" the maximum number of BTLs or maximum number of
> BTLs per port (which is what the current implementation "init_one_hca()"
> looks like)?
> 
> -Nysal


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


[OMPI devel] Tentative OpenRTE tutorial agenda

2006-09-25 Thread Ralph H Castain
Hello all

I have attached a tentative agenda for this week's tutorial, based on inputs
received so far from planned participants. I have adjusted things to try to
accommodate the needs of a geographically distributed audience, and the fact
that - as sole speaker - I cannot possibly talk for hours on end without a
break.

Please feel free to comment, and I will try to make adjustments. We will have
to be a little flexible with the times: since I haven't given this lecture
before, I can't be certain how long each module will take. I also intend to
skim some of the material (especially in the intro), as most of the audience
is already familiar with it.

Thanks
Ralph



agenda.pdf