Re: [OMPI devel] Possible buffer overrun bug in opal_free_list_grow, called by MPI::Init

2008-10-20 Thread George Bosilca

Stephen,

I think you're completely right, and that I had a wrong understanding  
of the modulus operator. Based on my memory, I was pretty sure that  
the modulus is ALWAYS positive. Now even Wikipedia seems to  
contradict me :) It has a pretty good summary of how % is defined in  
each programming language (http://en.wikipedia.org/wiki/Modulo_operation).
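
For anyone with the same mental model, a minimal C example of the behavior
(in C99 the result of % takes the sign of the dividend; in C89 the sign was
implementation-defined):

    #include <stdio.h>

    int main(void)
    {
        /* C99: the result of % has the sign of the first operand. */
        printf("%d\n", -9 % 4);   /* prints -1, not 3 */
        printf("%d\n",  9 % -4);  /* prints  1 */
        return 0;
    }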


I will apply your patch to all places where we use modulus in Open  
MPI. Thanks for your help on this issue.


  Thanks,
george.

On Oct 19, 2008, at 1:43 PM, Stephan Kramer wrote:


George Bosilca wrote:

Stephan,

You might be right. intptr_t is a signed type, which allows the  
result of % to be potentially negative. However, on the other hand,  
mod is defined as a size_t, which [based on my memory] is  
definitely unsigned, as it represents a size.


Did you try to apply your patch to Open MPI? If so, does it  
resolve the issue?


 george.
Yes, I have applied the patch intptr_t -> uintptr_t and it does  
resolve the issue.


I think the way this works (I'm not a C programmer myself) is the following, sketched in code below:
- the outcome of the % is a signed, negative number, say -x
- this number gets wrapped in the assignment to the unsigned integer  
mod: UINT_MAX+1-x
- in the subtraction CACHE_LINE_SIZE - mod, the result wraps around  
again, giving CACHE_LINE_SIZE+x
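
To make the two wrap-arounds concrete, here is a small stand-alone sketch of
the arithmetic. The address is made up so that the signed remainder comes out
as -40, matching the debug output further down; CACHE_LINE_SIZE = 128 as
reported there, and mod is modelled as an unsigned 32-bit variable:

    #include <stdint.h>
    #include <stdio.h>

    #define CACHE_LINE_SIZE 128

    int main(void)
    {
        /* Hypothetical 32-bit address above 0x7fffffff. */
        uint32_t addr = 0xb2d29fd8u;

        /* Buggy path: the cast to a signed type makes % return a
           negative remainder, which wraps when stored in the unsigned
           'mod', and wraps again in the subtraction. */
        int32_t  signed_rem = (int32_t)addr % CACHE_LINE_SIZE;  /* -40           */
        uint32_t mod        = (uint32_t)signed_rem;             /* UINT_MAX+1-40 */
        uint32_t offset     = CACHE_LINE_SIZE - mod;            /* wraps to 168  */

        /* Fixed path: keep everything unsigned. */
        uint32_t mod_fixed    = addr % CACHE_LINE_SIZE;         /* 88 */
        uint32_t offset_fixed = CACHE_LINE_SIZE - mod_fixed;    /* 40 */

        printf("buggy: mod=%d offset=%u   fixed: mod=%u offset=%u\n",
               (int)signed_rem, (unsigned)offset,
               (unsigned)mod_fixed, (unsigned)offset_fixed);
        return 0;
    }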


Cheers
Stephan


On Oct 16, 2008, at 7:29 PM, Stephan Kramer wrote:


George Bosilca wrote:
I did investigate this issue for about 3 hours yesterday. Neither  
valgrind nor efence report any errors on my cluster. I'm using  
debian unstable with gcc-4.1.2. Adding printfs doesn't shows the  
same output as you, all addresses are in the correct range. I  
went over the code manually, and to be honest I cannot imagine  
how this might happens IF the compiler is doing what it is  
supposed to do.


I've run out of options on this one. If you can debug it and figure  
out what the problem is, I'll be happy to hear about it.


george.

Hi George,

Thanks a lot for your effort in looking into this. I think I've  
come a bit further with it. The reproducibility may in fact have  
to do with 32-bit/64-bit differences.

I think the culprit is line 105 of opal_free_list.c:

  mod = (intptr_t)ptr % CACHE_LINE_SIZE;
  if(mod != 0) {
      ptr += (CACHE_LINE_SIZE - mod);
  }

As intptr_t is a signed integer type, on 32-bit systems with addresses  
above 0x7fffffff the cast yields a negative value and the outcome of  
the % will be negative. Thus ptr will be increased by more than  
CACHE_LINE_SIZE, which is not accounted for in the allocated buffer  
size in line 93, and a buffer overrun will appear in the subsequent  
element loop. This is confirmed by the output of some debugging  
statements I've pasted below. Also, I haven't come across the same  
bug on 64-bit machines.


I guess this should be uintptr_t instead?
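
For illustration, a minimal, self-contained version of the intended pattern
with the unsigned cast. This is only a sketch of the alignment step, not the
actual opal_free_list.c code, whose buffer-size bookkeeping is more involved:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define CACHE_LINE_SIZE 128

    int main(void)
    {
        size_t payload = 40;   /* e.g. fl_elem_size from the log below */

        /* Over-allocate so there is room to round the pointer up to
           the next cache-line boundary. */
        unsigned char *base = malloc(payload + CACHE_LINE_SIZE - 1);
        if (base == NULL) return 1;

        unsigned char *ptr = base;
        size_t mod = (uintptr_t)ptr % CACHE_LINE_SIZE;  /* always 0..127 */
        if (mod != 0) {
            ptr += (CACHE_LINE_SIZE - mod);             /* at most 127 bytes */
        }

        printf("base=%p aligned=%p advance=%zu\n",
               (void *)base, (void *)ptr, (size_t)(ptr - base));
        free(base);
        return 0;
    }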

Cheers
Stephan Kramer

The debugging output:

mpidebug: num_elements  = 1, flist->fl_elem_size = 40
mpidebug: sizeof(opal_list_item_t) = 16
mpidebug: allocating 184
mpidebug: allocated at memory address 0xb2d29f48
mpidebug: mod = -40, CACHE_LINE_SIZE = 128

and at point of the buffer overrun/efence segfault in gdb:

(gdb) print item
$23 = (opal_free_list_item_t *) 0xb2d2a000

which is exactly at (i.e., just past) the end of the buffer:  
0xb2d2a000 = 0xb2d29f48 + 184




On Oct 14, 2008, at 11:03 AM, Stephan Kramer wrote:

Would someone mind having another look at the bug reported  
below? I'm hitting exactly the same problem with Debian  
Unstable, openmpi 1.2.7~rc2. Both valgrind and efence are  
indispensable tools for any developer, and each may catch  
errors the other won't report. Electric Fence is especially good  
at catching buffer overruns, as it protects the beginning and end  
of each allocated buffer. The original bug report shows an  
undeniable buffer overrun in MPI::Init, i.e. the attached patch  
prints out exactly the address it's trying to access, which is  
past the end of the buffer. Any help would be much appreciated.


Stephan Kramer



Patrick,

I'm unable to reproduce the buffer overrun with the latest  
trunk. I run valgrind (with the memchecker tool) on a regular  
basis on the trunk, and I never noticed anything like that.  
Moreover, I went over the code, and I cannot imagine how we can  
overrun the buffer in the code you pinpointed.

Thanks,
 george.

On Aug 23, 2008, at 7:57 PM, Patrick Farrell wrote:

> Hi,
>
> I think I have found a buffer overrun in a function
> called by MPI::Init, though explanations of why I am
> wrong are welcome.
>
> I am using the openmpi included in Ubuntu Hardy,
> version 1.2.5, though I have inspected the latest trunk by eye
> and I don't believe the relevant code has changed.
>
> I was trying to use Electric Fence, a memory debugging library,
> to debug a suspected buffer overrun in my own program.
> Electric Fence works by replacing malloc/free in such
> a way that bounds violation errors issue a segfault.
> While running my program under Electric Fence, I found
> that I got a segfault issued at:
>
> 0xb5cdd

Re: [OMPI devel] -display-map

2008-10-20 Thread Ralph Castain
Hmmm...just to be sure we are all clear on this. The reason we  
proposed to use mpirun is that "hostfile" has no meaning outside of  
mpirun. That's why ompi_info can't do anything in this regard.


We have no idea what hostfile the user may specify until we actually  
get the mpirun cmd line. They may have specified a default-hostfile,  
but they could also specify hostfiles for the individual app_contexts.  
These may or may not include the node upon which mpirun is executing.


So the only way to provide you with a separate command to get a  
hostfile<->nodename mapping would be to require you to provide us with  
the default-hostfile and/or hostfile cmd line options just as if you  
were issuing the mpirun cmd. We just wouldn't launch - but it would be  
the exact equivalent of doing "mpirun --do-not-launch".


Am I missing something? If so, please do correct me - I would be happy  
to provide a tool if that would make it easier. Just not sure what  
that tool would do.


Thanks
Ralph


On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:


Ralph,

It seems a little strange to be using mpirun for this, but barring  
providing a separate command, or using ompi_info, I think this would  
solve our problem.


Thanks,

Greg

On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:


Sorry for the delay - I had to ponder this one for a while.

Jeff and I agree that adding something to ompi_info would not be a  
good idea. Ompi_info has no knowledge or understanding of  
hostfiles, and adding that capability to it would be a major  
distortion of its intended use.


However, we think we can offer an alternative that might better  
solve the problem. Remember, we now treat hostfiles in a very  
different manner than before - see the wiki page for a complete  
description, or "man orte_hosts".


So the problem is that, to provide you with what you want, we need  
to "dump" the information from whatever default-hostfile was  
provided, and, if no default-hostfile was provided, then the  
information from each hostfile that was provided with an app_context.


The best way we could think of to do this is to add another mpirun  
cmd line option --dump-hostfiles that would output the line-by-line  
name from the hostfile plus the name we resolved it to. Of course,  
--xml would cause it to be in xml format.


Would that meet your needs?

Ralph


On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:


Hi Ralph,

We've been discussing this back and forth a bit internally and  
don't really see an easy solution. Our problem is that Eclipse is  
not running on the head node, so gethostbyname will not  
necessarily resolve to the same address. For example, the hostfile  
might refer to the head node by an internal network address that  
is not visible to the outside world. Since gethostbyname also looks  
in /etc/hosts, a name may resolve locally but not on a remote system.  
The only thing I can think of would be, rather than us reading the  
hostfile directly as we do now, to provide an option to ompi_info  
that would dump the hostfile using the same rules that you apply  
when you're using the hostfile. Would that be feasible?


Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

Sorry for the delay - I was on vacation and am now trying to work my  
way back to the surface.


I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used for the  
node. However, the problem is that it needs to be consistent. In  
this case, ORTE has already used the name returned by gethostname  
to create its session directory structure long before mpirun  
reads a hostfile. This is why we retain the value from  
gethostname instead of allowing it to be overwritten by the name  
in whatever allocation we are given. Using the name in the hostfile  
would require that I either find some way to remember any prior  
name, or that I tear down and rebuild the session directory tree  
- neither seems attractive nor simple (e.g., what happens when  
the user provides multiple entries in the hostfile for the node,  
each with a different IP address based on a different interface in  
that node? Sounds crazy, but we have already seen it done - which  
one do I use?).


2. We don't actually store the hostfile info anywhere - we just  
use it and forget it. For us to add an XML attribute containing  
any hostfile-related info would therefore require us to re-read  
the hostfile. I could have it do that -only- in the case of "XML  
output required", but it seems rather ugly.


An alternative might be for you to simply do a "gethostbyname"  
lookup of the IP address or hostname to see if it matches instead  
of just doing a strcmp. This is what we have to do internally as  
we frequently have problems with FQDN vs. non-FQDN vs. IP  
addresses etc. If the local OS hasn't cached the IP address for  
the node in question it can take a little time to DNS resolve it,  
but otherwise works fine.
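
Roughly the kind of comparison I mean, as a sketch only (IPv4 and
gethostbyname for brevity; getaddrinfo would be the modern equivalent, and
the code we use in OPAL is more thorough):

    #include <netdb.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Do the two names/addresses resolve to a common IPv4 address? */
    static bool same_host(const char *a, const char *b)
    {
        struct hostent *ha = gethostbyname(a);
        if (ha == NULL || ha->h_length != 4) return false;

        /* gethostbyname returns static storage, so copy the first
           host's addresses before doing the second lookup. */
        unsigned char addrs_a[16][4];
        int na = 0;
        while (na < 16 && ha->h_addr_list[na] != NULL) {
            memcpy(addrs_a[na], ha->h_addr_list[na], 4);
            na++;
        }

        struct hostent *hb = gethostbyname(b);
        if (hb == NULL || hb->h_length != 4) return false;

        for (int i = 0; i < na; i++) {
            for (char **p = hb->h_addr_list; *p != NULL; p++) {
                if (memcmp(addrs_a[i], *p, 4) == 0) return true;
            }
        }
        return false;
    }

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s name1 name2\n", argv[0]);
            return 2;
        }
        printf("%s\n", same_host(argv[1], argv[2]) ? "match" : "no match");
        return 0;
    }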


I can point you to the code in OPAL that we use - I w