Re: [OMPI devel] -display-map

2008-10-19 Thread Greg Watson

Ralph,

It seems a little strange to be using mpirun for this, but barring providing a separate command or using ompi_info, I think this would solve our problem.


Thanks,

Greg

On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:


Sorry for the delay - had to ponder this one for a while.

Jeff and I agree that adding something to ompi_info would not be a  
good idea. Ompi_info has no knowledge or understanding of hostfiles,  
and adding that capability to it would be a major distortion of its  
intended use.


However, we think we can offer an alternative that might better  
solve the problem. Remember, we now treat hostfiles in a very  
different manner than before - see the wiki page for a complete  
description, or "man orte_hosts".


So the problem is that, to provide you with what you want, we need  
to "dump" the information from whatever default-hostfile was  
provided, and, if no default-hostfile was provided, then the  
information from each hostfile that was provided with an app_context.


The best way we could think of to do this is to add another mpirun cmd line option, --dump-hostfiles, that would output the line-by-line name from the hostfile plus the name we resolved it to. Of course, --xml would cause the output to be in XML format.
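
For illustration only - since the option is just a proposal at this point, the exact format and the "myhosts" hostfile name below are hypothetical - we were picturing output along these lines:

  $ mpirun --dump-hostfiles --hostfile myhosts
  n0            ->  n0.cluster.example.com
  192.168.1.10  ->  headnode.cluster.example.com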


Would that meet your needs?

Ralph


On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:


Hi Ralph,

We've been discussing this back and forth a bit internally and don't really see an easy solution. Our problem is that Eclipse is not running on the head node, so gethostbyname will not necessarily resolve to the same address. For example, the hostfile might refer to the head node by an internal network address that is not visible to the outside world. Since gethostbyname also looks in /etc/hosts, a name may resolve locally but not on a remote system. The only thing I can think of would be, rather than us reading the hostfile directly as we do now, to provide an option to ompi_info that would dump the hostfile using the same rules that you apply when you're using the hostfile. Would that be feasible?


Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

Sorry for the delay - I was on vacation and am now trying to work my way back to the surface.


I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used for the node. However, the problem is that it needs to be consistent. In this case, ORTE has already used the name returned by gethostname to create its session directory structure long before mpirun reads a hostfile. This is why we retain the value from gethostname instead of allowing it to be overwritten by the name in whatever allocation we are given. Using the name in the hostfile would require that I either find some way to remember any prior name, or that I tear down and rebuild the session directory tree - neither seems attractive nor simple (e.g., what happens when the user provides multiple entries in the hostfile for the same node, each with a different IP address based on a different interface in that node? Sounds crazy, but we have already seen it done - which one do I use?).


2. We don't actually store the hostfile info anywhere - we just  
use it and forget it. For us to add an XML attribute containing  
any hostfile-related info would therefore require us to re-read  
the hostfile. I could have it do that -only- in the case of "XML  
output required", but it seems rather ugly.


An alternative might be for you to simply do a "gethostbyname" lookup of the IP address or hostname to see if it matches, instead of just doing a strcmp. This is what we have to do internally, as we frequently have problems with FQDN vs. non-FQDN vs. IP addresses etc. If the local OS hasn't cached the IP address for the node in question it can take a little time to resolve it via DNS, but otherwise it works fine.
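
The idea in a minimal C sketch (this is not the actual OPAL code - it assumes IPv4 and only compares the first address each name resolves to):

  #include <string.h>
  #include <netdb.h>
  #include <netinet/in.h>

  /* Compare two host strings by resolving both and comparing the
   * resulting addresses, instead of doing a plain strcmp. */
  static int hosts_match(const char *a, const char *b)
  {
      struct hostent *he;
      struct in_addr addr_a, addr_b;

      if (NULL == (he = gethostbyname(a))) return 0;
      /* copy now - gethostbyname reuses a static buffer on the next call */
      memcpy(&addr_a, he->h_addr_list[0], sizeof(addr_a));

      if (NULL == (he = gethostbyname(b))) return 0;
      memcpy(&addr_b, he->h_addr_list[0], sizeof(addr_b));

      return addr_a.s_addr == addr_b.s_addr;
  }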


I can point you to the code in OPAL that we use - I would think  
something similar would be easy to implement in your code and  
would readily solve the problem.


Ralph

On Sep 19, 2008, at 7:18 AM, Greg Watson wrote:


Ralph,

The problem we're seeing is just with the head node. If I specify  
a particular IP address for the head node in the hostfile, it  
gets changed to the FQDN when displayed in the map. This is a  
problem for us as we need to be able to match the two, and since  
we're not necessarily running on the head node, we can't always  
do the same resolution you're doing.


Would it be possible to use the same address that is specified in  
the hostfile, or alternatively provide an XML attribute that  
contains this information?


Thanks,

Greg

On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote:

Not in that regard, depending upon what you mean by "recently". The only changes I am aware of wrt nodes consisted of some changes to the order in which we use the nodes when specified by hostfile or -host, and a little #if protectionism needed by Brian for the Cray port.

Re: [OMPI devel] Possible buffer overrun bug in opal_free_list_grow, called by MPI::Init

2008-10-19 Thread Stephan Kramer

George Bosilca wrote:

Stephan,

You might be right. intptr_t is a signed type, which allows the result of % to be potentially negative. However, on the other hand, mod is defined as a size_t, which [based on my memory] is definitely unsigned, as it represents a size.


Did you try to apply your patch to Open MPI? If so, does it resolve the issue?


  george.

Yes, I have applied the patch intptr_t -> uintptr_t and it does resolve the issue.
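
For reference, the whole change is just the cast on line 105 (sketched from the snippet quoted below):

  mod = (uintptr_t)ptr % CACHE_LINE_SIZE;  /* unsigned: always in [0, CACHE_LINE_SIZE) */
  if(mod != 0) {
      ptr += (CACHE_LINE_SIZE - mod);
  }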


I think the way this works (I'm not a C programmer myself) is:
- the outcome of the % is a signed and negative number, say -x
- this number gets wrapped in the assignment to the unsigned integer mod: UINT_MAX+1-x
- in the subtraction CACHE_LINE_SIZE-mod, the result is wrapped around again, giving CACHE_LINE_SIZE+x
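
A small standalone demonstration of the two wrap-arounds (assumptions: 0xb2d29f58 is the allocation address plus the 16-byte list-item header, which matches the mod = -40 in the debug output further down; int32_t stands in for intptr_t so the 32-bit behaviour shows even on a 64-bit machine):

  #include <stdint.h>
  #include <stdio.h>

  #define CACHE_LINE_SIZE 128

  int main(void)
  {
      /* 0xb2d29f58 is negative when viewed as a signed 32-bit integer */
      int32_t ptr  = (int32_t)0xb2d29f58u;
      int32_t smod = ptr % CACHE_LINE_SIZE;   /* -40: C's % keeps the sign of ptr */
      uint32_t mod = (uint32_t)smod;          /* first wrap: UINT_MAX+1-40 */
      uint32_t pad = CACHE_LINE_SIZE - mod;   /* second wrap: 128+40 = 168 */
      printf("signed mod = %d, padding = %u\n", (int)smod, (unsigned)pad);
      return 0;
  }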


Cheers
Stephan


On Oct 16, 2008, at 7:29 PM, Stephan Kramer wrote:


George Bosilca wrote:
I did investigate this issue for about 3 hours yesterday. Neither valgrind nor efence report any errors on my cluster. I'm using Debian unstable with gcc-4.1.2. Adding printfs doesn't show the same output as you get; all addresses are in the correct range. I went over the code manually, and to be honest I cannot imagine how this might happen IF the compiler is doing what it is supposed to do.


I've run out of options on this one. If you can debug it and figure out what the problem is, I'll be happy to hear it.


george.

Hi George,

Thanks a lot for your effort in looking into this. I think I've come a bit further with it. The reproducibility may in fact have to do with 32-bit/64-bit differences.

I think the culprit is line 105 of opal_free_list.c:

  mod = (intptr_t)ptr % CACHE_LINE_SIZE;
  if(mod != 0) {
      ptr += (CACHE_LINE_SIZE - mod);
  }

Since intptr_t is a signed integer type, on 32 bit any address above 0x7fffffff makes the outcome of mod negative. Thus ptr will be increased by more than CACHE_LINE_SIZE, which is not accounted for in the allocated buffer size in line 93, and a buffer overrun will appear in the subsequent element loop. This is confirmed by the output of some debugging statements I've pasted below. Also, I haven't come across the same bug on 64-bit machines.


I guess this should be uintptr_t instead?

Cheers
Stephan Kramer

The debugging output:

mpidebug: num_elements  = 1, flist->fl_elem_size = 40
mpidebug: sizeof(opal_list_item_t) = 16
mpidebug: allocating 184
mpidebug: allocated at memory address 0xb2d29f48
mpidebug: mod = -40, CACHE_LINE_SIZE = 128

and at point of the buffer overrun/efence segfault in gdb:

(gdb) print item
$23 = (opal_free_list_item_t *) 0xb2d2a000

which is exactly at (over) the end of the buffer: 0xb2d2a000 = 0xb2d29f48 + 184




On Oct 14, 2008, at 11:03 AM, Stephan Kramer wrote:

Would someone mind having another look at the bug reported below? I'm hitting exactly the same problem with Debian unstable, openmpi 1.2.7~rc2. Both valgrind and efence are indispensable tools for any developer, and each may catch errors the other won't report. Electric Fence is especially good at catching buffer overruns, as it protects the beginning and end of each allocated buffer. The original bug report shows an undeniable buffer overrun in MPI::Init: the attached patch prints out exactly the address it's trying to access, which is past the end of the buffer. Any help would be much appreciated.


Stephan Kramer



Patrick,

I'm unable to reproduce the buffer overrun with the latest trunk. I
run valgrind (with the memchecker tool) on a regular basis on the
trunk, and I have never noticed anything like that. Moreover, I went over
the code, and I cannot imagine how we could overrun the buffer in the
code you pinpointed.

Thanks,
  george.

On Aug 23, 2008, at 7:57 PM, Patrick Farrell wrote:

> Hi,
>
> I think I have found a buffer overrun in a function
> called by MPI::Init, though explanations of why I am
> wrong are welcome.
>
> I am using the openmpi included in Ubuntu Hardy,
> version 1.2.5, though I have inspected the latest trunk by eye
> and I don't believe the relevant code has changed.
>
> I was trying to use Electric Fence, a memory debugging library,
> to debug a suspected buffer overrun in my own program.
> Electric Fence works by replacing malloc/free in such
> a way that bounds violation errors issue a segfault.
> While running my program under Electric Fence, I found
> that I got a segfault issued at:
>
> 0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1)
> at class/opal_free_list.c:113
> 113 OBJ_CONSTRUCT_INTERNAL(item, flist->fl_elem_class);
> (gdb) bt
> #0 0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50,
> num_elements=1) at class/opal_free_list.c:113
> #1 0xb5cdd479 in opal_free_list_init (flist=0xb2b46a50,
> elem_size=56, elem_class=0xb2b46e20, num_elements_to_alloc=73,
> max_elements_to_alloc=-1, num_elements_per_alloc=1)
> at class/opal_free_list.c:78
> #2 0xb2b381aa in ompi_osc_pt2pt_component_init
> (enable_progress_threads=false, enable_mpi_threads=