[OMPI devel] Ticket #1982 - Fortran MPI_IN_PLACE issue

2009-09-22 Thread David Gunter
I've been playing around with Jeff's "bogus" tarball and I, too, see  
it fail on OS X.  If I make the following changes it works perfectly:


in configure.in

1) replace -fno-common with -fcommon
2) add -flat_namespace as part of the arguments for creating shared  
libs.


After that, things work fine:

(dog@domdechant 63%) main
 Fortran MPI_BOTTOM is  93
Assigning C variables
MPI_SEND_F: This is BOTTOM: 0x2040 == (0x6020/17, 0x6024/18,  
0x2040/19, 0x602c/20)

 Fortran MPI_BOTTOM is  19
 Fortran MPI_BOTTOM is  32
MPI_SEND_F: This is BOTTOM: 0x2040 == (0x6020/17, 0x6024/18,  
0x2040/32, 0x602c/20)

 Fortran MPI_BOTTOM is  32

I still don't see what the difference is between the two versions of OMPI
that causes the problem.
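
For reference, a minimal C sketch of the common-symbol behavior those two
flags control (illustrative only; "sentinel" is a made-up stand-in, not
Open MPI's actual Fortran MPI_BOTTOM symbol):

/* A tentative definition: no "extern", no initializer.  Built with
 * -fcommon, identically named tentative definitions in different objects
 * or libraries may be merged, so everyone sees one address.  Built with
 * -fno-common, each object gets its own strong definition, and with
 * OS X's two-level namespaces two shared libraries can each keep a
 * private copy -- one way to end up with the mismatched addresses shown
 * above.  -flat_namespace makes dyld resolve all references to a single
 * definition again. */
#include <stdio.h>

int sentinel;

void report_sentinel(const char *who)
{
    printf("%s sees sentinel at %p\n", who, (void *)&sentinel);
}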


OSX 10.5.8, GCC 4.4.1, most recent libtool, autoconf, automake and m4.

-david
--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory





Re: [OMPI devel] Ticket #1982 - Fortran MPI_IN_PLACE issue

2009-09-22 Thread David Gunter

I meant to say "configure", not "configure.in" below.

--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory



On Sep 22, 2009, at 8:05 AM, David Gunter wrote:

I've been playing around with Jeff's "bogus" tarball and I, too, see  
it fail on OS X.  If I make the following changes it works perfectly:


in configure.in

1) replace -fno-common with -fcommon
2) add -flat_namespace as part of the arguments for creating shared  
libs.


After that, things work fine:

(dog@domdechant 63%) main
Fortran MPI_BOTTOM is  93
Assigning C variables
MPI_SEND_F: This is BOTTOM: 0x2040 == (0x6020/17, 0x6024/18,  
0x2040/19, 0x602c/20)

Fortran MPI_BOTTOM is  19
Fortran MPI_BOTTOM is  32
MPI_SEND_F: This is BOTTOM: 0x2040 == (0x6020/17, 0x6024/18,  
0x2040/32, 0x602c/20)

Fortran MPI_BOTTOM is  32

I still don't see what the difference is between the two versions of OMPI
that causes the problem.


OSX 10.5.8, GCC 4.4.1, most recent libtool, autoconf, automake and m4.

-david
--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory







Re: [OMPI devel] Ticket #1982 - Fortran MPI_IN_PLACE issue

2009-09-22 Thread Jeff Squyres
Thanks!  I added these comments to #1982 (don't hesitate to add  
comments yourself :-) ).



On Sep 22, 2009, at 10:05 AM, David Gunter wrote:

I've been playing around with Jeff's "bogus" tarball and I, too, see  
it fail on OS X.  If I make the following changes it works perfectly:


in configure.in

1) replace -fno-common with -fcommon
2) add -flat_namespace as part of the arguments for creating shared  
libs.


After that, things work fine:

(dog@domdechant 63%) main
Fortran MPI_BOTTOM is  93
Assigning C variables
MPI_SEND_F: This is BOTTOM: 0x2040 == (0x6020/17, 0x6024/18,  
0x2040/19, 0x602c/20)

Fortran MPI_BOTTOM is  19
Fortran MPI_BOTTOM is  32
MPI_SEND_F: This is BOTTOM: 0x2040 == (0x6020/17, 0x6024/18,  
0x2040/32, 0x602c/20)

Fortran MPI_BOTTOM is  32

I still don't see what the difference is between the two versions of OMPI
that causes the problem.


OSX 10.5.8, GCC 4.4.1, most recent libtool, autoconf, automake and m4.

-david
--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory






--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] Ticket #1982 - Fortran MPI_IN_PLACE issue

2009-09-22 Thread David Gunter
I don't believe I have an account to add comments - I would appreciate  
one!


Thanks,
david
--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory



On Sep 22, 2009, at 8:24 AM, Jeff Squyres wrote:

Thanks!  I added these comments to #1982 (don't hesitate to add  
comments yourself :-) ).



On Sep 22, 2009, at 10:05 AM, David Gunter wrote:

I've been playing around with Jeff's "bogus" tarball and I, too,  
see it fail on OS X.  If I make the following changes it works  
perfectly:


in configure.in

1) replace -fno-common with -fcommon
2) add -flat_namespace as part of the arguments for creating shared  
libs.


After that, things work fine:

(dog@domdechant 63%) main
Fortran MPI_BOTTOM is  93
Assigning C variables
MPI_SEND_F: This is BOTTOM: 0x2040 == (0x6020/17, 0x6024/18,  
0x2040/19, 0x602c/20)

Fortran MPI_BOTTOM is  19
Fortran MPI_BOTTOM is  32
MPI_SEND_F: This is BOTTOM: 0x2040 == (0x6020/17, 0x6024/18,  
0x2040/32, 0x602c/20)

Fortran MPI_BOTTOM is  32

I still don't see what the difference is between the two versions of OMPI
that causes the problem.


OSX 10.5.8, GCC 4.4.1, most recent libtool, autoconf, automake and  
m4.


-david
--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory






--
Jeff Squyres
jsquy...@cisco.com





Re: [OMPI devel] Dynamic languages, dlopen() issues, and symbol visibility of libtool ltdl API in current trunk

2009-09-22 Thread Lisandro Dalcin
On Mon, Sep 21, 2009 at 9:45 AM, Jeff Squyres  wrote:
> Ick; I appreciate Lisandro's quandary, but don't quite know what to do.
>

I'm just asking for the library "libopen-pal.so" to expose ltdl calls
wrapped with an "opal_" prefix. This way, the original ltdl calls are
hidden (no chance of colliding with user code that uses an incompatible
libtool version), but Open MPI still provides a portable way to dlopen()
shared libs/dynamic modules. In simple terms, I'm asking
"libopen-pal.so" to contain ltdl wrapper calls like this one:

OMPI_DECLSPEC lt_dlhandle opal_lt_dlopenadvise(const char *filename,
                                               lt_dladvise advise) /* note opal_ prefix! */
{
    return lt_dlopenadvise(filename, advise); /* original ltdl call */
}


Then, third-party code (like mpi4py or any other dynamic MPI module
for any other dynamic language) can do this:

#include <mpi.h>
#if defined(OPEN_MPI)
typedef void *lt_dlhandle;
typedef void *lt_dladvise;
OMPI_DECLSPEC extern lt_dlhandle opal_lt_dlopenadvise(const char *, lt_dladvise);
#endif
...
#if defined(OPEN_MPI)
/* init advice, not shown ... */
opal_lt_dlopenadvise("mpi", advice);
/* destroy advice, not shown ... */
#endif
MPI_Init(0,0);
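
Filling in the "not shown" parts, a minimal sketch of what the advise
setup and teardown could look like with the full set of opal_-prefixed
wrappers requested further down in this thread, before the call to
MPI_Init() shown above (the int return types and pointer arguments are
assumed to mirror plain ltdl; only the wrapper names come from the thread):

#if defined(OPEN_MPI)
typedef void *lt_dlhandle;
typedef void *lt_dladvise;
OMPI_DECLSPEC extern int opal_lt_dlinit(void);
OMPI_DECLSPEC extern int opal_lt_dlexit(void);
OMPI_DECLSPEC extern int opal_lt_dladvise_init(lt_dladvise *);
OMPI_DECLSPEC extern int opal_lt_dladvise_destroy(lt_dladvise *);
OMPI_DECLSPEC extern int opal_lt_dladvise_global(lt_dladvise *);
OMPI_DECLSPEC extern int opal_lt_dladvise_ext(lt_dladvise *);
OMPI_DECLSPEC extern lt_dlhandle opal_lt_dlopenadvise(const char *, lt_dladvise);
OMPI_DECLSPEC extern int opal_lt_dlclose(lt_dlhandle);
#endif
...
#if defined(OPEN_MPI)
{
    lt_dladvise advise;
    lt_dlhandle handle;

    opal_lt_dlinit();
    opal_lt_dladvise_init(&advise);     /* the "init advice" step */
    opal_lt_dladvise_global(&advise);   /* ask for RTLD_GLOBAL-like visibility */
    opal_lt_dladvise_ext(&advise);      /* try platform library extensions */
    handle = opal_lt_dlopenadvise("mpi", advise);
    opal_lt_dladvise_destroy(&advise);  /* the "destroy advice" step */
    /* keep 'handle' around; opal_lt_dlclose(handle) and opal_lt_dlexit()
       could be called after MPI_Finalize() */
    (void)handle;
}
#endif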

>
> How about keeping libltdl -fvisibility=hidden inside mpi4py?
>

Not sure if I was clear enough in my comments above, but mpi4py does
not bundle or link libtool. It just takes advantage of libtool's
availability in "libopen-pal.so" for the sake of portability.

>
> On Sep 17, 2009, at 11:16 AM, Josh Hursey wrote:
>
>> So I started down this road a couple months ago. I was using the
>> lt_dlopen() and friends in the OPAL CRS self module. The visibility
>> changes broke that functionality. The one solution that I started
>> implementing was precisely what you suggested, wrapping a subset of the
>> libtool calls and prefixing them with opal_*. The email thread is below:
>>   http://www.open-mpi.org/community/lists/devel/2009/07/6531.php
>>
>> The problem that I hit was that libtool's build system did not play
>> well with the visibility symbols. This caused dlopen to be disabled
>> incorrectly. The libtool folks have a patch and, I believe, they are
>> planning on incorporating it in the next release. The email thread is
>> below:
>>   http://thread.gmane.org/gmane.comp.gnu.libtool.patches/9446
>>
>> So we would (others can speak up if not) certainly consider such a
>> wrapper, but I think we need to wait for the next libtool release
>> (unless there is other magic we can do) before it would be usable.
>>
>> Do others have any other ideas on how we might get around this in the
>> mean time?
>>
>> -- Josh
>>
>>
>> On Sep 16, 2009, at 5:59 PM, Lisandro Dalcin wrote:
>>
>> > Hi all.. I have to contact you again about the issues related to
>> > dlopen()ing libmpi with RTLD_LOCAL, as many dynamic languages (Python
>> > in my case) do.
>> >
>> > So far, I've been able to manage the issues (despite the "do nothing"
>> > policy from Open MPI devs, which I understand) in a more or less
>> > portable manner by taking advantage of the availability of libtool
>> > ltdl symbols in the Open MPI libraries (specifically, in libopen-pal).
>> > For reference, all this hackery is here:
>> > http://code.google.com/p/mpi4py/source/browse/trunk/src/compat/openmpi.h
>> >
>> > However, I noticed that in current trunk (v1.4, IIUC) things have
>> > changed and libtool symbols are not externally available. Again, I
>> > understand the reason and acknowledge that such a change is a really
>> > good thing. However, this change has broken all my hackery for
>> > dlopen()ing libmpi before the call to MPI_Init().
>> >
>> > Is there any chance that libopen-pal could provide some properly
>> > prefixed (let say, using "opal_" as a prefix) wrapper calls to a small
>> > subset of the libtool ltdl API? The following set of wrapper calls
>> > is the minimum required to properly load libmpi in a portable
>> > manner and clean up resources (let me reuse my previous suggestion
>> > and add the opal_ prefix):
>> >
>> > opal_lt_dlinit()
>> > opal_lt_dlexit()
>> >
>> > opal_lt_dladvise_init(a)
>> > opal_lt_dladvise_destroy(a)
>> > opal_lt_dladvise_global(a)
>> > opal_lt_dladvise_ext(a)
>> >
>> > opal_lt_dlopenadvise(n,a)
>> > opal_lt_dlclose(h)
>> >
>> > Any chance this request could be considered? I would really like to
>> > have this before any Open MPI tarball gets released without libtool
>> > symbols exposed...
>> >
>> >
>> > --
>> > Lisandro Dalcín
>> > ---
>> > Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
>> > Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
>> > Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
>> > PTLC - Güemes 3450, (3000) Santa Fe, Argentina
>> > Tel/Fax: +54-(0)342-451.1594
>> >
>>
>>

[OMPI devel] coll sm ramifications

2009-09-22 Thread Jeff Squyres

Someday soon, coll sm will be reliable.  Really.  :-)

One thing I noticed is that coll sm is "slow" in communicator  
construction and destruction because it mmap's upon creation and  
munmap's upon deletion.  For most apps, this probably doesn't matter.   
For apps that create bajillions of communicators, the effect can be  
noticeable.


There's at least one way to alleviate this effect, but I don't have  
time to implement this optimization.  I wrote up a ticket with a few  
more details:


https://svn.open-mpi.org/trac/ompi/ticket/2027
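
One generic way to reduce that cost (purely illustrative; not necessarily
what ticket #2027 proposes) is to cache and reuse a mapped segment instead
of paying for a fresh mmap()/munmap() pair on every communicator create
and destroy:

#include <stddef.h>
#include <sys/mman.h>

/* Tiny single-entry cache of the last released segment.  Real coll sm
 * segments are backed by a shared file and shared across processes; this
 * sketch only illustrates the reuse idea. */
static void  *cached_seg = NULL;
static size_t cached_len = 0;

static void *seg_create(size_t len)
{
    if (cached_seg != NULL && cached_len == len) {
        void *seg = cached_seg;        /* reuse the cached mapping */
        cached_seg = NULL;
        return seg;
    }
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANON, -1, 0);   /* may return MAP_FAILED */
}

static void seg_destroy(void *seg, size_t len)
{
    if (cached_seg == NULL) {          /* keep one segment for later reuse */
        cached_seg = seg;
        cached_len = len;
    } else {
        munmap(seg, len);
    }
}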

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] [OMPI users] Open-MPI between Mac and Linux (ubuntu 9.04) over wireless

2009-09-22 Thread Pallab Datta
Hi Rolf,

I ran the following:

pallabdatta$ /usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900 -mca
btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca
btl_tcp_if_include en0,wlan0 -np 2 -hetero -H localhost,10.11.14.205
/tmp/hello

[fuji.local:02267] mca: base: components_open: Looking for btl components
[fuji.local:02267] mca: base: components_open: opening btl components
[fuji.local:02267] mca: base: components_open: found loaded component self
[fuji.local:02267] mca: base: components_open: component self has no
register function
[fuji.local:02267] mca: base: components_open: component self open
function successful
[fuji.local:02267] mca: base: components_open: found loaded component sm
[fuji.local:02267] mca: base: components_open: component sm has no
register function
[fuji.local:02267] mca: base: components_open: component sm open function
successful
[fuji.local:02267] mca: base: components_open: found loaded component tcp
[fuji.local:02267] mca: base: components_open: component tcp has no
register function
[fuji.local:02267] mca: base: components_open: component tcp open function
successful
[fuji.local:02267] select: initializing btl component self
[fuji.local:02267] select: init of component self returned success
[fuji.local:02267] select: initializing btl component sm
[fuji.local:02267] select: init of component sm returned success
[fuji.local:02267] select: initializing btl component tcp
[fuji.local][[59424,1],0][btl_tcp_component.c:468:mca_btl_tcp_component_create_instances]
invalid interface "wlan0"
[fuji.local:02267] select: init of component tcp returned success
[apex-backpack:31956] mca: base: components_open: Looking for btl components
[apex-backpack:31956] mca: base: components_open: opening btl components
[apex-backpack:31956] mca: base: components_open: found loaded component self
[apex-backpack:31956] mca: base: components_open: component self has no
register function
[apex-backpack:31956] mca: base: components_open: component self open
function successful
[apex-backpack:31956] mca: base: components_open: found loaded component sm
[apex-backpack:31956] mca: base: components_open: component sm has no
register function
[apex-backpack:31956] mca: base: components_open: component sm open
function successful
[apex-backpack:31956] mca: base: components_open: found loaded component tcp
[apex-backpack:31956] mca: base: components_open: component tcp has no
register function
[apex-backpack:31956] mca: base: components_open: component tcp open
function successful
[apex-backpack:31956] select: initializing btl component self
[apex-backpack:31956] select: init of component self returned success
[apex-backpack:31956] select: initializing btl component sm
[apex-backpack:31956] select: init of component sm returned success
[apex-backpack:31956] select: initializing btl component tcp
[apex-backpack][[59424,1],1][btl_tcp_component.c:468:mca_btl_tcp_component_create_instances]
invalid interface "en0"
[apex-backpack:31956] select: init of component tcp returned success
Process 0 on fuji.local out of 2
Process 1 on apex-backpack out of 2
[apex-backpack:31956] btl: tcp: attempting to connect() to address
10.11.14.203 on port 9360



It launches the processes on both ends and then hangs at the send/receive
part!
What was the other thought you mentioned about why it might not be working?
Please suggest.
--regards, pallab



> The -enable-heterogeneous should do the trick.  And to answer the
> previous question, yes, put both of the interfaces in the include list.
>
> --mca btl_tcp_if_include en0,wlan0
>
> If that does not work, then I may have one other thought why it might
> not work although perhaps not a solution.
>
> Rolf
>
> Pallab Datta wrote:
>> Hi Rolf,
>>
>> Do i need to configure openmpi with some specific options apart from
>> --enable-heterogeneous..?
>> I am currently using
>> ./configure --prefix=/usr/local/ --enable-heterogeneous --disable-static
>> --enable-shared --enable-debug
>>
>> on both ends...is the above correct..?! Please let me know.
>> thanks and regards,
>> pallab
>>
>>
>>> Hi:
>>> I assume if you wait several minutes then your program will actually
>>> time out, yes?  I guess I have two suggestions. First, can you run a
>>> non-MPI job using the wireless?  Something like hostname?  Secondly,
>>> you
>>> may want to specify the specific interfaces you want it to use on the
>>> two machines.  You can do that via the "--mca btl_tcp_if_include"
>>> run-time parameter.  Just list the ones that you expect it to use.
>>>
>>> Also, this is not right - "--mca OMPI_mca_mpi_preconnect_all 1"  It
>>> should be --mca mpi_preconnect_mpi 1 if you want to do the connection
>>> during MPI_Init.
>>>
>>> Rolf
>>>
>>> Pallab Datta wrote:
>>>
 The following is the error dump

 fuji:src pallabdatta$ /usr/local/bin/mpirun --mca btl_tcp_port_min_v4
 36900 -mca btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca
 btl
 tcp,self --mca OMPI_mca_

Re: [OMPI devel] [OMPI users] Open-MPI between Mac and Linux (ubuntu 9.04) over wireless

2009-09-22 Thread Pallab Datta
Is this a bug when running Open MPI in a heterogeneous environment (between a
Mac and Linux) over wireless links?
Please suggest what needs to be done or what I am missing.
Any clues as to how to debug this will be of great help.
thanks and regards, pallab

> Hi Rolf,
>
> I ran the following:
>
> pallabdatta$ /usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900 -mca
> btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca
> btl_tcp_if_include en0,wlan0 -np 2 -hetero -H localhost,10.11.14.205
> /tmp/hello
>
> [fuji.local:02267] mca: base: components_open: Looking for btl components
> [fuji.local:02267] mca: base: components_open: opening btl components
> [fuji.local:02267] mca: base: components_open: found loaded component self
> [fuji.local:02267] mca: base: components_open: component self has no
> register function
> [fuji.local:02267] mca: base: components_open: component self open
> function successful
> [fuji.local:02267] mca: base: components_open: found loaded component sm
> [fuji.local:02267] mca: base: components_open: component sm has no
> register function
> [fuji.local:02267] mca: base: components_open: component sm open function
> successful
> [fuji.local:02267] mca: base: components_open: found loaded component tcp
> [fuji.local:02267] mca: base: components_open: component tcp has no
> register function
> [fuji.local:02267] mca: base: components_open: component tcp open function
> successful
> [fuji.local:02267] select: initializing btl component self
> [fuji.local:02267] select: init of component self returned success
> [fuji.local:02267] select: initializing btl component sm
> [fuji.local:02267] select: init of component sm returned success
> [fuji.local:02267] select: initializing btl component tcp
> [fuji.local][[59424,1],0][btl_tcp_component.c:468:mca_btl_tcp_component_create_instances]
> invalid interface "wlan0"
> [fuji.local:02267] select: init of component tcp returned success
> [apex-backpack:31956] mca: base: components_open: Looking for btl
> components
> [apex-backpack:31956] mca: base: components_open: opening btl components
> [apex-backpack:31956] mca: base: components_open: found loaded component
> self
> [apex-backpack:31956] mca: base: components_open: component self has no
> register function
> [apex-backpack:31956] mca: base: components_open: component self open
> function successful
> [apex-backpack:31956] mca: base: components_open: found loaded component
> sm
> [apex-backpack:31956] mca: base: components_open: component sm has no
> register function
> [apex-backpack:31956] mca: base: components_open: component sm open
> function successful
> [apex-backpack:31956] mca: base: components_open: found loaded component
> tcp
> [apex-backpack:31956] mca: base: components_open: component tcp has no
> register function
> [apex-backpack:31956] mca: base: components_open: component tcp open
> function successful
> [apex-backpack:31956] select: initializing btl component self
> [apex-backpack:31956] select: init of component self returned success
> [apex-backpack:31956] select: initializing btl component sm
> [apex-backpack:31956] select: init of component sm returned success
> [apex-backpack:31956] select: initializing btl component tcp
> [apex-backpack][[59424,1],1][btl_tcp_component.c:468:mca_btl_tcp_component_create_instances]
> invalid interface "en0"
> [apex-backpack:31956] select: init of component tcp returned success
> Process 0 on fuji.local out of 2
> Process 1 on apex-backpack out of 2
> [apex-backpack:31956] btl: tcp: attempting to connect() to address
> 10.11.14.203 on port 9360
>
>
>
> It launches the processes on both ends and then hangs at the send/receive
> part!
> What was the other thought you mentioned about why it might not be working?
> Please suggest.
> --regards, pallab
>
>
>
>> The -enable-heterogeneous should do the trick.  And to answer the
>> previous question, yes, put both of the interfaces in the include list.
>>
>> --mca btl_tcp_if_include en0,wlan0
>>
>> If that does not work, then I may have one other thought why it might
>> not work although perhaps not a solution.
>>
>> Rolf
>>
>> Pallab Datta wrote:
>>> Hi Rolf,
>>>
>>> Do i need to configure openmpi with some specific options apart from
>>> --enable-heterogeneous..?
>>> I am currently using
>>> ./configure --prefix=/usr/local/ --enable-heterogeneous
>>> --disable-static
>>> --enable-shared --enable-debug
>>>
>>> on both ends...is the above correct..?! Please let me know.
>>> thanks and regards,
>>> pallab
>>>
>>>
 Hi:
 I assume if you wait several minutes then your program will actually
 time out, yes?  I guess I have two suggestions. First, can you run a
 non-MPI job using the wireless?  Something like hostname?  Secondly,
 you
 may want to specify the specific interfaces you want it to use on the
 two machines.  You can do that via the "--mca btl_tcp_if_include"
 run-time parameter.  Just list the ones that you expect it to use.
>>>

Re: [OMPI devel] [OMPI users] Open-MPI between Mac and Linux (ubuntu 9.04) over wireless

2009-09-22 Thread Pallab Datta
The following are the ifconfig for both the Mac and the Linux respectively:

fuji:openmpi-1.3.3 pallabdatta$ ifconfig
lo0: flags=8049 mtu 16384
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
inet 127.0.0.1 netmask 0xff00
inet6 ::1 prefixlen 128
gif0: flags=8010 mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863 mtu 1500
inet6 fe80::21f:5bff:fe3d:eaac%en0 prefixlen 64 scopeid 0x4
inet 10.11.14.203 netmask 0xf000 broadcast 10.11.15.255
ether 00:1f:5b:3d:ea:ac
media: autoselect (100baseTX ) status: active
supported media: autoselect 10baseT/UTP  10baseT/UTP
 10baseT/UTP  10baseT/UTP
 100baseTX  100baseTX
 100baseTX  100baseTX
 1000baseT  1000baseT
 1000baseT 
en1: flags=8863 mtu 1500
ether 00:1f:5b:3d:ea:ad
media: autoselect status: inactive
supported media: autoselect 10baseT/UTP  10baseT/UTP
 10baseT/UTP  10baseT/UTP
 100baseTX  100baseTX
 100baseTX  100baseTX
 1000baseT  1000baseT
 1000baseT 
fw0: flags=8863 mtu 4078
lladdr 00:22:41:ff:fe:ed:7d:a8
media: autoselect  status: inactive
supported media: autoselect 


LINUX:

pallabdatta@apex-backpack:~/backpack/src$ ifconfig
lo        Link encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:116 errors:0 dropped:0 overruns:0 frame:0
  TX packets:116 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:11788 (11.7 KB)  TX bytes:11788 (11.7 KB)

wlan0 Link encap:Ethernet  HWaddr 00:21:79:c2:54:c7
  inet addr:10.11.14.205  Bcast:10.11.14.255  Mask:255.255.240.0
  inet6 addr: fe80::221:79ff:fec2:54c7/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:72531 errors:0 dropped:0 overruns:0 frame:0
  TX packets:28894 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:5459312 (5.4 MB)  TX bytes:7264193 (7.2 MB)

wmaster0  Link encap:UNSPEC  HWaddr
00-21-79-C2-54-C7-34-63-00-00-00-00-00-00-00-00
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

The Mac is a dual 2.26 GHz quad-core Intel Xeon Mac Pro and the Linux box runs
Ubuntu Server Edition 9.04. The Mac connects to the network through its
Ethernet interface, and the Linux box connects via a wireless adapter (IOGEAR).

Please help me fix this issue in any way you can. It really needs to work for
our project.
thanks in advance,
regards,
pallab





> My other concern was the following but I am not sure it applies here.
> If you have multiple interfaces on the node, and they are on the same
> subnet, then you cannot actually select what IP address to go out of.
> You can only select the IP address you want to connect to. In these
> cases, I have seen a hang because we think we are selecting an IP
> address to go out of, but it actually goes out the other one.
> Perhaps you can send the User's list the output from "ifconfig" on each
> of the machines which would show all the interfaces. You need to get the
> right arguments for ifconfig depending on the OS you are running on.
>
> One thought is make sure the ethernet interface is marked down on both
> boxes if that is possible.
>
> Pallab Datta wrote:
>> Any suggestions on to how to debug this further..??
>> do you think I need to enable any other option besides heterogeneous at
>> the configure proompt.?
>>
>>
>>> The -enable-heterogeneous should do the trick.  And to answer the
>>> previous question, yes, put both of the interfaces in the include list.
>>>
>>> --mca btl_tcp_if_include en0,wlan0
>>>
>>> If that does not work, then I may have one other thought why it might
>>> not work although perhaps not a solution.
>>>
>>> Rolf
>>>
>>> Pallab Datta wrote:
>>>
 Hi Rolf,

 Do i need to configure openmpi with some specific options apart from
 --enable-heterogeneous..?
 I am currently using
 ./configure --prefix=/usr/local/ --enable-heterogeneous
 --disable-static
 --enable-shared --enable-debug

 on both ends...is the above correct..?! Please let me know.
 thanks and regards,
 pallab



> Hi:
> I assume if you wait several minutes then your program will actually
> time out, yes?  I guess I have two suggestions. First, can you run a
> non-MPI job using the wireless?  Something like hostname?  Secondly,
> you
> may want to specify the specific interfaces you want it to use on the
> two machines.  You can do that via the "--mca btl_tcp_if_include"
> run-time parameter.  Just list the ones that you expect it to use.
>
> Also, this is not right - "--mca OMPI_mca

Re: [OMPI devel] application hangs with multiple dup

2009-09-22 Thread Chris Samuel
Hi Edgar,

- "Edgar Gabriel"  wrote:

> just wanted to give a heads-up that I *think* I know what the problem
> is. I should have a fix (with a description) either later today or 
> tomorrow morning...

I see that changeset 21970 is on trunk to fix this issue,
is that backportable to the 1.3.x branch?

Love to see if this fixes up our users' issues with Gadget!

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


Re: [OMPI devel] application hangs with multiple dup

2009-09-22 Thread Edgar Gabriel

it will be available in 1.3.4...
Thanks
Edgar

Chris Samuel wrote:

Hi Edgar,

- "Edgar Gabriel"  wrote:


just wanted to give a heads-up that I *think* I know what the problem
is. I should have a fix (with a description) either later today or 
tomorrow morning...


I see that changeset 21970 is on trunk to fix this issue,
is that backportable to the 1.3.x branch?

Love to see if this fixes up our users' issues with Gadget!

cheers,
Chris


--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335