Re: [OMPI devel] Multi-rail on openib

2009-06-12 Thread Nifty Tom Mitchell
On Tue, Jun 09, 2009 at 04:33:51PM +0300, Pavel Shamis (Pasha) wrote:
>
>> Open MPI currently needs to have connected fabrics, but maybe that's
>> something we would like to change in the future, having two separate
>> rails. (Btw Pasha, will your current work enable this?)
> I do not completely understand what you mean here by two separate
> rails... Already today you may connect each port to a different subnet,
> and ports in the same subnet may talk to each other.
>

Subnet?  (subnet vs. fabric)
Does this imply TCP/IP?
What IB protocols are involved, and
is there any agent that notices the disconnect and will trigger the switch?

-- 
T o m  M i t c h e l l 
Found me a new hat, now what?



Re: [OMPI devel] Enabling debugging and profiling in openMPI (make "CFLAGS=-pg -g")

2009-06-12 Thread Ralph Castain
The firewall should already be solved. Basically, you have to define a  
set of ports in your firewall that will let TCP messages pass through,  
and then tell OMPI to use those ports for both the TCP BTL and the OOB.


"ompi_info --params btl tcp" will tell you the right params to set
for the TCP BTL


"ompi_info --params oob tcp" will do the same for the OOB

Of course, that -does- leave a hole in your firewall that any TCP
message can exploit. :-/  You could look at more secure alternatives.
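As a sketch, the port restriction could go in an MCA parameter file such as $HOME/.openmpi/mca-params.conf. The parameter names below are from memory and vary between Open MPI releases, so treat them as assumptions and confirm the exact names with the ompi_info commands above:

```ini
# Hypothetical example: confine the TCP BTL and the OOB to a port range
# the firewall leaves open. Verify these parameter names with ompi_info
# before relying on them; they differ across Open MPI versions.
btl_tcp_port_min_v4   = 10000
btl_tcp_port_range_v4 = 100
oob_tcp_port_min_v4   = 10000
oob_tcp_port_range_v4 = 100
```

The same settings can also be passed on the mpirun command line with --mca.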


I'm not sure how to solve the NAT problem as it boils down to how to  
specify the names/IP addresses of the nodes behind the NAT. Someone  
who understands NATs better can help you there - I know there is a way  
to do it, but I've never played with it.


Ralph


On Jun 12, 2009, at 11:00 AM, Leo P. wrote:


Thank you Ralph and Samuel.

Sorry for the complete newbie question.

The reason I wanted to study Open MPI is that I want to make Open MPI  
support nodes that are behind a NAT or firewall. If you could give me  
some pointers on how to go about doing this, I would appreciate it a  
lot. I am considering this for my thesis project.


Sincerely,
LEO





Re: [OMPI devel] Hang in collectives involving shared memory

2009-06-12 Thread Eugene Loh

Sylvain Jeaugey wrote:


Hi Ralph,

I managed to have a deadlock after a whole night, but not the same you 
have : after a quick analysis, process 0 seems to be blocked in the 
very first send through shared memory. Still maybe a bug, but not the 
same as yours IMO.


Yes, that's the one Terry and I have tried to hunt down.  Kind of 
elusive.  Apparently, there is a race condition in sm start-up.  It 
*appears* as though a process (the lowest rank on a node?) computes 
offsets into shared memory using bad values and ends up with a FIFO 
pointer to the wrong spot.  Up through 1.3.1, this meant that OMPI would 
fail in add_procs()... Jeff and Terry have seen a couple of these.  With 
changes to sm in 1.3.2, the failure expresses itself differently... not 
until the first send (namely, first use of a remote FIFO).  At least 
that's my understanding.  George added some synchronization to the code 
to make it bulletproof, but that doesn't seem to have fixed the problem.  Sigh.


Anyhow, I think you ran into a different problem, one that is known but 
not yet understood.


Re: [OMPI devel] Enabling debugging and profiling in openMPI (make "CFLAGS=-pg -g")

2009-06-12 Thread Leo P.
Thank you Ralph and Samuel. 

Sorry for the complete newbie question. 

The reason I wanted to study Open MPI is that I want to make Open MPI 
support nodes that are behind a NAT or firewall. If you could give me some 
pointers on how to go about doing this, I would appreciate it a lot. I am 
considering this for my thesis project.

Sincerely,
LEO






Re: [OMPI devel] Hang in collectives involving shared memory

2009-06-12 Thread Sylvain Jeaugey

Hi Ralph,

I managed to get a deadlock after a whole night, but not the same one you 
have: after a quick analysis, process 0 seems to be blocked in the very 
first send through shared memory. Still maybe a bug, but not the same as 
yours IMO.


I also figured out that libnuma support was not in my library, so I 
rebuilt the lib, and this doesn't seem to change anything: same execution 
speed, same memory footprint, and of course the bug still does not appear 
:-(.


So, no luck so far in reproducing your problem. I guess you're the only 
one to be able to progress on this (since you seem to have a real 
reproducer).


Sylvain

On Wed, 10 Jun 2009, Sylvain Jeaugey wrote:

Hum, very glad that padb works with Open MPI, I couldn't live without it. In 
my opinion, the best debug tool for parallel applications, and more 
importantly, the only one that scales.


About the issue, I couldn't reproduce it on my platform (tried 2 nodes with 2 
to 8 processes each, nodes are twin 2.93 GHz Nehalem, IB is Mellanox QDR).


So my feeling is that it may be very hardware-related. Especially if you 
use the hierarch component, some transactions will be done through RDMA on 
one side and read directly through shared memory on the other side, which 
can, depending on the hardware, produce very different timings and bugs. 
Did you try with a different collective component (i.e. not hierarch)? Or 
with another interconnect? [Yes, of course, if it is a race condition, we 
might well avoid the bug because timings will be different, but that's 
still information.]


Perhaps all what I'm saying makes no sense or you already thought about this, 
anyway, if you want me to try different things, just let me know.


Sylvain

On Wed, 10 Jun 2009, Ralph Castain wrote:


Hi Ashley

Thanks! I would definitely be interested and will look at the tool. 
Meantime, I have filed a bunch of data on this in
ticket #1944, so perhaps you might take a glance at that and offer some 
thoughts?


https://svn.open-mpi.org/trac/ompi/ticket/1944

Will be back after I look at the tool.

Thanks again
Ralph


On Wed, Jun 10, 2009 at 8:51 AM, Ashley Pittman  
wrote:


  Ralph,

  If I may say this is exactly the type of problem the tool I have been
  working on recently aims to help with and I'd be happy to help you
  through it.

   Firstly I'd say of the three collectives you mention, MPI_Allgather,
   MPI_Reduce and MPI_Bcast, one exhibits a many-to-many, one a
   many-to-one, and the last a one-to-many communication pattern.  The
   scenario of a root process falling behind and getting swamped in comms
   is a plausible one for MPI_Reduce only but doesn't hold water with the
   other two.  You also don't mention if the loop is over a single
   collective or if you have a loop calling a number of different
   collectives each iteration.

   padb, the tool I've been working on, has the ability to look at
   parallel jobs and report on the state of collective comms and should
   help narrow you down on erroneous processes and those simply blocked
   waiting for comms.  I'd recommend using it to look at maybe four or
   five instances where the application has hung and look for any common
   features between them.

   Let me know if you are willing to try this route and I'll talk you
   through it; the code is downloadable from http://padb.pittman.org.uk
   and if you want the full collective functionality you'll need to patch
   Open MPI with the patch from

   http://padb.pittman.org.uk/extensions.html

  Ashley,

  --

  Ashley Pittman, Bath, UK.

  Padb - A parallel job inspection tool for cluster computing
  http://padb.pittman.org.uk

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





Re: [OMPI devel] Enabling debugging and profiling in openMPI (make "CFLAGS=-pg -g")

2009-06-12 Thread Ralph Castain
If you do a "./configure --help" you will get a complete list of the  
configure options. You may want to turn on more things than just  
enable-debug, though that is the critical first step.







Re: [OMPI devel] Enabling debugging and profiling in openMPI (make "CFLAGS=-pg -g")

2009-06-12 Thread Samuel K. Gutierrez

Hi,

Let me begin by stating that I'm at most an Open MPI novice - but you  
may want to try the addition of the --enable-debug configure option.   
That is, for example:


./configure --enable-debug; make

Hope this helps.

Samuel K. Gutierrez





[OMPI devel] Enabling debugging and profiling in openMPI (make "CFLAGS=-pg -g")

2009-06-12 Thread Leo P.
Hi everyone,

I am trying to understand the Open MPI code, so I was trying to enable 
debugging and profiling by issuing

$ make "CFLAGS=-pg -g"

But I am getting this error:

libtool: link: ( cd ".libs" && rm -f "mca_paffinity_linux.la" && ln -s "../mca_paffinity_linux.la" "mca_paffinity_linux.la" )
make[3]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal/mca/paffinity/linux'
make[2]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal/mca/paffinity/linux'
Making all in tools/wrappers
make[2]: Entering directory `/home/Desktop/openmpi-1.3.2/opal/tools/wrappers'
depbase=`echo opal_wrapper.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
gcc "-DEXEEXT=\"\"" -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../.. -pg -g -MT opal_wrapper.o -MD -MP -MF $depbase.Tpo -c -o opal_wrapper.o opal_wrapper.c &&\
mv -f $depbase.Tpo $depbase.Po
/bin/bash ../../../libtool --tag=CC --mode=link gcc -pg -g -export-dynamic -o opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lnsl -lutil -lm
libtool: link: gcc -pg -g -o .libs/opal_wrapper opal_wrapper.o -Wl,--export-dynamic ../../../opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_key_create'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_getspecific'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_create'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_atfork'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_setspecific'
../../../opal/.libs/libopen-pal.so: undefined reference to `pthread_join'
collect2: ld returned 1 exit status
make[2]: *** [opal_wrapper] Error 1
make[2]: Leaving directory `/home//Desktop/openmpi-1.3.2/opal/tools/wrappers'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/Desktop/openmpi-1.3.2/opal'
make: *** [all-recursive] Error 1

Is there any other way of enabling debugging and profiling in Open MPI?

Leo
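For what it's worth, those undefined pthread_* references are what you would expect if overriding CFLAGS on the make command line discards flags that configure had chosen (such as -pthread): a variable given on the make command line overrides any assignment inside the Makefile. A minimal sketch of that behavior, using a toy Makefile (an assumption for illustration, not Open MPI's real build):

```shell
# Toy demonstration of why `make CFLAGS="-pg -g"` can break the link:
# a variable set on the make command line overrides the value assigned
# in the Makefile, so flags like -pthread silently disappear.
printf 'CFLAGS = -O2 -pthread\nshow:\n\t@echo $(CFLAGS)\n' > Makefile

make show                    # prints: -O2 -pthread
make show CFLAGS="-pg -g"    # prints: -pg -g   (-pthread is gone)
```

One common way around this is to pass the extra flags at configure time instead, e.g. ./configure --enable-debug CFLAGS="-pg -g", so that the build system can combine your flags with the ones it needs.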

