Re: [OMPI devel] fortran application hanging when compiled with -g
The real problem is not -g itself but the -O0 option that is automatically added by -g. If you compile with "-g -ON" for N > 0, everything works as expected.

Thanks,
Sven

On Friday 11 August 2006 11:54, Bettina Krammer wrote:
> Hi,
>
> when I use the attached hello.f with Open MPI 1.1.0 and the underlying Intel
> 9.0 or 9.1 compiler on our Xeon cluster, it deadlocks when compiled
> with the -g option but works without -g:
>
> ===
> output with -g:
>
> $mpirun -np 2 ./hello-g
>
> My rank is0 !
> waiting for message from1
> My rank is1 !
> Greetings from process1 !
> Sending message from1 !
> Message recieved: HelloFromMex!
> waiting for message from1
>
> [...deadlock...]
> ===
>
> output without -g:
>
> $mpirun -np 2 ./hello-no-g
>
> My rank is0 !
> waiting for message from1
> My rank is1 !
> Greetings from process1 !
> Sending message from1 !
> Message recieved: HelloFromMex!
> All done... 0
> All done... 1
> ===
>
> Thanks, Bettina Krammer
>
> (The example is taken from the distribution of DDT, to be found in
> ddt/examples. The problem is reproducible with the simplified
> hello-simple.f. The deadlock occurs in the DO source ... MPI_Recv(...)
> loop)
> ===
> The config.log is not available to me.
>
> hpc43203 cacau1 219$ompi_info
>                 Open MPI: 1.1
>    Open MPI SVN revision: r10477
>                 Open RTE: 1.1
>    Open RTE SVN revision: r10477
>                     OPAL: 1.1
>        OPAL SVN revision: r10477
>                   Prefix: /opt/OpenMPI/1.1.0/
>  Configured architecture: x86_64-unknown-linux-gnu
>            Configured by: hpcraink
>            Configured on: Mon Jul 31 12:55:30 CEST 2006
>           Configure host: cacau1
>                 Built by: hpcraink
>                 Built on: Mon Jul 31 13:16:04 CEST 2006
>               Built host: cacau1
>               C bindings: yes
>             C++ bindings: yes
>       Fortran77 bindings: yes (all)
>       Fortran90 bindings: yes
>  Fortran90 bindings size: small
>               C compiler: icc
>      C compiler absolute: /opt/intel/compiler/9.1/cce/bin/icc
>             C++ compiler: icpc
>    C++ compiler absolute: /opt/intel/compiler/9.1/cce/bin/icpc
>       Fortran77 compiler: ifc
>   Fortran77 compiler abs: /opt/intel/compiler/9.1/fce/bin/ifc
>       Fortran90 compiler: ifc
>   Fortran90 compiler abs: /opt/intel/compiler/9.1/fce/bin/ifc
>              C profiling: yes
>            C++ profiling: yes
>      Fortran77 profiling: yes
>      Fortran90 profiling: yes
>           C++ exceptions: no
>           Thread support: posix (mpi: no, progress: no)
>   Internal debug support: no
>      MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>          libltdl support: yes
>               MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
>            MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
>            MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
>            MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
>                MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
>            MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>            MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>                 MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
>                 MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
>                 MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
>                 MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
>                 MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
>                   MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
>                MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
>                MCA mpool: mvapi (MCA v1.0, API v1.0, Component v1.1)
>                  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
>                  MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
>               MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
>                  MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
>                  MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
>                  MCA btl: mvapi (MCA v1.0, API v1.0, Component v1.1)
>                  MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
>                 MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
>                  MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
>                  MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
>                  MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
>                  MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
>                  MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
Re: [OMPI devel] fortran application hanging when compiled with -g
The problem is that after the MPI_Recv call the loop variable is set to 0, so the loop does not terminate as it should for 2 processes.

If you use another temporary variable as the parameter, the program works. The strange thing is that the temporary variable is not changed and still contains the original value, while the loop variable still gets changed (this time to a value higher than 0).

On Friday 11 August 2006 17:07, Jeff Squyres wrote:
> I'm not quite sure I understand -- does the application hang in an MPI call?
> Or is there some compiler error that is causing it to execute a DO loop
> incorrectly?
>
> On 8/11/06 6:25 AM, "Sven Stork" wrote:
>
> > The real problem is not -g itself but the -O0 option that is
> > automatically added by -g. If you compile with "-g -ON" for N > 0,
> > everything works as expected.
> >
> > Thanks,
> > Sven
> >
> > On Friday 11 August 2006 11:54, Bettina Krammer wrote:
> >> Hi,
> >>
> >> when I use the attached hello.f with Open MPI 1.1.0 and the underlying Intel
> >> 9.0 or 9.1 compiler on our Xeon cluster, it deadlocks when compiled
> >> with the -g option but works without -g:
> >>
> >> [...]
Re: [OMPI devel] fortran application hanging when compiled with -g
Problem solved. The program has a bug: instead of passing the status variable (an array of size MPI_STATUS_SIZE) to MPI_Recv, it passes the stat variable (a scalar). MPI_Recv therefore writes past the end of that scalar and overwrites neighbouring variables, which is the buffer overflow that clobbers the loop variable.

On Monday 14 August 2006 09:58, Sven Stork wrote:
> The problem is that after the MPI_Recv call the loop
> variable is set to 0, so the loop does not terminate
> as it should for 2 processes.
>
> If you use another temporary variable as the parameter, the program
> works. The strange thing is that the temporary variable is not changed and still
> contains the original value, while the loop variable still gets changed (this
> time to a value higher than 0).
>
> On Friday 11 August 2006 17:07, Jeff Squyres wrote:
> > I'm not quite sure I understand -- does the application hang in an MPI call?
> > Or is there some compiler error that is causing it to execute a DO loop
> > incorrectly?
> >
> > On 8/11/06 6:25 AM, "Sven Stork" wrote:
> >
> > > The real problem is not -g itself but the -O0 option that is
> > > automatically added by -g. If you compile with "-g -ON" for N > 0,
> > > everything works as expected.
> > >
> > > On Friday 11 August 2006 11:54, Bettina Krammer wrote:
> > >> [...]
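To make the bug concrete, here is a minimal sketch of the pattern described above. The variable names and message handling are only illustrative, since the original hello.f from the DDT examples is not reproduced in this thread; the point is solely the scalar-versus-array declaration of the MPI_Recv status argument.

      PROGRAM hello_sketch
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER rank, nprocs, source, ierr
      CHARACTER*12 message
C     Buggy declaration: a scalar cannot hold an MPI status.  MPI_Recv
C     stores MPI_STATUS_SIZE integers into its status argument, so the
C     variables that happen to sit next to "stat" (here the loop
C     variable "source") get overwritten.
      INTEGER stat
C     Correct declaration:
      INTEGER status(MPI_STATUS_SIZE)

      CALL MPI_INIT(ierr)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

      IF (rank .EQ. 0) THEN
         DO source = 1, nprocs-1
C           Passing "stat" instead of "status" here reproduces the
C           reported hang at -O0; passing "status" is the fix.
            CALL MPI_RECV(message, 12, MPI_CHARACTER, source, 0,
     &                    MPI_COMM_WORLD, status, ierr)
            WRITE(*,*) 'Message received: ', message
         END DO
      ELSE
         message = 'HelloFromMex'
         CALL MPI_SEND(message, 12, MPI_CHARACTER, 0, 0,
     &                 MPI_COMM_WORLD, ierr)
      END IF

      CALL MPI_FINALIZE(ierr)
      END

Because the out-of-bounds store lands on whatever happens to be adjacent in memory, the symptom depends on the optimization level: at -O0 (implied by -g) the loop variable presumably lives in memory next to the scalar and gets overwritten, while at -O1 and higher it is likely kept in a register, which would explain why the hang only appears in the -g build.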
Re: [OMPI devel] [IPv6] new component oob/tcp6
On Thursday 07 September 2006 18:42, George Bosilca wrote:
> I still wonder why we need any configuration "magic". We don't want
> to be the only one around supporting IPv4 OR IPv6. Supporting both of
> them simultaneously can be interesting, and it does not require huge
> changes. In fact, we have a problem only at the connection step;
> everything else will be identical.
>
> In fact, as we're talking about the TCP layer, we might want to
> finish the discussion we had a while ago about merging the OOB and
> the BTL into one component. They do have very similar functions, and
> right now we have to maintain 2 components. I think it's more than
> time to do the merge, and move the resulting component or whatever
> down into the OPAL layer.
>
> I even volunteer for that. Next week I will be away, so I will come
> back with a design for the phone conference on ... well, the beginning of
> October.

That sounds like the most reasonable solution to me.

At the moment the TCP BTL would have a problem in the case where an Open MPI job is spawned across multiple cells and at least 2 cells use the same private IP address range. In this scenario a process in one cell could think that a process from the other cell is reachable. That is not really an IPv6-specific problem, but when we are thinking about moving the BTL down into the OPAL layer we should take it into account. I'm not sure whether other BTLs have similar problems (e.g. 2 InfiniBand cells connected via TCP).

Thanks,
Sven

> george.
>
> On Sep 7, 2006, at 12:22 PM, Ralph H Castain wrote:
>
> > Jeff and I talked about this for awhile this morning, and we both agree
> > (yes, I did change my mind after we discussed all the ramifications). It
> > appears that we should be able to consolidate the code into a single
> > component with the right configuration system "magic" - and that would
> > definitely be preferable.
> >
> > My primary concern originally was with the lack of knowledge and
> > documentation on the configuration system. I know that I don't know enough
> > about that system to make everything work in a single component. The
> > component method would have allowed you to remain ignorant of that system.
> > However, with Jeff's willingness to help in that regard, the approach he
> > recommends would be easier for everyone.
> >
> > Hope that doesn't cause too much of a problem.
> > Ralph
> >
> >
> > On 9/7/06 9:46 AM, "Jeff Squyres" wrote:
> >
> >> On 9/1/06 12:21 PM, "Adrian Knoth" wrote:
> >>
> >>> On Fri, Sep 01, 2006 at 07:01:25AM -0600, Ralph Castain wrote:
> >>>
> >>>>> Do you agree to go on with two oob components, tcp and tcp6?
> >>>> Yes, I think that's the right approach
> >>>
> >>> It's a deal. ;)
> >>
> >> Actually, I would disagree here (sorry for jumping in late! :-( ).
> >>
> >> Given the amount of code duplication, it seems like a big shame to make a
> >> separate component that is almost identical.
> >>
> >> Can we just have one component that handles both ipv4 and ipv6? Appropriate
> >> #if's can be added (I'm willing to help with the configure.m4 mojo -- the
> >> stuff to tell OMPI whether ipv4 and/or ipv6 stuff can be found and to set
> >> the #define's appropriately).
> >>
> >> More specifically -- I can help with component / configure / build system
> >> issues. I'll defer on the whole how-to-wire-them-up issue for the moment
> >> (I've got some other fires burning that must be tended to :-\ ).
> >>
> >> My $0.02: OOB is the first target to get working -- once you can orterun
> >> non-MPI apps properly across ipv6 and/or ipv4 nodes, then move on to the MPI
> >> layer and take the same approach there (e.g., one TCP btl with configure.m4
> >> mojo, etc.).
Re: [OMPI devel] Orte update
Hi Ralph,

On Thursday 12 July 2007 15:53, Ralph H Castain wrote:
> Yo all
>
> I have a fairly significant change coming to the orte part of the code base
> that will require an autogen (sorry). I'll check it in late this afternoon
> (can't do it at night as it is on my office desktop).
>
> The commit will fix the singleton operations, including singleton
> comm_spawn. It also takes the first step towards removing event-driven
> operations, replacing them with more serial code (to be explained
> separately). As part of all this, I had to modify the various pls
> components. For those I could not compile, I made a first cut at them that
> should (hopefully) allow them to continue to operate.
>
> Any of you using TM: we discovered that the trunk is not working currently
> on that environment. We are investigating - it has nothing to do with this
> commit, but predates it.

What do you mean by broken? I tried r15394 on our cluster and TM seems to be working for me. The only issue I currently know about is the problem with the iof (see ticket #1071; it can be temporarily worked around by using -mca iof ^null).

Thanks,
Sven

> Just wanted to give you a heads-up. Please refrain from making changes to
> the orte codebase today, if you could - it would simplify the commit and
> ensure we don't lose your changes.
>
> Thanks
> Ralph
Re: [OMPI devel] Orte update
On Friday 13 July 2007 15:35, Ralph H Castain wrote:
>
> On 7/13/07 7:22 AM, "Sven Stork" wrote:
>
> > Hi Ralph,
> >
> > On Thursday 12 July 2007 15:53, Ralph H Castain wrote:
> >> [...]
> >> Any of you using TM: we discovered that the trunk is not working currently
> >> on that environment. We are investigating - it has nothing to do with this
> >> commit, but predates it.
> >
> > What do you mean by broken? I tried r15394 on our cluster and TM seems to
> > be working for me. The only issue I currently know about is the problem
> > with the iof (see ticket #1071; it can be temporarily worked around by
> > using -mca iof ^null).
>
> That is correct - the null component was being incorrectly selected because
> of an error in its selection logic. We fixed it in the r15390 commit - it
> was a trivial fix - so now everything works fine.

I cannot see anything in r15390 that fixes this issue. I checked with the latest version of the trunk and still see the same problem (mpiexec produces no output unless the null iof component is disabled):

hpcstork@noco042:~/ > ompi_info
                Open MPI: 1.3a1r15427
   Open MPI SVN revision: r15427
...
hpcstork@noco042:~/ > mpiexec date
hpcstork@noco042:~/ > mpiexec -mca iof ^null date
Mon Jul 16 14:00:57 CEST 2007
Mon Jul 16 14:00:57 CEST 2007
hpcstork@noco042:~/ >

Thanks,
Sven
Re: [OMPI devel] pml failures?
Hi,

since yesterday I have noticed that NetPIPE, and sometimes IMB, hangs. As far as I can tell, both processes are stuck in a receive. The weird thing is that if I run it in a debugger, everything works fine.

Cheers,
Sven

On Tuesday 31 July 2007 23:47, Jeff Squyres wrote:
> I'm getting a pile of test failures when running with the openib and
> tcp BTLs on the trunk. Gleb is getting some failures, too, but his
> seem to be different than mine.
>
> Here's what I'm seeing from manual MTT runs on my SVN/development
> install -- did you know that MTT could do that? :-)
>
> +----------+---------+------+------+----------+------+
> | Phase    | Section | Pass | Fail | Time out | Skip |
> +----------+---------+------+------+----------+------+
> | Test Run | intel   | 442  | 0    | 26       | 0    |
> | Test Run | ibm     | 173  | 3    | 1        | 3    |
> +----------+---------+------+------+----------+------+
>
> The tests that are failing are:
>
> *** WARNING: Test: MPI_Recv_pack_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Ssend_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Irecv_pack_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Isend_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Irsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Ssend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Send_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Send_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Rsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Reduce_loc_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Isend_ator2_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Issend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Isend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Send_ator2_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Issend_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: comm_join, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: getcount, np=16, variant=1: FAILED
> *** WARNING: Test: spawn, np=3, variant=1: FAILED
> *** WARNING: Test: spawn_multiple, np=3, variant=1: FAILED
>
> I'm not too worried about the comm spawn/join tests because I think
> they're heavily oversubscribing the nodes and therefore timing out.
> These were all from a default trunk build running with "mpirun --mca
> btl openib,self".
>
> For all of these tests, I'm running on 4 nodes, 4 cores each, but
> they have varying numbers of network interfaces:
>
>            nodes 1,2          nodes 3,4
>   openib   3 active ports     2 active ports
>   tcp      4 tcp interfaces   3 tcp interfaces
>
> Is anyone else seeing these kinds of failures?
>
> --
> Jeff Squyres
> Cisco Systems
[OMPI devel] using google-perftools for hunting memory leaks
Dear all,

while hunting for memory leaks I found the google performance tools (google-perftools) quite useful. The included memory allocator has a built-in checker for detecting memory leaks. Unlike other tools, you can use this feature without any recompilation and still get a nice call graph that points at the allocation sites responsible for the leaks (see attachment). As it might also be interesting for other people, I wanted to mention it.

Here is the link to the homepage: http://goog-perftools.sourceforge.net

Cheers,
Sven

Attachment: pprof6154.0.pdf (Adobe PDF document)
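As a rough sketch of the kind of invocation this involves (the library path, application name, and heap-dump file below are placeholders that depend on the installation; the general mechanism is to preload the tcmalloc library, enable its heap checker via the HEAPCHECK environment variable, and then feed the heap dump it reports at exit into pprof, as described in the perftools documentation):

$ mpirun -np 2 -x LD_PRELOAD=/usr/lib/libtcmalloc.so -x HEAPCHECK=normal ./my_mpi_app
$ pprof --pdf ./my_mpi_app <heap-dump-file-reported-at-exit> > leak-callgraph.pdf

The heap checker should print a ready-to-use pprof command line when the program exits, so the second step can usually be copied directly from that output.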
Re: [OMPI devel] [OMPI svn] svn:open-mpi r15848
On Tuesday 14 August 2007 15:23, Tim Prins wrote:
> This might be breaking things on odin. All our 64 bit openib mtt tests
> have the following output:
>
> [odin003.cs.indiana.edu:30971] Wrong QP specification (QP 0
> "P,128,256,128,16:S,1024,256,128,32:S,4096,256,128,32:S,65536,256,128,32").
> Point-to-point QP get 1-5 parameters
>
> However, on my debug build I do not get any errors. Is anyone else
> seeing this?

I just checked the MTT web page for our viscluster, which is also 64-bit, and it shows the same error message. Looking at the commit, it looks like the first triple has been extended to a quadruple?

Cheers,
Sven

> Thanks,
>
> Tim
>
> jsquy...@osl.iu.edu wrote:
> > Author: jsquyres
> > Date: 2007-08-13 17:51:05 EDT (Mon, 13 Aug 2007)
> > New Revision: 15848
> > URL: https://svn.open-mpi.org/trac/ompi/changeset/15848
> >
> > Log:
> > Change the default receive_queues value per
> > http://www.open-mpi.org/community/lists/devel/2007/08/2100.php.
> >
> > Text files modified:
> >    trunk/ompi/mca/btl/openib/btl_openib_mca.c | 2 +-
> >    1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > Modified: trunk/ompi/mca/btl/openib/btl_openib_mca.c
> > ==============================================================
> > --- trunk/ompi/mca/btl/openib/btl_openib_mca.c (original)
> > +++ trunk/ompi/mca/btl/openib/btl_openib_mca.c 2007-08-13 17:51:05 EDT (Mon, 13 Aug 2007)
> > @@ -477,7 +477,7 @@
> >      char *str;
> >      char **queues, **params = NULL;
> >      int num_pp_qps = 0, num_srq_qps = 0, qp = 0, ret = OMPI_ERROR;
> > -    char *default_qps = "P,128,16,4;S,1024,256,128,32;S,4096,256,128,32;S,65536,256,128,32";
> > +    char *default_qps = "P,128,256,128,16:S,1024,256,128,32:S,4096,256,128,32:S,65536,256,128,32";
> >      uint32_t max_qp_size, max_size_needed;
> >
> >      reg_string("receive_queues",
Re: [OMPI devel] [OMPI svn] svn:open-mpi r15881
On Friday 17 August 2007 13:58, Jeff Squyres wrote:
> On Aug 16, 2007, at 1:13 PM, Tim Prins wrote:
>
> >> So you're both right. :-) But Tim's falling back on an older (and
> >> unfortunately bad) precedent. It would be nice to not extend that
> >> bad precedent, IMHO...
> >
> > I really don't care where the constants are defined, but they do need to
> > be unique. I think it is easiest if all the constants are stored in one
> > file, but if someone else wants to chop them up, that's fine with me. We
> > would just have to be more careful when adding new constants to check
> > both files.
>
> Ya, IIRC, this is a definite problem that we had: it's at the core of
> the "component" abstraction (a component should be wholly self-
> contained and not have any component-specific definitions outside of
> itself), but these tags are a central resource that need to be
> allocated in a distributed fashion.
>
> That's why I think it was decided to simply leave them as they were,
> and/or use the (DYNAMIC-x) form. I don't have any better suggestion;
> I'm just providing rationale for the reason it was the way it was...
>
> >> True. We will need a robust tag reservation system, though, to
> >> guarantee that every process gets the same tag values (e.g., if udapl
> >> is available on some nodes but not others, will that cause openib to
> >> have different values on different nodes? And so on).
> >
> > Not really. All that is needed is a list of constants (similar to the
> > one in rml_types.h).
>
> I was assuming a dynamic/run-time tag assignment (which is obviously
> problematic for the reason I cited, and others). But static is also
> problematic for the breaking-abstraction reasons. Stalemate.

What about this: every component chooses its own tag independently of the others. Before a component can use its tag, it has to register it, together with its full component name, in a small (process-internal) database. If 2 components try to register the same tag, we emit a warning, terminate the process, or similar.

If 2 components (CompA and CompB) want to register the same tag, and process A loads _only_ CompA while process B loads _only_ CompB, then both components will still be loaded without any error. I assume it is rather unusual for CompA to send a message to process B when there is no counterpart component loaded there, but there is still some probability. For more safety (and of course less performance) we could:

- add a parameter that causes this tag database to be synchronized across all processes, or
- add a parameter that turns on a per-send/receive check of whether the specified tag has been registered.

Just my 0.02 $

Sven

> > If a rsl component doesn't like the particular
> > constant tag values, they can do whatever they want in their
> > implementation, as long as a message sent on a tag is received on the
> > same tag.
>
> Sure.
>
> --
> Jeff Squyres
> Cisco Systems
Re: [OMPI devel] [OMPI svn] svn:open-mpi r9323 - Missing datatype_memcpy.c
Hello George,

is it possible that you forgot to check in the "datatype_memcpy.c" file?

Thanks,
Sven

On Friday 17 March 2006 09:05, you wrote:
> bosilca