[OMPI devel] openib currently broken

2007-11-02 Thread Jeff Squyres
The freelist changes from yesterday appear to have broken the openib  
btl.  We didn't get lots of test failures in MTT last night only  
because there was a separate (unrelated) typo in the ofud BTL that  
prevented the nightly tarball from building on any IB-capable  
machines.  :-)


Rich hopes to look into fixing the openib BTL problem today; he  
thinks it's a case of a simple oversight: the openib BTL is not using  
the new freelist init functions.


Rich: are there other places that are not using the new init  
functions that need to?


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] openib currently broken

2007-11-02 Thread Richard Graham



On 11/2/07 12:21 PM, "Jeff Squyres"  wrote:

> Rich: are there other places that are not using the new init
> functions that need to?

The ompi free list has two init functions, and I changed just one. The IB
btl uses the one I have not yet changed, but the pml uses the one I did
change.

rich

> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 




Re: [OMPI devel] openib currently broken

2007-11-02 Thread Richard Graham
R16641 should have fixed the regression. Anyone using ompi_free_list_t_ex()
and providing a memory allocator would have been bitten by this, since I
did not update this function (which will be deprecated in favor of a
version parallel to ompi_free_list_t_new) to initialize the newly defined
fields. From looking through the btls, this seems to affect only the
openib btl.
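
A minimal C sketch of the failure mode described above, under stated assumptions: every name in it (free_list_t, free_list_init_new, free_list_init_ex_old, free_list_init_ex_fixed, payload_size, plain_alloc) is hypothetical and does not reproduce the real ompi_free_list_t API. It assumes only what the message says: the struct gained new fields, one init function was updated to set them, and the older init function (the one taking a caller-supplied allocator) was not, so objects set up through the older path kept whatever bytes were already in memory. Having the fixed variant delegate to the new init with a default of 0 is an illustrative choice, not necessarily what R16641 does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical caller-supplied allocator hook (not Open MPI's real API). */
typedef void *(*alloc_fn_t)(size_t size);

/* Hypothetical free-list object. elem_size and alloc are the "old" fields;
 * payload_size stands in for a field added along with the new init code. */
typedef struct {
    size_t     elem_size;
    alloc_fn_t alloc;
    size_t     payload_size;   /* the "new" field */
} free_list_t;

/* New-style init: sets every field, including the new one. */
static void free_list_init_new(free_list_t *fl, size_t elem_size,
                               size_t payload_size, alloc_fn_t alloc)
{
    assert(fl != NULL);
    fl->elem_size = elem_size;
    fl->alloc = alloc;
    fl->payload_size = payload_size;
}

/* Legacy init as it was before the fix: written before payload_size
 * existed, so that field keeps whatever bytes were already in memory. */
static void free_list_init_ex_old(free_list_t *fl, size_t elem_size,
                                  alloc_fn_t alloc)
{
    assert(fl != NULL);
    fl->elem_size = elem_size;
    fl->alloc = alloc;
}

/* Legacy init after the fix: also initializes the new field, here by
 * delegating to the new-style init with an assumed default of 0. */
static void free_list_init_ex_fixed(free_list_t *fl, size_t elem_size,
                                    alloc_fn_t alloc)
{
    free_list_init_new(fl, elem_size, 0, alloc);
}

/* Later code trusts the new field when sizing each item. */
static size_t free_list_item_bytes(const free_list_t *fl)
{
    return fl->elem_size + fl->payload_size;
}

/* The allocator the caller "provides" in this sketch. */
static void *plain_alloc(size_t size)
{
    return malloc(size);
}

/* Fill the object with junk to mimic non-zeroed memory, then run the
 * given legacy init on it with a 64-byte element size. */
static free_list_t make_list(void (*init)(free_list_t *, size_t, alloc_fn_t))
{
    free_list_t fl;
    memset(&fl, 0xAB, sizeof fl);
    init(&fl, 64, plain_alloc);
    return fl;
}
```

With the object pre-filled with junk bytes, free_list_item_bytes() reports 64 for a list set up through free_list_init_ex_fixed(), but a junk-derived size for one set up through free_list_init_ex_old().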

Rich


On 11/2/07 12:31 PM, "Richard Graham"  wrote:

> 
> 
> 
> On 11/2/07 12:21 PM, "Jeff Squyres"  wrote:
> 
>> The freelist changes from yesterday appear to have broken the openib
>> btl.  We didn't get lots of test failures in MTT last night only
>> because there was a separate (unrelated) typo in the ofud BTL that
>> prevented the nightly tarball from building on any IB-capable
>> machines.  :-)
>> 
>> Rich hopes to look into fixing the openib BTL problem today; he
>> thinks it's a case of a simple oversight: the openib BTL is not using
>> the new freelist init functions.
>> 
>> Rich: are there other places that are not using the new init
>> functions that need to?
>> 

> the ompi free list has two init functions, I changed just one.  The IB
> btl uses the one I have not yet changed, but the pml uses the one I did
> change.
>
> rich




Re: [OMPI devel] FreeBSD Support?

2007-11-02 Thread Karol Mroz

Hi everyone. After doing a build of the latest ompi-trunk (rev. 16641) on
a FreeBSD 6.2 machine, I noticed that the autogen.sh script still dies
because of a permissions issue with the configure script. Below is a diff
that solves the problem on my machine. Are there any foreseeable problems
with committing this change to ompi-trunk?

---

Index: autogen.sh
===
--- autogen.sh  (revision 16641)
+++ autogen.sh  (working copy)
@@ -435,6 +435,7 @@
 pushd opal/libltdl > /dev/null 2>&1
 run_and_check $ompi_aclocal
 run_and_check $ompi_automake
+chmod u+w configure # Need this for FreeBSD.
 run_and_check $ompi_autoconf
 popd > /dev/null 2>&1
 unset indent

---

Thanks.
--
Karol Mroz
km...@cs.ubc.ca


Re: [OMPI devel] openib currently broken

2007-11-02 Thread Jeff Squyres (jsquyres)
Rich -

I'm not 100% sure it's fixed: I'm still seeing "out of memory" errors when
running IMB with about 40 procs over openib. But I ran out of time to
investigate deeply...

Could you try running a nontrivial omb to check?

-jms
Sent from my PDA

-----Original Message-----
From:    Richard Graham [mailto:rlgra...@ornl.gov]
Sent:    Friday, November 02, 2007 02:07 PM Eastern Standard Time
To:      Open MPI Developers
Subject: Re: [OMPI devel] openib currently broken





Re: [OMPI devel] openib currently broken

2007-11-02 Thread Don Kerr

Rich,

Do the ompi_free_list changes impact the sm btl? The Solaris SPARC sm btl
seems to have an issue starting with last night's putback, but I have not
looked into it yet.


-DON

Richard Graham wrote:

> R16641 should have fixed the regression.  Anyone using
> ompi_free_list_t_ex() and providing a memory allocator would have been
> bitten by this, since I did not update this function (which will be
> deprecated in favor of a version parallel to ompi_free_list_t_new) to
> initialize the new fields defined.  From looking through the btls, this
> seems to be only the openib btl.





Re: [OMPI devel] openib currently broken

2007-11-02 Thread Richard Graham
It does. I was able to run the Intel point-to-point tests with 3 procs
using the sm and self btls, with both the ob1 and dr pmls.

Rich


On 11/2/07 3:57 PM, "Don Kerr"  wrote:

> Rich,
> 
> Do the ompi_free_list changes impact the sm btl?   Solaris SPARC sm btl
> seems to have an issue starting with last nights put back but I have not
> looked into it yet.
> 
> -DON



Re: [OMPI devel] openib currently broken

2007-11-02 Thread Richard Graham
Jeff,
  I ran IMB on 60 procs with the openib and self btls, and all ran fine.
The tests that were run were ping-pong, ping-ping, SendRecv, Exchange,
Allreduce, Reduce, Reduce_scatter, Allgather, Allgatherv, Alltoall, Bcast,
and Barrier. I also ran on 40 procs, plus several smaller runs. If you can
reproduce and provide more details (I realize you ran out of time), I can
take another look. I would expect a bug in the changes to cause one to
walk over memory rather than change the memory usage, but who knows. I
will be offline until late Sunday...

Rich



On 11/2/07 3:26 PM, "Jeff Squyres (jsquyres)"  wrote:

> Rich -
> 
> I'm not 100% sure its fixed - I'm still seeing "out of memory" errors when
> running about 40 prob imb over openib.  But I ran out of time to investigate
> deeply...
> 
> Could you try running a nontrivial omb to check?




Re: [OMPI devel] openib currently broken

2007-11-02 Thread Jeff Squyres (jsquyres)
Did you run with a higher number of procs?

-jms
Sent from my PDA

-----Original Message-----
From:    Richard Graham [mailto:rlgra...@ornl.gov]
Sent:    Friday, November 02, 2007 04:05 PM Eastern Standard Time
To:      Open MPI Developers
Subject: Re: [OMPI devel] openib currently broken



Re: [OMPI devel] openib currently broken

2007-11-02 Thread Jeff Squyres (jsquyres)
Ok.  I'll dig a bit over the weekend.  Thanks! 

-jms
Sent from my PDA

-----Original Message-----
From:    Richard Graham [mailto:rlgra...@ornl.gov]
Sent:    Friday, November 02, 2007 05:50 PM Eastern Standard Time
To:      Open MPI Developers
Subject: Re: [OMPI devel] openib currently broken
