Re: [OMPI users] Running with native ugni on a Cray XC

2015-10-27 Thread Tim Mattox
>>> [nid00012:12788] [13] ./osu_latency[0x400dd9]
>>> [nid00012:12788] *** End of error message ***
>>> osu_latency: btl_ugni_endpoint.c:87: mca_btl_ugni_ep_connect_start:
>>> Assertion `0' failed.
>>>
>>>
>>> Here's how I build:
>>>
>>> export FC=ftn (I'm not using Fortran, but the configure fails if
>>> it can't find a Fortran compiler)
>>> ./configure --prefix=/lus/scratch/nradclif/openmpi_install
>>> --enable-mpi-fortran=none
>>> --with-platform=contrib/platform/lanl/cray_xe6/debug-lustre
>>> make install
>>>
>>> I didn't modify the debug-lustre file, but I did change cray-common to
>>> remove the hard-coding, e.g., rather than using the gemini-specific path
>>> "with_pmi=/opt/cray/pmi/2.1.4-1..8596.8.9.gem", I used
>>> "with_pmi=/opt/cray/pmi/default".
>>>
>>> I've tried running different executables with different numbers of
>>> ranks/nodes, but they all seem to run into problems with PMI_KVS_Put.
>>>
>>> Any ideas what could be going wrong?
>>>
>>> Thanks for any help,
>>> Nick
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2015/06/27197.php
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2015/06/27199.php
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/06/27200.php



-- 
Tim Mattox, Ph.D. - tmat...@gmail.com


[OMPI users] Open MPI v1.2.9 released

2009-02-17 Thread Tim Mattox
The Open MPI Team, representing a consortium of research, academic,
and industry partners, is pleased to announce the release of Open MPI
version 1.2.9. This release is mainly a bug fix release over the v1.2.8
release.  This is the last planned release from the 1.2 series.

We strongly recommend that all users upgrade to the 1.3 series
if possible.

Version 1.2.9 can be downloaded from the main Open MPI web site or
any of its mirrors (mirrors will be updating shortly).

Here is a list of changes in v1.2.9 as compared to v1.2.8:
- Fix a segfault when using one-sided communications on some forms of derived
  datatypes.  Thanks to Dorian Krause for reporting the bug. See #1715.
- Fix an alignment problem affecting one-sided communications on
  some architectures (e.g., SPARC64). See #1738.
- Fix compilation on Solaris when thread support is enabled in Open MPI
  (e.g., when using --with-threads). See #1736.
- Correctly take into account the MTU that an OpenFabrics device port
  is using. See #1722 and
  https://bugs.openfabrics.org/show_bug.cgi?id=1369.
- Fix two datatype engine bugs. See #1677.
  Thanks to Peter Kjellstrom for the bugreport.
- Fix the bml r2 help filename so the help message can be found. See #1623.
- Fix a compilation problem on RHEL4U3 with the PGI 32 bit compiler
  caused by .  See ticket #1613.
- Fix the --enable-cxx-exceptions configure option. See ticket #1607.
- Properly handle when the MX BTL cannot open an endpoint. See ticket #1621.
- Fix a double free of events on the tcp_events list. See ticket #1631.
- Fix a buffer overrun in opal_free_list_grow (called by MPI_Init).
  Thanks to Patrick Farrell for the bugreport and Stephan Kramer for
  the bugfix.  See ticket #1583.
- Fix a problem setting OPAL_PREFIX for remote sh-based shells.
  See ticket #1580.

--
Tim Mattox, Ph.D.
Open Systems Lab
Indiana University


Re: [OMPI users] dead lock in MPI_Finalize

2009-01-21 Thread Tim Mattox
Can you send all the information listed here:

   http://www.open-mpi.org/community/help/

On Wed, Jan 21, 2009 at 8:58 AM, Bernard Secher - SFME/LGLS
<bernard.sec...@cea.fr> wrote:
> Hello,
>
> I have a case where I have a deadlock in the MPI_Finalize() function with
> Open MPI v1.3.
>
> Can somebody help me please?
>
> Bernard
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


[OMPI users] Announcing the release of Open MPI version 1.3

2009-01-19 Thread Tim Mattox
The Open MPI Team, representing a consortium of research, academic,
and industry partners, is pleased to announce the release of Open MPI
version 1.3. This release contains many bug fixes, feature
enhancements, and performance improvements over the v1.2 series,
including (but not limited to):

   * MPI2.1 compliant
   * New Notifier framework
   * Additional architectures, OS's and batch schedulers
   * Improved thread safety
   * MPI_REAL16 and MPI_COMPLEX32
   * Improved MPI C++ bindings
   * Valgrind support
   * Updated ROMIO to the version from MPICH2-1.0.7
   * Improved Scalability
 - Process launch times reduced by an order of magnitude
 - sparse groups
 - On-demand connection setup
   * Improved point-to-point latencies
   * Better adaptive algorithms for multi-rail support
   * Additional collective algorithms; improved collective performance
   * Numerous enhancements for OpenFabrics
   * iWARP support
   * Fault Tolerance
 - coordinated checkpoint/restart
 - support for BLCR and self
   * Finer grained resource control and mapping (cores, HCAs, etc)
   * Many other new runtime features
   * Numerous bug fixes

Version 1.3 can be downloaded from the main Open MPI web site or any
of its mirrors (mirrors will be updating shortly).

We strongly recommend that all users upgrade to version 1.3 if possible.

Here is a list of some of the changes in v1.3 as compared to the 1.2 series:

- Fixed deadlock issues under heavy messaging scenarios
- Extended the OS X 10.5.x (Leopard) workaround for a problem when
  assembly code is compiled with -g[0-9].  Thanks to Barry Smith for
  reporting the problem.  See ticket #1701.
- Disabled MPI_REAL16 and MPI_COMPLEX32 support on platforms where the
  bit representation of REAL*16 is different than that of the C type
  of the same size (usually long double).  Thanks to Julien Devriendt
  for reporting the issue.  See ticket #1603.
- Increased the size of MPI_MAX_PORT_NAME to 1024 from 36. See ticket #1533.
- Added "notify debugger on abort" feature. See tickets #1509 and #1510.
  Thanks to Seppo Sahrakropi for the bug report.
- Upgraded Open MPI tarballs to use Autoconf 2.63, Automake 1.10.1,
  Libtool 2.2.6a.
- Added missing MPI::Comm::Call_errhandler() function.  Thanks to Dave
  Goodell for bringing this to our attention.
- Increased MPI_SUBVERSION value in mpi.h to 1 (i.e., MPI 2.1).
- Changed behavior of MPI_GRAPH_CREATE, MPI_TOPO_CREATE, and several
  other topology functions per MPI-2.1.
- Fix the type of the C++ constant MPI::IN_PLACE.
- Various enhancements to the openib BTL:
  - Added btl_openib_if_[in|ex]clude MCA parameters for
including/excluding comma-delimited lists of HCAs and ports.
  - Added RDMA CM support, including btl_openib_cpc_[in|ex]clude MCA
parameters
  - Added NUMA support to only use "near" network adapters
  - Added "Bucket SRQ" (BSRQ) support to better utilize registered
memory, including btl_openib_receive_queues MCA parameter
  - Added ConnectX XRC support (and integrated with BSRQ)
  - Added btl_openib_ib_max_inline_data MCA parameter
  - Added iWARP support
  - Revamped flow control mechanisms to be more efficient
  - "mpi_leave_pinned=1" is now the default when possible,
automatically improving performance for large messages when
application buffers are re-used
- Eliminated duplicated error messages when multiple MPI processes fail
  with the same error.
- Added NUMA support to the shared memory BTL.
- Add Valgrind-based memory checking for MPI-semantic checks.
- Add support for some optional Fortran datatypes (MPI_LOGICAL1,
  MPI_LOGICAL2, MPI_LOGICAL4 and MPI_LOGICAL8).
- Remove the use of the STL from the C++ bindings.
- Added support for Platform/LSF job launchers.  Must be Platform LSF
  v7.0.2 or later.
- Updated ROMIO with the version from MPICH2 1.0.7.
- Added RDMA capable one-sided component (called rdma), which
  can be used with BTL components that expose a full one-sided
  interface.
- Added the optional datatype MPI_REAL2. As this is added to the "end of"
  predefined datatypes in the fortran header files, there will not be
  any compatibility issues.
- Added Portable Linux Processor Affinity (PLPA) for Linux.
- Addition of a finer symbols export control via the visibility feature
  offered by some compilers.
- Added checkpoint/restart process fault tolerance support. Initially
  support a LAM/MPI-like protocol.
- Removed "mvapi" BTL; all InfiniBand support now uses the OpenFabrics
  driver stacks ("openib" BTL).
- Added more stringent MPI API parameter checking to help user-level
  debugging.
- The ptmalloc2 memory manager component is now by default built as
  a standalone library named libopenmpi-malloc.  Users wanting to
  use leave_pinned with ptmalloc2 will now need to link the library
  into their application explicitly.  All other users will use the
  libc-provided allocator instead of Open MPI's ptmalloc2.  This change
  may be overridden with the configure option 

Re: [OMPI users] Deadlock on large numbers of processors

2009-01-12 Thread Tim Mattox
Hi Justin,
I applied the fixes for this particular deadlock to the 1.3 code base
late last week, see ticket #1725:
https://svn.open-mpi.org/trac/ompi/ticket/1725

This should fix the described problem, but I personally have not tested
to see if the deadlock in question is now gone.  Everyone should give
thanks to George for his efforts in tracking down the problem
and finding a solution.
  -- Tim Mattox, the v1.3 gatekeeper

On Mon, Jan 12, 2009 at 12:46 PM, Justin <luitj...@cs.utah.edu> wrote:
> Hi,  has this deadlock been fixed in the 1.3 source yet?
>
> Thanks,
>
> Justin
>
>
> Jeff Squyres wrote:
>>
>> On Dec 11, 2008, at 5:30 PM, Justin wrote:
>>
>>> The more I look at this bug the more I'm convinced it is with openMPI and
>>> not our code.  Here is why:  Our code generates a communication/execution
>>> schedule.  At each timestep this schedule is executed and all communication
>>> and execution is performed.  Our problem is AMR which means the
>>> communication schedule may change from time to time.  In this case the
>>> schedule has not changed in many timesteps meaning the same communication
>>> schedule is being used as the last X (x being around 20 in this case)
>>> timesteps.
>>> Our code does have a very large communication problem.  I have been able
>>> to reduce the hang down to 16 processors and it seems to me the hang occurs
>>> when he have lots of work per processor.  Meaning if I add more processors
>>> it may not hang but reducing processors makes it more likely to hang.
>>> What is the status on the fix for this particular freelist deadlock?
>>
>>
>> George is actively working on it because it is the "last" issue blocking
>> us from releasing v1.3.  I fear that if he doesn't get it fixed by tonight,
>> we'll have to push v1.3 to next year (see
>> http://www.open-mpi.org/community/lists/devel/2008/12/5029.php and
>> http://www.open-mpi.org/community/lists/users/2008/12/7499.php).
>>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


Re: [OMPI users] Problem with openmpi and infiniband

2008-12-24 Thread Tim Mattox
For your runs with Open MPI over InfiniBand, try using openib,sm,self
for the BTL setting, so that shared memory communications are used
within a node.  It would give us another datapoint to help diagnose
the problem.  As for other things we would need to help diagnose the
problem, please follow the advice on this FAQ entry, and the help page:
http://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot
http://www.open-mpi.org/community/help/
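
As a rough illustration (the executable name, process count, and hostfile
below are placeholders), the BTL list can be given on the mpirun command
line or through the matching environment variable:

  # restrict Open MPI to InfiniBand, shared memory, and self
  mpirun --mca btl openib,sm,self -np 16 --hostfile my_hosts ./my_app

  # equivalent, via the environment
  export OMPI_MCA_btl=openib,sm,self
  mpirun -np 16 --hostfile my_hosts ./my_app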

On Wed, Dec 24, 2008 at 5:55 AM, Biagio Lucini <b.luc...@swansea.ac.uk> wrote:
> Pavel Shamis (Pasha) wrote:
>>
>> Biagio Lucini wrote:
>>>
>>> Hello,
>>>
>>> I am new to this list, where I hope to find a solution for a problem
>>> that I have been having for quite a longtime.
>>>
>>> I run various versions of openmpi (from 1.1.2 to 1.2.8) on a cluster
>>> with Infiniband interconnects that I use and administer at the same
>>> time. The openfabric stac is OFED-1.2.5, the compilers gcc 4.2 and
>>> Intel. The queue manager is SGE 6.0u8.
>>
>> Do you use the Open MPI version that is included in OFED? Were you able
>> to run basic OFED/OMPI tests/benchmarks between two nodes?
>>
>
> Hi,
>
> yes to both questions: the OMPI version is the one that comes with OFED
> (1.1.2-1) and the basic tests run fine. For instance, IMB-MPI1 (which is
> more than basic, as far as I can see) reports for the last test:
>
> #---
> # Benchmarking Barrier
> # #processes = 6
> #---
>  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> 1000   22.93   22.95   22.94
>
>
> for the openib,self btl (6 processes, all processes on different nodes)
> and
>
> #---
> # Benchmarking Barrier
> # #processes = 6
> #---
>  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> 1000   191.30   191.42   191.34
>
> for the tcp,self btl (same test)
>
> No anomalies for other tests (ping-pong, all-to-all etc.)
>
> Thanks,
> Biagio
>
>
> --
> =
>
> Dr. Biagio Lucini
> Department of Physics, Swansea University
> Singleton Park, SA2 8PP Swansea (UK)
> Tel. +44 (0)1792 602284
>
> =
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


Re: [OMPI users] Valgrind Warning

2008-12-15 Thread Tim Mattox
Gabriele,
You should switch to the upcoming v1.3 release as soon as you can,
via either the v1.3rc2 tarball or one of the v1.3rc3 nightly tarballs,
available here: http://www.open-mpi.org/nightly/
We are very close to v1.3, so not much should be different between
those versions and the final v1.3.

To see why you should switch, read the FAQ entries about Memchecker
that start here:
http://www.open-mpi.org/faq/?category=debugging#memchecker_what
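
For reference, the memchecker support described in that FAQ has to be
enabled when Open MPI itself is configured; a minimal sketch (the install
prefix and valgrind location are placeholders) looks like:

  ./configure --prefix=$HOME/openmpi-1.3 --enable-memchecker \
      --with-valgrind=/usr/local
  make all install

  # then run the application under valgrind as usual
  mpirun -np 2 valgrind ./ping_pong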

On Mon, Dec 15, 2008 at 9:34 AM, Gabriele Fatigati <g.fatig...@cineca.it> wrote:
> PS: i'm using openmpi 1.2.5..
>
> 2008/12/15 Gabriele Fatigati <g.fatig...@cineca.it>:
>> Hi Jeff,
>> I recompiled libibverbs and libmthca with the valgrind flags, but for
>> strange reasons, only the warnings over MPI_Send have disappeared; the
>> warnings over MPI_Recv remain!
>>
>> 2008/12/15 Jeff Squyres <jsquy...@cisco.com>:
>>> Whoops; that was a reply to the wrong message.  :-)
>>>
>>> You'll need to recompile both libibverbs and libmthca with the valgrind flag,
>>> and use the enhanced memchecker support in v1.3.
>>>
>>> I have not personally verified that all the warnings disappear in this
>>> configuration (I was hoping to verify this somewhere during the v1.3
>>> series).
>>>
>>>
>>>
>>>>
>>>> On Dec 15, 2008, at 6:37 AM, Gabriele Fatigati wrote:
>>>>
>>>>> Hi Jeff,
>>>>> I recompiled libmthca with the --with-valgrind flag, and modified the
>>>>> environment variables, but the warnings don't disappear...
>>>>>
>>>>> 2008/12/14 Jeff Squyres <jsquy...@cisco.com>:
>>>>>>
>>>>>> On Dec 14, 2008, at 8:21 AM, Gabriele Fatigati wrote:
>>>>>>
>>>>>>> I have a strange problem with OpenMPI 1.2.5 (Intel-compiled) when I
>>>>>>> debug my code under Valgrind 3.3. In a very simple ping-pong MPI
>>>>>>> application, I retrieve strange warnings about MPI communications,
>>>>>>> like MPI_Send and MPI_Recv. Valgrind tells me that there are uninitialized
>>>>>>> values in the send/recv buffers, but they are initialized, I'm absolutely
>>>>>>> sure!
>>>>>>>
>>>>>>> These warnings are detected when my application runs over the InfiniBand
>>>>>>> network,
>>>>>>
>>>>>> This is because IB uses memory that does not come from the memory
>>>>>> allocator that Valgrind is aware of (e.g., it may be memory that was
>>>>>> allocated by the kernel itself).  Hence, since Valgrind is unaware of
>>>>>> the memory, it thinks that its contents are undefined.  As such, it's
>>>>>> quite likely that you're seeing false positives.
>>>>>>
>>>>>> The memchecker support in the upcoming v1.3 series made quite a few
>>>>>> advancements in the area of valgrind memory checking, and recent
>>>>>> versions of libibverbs allow you to compile in valgrind extensions
>>>>>> that tell valgrind "this memory is ok" (which prevents these false
>>>>>> positives).  I'm pretty sure that the OFED install does not enable
>>>>>> these libibverbs valgrind extensions; you will likely need your own
>>>>>> installation of libibverbs and your verbs plugin (libmthca for you, I
>>>>>> think...?) that explicitly has the valgrind extensions enabled.
>>>>>>
>>>>>> --
>>>>>> Jeff Squyres
>>>>>> Cisco Systems
>>>>>>
>>>>>> ___
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ing. Gabriele Fatigati
>>>>>
>>>>> Parallel programmer
>>>>>
>>>>> CINECA Systems & Tecnologies Department
>>>>>
>>>>> Supercomputing Group
>>>>>
>>>>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>>>>
>>>>> www.cineca.it    Tel: +39 051 6171722
>>

Re: [OMPI users] undefined symbol: MPI_Send

2008-12-08 Thread Tim Mattox
Hello Heitor,
We need more information to be able to answer your question,
such as what version of Open MPI are you using, and what
kind of OS/machine are you running on, and what kind of network, etc.
Please follow the directions on this webpage for getting help:
http://www.open-mpi.org/community/help/

On Mon, Dec 8, 2008 at 9:35 AM, Heitor Florido <heitorflor...@gmail.com> wrote:
> Hello,
>
> My program keeps throwing this error after I created a child process with
> MPI_comm_spawn:
>
> ./../../Desktop/computacaoDistribuida/src/server/server: symbol lookup
> error: ./../../Desktop/computacaoDistribuida/src/server/server: undefined
> symbol: MPI_Send
>
> I've already used MPI_Send on other parts of the program...
> I've tried to print the message received from the child process, but a similar
> message appears:
>
> ./../../Desktop/computacaoDistribuida/src/server/server: symbol lookup
> error: ./../../Desktop/computacaoDistribuida/src/server/server: undefined
> symbol: printf, version GLIBC_2.0
>
> This printf is executed if MPI_Comm_spawn returned MPI_SUCCESS, so I guess
> this is working.
>
> It appears that my libs (glibc, mpi) were unloaded after the MPI_Comm_spawn.
>
> Does anyone know what this is?
>
> Heitor Florido
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


Re: [OMPI users] infiniband problem

2008-11-20 Thread Tim Mattox
>>> --
>>> [0,1,3]: uDAPL on host n02 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --
>>>
>>> --
>>> [0,1,1]: uDAPL on host n02 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> ----------
>>> MPI with normal GB Ethernet and IP networking just works fine, but the
>>> InfiniBand doesn't. The MPI libs I am using
>>> for the test are definitely compiled with IB support and the tests have
>>> been run successfully on
>>> the cluster before.
>>>
>>> Any suggestions what is going wrong here?
>>>
>>> Best regards and thanks for any help!
>>>
>>> Michael
>>>
>>>
>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org <mailto:us...@open-mpi.org>
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> 
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


Re: [OMPI users] ompi-checkpoint is hanging

2008-10-31 Thread Tim Mattox
> [grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: Checkpointing...
> [grid-demo-1.cit.tu-berlin.de:14252] PID 13572
> [grid-demo-1.cit.tu-berlin.de:14252] Connected to Mpirun [[9529,0],0]
> [grid-demo-1.cit.tu-berlin.de:14252] Terminating after checkpoint
> [grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: notify_hnp: Contact
> Head Node Process PID 13572
> [grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: notify_hnp: Requested
> a checkpoint of jobid [INVALID]
> [grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: hnp_receiver: Receive
> a command message.
> [grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: hnp_receiver: Status
> Update.
> [grid-demo-1.cit.tu-berlin.de:14252] Requested - Global
> Snapshot Reference: (null)
> [grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: hnp_receiver: Receive
> a command message.
> [grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: hnp_receiver: Status
> Update.
> [grid-demo-1.cit.tu-berlin.de:14252] Pending (Termination) - Global
> Snapshot Reference: (null)
> [grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: hnp_receiver: Receive
> a command message.
> [grid-demo-1.cit.tu-berlin.de:14252] orte_checkpoint: hnp_receiver: Status
> Update.
> [grid-demo-1.cit.tu-berlin.de:14252]   Running - Global
> Snapshot Reference: (null)
> ---
>
> I want to underline that ompi-checkpoint is not hanging each
> time I execute it while more than one job is running, but in
> approx. 50% of all cases. I don't see any difference between
> successful and failing calls...
>
>
> Is there perhaps a way of increasing the debug output?
>
>
> Best,
> Matthias
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


Re: [MTT users] Fwd: [Alert] Found server-side submit error messages

2008-10-28 Thread Tim Mattox
I ran into this a week ago on sif, so I added report_after_n_results = 100
for our regular nightly tests on sif, odin and bigred.  Josh, could this be a
problem with any of the tests you run?

On Tue, Oct 28, 2008 at 6:15 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> That host is in one of IU's clusters (odin).
>
> Tim/Josh -- this is you guys...
>
>
> On Oct 28, 2008, at 3:45 PM, Ethan Mallove wrote:
>
>> Folks,
>>
>> I got an alert from the http-log-checker.pl script. Somebody appears to
>> have lost some MTT results. (Possibly due to an oversized database
>> submission to submit/index.php?) There's an open ticket for this (see
>> https://svn.open-mpi.org/trac/mtt/ticket/375).  Currently there exists a
>> simple workaround for this problem. Put the below line in the problematic
>> "Test run" section(s). This will prevent oversized submissions by
>> directing MTT to submit the results in batches of 50 results instead of
>> an entire section at a time, which can reach 400+ for an Intel test run
>> section.
>>
>>   report_after_n_results = 50
>>
>> It's hard to know whose errors are in the HTTP error log with only the IP
>> address. If you want to verify they are or are not yours, visit a bogus
>> URL off open-mpi.org, e.g., www.open-mpi.org/what-is-foobars-ip-address,
>> and ping me about it. This will write your IP address to the log file,
>> and then this can be matched with the IP addr against the submit.php
>> errors.
>>
>> -Ethan
>>
>>
>> - Forwarded message from Ethan Mallove <emall...@osl.iu.edu> -
>>
>> From: Ethan Mallove <emall...@osl.iu.edu>
>> Date: Tue, 28 Oct 2008 08:00:41 -0400
>> To: ethan.mall...@sun.com, http-log-checker.pl-no-re...@open-mpi.org
>> Subject: [Alert] Found server-side submit error messages
>> Original-recipient: rfc822;ethan.mall...@sun.com
>>
>> This email was automatically sent by http-log-checker.pl. You have
>> received
>> it because some error messages were found in the HTTP(S) logs that
>> might indicate some MTT results were not successfully submitted by the
>> server-side PHP submit script (even if the MTT client has not
>> indicated a submission error).
>>
>> ###
>> #
>> # The below log messages matched "gz.*submit/index.php" in
>> # /var/log/httpd/www.open-mpi.org/ssl_error_log
>> #
>> ###
>>
>> [client 129.79.240.114] PHP Warning:  gzeof(): supplied argument is not a
>> valid stream resource in
>> /nfs/rontok/xraid/data/osl/www/www.open-mpi.org/mtt/submit/index.php on line
>> 1923
>> [client 129.79.240.114] PHP Warning:  gzgets(): supplied argument is not a
>> valid stream resource in
>> /nfs/rontok/xraid/data/osl/www/www.open-mpi.org/mtt/submit/index.php on line
>> 1924
>> ...
>> 
>>
>>
>>
>>
>> - End forwarded message -
>> ___
>> mtt-users mailing list
>> mtt-us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> ___
> mtt-users mailing list
> mtt-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


[OMPI users] Open MPI v1.2.8 released

2008-10-14 Thread Tim Mattox
The Open MPI Team, representing a consortium of research, academic,
and industry partners, is pleased to announce the release of Open MPI
version 1.2.8. This release is mainly a bug fix release over the v1.2.7
release, but there are few new features.  We strongly recommend
that all users upgrade to version 1.2.8 if possible.

Version 1.2.8 can be downloaded from the main Open MPI web site or
any of its mirrors (mirrors will be updating shortly).

Here is a list of changes in v1.2.8 as compared to v1.2.7:

- Tweaked one memory barrier in the openib component to be more conservative.
  May fix a problem observed on PPC machines.  See ticket #1532.
- Fix OpenFabrics IB partition support. See ticket #1557.
- Restore v1.1 feature that sourced .profile on remote nodes if the default
  shell will not do so (e.g. /bin/sh and /bin/ksh).  See ticket #1560.
- Fix segfault in MPI_Init_thread() if ompi_mpi_init() fails. See ticket #1562.
- Adjust SLURM support to first look for $SLURM_JOB_CPUS_PER_NODE instead of
  the deprecated $SLURM_TASKS_PER_NODE environment variable.  This change
  may be *required* when using SLURM v1.2 and above.  See ticket #1536.
- Fix the MPIR_Proctable to be in process rank order. See ticket #1529.
- Fix a regression introduced in 1.2.6 for the IBM eHCA. See ticket #1526.

-- 
Tim Mattox, Ph.D.
Open Systems Lab
Indiana University


Re: [OMPI users] MPI Finalize

2008-09-19 Thread Tim Mattox
This sounds like you have left a file open when using the MPI-2 I/O.
You need to MPI_File_close() any files you have opened.
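
A minimal C sketch of the pattern (the file name and the data written are
made up for illustration): every MPI_File_open() needs a matching
MPI_File_close() before MPI_Finalize() is reached.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    int rank;
    double value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    value = (double)rank;

    /* each rank writes one double at its own offset in a shared file */
    MPI_File_open(MPI_COMM_WORLD, "data.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, rank * (MPI_Offset)sizeof(double), &value, 1,
                      MPI_DOUBLE, MPI_STATUS_IGNORE);

    /* the important part: close the file before finalizing */
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}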

On Fri, Sep 19, 2008 at 6:10 PM, Gabriele Fatigati <g.fatig...@cineca.it> wrote:
> Hi,
> i'm developing a C code under OpenMPI 1.2.5 with parallel I/O by MPI-2.
> I have a strange problem in the MPI_Finalize() routine. The code generates
> message reported below :
>
> *** An error occurred in MPI_Barrier
> *** after MPI was finalized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> In my code, i don't use MPI_Barrier. So, this error is caused by internal
> MPI_Barrier into MPI_Finalize. I've noted that if i disable MPI-2 I/O
> routine, application works well. Is there a strange effect of MPI_Finalize
> under MPI2 related MPI_File_open, MPI_File_close, MPI_File_Reat_at routines?
>
>
> --
> Gabriele Fatigati
>
> CINECA Systems & Tecnologies Department
>
> Supercomputing Group
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it Tel: +39 051 6171722
>
> g.fatig...@cineca.it
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


Re: [MTT users] Patch to add --local-scratch option

2008-09-19 Thread Tim Mattox
I've also been thinking about this a bit more, and although
having the name match the INI section name has some appeal,
I ultimately think the best name is: --mpi-build-scratch, since
that is what it does.  As Ethan mentioned, the actual MPI
install goes into --scratch.  And on the other side of it,
the MPI Get also goes into --scratch.  The --mpi-build scratch
is only used for untaring/copying the MPI source tree, running
config, make, and make check.  The actual "make install"
simply copies the binaries from --mpi-build-scratch into --scratch.

As for names like local-scratch or fast-scratch, they don't convey
what it's used for, so should it be fast-for-big-files, or fast-for-small-files?
Or similarly, "local" to my cluster, my node, or what?
I think mpi-build-scratch conveys the most useful meaning, since you
should pick a filesystem that is tuned (or at least not horrible) for
doing configure/make.

Unfortunately, I won't have time today to get the patch adjusted
and into svn.  Maybe on Monday.

On Fri, Sep 19, 2008 at 11:23 AM, Ethan Mallove <ethan.mall...@sun.com> wrote:
> On Thu, Sep/18/2008 05:35:13PM, Jeff Squyres wrote:
>> On Sep 18, 2008, at 10:45 AM, Ethan Mallove wrote:
>>
>>>> Ah, yeah, ok, now I see why you would call it --mpi-install-scratch, so
>>>> that it matches the MTT ini section name.  Sure, that works for me.
>>>
>>> Since this does seem like a feature that should eventually
>>> propogate to all the other phases (except for Test run),
>>> what will we call the option to group all the fast phase
>>> scratches?
>>
>> --scratch
>>
>> :-)
>>
>> Seriously, *if* we ever implement the other per-phase scratches, I think
>> having one overall --scratch and fine-grained per-phase specifications is
>> fine.  I don't think we need to go overboard to have a way to say I want
>> phases X, Y, and Z to use scratch A.  Meaning that you could just use
>> --X-scratch=A --Y-scratch=A and --Z-scratch=A.
>
> --mpi-install-scratch actually has MTT install (using
> DESTDIR) into --scratch. Is that confusing? Though
> --fast-scratch could also be misleading, as I could see a
> user thinking that --fast-scratch will do some magical
> optimization to make their NFS directory go faster. I guess
> I'm done splitting hairs over --mpi-install-scratch :-)
>
> -Ethan
>
>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> ___
>> mtt-users mailing list
>> mtt-us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
> ___
> mtt-users mailing list
> mtt-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


Re: [MTT users] Patch to add --local-scratch option

2008-09-18 Thread Tim Mattox
I Guess I should comment on Jeff's comments too.

On Thu, Sep 18, 2008 at 9:00 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> On Sep 16, 2008, at 12:07 PM, Ethan Mallove wrote:
>
>> What happens if one uses --local-scratch, but leaves out the
>> --scratch option? In this case, I think MTT should assume
>> --scratch is the same as --local-scratch.
>
> In this case, my $0.02 is that it should be an error.  --scratch implies a
> --local-scatch, but I don't think the implication should go the other way.

Yeah, I agree, especially if we call it --mpi-install-scratch.

>> Could the "local" in --local-scratch ever be misleading?
>> Couldn't a user ever use a mounted filesystem that's faster
>> than all their local filesystem? Should it be
>> --fast-scratch?
>
> Mmm... good point.  What if we name it what it really is:
> --mpi-install-scratch?  This also opens the door for other phase scratches
> if we ever want them.  And it keeps everything consistent and simple (from
> the user's point of view).

Ah, yeah, ok, now I see why you would call it --mpi-install-scratch, so
that it matches the MTT ini section name.  Sure, that works for me.

>> For future work, how about --scratch taking a (CSV?) list of
>> scratch directories. MTT then does a quick check for which
>> is the fastest filesystem (if such a check is
>> possible/feasible), and proceeds accordingly. That is, doing
>> everything it possible can in a fast scratch (builds,
>> writing out metadata, etc.), and installing the MPI(s) into
>> the slow mounted scratch. Would this be possible?
>
> Hmm.  I'm not quite sure how that would work -- "fastest" is a hard thing to
> determine.  What is "fastest" at this moment may be "slowest" 2 minutes from
> now, for example.

Yeah, I claim that auto-detecting file system speed is outside the
scope of MTT. :-)

> I'm looking at the patch in detail now... sorry for the delay...
>
> --
> Jeff Squyres
> Cisco Systems
>
> ___
> mtt-users mailing list
> mtt-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/


Re: [MTT users] Patch to add --local-scratch option

2008-09-18 Thread Tim Mattox
OK, so how about calling it --mpi-build-scratch?
Once we get a consensus on what to call it, I can commit the patch to svn.

Can anyone check it quick for vpath builds?

Just a FYI, I've already run into the "downside" I mentioned once this week.
I had to rerun my MTT to get access to the build directory, since it
was on /tmp on some random BigRed compute node.  Hmm... maybe a
feature enhancement would be to copy it to your regular --scratch if
a build failure was detected?  Maybe later I'll do that as yet another option,
say, --copy-mpi-build-on-failure.  No time this week, but hey, it's an idea.

On Thu, Sep 18, 2008 at 9:10 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> Patch looks good.  Please also update the CHANGES file (this file reflects
> bullets for things that have happened since the core testers branch).
>
>
> On Sep 15, 2008, at 6:15 PM, Tim Mattox wrote:
>
>> Hello,
>> Attached is a patchfile for the mtt trunk that adds a
>> --local-scratch 
>> option to client/mtt.  You can also specify something like
>> this in your [MTT] ini section:
>> local_scratch = ("echo /tmp/`whoami`_mtt")
>>
>> This local-scratch directory is then used for part of the --mpi-install
>> phase to speed up your run.  Specifically, the source-code of the
>> MPI is untarred there, configure is run, make all, and make check.
>> Then, when make install is invoked the MPI is installed into the
>> usual place as if you hadn't used --local-scratch.  If you don't
>> use --local-scratch, then the builds occur in the usual place that
>> they have before.
>>
>> For the clusters at IU that seem to have slow NFS home directories,
>> this cuts the --mpi-install phase time in half.
>>
>> The downside is that if the MPI build fails, your build directory is out
>> on some compile-node's /tmp and is harder to go debug.  But, since
>> mpi build failures are now rare, this should make for quicker turnaround
>> for the general case.
>>
>> I think I adjusted the code properly for the vpath build case, but I've
>> never used that so haven't tested it.  Also, I adjusted the free disk
>> space
>> check code.  Right now it only checks the free space on --scratch,
>> and won't detect if --local-scratch is full.  If people really care, I
>> could make it check both later.  But for now, if your /tmp is full
>> you probably have other problems to worry about.
>>
>> Comments?  Can you try it out, and if I get no objections, I'd like
>> to put this into the MTT trunk this week.
>> --
>> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
>> tmat...@gmail.com || timat...@open-mpi.org
>> I'm a bright... http://www.the-brights.net/
>> ___
>> mtt-users mailing list
>> mtt-us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> ___
> mtt-users mailing list
> mtt-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/


[MTT users] Patch to add --local-scratch option

2008-09-15 Thread Tim Mattox
Hello,
Attached is a patchfile for the mtt trunk that adds a
--local-scratch 
option to client/mtt.  You can also specify something like
this in your [MTT] ini section:
local_scratch = ("echo /tmp/`whoami`_mtt")

This local-scratch directory is then used for part of the --mpi-install
phase to speed up your run.  Specifically, the source-code of the
MPI is untarred there, configure is run, make all, and make check.
Then, when make install is invoked the MPI is installed into the
usual place as if you hadn't used --local-scratch.  If you don't
use --local-scratch, then the builds occur in the usual place that
they have before.
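
For what it's worth, a hypothetical invocation with this patch applied
would look something like the following (the ini file name and both
directories are placeholders):

  client/mtt --file ompi-core.ini \
      --scratch /nfs/home/me/mtt-scratch \
      --local-scratch /tmp/me_mtt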

For the clusters at IU that seem to have slow NFS home directories,
this cuts the --mpi-install phase time in half.

The downside is that if the MPI build fails, your build directory is out
on some compile-node's /tmp and is harder to go debug.  But, since
mpi build failures are now rare, this should make for quicker turnaround
for the general case.

I think I adjusted the code properly for the vpath build case, but I've
never used that so haven't tested it.  Also, I adjusted the free disk space
check code.  Right now it only checks the free space on --scratch,
and won't detect if --local-scratch is full.  If people really care, I
could make it check both later.  But for now, if your /tmp is full
you probably have other problems to worry about.

Comments?  Can you try it out, and if I get no objections, I'd like
to put this into the MTT trunk this week.
-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/


mtt-local-scratch.patch
Description: Binary data


[OMPI users] Open MPI v1.2.7 released

2008-08-28 Thread Tim Mattox
The Open MPI Team, representing a consortium of research, academic,
and industry partners, is pleased to announce the release of Open MPI
version 1.2.7. This release is mainly a bug fix release over the v1.2.6
release, but there are few new features.  We strongly recommend
that all users upgrade to version 1.2.7 if possible.

Version 1.2.7 can be downloaded from the main Open MPI web site or
any of its mirrors (mirrors will be updating shortly).

Here is a list of changes in v1.2.7 as compared to v1.2.6:

- Add some Sun HCA vendor IDs.  See ticket #1461.
- Fixed a memory leak in MPI_Alltoallw when called from Fortran.
  Thanks to Dave Grote for the bugreport.  See ticket #1457.
- Only link in libutil when it is needed/desired.  Thanks to
  Brian Barrett for diagnosing and fixing the problem.  See ticket #1455.
- Update some QLogic HCA vendor IDs.  See ticket #1453.
- Fix F90 binding for MPI_CART_GET.  Thanks to Scott Beardsley for
  bringing it to our attention. See ticket #1429.
- Remove a spurious warning message generated in/by ROMIO. See ticket #1421.
- Fix a bug where command-line MCA parameters were not overriding
  MCA parameters set from environment variables.  See ticket #1380.
- Fix a bug in the AMD64 atomics assembly.  Thanks to Gabriele Fatigati
  for the bug report and bugfix.  See ticket #1351.
- Fix a gather and scatter bug on intercommunicators when the datatype
  being moved is 0 bytes. See ticket #1331.
- Some more man page fixes from the Debian maintainers.
  See tickets #1324 and #1329.
- Have openib BTL (OpenFabrics support) check for the presence of
  /sys/class/infiniband before allowing itself to be used.  This check
  prevents spurious "OMPI did not find RDMA hardware!" notices on
  systems that have the software drivers installed, but no
  corresponding hardware.  See tickets #1321 and #1305.
- Added vendor IDs for some ConnectX openib HCAs. See ticket #1311.
- Fix some RPM specfile inconsistencies.  See ticket #1308.
  Thanks to Jim Kusznir for noticing the problem.
- Removed an unused function prototype that caused warnings on
  some systems (e.g., OS X).  See ticket #1274.
- Fix a deadlock in inter-communicator scatter/gather operations.
  Thanks to Martin Audet for the bug report.  See ticket #1268.

-- 
Tim Mattox, Ph.D.
Open Systems Lab
Indiana University


Re: [OMPI users] Checkpoint problem

2008-08-20 Thread Tim Mattox
Hello,
Three things...
1) Josh, the main developer for checkpoint/restart, has been away for
a few weeks
and has just returned.  I suspect he will get unburied from e-mail in
another day or two.

2) The 1.4 (and 1.3) branch is very much under rapid development, and
there will be times
when basic functionality will just break for a day or so.  If you run
into a problem, please try
to be more specific about what version (include the r#) that you tried.

3) The checkpoint/restart functionality currently only supports a
subset of the network
transports.  I think all that you should expect to work right now is
TCP and shared memory.
Josh is working on other transports, but those are very much a "work
in progress".

On Wed, Aug 20, 2008 at 4:11 AM, Matthias Hovestadt
<m...@cs.tu-berlin.de> wrote:
> Hi Gabriele!
>
>> In this case, mpirun works well, but the checkpoint procedure fails:
>>
>> ompi-checkpoint 20109
>> [node0316:20134] Error: Unable to get the current working directory
>> [node0316:20134] [[42404,0],0] ORTE_ERROR_LOG: Not found in file
>> orte-checkpoint.c at line 395
>> [node0316:20134] HNP with PID 20109 Not found!
>
> I had exactly the same problem on my machine. Neither modifying
> the configure parameters nor the way of invoking the ompi-checkpoint
> command did help. Since I am using the source from subversion checkout,
> I also updated the source several times, following the day to day
> progress. However, this problem remained.
>
> Luckily, updating the source to SVN revision 19265 finally solved
> this checkpointing issue. Maybe the problem shows up again in later
> versions...
>
>
> Best,
> Matthias
> _______
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/


Re: [OMPI users] memory leak in alltoallw

2008-08-18 Thread Tim Mattox
The fix for this bug is in the 1.2 branch as of r19360, and will be in the
upcoming 1.2.7 release.

On Sun, Aug 17, 2008 at 6:10 PM, George Bosilca <bosi...@eecs.utk.edu> wrote:
> Dave,
>
> Thanks for your report. As you discovered we had a memory leak in the
> MPI_Alltoallw. A very small one, but it was there. Basically, we didn't
> release two internal arrays of data-types, used to convert from the Fortran
> data-types (as supplied by the user) to their C version (as required by the
> implementation of the alltoallw function).
>
> The good news is that this should not be a problem anymore. Commit 19314 fixes
> this for the trunk, while commit 19315 fixes it for the upcoming 1.3.
>
>  Thanks again for your report.
>george.
>
> On Aug 7, 2008, at 1:21 AM, Dave Grote wrote:
>
>>
>> Hi,
>> I've been enhancing my code and have started using the nice routine
>> alltoallw. The code works fine except that there seems to be a memory leak
>> in alltoallw. I've eliminated all other possible causes and have reduced the
>> code down to a bare minimum. I've included fortran source code which
>> produces the problem. This code just keeps calling alltoallw, but with all
>> of the send and receive counts set to zero, so it shouldn't be doing
>> anything. And yet I can watch the memory continue to grow. As a sanity
>> check, I change the code to call alltoallv instead, and there is no memory
>> leak. If it helps, I am using OpenMPI on an AMD system running Chaos linux.
>> I tried the latest nightly build of version 1.3 from Aug 5. I run four
>> processors on one quad core node so it should be using shared memory
>> communication.
>>  Thanks!
>>Dave
>>
>>program testalltoallw
>>real(kind=8):: phi(-3:3200+3)
>>real(kind=8):: phi2(-3:3200+3)
>>integer(4):: izproc,ii
>>integer(4):: nzprocs
>>integer(4):: zrecvtypes(0:3),zsendtypes(0:3)
>>integer(4):: zsendcounts(0:3),zrecvcounts(0:3)
>>integer(4):: zdispls(0:3)
>>integer(4):: mpierror
>>include "mpif.h"
>>phi = 0.
>>
>>call MPI_INIT(mpierror)
>>call MPI_COMM_SIZE(MPI_COMM_WORLD,nzprocs,mpierror)
>>call MPI_COMM_RANK(MPI_COMM_WORLD,izproc,mpierror)
>>
>>zsendcounts=0
>>zrecvcounts=0
>>zdispls=0
>>zdispls=0
>>zsendtypes=MPI_DOUBLE_PRECISION
>>zrecvtypes=MPI_DOUBLE_PRECISION
>>
>>do ii=1,10
>>  if (mod(ii,100_4) == 0) print*,"loop ",ii,izproc
>>
>>  call MPI_ALLTOALLW(phi,zsendcounts,zdispls,zsendtypes,
>>   & phi2,zrecvcounts,zdispls,zrecvtypes,
>>   & MPI_COMM_WORLD,mpierror)
>>
>>enddo
>>return
>>end
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/


Re: [OMPI users] cluster LiveCD

2008-08-07 Thread Tim Mattox
I think a better approach than using NFS-root or LiveCDs is to use Perceus in
this situation, since it has been developed over many years to handle this
sort of thing (diskless/stateless beowulf clusters):
  http://www.perceus.org/
It leverages PXE booting so all you need to do on a per-node basis is enable
PXE booting in the BIOS.  The primary limitation I see would be if your
windows machines are set up to use DHCP to get their IP addresses from
some server that is outside your control, since Perceus would need to take
over DHCP services to do its magic.

On Wed, Aug 6, 2008 at 6:05 PM, Adam C Powell IV <hazel...@debian.org> wrote:
> On Tue, 2008-08-05 at 17:01 -0500, Ben Payne wrote:
>> Hello.  I am not sure if this is the correct list to ask this
>> question, so if you know of a more appropriate one please let me know.
>>
>> I think I am looking for a LiveCD that supports MPI, specifically one
>> that has mpif90 built in, and can easily mount external (USB) drives
>> for storing data.
>>
>> I have access to 40 Windows computers in a lab that rarely gets used.
>> I would like to use the computers to run a cluster during the
>> weekends, but be able to not mess with the Windows installation that
>> exists on the hard drive. Because of this, I think a LiveCD would be
>> good, and one that supports PXE booting is even better.  If there is a
>> better way to do this (run MPI, not disrupt Windows) please let me
>> know.
>
> The easiest way to do what you want is probably to netboot from a server
> on the subnet with NFS-root.  That way you don't need to make a bunch of
> new CDs to upgrade a single piece of software.  Just upgrade/modify the
> environment on the server, and everybody else upgrades instantly.
>
> Turn on the server and reboot the machines, and you're in Linux with
> MPI.  Disable PXE at the server and reboot and you're back in Windows.
>
> LTSP has some tools to do this, as does the Debian package lessdisks.
> The basic principle is in http://wiki.debian.org/DebianLive and
> http://wiki.debian.org/DebianEdu/HowTo/LtspDisklessWorkstation .  The
> old diskless package did this beautifully, but bitrotted and is
> deprecated due to lack of maintenance.
>
> -Adam
> --
> GPG fingerprint: D54D 1AEE B11C CE9B A02B  C5DD 526F 01E8 564E E4B6
>
> Engineering consulting with open source tools
> http://www.opennovation.com/
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/


Re: [OMPI users] Object Send Doubt

2008-07-24 Thread Tim Mattox
Hello Carlos,
Sorry for the long delay in replying.

You may want to take a look at the Boost.MPI project:
http://www.boost.org/doc/html/mpi.html

It has a higher-level interface to MPI that is much more C++ friendly.
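
Boost.MPI handles the serialization of C++ objects for you.  If you would
rather stay with the plain C API, a fixed-layout object can instead be
described with an MPI derived datatype; a minimal sketch of that
alternative (the Field layout here is invented for illustration):

#include <mpi.h>
#include <stddef.h>

/* a hypothetical plain-old-data "Field" object */
typedef struct {
    int    nx, ny;
    double origin[2];
} Field;

/* build an MPI datatype describing Field so it can be sent/received directly */
static MPI_Datatype make_field_type(void)
{
    MPI_Datatype ftype;
    int          blocklens[2] = { 2, 2 };
    MPI_Aint     displs[2]    = { offsetof(Field, nx), offsetof(Field, origin) };
    MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };

    MPI_Type_create_struct(2, blocklens, displs, types, &ftype);
    MPI_Type_commit(&ftype);
    return ftype;
}

With that in place, process 1 can MPI_Send(&f, 1, ftype, dest, tag,
MPI_COMM_WORLD) to processes 2 and 3, which post matching MPI_Recv calls,
and process 4 simply never participates.  Objects containing pointers or
STL members are exactly where Boost.MPI's serialization support pays off.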

On Sat, Jul 12, 2008 at 3:30 PM, Carlos Henrique da Silva Santos
<santos@gmail.com> wrote:
> Dear,
>
>  I am looking for a solution.
>  I am developing a C++ code and I have an object called Field. I need
> to send/recv this object between some of the processes in my cluster. For
> example, I have to send Field from process 1 to processes 2 and 3, but
> not process 4.
>  My questions are:
>  Is it possible to send an object with Open MPI?
>  If yes, could you send me a source code example or a reference?
>
> Thank you.
>
> Carlos
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/


[OMPI users] Brief mail services outage today

2008-06-12 Thread Tim Mattox
Hi Open MPI users and developers,
The mailman service hosted by the Open Systems Lab will be
upgraded this afternoon at 2pm Eastern, and thus will be
unavailable for about an hour.  See our sysadmin's notice:


The new version(2.1.10) of mailman was released on Apr 21 2008.
It has a lot of good features included and many bugs are fixed.

The OSL would like to upgrade the current mailman(2.1.9) to get
the benefit of the new version.
The upgrade would be at 2:00PM(E.T.) on June 12, 2008.

The mailman, sendmail, and some mailman admin/info websites
services would NOT be available during the following time period.
- 11:00am-12:00pm Pacific US time
- 12:00pm-1:00pm Mountain US time
- 1:00pm-2:00pm Central US time
- 2:00pm-3:00pm Eastern US time
- 7:00pm-8:00pm GMT

Please let me know if you have any concerns or questions about this upgrade.



Note: I do not know if svn checkin e-mails will be queued during that time or
if they will be lost.  So, if you have something important you are checking in
to svn, you might avoid doing so during that hour today.
-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/


Re: [OMPI users] crash with mpiBLAST

2008-05-07 Thread Tim Mattox
Hi Joe,
Hopefully we can help you figure out what is going wrong with mpiBLAST
and Open MPI.  For starters, have a look here for what information would
help us get to the root of the problem:
http://www.open-mpi.org/community/help/

Reconfiguring (and recompiling) Open MPI with --enable-debug would give
us more information, but you might as well first send us the call stack that you
already have.
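
A rough sketch of that rebuild (the install prefix is a placeholder); the
debug build makes the backtrace from the crash far more informative:

  ./configure --prefix=$HOME/openmpi-1.2.6-debug --enable-debug
  make all install

  # rebuild mpiBLAST (with -g) against this installation, then rerun it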

On Wed, May 7, 2008 at 10:08 AM, Joe Landman
<land...@scalableinformatics.com> wrote:
> Hi Open-MPI team:
>
>I am working on a build of mpiBLAST 1.5.0-pio, and found that the
>  code crashes immediately after launch with a seg fault.  I used Open-MPI
>  1.2.6 built from the tarball (with just a --prefix directive).
>
>I did just try the code with MPICH 1.2.7p1, and it runs fine.   What
>  steps should I try to help isolate the issue in Open-MPI?  Would it help
>  to provide the call stack reported by Open-MPI on crash?  Do you need
>  rebuild of application and Open-MPI with the -g option (for debugging
>  symbols)?
>
>Thanks!
>
>  Joe
>
>  --
>  Joseph Landman, Ph.D
>  Founder and CEO
>  Scalable Informatics LLC,
>  email: land...@scalableinformatics.com
>  web  : http://www.scalableinformatics.com
> http://jackrabbit.scalableinformatics.com
>  phone: +1 734 786 8423
>  fax  : +1 866 888 3112
>  cell : +1 734 612 4615
>  ___
>  users mailing list
>  us...@open-mpi.org
>  http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/


[OMPI users] Open MPI v1.2.6 released

2008-04-07 Thread Tim Mattox
The Open MPI Team, representing a consortium of research, academic,
and industry partners, is pleased to announce the release of Open MPI
version 1.2.6. This release is mainly a bug fix release over the v1.2.5
release, but there are few new features.  We strongly recommend
that all users upgrade to version 1.2.6 if possible.

Version 1.2.6 can be downloaded from the main Open MPI web site or
any of its mirrors (mirrors will be updating shortly).

Here is a list of changes in v1.2.6 as compared to v1.2.5:

- Fix a bug in the inter-allgather for asymmetric inter-communicators.
  Thanks to Martin Audet for the bug report. See ticket #1247.
- Fix a bug in the openib BTL when setting the CQ depth.  Thanks
  to Jon Mason for the bug report and fix.  See ticket #1245.
- On Mac OS X Leopard, the execinfo component will be used for
  backtraces, making for a more durable solution.  See ticket #1246.
- Added vendor IDs for some QLogic DDR openib HCAs. See ticket #1227.
- Updated the URL to get the latest config.guess and config.sub files.
  Thanks to Ralf Wildenhues for the bug report. See ticket #1226.
- Added shared contexts support to PSM MTL.  See ticket #1225.
- Added pml_ob1_use_early_completion MCA parameter to allow users
  to turn off the OB1 early completion semantic and avoid "stall"
  problems seen on InfiniBand in some cases.  See ticket #1224.
- Sanitized some #define macros used in mpi.h to avoid compiler warnings
  caused by MPI programs built with different autoconf versions.
  Thanks to Ben Allan for reporting the problem, and thanks to
  Brian Barrett for the fix. See ticket #1220.
- Some man page fixes from the Debian maintainers. See ticket #1219.
- Made the openib BTL a bit more resilient in the face of driver
  errors.  See ticket #1217.
- Fixed F90 interface for MPI_CART_CREATE.  See ticket #1208.
  Thanks to Michal Charemza for reporting the problem.
- Fixed some C++ compiler warnings. See ticket #1203.
- Fixed formatting of the orterun man page.  See ticket #1202.
  Thanks to Peter Breitenlohner for the patch.

-- 
Tim Mattox
Open Systems Lab
Indiana University


Re: [OMPI users] Bad behavior in Allgatherv when a count is 0

2008-02-07 Thread Tim Mattox
Kenneth,
Have you tried the 1.2.5 version?  There were some fixes to the
vector collectives that could have resolved your problem.

On Feb 4, 2008 5:36 PM, George Bosilca <bosi...@eecs.utk.edu> wrote:
> Kenneth,
>
> I cannot replicate this weird behavior with the current version in the
> trunk. I guess it has been fixed since 1.2.4.
>
>Thanks,
>  george.
>
>
> On Dec 13, 2007, at 6:58 PM, Moreland, Kenneth wrote:
>
> > I have found that on rare occasion Allgatherv fails to pass the data
> > to
> > all processes.  Given some magical combination of receive counts and
> > displacements, one or more processes are missing some or all of some
> > arrays in their receive buffer.  A necessary, but not sufficient,
> > condition seems to be that one of the receive counts is 0.  Beyond
> > that
> > I have not figured out any real pattern, but the example program
> > listed
> > below demonstrates the failure.  I have tried it on OpenMPI version
> > 1.2.3 and 1.2.4; it fails on both.  However, it works fine with
> > version
> > 1.1.2, so the problem must have been introduced since then.
> >
> > -Ken
> >
> >     Kenneth Moreland
> >***  Sandia National Laboratories
> > ***
> > *** *** ***  email: kmo...@sandia.gov
> > **  ***  **  phone: (505) 844-8919
> >***  fax:   (505) 845-0833
> >
> >
> >
> > #include <mpi.h>
> >
> > #include <stdio.h>
> > #include <stdlib.h>
> >
> > int main(int argc, char **argv)
> > {
> >  int rank;
> >  int size;
> >  MPI_Comm smallComm;
> >  int senddata[5], recvdata[100];
> >  int lengths[3], offsets[3];
> >  int i, j;
> >
> >  MPI_Init(&argc, &argv);
> >
> >  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >  MPI_Comm_size(MPI_COMM_WORLD, &size);
> >  if (size != 3)
> >{
> >printf("Need 3 processes.");
> >MPI_Abort(MPI_COMM_WORLD, 1);
> >}
> >
> >  for (i = 0; i < 100; i++) recvdata[i] = -1;
> >  for (i = 0; i < 5; i++) senddata[i] = rank*10 + i;
> >  lengths[0] = 5;  lengths[1] = 0;  lengths[2] = 5;
> >  offsets[0] = 3;  offsets[1] = 9;  offsets[2] = 10;
> >  MPI_Allgatherv(senddata, lengths[rank], MPI_INT,
> > recvdata, lengths, offsets, MPI_INT, MPI_COMM_WORLD);
> >
> >  for (i = 0; i < size; i++)
> >{
> >for (j = 0; j < lengths[i]; j++)
> >  {
> >  if (recvdata[offsets[i]+j] != 10*i+j)
> >{
> >printf("%d: Got bad data from rank %d, index %d: %d\n", rank,
> > i,
> > j,
> >   recvdata[offsets[i]+j]);
> >break;
> >}
> >  }
> >}
> >
> >  MPI_Finalize();
> >
> >  return 0;
> > }
> >
> >
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


Re: [OMPI users] odd network behavior

2008-01-25 Thread Tim Mattox
Mark,
I think the problem is likely due to the networking differences
between the nodes.  Check out these two FAQ entries:
http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network
http://www.open-mpi.org/faq/?category=tcp#tcp-selection

Specifically, I think you should try using a pair of these four MCA
parameters:
  btl_tcp_if_include  and  oob_tcp_include
or
  btl_tcp_if_exclude  and  oob_tcp_exclude

Basically, you need to make sure that Open MPI doesn't try to use
the public network, since one of the nodes isn't on the public network.
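
For example, if the private cluster network were on eth1 (the interface
and hostfile names here are hypothetical -- check ifconfig on your nodes
for the real interface names), something like this should keep both the
MPI traffic and the out-of-band setup traffic off the public network:

  mpirun --np 6 --hostfile my_hostfile \
         --mca btl_tcp_if_include eth1 \
         --mca oob_tcp_include eth1 \
         hostname

The exclude variants work the same way; if you go that route, remember
to exclude the loopback interface (lo) as well.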

On Jan 17, 2008 10:08 PM, Mark Kosmowski <mark.kosmow...@gmail.com> wrote:
> On Jan 15, 2008 7:54 PM, Mark Kosmowski <mark.kosmow...@gmail.com> wrote:
> > Dear Open-MPI Community:
> >
> > I have a 3 node cluster, each a dual opteron workstation running
> > OpenSUSE 10.1 64-bit.  The node names are LT, SGT and PFC.  When I
> > start an mpirun job from either SGT or PFC, things work as they are
> > supposed to.  However, if I start the same job from LT, the jobs hangs
> > at SGT - this was confirmed by mpirun --np 6 --hostfile <hostfile
> > for the three nodes> hostname, which gives only LT; LT; PFC;
> > PFC (and then hangs) when started from LT (this same command started
> > from either of the other nodes give two of each of the three hostnames
> > and terminates normally).  The nfs share drive is physically located
> > on LT.
> >
> > I have been using ssh to get to either SGT or PFC from a terminal
> > opened originally on LT to run jobs.  I can ssh from any node to any
> > other node.
> >
> > I have attached a gzipped tar archive of the three ifconfig results
> > (for each node) and the results of ompi_info --all command as
> > requested in the "Getting Help" section.  I was unable to locate a
> > config.log file in the shared ompi directory.
> >
> > Any assistance on this matter would be appreciated,
> >
> > Mark E. Kosmowski
> >
>
>
> >I'd posted a message earlier about intermittent hangs -- perhaps it's
> >the same issue. If you run a hundred instances or so of "mpirun --np 6
> >--hostfile hostfile uptime", from SGT or PFC, do you notice any hangs?
>
> >Barry Rountree
>
> Barry:
>
> I read your thread and I do not think that the issues are the same.
> You seem to get the correct output before the hang, I do not.  My
> system either fails to give the expected output with a hang when
> started from the LT node, or works correctly giving the proper output
> and a graceful exit (i.e. no hang whatsoever) when started on one of
> the other two nodes (SGT or PFC).
>
> I suspect that my issue is that both LT and SGT are connected to both
> the internet and the dedicated cluster traffic gigabit switch, while
> PFC is only connected to the dedicated cluster traffic gigabit switch.
>  However, this is the limit of my network diagnostic abilities,
> especially since SGT can properly launch open MPI jobs.
>
> Mark
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


Re: [OMPI users] Occasional mpirun hang on completion

2008-01-24 Thread Tim Mattox
Hello Barry,
I am guessing you are trying to use a threaded build of Open MPI...

Unfortunately, the threading support in Open MPI 1.2.x is not only poorly
tested, it also has many known problems.  We do not advise using threading in
the Open MPI 1.2.x series.  We even added a warning in version 1.2.5 if
you try to use threading... specifically we added run-time warnings during
MPI_INIT when MPI_THREAD_MULTIPLE and/or progression threads are used.

We are targeting the 1.3 series to have threading support actually working.
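
For reference, here is a minimal sketch (not taken from any Open MPI test;
the file and variable names are arbitrary) of how an application can request
MPI_THREAD_MULTIPLE and check what level the library actually grants; if
"provided" comes back lower than requested, the application should not
assume thread safety:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  int provided;

  /* Ask for full multi-threaded support... */
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

  /* ...but check what the library actually granted before relying on it. */
  if (provided < MPI_THREAD_MULTIPLE) {
    printf("MPI_THREAD_MULTIPLE not available (provided level: %d)\n",
           provided);
  }

  MPI_Finalize();
  return 0;
}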

On Jan 24, 2008 3:25 AM, Barry Rountree <rount...@cs.uga.edu> wrote:
> On Thu, Jan 24, 2008 at 03:01:40AM -0500, Barry Rountree wrote:
> > On Fri, Jan 18, 2008 at 08:33:10PM -0500, Jeff Squyres wrote:
> > > Barry --
> > >
> > > Could you check what apps are still running when it hangs?  I.e., I
> > > assume that all the uptime's are dead; are all the orted's dead on the
> > > remote nodes?  (orted = our helper process that is launched on the
> > > remote nodes to exert process control, funnel I/O back and forth to
> > > mpirun, etc.)
>
> One more bit of trivia -- when I ran my killall script across the nodes,
> there were four out of sixteen that had an orted process hanging out.
> If this is a synchronization problem, then most of the nodes are
> handling it fine.
>
> >
> > Here's the stack trace of the orted process on node 01.  The "uname"
> > process was long gone (and had sent its output back with no difficulty).
> >
> > 
> > Stopping process localhost:5321 
> > (/osr/users/rountree/ompi-1.2.4_intel_threaded_debug/bin/orted).
> > Thread received signal INT
> > stopped at [ pthread_cond_wait@@GLIBC_2.3.2(...) 0x2b67a766]
> > (idb) where
> > >0  0x2b67a766 in pthread_cond_wait@@GLIBC_2.3.2(...) in 
> > >/lib64/libpthread-2.4.so
> > #1  0x00401fef in opal_condition_wait(c=0x5075c0, m=0x507580) 
> > "../../../opal/threads/condition.h":64
> > #2  0x00403000 in main(argc=17, argv=0x7d82cd38) "orted.c":525
> > #3  0x2b7a6e54 in __libc_start_main(...) in /lib64/libc-2.4.so
> > #4  0x00401c19 in _start(...) in 
> > /osr/users/rountree/ompi-1.2.4_intel_threaded_debug/bin/orted
> > 
> >
> > The mpirun process on the root node isn't quite as useful.
> >
> >
> > 
> > Stopping process localhost:29856 
> > (/osr/users/rountree/ompi-1.2.4_intel_threaded_debug/bin/orterun).
> > Thread received signal INT
> > stopped at [ poll(...) 0x0039ef2c3806]
> > (idb) where
> > >0  0x0039ef2c3806 in poll(...) in /lib64/libc-2.4.so
> > #1  0x40a000c0
> > 
> >
> > Let me know what other information would be helpful.
> >
> > Best,
> >
> > Barry
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


[OMPI users] Open MPI v1.2.5 released

2008-01-08 Thread Tim Mattox
The Open MPI Team, representing a consortium of research, academic,
and industry partners, is pleased to announce the release of Open MPI
version 1.2.5. This release is mainly a bug fix release over the v1.2.4
release, but there are a few new features.  We strongly recommend
that all users upgrade to version 1.2.5 if possible.

Version 1.2.5 can be downloaded from the main Open MPI web site or
any of its mirrors (mirrors will be updating shortly).

Here is a list of changes in v1.2.5 as compared to v1.2.4:

- Fixed compile issue with open() on Fedora 8 (and newer) platforms.
  Thanks to Sebastian Schmitzdorff for noticing the problem.
- Added run-time warnings during MPI_INIT when MPI_THREAD_MULTIPLE
  and/or progression threads are used (the OMPI v1.2 series does not
  support these well at all).
- Better handling of ECONNABORTED from connect on Linux.  Thanks to
  Bob Soliday for noticing the problem; thanks to Brian Barrett for
  submitting a patch.
- Reduce extraneous output from OOB when TCP connections must
  be retried.  Thanks to Brian Barrett for submitting a patch.
- Fix for ConnectX devices and OFED 1.3.  See ticket #1190.
- Fixed a configure problem for Fortran 90 on Cray systems.  Ticket #1189.
- Fix an uninitialized variable in the error case in opal_init.c.
  Thanks to Ake Sandgren for pointing out the mistake.
- Fixed a hang in configure if $USER was not defined.  Thanks to
  Darrell Kresge for noticing the problem.  See ticket #900.
- Added support for parallel debuggers even when we have an optimized build.
  See ticket #1178.
- Worked around a bus error in the Mac OS X 10.5.X (Leopard) linker when
  compiling Open MPI with -g.  See ticket #1179.
- Removed some warnings about 'rm' from Mac OS X 10.5 (Leopard) builds.
- Fix the handling of mx_finalize().  See ticket #1177.
  Thanks to Ake Sandgren for bringing this issue to our attention.
- Fixed minor file descriptor leak in the Altix timer code.  Thanks to
  Paul Hargrove for noticing the problem and supplying the fix.
- Fix a problem when using a different compiler for C and Objective C.
  See ticket #1153.
- Fix segfault in MPI_COMM_SPAWN when the user specified a working
  directory.  Thanks to Murat Knecht for reporting this and suggesting
  a fix.
- A few manpage fixes from the Debian Open MPI maintainers.  Thanks to
  Tilman Koschnick, Sylvestre Ledru, and Dirk Eddelbuettel.
- Fixed issue with pthread detection when compilers are not all
  from the same vendor.  Thanks to Ake Sandgren for the bug
  report.  See ticket #1150.
- Fixed vector collectives in the self module.  See ticket #1166.
- Fixed some data-type engine bugs: an indexing bug, and an alignment bug.
  See ticket #1165.
- Only set the MPI_APPNUM attribute if it is defined.  See ticket
  #1164.

--
Tim Mattox
Open Systems Lab
Indiana University


Re: [OMPI users] Problems with GATHERV on one process

2007-12-11 Thread Tim Mattox
Hello Ken,
This is a known bug, which is fixed in the upcoming 1.2.5 release.  We
expect 1.2.5
to come out very soon.  We should have a new release candidate for 1.2.5 posted
by tomorrow.

See these tickets about the bug if you care to look:
https://svn.open-mpi.org/trac/ompi/ticket/1166
https://svn.open-mpi.org/trac/ompi/ticket/1157

On Dec 11, 2007 2:48 PM, Moreland, Kenneth <kmo...@sandia.gov> wrote:
> I recently ran into a problem with GATHERV while running some randomized
> tests on my MPI code.  The problem seems to occur when running
> MPI_Gatherv with a displacement on a communicator with a single process.
> The code listed below exercises this errant behavior.  I have tried it
> on OpenMPI 1.1.2 and 1.2.4.
>
> Granted, this is not a situation that one would normally run into in a
> real application, but I just wanted to check to make sure I was not
> doing anything wrong.
>
> -Ken
>
>
>
> #include <mpi.h>
>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc, char **argv)
> {
>   int rank;
>   MPI_Comm smallComm;
>   int senddata[4], recvdata[4], length, offset;
>
>   MPI_Init(&argc, &argv);
>
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>   // Split up into communicators of size 1.
>   MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &smallComm);
>
>   // Now try to do a gatherv.
>   senddata[0] = 5; senddata[1] = 6; senddata[2] = 7; senddata[3] = 8;
>   recvdata[0] = 0; recvdata[1] = 0; recvdata[2] = 0; recvdata[3] = 0;
>   length = 3;
>   offset = 1;
>   MPI_Gatherv(senddata, length, MPI_INT,
>   recvdata, &length, &offset, MPI_INT, 0, smallComm);
>   if (senddata[0] != recvdata[offset])
> {
> printf("%d: %d != %d?\n", rank, senddata[0], recvdata[offset]);
> }
>   else
> {
> printf("%d: Everything OK.\n", rank);
> }
>
>   return 0;
> }
>
>  Kenneth Moreland
> ***  Sandia National Laboratories
> ***
> *** *** ***  email: kmo...@sandia.gov
> **  ***  **  phone: (505) 844-8919
> ***  fax:   (505) 845-0833
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


Re: [MTT users] Problems running MTT with already installed MPICH-MX

2007-09-28 Thread Tim Mattox
home/pjesa/mtt/scratch2/installs
> > Unique directory: CoY6
> > Making dir: CoY6 (cwd: /home/pjesa/mtt/scratch2/installs)
> > CoY6 does not exist -- creating
> > chdir CoY6/
> > chdir /home/pjesa/mtt/scratch2/installs
> > chdir /home/pjesa/mtt/scratch2/installs/CoY6
> > Value: module
> > Evaluating: MPICH2
> > Replacing vars from section mpi install: mpich-mx: MPICH2
> > Got final version before escapes: MPICH2
> > Returning: MPICH2
> > Value: description
> > Value: description
> > Evaluating: [testbake]
> > Replacing vars from section MTT: [testbake]
> > Got final version before escapes: [testbake]
> > Returning: [testbake]
> > chdir /home/pjesa/mtt/scratch2/installs/CoY6
> > chdir ..
> > chdir /home/pjesa/mtt/scratch2/installs/CoY6
> > Sym linked: CoY6 to mpich-mx#mpich-mx#1.2.7
> > Value: env_module
> > Value: setenv
> > Value: unsetenv
> > Value: prepend_path
> > Value: append_path
> > Value: configure_arguments
> > Value: vpath_mode
> > Value: make_all_arguments
> > Value: make_check
> > Value: compiler_name
> > Value: compiler_version
> > Value: save_stdout_on_success
> > Evaluating: 1
> > Replacing vars from section mpi install: mpich-mx: 1
> > Got final version before escapes: 1
> > Returning: 1
> > Value: merge_stdout_stderr
> > Evaluating: 0
> > Replacing vars from section mpi install: mpich-mx: 0
> > Got final version before escapes: 0
> > Returning: 0
> > Value: stderr_save_lines
> > Value: stdout_save_lines
> > Running command: rm -rf src
> > Command complete, exit status: 0
> > Making dir: src (cwd: /home/pjesa/mtt/scratch2/installs/CoY6)
> > src does not exist -- creating
> > chdir src/
> > chdir /home/pjesa/mtt/scratch2/installs/CoY6
> > chdir /home/pjesa/mtt/scratch2/installs/CoY6/src
> > Evaluating: require MTT::MPI::Get::AlreadyInstalled
> > Evaluating: $ret =
> >::MPI::Get::AlreadyInstalled::PrepareForInstall(@args)
> > chdir /home/pjesa/mtt/scratch2/installs/CoY6/src
> > chdir /home/pjesa/mtt/scratch2/installs/CoY6/src
> > Making dir: /home/pjesa/mtt/scratch2/installs/CoY6/install (cwd:
> >/home/pjesa/mtt/scratch2/installs/CoY6/src)
> > /home/pjesa/mtt/scratch2/installs/CoY6/install does not exist -- creating
> > chdir /home/pjesa/mtt/scratch2/installs/CoY6/install/
> > chdir /home/pjesa/mtt/scratch2/installs/CoY6/src
> > Evaluating: require MTT::MPI::Install::MPICH2
> > Evaluating: $ret = ::MPI::Install::MPICH2::Install(@args)
> > Value: mpich2_make_all_arguments
> > Value: mpich2_compiler_name
> > Value: bitness
> > Evaluating: _mpi_install_bitness("")
> > --> Prefix now:
> > --> Remaining (after &): get_mpi_install_bitness("")
> > --> Found func name: get_mpi_install_bitness
> > --> Found beginning of arguments: "")
> > --> Initial param search: "")
> > --> Loop: trimmed search: "")
> > --> Examining char: " (pos 0)
> > --> Found beginning quote
> > --> Found last quote
> > --> Examining char: ) (pos 2)
> > --> Found end of arg (pos 2)
> > Found argument: ""
> > --> Remainder:
> > --> Calling: $ret = MTT::Values::Functions::get_mpi_install_bitness("");
> > _mpi_intall_bitness
> > &_find_libmpi returning:
> > Couldn't find libmpi!
> > --> After eval(string), remaining: 0
> > Got final version before escapes: 0
> > Returning: 0
> > Value: endian
> > Evaluating: _mpi_install_endian("")
> > --> Prefix now:
> > --> Remaining (after &): get_mpi_install_endian("")
> > --> Found func name: get_mpi_install_endian
> > --> Found beginning of arguments: "")
> > --> Initial param search: "")
> > --> Loop: trimmed search: "")
> > --> Examining char: " (pos 0)
> > --> Found beginning quote
> > --> Found last quote
> > --> Examining char: ) (pos 2)
> > --> Found end of arg (pos 2)
> > Found argument: ""
> > --> Remainder:
> > --> Calling: $ret = MTT::Values::Functions::get_mpi_install_endian("");
> > _mpi_intall_endian
> > &_find_libmpi returning:
> > *** Could not find libmpi to calculate endian-ness
> > --> After eval(string), remaining: 0
> > Got final version before escapes: 0
> > Returning: 0
> > Found whatami: /home/pjesa/mtt/collective-bakeoff/client/whatami/whatami
> > Value: platform_type
> > Value: platform_type
> > Value: platform_hardware
> > Value: platform_hardware
> > Value: os_name
> > Value: os_name
> > Value: os_version
> > Value: os_version
> >Skipped MPI install
> > *** MPI install phase complete
> > >> Phase: MPI Install
> >Started:   Thu Sep 27 22:39:37 2007
> >Stopped:   Thu Sep 27 22:39:38 2007
> >Elapsed:   00:00:01
> >Total elapsed: 00:00:01
> > *** Test get phase starting
> > chdir /home/pjesa/mtt/scratch2/sources
> > >> Test get: [test get: netpipe]
> >Checking for new test sources...
> > Value: module
> > Evaluating: Download
> > Replacing vars from section test get: netpipe: Download
> > Got final version before escapes: Download
> > Returning: Download
> > chdir /home/pjesa/mtt/scratch2/sources
> > Making dir: test_get__netpipe (cwd: /home/pjesa/mtt/scratch2/sources)
> > test_get__netpipe does not exist -- creating
> > chdir test_get__netpipe/
> > chdir /home/pjesa/mtt/scratch2/sources
> > chdir /home/pjesa/mtt/scratch2/sources/test_get__netpipe
> > Evaluating: require MTT::Test::Get::Download
> > Evaluating: $ret = ::Test::Get::Download::Get(@args)
> > Value: download_url
> > Evaluating: http://www.scl.ameslab.gov/netpipe/code/NetPIPE_3.6.2.tar.gz
> > Replacing vars from section test get: netpipe:
> >http://www.scl.ameslab.gov/netpipe/code/NetPIPE_3.6.2.tar.gz
> > Got final version before escapes:
> >http://www.scl.ameslab.gov/netpipe/code/NetPIPE_3.6.2.tar.gz
> > Returning: http://www.scl.ameslab.gov/netpipe/code/NetPIPE_3.6.2.tar.gz
> > >> Download got url:
> >http://www.scl.ameslab.gov/netpipe/code/NetPIPE_3.6.2.tar.gz
> > Value: download_username
> > Value: download_password
> > >> MTT::FindProgram::FindProgram returning /usr/bin/wget
> > Running command: wget -nv
> >http://www.scl.ameslab.gov/netpipe/code/NetPIPE_3.6.2.tar.gz
> > OUT:22:39:55
> >URL:http://www.scl.ameslab.gov/netpipe/code/NetPIPE_3.6.2.tar.gz
> >[369585/369585] -> "NetPIPE_3.6.2.tar.gz" [1]
> > Command complete, exit status: 0
> > Value: download_version
> > >> Download complete
> >Got new test sources
> > *** Test get phase complete
> > >> Phase: Test Get
> >Started:   Thu Sep 27 22:39:38 2007
> >Stopped:   Thu Sep 27 22:39:55 2007
> >Elapsed:   00:00:17
> >Total elapsed: 00:00:18
> > *** Test build phase starting
> > chdir /home/pjesa/mtt/scratch2/installs
> > >> Test build [test build: netpipe]
> > Value: test_get
> > Evaluating: netpipe
> > Replacing vars from section test build: netpipe: netpipe
> > Got final version before escapes: netpipe
> > Returning: netpipe
> > *** Test build phase complete
> > >> Phase: Test Build
> >Started:   Thu Sep 27 22:39:55 2007
> >Stopped:   Thu Sep 27 22:39:55 2007
> >Elapsed:   00:00:00
> >Total elapsed: 00:00:18
> > *** Run test phase starting
> > >> Test run [netpipe]
> > Value: test_build
> > Evaluating: netpipe
> > Replacing vars from section test run: netpipe: netpipe
> > Got final version before escapes: netpipe
> > Returning: netpipe
> > *** Run test phase complete
> > >> Phase: Test Run
> >Started:   Thu Sep 27 22:39:55 2007
> >Stopped:   Thu Sep 27 22:39:55 2007
> >Elapsed:   00:00:00
> >Total elapsed: 00:00:18
> > >> Phase: Trim
> >Started:   Thu Sep 27 22:39:55 2007
> >Stopped:   Thu Sep 27 22:39:55 2007
> >Elapsed:   00:00:00
> >Total elapsed: 00:00:18
> > *** Reporter finalizing
> > Evaluating: require MTT::Reporter::MTTDatabase
> > Evaluating: $ret = ::Reporter::MTTDatabase::Finalize(@args)
> > Evaluating: require MTT::Reporter::TextFile
> > Evaluating: $ret = ::Reporter::TextFile::Finalize(@args)
> > *** Reporter finalized
>
> > ___
> > mtt-users mailing list
> > mtt-us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>
> ___
> mtt-users mailing list
> mtt-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>


-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


[OMPI users] Open MPI v1.2.4 released

2007-09-26 Thread Tim Mattox
The Open MPI Team, representing a consortium of research, academic,
and industry partners, is pleased to announce the release of Open MPI
version 1.2.4. This release is mainly a bug fix release over the v1.2.3
release, but there are a few new features.  We strongly recommend
that all users upgrade to version 1.2.4 if possible.

Version 1.2.4 can be downloaded from the main Open MPI web site or
any of its mirrors (mirrors will be updating shortly).

Here is a list of changes in v1.2.4 as compared to v1.2.3:

- Really added support for TotalView/DDT parallel debugger message queue
  debugging (it was mistakenly listed as "added" in the 1.2 release).
- Fixed a build issue with GNU/kFreeBSD. Thanks to Petr Salinger for
  the patch.
- Added missing MPI_FILE_NULL constant in Fortran.  Thanks to
  Bernd Schubert for bringing this to our attention.
- Change such that the UDAPL BTL is now only built in Linux when
  explicitly specified via the --with-udapl configure command line
  switch.
- Fixed an issue with umask not being propagated when using the TM
  launcher.
- Fixed behavior if number of slots is not the same on all bproc nodes.
- Fixed a hang on systems without GPR support (ex. Cray XT3/4).
- Prevent users of 32-bit MPI apps from requesting >= 2GB of shared
  memory.
- Added a Portals MTL.
- Fix 0 sized MPI_ALLOC_MEM requests.  Thanks to Lisandro Dalcin for
  pointing out the problem.
- Fixed a segfault crash on large SMPs when doing collectives.
- A variety of fixes for Cray XT3/4 class of machines.
- Fixed which error handler is used when MPI_COMM_SELF is passed
  to MPI_COMM_FREE.  Thanks to Lisandro Dalcini for the bug report.
- Fixed compilation on platforms that don't have hton/ntoh.
- Fixed a logic problem in the fortran binding for MPI_TYPE_MATCH_SIZE.
  Thanks to Jeff Dusenberry for pointing out the problem and supplying
  the fix.
- Fixed a problem with MPI_BOTTOM in various places of the f77-interface.
  Thanks to Daniel Spangberg for bringing this up.
- Fixed problem where MPI-optional Fortran datatypes were not
  correctly initialized.
- Fixed several problems with stdin/stdout forwarding.
- Fixed overflow problems with the sm mpool MCA parameters on large SMPs.
- Added support for the DDT parallel debugger via orterun's --debug
  command line option.
- Added some sanity/error checks to the openib MCA parameter parsing
  code.
- Updated the udapl BTL to use RDMA capabilities.
- Allow use of the BProc head node if it was allocated to the user.
  Thanks to Sean Kelly for reporting the problem and helping debug it.
- Fixed a ROMIO problem where non-blocking I/O errors were not properly
  reported to the user.
- Made remote process launch check the $SHELL environment variable if
  a valid shell was not otherwise found for the user.
  Thanks to Alf Wachsmann for the bugreport and suggested fix.
- Added/updated some vendor IDs for a few openib HCAs.
- Fixed a couple of failures that could occur when specifying devices
  for use by the OOB.
- Removed dependency on sysfsutils from the openib BTL for
  libibverbs >=v1.1 (i.e., OFED 1.2 and beyond).

--
Tim Mattox
Open Systems Lab
Indiana University


Re: [OMPI users] Two different compilation of openmpi

2007-09-14 Thread Tim Mattox
Also, you might want to use this configure option to simplify switching:
--enable-mpirun-prefix-by-default

For more details, see: ./configure --help
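
For example (the install prefix here is just an illustration -- use whatever
location fits your setup):

  ./configure --prefix=/opt/openmpi-1.2.3-intel \
              --enable-mpirun-prefix-by-default
  make all install

With that option, mpirun behaves as if --prefix <installdir> had been given,
so the matching bin/ and lib/ directories are found on the remote nodes
without having to adjust PATH and LD_LIBRARY_PATH by hand for each build.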

On 9/14/07, Reuti <re...@staff.uni-marburg.de> wrote:
> Hi,
>
> Am 13.09.2007 um 23:29 schrieb Francesco Pietra:
>
> > Is it possible to have two different compilations of openmpi on the
> > same
> > machine (dual-opterons, Debian Linux etch)?
> >
> > On that parallel computer sander.MPI (Amber9) and openmpi 1.2.3
> > have both been
> > compiled with Intel Fortran 9.1.036.
> >
> > Now, I wish to install DOCK6 on this machine and I am advised that
> > it should be
> > better compiled on GNU compilers. As to openmpi I could install the
> > Debian
> > package, which is GNU compiled. Are conflicts between the two
> > installation
> > foreseeable? Although I don't have experience with DOCK, I suspect
> > that certain
> > procedures with DOCK call sander.MPI into play.
> >
> > I rule out the alternative of compiling Amber9 with GNU compilers,
> > which will
> > run slower.
>
> this is no problem. Instead of using any prebuilt package, compile
> and install the two different versions of OMPI on your own, and use
> two different locations for them, which you can achieve by e.g.:
>
> ./configure --prefix=/opt/my_location_a
>
> and a different location of course for the other compilation. If you
> now compile your application, be sure to get the correct one of mpicc
> etc. in /opt/my_location_a/bin and also use this specific mpiexec
> therein later on by adjusting the $PATH accordingly.
>
> As we have only two different versions, we don't use the mentioned
> "modules" package for now, but hardcode the appropriate PATH in the
> jobscript for our queuing system.
>
> --- Reuti
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/


[OMPI users] Open MPI v1.2.3 released

2007-06-20 Thread Tim Mattox

The Open MPI Team, representing a consortium of research, academic,
and industry partners, is pleased to announce the release of Open MPI
version 1.2.3. This release is mainly a bug fix release over the v1.2.2
release, but there are a few minor new features.  We strongly
recommend that all users upgrade to version 1.2.3 if possible.

Version 1.2.3 can be downloaded from the main Open MPI web site or
any of its mirrors (mirrors will be updating shortly).

Here is a list of changes in v1.2.3 as compared to v1.2.2:

- Fix a regression in comm_spawn functionality that inadvertently
 caused the mapping of child processes to always start at the same
 place.  Thanks to Prakash Velayutham for helping discover the
 problem.
- Fix segfault when a user's home directory is unavailable on a remote
 node.  Thanks to Guillaume Thomas-Collignon for bringing the issue
 to our attention.
- Fix MPI_IPROBE to properly handle MPI_STATUS_IGNORE on mx and
 psm MTLs. Thanks to Sophia Corwell for finding this and supplying a
 reproducer.
- Fix some error messages in the tcp BTL.
- Use _NSGetEnviron instead of environ on Mac OS X so that there
 are no undefined symbols in the shared libraries.
- On OS X, when MACOSX_DEPLOYMENT_TARGET is 10.3 or higher,
 support building the Fortran 90 bindings as a shared library.  Thanks to
 Jack Howarth for his advice on making this work.
- No longer require extra include flag for the C++ bindings.
- Fix detection of weak symbols support with Intel compilers.
- Fix issue found by Josh England: ompi_info would not show framework
 MCA parameters set in the environment properly.
- Rename the oob_tcp_include/exclude MCA params to oob_tcp_if_include/exclude
 so that they match the naming convention of the btl_tcp_if_include/exclude
  params.  The old names are deprecated, but will still work.
- Add -wd as a synonym for the -wdir orterun/mpirun option.
- Fix the mvapi BTL to compile properly with compilers that do not support
 anonymous unions.  Thanks to Luis Kornblueh for reporting the bug.

--
Tim Mattox
Open Systems Lab
Indiana University


[OMPI users] Open MPI v1.2.2

2007-05-16 Thread Tim Mattox

The Open MPI Team, representing a consortium of research, academic,
and industry partners, is pleased to announce the release of Open MPI
version 1.2.2. This release is mainly a bug fix release over the v1.2.1
release, but there are a few minor new features.  We strongly
recommend that all users upgrade to version 1.2.2 if possible.

Version 1.2.2 can be downloaded from the main Open MPI web site or
any of its mirrors (mirrors will be updating shortly).

Here is a list of changes in v1.2.2 as compared to v1.2.1:

- Fix regression in 1.2.1 regarding the handling of $CC with both
 absolute and relative path names.
- Fix F90 array of status dimensions.  Thanks to Randy Bramley for
 noticing the problem.
- Add btl_openib_ib_pkey_value MCA parameter for controlling IB port selection.
- Fixed a variety of threading/locking bugs.
- Fixed some compiler warnings associated with ROMIO, OS X, and gridengine.
- If pbs-config can be found, use it to look for TM support.  Thanks
  to Bas van der Vlies for the inspiration and preliminary work.
- Fixed a deadlock in orterun when the rsh PLS encounters some errors.

--
Tim Mattox
Open Systems Lab
Indiana University


[OMPI users] Open MPI v1.2.1

2007-04-25 Thread Tim Mattox

The Open MPI Team, representing a consortium of research, academic,
and industry partners, is pleased to announce the release of Open MPI
version 1.2.1. This release is mainly a bug fix release over the
v1.2 release, but there are a few minor new features.  We strongly
recommend that all users upgrade to version 1.2.1 if possible.

Version 1.2.1 can be downloaded from the main Open MPI web site or
any of its mirrors (mirrors will be updating shortly).

Here is a list of changes in v1.2.1 as compared to v1.2:

- Fixed a number of connection establishment errors in the TCP out-
of-band messaging system.
- Fixed a memory leak when using mpi_comm calls.
Thanks to Bas van der Vlies for reporting the problem.
- Fixed various memory leaks in OPAL and ORTE.
- Improved launch times when using TM (PBS Pro, Torque, Open PBS).
- Fixed mpi_leave_pinned to work for all datatypes.
- Fix functionality allowing users to disable sbrk() (the
mpool_base_disable_sbrk MCA parameter) on platforms that support it.
- Fixed a pair of problems with the TCP "listen_thread" mode for the
oob_tcp_listen_mode MCA parameter that would cause failures when
attempting to launch applications.
- Fixed a segfault if there was a failure opening a BTL MX endpoint.
- Fixed a problem with mpirun's --nolocal option introduced in 1.2.
- Re-enabled MPI_COMM_SPAWN_MULTIPLE from singletons.
- LoadLeveler and TM configure fixes, Thanks to Martin Audet for the
bug report.
- Various C++ MPI attributes fixes.
- Fixed issues with backtrace code on 64 bit Intel & PPC OS X builds.
- Fixed issues with multi-word CC variables and libtool.
Thanks to Bert Wesarg for the bug reports.
- Fix issue with non-uniform node naming schemes in SLURM.
- Fix file descriptor leak in the Grid Engine/N1GE support.
- Fix compile error on OS X 10.3.x introduced with Open MPI 1.1.5.
- Implement MPI_TYPE_CREATE_DARRAY function (was in 1.1.5 but not 1.2).
- Recognize zsh shell when using rsh/ssh for launching MPI jobs.
- Ability to set the OPAL_DESTDIR or OPAL_PREFIX environment
variables to "re-root" an existing Open MPI installation.
- Always include -I for Fortran compiles, even if the prefix is
/usr/local.
- Support for "fork()" in MPI applications that use the
OpenFabrics stack (OFED v1.2 or later).
- Support for setting specific limits on registered memory.

--
Tim Mattox
Open Systems Lab
Indiana University


Re: [OMPI users] [Re: Memory leak in openmpi-1.2?]

2007-04-04 Thread Tim Mattox

Hello Bas van der Vlies,
The memory leak you found in Open MPI 1.2 has not yet been fixed in the
1.2 branch. You can follow the status of that particular fix for the
1.2 branch here:
https://svn.open-mpi.org/trac/ompi/ticket/970

The fix should go in soon, but I had a problem yesterday applying the fix.

It has been fixed in the trunk, which has nightly tarballs available
here where you don't need to run autogen (and shouldn't):
http://www.open-mpi.org/nightly/trunk/

Be aware that the trunk is currently under development for a future 1.3 release,
and already has many differences from the 1.2 version, i.e., it is unstable.

However, what specifically happens when ./autogen.sh fails on
the trunk?  What versions of libtool, automake, and autoconf do you have?

Thanks for your bug report and your patience!

On 4/4/07, Bas van der Vlies <b...@sara.nl> wrote:

Bas van der Vlies wrote:
> Mohamad Chaarawi wrote:
>> Yes we saw the memory leak, and a fix is already in the trunk right now..
>> Sorry i didn't reply back earlier...
>> The fix will be merged in V1.2, as soon as the release managers approve it..
>>
>> Thank you,
>>
>
> Thanks we will test it and do some more scalapack testing.
>

Is the fix in the trunk or also in the nightly build release?  When I download
the trunk version, ./autogen.sh fails.

If I use the nightly build version of openmpi-1.2.1a0r14212.tar.gz, we
still observe a memory leak. Is this the right version?

Regards


--

*  *
*  Bas van der Vlies e-mail: b...@sara.nl  *
*  SARA - Academic Computing Services    phone: +31 20 592 8012    *
*  Kruislaan 415                         fax:   +31 20 6683167     *
*  1098 SJ Amsterdam   *
*  *

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Tim Mattox - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
   I'm a bright... http://www.the-brights.net/


[MTT users] Recent OMPI Trunk fails MPI_Allgatherv_* MTT tests

2007-04-01 Thread Tim Mattox

Hi All,
I just checked the recent nightly MTT results and found two things of note,
one for the MTT community, the other for the OMPI developers.

For both, see http://www.open-mpi.org/mtt/reporter.php?do_redir=143
for details of the failed MTT tests with the OMPI trunk at r14180.

1) For MTT developers:
The MTT intel test suite is incorrectly seeing a failed MPI_Allgatherv_f
test as passed, yet is correctly detecting that the MPI_Allgatherv_c
test is failing.
The STDOUT from "passed" MPI_Allgatherv_f seems to indicate that the tests
actually failed in a similar way to the _c version, but MTT thinks it passed.
I've not had time to diagnose why MTT is missing this...  anyone else have
some spare cycles to look at this?

2) For OMPI developers:
The MPI_Allgatherv_* tests are failing as of r14180 in all test conditions
on the IU machines, and others, yet this passed the night before on r14172.

Looking at the svn log for r#'s r14173 thru r14180, I can narrow it down to
one of these changes as the culprit:
https://svn.open-mpi.org/trac/ompi/changeset/14180
https://svn.open-mpi.org/trac/ompi/changeset/14179
https://svn.open-mpi.org/trac/ompi/changeset/14174 (Not likely)

My money is on the much larger r14180 changeset.
The other r#'s aren't culprits for obvious reasons.
--
Tim Mattox - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
   I'm a bright... http://www.the-brights.net/


Re: [OMPI users] Error in MPI_Unpack --- MPI_ERR_TRUNCATE: message truncated

2007-03-13 Thread Tim Mattox

Michael,
Can you upgrade to a newer version of Open MPI?  There have been several
bugfix releases of the 1.1 series, and we are on the verge of releasing v1.2.
So, please try either 1.1.4 (or 1.1.5rc1), and/or try v1.2rc3.
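
For anyone hitting similar symptoms, here is a minimal sketch of the packing
side for a message with the layout described below.  This is not the original
poster's code -- the function name, argument names, and the use of
MPI_COMM_WORLD with a plain MPI_Send are all illustrative assumptions:

#include <stdlib.h>
#include <mpi.h>

/* Hypothetical sender for a {int, int, int[size], double[size2], double}
   message; the receiver must unpack in exactly the same order and with
   the same counts. */
void send_packed(int a, int b, int *iv, int size, double *dv, int size2,
                 double x, int dest, int tag)
{
  int sz_ints, sz_iv, sz_dv, sz_dbl, bufsize, pos = 0;
  char *buf;

  /* Ask MPI how much buffer space each piece needs, then allocate. */
  MPI_Pack_size(2,     MPI_INT,    MPI_COMM_WORLD, &sz_ints);
  MPI_Pack_size(size,  MPI_INT,    MPI_COMM_WORLD, &sz_iv);
  MPI_Pack_size(size2, MPI_DOUBLE, MPI_COMM_WORLD, &sz_dv);
  MPI_Pack_size(1,     MPI_DOUBLE, MPI_COMM_WORLD, &sz_dbl);
  bufsize = sz_ints + sz_iv + sz_dv + sz_dbl;
  buf = malloc(bufsize);

  /* Pack the fields in a fixed order. */
  MPI_Pack(&a, 1,     MPI_INT,    buf, bufsize, &pos, MPI_COMM_WORLD);
  MPI_Pack(&b, 1,     MPI_INT,    buf, bufsize, &pos, MPI_COMM_WORLD);
  MPI_Pack(iv, size,  MPI_INT,    buf, bufsize, &pos, MPI_COMM_WORLD);
  MPI_Pack(dv, size2, MPI_DOUBLE, buf, bufsize, &pos, MPI_COMM_WORLD);
  MPI_Pack(&x, 1,     MPI_DOUBLE, buf, bufsize, &pos, MPI_COMM_WORLD);

  /* pos now holds the packed size in bytes. */
  MPI_Send(buf, pos, MPI_PACKED, dest, tag, MPI_COMM_WORLD);
  free(buf);
}

A mismatch between the counts used when packing and those assumed when
unpacking (or a receive buffer sized for a smaller message) is a common way
to end up with MPI_ERR_TRUNCATE, so it is worth double-checking that size
and size2 are known to the receiver before the packed data arrives.
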

On 3/12/07, Michael Epitropakis <mike...@student.math.upatras.gr> wrote:


Dear ompi users,

I am using OpenMPI in order to parallelize an evolutionary algorithm.
During the execution of the algorithm I have to send many identical messages to
some nodes. So, in order to generate and use these messages I use MPI_Pack
and MPI_Unpack. Each message has the following structure:

{int, int, int[size], double[size2], double}

In the beginning of the algorithm everything is going well... but suddenly
something goes wrong and I get the following messages:

[compute-0-1.local:32461] *** An error occurred in MPI_Unpack
[compute-0-1.local:32461] *** on communicator MPI_COMM_WORLD
[compute-0-1.local:32461] *** MPI_ERR_TRUNCATE: message truncated
[compute-0-1.local:32461] *** MPI_ERRORS_ARE_FATAL (goodbye)

With a lot of debugging I found out that the first two integers, let me
call them x and y, were not the values that I was expecting (the values that
I had packed!!!). They were very strange: either big positive integers or
big negative integers. I checked the buffer that I am using for these
messages and it is much bigger than the required message storage. I can't
think of anything else that could have caused that problem.

Can you please help me with this?  What could have caused this
problem?

I am using OpenMPI version: Open MPI: 1.1 Open MPI SVN revision: r10477


Thank you in advance
Michael.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Tim Mattox - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
   I'm a bright... http://www.the-brights.net/


Re: [MTT users] Minor bug found in MTT 2 client side.

2007-01-19 Thread Tim Mattox

Hi All,

On 1/18/07, Jeff Squyres <jsquy...@cisco.com> wrote:

On Jan 18, 2007, at 10:37 PM, Tim Mattox wrote:

[snip description of a newline bug]


Fixed -- none have newlines now, so they'll all be in the one-line
output format.


Thanks.


> I don't know if it is related or not, but for tests that fail without
> timing out,
> the debug output from MTT for that test does NOT have a line like
> these:
> test_result: 1  (passed)
> test_result: 2  (skipped)
> test_result: 3  (timed out)

Are you sure?  Lines 80-84 of Correctness.pm are:

 if ($results->{timed_out}) {
 Warning("$str TIMED OUT (failed)\n");
 } else {
 Warning("$str FAILED\n");
 }

Are you talking about some other output?  Or are you asking for
something in (parentheses)?


Sorry, I wasn't clear.  The current output for each test in the debug file
usually includes a line "test_result: X" with X replaced by a number.
However, for tests that fail outright, this line is missing.  This missing
line happened to correspond to the tests that had a newline in the result
message that I discussed (snipped) above.

Please don't put in the parentheses things.  That was just me commenting
on which number meant what.



If you're in the middle of revamping your parser to match the MTT 2.0
output, I might suggest that it might be far easier to just
incorporate your desired output into MTT itself, for two reasons:

1. the debug output can change at any time; it was meant to be for
debugging, not necessarily for screen scraping.


Point taken.


2. there would be no need for screen scraping/parsing; you would have
the data immediately available and all you have to do is output it
into the format that you want.  We should be able to accommodate
whatever you need via an MTT Reporter plugin.  I'm guessing this
should be pretty straightforward...?


Where can I find some documentation for or examples of a plugin?


--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems



--
Tim Mattox - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
   I'm a bright... http://www.the-brights.net/


[MTT users] Minor bug found in MTT 2 client side.

2007-01-18 Thread Tim Mattox

Hi MTT developers,
(Sorry to those who are just MTT users, you can skip this message).

I found some minor bugs/inconveniences in lib/MTT/Test/Analyze/Correctness.pm.

It is not consistent about making "$report->{result_message}" get assigned
a value without an embedded newline.  For example, at lines 93 & 96
a newline is embedded, yet at lines 72, 76, 87-88 the string is ended
without a newline.  For our purposes at IU, with a local e-mail
generation script,
it would be great if those \n newlines could be removed from lines 93 & 96,
so we could parse the MTT debug output more easily.

As it is right now, the result message for a test is reported in two very
distinct ways, depending on how a test passes or fails:
1) ugly format from failed tests:
RESULT_MESSAGE_BEGIN
Failed; exit status: 139
RESULT_MESSAGE_END

2) preferred format:
result_message: Failed; timeout expired (120 seconds) )
or
result_message: Passed
or
result_message: Skipped

I don't know if it is related or not, but for tests that fail without
timing out,
the debug output from MTT for that test does NOT have a line like
these:
test_result: 1  (passed)
test_result: 2  (skipped)
test_result: 3  (timed out)

Again, for our e-mail generation script, it would be much easier if there
was a corresponding "test_result: X" for each test regardless of if it failed,
timed-out, was skipped, or passed.

Thanks!
--
Tim Mattox - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
   I'm a bright... http://www.the-brights.net/


Re: [MTT users] [devel-core] MTT 2.0 tutorial teleconference

2007-01-04 Thread Tim Mattox

I'll be there for the call on Tuesday.
We are looking forward to switching IU to MTT 2.0
The new report/results pages are great!
--
Tim Mattox - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
   I'm a bright... http://www.the-brights.net/


[MTT users] Ignore trunk failures on thor from last night

2006-12-11 Thread Tim Mattox

Hello Ethan & others,
I was expanding out testing with the thor cluster this weekend,
and discovered this morning that one of its nodes has a faulty
Myrinet card or configuration.

So, please remove or ignore the trunk failures from last night &
early today on IU's thor cluster.  I've excluded the faulty node,
and am rerunning the tests.
--
Tim Mattox - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
   I'm a bright... http://www.the-brights.net/


Re: [MTT users] New MTT home page

2006-11-10 Thread Tim Mattox

Hi Ethan,
These look great!  Can you add one more column of choices with the
heading "Failed Test Runs", which would be the same as "Test Runs"
but without the entries that had zero failures?

If it wouldn't be too much trouble, could you also add a "Past 48
Hours" section,
but this is lower priority than adding a "Failed Test Runs" column.

Thanks!

On 11/10/06, Ethan Mallove <ethan.mall...@sun.com> wrote:

Folks,

The MTT home page has been updated to display a number of
quick links divided up by date range, organization, and
testing phase (MPI installation, test build, and test run).
The custom and summary reports are linked from there.

http://www.open-mpi.org/mtt/

--
-Ethan
___
mtt-users mailing list
mtt-us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users




--
Tim Mattox - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
   I'm a bright... http://www.the-brights.net/


Re: [MTT users] nightly OMPI tarballs

2006-11-08 Thread Tim Mattox

It would help us here at IU with MTT as well if the tarball generation were a
little earlier each day.  9pm Indiana/Eastern time would be good, I think.
That would make it 6pm West coast time...  Does that work for
the West coasters?  Or should we do 10pm Eastern/7pm West?
Gleb, George, hpcstork, as three that I know do svn commits outside the typical
US workday, how would this affect you?

Maybe we could make the non-trunk tarballs even earlier, since
the gatekeepers would know when we were "done for the day".
What time would Sun need to have the 1.2 tarballs ready
for them to do their MTT runs?  7pm Eastern?

I can work on making the tarball generation go more quickly,
but I suspect I can't get it reliably faster than 1 hour, especially if
we have changes on all three branches (trunk, v1.1, v1.2).
I have some ideas on how to speed it up, though, from its
current 2-hour span.  One of the ideas is to have the v1.2 (and maybe v1.1)
tarballs be built earlier, so that we only have one tarball to build
at the designated time.

As for doing multiple builds per day, I am a bit opposed to doing that
on a regular basis, for two reasons:
1) It takes time & resources (both human and computer) per tarball
for testing, and to look at the results from the testing.  One set per
day seems at the moment to be what we as a group can currently handle.
2) If we have different groups testing from different tarball sets,
then it would become harder to aggregate the testing results,
since we would not necessarily be testing the same tarball.

On 11/8/06, Jeff Squyres <jsquy...@cisco.com> wrote:

I'm wondering if it's worthwhile to either a) move back the nightly
tarball generation to, say, 9pm US Indiana time or b) perhaps make
the tarballs at multiple times during the day.

Since we're doing more and more testing, it seems like we need more
time to do it before the 9am reports.  Right now, we're pretty
limited to starting at about 2am (to guarantee that the tarballs have
finished building).  If you start before then, you could be testing a
tarball that's about a day old.

This was happening to sun, for example, who (I just found out) starts
their testing at 7pm because they have limited time and access to
resources (starting at 7pm lets them finish all their testing by 9am).

So what do people think about my proposals from above?  Either 9pm,
or perhaps make them every 6 hours throughout the day.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems

___
mtt-users mailing list
mtt-us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users




--
Tim Mattox - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
   I'm a bright... http://www.the-brights.net/


Re: [O-MPI users] Further thoughts

2005-06-17 Thread Tim Mattox
Hello,
This has been an interesting discussion to follow.  Here are my thoughts
on the RPM packaging...

On 6/16/05, Jeff Squyres <jsquy...@open-mpi.org> wrote:
[snip]
> We've also got the "announce" mailing list -- a low volume list just
> for announcing new releases (and *exciting* messages about products you
> might be interested in... just kidding.).
;-)

[snip]
> We actually got a lot of help in this area from Greg Kurtzer from LBL
> (of cAos/Warewulf/Centos fame).  He helped us a bunch with our
> [previously extremely lame] LAM/MPI .spec file, and then offered to
> write one for Open MPI (which he did about a month or two ago).
> 
> I have some random user questions about RPMs, though:
> 
> 1. Would you prefer an all-in-one Open MPI RPM, or would you prefer
> multiple RPMs (e.g., openmpi-doc, openmpi-devel, openmpi-runtime,
> ...etc.)?

I prefer split RPMs.  The fine-grained split you mention works well for
thin/diskless nodes, but a simple split of runtime vs. everything-else
would be "good enough".  The primary problem with an all-in-one RPM
would be the footprint of the non-MPI packages that satisfy MPI's
dependency tree, especially the compilers.

> 2. We're definitely going to provide an SRPM suitable for "rpmbuild
> --rebuild".  However, we're not 100% sure that it's worthwhile to
> provide binary RPMs because everyone's cluster/development systems seem
> to be "one off" from standard Linux distros.  Do you want a binary
> RPM(s)?  If so, for which distros?  (this is one area where vendors
> tend to have dramatically different views than academics/researchers)

If you supply fairly clean SRPMs, I think the distros can do the binary
RPM building themselves.  At least that is easy enough for cAos to do.
I guess the problem lies in the disparity between the distribution
release cycle and Open MPI's expected release cycle.  Certain RedHat
distribution versions shipped with amazingly old versions of LAM/MPI,
which I recall caused no end of trouble on the LAM/MPI mailing lists
with questions about long-since-fixed bugs.  How much is it worth to
the Open MPI team to be able to answer those questions with:
rpm -Uvh http://open-mpi.org//open-mpi-1.0-fixed.x86_64.rpm
rather than having to explain how to do "rpmbuild --rebuild"?
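
(For reference, the rebuild itself is only a couple of commands -- the SRPM
filename below is hypothetical, and the output path depends on the local
rpmbuild topdir:

  rpmbuild --rebuild openmpi-1.0-1.src.rpm
  rpm -Uvh /path/to/RPMS/x86_64/openmpi-1.0-1.x86_64.rpm

but it still assumes the builder has working compilers and the right -devel
packages installed, which is exactly the sort of support question that
prebuilt binaries would avoid.)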

I'll suggest that eventually you will want binary RPMs for SUSE 9.3 and
CentOS 4 and/or Scientific Linux 4 in both i386 & x86_64 flavors.
I'm sure you will get demand for a lot of Fedora Core flavors, but I think
that road leads to madness...  I think it might work out better to try and
get Open MPI into Dag Wieers RPM/APT/YUM repositories... see:
   http://dag.wieers.com/home-made/apt/
or the still-under-construction RPMforge site:
   http://rpmforge.net/

That's more than my two cents...
-- 
Tim Mattox - tmat...@gmail.com
  http://homepage.mac.com/tmattox/
I'm a bright... http://www.the-brights.net/