Announcing the Release of MVAPICH2 1.8 GA version and OSU Micro-Benchmarks (OMB) 3.6

2012-05-01 Thread Dhabaleswar Panda
The MVAPICH team is pleased to announce the release of MVAPICH2 1.8 GA
version and OSU Micro-Benchmarks (OMB) 3.6.

The complete set of features and enhancements for MVAPICH2 1.8 compared
to MVAPICH2 1.7 is as follows:

* Features & Enhancements:
- Support for MPI communication from NVIDIA GPU device memory (see
  the sketch after this list)
- High performance RDMA-based inter-node point-to-point
  communication (GPU-GPU, GPU-Host and Host-GPU)
- High performance intra-node point-to-point communication for
  multi-GPU adapters/node (GPU-GPU, GPU-Host and Host-GPU)
- Taking advantage of CUDA IPC (available in CUDA 4.1) in
  intra-node communication for multiple GPU adapters/node
- Enhanced designs for Alltoall and Allgather collective
  communication from GPU device buffers
- Optimized and tuned collectives for GPU device buffers
- MPI datatype support for point-to-point and collective
  communication from GPU device buffers
- Support for running in UD-only mode
- Support for suspend/resume functionality with mpirun_rsh
- Enhanced support for CPU binding with socket and numanode level
  granularity
- Support for showing current CPU bindings
- Exporting local rank, local size, global rank and global
  size through environment variables (both mpirun_rsh and hydra)
- Update to hwloc v1.4.1
- Checkpoint-Restart support in OFA-IB-Nemesis interface
- Enabling run-through stabilization support to handle
  process failures in OFA-IB-Nemesis interface
- Enhancing OFA-IB-Nemesis interface to handle IB errors gracefully
- Performance tuning on various architecture clusters
- Support for Mellanox IB FDR adapter
- Adjust shared-memory communication block size at runtime
- Enable XRC by default at configure time
- New shared memory design for enhanced intra-node small message
  performance
- Tuned inter-node and intra-node performance on different cluster
  architectures
- Support for fallback to R3 rendezvous protocol if RGET fails
- SLURM integration with mpiexec.mpirun_rsh to use SLURM
  allocated hosts without specifying a hostfile
- Support added to automatically use PBS_NODEFILE in Torque and PBS
  environments
- Enable signal-triggered (SIGUSR2) migration
- Reduced memory footprint of the library
- Enhanced one-sided communication design with reduced
  memory requirement
- Enhancements and tuned collectives (Bcast and Alltoallv)
- Flexible HCA selection with Nemesis interface
  - Thanks to Grigori Inozemtsev, Queen's University
- Support for iWARP interoperability between Intel NE020 and
  Chelsio T4 adapters
- The environment variable for enabling RoCE has been renamed from
  MV2_USE_RDMAOE to MV2_USE_RoCE
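
As referenced in the first item above, device pointers can be handed
directly to MPI calls. The sketch below is illustrative only (not MVAPICH2
code): it assumes exactly two ranks, one GPU per process, an arbitrary 1 MB
message, and the MV2_USE_CUDA=1 runtime setting (check the user guide for
the exact flag); error checking is omitted.

/* Build (assumed):  mpicc gpu_exchange.c -lcudart -o gpu_exchange        */
/* Run   (assumed):  MV2_USE_CUDA=1 mpirun_rsh -np 2 h1 h2 ./gpu_exchange */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, peer;
    const int nbytes = 1 << 20;        /* 1 MB message (arbitrary)     */
    void *d_buf;                       /* buffer in GPU device memory  */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0)
            fprintf(stderr, "run this sketch with exactly two ranks\n");
        MPI_Finalize();
        return 1;
    }
    peer = 1 - rank;

    cudaSetDevice(0);                  /* one GPU per process assumed  */
    cudaMalloc(&d_buf, nbytes);
    cudaMemset(d_buf, rank, nbytes);

    if (rank == 0) {
        /* The device pointer goes straight into MPI_Send/MPI_Recv. */
        MPI_Send(d_buf, nbytes, MPI_BYTE, peer, 0, MPI_COMM_WORLD);
        MPI_Recv(d_buf, nbytes, MPI_BYTE, peer, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(d_buf, nbytes, MPI_BYTE, peer, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(d_buf, nbytes, MPI_BYTE, peer, 0, MPI_COMM_WORLD);
    }

    if (rank == 0)
        printf("GPU-to-GPU exchange of %d bytes completed\n", nbytes);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}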

Features and Enhancements for OSU Micro-Benchmarks (OMB) 3.6 are listed
here.

* New Features & Enhancements (since OMB 3.5.1)
- New collective benchmarks (a minimal timing sketch follows this list)
* osu_allgather
* osu_allgatherv
* osu_allreduce
* osu_alltoall
* osu_alltoallv
* osu_barrier
* osu_bcast
* osu_gather
* osu_gatherv
* osu_reduce
* osu_reduce_scatter
* osu_scatter
* osu_scatterv
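
Each of these collective benchmarks reports an average time per call over a
number of iterations, typically after a warm-up phase that is excluded from
the timing. As a rough illustration (not the OMB source; the iteration
count, warm-up count, and message size below are arbitrary), an
osu_allreduce-style measurement loop looks like this:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int iters = 1000, skip = 100;   /* warm-up iterations are skipped */
    const int count = 1024;               /* 4 KB of floats per rank        */
    int i, rank;
    double t_start = 0.0, t_total;
    float *sendbuf, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    sendbuf = malloc(count * sizeof(float));
    recvbuf = malloc(count * sizeof(float));
    for (i = 0; i < count; i++)
        sendbuf[i] = 1.0f;

    for (i = 0; i < iters + skip; i++) {
        if (i == skip) {
            MPI_Barrier(MPI_COMM_WORLD);  /* start timing together */
            t_start = MPI_Wtime();
        }
        MPI_Allreduce(sendbuf, recvbuf, count, MPI_FLOAT, MPI_SUM,
                      MPI_COMM_WORLD);
    }
    t_total = MPI_Wtime() - t_start;

    if (rank == 0)
        printf("avg MPI_Allreduce latency: %.2f us\n",
               1.0e6 * t_total / iters);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}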

MVAPICH2 1.8 continues to deliver excellent performance. Sample
performance numbers include:

  OpenFabrics/Gen2 on Sandy Bridge 8-core (2.6 GHz) with PCIe-Gen3
  and ConnectX-3 FDR (Two-sided Operations):
- 1.05 microsec one-way latency (4 bytes)
- 6344 MB/sec unidirectional bandwidth
- 11994 MB/sec bidirectional bandwidth

  OpenFabrics/Gen2-RoCE (RDMA over Converged Ethernet) Support on
  Sandy Bridge 8-core (2.6 GHz) with ConnectX-3 EN (40GigE)
  (Two-sided operations):
- 1.2 microsec one-way latency (4 bytes)
- 4565 MB/sec unidirectional bandwidth
- 9117 MB/sec bidirectional bandwidth

  Intra-node performance on Sandy Bridge 8-core (2.6 GHz)
  (Two-sided operations, intra-socket)
- 0.19 microsec one-way latency (4 bytes)
- 9643 MB/sec unidirectional bandwidth
- 16941 MB/sec bidirectional bandwidth
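
For context, the latency numbers above are typically measured with a
ping-pong test and the bandwidth numbers with a windowed streaming test in
the style of osu_bw. A rough sketch of such a bandwidth loop is shown
below; it is not the OMB source, and the window size, message size, and
iteration count are arbitrary illustrative values.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define WINDOW   64            /* messages in flight per iteration (arbitrary) */
#define ITERS    100
#define MSGSIZE  (1 << 20)     /* 1 MB per message (arbitrary) */

int main(int argc, char **argv)
{
    int rank, i, w;
    char *buf;                 /* WINDOW disjoint slots keep the receives legal */
    MPI_Request req[WINDOW];
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc((size_t)WINDOW * MSGSIZE);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < ITERS; i++) {
        if (rank == 0) {                       /* sender */
            for (w = 0; w < WINDOW; w++)
                MPI_Isend(buf + (size_t)w * MSGSIZE, MSGSIZE, MPI_CHAR,
                          1, w, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Recv(buf, 1, MPI_CHAR, 1, 999, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);       /* wait for receiver's ack */
        } else if (rank == 1) {                /* receiver */
            for (w = 0; w < WINDOW; w++)
                MPI_Irecv(buf + (size_t)w * MSGSIZE, MSGSIZE, MPI_CHAR,
                          0, w, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Send(buf, 1, MPI_CHAR, 0, 999, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("unidirectional bandwidth: %.0f MB/sec\n",
               (double)MSGSIZE * WINDOW * ITERS / (t1 - t0) / 1.0e6);

    free(buf);
    MPI_Finalize();
    return 0;
}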

Sample performance numbers for MPI communication from NVIDIA GPU memory
using MVAPICH2 1.8 and OMB 3.6 can be obtained from the following URL:

http://mvapich.cse.ohio-state.edu/performance/gpu.shtml

Performance numbers for several other platforms and system configurations
can be viewed by visiting the `Performance' section of the project's web page.

For downloading MVAPICH2 1.8, OMB 3.6, associated user guide, quick start
guide, and accessing the SVN, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-disc...@cse.ohio-state.edu).

We are also happy to inform that the number of organizations using
MVAPICH/

Announcing the Release of MVAPICH2 1.7 and OSU Micro-Benchmarks (OMB) 3.4

2011-10-14 Thread Dhabaleswar Panda
The MVAPICH team is pleased to announce the release of MVAPICH2-1.7
and OSU Micro-Benchmarks (OMB) 3.4.

The complete set of Features, Enhancements, and Bug Fixes for MVAPICH2
1.7 (since the MVAPICH2-1.6 release) is listed here.

- Based on MPICH2-1.4.1p1
- Integrated Hybrid (UD-RC/XRC) design to get best performance
  on large-scale systems with reduced/constant memory footprint
- CH3 shared memory channel for standalone hosts
  (including laptops) without any InfiniBand adapters
- HugePage support
- Improved intra-node shared memory communication performance
- Shared memory backed windows for One-Sided Communication
- Support for truly passive locking for intra-node RMA in shared-memory
  and LIMIC-based windows (see the sketch after this list)
- Improved on-demand InfiniBand connection setup (CH3 and RoCE)
- Tuned RDMA Fast Path Buffer size to get better performance
  with less memory footprint (CH3 and Nemesis)
- Support for large data transfers (>2GB)
- Integrated with enhanced LiMIC2 (v0.5.5) to support intra-node
  large message (>2GB) transfers
- Optimized Fence synchronization (with and without
  LIMIC2 support)
- Automatic intra-node communication parameter tuning
  based on platform
- Efficient connection set-up for multi-core systems
- Enhanced designs and tuning for collectives
  (bcast, reduce, barrier, gather, allreduce, allgather,
  gatherv, allgatherv and alltoall)
- Support for shared-memory collectives for modern clusters
  with up to 64 cores/node
- MPI_THREAD_SINGLE provided by default and
  MPI_THREAD_MULTIPLE as an option
- Fast process migration using RDMA
- Enabling Checkpoint/Restart support in pure SMP mode
- Compact and shorthand way to specify blocks of processes
  on the same host with mpirun_rsh
- Support for the latest stable version of HWLOC (v1.2.2)
- Enhanced mpirun_rsh design to avoid race conditions,
  support for fault-tolerance functionality and
  improved debug messages
- Enhanced debugging config options to generate
  core files and back-traces
- Automatic inter-node communication parameter tuning
  based on platform and adapter detection (Nemesis)
- Integrated with latest OSU Micro-benchmarks (3.4)
- Improved performance for medium sized messages (QLogic PSM interface)
- Multi-core-aware collective support (QLogic PSM interface)
- Performance optimization for QDR cards
- Support for Chelsio T4 Adapter
- Support for Ekopath Compiler
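
As referenced in the passive-locking item above, "truly passive" means the
target process makes no MPI calls at all while an origin locks its window
and performs RMA. The sketch below shows the generic MPI-2 usage pattern
this feature accelerates; it is plain MPI code (not MVAPICH2 internals),
assumes at least two ranks, and the payload value is arbitrary.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every rank exposes one int through a window. */
    MPI_Win_create(&value, sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);

    if (rank == 0) {
        /* Passive-target epoch: rank 1 makes no MPI_Win_* calls here. */
        int payload = 42;
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
        MPI_Put(&payload, 1, MPI_INT, 1 /* target */, 0 /* disp */,
                1, MPI_INT, win);
        MPI_Win_unlock(1, win);
    }

    MPI_Barrier(MPI_COMM_WORLD);          /* ensure the put has completed */

    if (rank == 1) {
        /* Lock our own window so the updated value is safely visible. */
        MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
        printf("rank 1 received %d via passive-target MPI_Put\n", value);
        MPI_Win_unlock(1, win);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}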

Bug Fixes:

- Fixes in Checkpoint/Restart and Migration support
- Fix Restart when using automatic checkpoint
  - Thanks to Alexandr for reporting this
- Handling very large one-sided transfers using RDMA
- Fixes for memory leaks
- Graceful handling of unknown HCAs
- Better handling of shmem file creation errors
- Fix for a hang in intra-node transfer
- Fix for a build error with --disable-weak-symbols
  - Thanks to Peter Willis for reporting this issue
- Fixes for one-sided communication with passive target
  synchronization
- Better handling of memory allocation and registration failures
- Fixes for compilation warnings
- Fix a bug that disallows '=' in mpirun_rsh arguments
- Handling of non-contiguous transfer in Nemesis interface
- Bug fix in gather collective when ranks are in cyclic order
- Fix for the ignore_locks bug in MPI-IO with Lustre
- Compiler preference lists reordered to avoid mixing GCC and Intel
  compilers if both are found by configure
- Fix a bug in transferring very large messages (>2GB)
   - Thanks to Tibor Pausz from Univ. of Frankfurt for reporting it
- Fix a hang with One-Sided Put operation
- Fix a bug in ptmalloc integration
- Avoid double-free crash with mpispawn
- Avoid crash and print an error message in mpirun_rsh when the
  hostfile is empty
- Checking for error codes in PMI design
- Verify programs can link with LiMIC2 at runtime
- Fix for compilation issue when BLCR or FTB installed in
  non-system paths
- Fix an issue with RDMA-Migration
- Fix a hang with RDMA CM
- Fix an issue in supporting RoCE with the second port available on the HCA
  - Thanks to Jeffrey Konz from HP for reporting it
- Fix for a hang with passive RMA tests (QLogic PSM interface)

New features, Enhancements and Bug Fixes of OSU Micro-Benchmarks (OMB)
3.4 (since OMB 3.3 release) are listed here.

* New Features & Enhancements

- Add passive one-sided communication benchmarks
- Update one-sided communication benchmarks to provide a shared
  memory hint in MPI_Alloc_mem calls (see the sketch after this list)
- Update one-sided communication benchmarks to use MPI_Alloc_mem
  for buffer allocation
- Give default values to configure definitions (can now build
  directly with mpicc)
- Update latency benchmarks to begin from 0 byte message
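
As referenced above, the one-sided benchmarks now allocate their buffers
with MPI_Alloc_mem and pass an info hint asking for shared memory. The
sketch below shows the general pattern; the hint key "alloc_shm" and its
value are an assumption for illustration (MPI_Alloc_mem hints are
implementation-defined, and an MPI library that does not recognize the key
simply ignores it).

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Info info;
    void *buf;
    const MPI_Aint nbytes = 1 << 20;   /* 1 MB buffer (arbitrary) */

    MPI_Init(&argc, &argv);

    /* Hint key/value are assumptions for illustration only. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "alloc_shm", "true");

    MPI_Alloc_mem(nbytes, info, &buf);
    MPI_Info_free(&info);

    /* ... use buf as the origin/target buffer of one-sided operations ... */
    printf("allocated %ld bytes via MPI_Alloc_mem\n", (long)nbytes);

    MPI_Free_mem(buf);
    MPI_Finalize();
    return 0;
}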

* Bug Fixes

  

Re: [RFC v2] [OFED] libibverbs: Support both OFED verbs and ibverbs

2011-07-14 Thread Dhabaleswar Panda
Sean,

Both MVAPICH1 and MVAPICH2 have had XRC support for many years and are
being used on many large-scale production clusters.

For MVAPICH1, this support is available in both the standard Gen2
interface and the Hybrid (UD-RC/XRC) interface.

From the MVAPICH1 user guide (sec 4.4.1), you can find the following:

"eXtended Reliable Connnection Support (XRC): By default this support is
compiled in to allow the usage of the new scalable XRC transport of
InfiniBand. OFED 1.3 or later is required. If using an older version, then
remove the -DXRC from the CFLAGS variable."

The runtime flag is VIADEV_USE_XRC

Class: Run Time
Applicable device(s): Gen2
When MVAPICH is compiled with the XRC CFLAGS, this parameter enables use
of the XRC transport of InfiniBand available on certain adapters. Enabling
XRC automatically enables Shared Receive Queue and on-demand connection
management.

The hybrid (UD-RC/XRC) interface automatically selects the appropriate
interface at run time.

Not sure which interface (gen2 or hybrid) you are using. If you are using
the gen2 interface, it looks like you are not using the correct runtime flag
to enable XRC with MVAPICH1.

Thanks,

DK


On Thu, 14 Jul 2011, Jeff Squyres wrote:

> Sean pinged me last night about XRC in Open MPI (note that I am no
> longer on the linux-rdma list).
>
> Open MPI uses XRC, but in a non-default manner -- the user has to 
> specifically ask for it at run time.
>
>
> On Jul 14, 2011, at 9:13 AM, Jack Morgenstein wrote:
>
> > Hi Sean,
> >
> > I am pleased that you are putting in the effort to enable the existing OFED 
> > user base to continue using its
> > code without changes to the XRC calls.
> >
> > Regarding XRC and MPI, see below.
> >
> > On Wednesday 13 July 2011 20:21, Hefty, Sean wrote:
> >> I was able to build and run mvapich2 successfully against libibverbs with
> >> this patch applied on top of the current XRC patches.  (The XRC patches are
> >> still undergoing work.)  I built mvapich2 using the following configure 
> >> options:
> >>
> >> --with-rdma=gen2 CFLAGS=-DOFED_VERBS
> >> and
> >> --with-rdma=gen2 CFLAGS='-DOFED_VERBS -D_ENABLE_XRC_'
> >>
> >> It didn't appear that mvapich ever used XRC.
> >>
> > You are correct, mvapich does not use XRC.
> > openMPI uses XRC, so hopefully you can use openMPI to test out your XRC 
> > stuff.
> >
> > You can contact Jeff Squyres for details/help.
> >
> > In the meantime, I include the following from the ewg list:
> > =
> > On 11/08/2010 08:06 PM, Jeff Squyres wrote:
> >> Steve pinged me on IM this morning and told me that you want OMPI v1.4.3 
> >> for the next OFED release.  I just logged into www.openfabrics.org
> >> and apparently the server has changed -- my entire $HOME is empty.
> >>
> >> Where do you want me to put the new OMPI SRPM?  Alternatively, anyone can 
> >> grab the SRPM from the URL below
> >> -- there's nothing special about the SRPM for OpenFabrics that's not 
> >> already in our community SRPM:
> >>
> >> http://www.open-mpi.org/software/ompi/v1.4/
> >>
> >
> > Hi Jeff,
> > The place for the Open MPI on the new server is under:
> > /var/www/openfabrics.org/downloads/openmpi/   
> > (http://www.openfabrics.org/downloads/openmpi/)
> >
> > I updated Open MPI version there to v1.4.3.
> >
> > Regards,
> > Vladimir
> > ==
> >
> > -Jack
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>



Announcing the Release of MVAPICH2 1.6

2011-03-09 Thread Dhabaleswar Panda
The MVAPICH team is pleased to announce the release of MVAPICH2 1.6
with the following NEW features/enhancements and bug fixes:

* NEW Features and Enhancements (since MVAPICH2-1.5.1)

- Optimization and enhanced performance for clusters with NVIDIA
  GPU adapters (with and without GPUDirect technology)
- Support for InfiniBand Quality of Service (QoS) with multiple lanes
- Support for 3D torus topology with appropriate SL settings
- For both CH3 and Nemesis interfaces
- Thanks to Jim Schutt, Marcus Epperson and John Nagle from
  Sandia for the initial patch
- Enhanced R3 rendezvous protocol
- For both CH3 and Nemesis interfaces
- Robust RDMA Fast Path setup to avoid memory allocation
  failures
- For both CH3 and Nemesis interfaces
- Multiple design enhancements for better performance of
  small and medium sized messages
- Using LiMIC2 for efficient intra-node RMA transfer to avoid extra
  memory copies
- Upgraded to LiMIC2 version 0.5.4
- Support of Shared-Memory-Nemesis interface on multi-core platforms
  requiring intra-node communication only (SMP-only systems,
  laptops, etc.)
- Enhancements to mpirun_rsh job start-up scheme on large-scale systems
- Optimization in MPI_Finalize
- XRC support with Hydra Process Manager
- Updated Hydra launcher with MPICH2-1.3.3 Hydra process manager
- Hydra is the default mpiexec process manager
- Enhancements and optimizations for one sided Put and Get operations
- Removing the limitation on number of concurrent windows in RMA
  operations
- Optimized thresholds for one-sided RMA operations
- Support for process-to-rail binding policy (bunch, scatter and
  user-defined) in multi-rail configurations (OFA-IB-CH3, OFA-iWARP-CH3,
  and OFA-RoCE-CH3 interfaces)
- Enhancements to Multi-rail Design and features including striping
  of one-sided messages
- Dynamic detection of multiple InfiniBand adapters and using these
  by default in multi-rail configurations (OFA-IB-CH3, OFA-iWARP-CH3
  and OFA-RoCE-CH3 interfaces)
- Optimized and tuned algorithms for Gather, Scatter, Reduce,
  AllReduce and AllGather collective operations
- Enhanced support for multi-threaded applications
- Fast Checkpoint-Restart support with aggregation scheme
- Job Pause-Migration-Restart Framework for Pro-active Fault-Tolerance
- Support for new standardized Fault Tolerant Backplane (FTB) Events
  for Checkpoint-Restart and Job Pause-Migration-Restart Framework
- Enhanced designs for automatic detection of various
  architectures and adapters
- Configuration file support (similar to the one available in MVAPICH).
  Provides a convenient method for handling all runtime variables
  through a configuration file.
- User-friendly configuration options to enable/disable various
  checkpoint/restart and migration features
- Enabled ROMIO's auto detection scheme for filetypes
  on Lustre file system
- Improved error checking for system and BLCR calls in
  checkpoint-restart and migration codepath
- Enhanced OSU Micro-benchmarks suite (version 3.3)
- Building and installation of OSU micro benchmarks during default
  MVAPICH2 installation
- Improved configure help for MVAPICH2 features
- Improved usability of process to CPU mapping with support of
  delimiters (',' and '-') in CPU listing
- Thanks to Gilles Civario for the initial patch
- Use of gfortran as the default F77 compiler

* Bug fixes (since MVAPICH2-1.5.1)

- Fix for shmat() return code check
- Fix for issues in one-sided RMA
- Fix for issues with inter-communicator collectives in Nemesis
- KNEM patch for osu_bibw issue with KNEM version 0.9.2
- Fix for osu_bibw error with Shared-memory-Nemesis interface
- Fix for a hang in collective when thread level is set to multiple
- Fix for Intel test errors with rsend, bsend and ssend
  operations in Nemesis
- Fix for a memory free issue with memory allocated by scandir
- Fix for a hang in Finalize
- Fix for issue with MPIU_Find_local_and_external when it is called
  from MPIDI_CH3I_comm_create
- Fix for handling CPPFLAGS values with spaces
- Dynamic Process Management to work with XRC support
- Fix related to disabling CPU affinity when shared memory is
  turned off at run time
- Resolving a hang in mpirun_rsh termination when CR is enabled
- Fixing an issue in MPI_Allreduce and MPI_Reduce when called with MPI_IN_PLACE
- Thanks to the initial patch by Alexander Alekhin
- Fix for threading related errors with comm_dup
- Fix for alignment issues in RDMA Fast Path
- Fix for extra memcpy in header caching
- Only set FC and F77 if gfortran is executable
- Fix in aggregate ADIO alignment
- XRC connection management
- Fixes in registration cache

Announcing the Release of MVAPICH2 1.5.1

2010-09-14 Thread Dhabaleswar Panda
The MVAPICH team is pleased to announce the release of MVAPICH2 1.5.1
with the following NEW features/enhancements and bug fixes:

* NEW Features and Enhancements (since MVAPICH2-1.5)

  - Significantly reduce memory footprint on some systems by changing the
stack size setting for multi-rail configurations
  - Optimization to the number of RDMA Fast Path connections
  - Performance improvements in Scatterv and Gatherv collectives for CH3
interface (Thanks to Dan Kokran and Max Suarez of NASA for identifying
the issue)
  - Tuning of Broadcast Collective
  - Support for tuning of eager thresholds based on both adapter and
platform type
  - Environment variables for message sizes can now be expressed in short
form K=Kilobytes and M=Megabytes (e.g. MV2_IBA_EAGER_THRESHOLD=12K)
  - Ability to selectively use some or all HCAs using colon separated
lists. e.g. MV2_IBA_HCA=mlx4_0:mlx4_1
  - Improved Bunch/Scatter mapping for process binding with HWLOC and SMT
support (Thanks to Dr. Bernd Kallies of ZIB for ideas and suggestions)
  - Update to Hydra code from MPICH2-1.3b1
  - Auto-detection of various iWARP adapters
  - Specifying MV2_USE_IWARP=1 is no longer needed when using iWARP
  - Changing automatic eager threshold selection and tuning for iWARP
adapters based on number of nodes in the system instead of the number
of processes
  - PSM progress loop optimization for QLogic Adapters (Thanks to Dr.
Avneesh Pant of QLogic for the patch)

* Bug fixes (since MVAPICH2-1.5)

  - Fix memory leak in registration cache with --enable-g=all
  - Fix memory leak in operations using datatype modules
  - Fix for rdma_cross_connect issue for RDMA CM. The server is prevented
from initiating a connection
  - Don't fail during build if RDMA CM is unavailable
  - Various mpirun_rsh bug fixes for CH3, Nemesis and uDAPL interfaces
  - ROMIO panfs build fix
  - Update panfs for not-so-new ADIO file function pointers
  - Shared libraries can be generated with unknown compilers
  - Explicitly link against DL library to prevent build error due to DSO
link change in Fedora 13 (introduced with gcc-4.4.3-5.fc13)
  - Fix regression that prevents the proper use of our internal HWLOC
component
  - Remove spurious debug flags when certain options are selected at
build time
  - Error code added for situation when received eager SMP message is
larger than receive buffer
  - Fix for Gather and GatherV back-to-back hang problem with LiMIC2
  - Fix for packetized send in Nemesis
  - Fix related to eager threshold in nemesis ib-netmod
  - Fix initialization parameter for Nemesis based on adapter type
  - Fix for uDAPL one-sided operations (Thanks to Jakub Fedoruk from
Intel for reporting this)
  - Fix an issue with out-of-order message handling for iWARP
  - Fixes for a memory leak and shared context handling in PSM for QLogic
adapters (Thanks to Dr. Avneesh Pant of QLogic for the patch)

For downloading MVAPICH2 1.5.1, associated user guide and accessing the
SVN, please visit the following URL:

http://mvapich.cse.ohio-state.edu

MVAPICH2 1.5.1 is also being made available with OFED 1.5.2.

All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-disc...@cse.ohio-state.edu).

We are also happy to inform that the number of organizations using
MVAPICH/MVAPICH2 and registered at the MVAPICH site has crossed 1,250
world-wide (in 59 countries). The list of organizations can be
obtained from http://mvapich.cse.ohio-state.edu/current_users/. The
MVAPICH team extends thanks to all these organizations.

Thanks,

The MVAPICH Team





Announcing the Release of MVAPICH2 1.5

2010-07-12 Thread Dhabaleswar Panda
The MVAPICH team is pleased to announce the release of MVAPICH2 1.5 with
the following NEW features/enhancements and bug fixes:

* NEW Features and Enhancements (since MVAPICH2-1.4.1)

 - MPI 2.2 standard compliant
 - Based on MPICH2 1.2.1p1
 - OFA-IB-Nemesis interface design
- OpenFabrics InfiniBand network module support for
  MPICH2 Nemesis modular design
- Support for high-performance intra-node shared memory
  communication provided by the Nemesis design
- Adaptive RDMA Fastpath with Polling Set for high-performance
  inter-node communication
- Shared Receive Queue (SRQ) support with flow control,
  uses significantly less memory for MPI library
- Header caching
- Advanced AVL tree-based Resource-aware registration cache
- Memory Hook Support provided by integration with ptmalloc2
  library. This provides safe release of memory to the
  Operating System and is expected to benefit the memory
  usage of applications that heavily use malloc and free operations.
- Support for TotalView debugger
- Shared Library Support for existing binary MPI application
  programs to run
- ROMIO Support for MPI-IO
- Support for additional features (such as hwloc,
  hierarchical collectives, one-sided, multithreading, etc.),
  as included in the MPICH2 1.2.1p1 Nemesis channel
 - Flexible process manager support
- mpirun_rsh to work with any of the eight interfaces
  (CH3 and Nemesis channel-based) including OFA-IB-Nemesis,
  TCP/IP-CH3 and TCP/IP-Nemesis
- Hydra process manager to work with any of the eight interfaces
  (CH3 and Nemesis channel-based) including OFA-IB-CH3,
  OFA-iWARP-CH3, OFA-RoCE-CH3 and TCP/IP-CH3
 - MPIEXEC_TIMEOUT is honored by mpirun_rsh
 - Support for hwloc library (1.0.1) for defining CPU affinity
 - Deprecating older PLPA support for defining CPU affinity
   in favor of HWLOC
 - Efficient CPU binding policies (bunch and scatter) to
   specify CPU binding per job for modern multi-core platforms
 - New flag in mpirun_rsh to execute tasks with different group IDs
 - Enhancement to the design of Win_complete for RMA operations
 - Flexibility to support variable number of RMA windows
 - Support for Intel iWARP NE020 adapter
- Tuning for Intel iWARP NE020 adapter, thanks to Harry
  Cropper of Intel
 - SRQ turned on by default for Nemesis interface
 - Performance tuning - adjusted eager thresholds for
   variety of architectures, vbuf size based on adapter
   types and vbuf pool sizes
 - Introduction of a retry mechanism for RDMA_CM connection
   establishment

* Bug fixes (since MVAPICH2-1.4.1)

 - Fix compilation error when configured with
   `--enable-thread-funneled'
 - Fix MPE functionality, thanks to Anthony Chan for
   reporting and providing the resolving patch
 - Cleanup after a failure in the init phase is handled better by
   mpirun_rsh
 - Path determination is correctly handled by mpirun_rsh when DPM is
   used
 - Shared libraries are correctly built (again)
 - Compilation issue with the ROMIO adio-lustre driver, thanks
   to Adam Moody of LLNL for reporting the issue
 - Allowing checkpoint-restart for large-scale systems
 - Correcting a bug in clear_kvc function. Thanks to T J (Chris) Ward,
   IBM Research, for reporting and providing the resolving patch
 - Shared lock operations with RMA with scatter process distribution.
   Thanks to Pavan Balaji of Argonne for reporting this issue
 - Fix a bug during window creation in uDAPL
 - Compilation issue with --enable-alloca, thanks to E. Borisch
   for reporting and providing the patch
 - Improved error message for ibv_poll_cq failures
 - Fix an issue that prevents mpirun_rsh from executing programs found
   in directories in PATH unless the full path is specified
 - Fix an issue of mpirun_rsh with Dynamic Process Migration (DPM)
 - Fix for memory leaks (both CH3 and Nemesis interfaces)
 - Updatefiles correctly update LiMIC2
 - Several fixes to the registration cache
   (CH3, Nemesis and uDAPL interfaces)
 - Fix to multi-rail communication
 - Fix to Shared Memory communication Progress Engine
 - Fix to all-to-all collective for large number of processes
 - Fix in build process with hwloc (for some Distros)
 - Fix for memory leak (Nemesis interface)

MVAPICH2 1.5 is being made available with OFED 1.5.2. It continues
to deliver excellent performance. Sample performance numbers include:

  OpenFabrics/Gen2 on Nehalem quad-core (2.4 GHz) with PCIe-Gen2
  and ConnectX2-QDR (Two-sided Operations):
- 1.62 microsec one-way latency (4 bytes)
- 3021 MB/sec unidirectional bandwidth
- 5858 MB/sec bidirectional bandwidth

  QLogic InfiniPath Support on Nehalem quad-core (2.4 GHz) with
  PCIe-Gen2 and QLogic-DDR (Two-sided Operations):
- 2.35 microsec one-way latency (4 bytes)
- 1910 MB/sec unidirectional bandwidth
- 3184 MB/sec bidirectional bandwidth

  OpenFabrics/Gen2-RoCE (RDMA over Converged Ethernet) Support

Re: [PATCH 2/4] ib_core: implement XRC RCV qp's

2010-05-08 Thread Dhabaleswar Panda
Hi Jack,

Thanks for your note and the suggested changes. I will discuss this with
my team members and get back to you with our thoughts next week.

Thanks,

DK

On Sat, 8 May 2010, Jack Morgenstein wrote:

> Dr. Panda, Jeff, and Ishai,
>
> We are trying to get XRC integrated into the next mainstream kernel.
>
> For the kernel submission, I added a destroy_xrc_rcv_qp method (to be
> used if the application did not require persistence of the xrc_rcv qp
> after the creating process terminated -- per Diego Copernicoff's request).
> This did not affect the core API of create/modify/unreg that you have
> been using until now.
>
> However, even without the new destroy method (as I suggest below),
> having the creating process call unreg is still a bit counterintuitive,
> since it calls create, and registration is a side-effect.
>
> Roland is now intensively reviewing the XRC patches, and made a suggestion
> to simplify the API, which Tziporet and I agree with (see Roland's comments
> below).
>
> Please comment on this suggestion (which is to have reg_xrc_rcv_qp do create
> as well).
> This is a minor change that would require two changes in your current calls:
> 1. Instead of calling create_xrc_rcv_qp(), as is done currently, MPI would 
> call
>u32 qp_num = 0x;
>   err = reg_xrc_rcv_qp(xrcd, &qp_num);
>and would have the created qp number returned in qp_num;
>(the qp_init attributes in the old create_xrc_rcv_qp are all ignored 
> except for
>the xrc domain handle anyway)
>
> 2. instead of calling reg_xrc_rcv_qp(xrcd, qp_num), you would need to set the
>qp number in a u32 variable, and call reg_xrc_rcv_qp(xrcd, &qp_num).
>
> The other xrc_rcv_qp verbs would work as they work now.
>
> Regarding OFED, this change would not affect OFED 1.5.x ; it would only enter
> OFED at 1.6.x.
>
> Please comment.
>
> -Jack
>
> P.S. You can see the submission/discussion of XRC starting at:
>   http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg02792.html
> On Thursday 06 May 2010 01:40, Roland Dreier wrote:
> >  > > I don't really understand the semantics here.  What is supposed to
> >  > > happen if I do create/reg/destroy?  What happens if one process does
> >  > > destroy while another process is still registered?
> >
> >  > Maybe we can simply assert that the unreg IS the destroy method of the
> >  > IB_SPEC, and get rid of the destroy method.
> >  >
> >  > The xrc target qp section of the spec was not written with QP persistence
> >  > (after the creating process exited) in mind.  That requirement surfaced
> >  > at the last minute as a result of testing by the MPI community during the
> >  > implementation phase (as far as I know).  Unfortunately, this created
> >  > a semantic problem.
> >
> > Yes, I think we should try to simplify things here.
> >
> > It's very unfortunate to diverge from the API that's been shipped for a
> > while now, but I really think we don't want all these different ways of
> > saying the same thing, with little difference between create and reg,
> > and between destroy and unreg.
> >
> > In fact the smallest possible API would be just
> >
> >   register_xrc_rcv_qp(xrcd, *qp_num)
> >
> > where the user can pass in an invalid qp_num (say, -1 aka ) and
> > have a new QP created, or a valid one to take a new ref on the existing
> > rcv QP, and
> >
> >   unregister_xrc_rcv_qp(xrcd, qp_num).
> >
> > (along these lines, the structure in these patches:
> >
> > +struct ib_uverbs_create_xrc_rcv_qp {
> > +   __u64 response;
> > +   __u64 user_handle;
> > +   __u32 xrcd_handle;
> > +   __u32 max_send_wr;
> > +   __u32 max_recv_wr;
> > +   __u32 max_send_sge;
> > +   __u32 max_recv_sge;
> > +   __u32 max_inline_data;
> > +   __u8  sq_sig_all;
> > +   __u8  qp_type;
> > +   __u8  reserved[6];
> > +   __u64 driver_data[0];
> > +};
> >
> > has many fields we don't need.  Pretty much all the fields after
> > xrcd_handle are ignored, except sq_sig_all is used -- and that is highly
> > dubious since the rcv QP has no SQ!  So I would propose something like
> > just having:
> >
> > +struct ib_uverbs_reg_xrc_rcv_qp {
> > +   __u64 response;
> > +   __u32 xrcd_handle;
> > +   __u32 qp_num;
> > +   __u64 driver_data[0];
> > +};
> >
> > where response is used to pass back the qp_num in the create case.
> >
> > And then we just have unreg_xrc_rcv_qp and no destroy method (since they
> > are synonymous anyway).
>



Re: 1.5 rc 1 install/build error

2009-10-18 Thread Dhabaleswar Panda
Pasha from Mellanox has uploaded a new mvapich RPM build to the OpenFabrics
website (http://www.openfabrics.org/~pasha/ofed_1_5/mvapich/). The new build
should resolve the RPM build problems you are seeing on the Fedora 10 OS.
Let us know if the problem persists with the latest build.

Thanks,

DK


On Thu, 8 Oct 2009, Jeremy Enos wrote:

> This is the HPC build option:
>
> ---
> /var/tmp/mvapich-1.1.0-3390-root/usr/mpi/gcc/mvapich-1.1.0/bin/mpicc -o
> cpi cpi.o -lm
> /var/tmp/mvapich-1.1.0-3390-root/usr/mpi/gcc/mvapich-1.1.0/bin/mpif77
> -fPIC -L/usr/lib64 -Wall -c pi3.f
> /var/tmp/mvapich-1.1.0-3390-root/usr/mpi/gcc/mvapich-1.1.0/bin/mpif77 -o
> pi3 pi3.o
> /var/tmp/mvapich-1.1.0-3390-root/usr/mpi/gcc/mvapich-1.1.0/bin/mpif90
> -Wall -c pi3f90.f90
> /var/tmp/mvapich-1.1.0-3390-root/usr/mpi/gcc/mvapich-1.1.0/bin/mpif90
> -o pi3f90 pi3f90.o
> rm -f *.o *~ PI* cpi  pi3 simpleio hello++ pi3f90 cpilog
> rm -rf SunWS_cache ii_files pi3f90.f pi3p cpip *.ti *.ii
> installed MPICH in
> /var/tmp/mvapich-1.1.0-3390-root/usr/mpi/gcc/mvapich-1.1.0
> /var/tmp/mvapich-1.1.0-3390-root/usr/mpi/gcc/mvapich-1.1.0/sbin/mpiuninstall
> may be used to remove the installation.
> + rm -f
> /var/tmp/mvapich-1.1.0-3390-root//usr/mpi/gcc/mvapich-1.1.0/share/examples/mpirun
> + cat
> + cat
> + /usr/lib/rpm/redhat/brp-compress
> + /usr/lib/rpm/redhat/brp-strip /usr/bin/strip
> + /usr/lib/rpm/redhat/brp-strip-static-archive /usr/bin/strip
> + /usr/lib/rpm/redhat/brp-strip-comment-note /usr/bin/strip /usr/bin/objdump
> + /usr/lib/rpm/brp-python-bytecompile
> + /usr/lib/rpm/redhat/brp-python-hardlink
> + /usr/lib/rpm/redhat/brp-java-repack-jars
> Processing files: mvapich_gcc-1.1.0-3390
> error: File not found:
> /var/tmp/OFED_topdir/BUILDROOT/mvapich_gcc-1.1.0-3390.x86_64/usr/mpi/gcc/mvapich-1.1.0
>
>
> RPM build errors:
> File not found:
> /var/tmp/OFED_topdir/BUILDROOT/mvapich_gcc-1.1.0-33902.6.27.35-170.2.94.fc10.x86_64
> .x86_64/usr/mpi/gcc/mvapich-1.1.0
>
> ---
>
> OS: Fedora Core 10 x64, up to date as of today
> Kernel:  2.6.27.35-170.2.94.fc10.x86_64
>
> This seems like more of a packaging problem than a build one.
>
> Jeremy
>
