[ewg] Re: [PATCH] docs: Update NFSRDMA SLES10 comment
Jon Mason wrote:
> Update the NFS SLES10 comment to reflect the rnfs-utils rpm updating the
> nfsserver scripts.
>
> Signed-off-by: Jon Mason j...@opengridcomputing.com
> ---
>  nfs-rdma.release-notes.txt |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/nfs-rdma.release-notes.txt b/nfs-rdma.release-notes.txt
> index a4369e4..f876908 100644
> --- a/nfs-rdma.release-notes.txt
> +++ b/nfs-rdma.release-notes.txt
> @@ -187,8 +187,9 @@ NFS/RDMA Setup
>  rpc.statd by default. However, the in-kernel lockd that was in SLES10 has
>  been removed in the new kernels. Since OFED is back-porting the new code to
>  the older distro's, there is no in-kernel lockd in SLES10 and the SLES10
> -nfsserver scripts do not know the need to start it. Therefore, the user
> -MUST start rpc.statd.
> +nfsserver scripts do not know the need to start it. Therefore, the
> +nfsserver scripts will be modified when the rnfs-utils rpm is installed to
> +start/stop rpc.statd.
>
>  - On the client system

Applied,

Regards,
Vladimir
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
[ewg] Re: [PATCH] docs: Document NFSRDMA mlx4 firmware requirement
Jon Mason wrote:
> Per Bug 1815, Document the NFSRDMA requirement for ConnectX 2.7.0
> firmware.
>
> Signed-off-by: Jon Mason j...@opengridcomputing.com
> ---
>  nfs-rdma.release-notes.txt |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/nfs-rdma.release-notes.txt b/nfs-rdma.release-notes.txt
> index f876908..0e966e3 100644
> --- a/nfs-rdma.release-notes.txt
> +++ b/nfs-rdma.release-notes.txt
> @@ -219,3 +219,8 @@ a 64KB page size (like PPC64 and IA64 systems) and your server is using a
>  4KB page size (like i386 and X86_64), then you need to mount the server using
>  rsize=32768,wsize=32768 to avoid overrunning the Chelsio RNIC fast register
>  limits. This is a known firmware limitation in the Chelsio RNIC.
> +
> +Running NFSRDMA over Mellanox's ConnectX HCA requires that the adapter firmware
> +be 2.7.0 or greater on all NFS clients and servers. Firmware 2.6.0 has known
> +issues that prevent the RDMA connection from being established. Firmware 2.7.0
> +has resolved these issues.

Applied,

Regards,
Vladimir
[ewg] Re: [ANNOUNCE] libnes-1.0.0 release
Tung, Chien Tin wrote:
> New release of libnes library (1.0.0) is available at:
>
> http://www.openfabrics.org/downloads/nes/libnes-1.0.0.tar.gz
>
> sha1sum:
> e9d73b46cbc6c000f9904874fb1aba874b6ab8cf  libnes-1.0.0.tar.gz
>
> Changes since last release:
>
> Chien Tung (2):
>       libnes: sync up with function prototype changes in libibverbs-1.1.3
>       libnes: update version number to 1.0.0
>
> Don Wood (2):
>       libnes: Fix a thread safety problem on send and recv paths.
>       libnes: Change the fence flag to read fence
>
> Vlad, please pull this in for OFED 1.5.
>
> Thanks,
> Chien
> --
> Chien Tung | chien.tin.t...@intel.com

Pulled into OFED-1.5.

Regards,
Vladimir
[ewg] Re: [PATCH] IB/qib: fix IPoIB device stop deadlock
Ralph Campbell wrote:
> Vlad, please pull 4 more fixes to the qib driver from:
>
> git://git.openfabrics.org/~ralphc/linux-2.6/.git ofed_kernel_1_5
>
> commit 70f277ec61809c15b3352aa6b650882ebef05235
> Author: Ralph Campbell (QLogic) ral...@hosting.openfabrics.org
> Date:   Tue Dec 8 17:14:43 2009 -0800
>
>     IB/qib: fix IPoIB device stop deadlock
>
>     We create our own workqueue mainly because we want to be able to
>     flush it when devices are being removed. We can't use
>     schedule_work()/flush_scheduled_work() because both
>     unregister_netdev() and linkwatch_event take the rtnl lock, so
>     flush_scheduled_work() can deadlock during device removal.
>
>     Signed-off-by: Mitko Haralanov mi...@qlogic.com
>
> commit 9e829f8742501afefc49c1b16945322cd6577dfd
> Author: Ralph Campbell (QLogic) ral...@hosting.openfabrics.org
> Date:   Tue Dec 8 17:12:02 2009 -0800
>
>     IB/qib: serdes changes for QME7342 serdes
>
>     Because we now have different values of H1 all over, I redid the
>     h1_vals code that we weren't really using, changed it to a single
>     h1_val, and initialized for the board types that need different
>     values.
>
>     Signed-off-by: Dave Olson dave.ol...@qlogic.com
>
> commit 52cb89d80698f6cfdf7d58f0b0bbef85cd70dc69
> Author: Ralph Campbell (QLogic) ral...@hosting.openfabrics.org
> Date:   Tue Dec 8 17:11:22 2009 -0800
>
>     IB/qib: added missing code to report if 7322 memory BIST failed
>
>     Don't clear the memory built-in-self-test error bit so it gets
>     reported.
>
>     Signed-off-by: Dave Olson dave.ol...@qlogic.com
>
> commit 56d291199ea1479a441af974ab3f311b4703c897
> Author: Ralph Campbell (QLogic) ral...@hosting.openfabrics.org
> Date:   Tue Dec 8 17:10:35 2009 -0800
>
>     IB/qib: improve twsi error messages for human beings
>
>     Some people were confused by TWSI, so make messages somewhat
>     clearer.
>
>     Signed-off-by: Dave Olson dave.ol...@qlogic.com

Done,

Regards,
Vladimir
[ewg] ofa_1_5_kernel 20091209-0200 daily build status
This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters:

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.18-164.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:
Re: [ewg] [PATCH] docs: Document NFSRDMA mlx4 firmware requirement
Jon Mason wrote:
> Firmware 2.6.0 has known issues that prevent the RDMA connection from
> being established.

Looking on bz 1815 I couldn't see why you say there are known issues with
connection establishment in firmware 2.6, as the problems there were around
fast reg work requests, have I missed something?
RE: [ewg] [PATCH] docs: Document NFSRDMA mlx4 firmware requirement
There was a FW bug with FRWR that was fixed in FW 2.7.0
What is not clear here?

Tziporet

-----Original Message-----
From: Or Gerlitz [mailto:ogerl...@voltaire.com]
Sent: Wednesday, December 09, 2009 2:21 PM
To: Jon Mason
Cc: Tziporet Koren; ewg@lists.openfabrics.org
Subject: Re: [ewg] [PATCH] docs: Document NFSRDMA mlx4 firmware requirement

Jon Mason wrote:
> Firmware 2.6.0 has known issues that prevent the RDMA connection from
> being established.

Looking on bz 1815 I couldn't see why you say there are known issues with
connection establishment in firmware 2.6, as the problems there were around
fast reg work requests, have I missed something?
Re: [ewg] [PATCH] docs: Document NFSRDMA mlx4 firmware requirement
Tziporet Koren wrote:
> There was a FW bug with FRWR that was fixed in FW 2.7.0
> What is not clear here?

exactly, the bug has nothing to do with the connection establishment but
rather with fast reg work requests, while the text points towards conn
establishment.

Or.
[ewg] New Open MPI: 1.4
I have uploaded a new Open MPI to staging.openfabrics.org and updated the
latest.txt so that the nightly OFED 1.5 builds will pull from it.

This new version is a ***VERY*** minor change compared to v1.3.4 (see
below) and is in direct response to the GNU Libtool security alert. To be
clear: the change of numbering from 1.3.4 to 1.4 *only* reflects the fact
that the code has now shifted from a new features focus to a bug fix only
focus. It does *not* indicate a large change in the code.

The Open MPI Team, representing a consortium of research, academic, and
industry partners is releasing Open MPI version 1.4 in reaction to the GNU
Libtool 2.2.6b security update release (see
http://security-tracker.debian.org/tracker/CVE-2009-3736 for more details).
The Open MPI v1.4 release closes a potential security vulnerability
associated with the embedded version of GNU Libtool used in the Open MPI
v1.3.x series.

The *only* change between Open MPI v1.3.4 and Open MPI v1.4 is that we used
GNU Libtool 2.2.6b to build Open MPI v1.4, thereby updating Open MPI's
embedded copy of the libltdl library.

***

More details about this release were sent to the Open MPI user's list:

http://www.open-mpi.org/community/lists/users/2009/12/11460.php

--
Jeff Squyres
jsquy...@cisco.com
[ewg] missing daily build for 09/12/09?
Hi all,

I don't see a daily snapshot for Dec. 9, 2009 on the website:

[DIR] Parent Directory                               -
[TXT] latest.txt              08-Dec-2009 06:08     27
[ ]   latest.tgz              08-Dec-2009 06:08    65M
[ ]   OFED-1.5-20091208-06..  08-Dec-2009 06:08    65M
[ ]   OFED-1.5-20091207-07..  07-Dec-2009 07:34    65M
[ ]   OFED-1.5-20091207-06..  07-Dec-2009 06:49    65M
...

Is this a known condition?

b.
[ewg] [PATCH] mlx4_core: Cleanup bug in __mlx4_init_one()
If mlx4_init_port_info() fails, cleanup the initialized ports only.

Signed-off-by: Eli Cohen e...@mellanox.co.il
---
 drivers/net/mlx4/main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 6f5a3cf..0c868c9 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -1294,7 +1294,7 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	return 0;
 
 err_port:
-	for (port = 1; port <= dev->caps.num_ports; port++)
+	for (--port; port >= 1; --port)
 		mlx4_cleanup_port_info(&priv->port[port]);
 
 	mlx4_cleanup_counters_table(dev);
--
1.6.5.5
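The unwind pattern this patch restores can be sketched in user space as
follows. This is only an illustration under stated assumptions, not the mlx4
code: init_port(), cleanup_port(), init_all_ports(), MAX_PORTS and the
fail_at parameter are hypothetical names introduced here. When initializing
port N fails, only ports N-1 down to 1 are cleaned up, instead of every port
up to num_ports.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PORTS 8	/* hypothetical upper bound, for illustration only */

/* Tracks which ports are currently initialized (index 0 is unused). */
static bool port_ready[MAX_PORTS + 1];

/* Hypothetical per-port init: fails when port == fail_at. */
static int init_port(int port, int fail_at)
{
	if (port == fail_at)
		return -1;
	port_ready[port] = true;
	return 0;
}

/* Hypothetical per-port cleanup: only valid on an initialized port. */
static void cleanup_port(int port)
{
	assert(port_ready[port]);
	port_ready[port] = false;
}

/*
 * Initialize ports 1..num_ports. On failure, unwind only the ports that
 * were successfully initialized, in reverse order.
 */
static int init_all_ports(int num_ports, int fail_at)
{
	int port;

	assert(num_ports <= MAX_PORTS);

	for (port = 1; port <= num_ports; port++) {
		if (init_port(port, fail_at))
			goto err_port;
	}
	return 0;

err_port:
	/* 'port' is the one that failed; start from the one before it. */
	for (--port; port >= 1; --port)
		cleanup_port(port);
	return -1;
}
```

With the pre-patch loop shape (counting up from 1 to num_ports), the
assertion in cleanup_port() would trip on the failed port and on every port
after it, which is the situation the one-line change avoids.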
[ewg] [PATCH ofed-1.5 docs] cxgb3 release notes for ofed-1.5
Signed-off-by: Steve Wise sw...@opengridcomputing.com --- cxgb3_release_notes.txt | 115 ++- 1 files changed, 102 insertions(+), 13 deletions(-) diff --git a/cxgb3_release_notes.txt b/cxgb3_release_notes.txt index d1fdafc..61e45da 100644 --- a/cxgb3_release_notes.txt +++ b/cxgb3_release_notes.txt @@ -1,20 +1,20 @@ Open Fabrics Enterprise Distribution (OFED) CHELSIO T3 RNIC RELEASE NOTES - May 2009 + Dec 2009 The iw_cxgb3 and cxgb3 modules provide RDMA and NIC support for the Chelsio S series adapters. Make sure you choose the 'cxgb3' and -'libcxgb3' options when generating your ofed-1.4.1 rpms. +'libcxgb3' options when generating your ofed rpms. -New for ofed-1.4.1 +New for ofed-1.5 -- NFSRDMA support. +- 7.7 Firmware. See below for more information on updating your RNIC +to the latest firmware. -- 7.4 Firmware support. See below for more information on updating -your RNIC to the latest firmware. +- Version 1.1.2 cxgb3 driver. Enabling Various MPIs @@ -33,10 +33,12 @@ options iw_cxgb3 peer2peer=1 For Intel MPI, HP MPI, and Scali MPI: Enable the chelsio device by adding an entry to /etc/dat.conf for the chelsio interface. For instance, -if your chelsio interface name is eth2, then the following line adds a -DAT device named chelsio for that interface: +if your chelsio interface name is eth2, then the following line adds +a DAT version 1.2 and 2.0 devices named chelsio and chelsio2 for +that interface: chelsio u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 eth2 0 +chelsio2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 eth2 0 = Intel MPI: @@ -51,15 +53,38 @@ export I_MPI_DEVICE=rdssm:chelsio export MPIEXEC_TIMEOUT=180 export MPI_BIT_MODE=64 +Logout log back in. + +Populate mpd.hosts with node names. +Note: The hosts in this file should be Chelsio interface IP addresses. + Note: I_MPI_DEVICE=rdssm:chelsio assumes you have an entry in /etc/dat.conf named chelsio. 
+Note: MPIEXEC_TIMEOUT value might be required to increase if heavy traffic +is going across the systems. + Contact Intel for obtaining their MPI with DAPL support. +To run Intel MPI applications: + + mpdboot -n num nodes -r ssh --ncpus=num cpus + mpiexec -ppn process per node -n num nodes MPI Application Path + + = HP MPI: = +The following env vars enable HP MPI version 2.03.01.00. Place these +in your user env after installing and setting up HP MPI: + +export MPI_ROOT=/opt/hpmpi +export PATH=$MPI_ROOT/bin:/opt/bin:$PATH +export MANPATH=$MANPATH:$MPI_ROOT/share/man + +Log out log back in. + To run HP MPI applications, use these mpirun options: -prot -e DAPL_MAX_INLINE=64 -UDAPL @@ -80,17 +105,28 @@ Scali MPI: = The following env vars enable Scali MPI. Place these in your user env -after installing and setting up Scali MPI for running over Infiniband: +after installing and setting up Scali MPI for running over IWARP: export DAPL_MAX_INLINE=64 export SCAMPI_NETWORKS=chelsio export SCAMPI_CHANNEL_ENTRY_COUNT=chelsio:128 +Log out log back in. + Note: SCAMPI_NETWORKS=chelsio assumes you have an entry in /etc/dat.conf named chelsio. +Note: SCAMPI supports only dapl 1.2 library not dapl 2.0 + Contact Scali for obtaining their MPI with DAPL support. +To run SCALI MPI applications: + + mpimon SCALI Application Path -- node1_IP procs node2_IP procs + +Note: procs is the number of processes to run on the node Note: +node#_IP should be the IP of Chelsio's interface + = OpenMPI: = @@ -102,6 +138,58 @@ Users wishing to performance tune the configurable options may wish to inspect the receive queue values. Those can be found in the Chelsio T3 section of mca-btl-openib-hca-params.ini. +Note: OpenMPI version 1.3 does not support newer Chelsio card with device +ID 0x0035 and 0x0036. To use those cards add the device id of the cards +in the Chelsio T3 section of mca-btl-openib-hca-params.ini file. 
+ +To run OpenMPI applications: + + mpirun --host node1,node2 -mca btl openib,sm,self OpenMPI Application Path + += +MVAPICH2: += + +The following env vars enable MVAPICH2 version 1.4-2. Place these +in your user env after installing and setting up MVAPICH2 MPI: + +export MVAPICH2_HOME=/usr/mpi/gcc/mvapich2-1.4/ +export MV2_USE_IWARP_MODE=1 +export MV2_USE_RDMA_CM=1 + +On each node, add this to the end of /etc/profile. + + ulimit -l 99 + +On each node, add this to the end of /etc/init.d/sshd and restart sshd. + + ulimit -l 99 + % service sshd restart + +Verify the ulimit changes worked. These should show '99': + + % ulimit -l + % ssh peer ulimit -l + +Note: You may have to restart sshd a few times to get it to work.
Re: [ewg] [PATCH] docs: Document NFSRDMA mlx4 firmware requirement
On Wed, Dec 09, 2009 at 02:52:40PM +0200, Or Gerlitz wrote:
> Tziporet Koren wrote:
> > There was a FW bug with FRWR that was fixed in FW 2.7.0
> > What is not clear here?
>
> exactly, the bug has nothing to do with the connection establishment but
> rather with fast reg work requests, while the text points towards conn
> establishment.

The issue is with Fast Reg WRs, which are needed to set up an NFS RDMA
connection. The 2.6.0 firmware advertises support for this feature
(previous versions did not), but there is a bug in the firmware. The 2.7.0
firmware resolved this issue.

Bug 1815 hit this issue, and the originator suggested I document the fix in
the OFED docs. If the verbiage is confusing, I can reduce it to say that
ConnectX requires 2.7.0 firmware for NFSRDMA.

Thanks,
Jon

> Or.
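The requirement discussed in this thread (ConnectX firmware 2.7.0 or newer
for NFSRDMA) comes down to a three-part version comparison. A minimal
sketch, assuming the firmware version is available as a dotted string of the
form major.minor.sub; the helper names fw_at_least() and
connectx_fw_ok_for_nfsrdma() are hypothetical and not part of OFED or any
driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns true if the dotted version string 'fw'
 * (for example "2.7.0") is at least major.minor.sub.
 * Returns false if the string cannot be parsed as three integers.
 */
static bool fw_at_least(const char *fw, int major, int minor, int sub)
{
	int a, b, c;

	assert(fw != NULL);

	if (sscanf(fw, "%d.%d.%d", &a, &b, &c) != 3)
		return false;
	if (a != major)
		return a > major;
	if (b != minor)
		return b > minor;
	return c >= sub;
}

/* Threshold taken from the thread: NFSRDMA on ConnectX needs >= 2.7.0. */
static bool connectx_fw_ok_for_nfsrdma(const char *fw)
{
	return fw_at_least(fw, 2, 7, 0);
}
```

Comparing the fields numerically rather than as strings keeps a value such
as "2.10.0" ordered after "2.7.0".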
[ewg] Re: End user documentation for Infiniband?
On 12/09/2009 06:14 AM, Ross Smith wrote:
> Hi guys,
>
> Sorry if this is a bit of a newbie question, or posted to the wrong
> place, but I've been struggling to find any documentation for
> configuring infiniband, and the openfabrics mailing lists appear well
> and truly dead.

The Enterprise Working Group List (ewg@lists.openfabrics.org) is the list
for end users.

Thanks.

-jeff

> I have a very basic network running now, with three hosts, ipoib and
> opensm. It's running at 2.5Gbps which I understand is the default,
> but I cannot for the life of me find any documentation on how to
> configure it to run at higher speeds.
>
> Any network benchmarks are peaking at around 290MB/s, so just under
> 2.5Gbps.
>
> ibstat tells me that the link is running at 20.
> ibdiagnet tells me that the ipoib subnet is configured for 10Gbps, but
> that it has no members.
>
> I'm guessing that I need to configure the ipoib subnet for 20Gbps, and
> join the machines to it. However I can't find documentation on how to
> do either of these.
>
> Can anybody help, or point me in the direction of some documentation?
>
> thanks,
>
> Ross
> --
> To unsubscribe from this list: send the line unsubscribe linux-rdma in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
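For reference, the conversion behind the "just under 2.5Gbps" estimate in
the quoted message can be sketched as below. The helper name is
hypothetical, and the sketch assumes the benchmark reports decimal megabytes
per second (1 MB = 10^6 bytes); if the tool reports binary megabytes the
result is slightly higher.

```c
#include <assert.h>

/* Hypothetical helper: convert MB/s (10^6 bytes per second) to Gbit/s. */
static double mbytes_per_sec_to_gbits(double mbytes_per_sec)
{
	assert(mbytes_per_sec >= 0.0);
	/* 8 bits per byte, 1000 Mbit per Gbit */
	return mbytes_per_sec * 8.0 / 1000.0;
}
```

Under that assumption, 290 MB/s works out to about 2.32 Gbit/s.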
[ewg] [PATCH] IB/qib: fix backport patches for RHEL 5
Vlad, please pull the following since I made a minor error in one of the
backport patches from yesterday.

git://git.openfabrics.org/~ralphc/linux-2.6/.git ofed_kernel_1_5

commit 5f541de203009c2bcc13c110f8b70cb976b0e0d2
Author: Ralph Campbell (QLogic) ral...@hosting.openfabrics.org
Date:   Wed Dec 9 12:16:03 2009 -0800

    IB/qib: fix backport patches for RHEL 5

    The RHEL 5 backports for the IPoIB device stop deadlock commit had a
    bug where the wrong work queue was being flushed. This updates the
    patches to use the correct flush.

    Signed-off-by: Ralph Campbell ralph.campb...@qlogic.com
[ewg] [PATCH] docs: update nes release notes for OFED 1.5
Signed-off-by: Chien Tung chien.tin.t...@intel.com --- nes_release_notes.txt | 100 ++--- 1 files changed, 53 insertions(+), 47 deletions(-) diff --git a/nes_release_notes.txt b/nes_release_notes.txt index 14f596f..b698766 100644 --- a/nes_release_notes.txt +++ b/nes_release_notes.txt @@ -1,6 +1,6 @@ Open Fabrics Enterprise Distribution (OFED) NetEffect Ethernet Cluster Server Adapter Release Notes - May 2009 + December 2009 @@ -69,7 +69,7 @@ NOTE: Assuming NetEffect Ethernet Cluster Server Adapter is assigned eth2. ethtool -C eth2 rx-usecs-irq 128 - set static interrupt moderation ethtool -C eth2 adaptive-rx on - enable dynamic interrupt moderation -ethtool -C eth2 adaptive-rx off - disable dynamic interrupt moderation +ethtool -C eth2 adaptive-rx off - disable dynamic interrupt moderation ethtool -C eth2 rx-frames-low 16 - low watermark of rx queue for dynamic interrupt moderation ethtool -C eth2 rx-frames-high 256 - high watermark of rx queue for @@ -85,8 +85,8 @@ uDAPL Configuration === Rest of the document assumes the following uDAPL settings in dat.conf: -OpenIB-cma-nes u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 eth2 0 -ofa-v2-nes u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 eth2 0 +OpenIB-iwarp u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 eth2 0 +ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 eth2 0 === @@ -98,34 +98,34 @@ Add the following to mpirun command: Example mpirun command with uDAPL-2.0: -mpirun -UDAPL -prot -intra=shm - -e MPI_ICLIB_UDAPL=libdaplofa.so.1 - -e MPI_HASIC_UDAPL=ofa-v2-nes +mpirun -UDAPL -prot -intra=shm + -e MPI_ICLIB_UDAPL=libdaplofa.so.2 + -e MPI_HASIC_UDAPL=ofa-v2-iwarp -1sided -f /opt/hpmpi/appfile Example mpirun command with uDAPL-1.2: -mpirun -UDAPL -prot -intra=shm +mpirun -UDAPL -prot -intra=shm -e MPI_ICLIB_UDAPL=libdaplcma.so.1 - -e MPI_HASIC_UDAPL=OpenIB-cma-nes - -1sided + -e MPI_HASIC_UDAPL=OpenIB-iwarp + -1sided -f /opt/hpmpi/appfile -=== -Recommended Settings for Intel MPI 3.2 
-=== + +Recommended Settings for Intel MPI 3.2.x + Add the following to mpiexec command: -genv I_MPI_FALLBACK_DEVICE 0 --genv I_MPI_DEVICE rdma:OpenIB-cma-nes +-genv I_MPI_DEVICE rdma:OpenIB-iwarp -genv I_MPI_RENDEZVOUS_RDMA_WRITE Example mpiexec command line for uDAPL-2.0: mpiexec -genv I_MPI_FALLBACK_DEVICE 0 --genv I_MPI_DEVICE rdma:ofa-v2-nes +-genv I_MPI_DEVICE rdma:ofa-v2-iwarp -genv I_MPI_RENDEZVOUS_RDMA_WRITE -ppn 1 -n 2 /opt/intel/impi/3.2.0.011/bin64/IMB-MPI1 @@ -133,7 +133,7 @@ Example mpiexec command line for uDAPL-2.0: Example mpiexec command line for uDAPL-1.2: mpiexec -genv I_MPI_FALLBACK_DEVICE 0 --genv I_MPI_DEVICE rdma:OpenIB-cma-nes +-genv I_MPI_DEVICE rdma:OpenIB-iwarp -genv I_MPI_RENDEZVOUS_RDMA_WRITE -ppn 1 -n 2 /opt/intel/impi/3.2.0.011/bin64/IMB-MPI1 @@ -146,37 +146,42 @@ Add the following to the mpirun command: -env MV2_USE_RDMA_CM 1 -env MV2_USE_IWARP_MODE 1 - -For larger number of processes, it is also recommended to set the following: - -env MV2_MAX_INLINE_SIZE 64 --env MV2_USE_SRQ 0 +-env MV2_DEFAULT_MAX_CQ_SIZE 32766 +-env MV2_RDMA_CM_MAX_PORT 65535 +-env MV2_VBUF_TOTAL_SIZE 9216 Example mpiexec command line: mpiexec -l -n 2 -env MV2_USE_RDMA_CM 1 --env MV2_USE_IWARP_MODE 1 +-env MV2_USE_IWARP_MODE 1 +-env MV2_MAX_INLINE_SIZE 64 +-env MV2_DEFAULT_MAX_CQ_SIZE 32766 +-env MV2_RDMA_CM_MAX_PORT 65535 +-env MV2_VBUF_TOTAL_SIZE 9216 /usr/mpi/gcc/mvapich2-1.2p1/tests/osu_benchmarks-3.0/osu_latency == Recommended Setting for MVAPICH2 and uDAPL == -Add the following to the mpirun command: +Add the following to the mpirun command for 64 or more processes: --env MV2_PREPOST_DEPTH 59 +-env MV2_ON_DEMAND_THRESHOLD number of processes -Example mpiexec command line: +Example mpirun command with uDAPL-2.0: -mpiexec -l -n 2 --env MV2_DAPL_PROVIDER ofa-v2-nes --env MV2_PREPOST_DEPTH 59 +mpiexec -l -n 64 +-env MV2_DAPL_PROVIDER ofa-v2-iwarp +-env MV2_ON_DEMAND_THRESHOLD 64
[ewg] [PATCH] docs: update ehca release notes for OFED-1.5
Hi Vlad, Signed-off-by: Alexander Schmidt al...@linux.vnet.ibm.com --- ehca_release_notes.txt | 33 + 1 file changed, 17 insertions(+), 16 deletions(-) --- docs.git.orig/ehca_release_notes.txt +++ docs.git/ehca_release_notes.txt @@ -1,8 +1,8 @@ Open Fabrics Enterprise Distribution (OFED) -ehca in OFED 1.4.1 Release Notes +ehca in OFED 1.5 Release Notes - May 2009 + December 2009 Overview @@ -28,7 +28,7 @@ whereby parameter is one of the follow - scaling_codescaling code (0: disable (default), 1: enable) - open_aqp1 Open AQP1 on startup (default: no) (bool) - hw_levelHardware level (0: autosensing (default), 0x10..0x14: eHCA, 0x20..0x23: eHCA2) (int) -- nr_portsnumber of connected ports (-1: autodetect, 1: port one only, 2: two ports (default) (int) +- nr_portsnumber of connected ports (-1: autodetect (default), 1: port one only, 2: two ports) (int) - use_hp_mr Use high performance MRs (default: no) (bool) - poll_all_eqsPoll all event queues periodically (default: yes) (bool) - static_rate Set permanent static rate (default: no static rate) (int) @@ -38,7 +38,14 @@ whereby parameter is one of the follow New Features -- none +- DMEM toleration +- Port autodetection set to default + +Fixed Bugs ofed-1.5 +- +- SRQ overflow prevention +- Performance improvements for QP creation +- MAD redirection fix Fixed Bugs ofed-1.4.1 - @@ -81,23 +88,17 @@ Fixed Bugs ofed-1.3 Available backports --- -- RedHat EL5 up2: 2.6.18-92.ELsmp - RedHat EL5 up3: 2.6.18-128.ELsmp +- RedHat EL5 up4: 2.6.18-164.ELsmp - SLES11: 2.6.27.19-5.1-smp -- SLES10SP1: 2.6.16-53-0.16-smp - SLES10SP2: 2.6.16-60 -- kernel.org: 2.6.24-27 +- SLES10SP3: 2.6.16.60-0.54.5 +- kernel.org: 2.6.27-30 Known Issues -1. The device driver normally uses both ports. For using just one port it is -strongly recommended to set option nr_ports=-1 to enable autodetect mode: - modprobe ib_ehca nr_ports=-1 - -2. Furthermore the port(s) needs to be connected to an active switch port while +1. 
The port(s) needs to be connected to an active switch port while loading the ehca device driver. -3. Dynamic memory operations are not supported with ehca - -4. Allocating a large number of queue pairs might be time consuming. This will -be fixed in next OFED release. +2. Dynamic memory operations are tolerated by ehca, but are prevented by +the driver while it is loaded.
Re: [ewg] [PATCH] mlx4: Fix bug in mlx4_ib_mcg_attach
This bug doesn't seem to ever have been present in the upstream kernel --
what are you generating this patch against?

 - R.
Re: [ewg] Re: [PATCH] mlx4_core: Cleanup bug in __mlx4_init_one()
On Wed, Dec 09, 2009 at 02:35:18PM -0800, Roland Dreier wrote:
> Looks like a valid fix but your patch doesn't apply. I don't seem to
> have mlx4_cleanup_counters_table() in my tree.

I noticed that already. I'll send a new patch on my next working day.
Re: [ewg] [PATCH] IPoIB: synchronize timer deletion with completion handler
> When calling del_timer_sync on priv->poll_timer, it is necessary to
> prevent further arming of the timer which can be done by a completion
> handler. Though it is harmless since the timer will eventually stop
> being rearmed, it is better practice to avoid calling the timer function
> after it is deleted. This patch handles this by using a new flag that is
> checked before arming the timer.

Have you seen this in practice? If it can happen then it's not harmless,
since the module could be unloaded with the timer pending.

However I don't see how it could happen, since we only seem to delete the
timer after we know that no more completions are coming (except for the
case where we decide that the hardware is wedged but it really only takes a
*long* time to respond at exactly the wrong time, and we somehow get a
completion between the del_timer_sync and the modify QP to reset state --
which is so unlikely it seems not worth adding this extra complexity for --
maybe we could add the del_timer_sync to after we delete the CQ or
something if you're really worried).
Re: [ewg] [PATCH] mlx4: Fix bug in mlx4_ib_mcg_attach
On Wed, Dec 09, 2009 at 02:33:31PM -0800, Roland Dreier wrote:
> This bug doesn't seem to ever have been present in the upstream kernel --
> what are you generating this patch against?

I think it came from your for-next branch.
Re: [ewg] [PATCH] mlx4: Fix bug in mlx4_ib_mcg_attach
> > This bug doesn't seem to ever have been present in the upstream
> > kernel -- what are you generating this patch against?
>
> I think it came from your for-next branch.

I don't see anything touching this code there. The patch that introduced
this code upstream, 521e575b (IB/mlx4: Add support for blocking multicast
loopback packets) doesn't have this bug and I don't see anything else that
changed that area of the code.

 - R.
[ewg] Re: [PATCH] docs: update nes release notes for OFED 1.5
Chien Tung wrote:
> Signed-off-by: Chien Tung chien.tin.t...@intel.com
> ---
>  nes_release_notes.txt |  100 ++---
>  1 files changed, 53 insertions(+), 47 deletions(-)

Applied,

Regards,
Vladimir
[ewg] Re: [PATCH] docs: update ehca release notes for OFED-1.5
Alexander Schmidt wrote:
> Hi Vlad,
>
> Signed-off-by: Alexander Schmidt al...@linux.vnet.ibm.com
> ---
>  ehca_release_notes.txt |   33 +
>  1 file changed, 17 insertions(+), 16 deletions(-)

Applied,

Regards,
Vladimir
[ewg] Re: [GIT PULL] RDMA/nes: Updates for OFED 1.5 RC4
Tung, Chien Tin wrote:
> Vlad,
>
> Please pull nes OFED 1.5 RC4 update from:
>
> git://sofa.openfabrics.org/~ctung/ofed-1.5.git ofed_kernel_1_5
>
> for the following commits:
>
> Chien Tung (2):
>       RDMA/nes: update for OFED 1.5 RC4
>         Commit: ac0fd81fbc830f15e3e76d7b4baa2ceb0bfad7b2
>       RDMA/nes: udpate backports for RC4
>         Commit: f4996b2c3b72e2cb4c9b99e59f704adff7baad20
>
> Thanks,
> Chien
> --
> Chien Tung | chien.tin.t...@intel.com

Done,

Regards,
Vladimir