Re: [openib-general] [openfabrics-ewg] OFED 1.1-rc3 is ready
Hi Scott, This was my mistake (I included the binary RPMs in the tgz as well, not just the source RPMs). I fixed this (removed the binary RPMs); everything else was untouched. Tziporet -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Scott Weitzenkamp (sweitzen) Sent: Friday, September 01, 2006 4:31 AM To: Tziporet Koren; EWG Cc: OPENIB Subject: Re: [openfabrics-ewg] OFED 1.1-rc3 is ready RC3 includes a bunch of binary RPMS, please remove for RC4. Look at the size of the RC3 tarball vs previous ones: $ ls -s | more total 290848 46512 OFED-1.1-rc1.tgz 0 OFED-1.1-rc1.tgz.md5sum 47048 OFED-1.1-rc2.tgz 0 OFED-1.1-rc2.tgz.md5sum 197288 OFED-1.1-rc3.tgz 0 OFED-1.1-rc3.tgz.md5sum Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of > Tziporet Koren > Sent: Thursday, August 31, 2006 9:24 AM > To: EWG > Cc: OPENIB > Subject: [openfabrics-ewg] OFED 1.1-rc3 is ready > > Hi, > > OFED 1.1-RC3 is available on > https://openib.org/svn/gen2/branches/1.1/ofed/releases/ > File: OFED-1.1-rc3.tgz > Please report any issues in bugzilla http://openib.org/bugzilla/ > > Schedule reminder: > == > Next milestones: > RC4 is planned for 7-Sep. It should include critical bug fixes only. > Final release will be on 11 or 12 Sep. > > Owners - please update release notes for RC4. > > Tziporet & Vlad > -- > --- > > Release details: > > Build_id: > OFED-1.1-rc3 > > openib-1.1 (REV=9203) > # User space > https://openib.org/svn/gen2/branches/1.1/src/userspace > Git: > ref: refs/heads/ofed_1_1 > commit 338e942a4ae10d62f2632e6292f85bb1b15d154c > > # MPI > mpi_osu-0.9.7-mlx2.2.0.tgz > openmpi-1.1.1-1.src.rpm > mpitests-2.0-0.src.rpm > > > OS support: > === > Novell: > - SLES 9.0 SP3 > - SLES10 > Redhat: > - Redhat EL4 up3 > - Redhat EL4 up4 > kernel.org: > - Kernel 2.6.17 > > Note: Redhat EL4 up2, Fedora C4 and SuSE Pro 10 were dropped > from the list. 
> We keep the backport patches for these OSes and make sure > OFED compile and > loaded properly but will not do full QA cycle. > > Systems: > > * x86_64 > * x86 > * ia64 > * ppc64 > > Main changes from OFED-1.1-rc2: > === > 1. Added ehca (IBM) driver. This driver can be compiled on > kernel 2.6.18 > only > 3. Open MPI version update to openmpi-1.1.1-1 > 4. Core: Huge pages registration is supported > 5. IPoIB high availability script supports multicast groups > 6. RHEL4 up4 is now supported > 7. SDP: fixed connection refused problem; get peer name working > 8. libsdp: several bug fixes > > Limitations and known issues: > = > 1. SDP: For Mellanox Sinai HCAs one must use latest FW > version (1.1.000). > 2. SDP: Scalability issue when many connections are opened > 3. SDP: If RTU packet is lost Accept call blocks even if > client connected. > 4. ipath driver is not supported on SLES9 SP3 > 5. Compilation on kernel 2.6.18-rc5 is failing - to be fixed in RC4 > > > Missing features that should be completed for RC4: > == > None > > > > ___ > openfabrics-ewg mailing list > [EMAIL PROTECTED] > http://openib.org/mailman/listinfo/openfabrics-ewg > ___ openfabrics-ewg mailing list [EMAIL PROTECTED] http://openib.org/mailman/listinfo/openfabrics-ewg ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] single rkey
On 8/31/06, yipee <[EMAIL PROTECTED]> wrote: > Hi, > > Is it possible for several memory registrations (using ibv_reg_mr) to have a > single rkey? > Can I add memory registrations to a previous rkey? No, this is not possible. In a single memory registration call you can register a large buffer, but once it is registered with the NIC you cannot make any modifications to it, and hence multiple registrations cannot share the same R_Key. > > thanks, > y
Re: [openib-general] [PATCH] IB/perftest: Fix get_median, size of delta, usage(), worst latency
>> "Michael S. Tsirkin" <[EMAIL PROTECTED]> 8/31/06 2:59 PM >>> > 3) Worst latency is delta[iters - 2] in read_lat.c, not delta[iters - 3]. >> could you explain this last bit please? delta, having (iters - 1) elements, has an index range of 0 to (iters - 2). After sorting, delta[iters - 2] is the maximum. Regards, Ganapathi.
Re: [openib-general] [openfabrics-ewg] OFED 1.1-rc3 is ready
RC3 includes a bunch of binary RPMS, please remove for RC4. Look at the size of the RC3 tarball vs previous ones: $ ls -s | more total 290848 46512 OFED-1.1-rc1.tgz 0 OFED-1.1-rc1.tgz.md5sum 47048 OFED-1.1-rc2.tgz 0 OFED-1.1-rc2.tgz.md5sum 197288 OFED-1.1-rc3.tgz 0 OFED-1.1-rc3.tgz.md5sum Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of > Tziporet Koren > Sent: Thursday, August 31, 2006 9:24 AM > To: EWG > Cc: OPENIB > Subject: [openfabrics-ewg] OFED 1.1-rc3 is ready > > Hi, > > OFED 1.1-RC3 is available on > https://openib.org/svn/gen2/branches/1.1/ofed/releases/ > File: OFED-1.1-rc3.tgz > Please report any issues in bugzilla http://openib.org/bugzilla/ > > Schedule reminder: > == > Next milestones: > RC4 is planned for 7-Sep. It should include critical bug fixes only. > Final release will be on 11 or 12 Sep. > > Owners - please update release notes for RC4. > > Tziporet & Vlad > -- > --- > > Release details: > > Build_id: > OFED-1.1-rc3 > > openib-1.1 (REV=9203) > # User space > https://openib.org/svn/gen2/branches/1.1/src/userspace > Git: > ref: refs/heads/ofed_1_1 > commit 338e942a4ae10d62f2632e6292f85bb1b15d154c > > # MPI > mpi_osu-0.9.7-mlx2.2.0.tgz > openmpi-1.1.1-1.src.rpm > mpitests-2.0-0.src.rpm > > > OS support: > === > Novell: > - SLES 9.0 SP3 > - SLES10 > Redhat: > - Redhat EL4 up3 > - Redhat EL4 up4 > kernel.org: > - Kernel 2.6.17 > > Note: Redhat EL4 up2, Fedora C4 and SuSE Pro 10 were dropped > from the list. > We keep the backport patches for these OSes and make sure > OFED compile and > loaded properly but will not do full QA cycle. > > Systems: > > * x86_64 > * x86 > * ia64 > * ppc64 > > Main changes from OFED-1.1-rc2: > === > 1. Added ehca (IBM) driver. This driver can be compiled on > kernel 2.6.18 > only > 3. Open MPI version update to openmpi-1.1.1-1 > 4. Core: Huge pages registration is supported > 5. 
IPoIB high availability script supports multicast groups > 6. RHEL4 up4 is now supported > 7. SDP: fixed connection refused problem; get peer name working > 8. libsdp: several bug fixes > > Limitations and known issues: > = > 1. SDP: For Mellanox Sinai HCAs one must use latest FW > version (1.1.000). > 2. SDP: Scalability issue when many connections are opened > 3. SDP: If RTU packet is lost Accept call blocks even if > client connected. > 4. ipath driver is not supported on SLES9 SP3 > 5. Compilation on kernel 2.6.18-rc5 is failing - to be fixed in RC4 > > > Missing features that should be completed for RC4: > == > None > > > > ___ > openfabrics-ewg mailing list > [EMAIL PROTECTED] > http://openib.org/mailman/listinfo/openfabrics-ewg > ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Srp question
Makia> We did add this (pardon my typo of max_sects below). I did Makia> find that by appending the max_sect= at the end of the line Makia> we were seeing strange behaviour (it seemed that the parser Makia> added a newline for no reason) and the only fix was to put Makia> it at the beginning of the line. Actually it's probably echo adding the newline. You can use "echo -n" to work around this, or just put the max_sect at the beginning of the line. Makia> Sorry again about the typo (I should never attempt to work Makia> off of memory). With max_sect=4096, and the Makia> srp_sg_tablesize set to 256, we are now seeing 512KB IOs. The Makia> new question is: is there a way to get this to 1M IOs? Don't know... do you get that with the old IBgold SRP initiator? - R.
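Roland's diagnosis is easy to verify from a shell: echo appends a trailing newline that becomes part of the value parsed by add_target, while echo -n does not (the add_target line at the end is illustrative only, with placeholder identifiers):

```shell
echo 'max_sect=4096' | wc -c      # 14 bytes: 13 characters plus '\n'
echo -n 'max_sect=4096' | wc -c   # 13 bytes: no trailing newline

# So when max_sect is the last option in the string, write it with echo -n
# (identifiers here are placeholders, not real values):
# echo -n "id_ext=...,ioc_guid=...,max_sect=4096" > .../add_target
```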
Re: [openib-general] lockdep warnings
Michael> Hi, Roland! I got a load of lockdep warnings after Michael> loading all modules and configuring ipoib. This doesn't Michael> usually happen, not sure what I changed this time. I'm a Michael> bit too busy this week - could you take a look at the Michael> log, please? This would only happen on a mem-full HCA. I just asked Linus to pull the following fix. commit 02113bd77e86386d02a9a606cdad53803a6e2794 Author: Roland Dreier <[EMAIL PROTECTED]> Date: Thu Aug 31 16:43:06 2006 -0700 IB/mthca: Use IRQ safe locks to protect allocation bitmaps It is supposed to be OK to call mthca_create_ah() and mthca_destroy_ah() from any context. However, for mem-full HCAs, these functions use the mthca_alloc() and mthca_free() bitmap helpers, and those helpers use non-IRQ-safe spin_lock() internally. Lockdep correctly warns that this could lead to a deadlock. Fix this by changing mthca_alloc() and mthca_free() to use spin_lock_irqsave(). Signed-off-by: Roland Dreier <[EMAIL PROTECTED]> diff --git a/drivers/infiniband/hw/mthca/mthca_allocator.c b/drivers/infiniband/hw/mthca/mthca_allocator.c index 25157f5..f930e55 100644 --- a/drivers/infiniband/hw/mthca/mthca_allocator.c +++ b/drivers/infiniband/hw/mthca/mthca_allocator.c @@ -41,9 +41,11 @@ #include "mthca_dev.h" /* Trivial bitmap-based allocator */ u32 mthca_alloc(struct mthca_alloc *alloc) { + unsigned long flags; u32 obj; - spin_lock(&alloc->lock); + spin_lock_irqsave(&alloc->lock, flags); + obj = find_next_zero_bit(alloc->table, alloc->max, alloc->last); if (obj >= alloc->max) { alloc->top = (alloc->top + alloc->max) & alloc->mask; @@ -56,19 +58,24 @@ u32 mthca_alloc(struct mthca_alloc *allo } else obj = -1; - spin_unlock(&alloc->lock); + spin_unlock_irqrestore(&alloc->lock, flags); return obj; } void mthca_free(struct mthca_alloc *alloc, u32 obj) { + unsigned long flags; + obj &= alloc->max - 1; - spin_lock(&alloc->lock); + + spin_lock_irqsave(&alloc->lock, flags); + clear_bit(obj, alloc->table); alloc->last = 
min(alloc->last, obj); alloc->top = (alloc->top + alloc->max) & alloc->mask; - spin_unlock(&alloc->lock); + + spin_unlock_irqrestore(&alloc->lock, flags); } int mthca_alloc_init(struct mthca_alloc *alloc, u32 num, u32 mask, ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [GIT PULL] please pull infiniband.git
Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus to get a fix for a locking bug found by lockdep: Roland Dreier: IB/mthca: Use IRQ safe locks to protect allocation bitmaps drivers/infiniband/hw/mthca/mthca_allocator.c | 15 +++ 1 files changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/mthca/mthca_allocator.c b/drivers/infiniband/hw/mthca/mthca_allocator.c index 25157f5..f930e55 100644 --- a/drivers/infiniband/hw/mthca/mthca_allocator.c +++ b/drivers/infiniband/hw/mthca/mthca_allocator.c @@ -41,9 +41,11 @@ #include "mthca_dev.h" /* Trivial bitmap-based allocator */ u32 mthca_alloc(struct mthca_alloc *alloc) { + unsigned long flags; u32 obj; - spin_lock(&alloc->lock); + spin_lock_irqsave(&alloc->lock, flags); + obj = find_next_zero_bit(alloc->table, alloc->max, alloc->last); if (obj >= alloc->max) { alloc->top = (alloc->top + alloc->max) & alloc->mask; @@ -56,19 +58,24 @@ u32 mthca_alloc(struct mthca_alloc *allo } else obj = -1; - spin_unlock(&alloc->lock); + spin_unlock_irqrestore(&alloc->lock, flags); return obj; } void mthca_free(struct mthca_alloc *alloc, u32 obj) { + unsigned long flags; + obj &= alloc->max - 1; - spin_lock(&alloc->lock); + + spin_lock_irqsave(&alloc->lock, flags); + clear_bit(obj, alloc->table); alloc->last = min(alloc->last, obj); alloc->top = (alloc->top + alloc->max) & alloc->mask; - spin_unlock(&alloc->lock); + + spin_unlock_irqrestore(&alloc->lock, flags); } int mthca_alloc_init(struct mthca_alloc *alloc, u32 num, u32 mask, ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Srp question
On 8/31/06 3:34 PM, "Roland Dreier" <[EMAIL PROTECTED]> wrote: > Makia> We are attempting to do some performance testing of the SRP driver > (with a > Makia> DDN target) and are seeing some poor results: > > Makia> ~120MB/s per lun with 1 sgp_dd > Makia> ~80MB/s per lun with 4 sgp_dd > > Makia> Previously we had attempted the same tests with IBGold and got the > Makia> following: > > Makia> ~150MB/s per lun with 1 sgp_dd > Makia> ~600MB/s per lun with 4 sgp_dd > > Were these tests with the same kernels otherwise? If not, there may > be unrelated changes to the SCSI stack that affect synthetic > benchmarks like this. (I seem to remember a change in the not too > distant patch that affected the largest IO it is possible to submit > through the SG interface). The kernels in question were 2.6.9.22.0.2 and 2.6.9-34.EL. I'll have to find some changelogs to see if there were changes to the SCSI stack. > Makia> To achieve the results in IBGold, we were able to set the > Makia> srp module option "max_xfer_sectors_per_io=4096", but can't > Makia> seem to find an equivalent option in the OFED SRP drivers. > > When connecting to the target (the echo to the add_target file), you > can add ",max_sect=4096" to the string you pass in. We did add this (pardon my typo of max_sects below). I did find that by appending the max_sect= at the end of the line we were seeing strange behaviour (it seemed that the parser added a newline for no reason) and the only fix was to put it at the beginning of the line. > Makia> By default, we found (via stats from the DDN) that we were > Makia> only seeing reads and writes in the 0-32Kbyte range. > Makia> Comparing IBGold and OFED, we found that the > Makia> srp_sg_tablesize defaulted to 256, but in OFED it defaulted > Makia> to 12. So, changing this (via modprobe.conf) to 256 in > Makia> OFED, we were able to see reads and writes in the 128Kbyte > Makia> range (which is what ultimately got us to the performance > Makia> above). 
I also noticed that there is a max_sects option > Makia> you can pass to add_target (in the SRP /sys entries) which > Makia> seemed to be the same idea as srp_sg_tablesize, but this > Makia> didn't seem to affect anything. > > It is "max_sect" not "max_sects" (no final 's'). Anyway, what do you > mean that it didn't affect anything? max_sect=4096 should > theoretically get you up to 512 KB IOs. Sorry again about the typo (I should never attempt to work off of memory). With max_sect=4096 and srp_sg_tablesize set to 256, we are now seeing 512KB IOs. The new question is: is there a way to get this to 1M IOs? > - R. -- Makia Minich <[EMAIL PROTECTED]> National Center for Computation Science Oak Ridge National Laboratory
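Putting the thread's conclusion in one place, the settings that got this setup to 512KB IOs look roughly like this (the module and option names are as quoted in the thread; the sysfs path and target identifiers are placeholders):

```shell
# /etc/modprobe.conf -- raise ib_srp's scatter/gather table size
# (OFED's default of 12 capped IO sizes well below 128KB per this thread)
options ib_srp srp_sg_tablesize=256

# When adding the target, request up to 4096 sectors (512B each) per command.
# Put max_sect first, or use echo -n, so echo's trailing newline doesn't get
# glued onto the last option value:
# echo -n "max_sect=4096,id_ext=...,ioc_guid=...,dgid=...,service_id=..." \
#     > /sys/class/infiniband_srp/srp-mthca0-1/add_target
```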
Re: [openib-general] [Openib-windows] File transfer performance options
There is one thing missing from your mail, namely whether you want to see the Windows machine as some file server (for example SAMBA, NFS, SRP), or whether you are ready to accept it as a normal server. The big difference is that with the second option the server can run in user mode (for example an FTP server). When the server application is running in user mode, SDP can be used as a socket provider. This means that theoretically every socket application should run and enjoy the speed of InfiniBand. Currently there are two SDP projects under development: one for Linux and the other for Windows, so SDP can be used to allow machines of both types to connect. The performance that we have measured on the Windows platform using DDR cards was more than 1200 MB/s (of course, this data was from host memory, and not from disks). So, if all you need to do is pass files from one side to the other, I would recommend that you check this option. One note about your experiments: when using ram disks, there is probably one more copy, from the ram disk to the application buffer. A real disk has its own DMA engine, while a ram disk doesn't. Another copy is probably not a problem when you are talking about 100MB/sec, but it will become a problem once you use SDP (I hope). Thanks Tzachi We've been testing an application that archives large quantities of data from a Linux system onto a Windows-based server (64bit Server 2003 R2). As part of the investigation into relatively modest transfer speeds in the win-linux configuration, we configured a Linux-Linux transfer via IPoIB with NFS layered on top (with ram disks to avoid physical disk issues) [Whilst for a real Linux-Linux configuration I would look for the NFS over RDMA solution, this wouldn't translate to our eventual win-linux interoperable system.] I was surprised that even on Linux-Linux I hit a wall of 100MB/s (test notes below). Are others doing better? 
I was hoping for 150MB/s - 200MB/s. Does anyone have any hints on tweaking an IPoIB/NFS solution to get better throughput for large files (not so concerned about latency)? Are there any other interoperable Windows-Linux solutions now? (Cross-platform NFS over RDMA, or an SRP initiator/target?) Paul Baxter
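On the Linux side, the SDP socket-provider approach Tzachi describes is typically tried by preloading libsdp under an unmodified application. The library name, environment variable, and config path below are from memory of OFED-era libsdp and should be treated as assumptions, not exact syntax:

```shell
# Run an unmodified sockets application over SDP by preloading libsdp
# (library path and LIBSDP_CONFIG_FILE are assumptions -- check your install):
LD_PRELOAD=libsdp.so LIBSDP_CONFIG_FILE=/etc/libsdp.conf ./my_file_server

# /etc/libsdp.conf then decides which listens/connects are redirected from
# TCP to SDP; any connection not matched by a rule keeps using plain TCP.
```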
Re: [openib-general] [PATCH] cma: protect against adding device during destruction
Quoting r. Sean Hefty <[EMAIL PROTECTED]>: > Subject: [PATCH] cma: protect against adding device during destruction > > Can you see if this patch helps any? > > This closes a window where address resolution can attach an rdma_cm_id > to a device during destruction of the rdma_cm_id. This can result in > the rdma_cm_id remaining in the device list after its memory has been > freed. > > Signed-off-by: Sean Hefty <[EMAIL PROTECTED]> I'll test some, but the problem hasn't reappeared since. The patch looks right, I'd say push it for 2.6.18. -- MST
Re: [openib-general] [PATCH] 2.6.19 cma: fix typo
This was already fixed by the iWARP merge patches (which I'll push out shortly). So I'll drop this patch...
Re: [openib-general] Srp question
Makia> We are attempting to do some performance testing of the SRP driver (with a Makia> DDN target) and are seeing some poor results: Makia> ~120MB/s per lun with 1 sgp_dd Makia> ~80MB/s per lun with 4 sgp_dd Makia> Previously we had attempted the same tests with IBGold and got the Makia> following: Makia> ~150MB/s per lun with 1 sgp_dd Makia> ~600MB/s per lun with 4 sgp_dd Were these tests with the same kernels otherwise? If not, there may be unrelated changes to the SCSI stack that affect synthetic benchmarks like this. (I seem to remember a change in the not-too-distant past that affected the largest IO it is possible to submit through the SG interface). Makia> To achieve the results in IBGold, we were able to set the Makia> srp module option "max_xfer_sectors_per_io=4096", but can't Makia> seem to find an equivalent option in the OFED SRP drivers. When connecting to the target (the echo to the add_target file), you can add ",max_sect=4096" to the string you pass in. Makia> By default, we found (via stats from the DDN) that we were Makia> only seeing reads and writes in the 0-32Kbyte range. Makia> Comparing IBGold and OFED, we found that the Makia> srp_sg_tablesize defaulted to 256, but in OFED it defaulted Makia> to 12. So, changing this (via modprobe.conf) to 256 in Makia> OFED, we were able to see reads and writes in the 128Kbyte Makia> range (which is what ultimately got us to the performance Makia> above). I also noticed that there is a max_sects option Makia> you can pass to add_target (in the SRP /sys entries) which Makia> seemed to be the same idea as srp_sg_tablesize, but this Makia> didn't seem to affect anything. It is "max_sect" not "max_sects" (no final 's'). Anyway, what do you mean that it didn't affect anything? max_sect=4096 should theoretically get you up to 512 KB IOs. - R. 
[openib-general] Srp question
We are attempting to do some performance testing of the SRP driver (with a DDN target) and are seeing some poor results: ~120MB/s per lun with 1 sgp_dd ~80MB/s per lun with 4 sgp_dd Previously we had attempted the same tests with IBGold and got the following: ~150MB/s per lun with 1 sgp_dd ~600MB/s per lun with 4 sgp_dd To achieve the results in IBGold, we were able to set the srp module option "max_xfer_sectors_per_io=4096", but can't seem to find an equivalent option in the OFED SRP drivers. By default, we found (via stats from the DDN) that we were only seeing reads and writes in the 0-32Kbyte range. Comparing IBGold and OFED, we found that the srp_sg_tablesize defaulted to 256, but in OFED it defaulted to 12. So, changing this (via modprobe.conf) to 256 in OFED, we were able to see reads and writes in the 128Kbyte range (which is what ultimately got us to the performance above). I also noticed that there is a max_sects option you can pass to add_target (in the SRP /sys entries) which seemed to be the same idea as srp_sg_tablesize, but this didn't seem to affect anything. So, my question is, what is the right magic to get SRP up to speed? Thanks... -- Makia Minich <[EMAIL PROTECTED]> National Center for Computation Science Oak Ridge National Laboratory
[openib-general] [PATCH] cma: protect against adding device during destruction
Can you see if this patch helps any? This closes a window where address resolution can attach an rdma_cm_id to a device during destruction of the rdma_cm_id. This can result in the rdma_cm_id remaining in the device list after its memory has been freed. Signed-off-by: Sean Hefty <[EMAIL PROTECTED]> --- Index: cma.c === --- cma.c (revision 9192) +++ cma.c (working copy) @@ -283,7 +284,6 @@ static int cma_acquire_ib_dev(struct rdm ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid); - mutex_lock(&lock); list_for_each_entry(cma_dev, &dev_list, list) { ret = ib_find_cached_gid(cma_dev->device, &gid, &id_priv->id.port_num, NULL); @@ -292,7 +292,6 @@ static int cma_acquire_ib_dev(struct rdm break; } } - mutex_unlock(&lock); return ret; } @@ -781,7 +780,9 @@ void rdma_destroy_id(struct rdma_cm_id * state = cma_exch(id_priv, CMA_DESTROYING); cma_cancel_operation(id_priv, state); + mutex_lock(&lock); if (id_priv->cma_dev) { + mutex_unlock(&lock); switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) @@ -793,8 +794,8 @@ void rdma_destroy_id(struct rdma_cm_id * cma_leave_mc_groups(id_priv); mutex_lock(&lock); cma_detach_from_dev(id_priv); - mutex_unlock(&lock); } + mutex_unlock(&lock); cma_release_port(id_priv); cma_deref_id(id_priv); @@ -1511,16 +1512,26 @@ static void addr_handler(int status, str enum rdma_cm_event_type event; atomic_inc(&id_priv->dev_remove); - if (!id_priv->cma_dev && !status) + + /* +* Grab mutex to block rdma_destroy_id() from removing the device while +* we're trying to acquire it. 
+*/ + mutex_lock(&lock); + if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED)) { + mutex_unlock(&lock); + goto out; + } + + if (!status && !id_priv->cma_dev) status = cma_acquire_dev(id_priv); + mutex_unlock(&lock); if (status) { - if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_BOUND)) + if (!cma_comp_exch(id_priv, CMA_ADDR_RESOLVED, CMA_ADDR_BOUND)) goto out; event = RDMA_CM_EVENT_ADDR_ERROR; } else { - if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED)) - goto out; memcpy(&id_priv->id.route.addr.src_addr, src_addr, ip_addr_size(src_addr)); event = RDMA_CM_EVENT_ADDR_RESOLVED; @@ -1747,8 +1758,11 @@ int rdma_bind_addr(struct rdma_cm_id *id if (!cma_any_addr(addr)) { ret = rdma_translate_ip(addr, &id->route.addr.dev_addr); - if (!ret) + if (!ret) { + mutex_lock(&lock); ret = cma_acquire_dev(id_priv); + mutex_unlock(&lock); + } if (ret) goto err; } ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH] 2.6.19 cma: fix typo
Comma should be semi-colon

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
---
Please queue for 2.6.19

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d6f99d5..bf20410 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -265,7 +265,7 @@ static int cma_acquire_ib_dev(struct rdm
 	union ib_gid gid;
 	int ret = -ENODEV;
 
-	ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid),
+	ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid);
 	mutex_lock(&lock);
 	list_for_each_entry(cma_dev, &dev_list, list) {
[openib-general] OFED 1.1-rc3 is ready
Hi,

OFED 1.1-RC3 is available on https://openib.org/svn/gen2/branches/1.1/ofed/releases/
File: OFED-1.1-rc3.tgz
Please report any issues in bugzilla: http://openib.org/bugzilla/

Schedule reminder:
==
Next milestones:
RC4 is planned for 7-Sep. It should include critical bug fixes only.
The final release will be on 11 or 12 Sep.

Owners - please update the release notes for RC4.

Tziporet & Vlad
-

Release details:

Build_id:
OFED-1.1-rc3

openib-1.1 (REV=9203)
# User space
https://openib.org/svn/gen2/branches/1.1/src/userspace
Git:
ref: refs/heads/ofed_1_1
commit 338e942a4ae10d62f2632e6292f85bb1b15d154c

# MPI
mpi_osu-0.9.7-mlx2.2.0.tgz
openmpi-1.1.1-1.src.rpm
mpitests-2.0-0.src.rpm

OS support:
===
Novell:
- SLES 9.0 SP3
- SLES10
Redhat:
- Redhat EL4 up3
- Redhat EL4 up4
kernel.org:
- Kernel 2.6.17

Note: Redhat EL4 up2, Fedora C4 and SuSE Pro 10 were dropped from the list.
We keep the backport patches for these OSes and make sure OFED compiles and loads properly, but we will not do a full QA cycle.

Systems:
* x86_64
* x86
* ia64
* ppc64

Main changes from OFED-1.1-rc2:
===
1. Added the ehca (IBM) driver. This driver can be compiled on kernel 2.6.18 only
2. Open MPI version updated to openmpi-1.1.1-1
3. Core: huge pages registration is supported
4. IPoIB high availability script supports multicast groups
5. RHEL4 up4 is now supported
6. SDP: fixed connection refused problem; getting the peer name works
7. libsdp: several bug fixes

Limitations and known issues:
=
1. SDP: For Mellanox Sinai HCAs one must use the latest FW version (1.1.000).
2. SDP: Scalability issue when many connections are opened
3. SDP: If the RTU packet is lost, the accept call blocks even if the client connected.
4. The ipath driver is not supported on SLES9 SP3
5. Compilation on kernel 2.6.18-rc5 is failing - to be fixed in RC4

Missing features that should be completed for RC4:
==
None
[openib-general] [PATCH] IB/srp: destroy and re-create QP and CQ on reconnect
Hello, Roland! Please consider the following for 2.6.19.

---

From: Ishai Rabinovitz <[EMAIL PROTECTED]>

For some reason (it could be a firmware problem) I got a CQ overrun in SRP, and because of that the QP went into the fatal error state. Since srp_reconnect_target does not destroy the QP, the QP fatal state persists after the reconnect. In order to be able to recover from such a situation, I suggest we destroy the CQ and the QP on every reconnect.

This also corrects a minor spec non-compliance - when srp_reconnect_target is called, srp destroys the CM ID and resets the QP, and the new connection is retried with the same QPN, which could theoretically lead to stale packets (for strict spec compliance I think the QPN should not be reused until all stale packets are flushed out of the network).

---

IB/srp: destroy/re-create QP and CQ on each reconnect.

This makes SRP more robust in the presence of hardware errors and is closer to the behaviour suggested by the IB spec, reducing the chance of stale packets.

Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]>
Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c	2006-08-31 12:23:52.0 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c	2006-08-31 12:30:48.0 +0300
@@ -495,10 +495,10 @@
 static int srp_reconnect_target(struct srp_target_port *target)
 {
 	struct ib_cm_id *new_cm_id;
-	struct ib_qp_attr qp_attr;
 	struct srp_request *req, *tmp;
-	struct ib_wc wc;
 	int ret;
+	struct ib_cq *old_cq;
+	struct ib_qp *old_qp;
 
 	spin_lock_irq(target->scsi_host->host_lock);
 	if (target->state != SRP_TARGET_LIVE) {
@@ -522,17 +522,17 @@
 	ib_destroy_cm_id(target->cm_id);
 	target->cm_id = new_cm_id;
 
-	qp_attr.qp_state = IB_QPS_RESET;
-	ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE);
-	if (ret)
-		goto err;
-
-	ret = srp_init_qp(target, target->qp);
-	if (ret)
+	old_qp = target->qp;
+	old_cq = target->cq;
+	ret = srp_create_target_ib(target);
+	if (ret) {
+		target->qp = old_qp;
+		target->cq = old_cq;
 		goto err;
+	}
 
-	while (ib_poll_cq(target->cq, 1, &wc) > 0)
-		; /* nothing */
+	ib_destroy_qp(old_qp);
+	ib_destroy_cq(old_cq);
 
 	spin_lock_irq(target->scsi_host->host_lock);
 	list_for_each_entry_safe(req, tmp, &target->req_queue, list)

-- 
MST
Re: [openib-general] single rkey
yipee wrote: > Hi, > > Is it possible for several memory registrations (using ibv_reg_mr) to have a > single rkey? > Can I add memory registrations to a previous rkey? > > > thanks, > y > I believe that the answer is no. Dotan
[openib-general] single rkey
Hi,

Is it possible for several memory registrations (using ibv_reg_mr) to have a single rkey?
Can I add memory registrations to a previous rkey?

thanks,
y
[openib-general] [PATCH] IB/cm: do not track remote QPN in timewait state
Roland, please queue for 2.6.19.

---

IB/cm: fix spurious rejects with a bogus "stale connection" syndrome.

The CM should not track the remote QPN in TimeWait, since the QP is not connected.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
Acked-by: Sean Hefty <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index f85c97f..e270311 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -679,6 +679,8 @@ static void cm_enter_timewait(struct cm_
 {
 	int wait_time;
 
+	cm_cleanup_timewait(cm_id_priv->timewait_info);
+
 	/*
 	 * The cm_id could be destroyed by the user before we exit timewait.
 	 * To protect against this, we search for the cm_id after exiting

--
MST
Re: [openib-general] [PATCH] IB/perftest: Fix get_median, size of delta, usage(), worst latency
Quoting r. CH Ganapathi <[EMAIL PROTECTED]>:
> o Fix get_median.
> o Change usage() in write_bw.c to match the actual default of exchanges.
> o Fix worst latency in read_lat.c.
> o Allocate only the necessary (iters - 1) elements for delta.
>
> Signed-off-by: Ganapathi CH <[EMAIL PROTECTED]>

Thanks, applied.

--
MST
Re: [openib-general] [PATCH] perftest: enhancement to rdma_lat to allow use of RDMA CM
Quoting r. Pradipta Kumar Banerjee <[EMAIL PROTECTED]>:
> Subject: [PATCH] perftest: enhancement to rdma_lat to allow use of RDMA CM
>
> Hi Michael,
> This patch contains changes to rdma_lat.c to allow use of the RDMA CM.
> It has been successfully tested with Ammasso iWARP cards, IBM eHCA and
> mthca IB cards.
>
> Summary of changes:
>
> # Added an option (-c|--cma) to enable use of the RDMA CM.
> # Added a new structure (struct pp_data) containing the user parameters as
>   well as other data required by most of the routines. This makes it
>   convenient to pass the parameters between the various routines.
> # Outputs to stdout/stderr are prefixed with the process id. This helps to
>   sort the output when multiple servers/clients are run from the same machine.
>
> Signed-off-by: Pradipta Kumar Banerjee <[EMAIL PROTECTED]>

Thanks, applied.

--
MST
Re: [openib-general] [PATCH] IB/perftest: Fix get_median, size of delta, usage(), worst latency
> 3) Worst latency is delta[iters - 2] in read_lat.c, not delta[iters - 3].

Could you explain this last bit, please?

--
MST
[openib-general] [PATCH] IB/perftest: Fix get_median, size of delta, usage(), worst latency
Hi,

1) When iters (exchanges) is even, delta has an odd number of elements, and when iters is odd, delta has an even number of elements. Hence, when (iters - 1) is passed, get_median() uses incorrect indexes to find the median. For example: when iters = 2, get_median returns median = (delta[0] + delta[-1]) / 2 when it should have been median = delta[0]. When iters = 3, get_median returns median = delta[1] when it actually should have been median = (delta[0] + delta[1]) / 2.

2) The array delta requires only (iters - 1) elements to be allocated.

3) Worst latency is delta[iters - 2] in read_lat.c, not delta[iters - 3].

4) usage() in write_bw.c incorrectly states the default number of exchanges as 1000.

Thanks,
Ganapathi
Novell Inc.

The following patch includes:
o Fix get_median.
o Change usage() in write_bw.c to match the actual default number of exchanges.
o Fix worst latency in read_lat.c.
o Allocate only the necessary (iters - 1) elements for delta.

Signed-off-by: Ganapathi CH <[EMAIL PROTECTED]>

Index: userspace/perftest/read_lat.c
===================================================================
--- userspace/perftest/read_lat.c	(revision 9196)
+++ userspace/perftest/read_lat.c	(working copy)
@@ -568,7 +568,7 @@
  */
 static inline cycles_t get_median(int n, cycles_t delta[])
 {
-	if (n % 2)
+	if ((n - 1) % 2)
 		return (delta[n / 2] + delta[n / 2 - 1]) / 2;
 	else
 		return delta[n / 2];
@@ -591,7 +591,7 @@
 	cycles_t median;
 	unsigned int i;
 	const char* units;
-	cycles_t *delta = malloc(iters * sizeof *delta);
+	cycles_t *delta = malloc((iters - 1) * sizeof *delta);
 
 	if (!delta) {
 		perror("malloc");
@@ -627,7 +627,7 @@
 	median = get_median(iters - 1, delta);
 	printf("%7d%d%7.2f%7.2f %7.2f\n", size, iters, delta[0] / cycles_to_units,
-	       delta[iters - 3] / cycles_to_units, median / cycles_to_units);
+	       delta[iters - 2] / cycles_to_units, median / cycles_to_units);
 
 	free(delta);
 }

Index: userspace/perftest/write_bw.c
===================================================================
--- userspace/perftest/write_bw.c	(revision 9196)
+++ userspace/perftest/write_bw.c	(working copy)
@@ -509,7 +509,7 @@
 	printf("  -s, --size=<size>      size of message to exchange (default 65536)\n");
 	printf("  -a, --all              Run sizes from 2 till 2^23\n");
 	printf("  -t, --tx-depth=<dep>   size of tx queue (default 100)\n");
-	printf("  -n, --iters=<iters>    number of exchanges (at least 2, default 1000)\n");
+	printf("  -n, --iters=<iters>    number of exchanges (at least 2, default 5000)\n");
 	printf("  -b, --bidirectional    measure bidirectional bandwidth (default unidirectional)\n");
 	printf("  -V, --version          display version number\n");
 }

Index: userspace/perftest/rdma_lat.c
===================================================================
--- userspace/perftest/rdma_lat.c	(revision 9196)
+++ userspace/perftest/rdma_lat.c	(working copy)
@@ -516,7 +516,7 @@
  */
 static inline cycles_t get_median(int n, cycles_t delta[])
 {
-	if (n % 2)
+	if ((n - 1) % 2)
 		return (delta[n / 2] + delta[n / 2 - 1]) / 2;
 	else
 		return delta[n / 2];
@@ -538,7 +538,7 @@
 	cycles_t median;
 	unsigned int i;
 	const char* units;
-	cycles_t *delta = malloc(iters * sizeof *delta);
+	cycles_t *delta = malloc((iters - 1) * sizeof *delta);
 
 	if (!delta) {
 		perror("malloc");

Index: userspace/perftest/send_lat.c
===================================================================
--- userspace/perftest/send_lat.c	(revision 9196)
+++ userspace/perftest/send_lat.c	(working copy)
@@ -678,7 +678,7 @@
  */
 static inline cycles_t get_median(int n, cycles_t delta[])
 {
-	if (n % 2)
+	if ((n - 1) % 2)
 		return (delta[n / 2] + delta[n / 2 - 1]) / 2;
 	else
 		return delta[n / 2];
@@ -701,7 +701,7 @@
 	cycles_t median;
 	unsigned int i;
 	const char* units;
-	cycles_t *delta = malloc(iters * sizeof *delta);
+	cycles_t *delta = malloc((iters - 1) * sizeof *delta);
 
 	if (!delta) {
 		perror("malloc");

Index: userspace/perftest/write_lat.c
===================================================================
--- userspace/perftest/write_lat.c	(revision 9196)
+++ userspace/perftest/write_lat.c	(working copy)
@@ -579,7 +579,7 @@
  */
 static inline cycles_t get_median(int n, cycles_t delta[])
 {
-	if (n % 2)
+	if ((n - 1) % 2)
 		return (delta[n / 2] + delta[n / 2 - 1]) / 2;
 	else
 		return delta[n / 2];
@@ -602,7 +602,7 @@
 	cycles_t median;
 	unsigned int i;
 	const char* units;
Re: [openib-general] [PATCH] libibcm: Need to include stddef.h in cm.c for SLES10 compilations
Sean Hefty wrote:
> Jack Morgenstein wrote:
>> Fix compilation on SLES10:
>> cm.c uses offsetof, so it must include stddef.h
>
> Thanks - committed in 9150.

I checked this libibcm with a multithreaded test (qp_test) and it is working with no problems.

thanks
Dotan
Re: [openib-general] ibv_poll_cq
Sunil Patil wrote:
> I am using socket based communication for exchanging initial
> information such as lid, qpn, psn; in fact, more or less the same code
> that is there in the examples. Is there any CM based example that I
> can look at?
>
> Regards,
> John

In:
https://openib.org/svn/gen2/trunk/src/userspace/libibcm/examples
you can find some libibcm examples.

Anyway, if you are using sockets you should sync between the two sides before you use the QPs (sync between them after they are both in at least the RTR state).

Dotan
Re: [openib-general] ibv_poll_cq
I am using socket based communication for exchanging initial information such as lid, qpn, psn; in fact, more or less the same code that is there in the examples. Is there any CM based example that I can look at?

Regards,
John

On 8/31/06, Dotan Barak <[EMAIL PROTECTED]> wrote:
> Hi.
>
> john t wrote:
> > Hi Dotan
> >
> > Is there a way to know if the two QPs (local and remote) are in sync,
> > or to wait for them to get in sync and then do the data transfer?
> >
> > I think in my case it is more like one QP is sending the message but
> > the other end (receiver) is not in RTR state at that time (since
> > sender and receiver are implemented as threads, maybe the receiver
> > thread on the other machine is getting scheduled very late).
> >
> > Is there a way to specify an infinite retry_count/timeout, or to find
> > out if the remote QP is in RTR state (or error state) and only then
> > do the actual data transfer?
>
> Sorry, but the answer is no: there isn't any way for a local QP to know
> the state of the remote QP. This is exactly the role of the CM: to sync
> between the two QPs and to move the various attributes between the two
> sides.
>
> How do you connect the two QPs?
> (Are you using the CM or socket based communication?)
>
> Dotan
Re: [openib-general] ibv_poll_cq
Hi.

john t wrote:
> Hi Dotan
>
> Is there a way to know if the two QPs (local and remote) are in sync,
> or to wait for them to get in sync and then do the data transfer?
>
> I think in my case it is more like one QP is sending the message but
> the other end (receiver) is not in RTR state at that time (since
> sender and receiver are implemented as threads, maybe the receiver
> thread on the other machine is getting scheduled very late).
>
> Is there a way to specify an infinite retry_count/timeout, or to find
> out if the remote QP is in RTR state (or error state) and only then
> do the actual data transfer?

Sorry, but the answer is no: there isn't any way for a local QP to know the state of the remote QP. This is exactly the role of the CM: to sync between the two QPs and to move the various attributes between the two sides.

How do you connect the two QPs?
(Are you using the CM or socket based communication?)

Dotan