[PATCH] mlx4: Change a warning message to debug

2010-12-08 Thread Eli Cohen
This workaround presented in 58d74bb is not something we should alert the user
on.  Debug level message is enough.

Signed-off-by: Eli Cohen e...@mellanox.co.il
---
 drivers/net/mlx4/fw.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c
index 7a7e18b..9f415df 100644
--- a/drivers/net/mlx4/fw.c
+++ b/drivers/net/mlx4/fw.c
@@ -290,7 +290,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 		dev_cap->bf_reg_size = 1 << (field & 0x1f);
 		MLX4_GET(field, outbox, QUERY_DEV_CAP_LOG_MAX_BF_REGS_PER_PAGE_OFFSET);
 		if ((1 << (field & 0x3f)) > (PAGE_SIZE / dev_cap->bf_reg_size)) {
-			mlx4_warn(dev, "firmware bug: log2 # of blue flame regs is invalid (%d), forcing 3\n", field & 0x1f);
+			mlx4_dbg(dev, "firmware bug: log2 # of blue flame regs is invalid (%d), forcing 3\n", field & 0x1f);
 			field = 3;
 		}
 		dev_cap->bf_regs_per_page = 1 << (field & 0x3f);
-- 
1.7.3.2

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: ibv_post_send/recv kernel path optimizations

2010-12-08 Thread Walukiewicz, Miroslaw
Or, 

> I don't see why the ib uverbs flow (BTW - the data path has nothing to do
> with the rdma_cm, you're working with /dev/infiniband/uverbsX), can't be
> enhanced e.g to support shared-page which is allocated & mmaped from uverbs
> to user space and used in the same manner your implementation does.

The problem I see is that mmap is currently used to map the doorbell page
in different drivers.

We can use it to map a page for transmit/receive operations, provided we can
tell whether the request is for the doorbell or for our shared page.

The second problem is that this rx/tx mmap should map a separate page per QP,
to avoid unnecessary QP lookups, so the page identifier passed to mmap should
be based on the QP identifier.

I cannot find the specific code for /dev/infiniband/uverbsX. Does this device
driver share the same functions as /dev/infiniband/rdmacm, or does it have its
own implementation?

Mirek

-Original Message-
From: Or Gerlitz [mailto:ogerl...@voltaire.com] 
Sent: Wednesday, December 01, 2010 9:12 AM
To: Walukiewicz, Miroslaw; Jason Gunthorpe; Roland Dreier
Cc: Roland Dreier; Hefty, Sean; linux-rdma@vger.kernel.org
Subject: Re: ibv_post_send/recv kernel path optimizations

On 11/26/2010 1:56 PM, Walukiewicz, Miroslaw wrote:
> From the trace it looks like the __up_read() - 11% wastes most of time. It is
> called from idr_read_qp when a put_uobj_read is called. if
> (copy_from_user(cmd, buf, sizeof cmd)) - 5% it is called twice from
> ib_uverbs_post_send() for IMA and once in ib_uverbs_write() per each frame...
> and __kmalloc/kfree - 5% is the third function that has a big meaning. It is
> called twice for each frame transmitted. It is about 20% of performance loss
> comparing to nes_ud_sksq path which we miss when we use a OFED path.
>
> What I can modify is a kmalloc/kfree optimization - it is possible to make
> allocation only at start and use pre-allocated buffers. I don't see any way
> for optimization of idr_read_qp usage or copy_user. In current approach we
> use a shared page and a separate nes_ud_sksq handle for each created QP so
> there is no need for any user space data copy or QP lookup.

As was mentioned earlier on this thread, and repeated here, the
kmalloc/kfree can be removed, as can the 2nd copy_from_user. I don't see
why the ib uverbs flow (BTW - the data path has nothing to do with the
rdma_cm, you're working with /dev/infiniband/uverbsX) can't be enhanced,
e.g. to support a shared page which is allocated & mmaped from uverbs to
user space and used in the same manner your implementation does. The 1st
copy_from_user should add next to nothing, and if it does add overhead, it
can be replaced with a different user/kernel IPC mechanism which costs
less. So we're basically left with idr_read_qp; I wonder what other
people think about if/how this can be optimized?

Or.


Re: [PATCH V2 5/5] RDMA CM: Netlink Client

2010-12-08 Thread Nir Muchtar
On Tue, 2010-12-07 at 14:29 -0700, Jason Gunthorpe wrote:

> What you've done in your v2 patch won't work if the table you are
> dumping is too large, once you pass sk_rmem_alloc for the netlink
> socket it will deadlock. The purpose of dump_start is to avoid that
> deadlock. (review my past messages on the subject)
>
> Your v1 patch wouldn't deadlock, but it would fail to dump with
> ENOMEM, and provides an avenue to build an unprivileged kernel OOM
> DOS.
>
> The places in the kernel that don't use dump_start have to stay under
> sk_rmem_alloc.
>
> Jason

Sorry, I still need some clarifications...
When you say deadlocks, do you mean when calling malloc with a lock or
when overflowing a socket receive buffer?
For the second case, when we use netlink_unicast, the skbuff is sent and
freed. It is transferred to the userspace's socket using netlink_sendskb
and accumulated in its recv buff.

Are you referring to a deadlock there? I still fail to see the issue.
Why would the kernel socket recv buff reach a limit? Could you please
elaborate?

Thanks,
Nir



Re: NFS-RDMA hangs: connection closed (-103)

2010-12-08 Thread Spelic
Tom, have you reproduced the RDMA hangs - connection closes bug or the 
sparse file at server side upon NFS hitting ENOSPC ?


Because for the latter people have already given exhaustive explanation: 
see this other thread at 
http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/ 



While the former bug is still open and very interesting for us.

Thanks for your help
S.


On 12/07/2010 05:12 PM, Tom Tucker wrote:

Status update...

I have reproduced the bug a number of different ways. It seems to be 
most easily reproduced by simply writing more data than the filesystem 
has space for. I can do this reliably with any FS. I think the XFS bug 
may have tickled this bug somehow.


Tom

On 12/2/10 1:09 PM, Spelic wrote:

Hello all
please be aware that the file oversize bug is reproducible also 
without infiniband, with just nfs over ethernet over xfs over ramdisk 
(but it doesn't hang, so it's a different bug than the one I posted 
here at the RDMA mailing list)
I have posted another thread regarding the file oversize bug, which 
you can read in the LVM, XFS, and LKML mailing lists, please have a look
http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/ 

Especially my second post, replying myself at +30 minutes, explains 
that it's reproducible also with ethernet.


Thank you

On 12/02/2010 07:37 PM, Roland Dreier wrote:

Adding Dave Chinner to the cc list, since he's both an XFS guru as well
as being very familiar with NFS and RDMA...

Dave, if you read below, it seems there is some strange behavior
exporting XFS with NFS/RDMA.

  - R.

  On 12/02/2010 12:59 AM, Tom Tucker wrote:
   Spelic,
 
   I have seen this problem before, but have not been able to reliably
   reproduce it. When I saw the problem, there were no transport errors
   and it appeared as if the I/O had actually completed, but that the
   waiter was not being awoken. I was not able to reliably reproduce
   the problem and was not able to determine if the problem was a
   latent bug in NFS in general or a bug in the RDMA transport in
   particular.
 
   I will try your setup here, but I don't have a system like yours so
   I'll have to settle for a smaller ramdisk, however, I have a few
   questions:
 
   - Does the FS matter? For example, can you use ext[2-4] on the
   ramdisk and not still reproduce
   - As I mentioned earlier NFS v3 vs. NFS v4
   - RAMDISK size, i.e. 2G vs. 14G
 
   Thanks,
   Tom

  Hello Tom, thanks for replying

  - The FS matters to some extent: as I wrote, with ext4 it's not
  possible to reproduce the bug in this way, so immediately and
  reliably, however ext4 also will hang eventually if you work on it for
  hours so I had to switch to IPoIB for our real work; reread my
  previous post.

  - NFS3 not tried yet. Never tried to do RDMA on NFS3... do you have a
  pointer on instructions?


  - RAMDISK size: I am testing it.

  Ok I confirm with 1.5GB ramdisk it's reproducible.
  boot option ramdisk_size=1572864
  (1.5*1024**2=1572864.0)
  confirm: blockdev --getsize64 /dev/ram0 == 1610612736

  now at server side mkfs and mount with defaults:
  mkfs.xfs /dev/ram0
  mount /dev/ram0 /mnt/ram
  (this is a simplification over my previous email, and it's needed with
  a smaller ramdisk or mkfs.xfs will refuse to work. The bug is still
  reproducible like this)


  DOH! another bug:
  It's strange how at the end of the test
  ls -lh /mnt/ram
  at server side will show a zerofile larger than 1.5GB at the end of
  the procedure, sometimes it's 3GB, sometimes it's 2.3GB... but it's
  larger than the ramdisk size.

  # ll -h /mnt/ram
  total 1.5G
  drwxr-xr-x 2 root root   21 2010-12-02 12:54 ./
  drwxr-xr-x 3 root root 4.0K 2010-11-29 23:51 ../
  -rw-r--r-- 1 root root 2.3G 2010-12-02 12:59 zerofile
  # df -h
  Filesystem  Size  Used Avail Use% Mounted on
  /dev/sda1   294G  4.1G  275G   2% /
  devtmpfs    7.9G  184K  7.9G   1% /dev
  none        7.9G     0  7.9G   0% /dev/shm
  none        7.9G  100K  7.9G   1% /var/run
  none        7.9G     0  7.9G   0% /var/lock
  none        7.9G     0  7.9G   0% /lib/init/rw
  /dev/ram0   1.5G  1.5G   20K 100% /mnt/ram

  # dd if=/mnt/ram/zerofile | wc -c
  4791480+0 records in
  4791480+0 records out
  2453237760
  2453237760 bytes (2.5 GB) copied, 8.41821 s, 291 MB/s

  It seems there is also an XFS bug here...

  This might help triggering the bug however please note than ext4
  (nfs-rdma over it) also hanged on us and it was real work on HDD disks
  and they were not full... after switching to IPoIB it didn't hang
  anymore.

  On IPoIB the size problem also shows up: final file is 2.3GB instead
  of  1.5GB, however nothing hangs:

  # echo begin; dd if=/dev/zero of=/mnt/nfsram/zerofile bs=1M ; echo
  syncing now ; time sync ; echo finished
  begin
  dd: writing `/mnt/nfsram/zerofile': Input/output error
  2497+0 

ibnetdiscover issue

2010-12-08 Thread Tom Ammon

Hi,

I get the following when I try to run ibnetdiscover from a server 
plugged in to a voltaire 4036 switch. We're using OFED 1.5.2:


[r...@sm1 ~]# ibnetdiscover
src/chassis.c:535; Unexpected node found: guid 0x0008f1050075134c
ibnetdiscover: iberror: failed: discover failed


However, ibdiagnet runs fine:

[r...@sm1 ~]# ibdiagnet
Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.5.4
-W- Topology file is not specified.
 Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib64/ibdm1.5.4
-I- Using port 1 as the local port.
-I- Discovering ... 277 nodes (23 Switches & 254 CA-s) discovered.


-I---
-I- Bad Guids/LIDs Info
-I---
-I- No bad Guids were found

-I---
-I- Links With Logical State = INIT
-I---
-I- No bad Links (with logical state = INIT) were found

-I---
-I- General Device Info
-I---

-I---
-I- PM Counters Info
-I---
-W- lid=0x0007 guid=0x0008f105006515ba dev=23131 Port=33
   Performance Monitor counter : Value
   link_error_recovery_counter : 0xff (overflow)
-W- lid=0x0010 guid=0x0008f10500201d7c dev=23130 Port=14
   Performance Monitor counter : Value
   symbol_error_counter: 0x (overflow)
-W- lid=0x0001 guid=0x0008f10500108a76 dev=23130 Port=30
   Performance Monitor counter : Value
   symbol_error_counter: 0x (overflow)

-I---
-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
-I---
-I-PKey:0x7fff Hosts:254 full:254 limited:0

-I---
-I- IPoIB Subnets Check
-I---
-I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps
SL:0x00

-I---
-I- Bad Links Info
-I- No bad link were found
-I---

-I- Stages Status Report:
 STAGEErrors Warnings
 Bad GUIDs/LIDs Check 0  0
 Link State Active Check  0  0
 General Devices Info Report  0  0
 Performance Counters Report  0  3
 Partitions Check 0  0
 IPoIB Subnets Check  0  0

Please see /tmp/ibdiagnet.log for complete log


-I- Done. Run time was 21 seconds.

Any ideas?

Tom

--
Tom Ammon
Network Engineer
Office: 801.587.0976
Mobile: 801.674.9273

Center for High Performance Computing
University of Utah
http://www.chpc.utah.edu


Re: [PATCH V2 5/5] RDMA CM: Netlink Client

2010-12-08 Thread Jason Gunthorpe
On Wed, Dec 08, 2010 at 04:55:22PM +0200, Nir Muchtar wrote:
> On Tue, 2010-12-07 at 14:29 -0700, Jason Gunthorpe wrote:
>
> > What you've done in your v2 patch won't work if the table you are
> > dumping is too large, once you pass sk_rmem_alloc for the netlink
> > socket it will deadlock. The purpose of dump_start is to avoid that
> > deadlock. (review my past messages on the subject)
> >
> > Your v1 patch wouldn't deadlock, but it would fail to dump with
> > ENOMEM, and provides an avenue to build an unprivileged kernel OOM
> > DOS.
> >
> > The places in the kernel that don't use dump_start have to stay under
> > sk_rmem_alloc.
> >
> > Jason
>
> Sorry, I still need some clarifications...
> When you say deadlocks, do you mean when calling malloc with a lock or
> when overflowing a socket receive buffer?
> For the second case, when we use netlink_unicast, the skbuff is sent and
> freed. It is transferred to the userspace's socket using netlink_sendskb
> and accumulated in its recv buff.
>
> Are you referring to a deadlock there? I still fail to see the issue.
> Why would the kernel socket recv buff reach a limit? Could you please
> elaborate?

Netlink is all driven from user space syscalls.. so it looks like

sendmsg()
[..]
ibnl_rcv_msg
cma_get_stats
[..]
ibnl_unicast
[..]
netlink_attachskb
(now we block on the socket recv queue once it fills)

The deadlock is that userspace is sitting in sendmsg() while the
kernel is sleeping in netlink_attachskb waiting for the recvbuf to
empty.

User space cannot call recvmsg() while it is blocked in sendmsg()
so it all goes boom.

Even if cma_get_stats was executed from a kernel thread and
ibnl_rcv_msg returned back to userspace you still hold the dev_list
mutex while calling ibnl_unicast, which can sleep waiting on
userspace, which creates an easy DOS against the RDMA CM (I can write
a program that causes the kernel to hold the mutex indefinitely).

You can't hold the mutex while sleeping for userspace, so you have to
unlock it. If you unlock it you have to fixup your position when you
re-lock it. If you can fixup your position then you can use
dump_start.
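
The unlock, fix up, re-lock pattern described above can be sketched in plain userspace C. This is only an illustration of the idea, not the netlink dump_start API: `struct entry`, `table`, `dump_chunk` and the index cursor are hypothetical names, and a pthread mutex stands in for the dev_list mutex. Each call emits a bounded chunk while holding the lock and records where the next call must resume, so the lock is never held while waiting for the reader.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical table protected by a mutex, standing in for the RDMA CM
 * device/ID list. */
struct entry { int id; };

static struct entry table[] = { {1}, {2}, {3}, {4}, {5} };
static const size_t table_len = sizeof(table) / sizeof(table[0]);
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Copy at most 'room' entries into 'out', starting at *cursor.
 * The lock is held only while copying, never while waiting for the
 * reader, and *cursor records where the next call must resume.
 * Returns the number of entries written; 0 means the dump is complete.
 */
static size_t dump_chunk(size_t *cursor, int *out, size_t room)
{
	size_t n = 0;

	assert(cursor && out);
	pthread_mutex_lock(&table_lock);
	while (*cursor < table_len && n < room)
		out[n++] = table[(*cursor)++].id;
	pthread_mutex_unlock(&table_lock);
	return n;
}
```

A plain index is only a valid cursor here because the table never changes. If entries can be added or removed between calls, the cursor would have to be something stable across unlocks, which is the "fixup your position" step.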

I don't see malloc being a concern anywhere in what you've done...

Jason


Re: ibnetdiscover issue

2010-12-08 Thread Hal Rosenstock

Hi Tom,

On 12/8/2010 12:48 PM, Tom Ammon wrote:

> Hi,
>
> I get the following when I try to run ibnetdiscover from a server
> plugged in to a voltaire 4036 switch. We're using OFED 1.5.2:
>
> [r...@sm1 ~]# ibnetdiscover
> src/chassis.c:535; Unexpected node found: guid 0x0008f1050075134c
> ibnetdiscover: iberror: failed: discover failed


Looks to me like there's an is_spine_4200() clause missing in
get_router_slot in libibnetdisc/src/chassis.c. Eli had added changes to
support the 4200 so he's the best one to comment.


-- Hal






Re: ibv_post_send/recv kernel path optimizations

2010-12-08 Thread Jason Gunthorpe
On Wed, Dec 08, 2010 at 12:14:51PM +, Walukiewicz, Miroslaw wrote:
> Or,
>
> > I don't see why the ib uverbs flow (BTW - the data path has nothing to do
> > with the
> > rdma_cm, you're working with /dev/infiniband/uverbsX), can't be enhanced
> > e.g to support shared-page which is allocated & mmaped from uverbs to
> > user space and used in the same manner your implementation does.
>
> The problem that I see is that the mmap is currently used for
> mapping of doorbell page in different drivers.
>
> We can use it for mapping a page for transmit/receive operation when
> we are able to differentiate that we need to map Doorbell or our
> shared page.

There is the 64 bit file offset field to mmap which I think is
driver-specific. You could use 0 for the doorbell page, QPN*PAGE_SIZE
+ QPN_OFFSET for the per-QP page, etc..
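
As a rough sketch of that offset scheme (assumptions: a 4096-byte page, a QPN_OFFSET of one page so that offset 0 stays reserved for the doorbell, and the hypothetical names `ex_qp_mmap_offset` and `ex_decode_offset`; none of this is taken from an existing driver), the userspace library would compute the offset it passes to mmap() and the driver's mmap handler would decode it:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; a real driver would use the kernel's
 * PAGE_SIZE and its own layout. */
#define EX_PAGE_SIZE   4096ULL
#define EX_QPN_OFFSET  EX_PAGE_SIZE	/* offset 0 stays reserved for the doorbell page */

enum ex_map_kind { EX_MAP_DOORBELL, EX_MAP_QP_PAGE, EX_MAP_INVALID };

/* Userspace side: offset to pass to mmap() for a given QP number. */
static uint64_t ex_qp_mmap_offset(uint32_t qpn)
{
	return (uint64_t)qpn * EX_PAGE_SIZE + EX_QPN_OFFSET;
}

/* Driver side: classify the requested offset and recover the QPN. */
static enum ex_map_kind ex_decode_offset(uint64_t off, uint32_t *qpn)
{
	assert(qpn);
	if (off == 0)
		return EX_MAP_DOORBELL;
	if (off < EX_QPN_OFFSET || off % EX_PAGE_SIZE)
		return EX_MAP_INVALID;
	*qpn = (uint32_t)((off - EX_QPN_OFFSET) / EX_PAGE_SIZE);
	return EX_MAP_QP_PAGE;
}
```

Because the QPN is recovered arithmetically from the offset, the driver still has to validate that the QP exists and belongs to the calling context before mapping anything.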

> The second problem is that this rx/tx mmap should map the separate
> page per QP to avoid the unnecessary QP lookups so page identifier
> passed to mmap should be based on QP identifier.
>
> I cannot find a specific code for /dev/infiniband/uverbsX. Is this
> device driver sharing the same functions like /dev/infiniband/rdmacm
> or it has own implementation.

It is in drivers/infiniband/core/uverbs*

For mmap the call is just routed to the driver's ib_dev mmap function,
so you can do whatever you want in your driver and match the
functionality in your userspace libibverbs driver library.

I think you should be able to implement your driver-specific
optimization within the uverbs framework - that would be best all
round.

Jason


Re: [PATCH] mlx4: Change a warning message to debug

2010-12-08 Thread Roland Dreier
 > This workaround presented in 58d74bb is not something we should alert the
 > user on.  Debug level message is enough.

Not sure I agree ... surely the point of this message is for the user to
see it and know to update firmware?  Otherwise why print anything at
all, since I'm sure you guys have already fixed the firmware and have a
regression test so this bug doesn't reappear?

 - R.


Re: ibv_post_send/recv kernel path optimizations

2010-12-08 Thread Roland Dreier
 > The problem that I see is that the mmap is currently used for mapping
 > of doorbell page in different drivers.

The driver can use different offsets into the file to map different
things.  For example I believe ehca, ipath and qib already do this.

 > I cannot find a specific code for /dev/infiniband/uverbsX. Is this
 > device driver sharing the same functions like /dev/infiniband/rdmacm
 > or it has own implementation.

it is in drivers/infiniband/core/uverbs_main.c.

 - R.


[PATCH 4/4] ibacm: support no delay options

2010-12-08 Thread Hefty, Sean
Allow a user to specify that address and/or route resolution
protocols should be suppressed, and that only locally cached
data should be returned.

This helps support rdma_getaddrinfo options RAI_NUMERICHOST
and RAI_NOROUTE.  If data for a request is available, it is
immediately returned.  Otherwise, the client request is
failed, but the lookup is still initiated.  This avoids
blocking a client for an extended period of time while
resolution completes, but allows future calls to find cached
data.
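
The no-delay behaviour described above could look roughly like the following self-contained sketch. It is not the ibacm code: `struct ex_cache`, `ex_resolve` and the status values are hypothetical stand-ins, and only the flag value mirrors ACM_FLAGS_NODELAY from the patch. A cache hit is answered immediately; on a miss the lookup is started either way, and the flag only decides whether the caller fails at once or waits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_FLAG_NODELAY (1u << 30)	/* mirrors ACM_FLAGS_NODELAY in the patch */

enum ex_status { EX_OK, EX_ENODATA, EX_QUEUED };

/* Hypothetical one-entry cache standing in for the service's resolved data. */
struct ex_cache {
	const char *name;
	int resolved;
	int lookup_started;
};

static enum ex_status ex_resolve(struct ex_cache *c, const char *name,
				 unsigned int flags)
{
	assert(c && name);
	if (c->resolved && strcmp(c->name, name) == 0)
		return EX_OK;		/* cached data: answer immediately */

	c->name = name;
	c->resolved = 0;
	c->lookup_started = 1;		/* resolution starts either way */
	if (flags & EX_FLAG_NODELAY)
		return EX_ENODATA;	/* fail now instead of blocking the client */
	return EX_QUEUED;		/* client waits for the lookup to finish */
}
```

A client using the no-delay flag would typically retry later, by which time the lookup started by the failed call may have populated the cache.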

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 include/infiniband/acm.h |1 +
 man/ib_acme.1|4 
 src/acm.c|   17 -
 src/acme.c   |   21 ++---
 src/libacm.c |   14 +++---
 src/libacm.h |4 ++--
 6 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/include/infiniband/acm.h b/include/infiniband/acm.h
index d193d43..41b95b8 100644
--- a/include/infiniband/acm.h
+++ b/include/infiniband/acm.h
@@ -51,6 +51,7 @@
 #define ACM_STATUS_EDESTTYPE10
 
 #define ACM_FLAGS_QUERY_SA  (1<<31)
+#define ACM_FLAGS_NODELAY  (1<<30)
 
 #define ACM_MSG_HDR_LENGTH  16
 #define ACM_MAX_ADDRESS 64
diff --git a/man/ib_acme.1 b/man/ib_acme.1
index 0c0e332..52000a3 100644
--- a/man/ib_acme.1
+++ b/man/ib_acme.1
@@ -37,6 +37,10 @@ Indicates that the resolved path information should be verified with the
 active IB SA.  Use of the -v option provides a sanity check that
 resolved path information is usable given the current cluster configuration.
 .TP
+\-c
+Instructs the ACM service to only return information that currently resides
+in its local cache.
+.TP
 \-A
 With this option, the ib_acme utility automatically generates the address
 configuration file acm_addr.cfg.  The generated file is
diff --git a/src/acm.c b/src/acm.c
index d9a81d9..08f233c 100644
--- a/src/acm.c
+++ b/src/acm.c
@@ -1914,8 +1914,7 @@ acm_svr_verify_resolve(struct acm_resolve_msg *msg,
 
 	cnt = (msg->hdr.length - ACM_MSG_HDR_LENGTH) / ACM_MSG_EP_LENGTH;
 	for (i = 0; i < cnt; i++) {
-		switch (msg->data[i].flags) {
-		case ACM_EP_FLAG_SOURCE:
+		if (msg->data[i].flags & ACM_EP_FLAG_SOURCE) {
 			if (src) {
 				acm_log(0, "ERROR - multiple sources specified\n");
 				return ACM_STATUS_ESRCADDR;
@@ -1925,8 +1924,8 @@ acm_svr_verify_resolve(struct acm_resolve_msg *msg,
 				return ACM_STATUS_ESRCTYPE;
 			}
 			src = &msg->data[i];
-			break;
-		case ACM_EP_FLAG_DEST:
+		}
+		if (msg->data[i].flags & ACM_EP_FLAG_DEST) {
 			if (dst) {
 				acm_log(0, "ERROR - multiple destinations specified\n");
 				return ACM_STATUS_EDESTADDR;
@@ -1936,11 +1935,6 @@ acm_svr_verify_resolve(struct acm_resolve_msg *msg,
 				return ACM_STATUS_EDESTTYPE;
 			}
 			dst = &msg->data[i];
-			break;
-		default:
-			acm_log(0, "ERROR - unexpected endpoint flags 0x%x\n",
-				msg->data[i].flags);
-			return ACM_STATUS_EINVAL;
 		}
 	}
 
@@ -2040,6 +2034,11 @@ acm_svr_resolve(struct acm_client *client, struct acm_resolve_msg *msg)
 		/* fall through */
 	default:
 queue:
+		if (daddr->flags & ACM_FLAGS_NODELAY) {
+			acm_log(2, "lookup initiated, but client wants no delay\n");
+			status = ACM_STATUS_ENODATA;
+			break;
+		}
 		status = acm_svr_queue_req(dest, client, msg);
 		if (status) {
 			break;
diff --git a/src/acme.c b/src/acme.c
index cc34577..daa6051 100644
--- a/src/acme.c
+++ b/src/acme.c
@@ -51,6 +51,7 @@ static char *dest_addr;
 static char *src_addr;
 static char addr_type = 'i';
 static int verify;
+static int nodelay;
 static int make_addr;
 static int make_opts;
 int verbose;
@@ -70,6 +71,7 @@ static void show_usage(char *program)
 	printf("   -s src_addr  - format defined by -f option\n");
 	printf("   -d dest_addr - format defined by -f option\n");
 	printf("   [-v] - verify ACM response against SA query response\n");
+	printf("   [-c] - read ACM cached data only\n");
 	printf("usage 2: %s\n", program);
 	printf("   -A [addr_file]   - generate local address configuration file\n");
 	printf("  (default is %s)\n", ACM_ADDR_FILE);
@@ -418,6 +420,16 @@ static void show_path(struct ibv_path_record *path)
 	printf("  packet lifetime: %d\n", path->packetlifetime & 0x1F);
 }
 
+static uint32_t get_resolve_flags()
+{
+ 

[PATCH 2/4] ibacm: automatically generate addresses if missing acm_addr.cfg

2010-12-08 Thread Hefty, Sean
If the acm_addr.cfg file is missing, automatically attempt to
generate the file.  If no addresses can be assigned to any
endpoints because of a missing file, log an error and exit
the service.  This avoids running a service that is basically
useless.

This change allows the ib_acm service to execute using defaults
without requiring that a user first run 'ib_acme -A -O' or
otherwise creating the acm configuration files.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 src/acm.c |   37 +
 1 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/src/acm.c b/src/acm.c
index 40f2205..e035741 100644
--- a/src/acm.c
+++ b/src/acm.c
@@ -203,6 +203,7 @@ static SOCKET listen_socket;
 static struct acm_client client[FD_SETSIZE - 1];
 
 static FILE *flog;
+static FILE *faddr;
 static lock_t log_lock;
 PER_THREAD char log_data[ACM_MAX_ADDRESS];
 
@@ -2249,7 +2250,6 @@ err:
 static int acm_assign_ep_names(struct acm_ep *ep)
 {
char *dev_name;
-   FILE *f;
char s[120];
char dev[32], addr[32], pkey_str[8];
uint16_t pkey;
@@ -2261,12 +2261,8 @@ static int acm_assign_ep_names(struct acm_ep *ep)
 	acm_log(1, "device %s, port %d, pkey 0x%x\n",
 		dev_name, ep->port->port_num, ep->pkey);
 
-	if (!(f = fopen(addr_file, "r"))) {
-		acm_log(0, "ERROR - unable to open acm_addr.cfg file\n");
-		return ACM_STATUS_ENODATA;
-	}
-
-	while (fgets(s, sizeof s, f)) {
+	rewind(faddr);
+	while (fgets(s, sizeof s, faddr)) {
 		if (s[0] == '#')
 			continue;
 
@@ -2310,7 +2306,6 @@ static int acm_assign_ep_names(struct acm_ep *ep)
}
}
 
-   fclose(f);
return !index;
 }
 
@@ -2722,6 +2717,22 @@ static FILE *acm_open_log(void)
return f;
 }
 
+static FILE *acm_open_addr_file(void)
+{
+	FILE *f;
+
+	if ((f = fopen(addr_file, "r")))
+		return f;
+
+	acm_log(0, "notice - generating acm_addr.cfg file\n");
+	if (!(f = popen("ib_acme -A", "r"))) {
+		acm_log(0, "ERROR - cannot generate acm_addr.cfg\n");
+		return NULL;
+	}
+	pclose(f);
+	return fopen(addr_file, "r");
+}
+
 static int acm_open_lock_file(void)
 {
 	int lock_fd;
@@ -2818,6 +2829,11 @@ int CDECL_FUNC main(int argc, char **argv)
 	DListInit(&timeout_list);
 	event_init(&timeout_event);
 
+	if (!(faddr = acm_open_addr_file())) {
+		acm_log(0, "ERROR - address file not found\n");
+		return -1;
+	}
+
 	umad_init();
 	ibdev = ibv_get_device_list(&dev_cnt);
 	if (!ibdev) {
@@ -2830,6 +2846,11 @@ int CDECL_FUNC main(int argc, char **argv)
 		acm_open_dev(ibdev[i]);
 
 	ibv_free_device_list(ibdev);
+	if (DListEmpty(&dev_list)) {
+		acm_log(0, "ERROR - no active devices\n");
+		return -1;
+	}
+	fclose(faddr);
 
 	acm_log(1, "initiating multicast joins\n");
 	acm_join_groups();




[PATCH 0/4] ibacm: updates based on scale-up testing

2010-12-08 Thread Hefty, Sean
The following selected patches update the ib_acm service based
on the results of testing the service across an 1100 node cluster.
There were 8 additional patches that resulted from the testing,
but those were either trivial updates or improvements to the ib_acme
test utility only. 

These changes will be part of a pending 1.0.4 release to
match up with the upcoming OFED release.
 
Signed-off-by: Sean Hefty sean.he...@intel.com


[PATCH 3/4] ibacm: write port to /var/run/ibacm.port

2010-12-08 Thread Hefty, Sean
Write used port data to /var/run/ibacm.port.  This will allow
librdmacm and other libraries and applications to find the
ibacm service when it has been moved from its default port.
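
The publish/lookup handshake can be sketched in isolation as follows. The helper names `ex_publish_port` and `ex_lookup_port` are hypothetical and the path is a parameter rather than the fixed /var/run/ibacm.port; unlike the patch, this sketch checks the fscanf() result and falls back to the default port when the file is missing or unparsable.

```c
#include <assert.h>
#include <stdio.h>

/* Write the port number to 'path'; returns 0 on success, -1 on failure. */
static int ex_publish_port(const char *path, unsigned short port)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%hu\n", port);
	return fclose(f) ? -1 : 0;
}

/* Read the port from 'path'; fall back to 'def_port' if the file is
 * missing or does not contain a number. */
static unsigned short ex_lookup_port(const char *path, unsigned short def_port)
{
	unsigned short port = def_port;
	FILE *f;

	assert(path);
	f = fopen(path, "r");
	if (!f)
		return def_port;
	if (fscanf(f, "%hu", &port) != 1)
		port = def_port;
	fclose(f);
	return port;
}
```

Reading into an `unsigned short` directly avoids the pointer cast that is needed when the variable is declared as a plain `short`.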

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 src/acm.c|8 
 src/libacm.c |   11 +++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/src/acm.c b/src/acm.c
index e035741..d9a81d9 100644
--- a/src/acm.c
+++ b/src/acm.c
@@ -1597,6 +1597,7 @@ static void CDECL_FUNC acm_retry_handler(void *context)
 
 static void acm_init_server(void)
 {
+   FILE *f;
int i;
 
 	for (i = 0; i < FD_SETSIZE - 1; i++) {
@@ -1605,6 +1606,13 @@ static void acm_init_server(void)
 		client[i].sock = INVALID_SOCKET;
 		atomic_init(&client[i].refcnt);
 	}
+
+	if (!(f = fopen("/var/run/ibacm.port", "w"))) {
+		acm_log(0, "notice - cannot publish ibacm port number\n");
+		return;
+	}
+	fprintf(f, "%hu\n", server_port);
+	fclose(f);
 }
 
 static int acm_listen(void)
diff --git a/src/libacm.c b/src/libacm.c
index 9d56cd2..3ce0cd0 100644
--- a/src/libacm.c
+++ b/src/libacm.c
@@ -57,6 +57,16 @@ extern lock_t lock;
 static SOCKET sock = INVALID_SOCKET;
 static short server_port = 6125;
 
+static void acm_set_server_port(void)
+{
+   FILE *f;
+
+	if ((f = fopen("/var/run/ibacm.port", "r"))) {
+		fscanf(f, "%hu", (unsigned short *) &server_port);
+   fclose(f);
+   }
+}
+
 int libacm_init(void)
 {
struct sockaddr_in addr;
@@ -66,6 +76,7 @@ int libacm_init(void)
if (ret)
return ret;
 
+   acm_set_server_port();
sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (sock == INVALID_SOCKET) {
ret = socket_errno();




[PATCH 1/4] ibacm: Add lock to prevent multiple daemons from running

2010-12-08 Thread Hefty, Sean
Use a lock file to prevent multiple daemons from running
simultaneously.

Without this lock, a second instance of ib_acm eventually
fails to bind to the server's TCP port and exits, but not
before it overwrites a portion of the log file. 
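
A minimal standalone sketch of the single-instance check, assuming POSIX lockf() semantics: `ex_take_lock` and `ex_second_instance_refused` are hypothetical helpers, not functions from ibacm. The descriptor must stay open for as long as the lock should be held, and the second helper forks a child to show that another process is refused while the lock is held.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Try to take an exclusive lock on 'path' and record our pid in it.
 * Returns the open descriptor (which must stay open to keep the lock)
 * or -1 if the file cannot be opened or is locked by another process.
 */
static int ex_take_lock(const char *path)
{
	char pid[16];
	int fd;

	assert(path);
	fd = open(path, O_RDWR | O_CREAT, 0640);
	if (fd < 0)
		return -1;
	if (lockf(fd, F_TLOCK, 0)) {
		close(fd);
		return -1;
	}
	snprintf(pid, sizeof pid, "%d\n", (int)getpid());
	if (write(fd, pid, strlen(pid)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Fork a child that tries to take the same lock; returns 1 if the child
 * was refused, 0 if it got the lock, -1 on fork/wait failure. */
static int ex_second_instance_refused(const char *path)
{
	int status;
	pid_t child = fork();

	if (child < 0)
		return -1;
	if (child == 0)
		_exit(ex_take_lock(path) < 0 ? 1 : 0);
	if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

The lock belongs to the process, so a second call from the same process would succeed; the test has to come from a separate process, as a second daemon instance would.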

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 acm_opts.cfg |6 ++
 src/acm.c|   27 +++
 src/acme.c   |6 ++
 3 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/acm_opts.cfg b/acm_opts.cfg
index 372cd7b..7147fe2 100644
--- a/acm_opts.cfg
+++ b/acm_opts.cfg
@@ -26,6 +26,12 @@ log_file /var/log/ibacm.log
 
 log_level 0
 
+# lock_file:
+# Specifies the location of the ACM lock file used to ensure that only a
+# single instance of ACM is running.
+
+lock_file /var/lock/ibacm.pid
+
 # addr_prot:
 # Default resolution protocol to resolve IP addresses into IB GIDs.
 # Supported protocols are:
diff --git a/src/acm.c b/src/acm.c
index 3152392..bc7124f 100644
--- a/src/acm.c
+++ b/src/acm.c
@@ -36,8 +36,10 @@
 #include <string.h>
 #include <osd.h>
 #include <arpa/inet.h>
+#include <sys/types.h>
 #include <sys/stat.h>
 #include <sys/time.h>
+#include <fcntl.h>
 #include <infiniband/acm.h>
 #include <infiniband/umad.h>
 #include <infiniband/verbs.h>
@@ -208,6 +210,7 @@ static char *opts_file = "/etc/ibacm/acm_opts.cfg";
 static char *addr_file = "/etc/ibacm/acm_addr.cfg";
 static char log_file[128] = "stdout";
 static int log_level = 0;
+static char lock_file[128] = "/var/lock/ibacm.pid";
 static enum acm_addr_prot addr_prot = ACM_ADDR_PROT_ACM;
 static enum acm_route_prot route_prot = ACM_ROUTE_PROT_ACM;
 static enum acm_loopback_prot loopback_prot = ACM_LOOPBACK_PROT_LOCAL;
@@ -2654,6 +2657,8 @@ static void acm_set_options(void)
            strcpy(log_file, value);
        else if (!stricmp("log_level", opt))
            log_level = atoi(value);
+       else if (!stricmp("lock_file", opt))
+           strcpy(lock_file, value);
        else if (!stricmp("addr_prot", opt))
            addr_prot = acm_convert_addr_prot(value);
        else if (!stricmp("route_prot", opt))
@@ -2686,6 +2691,7 @@ static void acm_set_options(void)
 static void acm_log_options(void)
 {
    acm_log(0, "log level %d\n", log_level);
+   acm_log(0, "lock file %s\n", lock_file);
    acm_log(0, "address resolution %d\n", addr_prot);
    acm_log(0, "route resolution %d\n", route_prot);
    acm_log(0, "loopback resolution %d\n", loopback_prot);
@@ -2716,6 +2722,25 @@ static FILE *acm_open_log(void)
    return f;
 }
 
+static int acm_open_lock_file(void)
+{
+   int lock_fd;
+   char pid[16];
+
+   lock_fd = open(lock_file, O_RDWR | O_CREAT, 0640);
+   if (lock_fd < 0)
+       return lock_fd;
+
+   if (lockf(lock_fd, F_TLOCK, 0)) {
+       close(lock_fd);
+       return -1;
+   }
+
+   sprintf(pid, "%d\n", getpid());
+   write(lock_fd, pid, strlen(pid));
+   return 0;
+}
+
+
 static void daemonize(void)
 {
pid_t pid, sid;
@@ -2778,6 +2803,8 @@ int CDECL_FUNC main(int argc, char **argv)
        return -1;
 
    acm_set_options();
+   if (acm_open_lock_file())
+       return -1;
 
    lock_init(&log_lock);
    flog = acm_open_log();
diff --git a/src/acme.c b/src/acme.c
index 218dbe8..552f42a 100644
--- a/src/acme.c
+++ b/src/acme.c
@@ -106,6 +106,12 @@ static void gen_opts_temp(FILE *f)
    fprintf(f, "\n");
    fprintf(f, "log_level 0\n");
    fprintf(f, "\n");
+   fprintf(f, "# lock_file:\n");
+   fprintf(f, "# Specifies the location of the ACM lock file used to ensure that only a\n");
+   fprintf(f, "# single instance of ACM is running.\n");
+   fprintf(f, "\n");
+   fprintf(f, "lock_file /var/lock/ibacm.pid\n");
+   fprintf(f, "\n");
    fprintf(f, "# addr_prot:\n");
    fprintf(f, "# Default resolution protocol to resolve IP addresses into IB GIDs.\n");
    fprintf(f, "# Supported protocols are:\n");
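
For anyone who wants to try the single-instance locking outside the daemon, here is a minimal standalone sketch of the same idea. It is not the ibacm code: open_lock_file() is a hypothetical helper, and unlike the patch it truncates the file before writing the PID, checks the write, and returns the descriptor so the caller can keep the lock alive.

```c
#define _XOPEN_SOURCE 700   /* for lockf() and ftruncate() under strict -std= modes */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper modeled on acm_open_lock_file() from the patch.
 * Returns the locked descriptor on success, -1 on failure.  The
 * descriptor is left open on purpose: closing it releases the lock.
 */
static int open_lock_file(const char *path)
{
	char pid[16];
	int fd, len;

	fd = open(path, O_RDWR | O_CREAT, 0640);
	if (fd < 0)
		return -1;

	/* F_TLOCK fails at once instead of blocking when another process holds the lock. */
	if (lockf(fd, F_TLOCK, 0)) {
		close(fd);
		return -1;
	}

	/* Remove whatever a previous instance wrote, then record our PID. */
	if (ftruncate(fd, 0)) {
		close(fd);
		return -1;
	}
	len = snprintf(pid, sizeof pid, "%d\n", (int) getpid());
	if (write(fd, pid, (size_t) len) != len) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Two general properties of lockf() are worth keeping in mind when testing this: the lock is owned by the process, so a second call from the same process succeeds, and a child created with fork() does not inherit the parent's lock.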




RE: [PATCH 4/4] ibacm: support no delay options

2010-12-08 Thread Davis, Arlin R
> Allow a user to specify that address and/or route resolution
> protocols should be suppressed, and that only locally cached
> data should be returned.
>
> This helps support rdma_getaddrinfo options RAI_NUMERICHOST
> and RAI_NOROUTE.  If data for a request is available, it is
> immediately returned.  Otherwise, the client request is
> failed, but the lookup is still initiated.  This avoids
> blocking a client for an extended period of time while
> resolution completes, but allows future calls to find cached
> data.

Do you have a librdmacm patch coming soon with the new ai_flags?
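
The no-delay behavior described in the patch text above can be summarized in a few lines of C. This is only an illustrative sketch: struct cache_entry, lookup() and the status names are hypothetical and are not taken from ibacm.

```c
#include <stdbool.h>

/* Hypothetical result codes for a cache lookup. */
enum lookup_status {
	LOOKUP_OK,             /* cached data returned immediately */
	LOOKUP_FAILED_PENDING, /* no-delay request failed, resolution started */
	LOOKUP_WOULD_BLOCK     /* normal request must wait for resolution */
};

/* Hypothetical cache entry: just the two bits of state the sketch needs. */
struct cache_entry {
	bool valid;     /* resolved data is present */
	bool resolving; /* a resolution has been initiated */
};

static enum lookup_status lookup(struct cache_entry *e, bool nodelay)
{
	if (e->valid)
		return LOOKUP_OK;

	/* The lookup is initiated in both cases so later calls can hit the cache. */
	e->resolving = true;
	return nodelay ? LOOKUP_FAILED_PENDING : LOOKUP_WOULD_BLOCK;
}
```

A caller using the no-delay path would treat LOOKUP_FAILED_PENDING as "try again later" rather than as a hard error.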









RE: [PATCH 4/4] ibacm: support no delay options

2010-12-08 Thread Hefty, Sean
> Do you have a librdmacm patch coming soon with the new ai_flags?

yes - It is coming soon to a mail list near you this Christmas season


Re: ibnetdiscover issue

2010-12-08 Thread Tom Ammon

Eli,

Is there a quick workaround we could put in place? I want to map out our 
fabric, and I especially need the spine GUIDs on the GD4200 because I'm 
going to be doing up/down routing and want to specify the root GUIDs. I 
can also submit a support case to Voltaire, if you think that would make 
it go faster. I want to make sure we are using OFED as distributed from OFA.


Tom

On 12/8/2010 11:28 AM, Hal Rosenstock wrote:
> Hi Tom,
>
> On 12/8/2010 12:48 PM, Tom Ammon wrote:
>> Hi,
>>
>> I get the following when I try to run ibnetdiscover from a server
>> plugged in to a voltaire 4036 switch. We're using OFED 1.5.2:
>>
>> [r...@sm1 ~]# ibnetdiscover
>> src/chassis.c:535; Unexpected node found: guid 0x0008f1050075134c
>> ibnetdiscover: iberror: failed: discover failed
>
> Looks to me like there's an is_spine_4200() clause missing in
> get_router_slot in libibnetdisc/src/chassis.c. Eli had added changes to
> support the 4200 so he's the best one to comment.
>
> -- Hal
>
>> However, ibdiagnet runs fine:
>>
>> [r...@sm1 ~]# ibdiagnet
>> Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.5.4
>> -W- Topology file is not specified.
>> Reports regarding cluster links will use direct routes.
>> Loading IBDM from: /usr/lib64/ibdm1.5.4
>> -I- Using port 1 as the local port.
>> -I- Discovering ... 277 nodes (23 Switches & 254 CA-s) discovered.
>>
>> -I---
>> -I- Bad Guids/LIDs Info
>> -I---
>> -I- No bad Guids were found
>>
>> -I---
>> -I- Links With Logical State = INIT
>> -I---
>> -I- No bad Links (with logical state = INIT) were found
>>
>> -I---
>> -I- General Device Info
>> -I---
>>
>> -I---
>> -I- PM Counters Info
>> -I---
>> -W- lid=0x0007 guid=0x0008f105006515ba dev=23131 Port=33
>> Performance Monitor counter : Value
>> link_error_recovery_counter : 0xff (overflow)
>> -W- lid=0x0010 guid=0x0008f10500201d7c dev=23130 Port=14
>> Performance Monitor counter : Value
>> symbol_error_counter : 0x (overflow)
>> -W- lid=0x0001 guid=0x0008f10500108a76 dev=23130 Port=30
>> Performance Monitor counter : Value
>> symbol_error_counter : 0x (overflow)
>>
>> -I---
>> -I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
>> -I---
>> -I- PKey:0x7fff Hosts:254 full:254 limited:0
>>
>> -I---
>> -I- IPoIB Subnets Check
>> -I---
>> -I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x00
>>
>> -I---
>> -I- Bad Links Info
>> -I- No bad link were found
>> -I---
>>
>> -I- Stages Status Report:
>> STAGE Errors Warnings
>> Bad GUIDs/LIDs Check 0 0
>> Link State Active Check 0 0
>> General Devices Info Report 0 0
>> Performance Counters Report 0 3
>> Partitions Check 0 0
>> IPoIB Subnets Check 0 0
>>
>> Please see /tmp/ibdiagnet.log for complete log
>>
>> -I- Done. Run time was 21 seconds.
>>
>> Any ideas?
>>
>> Tom


--
Tom Ammon
Network Engineer
Office: 801.587.0976
Mobile: 801.674.9273

Center for High Performance Computing
University of Utah
http://www.chpc.utah.edu


Re: [PATCH] IB/uverbs: Handle large number of entries in poll CQ

2010-12-08 Thread Roland Dreier
So I finally got around to applying this, and I ended up monkeying
around a fair bit, mostly because gcc (my version is 4.4.4) seems to end
up generating horrible code for non-constant designated initializers.
Please look this over before I send it to Linus (and Dan, if you don't
want to be author any more, let me know ;)

 - R.


In ib_uverbs_poll_cq() code there is a potential integer overflow if
userspace passes in a large cmd.ne.  The calls to kmalloc() would
allocate smaller buffers than intended, leading to memory corruption.
There is also an information leak if resp wasn't all used.
Unprivileged userspace may call this function, although only if an
RDMA device that uses this function is present.

Fix this by copying CQ entries one at a time, which avoids the
allocation entirely, and also by moving this copying into a function
that makes sure to initialize all memory copied to userspace.

Special thanks to Jason Gunthorpe jguntho...@obsidianresearch.com
for his help and advice.

Cc: sta...@kernel.org
Signed-off-by: Dan Carpenter erro...@gmail.com

[ Monkey around with things a bit to avoid bad code generation by gcc
  when designated initializers are used.  - Roland ]

Signed-off-by: Roland Dreier rola...@cisco.com
---
 drivers/infiniband/core/uverbs_cmd.c |  101 +++---
 1 files changed, 57 insertions(+), 44 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index b342248..f8c6f4e 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -893,68 +893,81 @@ out:
    return ret ? ret : in_len;
 }
 
+static int copy_wc_to_user(void __user *dest, struct ib_wc *wc)
+{
+   struct ib_uverbs_wc tmp;
+
+   tmp.wr_id          = wc->wr_id;
+   tmp.status         = wc->status;
+   tmp.opcode         = wc->opcode;
+   tmp.vendor_err     = wc->vendor_err;
+   tmp.byte_len       = wc->byte_len;
+   tmp.ex.imm_data    = (__u32 __force) wc->ex.imm_data;
+   tmp.qp_num         = wc->qp->qp_num;
+   tmp.src_qp         = wc->src_qp;
+   tmp.wc_flags       = wc->wc_flags;
+   tmp.pkey_index     = wc->pkey_index;
+   tmp.slid           = wc->slid;
+   tmp.sl             = wc->sl;
+   tmp.dlid_path_bits = wc->dlid_path_bits;
+   tmp.port_num       = wc->port_num;
+   tmp.reserved       = wc->port_num;
+
+   if (copy_to_user(dest, &tmp, sizeof tmp))
+       return -EFAULT;
+
+   return 0;
+}
+
 ssize_t ib_uverbs_poll_cq(struct ib_uverbs_file *file,
                           const char __user *buf, int in_len,
                           int out_len)
 {
    struct ib_uverbs_poll_cq       cmd;
-   struct ib_uverbs_poll_cq_resp *resp;
+   struct ib_uverbs_poll_cq_resp  resp;
+   u8 __user                     *header_ptr;
+   u8 __user                     *data_ptr;
    struct ib_cq                  *cq;
-   struct ib_wc                  *wc;
-   int                            ret = 0;
-   int                            i;
-   int                            rsize;
+   struct ib_wc                   wc;
+   int                            ret;
 
    if (copy_from_user(&cmd, buf, sizeof cmd))
        return -EFAULT;
 
-   wc = kmalloc(cmd.ne * sizeof *wc, GFP_KERNEL);
-   if (!wc)
-       return -ENOMEM;
-
-   rsize = sizeof *resp + cmd.ne * sizeof(struct ib_uverbs_wc);
-   resp = kmalloc(rsize, GFP_KERNEL);
-   if (!resp) {
-       ret = -ENOMEM;
-       goto out_wc;
-   }
-
    cq = idr_read_cq(cmd.cq_handle, file->ucontext, 0);
-   if (!cq) {
-       ret = -EINVAL;
-       goto out;
-   }
+   if (!cq)
+       return -EINVAL;
 
-   resp->count = ib_poll_cq(cq, cmd.ne, wc);
+   /* we copy a struct ib_uverbs_poll_cq_resp to user space */
+   header_ptr = (void __user *)(unsigned long) cmd.response;
+   data_ptr = header_ptr + sizeof resp;
 
-   put_cq_read(cq);
+   memset(&resp, 0, sizeof resp);
+   while (resp.count < cmd.ne) {
+       ret = ib_poll_cq(cq, 1, &wc);
+       if (ret < 0)
+           goto out_put;
+       if (!ret)
+           break;
+
+       ret = copy_wc_to_user(data_ptr, &wc);
+       if (ret)
+           goto out_put;
 
-   for (i = 0; i < resp->count; i++) {
-       resp->wc[i].wr_id       = wc[i].wr_id;
-       resp->wc[i].status      = wc[i].status;
-       resp->wc[i].opcode      = wc[i].opcode;
-       resp->wc[i].vendor_err  = wc[i].vendor_err;
-       resp->wc[i].byte_len    = wc[i].byte_len;
-       resp->wc[i].ex.imm_data = (__u32 __force) wc[i].ex.imm_data;
-       resp->wc[i].qp_num      = wc[i].qp->qp_num;
-   
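
To make the overflow in the removed allocation concrete, the sketch below reproduces the wraparound with explicit 32-bit arithmetic. The element size of 16 and both helper names are assumptions for illustration only. Note that the patch does not add a checked multiply like the one shown here; it removes the allocation and copies one entry at a time instead.

```c
#include <stdint.h>

/* Unchecked: what a 32-bit "count * element size" yields once it wraps. */
static uint32_t alloc_size_unchecked(uint32_t ne, uint32_t elem_size)
{
	return ne * elem_size;
}

/*
 * Hypothetical checked variant: returns 0 and stores the product in *out,
 * or returns -1 if the product does not fit in 32 bits.
 */
static int checked_mul_u32(uint32_t n, uint32_t size, uint32_t *out)
{
	if (size && n > UINT32_MAX / size)
		return -1;
	*out = n * size;
	return 0;
}
```

With ne = 0x10000001 and a 16-byte element, the unchecked product wraps to 16 bytes, so the buffer would hold a single entry while the caller believes it holds hundreds of millions.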

Re: [PATCH] Fix autotools to include the necessary M4 files

2010-12-08 Thread Roland Dreier
Thanks, applied to libibverbs at last.  Will try to get to it for other
libraries but please remind me if it slips out of my mind...


Re: libibverbs: ibv_fork_init() and huge pages

2010-12-08 Thread Roland Dreier
So I'm finally looking at applying this (sorry for all the delay), but
I do have one worry: is it possible for an application to register an MR
that has both a huge page and a normal-size page mapping?  In that case
do things still work, or do things blow up?

And is it worth worrying about this case even if it is theoretically possible?

 - R.


[PATCH] librdmacm: support non-default ACM port number

2010-12-08 Thread Hefty, Sean
By default, ACM uses port 6125.  The actual port number
used is now published in /var/run/ibacm.port.  Attempt to
obtain the correct port number from here, and if that fails
revert to using the default port number of 6125.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 src/acm.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)
 mode change 100644 => 100755 src/acm.c

diff --git a/src/acm.c b/src/acm.c
old mode 100644
new mode 100755
index 1867362..e2d02b4
--- a/src/acm.c
+++ b/src/acm.c
@@ -34,6 +34,7 @@
 #  include <config.h>
 #endif /* HAVE_CONFIG_H */
 
+#include <stdio.h>
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <netdb.h>
@@ -63,11 +64,22 @@ struct ib_connect_hdr {
 #define cma_dst_ip6 dst_addr[0]
 };
 
+static void ucma_set_server_port(void)
+{
+   FILE *f;
+
+   if ((f = fopen("/var/run/ibacm.port", "r"))) {
+       fscanf(f, "%hu", (unsigned short *) &server_port);
+       fclose(f);
+   }
+}
+
 void ucma_ib_init(void)
 {
    struct sockaddr_in addr;
    int ret;
 
+   ucma_set_server_port();
    sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (sock < 0)
        return;
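
The port lookup with fallback described in this patch can be tried in isolation with something like the following. This is a sketch under assumptions, not the librdmacm code: read_server_port() and ACM_DEFAULT_PORT are hypothetical names, the path is passed in as a parameter, and the parsed value is validated before use, which the patch does not do.

```c
#include <stdio.h>

#define ACM_DEFAULT_PORT 6125 /* default port stated in the patch description */

/*
 * Hypothetical helper: return the port published in the given file, or
 * the default if the file is missing or does not hold a usable value.
 */
static unsigned short read_server_port(const char *path)
{
	unsigned short port = ACM_DEFAULT_PORT;
	unsigned int val;
	FILE *f;

	f = fopen(path, "r");
	if (!f)
		return port;

	/* Accept the value only if it parsed and fits a nonzero 16-bit port. */
	if (fscanf(f, "%u", &val) == 1 && val > 0 && val <= 65535)
		port = (unsigned short) val;
	fclose(f);
	return port;
}
```

Calling read_server_port("/var/run/ibacm.port") would mirror what the patch does at init time, with the difference that a corrupt file leaves the default untouched.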




[PATCH] librdmacm: Support RAI_NUMERICHOST and no delay options

2010-12-08 Thread Hefty, Sean
Add support similar to getaddrinfo AI_NUMERICHOST.  This
indicates that lengthy address resolution protocols should
not be used.  Also allow a caller of rdma_getaddrinfo to
indicate that lengthy route resolution protocols should not
be used.

Since rdma_getaddrinfo is a synchronous call, this allows a
user to obtain locally available data only without long
delays that may block an application thread.  Callers can then
use the asynchronous librdmacm calls to complete any missing
information.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 include/rdma/rdma_cma.h |2 ++
 man/rdma_getaddrinfo.3  |5 +
 src/acm.c   |2 ++
 src/addrinfo.c  |3 ++-
 4 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/include/rdma/rdma_cma.h b/include/rdma/rdma_cma.h
index d17ef88..b48cd2e
--- a/include/rdma/rdma_cma.h
+++ b/include/rdma/rdma_cma.h
@@ -165,6 +165,8 @@ struct rdma_cm_event {
 };
 
 #define RAI_PASSIVE        0x0001
+#define RAI_NUMERICHOST    0x0002
+#define RAI_NOROUTE        0x0004
 
 struct rdma_addrinfo {
int ai_flags;
diff --git a/man/rdma_getaddrinfo.3 b/man/rdma_getaddrinfo.3
index c418b5a..e69d8ce
--- a/man/rdma_getaddrinfo.3
+++ b/man/rdma_getaddrinfo.3
@@ -38,6 +38,11 @@ Hint flags that control the operation.  Supported flags are:
 .IP "RAI_PASSIVE" 12
 Indicates that the results will be used on the passive/listening
 side of a connection.
+.IP "RAI_NUMERICHOST" 12
+If specified, then the node parameter, if provided, must be a numerical
+network address.  This flag suppresses any lengthy address resolution.
+.IP "RAI_NOROUTE" 12
+If set, this flag suppresses any lengthy route resolution.
 .IP "ai_family" 12
 Address family for the source and destination address.  Supported families
 are: AF_INET, AF_INET6, and AF_IB.
diff --git a/src/acm.c b/src/acm.c
index e2d02b4..1fa6c62 100755
--- a/src/acm.c
+++ b/src/acm.c
@@ -292,6 +292,8 @@ void ucma_ib_resolve(struct rdma_addrinfo *rai, struct rdma_addrinfo *hints)
 
    if (rai->ai_dst_len) {
        data->flags = ACM_EP_FLAG_DEST;
+       if (rai->ai_flags & (RAI_NUMERICHOST | RAI_NOROUTE))
+           data->flags |= ACM_FLAGS_NODELAY;
        ucma_copy_rai_addr(data, rai->ai_dst_addr);
        data++;
        msg.hdr.length += ACM_MSG_EP_LENGTH;
diff --git a/src/addrinfo.c b/src/addrinfo.c
index a1cb8a5..021f7c4 100755
--- a/src/addrinfo.c
+++ b/src/addrinfo.c
@@ -48,7 +48,8 @@
 static void ucma_convert_to_ai(struct addrinfo *ai, struct rdma_addrinfo *rai)
 {
    memset(ai, 0, sizeof *ai);
-   ai->ai_flags = (rai->ai_flags & RAI_PASSIVE) ? AI_PASSIVE : 0;
+   ai->ai_flags  = (rai->ai_flags & RAI_PASSIVE) ? AI_PASSIVE : 0;
+   ai->ai_flags |= (rai->ai_flags & RAI_NUMERICHOST) ? AI_NUMERICHOST : 0;
    ai->ai_family = rai->ai_family;
 
    switch (rai->ai_qp_type) {
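
The flag translation in the addrinfo.c hunk can be exercised without librdmacm. The sketch below redefines the RAI_* values shown in the rdma_cma.h hunk so it builds on its own; rai_to_ai_flags() and resolve_numeric_only() are hypothetical helpers, not library functions. It also shows the standard getaddrinfo() behavior that RAI_NUMERICHOST relies on: with AI_NUMERICHOST set, a non-numeric node is rejected without any name lookup.

```c
#define _POSIX_C_SOURCE 200809L /* for getaddrinfo() under strict -std= modes */
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Values copied from the rdma_cma.h hunk so the sketch is self-contained. */
#define RAI_PASSIVE     0x0001
#define RAI_NUMERICHOST 0x0002
#define RAI_NOROUTE     0x0004

/* Hypothetical helper mirroring the mapping done in ucma_convert_to_ai(). */
static int rai_to_ai_flags(int rai_flags)
{
	int ai_flags = 0;

	if (rai_flags & RAI_PASSIVE)
		ai_flags |= AI_PASSIVE;
	if (rai_flags & RAI_NUMERICHOST)
		ai_flags |= AI_NUMERICHOST;
	/* RAI_NOROUTE has no getaddrinfo() counterpart, so it is dropped here. */
	return ai_flags;
}

/* Hypothetical helper: resolve node only if it is a numeric address. */
static int resolve_numeric_only(const char *node)
{
	struct addrinfo hints, *res;
	int ret;

	memset(&hints, 0, sizeof hints);
	hints.ai_flags = rai_to_ai_flags(RAI_NUMERICHOST);
	ret = getaddrinfo(node, NULL, &hints, &res);
	if (!ret)
		freeaddrinfo(res);
	return ret;
}
```

A numeric address such as 192.0.2.1 resolves locally, while a host name fails with EAI_NONAME instead of triggering a potentially slow lookup.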




Re: [PATCH] IB/uverbs: Handle large number of entries in poll CQ

2010-12-08 Thread Dan Carpenter
On Wed, Dec 08, 2010 at 03:56:09PM -0800, Roland Dreier wrote:
> +   tmp.port_num       = wc->port_num;
> +   tmp.reserved       = wc->port_num;

I would probably have put a zero in tmp.reserved.

Otherwise it looks good.

regards,
dan carpenter
