Re: [PATCH] opensm: Add a rate based mechanism for SMP transactions

2010-06-02 Thread Hal Rosenstock
Hi Sasha,

On Tue, Jun 1, 2010 at 11:32 AM, Sasha Khapyorsky sas...@voltaire.com wrote:
 Hi Hal,

 On 10:11 Wed 16 Dec     , Hal Rosenstock wrote:

 In order to better handle non responsive SMAs (when link is physically up
 but the SMA does not respond), a rate based mechanism for SMPs is added
 to better enable forward progress in a more timely fashion. So rather than
 wait for timeouts and outstanding wire SMPs to drop below some configured
 value, there is also a periodic rate for transaction based SMPs. These
 rate based SMPs are capped at a configured maximum value. In order to
 accomodate these, the vendor layer ibumad match table is increased by
 that number in order not to overflow due to these added transactions.

 Two new options are added for this:
 rate_based_smp_usecs indicates the number of microseconds between rate
 based SMPs.
 max_rate_based_smps indicates the maximum number of rate based SMPs
 supported. When this limit is reached, rate based SMPs are no longer
 sent (until the number of outstanding ones drops below this limit).

 As far as I learned the patch Wouldn't something like below does the
 same work:


 diff --git a/opensm/opensm/osm_vl15intf.c b/opensm/opensm/osm_vl15intf.c
 index ff9e4db..a16d88e 100644
 --- a/opensm/opensm/osm_vl15intf.c
 +++ b/opensm/opensm/osm_vl15intf.c
 @@ -113,6 +113,8 @@ static void vl15_poller(IN void *p_ptr)
        osm_madw_t *p_madw;
        osm_vl15_t *p_vl = p_ptr;
        cl_qlist_t *p_fifo;
 +       int32_t max_smps = p_vl-max_wire_smps;
 +       int32_t max_wire_smps2 = 2 * max_smps; /* FIXME: make configurable */

        OSM_LOG_ENTER(p_vl-p_log);

 @@ -156,16 +158,21 @@ static void vl15_poller(IN void *p_ptr)
                                                  EVENT_NO_TIMEOUT, TRUE);

                while (p_vl-p_stats-qp0_mads_outstanding_on_wire =
 -                      (int32_t) p_vl-max_wire_smps 
 +                      max_smps 
                       p_vl-thread_state == OSM_THREAD_STATE_RUN) {
                        status = cl_event_wait_on(p_vl-signal,
                                                  EVENT_NO_TIMEOUT, TRUE);
 -                       if (status != CL_SUCCESS) {
 +                       if (status == CL_TIMEOUT 
 +                           max_smps  max_wire_smps2) {
 +                               max_smps++;
 +                               break;
 +                       } else if (status != CL_SUCCESS) {
                                OSM_LOG(p_vl-p_log, OSM_LOG_ERROR, ERR 3E02: 
 
                                        Event wait failed (%s)\n,
                                        CL_STATUS_MSG(status));
                                break;
                        }
 +                       max_smps = p_vl-max_wire_smps;
                }
        }


 If yes, we will need only have two configurable max_wire_smps limits.

I had started with an algorithm along these lines but evolved towards
the proposed one based on CPU utilization. An algorithm along the
lines of the above wastes CPU (when idling and other times) which
significantly impacts any other apps running.

-- Hal

 Sasha


 The rate based SMP mechanism can be disabled by setting rate_based_smp_usecs
 to 0. This is equivalent to the (current) algorithm prior to this change.

 Test results:

 Subnet consists of 55 switches (all 36-port IS4) and couple of HCAs.
 OpenSM configuration to enlarge the fabric: LMC=7, LMC of
 extended port 0 = TRUE.

 It takes ~8K SMPs to configure this fabric (no QoS).

 Measured section of the code: LFTs configuration, which is
 the most SMP-intense phase of the sweep.

 Existing OpenSM code:
        max_wire_smps=1: LFT configuration took ~0.27 sec
        max_wire_smps=4: LFT configuration took ~0.13 sec

 OpenSM with rate-based SMPs
        no difference from the existing OpenSM was observed.

 Further testing showed that when subnet is OK (no timeouts),
 SM doesn't send rate-based SMPs at all, or sends just a couple
 of them (out of total 8K SMPs).

 Experimenting with bad fabric:
 With 480 timeouts in a row, all the timeouts were failed Set() commands.
 OpenSM configuration was as follows:
        max_wire_smps=1
        rate_based_smp_usec=1 (10 msec)
        max_rate_based_smps=100

 Whole sweep time: 21 seconds
 Virtually all the SMPs were rate-based.
 Calculating how much this should have taken w/o rate-based SMPs:
 (480 timeouts) * (3 retries) * (0.2 sec timeout) = 4.8 minutes
 so this is a big improvement in the presence of errors.

 Signed-off-by: Hal Rosenstock hal.rosenst...@gmail.com
 ---
 diff --git a/opensm/include/opensm/osm_base.h 
 b/opensm/include/opensm/osm_base.h
 index 4e9aaa9..ddb1265 100644
 --- a/opensm/include/opensm/osm_base.h
 +++ b/opensm/include/opensm/osm_base.h
 @@ -448,6 +448,30 @@ BEGIN_C_DECLS
  */
  #define OSM_DEFAULT_SMP_MAX_ON_WIRE 4
  /***/
 +/d* OpenSM: Base/OSM_DEFAULT_SMP_RATE
 +* NAME
 +*    OSM_DEFAULT_SMP_RATE
 +*
 +* DESCRIPTION
 +*    Specifies 

[PATCH 1/2] infiniband-diags/libibnetdisc/chassis.c: In group_nodes, pass on all nodes

2010-06-02 Thread Hal Rosenstock

rather than just switches for common SystemImageGUID
Now consistent with original grouping.c from which this was ported

Signed-off-by: Hal Rosenstock hal.rosenst...@gmail.com
---
 infiniband-diags/libibnetdisc/src/chassis.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/infiniband-diags/libibnetdisc/src/chassis.c 
b/infiniband-diags/libibnetdisc/src/chassis.c
index cd2113f..f5f683e 100644
--- a/infiniband-diags/libibnetdisc/src/chassis.c
+++ b/infiniband-diags/libibnetdisc/src/chassis.c
@@ -1059,7 +1059,7 @@ int group_nodes(ibnd_fabric_t * fabric)
 
/* now make pass on nodes for chassis which are not Voltaire */
/* grouped by common SystemImageGUID */
-   for (node = fabric-switches; node; node = node-type_next) {
+   for (node = fabric-nodes; node; node = node-next) {
if (mad_get_field(node-info, 0,
  IB_NODE_VENDORID_F) == VTR_VENDOR_ID)
continue;
-- 
1.5.6.4

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] infiniband-diags/libibnetdisc/chassis.c: In group_nodes, set fabric-chassis

2010-06-02 Thread Hal Rosenstock

when grouping by SystemImageGUID

Without doing this, find_chassisguid will always return without
finding the already discovered chassis so no grouping occurs
when it should

Signed-off-by: Hal Rosenstock hal.rosenst...@gmail.com
---
 infiniband-diags/libibnetdisc/src/chassis.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/infiniband-diags/libibnetdisc/src/chassis.c 
b/infiniband-diags/libibnetdisc/src/chassis.c
index f5f683e..25d7473 100644
--- a/infiniband-diags/libibnetdisc/src/chassis.c
+++ b/infiniband-diags/libibnetdisc/src/chassis.c
@@ -2,6 +2,7 @@
  * Copyright (c) 2004-2009 Voltaire Inc.  All rights reserved.
  * Copyright (c) 2007 Xsigo Systems Inc.  All rights reserved.
  * Copyright (c) 2008 Lawrence Livermore National Lab.  All rights reserved.
+ * Copyright (c) 2010 HNR Consulting.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -1074,6 +1075,8 @@ int group_nodes(ibnd_fabric_t * fabric)
chassis_scan.current_chassis-chassisguid =
get_chassisguid(node);
chassis_scan.current_chassis-nodecount = 1;
+   if (!fabric-chassis)
+   fabric-chassis = 
chassis_scan.first_chassis;
}
}
}
@@ -1107,5 +1110,6 @@ cleanup:
free(ch);
ch = ch_next;
}
+   fabric-chassis = NULL;
return -1;
 }
-- 
1.5.6.4

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] opensm/osm_perfmgr.c: Remove unnecessary lock reference from Performance Manager object

2010-06-02 Thread Sasha Khapyorsky
On 18:37 Tue 01 Jun , Ira Weiny wrote:
 
 From: Ira Weiny wei...@llnl.gov
 Date: Mon, 22 Dec 2008 11:03:59 -0800
 Subject: [PATCH] opensm/osm_perfmgr.c: Remove unnecessary lock reference from 
 Performance Manager object
 
 
 Signed-off-by: Ira Weiny wei...@llnl.gov

Applied. Thanks.

Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] infiniband-diags/libibnetdisc/chassis.c: In group_nodes, set fabric-chassis

2010-06-02 Thread Sasha Khapyorsky
On 09:04 Wed 02 Jun , Hal Rosenstock wrote:
 
 when grouping by SystemImageGUID
 
 Without doing this, find_chassisguid will always return without
 finding the already discovered chassis so no grouping occurs
 when it should
 
 Signed-off-by: Hal Rosenstock hal.rosenst...@gmail.com

Both applied. Thanks.

Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] opensm: fixing compilation issues in some header files

2010-06-02 Thread Sasha Khapyorsky
On 10:29 Wed 02 Jun , Yevgeny Kliteynik wrote:
 
 Right now gcc allows you to do implicit casting in both ways, g++ doesn't.

OpenSM is written in C, not C++.

Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] opensm: Add a rate based mechanism for SMP transactions

2010-06-02 Thread Sasha Khapyorsky
On 06:58 Wed 02 Jun , Hal Rosenstock wrote:
 
 I had started with an algorithm along these lines but evolved towards
 the proposed one based on CPU utilization. An algorithm along the
 lines of the above wastes CPU (when idling and other times) which
 significantly impacts any other apps running.

So what are you saying? To skip this patch until better implementation?

Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net/core: Save the port number a netdevice uses

2010-06-02 Thread Ben Hutchings
On Wed, 2010-05-26 at 02:16 -0700, David Miller wrote:
 From: Eli Cohen e...@dev.mellanox.co.il
 Date: Wed, 26 May 2010 12:08:52 +0300
 
  So if I understand you correctly, you think that I should not bother
  to set a default value of 1. Each driver that cares about the value
  of this field, will set it however they want.
 
 I actually mean that the value 0 should mean the first port,
 the value 1 should mean the second port, etc.

There's a compatibility problem here though: when user-space looks at
this number it can't tell whether 0 really means the first port or
that the driver isn't setting dev_id.  In order to tell that it would
have to check the driver or kernel version too, and this is really nasty
(what if the driver was backported?).  So I think that 1-based numbering
for drivers that previously didn't set dev_id.

Why does this matter?  I'm maintaining the firmware update program for
Solarflare NICs under Linux.  Some of the firmware it updates is
per-port and some of it is per-board.  It needs to be able to tell which
net and PCI devices are associated with the same board.  At the moment
it works on the basis of PCI addresses, but this is unreliable in the
presence of virtualisation.  It would be better to use board serial
number and the port identifier that I just changed the driver to use,
but since there are existing driver versions always leave dev_id = 0, I
it needs to be able to tell whether dev_id is meaningful.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs: Force line-buffering in ibv_asyncwatch

2010-06-02 Thread Roland Dreier
  setlinebuf() is pretty intuitive to understand, compared to setvbuf().

I finally applied this; however in the end I decided to do

setvbuf(stdout, NULL, _IOLBF, 0);

instead of setlinebuf(), since in the past I've prefered more pedantic
stuff (eg posix_memalign instead of memalign) to the older simpler
traditional functions.  Kind of a trivial issue either way anyway.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net/core: Save the port number a netdevice uses

2010-06-02 Thread David Miller
From: Ben Hutchings bhutchi...@solarflare.com
Date: Wed, 02 Jun 2010 17:42:29 +0100

 On Wed, 2010-05-26 at 02:16 -0700, David Miller wrote:
 From: Eli Cohen e...@dev.mellanox.co.il
 Date: Wed, 26 May 2010 12:08:52 +0300
 
  So if I understand you correctly, you think that I should not bother
  to set a default value of 1. Each driver that cares about the value
  of this field, will set it however they want.
 
 I actually mean that the value 0 should mean the first port,
 the value 1 should mean the second port, etc.
 
 There's a compatibility problem here though: when user-space looks at
 this number it can't tell whether 0 really means the first port or
 that the driver isn't setting dev_id.

That's perfect, it means we don't have to add explicit settings to
drivers which driver only one port.  Ie. the vast majority of
cases.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: libibverbs and versions

2010-06-02 Thread Jason Gunthorpe
On Wed, Jun 02, 2010 at 10:50:56AM -0500, Steve Wise wrote:

 I have this library I'm writing that uses libibverbs.  Somehow it seems  
 to be using the 1.0 version instead of the newer version (I'm guessing).  
 When I call ibv_create_cq()  I get a seg fault and the stack looks like 
 this:

Make sure you link your library to libibverbs and do not rely on
linking the app to libibverbs to resolve the symbols. Check with 
ldd foo.so

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: libibverbs and versions

2010-06-02 Thread Steve Wise

Roland Dreier wrote:

  I have this library I'm writing that uses libibverbs.  Somehow it
  seems to be using the 1.0 version instead of the newer version (I'm
  guessing).  When I call ibv_create_cq()  I get a seg fault and the
  stack looks like this:
  
  #0  0x00319d80871d in pthread_mutex_lock () from /lib64/libpthread.so.0

  #1  0x7f60e93fac9a in __ibv_create_cq (context=0x7f60e8597c51, cqe=256,
 cq_context=0x0, channel=0x0, comp_vector=0) at src/verbs.c:278
  #2  0x7f60e93f727f in __ibv_create_cq_1_0 (context=0x61db30, cqe=256,
 cq_context=0x0, channel=0x0, comp_vector=0) at src/compat-1_0.c:649

Code compiled against libibverbs 1.1 shouldn't ever hit the 1.0
compatibility code.  libibverbs is using versioned symbols, so if your
library is built/linked correctly, it should pick up the 1.1 versions.
How are you building your library?

 - R.
  


Apparently I'm building it incorrectly. :)I stole the Makefile.am 
and configure.in from one of my other libs.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ib/mgmt: fix build under windows stack

2010-06-02 Thread Hefty, Sean
Fix warnings in windows build.

These are typecasts in several places, with a couple additional changes
needed to libibnetdisc.  Windows uses ISO definitions instead of
POSIX definitions, so calls like _read, _write, etc. are used on
Windows instead of read, write.  The result is that internal functions
are renamed with an ibnd prefix.

In order to handle ibdebug exported from libibmad, Windows need to mark
the variable as a 'dllexport' from libibmad, but as 'dllimport' for users.
Since libibnetdisc wants to 'dllimport' ibdebug from libibmad, while
'dllexporting its own functions, it requires a separate 'MAD_EXPORT' like
definition to allow this.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
Note the change to query_smp.c.  It's the one area that may need more
thought about how things get handled.

 .../libibnetdisc/include/infiniband/ibnetdisc.h|   28 +-
 infiniband-diags/libibnetdisc/src/ibnetdisc.c  |6 +-
 .../libibnetdisc/src/ibnetdisc_cache.c |   58 ++--
 infiniband-diags/libibnetdisc/src/query_smp.c  |6 ++
 infiniband-diags/src/ibping.c  |2 -
 infiniband-diags/src/saquery.c |2 -
 libibmad/include/infiniband/mad.h  |2 -
 libibmad/include/infiniband/mad_osd.h  |1 
 8 files changed, 55 insertions(+), 50 deletions(-)

diff --git a/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h 
b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h
index 83d0ba7..49ff743 100644
--- a/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h
+++ b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h
@@ -168,7 +168,7 @@ typedef struct ibnd_fabric {
  * Initialization (fabric operations)
  */
 
-MAD_EXPORT ibnd_fabric_t *ibnd_discover_fabric(char * ca_name,
+IBND_EXPORT ibnd_fabric_t *ibnd_discover_fabric(char * ca_name,
   int ca_port,
   ib_portid_t * from,
   struct ibnd_config *config);
@@ -179,39 +179,39 @@ MAD_EXPORT ibnd_fabric_t *ibnd_discover_fabric(char * 
ca_name,
 *   If NULL start from the CA/CA port specified
 * config: (optional) additional config options for the scan
 */
-MAD_EXPORT void ibnd_destroy_fabric(ibnd_fabric_t * fabric);
+IBND_EXPORT void ibnd_destroy_fabric(ibnd_fabric_t * fabric);
 
-MAD_EXPORT ibnd_fabric_t *ibnd_load_fabric(const char *file,
+IBND_EXPORT ibnd_fabric_t *ibnd_load_fabric(const char *file,
   unsigned int flags);
 
-MAD_EXPORT int ibnd_cache_fabric(ibnd_fabric_t * fabric, const char *file,
+IBND_EXPORT int ibnd_cache_fabric(ibnd_fabric_t * fabric, const char *file,
 unsigned int flags);
 
 /** =
  * Node operations
  */
-MAD_EXPORT ibnd_node_t *ibnd_find_node_guid(ibnd_fabric_t * fabric,
+IBND_EXPORT ibnd_node_t *ibnd_find_node_guid(ibnd_fabric_t * fabric,
uint64_t guid);
-MAD_EXPORT ibnd_node_t *ibnd_find_node_dr(ibnd_fabric_t * fabric, char 
*dr_str);
+IBND_EXPORT ibnd_node_t *ibnd_find_node_dr(ibnd_fabric_t * fabric, char 
*dr_str);
 
 typedef void (*ibnd_iter_node_func_t) (ibnd_node_t * node, void *user_data);
-MAD_EXPORT void ibnd_iter_nodes(ibnd_fabric_t * fabric,
+IBND_EXPORT void ibnd_iter_nodes(ibnd_fabric_t * fabric,
ibnd_iter_node_func_t func, void *user_data);
-MAD_EXPORT void ibnd_iter_nodes_type(ibnd_fabric_t * fabric,
+IBND_EXPORT void ibnd_iter_nodes_type(ibnd_fabric_t * fabric,
 ibnd_iter_node_func_t func,
 int node_type, void *user_data);
 
 /** =
  * Chassis queries
  */
-MAD_EXPORT uint64_t ibnd_get_chassis_guid(ibnd_fabric_t * fabric,
+IBND_EXPORT uint64_t ibnd_get_chassis_guid(ibnd_fabric_t * fabric,
  unsigned char chassisnum);
-MAD_EXPORT char *ibnd_get_chassis_type(ibnd_node_t * node);
-MAD_EXPORT char *ibnd_get_chassis_slot_str(ibnd_node_t * node,
+IBND_EXPORT char *ibnd_get_chassis_type(ibnd_node_t * node);
+IBND_EXPORT char *ibnd_get_chassis_slot_str(ibnd_node_t * node,
   char *str, size_t size);
 
-MAD_EXPORT int ibnd_is_xsigo_guid(uint64_t guid);
-MAD_EXPORT int ibnd_is_xsigo_tca(uint64_t guid);
-MAD_EXPORT int ibnd_is_xsigo_hca(uint64_t guid);
+IBND_EXPORT int ibnd_is_xsigo_guid(uint64_t guid);
+IBND_EXPORT int ibnd_is_xsigo_tca(uint64_t guid);
+IBND_EXPORT int ibnd_is_xsigo_hca(uint64_t guid);
 
 #endif /* _IBNETDISC_H_ */
diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc.c 
b/infiniband-diags/libibnetdisc/src/ibnetdisc.c
index 0c4009a..f525d71 

variable length array in qib

2010-06-02 Thread Roland Dreier
qib has the code

void qib_disarm_piobufs_set(struct qib_devdata *dd, unsigned long *mask,
unsigned cnt)
{
struct qib_pportdata *ppd, *pppd[dd-num_pports];

it would probably be safer to avoid the variable length array pppd[] on
the kernel stack here... could this code be easily refactored to avoid
doing that?
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] libibverbs: ibv_fork_init() and libhugetlbfs

2010-06-02 Thread Roland Dreier
  without patch:
  1M memory region120usec
  16M memory region  1970usec 
  
  with patch v2:
  1M memory region172usec
  16M memory region  2030usec

So if I read this correctly this patch introduces almost a 50% overhead
in the 1M case... and probably much worse (as a fraction) in say the 64K
or 4K case.  I wonder if that's acceptable?

Alex's original approach was to try the memadvise with normal page size
and then fall back to huge page size if that failed.  But of course that
wastes some time on the failed madvise in the hugepage case.

I think it would be interesting to compare timings for registering, say,
4K, 64K, 1M and 16M regions with and without huge page backing, for
three possibilities:

 - unpatched libibverbs (will obviously fail on hugepage backed regions)
 - patched with this v2
 - alternate patch that tries madvise and only does your /proc/pid/smaps
 - parsing if the first madvise fails.

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/2] infiniband/cxgb4: add null check

2010-06-02 Thread Roland Dreier
   if (status  0) {
  -ep-com.cm_id-rem_ref(ep-com.cm_id);
  +if (ep-com.cm_id)
  +ep-com.cm_id-rem_ref(ep-com.cm_id);

Steve, does this make sense?
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: variable length array in qib

2010-06-02 Thread Ralph Campbell
Sure. I will work on a patch.

On Wed, 2010-06-02 at 14:37 -0700, Roland Dreier wrote:
 qib has the code
 
   void qib_disarm_piobufs_set(struct qib_devdata *dd, unsigned long *mask,
   unsigned cnt)
   {
   struct qib_pportdata *ppd, *pppd[dd-num_pports];
 
 it would probably be safer to avoid the variable length array pppd[] on
 the kernel stack here... could this code be easily refactored to avoid
 doing that?


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] IB/qib: define max IB ports instead of variable stack allocation

2010-06-02 Thread Ralph Campbell
Rather than use a variable size array allocation on the stack,
define a constant for the maximum array size possible.

Signed-off-by: Ralph Campbell ralph.campb...@qlogic.com
---

 drivers/infiniband/hw/qib/qib.h|3 +++
 drivers/infiniband/hw/qib/qib_tx.c |2 +-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h
index 32d9208..211ff6a 100644
--- a/drivers/infiniband/hw/qib/qib.h
+++ b/drivers/infiniband/hw/qib/qib.h
@@ -326,6 +326,9 @@ struct qib_verbs_txreq {
 
 #define QIB_DEFAULT_MTU 4096
 
+/* max number of IB ports supported per HCA */
+#define QIB_MAX_IB_PORTS 2
+
 /*
  * Possible IB config parameters for f_get/set_ib_table()
  */
diff --git a/drivers/infiniband/hw/qib/qib_tx.c 
b/drivers/infiniband/hw/qib/qib_tx.c
index f7eb1dd..77909b5 100644
--- a/drivers/infiniband/hw/qib/qib_tx.c
+++ b/drivers/infiniband/hw/qib/qib_tx.c
@@ -170,7 +170,7 @@ static int find_ctxt(struct qib_devdata *dd, unsigned bufn)
 void qib_disarm_piobufs_set(struct qib_devdata *dd, unsigned long *mask,
unsigned cnt)
 {
-   struct qib_pportdata *ppd, *pppd[dd-num_pports];
+   struct qib_pportdata *ppd, *pppd[QIB_MAX_IB_PORTS];
unsigned i;
unsigned long flags;
 

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/2] infiniband/cxgb4: add null check

2010-06-02 Thread Steve Wise

Roland Dreier wrote:

if (status  0) {
  - ep-com.cm_id-rem_ref(ep-com.cm_id);
  + if (ep-com.cm_id)
  + ep-com.cm_id-rem_ref(ep-com.cm_id);

Steve, does this make sense?
  
I actually think that ep-com.cm_id is always valid when 
connect_reply_upcall() is called.  But I'd have to test it more.  I'd 
rather not add this change unless someone convinces me that there 
actually is a path where the cm_id null...

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/2] infiniband/cxgb4: add null check

2010-06-02 Thread Roland Dreier
  I actually think that ep-com.cm_id is always valid when
  connect_reply_upcall() is called.  But I'd have to test it more.  I'd
  rather not add this change unless someone convinces me that there
  actually is a path where the cm_id null...

Then would it make sense to change the code

if (ep-com.cm_id) {
PDBG(%s ep %p tid %u status %d\n, __func__, ep,
 ep-hwtid, status);
ep-com.cm_id-event_handler(ep-com.cm_id, event);
}

to just

PDBG(%s ep %p tid %u status %d\n, __func__, ep,
 ep-hwtid, status);
ep-com.cm_id-event_handler(ep-com.cm_id, event);

?
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] cxgb4: use the DMA state API instead of the pci equivalents

2010-06-02 Thread FUJITA Tomonori
This replace the PCI DMA state API (include/linux/pci-dma.h) with the
DMA equivalents since the PCI DMA state API will be obsolete.

No functional change.

For further information about the background:

http://marc.info/?l=linux-netdevm=127037540020276w=2

Signed-off-by: FUJITA Tomonori fujita.tomon...@lab.ntt.co.jp
---
 drivers/infiniband/hw/cxgb4/cq.c   |6 +++---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |2 +-
 drivers/infiniband/hw/cxgb4/mem.c  |4 ++--
 drivers/infiniband/hw/cxgb4/qp.c   |   12 ++--
 drivers/infiniband/hw/cxgb4/t4.h   |6 +++---
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c
index 2447f52..e1317f5 100644
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -77,7 +77,7 @@ static int destroy_cq(struct c4iw_rdev *rdev, struct t4_cq 
*cq,
kfree(cq-sw_queue);
dma_free_coherent((rdev-lldi.pdev-dev),
  cq-memsize, cq-queue,
- pci_unmap_addr(cq, mapping));
+ dma_unmap_addr(cq, mapping));
c4iw_put_cqid(rdev, cq-cqid, uctx);
return ret;
 }
@@ -112,7 +112,7 @@ static int create_cq(struct c4iw_rdev *rdev, struct t4_cq 
*cq,
ret = -ENOMEM;
goto err3;
}
-   pci_unmap_addr_set(cq, mapping, cq-dma_addr);
+   dma_unmap_addr_set(cq, mapping, cq-dma_addr);
memset(cq-queue, 0, cq-memsize);
 
/* build fw_ri_res_wr */
@@ -179,7 +179,7 @@ static int create_cq(struct c4iw_rdev *rdev, struct t4_cq 
*cq,
return 0;
 err4:
dma_free_coherent(rdev-lldi.pdev-dev, cq-memsize, cq-queue,
- pci_unmap_addr(cq, mapping));
+ dma_unmap_addr(cq, mapping));
 err3:
kfree(cq-sw_queue);
 err2:
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h 
b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 277ab58..d33e1a6 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -261,7 +261,7 @@ static inline struct c4iw_mw *to_c4iw_mw(struct ib_mw *ibmw)
 
 struct c4iw_fr_page_list {
struct ib_fast_reg_page_list ibpl;
-   DECLARE_PCI_UNMAP_ADDR(mapping);
+   DEFINE_DMA_UNMAP_ADDR(mapping);
dma_addr_t dma_addr;
struct c4iw_dev *dev;
int size;
diff --git a/drivers/infiniband/hw/cxgb4/mem.c 
b/drivers/infiniband/hw/cxgb4/mem.c
index 7f94da1..82b5703 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -764,7 +764,7 @@ struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(struct 
ib_device *device,
if (!c4pl)
return ERR_PTR(-ENOMEM);
 
-   pci_unmap_addr_set(c4pl, mapping, dma_addr);
+   dma_unmap_addr_set(c4pl, mapping, dma_addr);
c4pl-dma_addr = dma_addr;
c4pl-dev = dev;
c4pl-size = size;
@@ -779,7 +779,7 @@ void c4iw_free_fastreg_pbl(struct ib_fast_reg_page_list 
*ibpl)
struct c4iw_fr_page_list *c4pl = to_c4iw_fr_page_list(ibpl);
 
dma_free_coherent(c4pl-dev-rdev.lldi.pdev-dev, c4pl-size,
- c4pl, pci_unmap_addr(c4pl, mapping));
+ c4pl, dma_unmap_addr(c4pl, mapping));
 }
 
 int c4iw_dereg_mr(struct ib_mr *ib_mr)
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 0c28ed1..7065cb3 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -40,10 +40,10 @@ static int destroy_qp(struct c4iw_rdev *rdev, struct t4_wq 
*wq,
 */
dma_free_coherent((rdev-lldi.pdev-dev),
  wq-rq.memsize, wq-rq.queue,
- pci_unmap_addr(wq-rq, mapping));
+ dma_unmap_addr(wq-rq, mapping));
dma_free_coherent((rdev-lldi.pdev-dev),
  wq-sq.memsize, wq-sq.queue,
- pci_unmap_addr(wq-sq, mapping));
+ dma_unmap_addr(wq-sq, mapping));
c4iw_rqtpool_free(rdev, wq-rq.rqt_hwaddr, wq-rq.rqt_size);
kfree(wq-rq.sw_rq);
kfree(wq-sq.sw_sq);
@@ -99,7 +99,7 @@ static int create_qp(struct c4iw_rdev *rdev, struct t4_wq *wq,
if (!wq-sq.queue)
goto err5;
memset(wq-sq.queue, 0, wq-sq.memsize);
-   pci_unmap_addr_set(wq-sq, mapping, wq-sq.dma_addr);
+   dma_unmap_addr_set(wq-sq, mapping, wq-sq.dma_addr);
 
wq-rq.queue = dma_alloc_coherent((rdev-lldi.pdev-dev),
  wq-rq.memsize, (wq-rq.dma_addr),
@@ -112,7 +112,7 @@ static int create_qp(struct c4iw_rdev *rdev, struct t4_wq 
*wq,
wq-rq.queue,
(unsigned long long)virt_to_phys(wq-rq.queue));
memset(wq-rq.queue, 0, wq-rq.memsize);
-   pci_unmap_addr_set(wq-rq, mapping, wq-rq.dma_addr);
+   dma_unmap_addr_set(wq-rq, mapping, wq-rq.dma_addr);