Re: [PATCH] opensm: Fix sl2vl configuration

2010-08-25 Thread Eli Dorfman (Voltaire)
Hal Rosenstock wrote:
 Hi Eli,
 
 On Mon, Aug 23, 2010 at 3:37 PM, Eli Dorfman dorfman@gmail.com wrote:
 Hi Hal,

 On Mon, Aug 23, 2010 at 5:25 PM, Hal Rosenstock
 hal.rosenst...@gmail.com wrote:
 On Wed, Jul 28, 2010 at 12:26 PM, Eli Dorfman (Voltaire)
 dorfman@gmail.com wrote:
 Subject: [PATCH] Fix sl2vl configuration

 For non-optimized sl2vl configuration in and out ports were reversed.
 Are you sure these are reversed ? Any idea which commit introduced
 this reversal of in and out ports ?
 I'm sure they are reversed.
 
 I looked at it some more and the ports look reversed to me too.
 
 This patch was also tested by Jim Schutt.
 
 That only means it works in his environment rather than correctness.
It was tested in mine enviornment as well.
Usually I test the patch in my environment to verify its correctness (in 
addition to code review).
I assume you do the same.

Do you expect me to test the patch in your environment as well?


 
 I didn't check which commit introduced this - why is it important?
 
 I'd like to understand which patch introduced the reversal. I don't
 see it but might have missed it.  I think it's important to know which
 versions are broken.

I think that the following commit added the bug:

commit 051a1dd514e63f3a971afad38e377932efca5e18
Author: Sasha Khapyorsky sas...@voltaire.com
Date:   Mon Jan 4 21:06:19 2010 +0200

opensm/osm_qos.c: split switch external and end ports setup

This splits QoS related port parameters setup over switch external ports
and end ports. Such separation leaves us with simpler code and saves
some repeated flows in case of switch external ports (actually required
per switch and not per port).

Another advantages: Optimized Sl2VL Mapping procedure can be implemented
easier using this model. A low level port QoS related parameters setup
infrastructure is ready for supporting per port QoS related configuration
(which hopefully will be implemented some days).



 
 For optimal sl2vl added override of default ALL settting with port's
 sl2vl when operational VL was other than the default port.
 What is the motivation to override when the operational VLs is
 different ? Why is that better than what is done currently ?
 The idea was to apply the default policy - set sl2vl modulo operational VL.
 When applying ALL settings using port 1 we still want to override this
 setting for ports with different operational VL.
 
 What makes the default policy modulo operational VLs ?
This is how its implemented in sl2vl_update_table()

vl_mask = (1  (ib_port_info_get_op_vls(p-port_info) - 1)) - 1;

for (i = 0; i  IB_MAX_NUM_VLS / 2; i++) {
vl1 = sl2vl_table-raw_vl_by_sl[i]  4;
vl2 = sl2vl_table-raw_vl_by_sl[i]  0xf;
if (vl1 != 15)
vl1 = vl_mask;
if (vl2 != 15)
vl2 = vl_mask;

Default startup switches configuration uses the same policy.


 
 Is this really a policy issue ?

 IMO there are two separate issues in this patch and they should be in
 separate patches (for better git bisection).
 Maybe, but I still think this patch fixes wrong setting for sl2vl
 using optimized and non optimized methods.
 I'm not sure this should be divided to 2 separate patches.
 
 It's one thought per patch and to me this is two different thoughts.
If this is the only issue, I can split this patch to 2 separate patches.

Eli
 
 Also, a couple of (possibly related) questions below.
 It seems that you are referring to patch v1 which was modified
 according to Jim's comments.
 Please check the v2 patch .
 
 I see my questions are moot in terms of the v2 patch.
 
 -- Hal
 
 Thanks,
 Eli

 Signed-off-by: Eli Dorfman e...@voltaire.com
 ---
  opensm/opensm/osm_qos.c |   25 -
  1 files changed, 20 insertions(+), 5 deletions(-)

 diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
 index a571370..de0ae23 100644
 --- a/opensm/opensm/osm_qos.c
 +++ b/opensm/opensm/osm_qos.c
 @@ -182,7 +182,7 @@ static ib_api_status_t sl2vl_update_table(osm_sm_t * 
 sm, osm_physp_t * p,
tbl.raw_vl_by_sl[i] = (vl1  4) | vl2;
}

 -   if (!force_update  (p_tbl = osm_physp_get_slvl_tbl(p, in_port)) 
 
 +   if (!force_update  in_port  (p_tbl = osm_physp_get_slvl_tbl(p, 
 in_port)) 
!memcmp(p_tbl, tbl, sizeof(tbl)))
return IB_SUCCESS;
 Why exclude port 0 here ? Is it related to the change noted below ?

 @@ -209,6 +209,7 @@ static int qos_extports_setup(osm_sm_t * sm, 
 osm_node_t *node,
unsigned num_ports = osm_node_get_num_physp(node);
int ret = 0;
unsigned i, j;
 +   uint8_t op_vl1;

for (i = 1; i  num_ports; i++) {
p = osm_node_get_physp_ptr(node, i);
 @@ -225,17 +226,31 @@ static int qos_extports_setup(osm_sm_t * sm, 
 osm_node_t *node,
if (ib_switch_info_get_opt_sl2vlmapping(node-sw-switch_info) 
  

srp target - write to raw block device like /dev/sda ist fragmented into very small requests

2010-08-25 Thread Billich Heiner
 
Hello,

we employ Infiniband SRP storage. For basic tests I just did write/read
with dd to the raw block device like /dev/sda. I tell dd to use 1M sized
blocks, but I see very small few KB sized requests only. To make this
tests of real value I would need to read and write with large 1MB sized
requests. More important I'm not sure whether this points to some
misconfiguration.

Example: 
Read 1M sized blocks from /dev/sdh - iostat shows 8*512byte average
request size, which is very small, I would expect 1M sized requests:
 
# dd if=/dev/sdh of=/dev/null bs=1M 

# iostat -xk /dev/sdh 2

Device: rrqm/s   wrqm/s   r/s   w/srkB/swkB/s avgrq-sz
avgqu-sz   await  svctm  %util
sdh   0.00 0.00 8253.23  0.00 33012.94 0.00 8.00
0.900.11   0.11  89.95

Performance counters on the srp target show the same small request size.

Is this expected behaviour if I write or read from/to a srp block device
file? Does anything limit the request size if I access the storage via
the block device file like /dev/sdh? 

Using a mounted filesystem on top of the block devices I get perfectly
1M sized i/o requests, hence I think that the storage (DDN S2A9900) an
the settings for the block device are o.k..

We currently run Scientific Linux 5.4 with a rebuild vendor kernel
2.6.18-194.8.1.el5 and the included ofed 1.3. I know that this is no
plain vanilla kernel.org kernel and no very recent OFED version.

Kind regards,

Heiner Billich



Some more details



# ibstat
CA 'mlx4_0'
CA type: MT26428
Number of ports: 2
Firmware version: 2.7.100
Hardware version: b0
Node GUID: 0x0002c90300088dc6
System image GUID: 0x0002c90300088dc9
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 14
LMC: 0
SM lid: 1
Capability mask: 0x02510868
Port GUID: 0x0002c90300088dc7

ib_srp parameters:

mellanox_workarounds 1
srp_dev_loss_tmo 60
srp_sg_tablesize 255
topspin_workarounds 1

modinfo ib_srp
filename:
/lib/modules/2.6.18-194.8.1.el5/kernel/drivers/infiniband/ulp/srp/ib_srp
.ko
license:Dual BSD/GPL
description:InfiniBand SCSI RDMA Protocol initiator v0.2 (November
1, 2005)
author: Roland Dreier
srcversion: DCFC6574D959F8EC7B4AE72
depends:ib_core,scsi_mod,ib_cm,ib_sa
vermagic:   2.6.18-194.8.1.el5 SMP mod_unload gcc-4.1
parm:   srp_sg_tablesize:Max number of gather/scatter entries
per I/O (default is 12, max 255) (int)
parm:   topspin_workarounds:Enable workarounds for Topspin/Cisco
SRP target bugs if != 0 (int)
parm:   mellanox_workarounds:Enable workarounds for Mellanox SRP
target bugs if != 0 (int)
parm:   srp_dev_loss_tmo:Default number of seconds that srp
transport should  insulate the lost of a remote port
(default is 60 secs (int)
module_sig:
883f3504c2cf87cab955e73ffd495d2112149509f45aa2bcee8c81463e247bd36d55c5dd
bc2634d509e2b3eb9c09799f2ea4e951c973d0fe3725ddbc
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IPoIB CM connection establishment protocol question

2010-08-25 Thread Eli Cohen
Hi stan,

Here's a short summary:
An IPoIB interface encodes it capability of supporting CM mode in the
hardware adress it publishes. Once the stack hands an SKB to IPoIB for
transmission (unicast), IPoIB needs to resolve the hardware address to
IB adress information (LID, SL etc.). Once resolution is complete, it
will create a RC connection with the other host using IB CM. This
process is symetrical - that is, for two communicating hosts there are
two RC connections so that each connection works half duplex.

Hope that helps.

On Thu, Aug 05, 2010 at 10:25:27AM -0700, Smith, Stan wrote:
 
 Could someone who is OFED IPoIB knowledgeable help me understand which IPoIB 
 side starts the RC connection establishment (Tx REQ) to an ARP unknown remote 
 host?
 
 I tried reading the Linux IPoIB code and realized it's been too many years 
 since writing Linux network drivers (2001) and found myself lost in the bits 
 w.r.t. CM connection mgmt. and ARP handling.
 
 From limited observation, it looks like the ACTIVE side (ttcp -t) of an 
 IPoIB network connection broadcasts an IB ARP-REQ which indicates CM 
 capability in the IB ARP flags byte.
 
 When the PASSIVE IPoIB (ttcp -r) receives the IB ARP-REQ, it sends back an IB 
 ARP-REPLY which indicates it's CM capability among other items.
 
 Now the confusing part: which IPoIB CM side starts the CM-RC connection 
 establishment?
 
 Does the PASSIVE IPoIB start the RC connection (Tx REQ) prior to sending the 
 IB ARP-REPLY?
 
 Or does the ACTIVE IPoIB side receive the IB ARP-REPLY and then start an 
 ACTIVE RC connection (Tx REQ) now that it knows the remote side is CM capable?
 
 thanks,
 
 stan.
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: IPoIB: Broken IGMP processing

2010-08-25 Thread Christoph Lameter
On Mon, 23 Aug 2010, Yossi Etigin wrote:

 
  Simplest then is to check if byte 24 of the packet is 0xff.
  (ie IN6_IS_ADDR_MULTICAST)
 
  No need to worry about if it is properly formed or anything, if it is
  a multicast DGID then it is a multicast packet at the link level.
 

 Sounds good to me

Therefore this patch?

Subject: [IB] Make igmp processing work with IPOIB

IGMP processing is broken because the IPOIB does not set the
skb-pkt_type the right way for Multicast traffic. All incoming
packets are set to PACKET_HOST which means that the igmp_recv()
function will ignore the IGMP broadcasts/multicasts.

This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily. In a large
private network this can cause traffic spikes.

Signed-off-by: Christoph Lameter c...@linux.com

---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |7 +--
 include/linux/in6.h |3 +++
 2 files changed, 8 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===
--- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c  2010-08-23 
16:04:38.0 -0500
+++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c   2010-08-25 
09:43:01.0 -0500
@@ -281,8 +281,11 @@ static void ipoib_ib_handle_rx_wc(struct
dev-stats.rx_bytes += skb-len;

skb-dev = dev;
-   /* XXX get correct PACKET_ type here */
-   skb-pkt_type = PACKET_HOST;
+   if (IN6_IS_ADDR_MULTICAST(skb-head + 24))
+
+   skb-pkt_type = PACKET_MULTICAST;
+   else
+   skb-pkt_type = PACKET_HOST;

if (test_bit(IPOIB_FLAG_CSUM, priv-flags)  likely(wc-csum_ok))
skb-ip_summed = CHECKSUM_UNNECESSARY;
Index: linux-2.6/include/linux/in6.h
===
--- linux-2.6.orig/include/linux/in6.h  2010-08-25 09:39:40.0 -0500
+++ linux-2.6/include/linux/in6.h   2010-08-25 09:40:22.0 -0500
@@ -53,6 +53,9 @@ extern const struct in6_addr in6addr_lin
 extern const struct in6_addr in6addr_linklocal_allrouters;
 #define IN6ADDR_LINKLOCAL_ALLROUTERS_INIT \
{ { { 0xff,2,0,0,0,0,0,0,0,0,0,0,0,0,0,2 } } }
+
+#define IN6_IS_ADDR_MULTICAST(a) (((const __u8 *) (a))[0] == 0xff)
+
 #endif

 struct sockaddr_in6 {
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] complib/cl_timer.c: fixing cl_timer calculation

2010-08-25 Thread Sasha Khapyorsky
On 15:01 Tue 24 Aug , Yevgeny Kliteynik wrote:
 
 BTW, any idea what are the (obviously historical)
 reasons for these lines in the code?
 
 323:  /* do not do 0 wait ! */
 324:  /* if (delta_time  1000.0) {delta_time = 1000;} */

Have no idea. This is from day zero of git history.

Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] complib/cl_timer.c: fixing cl_timer calculation

2010-08-25 Thread Sasha Khapyorsky
On 16:21 Tue 24 Aug , Yevgeny Kliteynik wrote:
 When calculating p_timer-timeout.tv_sec and p_timer-timeout.tv_nsec,
 the carry was ignored, resulting in wrong value in p_timer-timeout.tv_sec,
 and value  10^9 in p_timer-timeout.tv_nsec (illegal value).
 
 Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il

Applied. Thanks.

Tiny comment is below.

 ---
 
 V2: - using macros instead of int types
 - fixed the same problem in cl_timer_trim()
 
  opensm/complib/cl_timer.c |   48 ++--
  1 files changed, 24 insertions(+), 24 deletions(-)
 
 diff --git a/opensm/complib/cl_timer.c b/opensm/complib/cl_timer.c
 index 2acdb51..3b3c7b0 100644
 --- a/opensm/complib/cl_timer.c
 +++ b/opensm/complib/cl_timer.c
 @@ -294,12 +294,32 @@ static cl_status_t __cl_timer_find(IN const 
 cl_list_item_t * const p_list_item,
   return (CL_NOT_FOUND);
  }
 
 +/*
 + * Calculate 'struct timespec' value that is the
 + * current time plus the 'time_ms' milliseconds.
 + */
 +static __inline void __cl_timer_calculate(IN const uint32_t time_ms,
 +   OUT struct timespec * const p_timer)
 +{
 + struct timeval curtime, deltatime, endtime;
 +
 +#ifndef timerclear
 +#define timerclear(tvp)  (tvp)-tv_sec = (time_t)0, (tvp)-tv_usec = 0L
 +#endif
 + timerclear(curtime);
 + gettimeofday(curtime, NULL);

I don't think that curtime initialization does something useful before
gettimeofday() call.

Sasha

 +
 + deltatime.tv_sec = time_ms / 1000;
 + deltatime.tv_usec = (time_ms % 1000) * 1000;
 + timeradd(curtime, deltatime, endtime);
 + p_timer-tv_sec = endtime.tv_sec;
 + p_timer-tv_nsec = endtime.tv_usec * 1000;
 +}
 +
  cl_status_t cl_timer_start(IN cl_timer_t * const p_timer,
  IN const uint32_t time_ms)
  {
 - struct timeval curtime;
   cl_list_item_t *p_list_item;
 - uint32_t delta_time = time_ms;
 
   CL_ASSERT(p_timer);
   CL_ASSERT(p_timer-state == CL_INITIALIZED);
 @@ -313,20 +333,7 @@ cl_status_t cl_timer_start(IN cl_timer_t * const p_timer,
   cl_qlist_remove_item(gp_timer_prov-queue,
p_timer-list_item);
 
 - /* Get the current time */
 -#ifndef timerclear
 -#define  timerclear(tvp) (tvp)-tv_sec = (time_t)0, 
 (tvp)-tv_usec = 0L
 -#endif
 - timerclear(curtime);
 - gettimeofday(curtime, NULL);
 -
 - /* do not do 0 wait ! */
 - /* if (delta_time  1000.0) {delta_time = 1000;} */
 -
 - /* Calculate the timeout. */
 - p_timer-timeout.tv_sec = curtime.tv_sec + (delta_time / 1000);
 - p_timer-timeout.tv_nsec =
 - (curtime.tv_usec + ((delta_time % 1000) * 1000)) * 1000;
 + __cl_timer_calculate(time_ms, p_timer-timeout);
 
   /* Add the timer to the queue. */
   if (cl_is_qlist_empty(gp_timer_prov-queue)) {
 @@ -385,7 +392,6 @@ void cl_timer_stop(IN cl_timer_t * const p_timer)
  cl_status_t cl_timer_trim(IN cl_timer_t * const p_timer,
 IN const uint32_t time_ms)
  {
 - struct timeval curtime;
   struct timespec newtime;
   cl_status_t status;
 
 @@ -394,13 +400,7 @@ cl_status_t cl_timer_trim(IN cl_timer_t * const p_timer,
 
   pthread_mutex_lock(gp_timer_prov-mutex);
 
 - /* Get the current time */
 - timerclear(curtime);
 - gettimeofday(curtime, NULL);
 -
 - /* Calculate the timeout. */
 - newtime.tv_sec = curtime.tv_sec + (time_ms / 1000);
 - newtime.tv_nsec = (curtime.tv_usec + ((time_ms % 1000) * 1000)) * 1000;
 + __cl_timer_calculate(time_ms, newtime);
 
   if (p_timer-timer_state == CL_TIMER_QUEUED) {
   /* If the old time is earlier, do not trim it.  Just return. */
 -- 
 1.6.2.4
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: opensm/osm_opensm.c: no report when SM is exiting

2010-08-25 Thread Sasha Khapyorsky
On 16:38 Wed 18 Aug , Yevgeny Kliteynik wrote:
 Don't bother reporting events to plug-ins while SM is exiting.
 
 Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il

Applied. Thanks.

Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5 v6] IB/srp: IB/srp: Implement SRP_CRED_REQ and SRP_AER_REQ

2010-08-25 Thread Bart Van Assche
On Wed, Aug 25, 2010 at 4:48 AM, David Dillow d...@thedillows.org wrote:

 On Sun, 2010-08-22 at 18:56 +0200, Bart Van Assche wrote:
  [ ... ]
   static struct srp_iu *__srp_get_tx_iu(struct srp_target_port *target,
  -                                     enum srp_request_type req_type)
  +                                   enum srp_tx_iu_type iu_type)
   {
  -     s32 rsv = (req_type == SRP_REQ_TASK_MGMT) ? 0 : SRP_TSK_MGMT_SQ_SIZE;
  +     s32 rsv = (iu_type == SRP_IU_REQ_TSK_MGMT) ? 0 : SRP_TSK_MGMT_SQ_SIZE;

 You check for credits even if this is an SRP_IU_RSP, so we'll fail to
 allocate an IU if we're low on credits. Not a common condition, but
 possible within the spec.

The code below was present in version five of the patch series and has
been removed accidentally. I'll resend the patch series.

+   if (tx_iu_type != SRP_TX_IU_RSP  target-req_lim = rsv) {
+   ++target-zero_req_lim;
+   return NULL;
+   }

Bart.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ibnetdiscover: add '-f' flag to show full information (ports' speed and width).

2010-08-25 Thread Sasha Khapyorsky
On 09:45 Tue 24 Aug , Hal Rosenstock wrote:
 
  What about the flag? do we still need it if we pass the output after the 
  comment?
 
 I wouldn't think so. I also think we've made commentary changes to the
 ibnetdiscover output format like this before. If we wanted to be
 absolutely sure it wouldn't break anything, we'd keep the flag though.
 It's up to Sasha.

The '-f' flag could make sense since those information actually
duplicates previous 4xDDR, etc. and used by ibsim's parser only.

Also it would be nice to make ibsim's parser to understand both
possibilities: s=2 and speed=2 (Or even to parse *x*DR strings :))

Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ofw] [Patch] [Tools][infiniband-diags]

2010-08-25 Thread Sasha Khapyorsky
On 20:19 Sun 01 Aug , Sasha Khapyorsky wrote:
 On 14:36 Wed 28 Jul , Irena Kruchkovsky wrote:
  A patch that fixes the GUID output in ibstat to work correctly in windows 
  2003.
  
  Index: D:/Windows/MLNX_VPI/tools/infiniband-diags/src/ibstat.c
  ===
  --- D:/Windows/MLNX_VPI/tools/infiniband-diags/src/ibstat.c  (revision 6199)
  +++ D:/Windows/MLNX_VPI/tools/infiniband-diags/src/ibstat.c   
  (revision 6200)
  @@ -72,9 +72,9 @@
  printf(\tNumber of ports: %d\n, ca-numports);
  printf(\tFirmware version: %s\n, ca-fw_ver);
  printf(\tHardware version: %s\n, ca-hw_ver);
  -  printf(\tNode GUID: 0x%016llx\n,
  + printf(\tNode GUID: 0x%016I64x\n,
 
 Normally we are using PRI* macros with management code. What about this:

I've commited this couple of days ago:

commit e4b73c2081a9afc346adf98d0f4d348005c920f6
Author: Sasha Khapyorsky sas...@voltaire.com
Date:   Sun Aug 1 20:26:11 2010 +0300

infiniband-diags/ibstat: convert to PRIx64 macros GUID printing

Convert to using PRIx64 macros for printing GUID values.

Pointed out by Irena Kruchkovsky.

Signed-off-by: Sasha Khapyorsky sas...@voltaire.com

diff --git a/infiniband-diags/src/ibstat.c b/infiniband-diags/src/ibstat.c
index c44d8c4..f655a13 100644
--- a/infiniband-diags/src/ibstat.c
+++ b/infiniband-diags/src/ibstat.c
@@ -72,10 +72,9 @@ static void ca_dump(umad_ca_t * ca)
printf(\tNumber of ports: %d\n, ca-numports);
printf(\tFirmware version: %s\n, ca-fw_ver);
printf(\tHardware version: %s\n, ca-hw_ver);
-   printf(\tNode GUID: 0x%016llx\n,
-  (long long unsigned)ntohll(ca-node_guid));
-   printf(\tSystem image GUID: 0x%016llx\n,
-  (long long unsigned)ntohll(ca-system_guid));
+   printf(\tNode GUID: 0x%016 PRIx64 \n, ntohll(ca-node_guid));
+   printf(\tSystem image GUID: 0x%016 PRIx64 \n,
+  ntohll(ca-system_guid));
 }
 
 static char *port_state_str[] = {
@@ -122,8 +121,7 @@ static int port_dump(umad_port_t * port, int alone)
printf(%sLMC: %d\n, pre, port-lmc);
printf(%sSM lid: %d\n, pre, port-sm_lid);
printf(%sCapability mask: 0x%08x\n, pre, ntohl(port-capmask));
-   printf(%sPort GUID: 0x%016llx\n, pre,
-  (long long unsigned)ntohll(port-port_guid));
+   printf(%sPort GUID: 0x%016 PRIx64 \n, pre, ntohll(port-port_guid));
printf(%sLink layer: %s\n, pre, port-link_layer);
return 0;
 }
@@ -182,8 +180,7 @@ static int ports_list(char names[][UMAD_CA_NAME_LEN], int n)
 
for (i = 0; i  found; i++)
if (guids[i])
-   printf(0x%016llx\n,
-  (long long unsigned)ntohll(guids[i]));
+   printf(0x%016 PRIx64 \n, ntohll(guids[i]));
return found;
 }


, hope this solves your issues too.

Sasha
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] opensm: Fix sl2vl configuration

2010-08-25 Thread Hal Rosenstock
On Wed, Aug 25, 2010 at 4:43 AM, Eli Dorfman (Voltaire)
dorfman@gmail.com wrote:
 Hal Rosenstock wrote:
 Hi Eli,

 On Mon, Aug 23, 2010 at 3:37 PM, Eli Dorfman dorfman@gmail.com wrote:
 Hi Hal,

 On Mon, Aug 23, 2010 at 5:25 PM, Hal Rosenstock
 hal.rosenst...@gmail.com wrote:
 On Wed, Jul 28, 2010 at 12:26 PM, Eli Dorfman (Voltaire)
 dorfman@gmail.com wrote:
 Subject: [PATCH] Fix sl2vl configuration

 For non-optimized sl2vl configuration in and out ports were reversed.
 Are you sure these are reversed ? Any idea which commit introduced
 this reversal of in and out ports ?
 I'm sure they are reversed.

 I looked at it some more and the ports look reversed to me too.

 This patch was also tested by Jim Schutt.

 That only means it works in his environment rather than correctness.
 It was tested in mine enviornment as well.
 Usually I test the patch in my environment to verify its correctness (in 
 addition to code review).
 I assume you do the same.

 Do you expect me to test the patch in your environment as well?

Huh ?



 I didn't check which commit introduced this - why is it important?

 I'd like to understand which patch introduced the reversal. I don't
 see it but might have missed it.  I think it's important to know which
 versions are broken.

 I think that the following commit added the bug:

 commit 051a1dd514e63f3a971afad38e377932efca5e18
 Author: Sasha Khapyorsky sas...@voltaire.com
 Date:   Mon Jan 4 21:06:19 2010 +0200

opensm/osm_qos.c: split switch external and end ports setup

This splits QoS related port parameters setup over switch external ports
and end ports. Such separation leaves us with simpler code and saves
some repeated flows in case of switch external ports (actually required
per switch and not per port).

Another advantages: Optimized Sl2VL Mapping procedure can be implemented
easier using this model. A low level port QoS related parameters setup
infrastructure is ready for supporting per port QoS related configuration
(which hopefully will be implemented some days).


I thought it might be that one but wasn't sure. Thanks.



 For optimal sl2vl added override of default ALL settting with port's
 sl2vl when operational VL was other than the default port.
 What is the motivation to override when the operational VLs is
 different ? Why is that better than what is done currently ?
 The idea was to apply the default policy - set sl2vl modulo operational VL.
 When applying ALL settings using port 1 we still want to override this
 setting for ports with different operational VL.

 What makes the default policy modulo operational VLs ?
 This is how its implemented in sl2vl_update_table()
vl_mask = (1  (ib_port_info_get_op_vls(p-port_info) - 1)) - 1;

for (i = 0; i  IB_MAX_NUM_VLS / 2; i++) {
vl1 = sl2vl_table-raw_vl_by_sl[i]  4;
vl2 = sl2vl_table-raw_vl_by_sl[i]  0xf;
if (vl1 != 15)
vl1 = vl_mask;
if (vl2 != 15)
vl2 = vl_mask;

 Default startup switches configuration uses the same policy.

I view this as further modification on the configuration based on the
operational VLs. Besides, any VL specified above the op VLs is a drop
(same as indicating VL15).

I think what you are doing is changing the default behavior and hence
perhaps an additional policy switch is needed if this change really is
needed and I'm unsure of that.


 Is this really a policy issue ?

 IMO there are two separate issues in this patch and they should be in
 separate patches (for better git bisection).
 Maybe, but I still think this patch fixes wrong setting for sl2vl
 using optimized and non optimized methods.
 I'm not sure this should be divided to 2 separate patches.

 It's one thought per patch and to me this is two different thoughts.
 If this is the only issue, I can split this patch to 2 separate patches.

It's also the issue noted above.

-- Hal

 Eli

 Also, a couple of (possibly related) questions below.
 It seems that you are referring to patch v1 which was modified
 according to Jim's comments.
 Please check the v2 patch .

 I see my questions are moot in terms of the v2 patch.

 -- Hal

 Thanks,
 Eli

 Signed-off-by: Eli Dorfman e...@voltaire.com
 ---
  opensm/opensm/osm_qos.c |   25 -
  1 files changed, 20 insertions(+), 5 deletions(-)

 diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
 index a571370..de0ae23 100644
 --- a/opensm/opensm/osm_qos.c
 +++ b/opensm/opensm/osm_qos.c
 @@ -182,7 +182,7 @@ static ib_api_status_t sl2vl_update_table(osm_sm_t * 
 sm, osm_physp_t * p,
tbl.raw_vl_by_sl[i] = (vl1  4) | vl2;
}

 -   if (!force_update  (p_tbl = osm_physp_get_slvl_tbl(p, in_port)) 
 
 +   if (!force_update  in_port  (p_tbl = 
 osm_physp_get_slvl_tbl(p, in_port)) 
!memcmp(p_tbl, tbl, sizeof(tbl)))
return IB_SUCCESS;
 Why exclude 

ibnetdiscover issue with multiported CA (or router) with multiple ports on same subnet

2010-08-25 Thread Hal Rosenstock
Sasha,

I'm seeing an issue with ibnetdiscover from a CA port where it appears
to extend a path at a remote CA port (it's actually another port on
the same CA) to query NodeInfo of the next hop beyond it. I get the
following error message:

src/query_smp.c:188; umad (DR path slid 0; dlid 0; 0,1,20,2 Attr
0x11:0) bad status 110; Connection timed out

where smpquery -D nodeinfo of 0,1,20 is a CA which can also be seen
from the topology.

It appears to stem from the following code snippet from
libibnetdisc/src/ibnetdisc.c:recv_port_info

if (port_num  mad_get_field(port-info, 0, IB_PORT_PHYS_STATE_F)
== IB_PORT_PHYS_STATE_LINKUP
 ((node-type == IB_NODE_SWITCH  port_num != local_port) ||
(node == fabric-from_node  port_num == local_port))) {
ib_portid_t path = smp-path;
if (extend_dpath(engine, path, port_num)  0)
query_node_info(engine, path, node);
}

that was introduced by:
commit fcb8d5e7588e38508a8e354c37009d73c0a3889f
Author: Sasha Khapyorsky sas...@voltaire.com
Date:   Sat Apr 10 02:43:24 2010 +0300

libibnetdisc: no backward NodeInfo queries

Then switch is reached via port N we don't need to query back via this
port - source node is discovered already. Finally this saves some amount
of unnecessary MADs.

Signed-off-by: Sasha Khapyorsky sas...@voltaire.com

and subsequently modified by:
commit 49d149c63a44d99259f516a15af53d8cf3f0e7c9
Author: Sasha Khapyorsky sas...@voltaire.com
Date:   Tue Apr 13 19:54:45 2010 +0300

libibnetdisc: don't try to cross discovery over CA

When discovery is running from CA node it shouldn't try to cross over
all ports, but only via local one (send over non-local ports will fail
since CA doesn't route MADs).

Signed-off-by: Sasha Khapyorsky sas...@voltaire.com

due to the (node == fabric-from_node  port_num == local_port)
clause being TRUE.

ibnetdiscover
src/query_smp.c:188; umad (DR path slid 0; dlid 0; 0,1,20,2 Attr
0x11:0) bad status 110; Connection timed out
#
# Topology file: generated on Wed Aug 25 18:52:16 2010
#
# Initiated from node 0002c9020020ee0c port 0002c9020020ee0d

vendid=0x2c9
devid=0xb924
sysimgguid=0xb8c00438b
switchguid=0xb8c00438b(b8c00438b)
Switch  24 S-000b8c00438b # MT47396 Infiniscale-III
Mellanox Technologies base port 0 lid 4 lmc 0
[5] H-0002c90310e0[1](2c90310e1)  # sw124
HCA-1 lid 5 4xDDR
[6] H-0002c903d1c8[1](2c903d1c9)  # sw123
HCA-1 lid 0 4xDDR
[7] H-0002c9020020ee0c[1](2c9020020ee0d)  # sw075
HCA-1 lid 2 4xDDR
[20]H-0002c9020020ee0c[2](2c9020020ee0e)  # sw075
HCA-1 lid 3 4xDDR

...

vendid=0x2c9
devid=0x6278
sysimgguid=0x2c9020020ee0f
caguid=0x2c9020020ee0c
Ca  2 H-0002c9020020ee0c  # sw075 HCA-1
[1](2c9020020ee0d)  S-000b8c00438b[7] # lid 2 lmc 0
MT47396 Infiniscale-III Mellanox Technologies lid 4 4xDDR
[2](2c9020020ee0e)  S-000b8c00438b[20]# lid
3 lmc 0 MT47396 Infiniscale-III Mellanox Technologies lid 4 4xDDR


smpquery -D nodeinfo 0,1,20
# Node info: DR path slid 65535; dlid 65535; 0,1,20
BaseVers:1
ClassVers:...1
NodeType:Channel Adapter
NumPorts:2
SystemGuid:..0x0002c9020020ee0f
Guid:0x0002c9020020ee0c
PortGuid:0x0002c9020020ee0e
PartCap:.64
DevId:...0x6278
Revision:0x00a0
LocalPort:...2
VendorId:0x0002c9

I don't think the local port part of the test above (node ==
fabric-from_node  port_num == local_port)  is correct where:

local_port = (uint8_t) mad_get_field(port_info, 0,
IB_PORT_LOCAL_PORT_F);

Instead, shouldn't port_num be checked against the local port that
initiated the ibnetdiscover (which in this case is port 1) ? If so, a
from_portnum could be added/saved in the fabric structure and used
for this check. Do you concur with this approach ?

-- Hal
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] infiniband-diags/ibnetdiscover: Fix handling of CA ports in recv_port_info

2010-08-25 Thread Hal Rosenstock

When multiple ports on the same CA are connected to the same subnet and
an ibnetdiscover is initiated from one of them, the discovery continues
past the other CA port and an error occurs.

The error is:
src/query_smp.c:188; umad (DR path slid 0; dlid 0; 0,1,20,2 Attr0x11:0) bad 
status 110; Connection timed out

Fix this by saving the initiating port number in the fabric structure
and using that rather than local_port for the comparison in
recv_port_info.

Signed-off-by: Hal Rosenstock h...@mellanox.com
---
diff --git a/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h 
b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h
index cfd3bbe..935e427 100644
--- a/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h
+++ b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2009 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2008 Lawrence Livermore National Lab.  All rights reserved.
+ * Copyright (c) 2010 Mellanox Technologies LTD.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -149,6 +150,8 @@ typedef struct ibnd_fabric {
 * or by default the node you ar running on
 */
ibnd_node_t *from_node;
+   int from_portnum;
+
/* NULL term list of all nodes in the fabric */
ibnd_node_t *nodes;
/* NULL terminated list of all chassis found in the fabric */
diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc.c 
b/infiniband-diags/libibnetdisc/src/ibnetdisc.c
index f525d71..79dd98e 100644
--- a/infiniband-diags/libibnetdisc/src/ibnetdisc.c
+++ b/infiniband-diags/libibnetdisc/src/ibnetdisc.c
@@ -2,6 +2,7 @@
  * Copyright (c) 2004-2009 Voltaire Inc.  All rights reserved.
  * Copyright (c) 2007 Xsigo Systems Inc.  All rights reserved.
  * Copyright (c) 2008 Lawrence Livermore National Laboratory
+ * Copyright (c) 2010 Mellanox Technologies LTD.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -199,7 +200,7 @@ static int recv_port_info(smp_engine_t * engine, ibnd_smp_t 
* smp,
if (port_num  mad_get_field(port-info, 0, IB_PORT_PHYS_STATE_F)
== IB_PORT_PHYS_STATE_LINKUP
 ((node-type == IB_NODE_SWITCH  port_num != local_port) ||
-   (node == fabric-from_node  port_num == local_port))) {
+   (node == fabric-from_node  port_num == 
fabric-from_portnum))) {
ib_portid_t path = smp-path;
if (extend_dpath(engine, path, port_num)  0)
query_node_info(engine, path, node);
@@ -324,9 +325,10 @@ static int recv_node_info(smp_engine_t * engine, 
ibnd_smp_t * smp,
dump_endnode(smp-path, node_is_new ? new : known,
 node, port);
 
-   if (rem_node == NULL)   /* this is the start node */
+   if (rem_node == NULL) { /* this is the start node */
fabric-from_node = node;
-   else {
+   fabric-from_portnum = port_num;
+   } else {
/* link ports... */
int rem_port_num = get_last_port(smp-path);
 
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] complib/cl_timer.c: fixing cl_timer calculation

2010-08-25 Thread Yevgeny Kliteynik
On 25-Aug-10 7:07 PM, Sasha Khapyorsky wrote:
 On 15:01 Tue 24 Aug , Yevgeny Kliteynik wrote:

 BTW, any idea what are the (obviously historical)
 reasons for these lines in the code?

 323: /* do not do 0 wait ! */
 324: /* if (delta_time  1000.0) {delta_time = 1000;} */
 
 Have no idea. This is from day zero of git history.

Well, it's not there any more.

-- Yevgeny

 Sasha
 

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: IPoIB CM connection establishment protocol question

2010-08-25 Thread Smith, Stan
Eli Cohen wrote:
 Hi stan,

 Here's a short summary:
 An IPoIB interface encodes it capability of supporting CM mode in the
 hardware address it publishes. Once the stack hands an SKB to IPoIB for
 transmission (unicast), IPoIB needs to resolve the hardware address to
 IB address information (LID, SL etc.). Once resolution is complete, it
 will create a RC connection with the other host using IB CM. This
 process is symmetrical - that is, for two communicating hosts there are
 two RC connections so that each connection works half duplex.

 Hope that helps.

Hello,
  Thanks for responding.
Indeed this clears the air; the original Windows IPoIB CM code assumed a single 
connection.

stan.


 On Thu, Aug 05, 2010 at 10:25:27AM -0700, Smith, Stan wrote:

 Could someone who is OFED IPoIB knowledgeable help me understand
 which IPoIB side starts the RC connection establishment (Tx REQ) to
 an ARP unknown remote host?

 I tried reading the Linux IPoIB code and realized it's been too many
 years since writing Linux network drivers (2001) and found myself
 lost in the bits w.r.t. CM connection mgmt. and ARP handling.

 From limited observation, it looks like the ACTIVE side (ttcp -t)
 of an IPoIB network connection broadcasts an IB ARP-REQ which
 indicates CM capability in the IB ARP flags byte.

 When the PASSIVE IPoIB (ttcp -r) receives the IB ARP-REQ, it sends
 back an IB ARP-REPLY which indicates it's CM capability among other
 items.

 Now the confusing part: which IPoIB CM side starts the CM-RC
 connection establishment?

 Does the PASSIVE IPoIB start the RC connection (Tx REQ) prior to
 sending the IB ARP-REPLY?

 Or does the ACTIVE IPoIB side receive the IB ARP-REPLY and then
 start an ACTIVE RC connection (Tx REQ) now that it knows the remote
 side is CM capable?

 thanks,

 stan.

 --
 To unsubscribe from this list: send the line unsubscribe
 linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html