[ewg] Re: [PATCH] call skb_orphan() after sending an SKB
commit f17ebf3e2099257da244587f1ee33f51745f7cdb
Author: Eli Cohen [EMAIL PROTECTED]
Date: Tue Feb 5 11:15:46 2008 +0200

    Call skb_orphan() after sending an SKB

    This will call the destructor of the SKB (but not free the memory).
    It appears that some applications (ttcpv for example) are sensitive to
    delaying the time the SKB is freed. This commit fixes this problem.

Can you explain what the difference is, from the socket send buffer accounting point of view, between freeing the SKB and freeing the memory? What was the problem with ttcpv, did it hang? Have you tested the unsig_udqp.patch with different socket buffer sizes to make sure there's no live-lock etc.? What was the app you were using?

Also, I see that you have added a call to netif_stop_queue(), is this to solve another problem?

Or.

    Signed-off-by: Eli Cohen [EMAIL PROTECTED]

diff --git a/kernel_patches/fixes/ipoib_0190_unsig_udqp.patch b/kernel_patches/fixes/ipoib_0190_unsig_udqp.patch
index b76cdab..3fbeda3 100644
--- a/kernel_patches/fixes/ipoib_0190_unsig_udqp.patch
+++ b/kernel_patches/fixes/ipoib_0190_unsig_udqp.patch
@@ -10,10 +10,10 @@ UDP messages, went up from 380 mbps to 508 mbps.
 Signed-off-by: Eli Cohen [EMAIL PROTECTED]
 ---
-Index: ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib.h
+Index: ofa_1_3_dev_kernel/drivers/infiniband/ulp/ipoib/ipoib.h
 ===================================================================
---- ofed_kernel.orig/drivers/infiniband/ulp/ipoib/ipoib.h
-+++ ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib.h
+--- ofa_1_3_dev_kernel.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2008-02-05 11:04:35.0 +0200
++++ ofa_1_3_dev_kernel/drivers/infiniband/ulp/ipoib/ipoib.h 2008-02-05 11:05:07.0 +0200
 @@ -373,6 +373,7 @@ struct ipoib_dev_priv {
  struct ib_wc ibwc[IPOIB_NUM_WC];
@@ -39,10 +39,10 @@ Index: ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib.h
  struct ipoib_ah *ipoib_create_ah(struct net_device *dev,
  struct ib_pd *pd, struct ib_ah_attr *attr);
-Index: ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+Index: ofa_1_3_dev_kernel/drivers/infiniband/ulp/ipoib/ipoib_ib.c
 ===================================================================
---- ofed_kernel.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c
-+++ ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+--- ofa_1_3_dev_kernel.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2008-02-05 11:04:35.0 +0200
++++ ofa_1_3_dev_kernel/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2008-02-05 11:05:44.0 +0200
 @@ -254,12 +254,10 @@ repost:
  "for buf %d\n", wr_id);
  }
@@ -128,7 +128,7 @@ Index: ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_ib.c
  }
  int ipoib_poll(struct napi_struct *napi, int budget)
-@@ -361,11 +372,65 @@ void ipoib_ib_rx_completion(struct ib_cq
+@@ -361,11 +372,68 @@ void ipoib_ib_rx_completion(struct ib_cq
  netif_rx_schedule(dev, &priv->napi);
  }
@@ -168,8 +168,11 @@ Index: ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_ib.c
 +ipoib_warn(priv, "failed to post zlen send\n");
 +else {
 +++priv->tx_head;
-+++priv->tx_outstanding;
 +ipoib_dbg(priv, "%s-%d: head = %d\n", __func__, __LINE__, priv->tx_head);
++if (++priv->tx_outstanding == ipoib_sendq_size) {
++ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n");
++netif_stop_queue(dev);
++}
 +}
 +}
 +poll_tx(priv);
@@ -197,7 +200,7 @@ Index: ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_ib.c
  }
  static inline int post_send(struct ipoib_dev_priv *priv,
-@@ -405,6 +470,11 @@ static inline int post_send(struct ipoib
+@@ -405,6 +473,11 @@ static inline int post_send(struct ipoib
  } else
  priv->tx_wr.opcode = IB_WR_SEND;
@@ -209,16 +212,18 @@ Index: ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_ib.c
  return ib_post_send(priv->qp, &priv->tx_wr, &bad_wr);
  }
-@@ -489,7 +559,7 @@ void ipoib_send(struct net_device *dev,
+@@ -489,7 +562,9 @@ void ipoib_send(struct net_device *dev,
  }
  if (unlikely(priv->tx_outstanding > MAX_SEND_CQE + 1))
 -poll_tx(priv, 0);
 +poll_tx(priv);
++
++skb_orphan(skb);
  return;
-@@ -530,6 +600,32 @@ void ipoib_reap_ah(struct work_struct *w
+@@ -530,6 +605,32 @@ void ipoib_reap_ah(struct work_struct *w
  round_jiffies_relative(HZ));
  }
@@ -251,7 +256,7 @@ Index: ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_ib.c
  int ipoib_ib_dev_open(struct net_device *dev)
  {
  struct ipoib_dev_priv *priv = netdev_priv(dev);
-@@ -542,9 +638,17 @@ int ipoib_ib_dev_open(struct net_device
+@@ -542,9 +643,17 @@ int ipoib_ib_dev_open(struct net_device
  }
[ewg] some comments/cleanups for the openibd service script
Vlad,

I just realized that there are some old and misleading sections here, for example bringing up/down of GEN1 drivers, the mlx4_enet driver which is not part of this release AFAIK..., kdapl which was removed, starting/stopping the ipoib ha tools which were removed, etc.

I can send a patch to clean them up, but I thought you might prefer to do it yourself; please let me know. This has to get in for 1.3, I don't want to start handling support cases with questions on non-existent features. This service script goes into commercial distributions, correct?

Please see below and let me know your thinking, thanks,

Or.

On Wed, 6 Feb 2008, Or Gerlitz wrote:

--- /dev/null 2008-02-05 10:18:44.755516936 +0200
+++ ofed_scripts/openibd 2008-02-06 13:46:50.0 +0200
@@ -0,0 +1,1375 @@
+#!/bin/bash
+
+#
+# Copyright (c) 2006 Mellanox Technologies. All rights reserved.
+#
+# This Software is licensed under one of the following licenses:
+#
+# 1) under the terms of the "Common Public License 1.0" a copy of which is
+#    available from the Open Source Initiative, see
+#    http://www.opensource.org/licenses/cpl.php.
+#
+# 2) under the terms of the "The BSD License" a copy of which is
+#    available from the Open Source Initiative, see
+#    http://www.opensource.org/licenses/bsd-license.php.
+#
+# 3) under the terms of the "GNU General Public License (GPL) Version 2" a
+#    copy of which is available from the Open Source Initiative, see
+#    http://www.opensource.org/licenses/gpl-license.php.
+#
+# Licensee has the right to choose one of the above licenses.
+#
+# Redistributions of source code must retain the above copyright
+# notice and one of the license notices.
+#
+# Redistributions in binary form must reproduce both the above copyright
+# notice, one of the license notices in the documentation
+# and/or other materials provided with the distribution.
+#
+#
+#  $Id: openibd 9139 2006-08-29 14:03:38Z vlad $
+#
+
+# config: /etc/infiniband/openib.conf
+CONFIG="/etc/infiniband/openib.conf"
+
+if [ ! -f $CONFIG ]; then
+    echo No InfiniBand configuration found
+    exit 0
+fi
+
+. $CONFIG
+
+CWD=`pwd`
+cd /etc/infiniband
+WD=`pwd`
+
+PATH=$PATH:/sbin:/usr/bin
+if [ -e /etc/profile.d/ofed.sh ]; then
+    . /etc/profile.d/ofed.sh
+fi
+
+# Only use ONBOOT option if called by a runlevel directory.
+# Therefore determine the base, follow a runlevel link name ...
+base=${0##*/}
+link=${base#*[SK][0-9][0-9]}
+# ... and compare them
+if [ "$link" == "$base" ] ; then
+    RUNMODE=manual
+    ONBOOT=yes
+else
+    RUNMODE=auto
+fi
+
+ACTION=$1
+shift
+RESTART=0
+max_ports_num_in_hca=0
+
+# Check if OpenIB configured to start automatically
+if [ "X${ONBOOT}" != "Xyes" ]; then
+    exit 0
+fi
+
+if ( grep -i 'SuSE Linux' /etc/issue >/dev/null 2>&1 ); then
+    if [ -n "$INIT_VERSION" ] ; then
+        # MODE=onboot
+        if LANG=C egrep -L "^ONBOOT=['\"]?[Nn][Oo]['\"]?" ${CONFIG} > /dev/null ; then
+            exit 0
+        fi
+    fi
+fi
+
+#
+# Get a sane screen width
+[ -z "${COLUMNS:-}" ] && COLUMNS=80
+
+[ -z "${CONSOLETYPE:-}" ] && [ -x /sbin/consoletype ] && CONSOLETYPE=`/sbin/consoletype`
+
+if [ -f /etc/sysconfig/i18n -a -z "${NOLOCALE:-}" ] ; then
+  . /etc/sysconfig/i18n
+  if [ "$CONSOLETYPE" != "pty" ]; then
+    case "${LANG:-}" in
+      ja_JP*|ko_KR*|zh_CN*|zh_TW*)
+        export LC_MESSAGES=en_US
+        ;;
+      *)
+        export LANG
+        ;;
+    esac
+  else
+    export LANG
+  fi
+fi
+
+# Read in our configuration
+if [ -z "${BOOTUP:-}" ]; then
+  if [ -f /etc/sysconfig/init ]; then
+    . /etc/sysconfig/init
+  else
+    # This all seem confusing? Look in /etc/sysconfig/init,
+    # or in /usr/doc/initscripts-*/sysconfig.txt
+    BOOTUP=color
+    RES_COL=60
+    MOVE_TO_COL="echo -en \\033[${RES_COL}G"
+    SETCOLOR_SUCCESS="echo -en \\033[1;32m"
+    SETCOLOR_FAILURE="echo -en \\033[1;31m"
+    SETCOLOR_WARNING="echo -en \\033[1;33m"
+    SETCOLOR_NORMAL="echo -en \\033[0;39m"
+    LOGLEVEL=1
+  fi
+  if [ "$CONSOLETYPE" = "serial" ]; then
+    BOOTUP=serial
+    MOVE_TO_COL=
+    SETCOLOR_SUCCESS=
+    SETCOLOR_FAILURE=
+    SETCOLOR_WARNING=
+    SETCOLOR_NORMAL=
+  fi
+fi
+
+if [ "${BOOTUP:-}" != "verbose" ]; then
+  INITLOG_ARGS="-q"
+else
+  INITLOG_ARGS=
+fi
+
+echo_success() {
+  echo -n "$@"
+  [ "$BOOTUP" = "color" ] && $MOVE_TO_COL
+  echo -n "["
+  [ "$BOOTUP" = "color" ] && $SETCOLOR_SUCCESS
+  echo -n $"OK"
+  [ "$BOOTUP" = "color" ] && $SETCOLOR_NORMAL
+  echo -n "]"
+  echo -e "\r"
+  return 0
+}
+
+echo_done() {
+  echo -n "$@"
+  [ "$BOOTUP" = "color" ] && $MOVE_TO_COL
+  echo -n "["
+  [ "$BOOTUP" = "color" ] && $SETCOLOR_NORMAL
+  echo -n $"done"
+  [ "$BOOTUP" = "color" ] && $SETCOLOR_NORMAL
+  echo -n "]"
[ewg] Re: [PATCH] call skb_orphan() after sending an SKB
On Wed, 2008-02-06 at 10:17 +0200, Or Gerlitz wrote:

Can you explain what the difference is, from the socket send buffer accounting point of view, between freeing the SKB and freeing the memory?

When you call skb_orphan(), the destructor of the SKB is called; in this case it is a function set by the socket. So from the socket's point of view the packet has been sent. The memory is still not freed, since it is needed by the HW. Once we get a completion for the send operation, the SKB is freed.

What was the problem with ttcpv, did it hang?

The problem with ttcpv was that it stopped sending packets because it was waiting for the memory to be freed. The system did not hang; only the application (ttcpv) stopped sending. Other applications could continue working over the ipoib interface.

Have you tested the unsig_udqp.patch with different socket buffer sizes to make sure there's no live-lock etc.?

Yes, our regression system does that with different applications and benchmarks.

What was the app you were using?

ttcpv

Also, I see that you have added a call to netif_stop_queue(), is this to solve another problem?

This was just a hole that I found in code review: when I post a zero-length packet, I still want this to affect the net queue control.

Or.

___ ewg mailing list ewg@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
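Eli's answer can be made concrete with a small userspace model of the two steps. This is a sketch under stated assumptions: struct sim_sock, struct sim_skb and the sim_* functions are hypothetical stand-ins, not the kernel's struct sock / struct sk_buff API, and they model only the send-buffer charge. Orphaning runs the socket's destructor, which uncharges the send buffer so a blocked sender may proceed. The buffer itself stays allocated until the send completion frees it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a socket's send-buffer accounting (NOT struct sock). */
struct sim_sock {
	int wmem_alloc;		/* bytes currently charged to the send buffer */
	int sndbuf;		/* send buffer limit */
};

/* Hypothetical model of a socket buffer (NOT struct sk_buff). */
struct sim_skb {
	struct sim_sock *sk;
	int truesize;
	void (*destructor)(struct sim_skb *skb);
};

/* Plays the role of the socket's destructor: uncharge the send buffer. */
static void sim_sock_wfree(struct sim_skb *skb)
{
	assert(skb->sk != NULL);
	skb->sk->wmem_alloc -= skb->truesize;
}

/* Allocate a buffer and charge it to the socket, as a send would. */
static struct sim_skb *sim_send(struct sim_sock *sk, int len)
{
	struct sim_skb *skb = malloc(sizeof(*skb));

	if (!skb)
		return NULL;
	skb->sk = sk;
	skb->truesize = len;
	skb->destructor = sim_sock_wfree;
	sk->wmem_alloc += len;
	return skb;
}

/* Model of skb_orphan(): run the destructor and detach from the socket,
 * but keep the memory, since the HCA may still be reading from it. */
static void sim_skb_orphan(struct sim_skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	skb->destructor = NULL;
	skb->sk = NULL;
}

/* Model of freeing the buffer on send completion. An orphaned buffer has
 * no destructor left, so the socket is not uncharged a second time. */
static void sim_skb_free(struct sim_skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	free(skb);
}

/* Would a sender waiting on the send buffer be allowed to queue len bytes? */
static bool sim_sock_writeable(const struct sim_sock *sk, int len)
{
	return sk->wmem_alloc + len <= sk->sndbuf;
}
```

Consider two 2048-byte sends against a 4096-byte limit. The socket stays full until the buffers are orphaned, because with unsignaled sends no completion arrives to free them. That is consistent with the ttcpv stall described above.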
[ewg] Re: with the ipoib patches, debug prints spam the system log
They are only visible when activating ipoib debug. I know it fills the dmesg ring with messages. Do you think I should remove them?

On Wed, 2008-02-06 at 10:38 +0200, Or Gerlitz wrote:

Eli,

You have left somehow too many... debug prints in the last patches, please clean this up. See for example how the system log looks after less than a minute with ipoib debug prints enabled: it has one original print (ib0: Send unicast ARP to 0023) and all the rest are yours.

Or

Feb 6 14:39:23 kernel: ib0: posting zlen send, wrid = 4: head = 2756, tail = 2752
Feb 6 14:39:23 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 2757
Feb 6 14:39:25 kernel: ib0: posting zlen send, wrid = 39: head = 2919, tail = 2912
Feb 6 14:39:25 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 2920
Feb 6 14:39:25 kernel: ib0: posting zlen send, wrid = 15: head = 2959, tail = 2944
Feb 6 14:39:25 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 2960
Feb 6 14:39:27 kernel: ib0: posting zlen send, wrid = 8: head = 3080, tail = 3072
Feb 6 14:39:27 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 3081
Feb 6 14:39:34 kernel: ib0: posting zlen send, wrid = 51: head = 3699, tail = 3696
Feb 6 14:39:34 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 3700
Feb 6 14:39:35 kernel: ib0: posting zlen send, wrid = 25: head = 3737, tail = 3728
Feb 6 14:39:35 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 3738
Feb 6 14:39:35 kernel: ib0: posting zlen send, wrid = 3: head = 3779, tail = 3776
Feb 6 14:39:35 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 3780
Feb 6 14:39:36 kernel: ib0: posting zlen send, wrid = 48: head = 3824, tail = 3808
Feb 6 14:39:36 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 3825
Feb 6 14:39:38 kernel: ib0: posting zlen send, wrid = 24: head = 3992, tail = 3984
Feb 6 14:39:38 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 3993
Feb 6 14:39:38 kernel: ib0: posting zlen send, wrid = 4: head = 4036, tail = 4032
Feb 6 14:39:38 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 4037
Feb 6 14:39:46 kernel: ib0: Send unicast ARP to 0023
Feb 6 14:39:46 kernel: ib0: posting zlen send, wrid = 11: head = 4683, tail = 4672
Feb 6 14:39:46 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 4684
Feb 6 14:39:58 kernel: ib0: posting zlen send, wrid = 58: head = 5626, tail = 5616
Feb 6 14:39:58 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 5627
Feb 6 14:39:59 kernel: ib0: posting zlen send, wrid = 56: head = 5752, tail = 5744
Feb 6 14:39:59 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 5753
Feb 6 14:40:01 kernel: ib0: posting zlen send, wrid = 54: head = 5878, tail = 5872
Feb 6 14:40:01 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 5879
Feb 6 14:40:01 kernel: ib0: posting zlen send, wrid = 30: head = 5918, tail = 5904
Feb 6 14:40:01 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 5919
Feb 6 14:40:10 kernel: ib0: posting zlen send, wrid = 33: head = 6689, tail = 6672
Feb 6 14:40:10 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 6690
Feb 6 14:40:13 kernel: ib0: posting zlen send, wrid = 48: head = 6896, tail = 6880
Feb 6 14:40:13 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 6897
Feb 6 14:40:13 kernel: ib0: posting zlen send, wrid = 26: head = 6938, tail = 6928
Feb 6 14:40:13 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 6939
Feb 6 14:40:15 kernel: ib0: posting zlen send, wrid = 61: head = 7101, tail = 7088
Feb 6 14:40:15 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 7102
[ewg][PATCH][0/2] SRP multipath failover within 60 seconds,
The following patches help SRP/dm-multipath fail over within 60 seconds (bugzilla #577) without data corruption or read/write errors.

1. srp_disconnect_without_wait.patch - SRP sends the disconnect request without waiting for the CM timewait exit event, since SRP currently does not re-use the cm_id and qp/cq of a connection (srp_1_recreate_at_reconnect.patch, already in kernel_patches/fixes, recreates the cm_id and qp/cq for a connection at reconnect).

2. srp_qp_in_err_timer_reconnect_target.patch - when it detects a post_send/post_receive error, SRP sets qp_in_error, sets a timer to reconnect to the target, returns SCSI_MLQUEUE_HOST_BUSY to lock the queue, and returns DID_NO_CONNECT when the target state is DEAD or REMOVED.

Here is my multipath.conf:

defaults {
        udev_dir                /dev
        polling_interval        5
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /bin/true
        path_checker            readsector0
        rr_min_io               100
        rr_weight               priorities
        failback                immediate
        no_path_retry           5
        user_friendly_names     no
}

I also set srp_daemon.sh to rescan the fabric every 60 seconds (instead of the default 300 seconds).

I ran a data integrity test against the /dev/mapper devices while running {disable path 1, sleep 90, enable path 1, sleep 60, disable path 2, sleep 90, enable path 2, sleep 60} in a loop.

RHEL 5 and 5.1 work very well (no data corruption or read/write failures reported).

For SLES 10 SP1, it works well as long as I run *multipath* every 60 secs. I think I misconfigured multipathd somehow (here is how I set it up: using the same multipath.conf above, chkconfig boot.multipath on and chkconfig multipathd on).

-vu
[ewg][PATCH][1/2] SRP multipath failover within 60 seconds,
srp_disconnect_without_wait.patch - SRP sends the disconnect request without waiting for the CM timewait exit event, since SRP currently does not re-use the cm_id and qp/cq of a connection (srp_1_recreate_at_reconnect.patch, already in kernel_patches/fixes, recreates the cm_id and qp/cq for a connection at reconnect).

Signed-off-by: Vu Pham [EMAIL PROTECTED]

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 950228f..45a2533 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -400,7 +400,6 @@
 		printk(KERN_DEBUG PFX "Sending CM DREQ failed\n");
 		return;
 	}
-	wait_for_completion(&target->done);
 }
 
 static void srp_remove_work(struct work_struct *work)
@@ -1266,7 +1294,6 @@
 	case IB_CM_TIMEWAIT_EXIT:
 		printk(KERN_ERR PFX "connection closed\n");
-		comp = 1;
 		target->status = 0;
 		break;
[ewg][PATCH][2/2] SRP multipath failover within 60 seconds,
srp_qp_in_err_timer_reconnect_target.patch - when it detects a post_send/post_receive error, SRP sets qp_in_error, sets a timer to reconnect to the target, returns SCSI_MLQUEUE_HOST_BUSY to lock the queue, and returns DID_NO_CONNECT when the target state is DEAD or REMOVED.

Signed-off-by: Vu Pham [EMAIL PROTECTED]

--- ofa_kernel-1.3.configured/drivers/infiniband/ulp/srp/ib_srp.c 2008-02-05 11:18:16.0 -0800
+++ ofa_kernel-1.3/drivers/infiniband/ulp/srp/ib_srp.c 2008-02-05 15:18:33.0 -0800
@@ -885,6 +884,26 @@
 			    DMA_FROM_DEVICE);
 }
 
+static void srp_reconnect_work(struct work_struct *work)
+{
+	struct srp_target_port *target =
+		container_of(work, struct srp_target_port, work);
+
+	srp_reconnect_target(target);
+}
+
+static void srp_qp_in_err_timer(unsigned long data)
+{
+	struct srp_target_port *target = (struct srp_target_port *)data;
+
+	spin_lock_irq(target->scsi_host->host_lock);
+	INIT_WORK(&target->work, srp_reconnect_work);
+	schedule_work(&target->work);
+	spin_unlock_irq(target->scsi_host->host_lock);
+
+	del_timer(&target->qp_err_timer);
+}
+
 static void srp_completion(struct ib_cq *cq, void *target_ptr)
 {
 	struct srp_target_port *target = target_ptr;
@@ -896,7 +915,16 @@
 			printk(KERN_ERR PFX "failed %s status %d\n",
 			       wc.wr_id & SRP_OP_RECV ? "receive" : "send",
 			       wc.status);
-			target->qp_in_error = 1;
+			if (!target->qp_in_error) {
+				target->qp_in_error = 1;
+				if (!timer_pending(&target->qp_err_timer)) {
+					setup_timer(&target->qp_err_timer,
+						    srp_qp_in_err_timer,
+						    (unsigned long)target);
+					target->qp_err_timer.expires = 10 * HZ + jiffies;
+					add_timer(&target->qp_err_timer);
+				}
+			}
 			break;
 		}
@@ -1004,12 +1032,13 @@
 	struct ib_device *dev;
 	int len;
 
-	if (target->state == SRP_TARGET_CONNECTING)
+	if (target->state == SRP_TARGET_CONNECTING ||
+	    target->qp_in_error)
 		goto err;
 
 	if (target->state == SRP_TARGET_DEAD ||
 	    target->state == SRP_TARGET_REMOVED) {
-		scmnd->result = DID_BAD_TARGET << 16;
+		scmnd->result = DID_NO_CONNECT << 16;
 		done(scmnd);
 		return 0;
 	}
--- ofa_kernel-1.3.configured/drivers/infiniband/ulp/srp/ib_srp.h 2008-02-05 11:18:16.0 -0800
+++ ofa_kernel-1.3/drivers/infiniband/ulp/srp/ib_srp.h 2008-02-05 11:20:49.0 -0800
@@ -160,6 +160,7 @@
 	int			status;
 	enum srp_target_state	state;
 	int			qp_in_error;
+	struct timer_list	qp_err_timer;
 };
 
 struct srp_iu {
[ewg] traffic jittery, send queue full reports from mthca driver
I just opened case #897 on the below; it happens with last night's snapshot.

Or

client MT25204 FW 1.2.0 two CPUs, four cores each
server MT25418 FW 2.3.0 two CPUs, four cores each

client : iperf -c $server -P 4 -d -t 3600 -i 1
server : iperf -s -i 1

[  5] 39.0-40.0 sec  29.4 MBytes   246 Mbits/sec
[  4] 39.0-40.0 sec  25.5 MBytes   214 Mbits/sec
[  3] 34.0-35.0 sec  88.0 KBytes   721 Kbits/sec
[  3] 35.0-36.0 sec  0.00 Bytes   0.00 bits/sec
[  3] 36.0-37.0 sec  0.00 Bytes   0.00 bits/sec
[  3] 37.0-38.0 sec  0.00 Bytes   0.00 bits/sec
[  3] 38.0-39.0 sec  0.00 Bytes   0.00 bits/sec
[  3] 39.0-40.0 sec  0.00 Bytes   0.00 bits/sec
[  5] 40.0-41.0 sec  38.5 MBytes   323 Mbits/sec
[  8] 40.0-41.0 sec  36.2 MBytes   304 Mbits/sec
[  9] 40.0-41.0 sec  54.3 MBytes   456 Mbits/sec
[ 10] 40.0-41.0 sec  32.1 MBytes   270 Mbits/sec
[ 11] 40.0-41.0 sec  29.4 MBytes   247 Mbits/sec
[SUM] 40.0-41.0 sec   152 MBytes  1.28 Gbits/sec

ib_mthca :03:00.0: SQ 000404 full (756910656 head, 756910592 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (756915376 head, 756915312 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757146224 head, 757146160 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757146336 head, 757146272 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757317104 head, 757317040 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757361808 head, 757361744 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757361920 head, 757361856 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757515760 head, 757515696 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757515872 head, 757515808 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757515984 head, 757515920 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757516112 head, 757516048 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757516224 head, 757516160 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757516352 head, 757516288 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757516448 head, 757516384 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757516576 head, 757516512 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757523168 head, 757523104 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757531472 head, 757531408 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757531568 head, 757531504 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757548064 head, 757548000 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (757582992 head, 757582928 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758082528 head, 758082464 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758162208 head, 758162144 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758232720 head, 758232656 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758232848 head, 758232784 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758232960 head, 758232896 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758233088 head, 758233024 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758303696 head, 758303632 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758303776 head, 758303712 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758307744 head, 758307680 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758307872 head, 758307808 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758334928 head, 758334864 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758335056 head, 758334992 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758341744 head, 758341680 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758341856 head, 758341792 tail, 64 max, 0 nreq)
ib0: failed to post zlen send
ib_mthca :03:00.0: SQ 000404 full (758396784
[ewg] Re: traffic jittery, send queue full reports from mthca driver
ib_mthca :03:00.0: SQ 000404 full (756910656 head, 756910592 tail, 64 max, 0 nreq)
ib0: failed to post zlen send

Eli, can this be a bug in the send ring accounting with respect to the zlen packet you use in the unsig-ud-qp patch?

Or.
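The numbers in the mthca message can be checked directly. head and tail appear to be free-running counters of posted and completed send WQEs, so head - tail is the number outstanding. For the line quoted above, 756910656 - 756910592 = 64, equal to the 64-entry maximum. The same difference of 64 holds for every SQ-full line in the original report, so the hardware send queue really was full when the zero-length send was posted. The sketch below makes two assumptions: the overflow test has the form outstanding + nreq >= max (the form the message's fields suggest), and nreq counts WQEs already built earlier in the same post call. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* WQEs posted but not yet completed. head and tail are free-running
 * counters, so unsigned subtraction stays correct across wrap-around. */
static uint32_t sq_outstanding(uint32_t head, uint32_t tail)
{
	return head - tail;
}

/* Would posting another WQE, after nreq already built in this call,
 * overflow a send queue of max entries? */
static bool sq_would_overflow(uint32_t head, uint32_t tail,
			      uint32_t max, uint32_t nreq)
{
	assert(max > 0);
	return sq_outstanding(head, tail) + nreq >= max;
}
```

Using unsigned 32-bit arithmetic for head - tail means the check keeps working after the counters wrap. For example, head = 5 and tail = 4294967290 still gives 11 outstanding.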
[ewg] Re: [PATCH] call skb_orphan() after sending an SKB
On Wed, 2008-02-06 at 15:11 +0200, Or Gerlitz wrote:

Eli Cohen wrote:

On Wed, 2008-02-06 at 10:17 +0200, Or Gerlitz wrote:

The problem with ttcpv was that it stopped sending packets because it was waiting for the memory to be freed. The system did not hang; only the application (ttcpv) stopped sending. Other applications could continue working over the ipoib interface.

What's ttcpv? Doing a web search I only find ttcp, so I would be happy to get a pointer, plus the parameters you were using to see the problem.

It's a variant of ttcp we're using here in our regression. Dotan, can you send a pointer?

Also, I see that you have added a call to netif_stop_queue(), is this to solve another problem?

This was just a hole that I found in code review: when I post a zero-length packet, I still want this to affect the net queue control.

Why is posting a zero-length packet related to the net queue control logic? I was thinking it has to do with releasing unsignaled SKBs.

Yes, but if I have no more room in the TX ring I would like to stop the queue even here.

Or
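Eli's point is that a zero-length post occupies a TX ring slot just like a data packet, so it has to take part in the same stop/wake accounting. Below is a minimal userspace sketch with hypothetical names: struct sim_tx and the sim_tx_* functions stand in for priv->tx_outstanding, ipoib_sendq_size, netif_stop_queue() and netif_wake_queue(). The patch hunk shows only the stop, so two behaviors are assumptions of the sketch: refusing to post into a full ring, and waking at half the ring. The wake-up mirrors the threshold mainline IPoIB uses in its send-completion handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of IPoIB-style TX ring flow control. */
struct sim_tx {
	unsigned int sendq_size;	/* plays the role of ipoib_sendq_size */
	unsigned int outstanding;	/* plays the role of priv->tx_outstanding */
	bool queue_stopped;		/* plays the role of netif_queue_stopped() */
};

/* Account for one posted WQE, data or zero-length alike. Returns false
 * (and posts nothing) if the ring has no free slot. */
static bool sim_tx_post(struct sim_tx *tx)
{
	assert(tx->sendq_size > 0);
	if (tx->outstanding >= tx->sendq_size)
		return false;
	if (++tx->outstanding == tx->sendq_size)
		tx->queue_stopped = true;	/* like netif_stop_queue(dev) */
	return true;
}

/* Account for one send completion; wake the queue at half the ring. */
static void sim_tx_complete(struct sim_tx *tx)
{
	assert(tx->outstanding > 0);
	if (--tx->outstanding == (tx->sendq_size >> 1) && tx->queue_stopped)
		tx->queue_stopped = false;	/* like netif_wake_queue(dev) */
}
```

Consider a zero-length post that bypasses sim_tx_post(). It would consume a slot without ever stopping the queue, so the next data packet could be handed to an already full send queue.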
[ewg] RE: traffic jittery, send queue full reports from mthca driver
OK - Eli found the problem to be fixed soon

Tziporet

-----Original Message-----
From: Or Gerlitz [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 06, 2008 2:54 PM
To: Tziporet Koren
Cc: ewg@lists.openfabrics.org
Subject: traffic jittery, send queue full reports from mthca driver
[ewg] RE: traffic jittery, send queue full reports from mthca driver
I will check this.

-----Original Message-----
From: Or Gerlitz [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 06, 2008 14:57
To: Eli Cohen
Cc: ewg@lists.openfabrics.org
Subject: Re: traffic jittery, send queue full reports from mthca driver
Re: [ewg][PATCH][0/2] SRP multipath failover within 60 seconds,
This fixes issue 577 (https://bugs.openfabrics.org/show_bug.cgi?id=577), which was found in OFED 1.2.

Vlad - please take this.

Tziporet
[ewg] OFED teleconf 18 Feb
Some US companies mark 18 Feb as a holiday (President's Day), so per request, I'm moving the OFED teleconference from 18 Feb to 19 Feb (same time slot). You'll receive an Outlook meeting update shortly. -- Jeff Squyres Cisco Systems
[ewg] [GIT PULL] ~sashak/management.git
Hi Vlad,
Please pull recent ofed_1_3 branch of ~sashak/management.git. The changes are:
Ira K. Weiny (2):
      Move opensm.8 man page in prep for making config file changes.
      Update man page for configurable partition and prefix-routes file
Ira Weiny (1):
      Add node name map, partition config, and QOS policy config files to the FILES section of man page.
Sasha Khapyorsky (1):
      opensm: scripts/opensmd - fix opensm path.
Thanks,
Sasha
Re: [ewg] traffic jittery, send queue full reports from mthca driver
Hello Or, I found that if you increase send_queue_size and recv_queue_size (to 1K, for example), this problem goes away. Thanks Shirley
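send_queue_size and recv_queue_size are IPoIB module parameters, so they are given when the ib_ipoib module is loaded. The sketch below shows one plausible way a driver could normalize such a request: clamp it to limits, then round up to a power of two so ring slots can be computed with a mask. The limits QSIZE_MIN and QSIZE_MAX are assumptions for illustration; check the IPoIB source for the real bounds and rounding.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bounds only; not taken from the IPoIB source. */
#define QSIZE_MIN 2u
#define QSIZE_MAX 8192u

/* Smallest power of two >= v (callers keep v <= QSIZE_MAX). */
static uint32_t roundup_pow2(uint32_t v)
{
	uint32_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Normalize a requested queue depth: clamp first so the rounding
 * cannot overflow, then round up to a power of two. */
static uint32_t effective_qsize(uint32_t requested)
{
	uint32_t n = requested;

	if (n > QSIZE_MAX)
		n = QSIZE_MAX;
	if (n < QSIZE_MIN)
		n = QSIZE_MIN;
	n = roundup_pow2(n);
	/* Power of two, so "counter & (n - 1)" works as a modulo. */
	assert((n & (n - 1)) == 0);
	return n;
}
```

Under these assumptions a request of 1000 becomes 1024, sixteen times the 64-entry SQ seen in the mthca message, leaving the extra zero-length sends far more headroom before the ring fills.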
[ewg] Re: [GIT PULL] ~sashak/management.git
Sasha Khapyorsky wrote: Hi Vlad, Please pull recent ofed_1_3 branch of ~sashak/management.git. Done, Regards, Vladimir
[ewg] RE: Updated: OFED weekly teleconference
Jeff, Just so you know, this conflicts with the OFA IWG meeting, which has always been held from 11:30 AM to 1:00 PM EST on Tuesdays. Since this is a one-time occurrence, I would not change anything, but I just thought you should know. Rupert
From: Jeff Squyres (jsquyres) [mailto:[EMAIL PROTECTED] Sent: Wednesday, February 06, 2008 10:03 AM To: ewg@lists.openfabrics.org Cc: Scott Bahling; John Russo; Ryan, Jim; Ken L Johnson; [EMAIL PROTECTED]; [EMAIL PROTECTED]; Head Bubba; Van Houten, Betty; Patrick Mullaney Subject: Updated: OFED weekly teleconference
When: Tuesday, February 19, 2008 12:00 PM-1:00 PM (GMT-05:00) Eastern Time (US & Canada). Where: ID: 210020028
Jeffrey Squyres has invited you to a Cisco Unified MeetingPlace Conference. Date/Time: FEB 19, 2008 at 12:00PM America/New_York Length: 60 Frequency: 3 Meeting ID: 210020028 Meeting Password: Global Access Numbers: http://cisco.com/en/US/about/doing_business/conferencing/index.html US/Canada: +1.866.432.9903 United Kingdom: +44.20.8824.0117 India: +91.80.4103.3979 Germany: +49.619.6773.9002 Japan: +81.3.5763.9394 China: +86.10.8515.5666
TO ATTEND A WEB AND VOICE CONFERENCE:
CISCO INTRANET ATTENDEES - Join the Web & Voice Conference* 1. Go to http://meetingplaceinternal.cisco.com/join.asp?210020028 2. Enter your CEC User ID & Password, then click OK - Accept any security warnings you receive and wait for the Meeting Room to initialize 3. Click on CONNECT from the Meeting Room to join the Voice Conference portion of the meeting
EXTERNAL ATTENDEES - Outside the Cisco Intranet** - Join the Web & Voice Conference* 1. Go to http://meetingplace.cisco.com/join.asp?210020028 2. Fill in the "My Name is" field, then click Attend Meeting - If you have a CEC User ID, click on the Cisco icon - Accept any security warnings you receive and wait for the Meeting Room to initialize 3. Click on CONNECT from the Meeting Room to join the Voice Conference portion of the meeting - Note: Guest users will see a link to the Global Access Numbers.
*If this is your first time attending a Web Conference, disable any pop-up blockers and visit http://meetingplace.cisco.com/mpweb/scripts/browsertestupper.asp to test your web browser for compatibility with the Web Conference. **Not all meetings are scheduled to allow external attendees into the Web Conference portion of the meeting; if the URL does not work, please follow the Voice Only Conference instructions below to attend.
TO ATTEND A VOICE ONLY CONFERENCE: 1. Dial into Cisco Unified MeetingPlace (view the Access Numbers and link above) 2. Press 1 to attend the meeting 3. Follow the prompts to enter the Meeting ID 210020028 and join the meeting
SUPPORT: Information about this Conference: Contact Jeffrey Squyres, 85250971. Cisco IT Support Center: Attend the Voice Conference and then press #0 on your phone keypad
GLOBAL ACCESS NUMBERS
COUNTRY               LOCATION         LOCAL NUMBER       TOLL FREE-FREEFONE
Algeria               Algiers          +213.21.98.9047
Argentina             Buenos Aires     +54.11.4341.0101
Australia             Canberra         +61.2.6216.0643
                      Melbourne        +61.3.9659.4173
                      North Sydney     +61.2.8446.5260
Austria               Vienna           +43.12.4030.6022
Azerbaijan            Baku             +994.12.437.4829
Belgium               Brussels         +32.2.704.5072
Bosnia Herzegovina    Sarajevo         +387.33.56.2898
Brazil                Brasilia         +55.613.424.0220
                      Rio de Janeiro   +55.21.2483.6302
                      Sao Paulo        +55.11.5508.6311
Bulgaria              Sofia            +359.2.937.5938
Canada                Calgary          +1.403.514.2435
                      Edmonton         +1.780.441.3715
                      Halifax          +1.902.474.0214
                      Kanata           +1.613.254.0005
                      Markham          +1.905.470.4810
                      Montreal         +1.514.847.6875
                      Ottawa           +1.613.788.7250
                      Quebec           +1.418.634.5645
                      Regina           +1.306.566.6410
                      Toronto          +1.416.306.7230
                      Vancouver        +1.604.647.2350
                      Winnipeg         +1.204.336.6610
Chile                 Santiago         +56.2.431.4936
China                 Beijing          +86.10.8515.5666
                      Chengdu          +86.28.8696.1333
                      Guangzhou        +86.20.8519.
[ewg] Re: Updated: OFED weekly teleconference
Note that I'm not the one who schedules the EWG teleconferences; I'm just the guy who provides the phone bridge. Tziporet is the OFED release manager and coordinates the EWG teleconferences. On Feb 6, 2008, at 10:12 AM, Rupert Dance wrote: Jeff, Just so you know, this conflicts with the OFA IWG meeting which has always been held from 11:30 - 1:00 PM EST on Tuesday's. Since this is a one time occurrence, I would not change anything but I just thought you should know. Rupert
Re: [ewg] OFED 1.3 rc4 update
On Wed, 2008-02-06 at 18:25 +0200, Tziporet Koren wrote: Hi, We will have OFED 1.3-rc4 tomorrow after one more night of regression. It will include:
1. IPoIB: Non-SRQ for CM mode
2. IPoIB: 4K MTU
3. IPoIB: Small messages improvements
Note that today's latest build will include these features too, if someone wants to test it today. Tziporet
Thanks Tziporet. We will test it right after it's out. Thanks Shirley
Re: [ewg] Re: with the ipoib patches, debug prints spam the system log
On 2/6/08, Eli Cohen [EMAIL PROTECTED] wrote: They are only visible when activating ipoib debug. I know it fills the dmesg ring with messages. Do you think I should remove them? Yes, you should remove them. The ipoib debug prints are very useful to debug and analyze in the field; however, your 3 prints per second addition makes them useless, at least for me, and I use them a lot when working to debug and help others, so please do. Or
On Wed, 2008-02-06 at 10:38 +0200, Or Gerlitz wrote: Eli, You have somehow left too many... debug prints in the last patches; please clean this up. See, for example, how the system log looks after less than a minute when ipoib debug prints are enabled: it has one original print (ib0: Send unicast ARP to 0023) and all the rest are yours. Or
Feb 6 14:39:23 kernel: ib0: posting zlen send, wrid = 4: head = 2756, tail = 2752
Feb 6 14:39:23 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 2757
Feb 6 14:39:25 kernel: ib0: posting zlen send, wrid = 39: head = 2919, tail = 2912
Feb 6 14:39:25 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 2920
Feb 6 14:39:25 kernel: ib0: posting zlen send, wrid = 15: head = 2959, tail = 2944
Feb 6 14:39:25 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 2960
Feb 6 14:39:27 kernel: ib0: posting zlen send, wrid = 8: head = 3080, tail = 3072
Feb 6 14:39:27 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 3081
Feb 6 14:39:34 kernel: ib0: posting zlen send, wrid = 51: head = 3699, tail = 3696
Feb 6 14:39:34 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 3700
Feb 6 14:39:35 kernel: ib0: posting zlen send, wrid = 25: head = 3737, tail = 3728
Feb 6 14:39:35 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 3738
Feb 6 14:39:35 kernel: ib0: posting zlen send, wrid = 3: head = 3779, tail = 3776
Feb 6 14:39:35 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 3780
Feb 6 14:39:36 kernel: ib0: posting zlen send, wrid = 48: head = 3824, tail = 3808
Feb 6 14:39:36 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 3825
Feb 6 14:39:38 kernel: ib0: posting zlen send, wrid = 24: head = 3992, tail = 3984
Feb 6 14:39:38 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 3993
Feb 6 14:39:38 kernel: ib0: posting zlen send, wrid = 4: head = 4036, tail = 4032
Feb 6 14:39:38 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 4037
Feb 6 14:39:46 kernel: ib0: Send unicast ARP to 0023
Feb 6 14:39:46 kernel: ib0: posting zlen send, wrid = 11: head = 4683, tail = 4672
Feb 6 14:39:46 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 4684
Feb 6 14:39:58 kernel: ib0: posting zlen send, wrid = 58: head = 5626, tail = 5616
Feb 6 14:39:58 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 5627
Feb 6 14:39:59 kernel: ib0: posting zlen send, wrid = 56: head = 5752, tail = 5744
Feb 6 14:39:59 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 5753
Feb 6 14:40:01 kernel: ib0: posting zlen send, wrid = 54: head = 5878, tail = 5872
Feb 6 14:40:01 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 5879
Feb 6 14:40:01 kernel: ib0: posting zlen send, wrid = 30: head = 5918, tail = 5904
Feb 6 14:40:01 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 5919
Feb 6 14:40:10 kernel: ib0: posting zlen send, wrid = 33: head = 6689, tail = 6672
Feb 6 14:40:10 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 6690
Feb 6 14:40:13 kernel: ib0: posting zlen send, wrid = 48: head = 6896, tail = 6880
Feb 6 14:40:13 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 6897
Feb 6 14:40:13 kernel: ib0: posting zlen send, wrid = 26: head = 6938, tail = 6928
Feb 6 14:40:13 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 6939
Feb 6 14:40:15 kernel: ib0: posting zlen send, wrid = 61: head = 7101, tail = 7088
Feb 6 14:40:15 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 7102
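The logged numbers are self-consistent with a 64-entry send ring: in every "posting zlen send" line, wrid equals head modulo 64, and head - tail never exceeds 17, so these timer-driven posts happened well before the ring filled. The check below encodes the 19 logged triples. SENDQ_SIZE = 64 (taken from the "64 max" in the earlier mthca message) and the reading of wrid as the ring slot of head are inferences from the numbers, not something the log states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed send queue depth, taken from "64 max" in the mthca message. */
#define SENDQ_SIZE 64u

struct zlen_entry {
	uint32_t wrid;
	uint32_t head;
	uint32_t tail;
};

/* Every "posting zlen send" line from the log above: {wrid, head, tail}. */
static const struct zlen_entry samples[] = {
	{ 4, 2756, 2752 },  { 39, 2919, 2912 }, { 15, 2959, 2944 },
	{ 8, 3080, 3072 },  { 51, 3699, 3696 }, { 25, 3737, 3728 },
	{ 3, 3779, 3776 },  { 48, 3824, 3808 }, { 24, 3992, 3984 },
	{ 4, 4036, 4032 },  { 11, 4683, 4672 }, { 58, 5626, 5616 },
	{ 56, 5752, 5744 }, { 54, 5878, 5872 }, { 30, 5918, 5904 },
	{ 33, 6689, 6672 }, { 48, 6896, 6880 }, { 26, 6938, 6928 },
	{ 61, 7101, 7088 },
};

#define NUM_SAMPLES (sizeof(samples) / sizeof(samples[0]))

/* Ring slot of a free-running counter; valid because SENDQ_SIZE is a power of two. */
static uint32_t slot_of(uint32_t counter)
{
	return counter & (SENDQ_SIZE - 1);
}

/* How many logged wrids equal the ring slot of the logged head. */
static size_t count_wrid_matches_slot(void)
{
	size_t n = 0;

	for (size_t i = 0; i < NUM_SAMPLES; i++)
		if (samples[i].wrid == slot_of(samples[i].head))
			n++;
	return n;
}

/* Largest head - tail gap seen in the samples. */
static uint32_t max_outstanding(void)
{
	uint32_t m = 0;

	for (size_t i = 0; i < NUM_SAMPLES; i++) {
		assert(samples[i].head >= samples[i].tail);
		if (samples[i].head - samples[i].tail > m)
			m = samples[i].head - samples[i].tail;
	}
	return m;
}
```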
Re: [ewg] OFED 1.3 rc4 update
Shirley Ma wrote: Thanks Tziporet. We will test it right after it's out. You can start using the latest build - http://www.openfabrics.org/builds/ofed-1.3/OFED-1.3-20080206-0751.tgz Tziporet
Re: [ewg] Re: with the ipoib patches, debug prints spam the system log
On Wed, 2008-02-06 at 18:42 +0200, Or Gerlitz wrote: On 2/6/08, Eli Cohen [EMAIL PROTECTED] wrote: They are only visible when activating ipoib debug. I know it fills the dmesg ring with messages. Do you think I should remove them? Yes, you should remove them. The ipoib debug prints are very useful to debug and analyze in the field; however, your 3 prints per second addition makes them useless, at least for me, and I use them a lot when working to debug and help others, so please do. Or
OK
Re: [ewg][PATCH][1/2] SRP multipath failover within 60 seconds,
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 950228f..45a2533 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -400,7 +400,6 @@
 		printk(KERN_DEBUG PFX "Sending CM DREQ failed\n");
 		return;
 	}
-	wait_for_completion(&target->done);
 }
 
 static void srp_remove_work(struct work_struct *work)
@@ -1266,7 +1294,6 @@
 	case IB_CM_TIMEWAIT_EXIT:
 		printk(KERN_ERR PFX "connection closed\n");
-		comp = 1;
 		target->status = 0;
 		break;

Seems like this would leak the cm_id?
Re: [ewg][PATCH][1/2] SRP multipath failover within 60 seconds,
Roland Dreier wrote:
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 950228f..45a2533 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -400,7 +400,6 @@
 		printk(KERN_DEBUG PFX "Sending CM DREQ failed\n");
 		return;
 	}
-	wait_for_completion(&target->done);
 }
 
 static void srp_remove_work(struct work_struct *work)
@@ -1266,7 +1294,6 @@
 	case IB_CM_TIMEWAIT_EXIT:
 		printk(KERN_ERR PFX "connection closed\n");
-		comp = 1;
 		target->status = 0;
 		break;
Seems like this would leak the cm_id?

As I said in my [0/2] email, this patch should be applied on top of srp_1_recreate_at_reconnect.patch, which is already in the ofed_1_3.git tree in the kernel_patches/fixes/ directory. I attached it here.

Hello, Roland! Please consider the following for 2.6.19.
---
From: Ishai Rabinovitz [EMAIL PROTECTED]

For some reason (could be a firmware problem) I got a CQ overrun in SRP. Because of that there was a QP FATAL. Since in srp_reconnect_target we are not destroying the QP, the QP FATAL persists after the reconnect. In order to be able to recover from such a situation, I suggest we destroy the CQ and the QP in every reconnect.

This also corrects a minor spec non-compliance: when srp_reconnect_target is called, srp destroys the CM ID and resets the QP, so the new connection will be retried with the same QPN, which could theoretically lead to stale packets (for strict spec compliance I think the QPN should not be reused until all stale packets are flushed out of the network).
---
IB/srp: destroy/re-create QP and CQ on each reconnect.

This makes SRP more robust in the presence of hardware errors and is closer to the behaviour suggested by the IB spec, reducing the chance of stale packets.

Signed-off-by: Ishai Rabinovitz [EMAIL PROTECTED]
Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c	2006-08-31 12:23:52.0 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c	2006-08-31 12:30:48.0 +0300
@@ -495,10 +495,10 @@
 static int srp_reconnect_target(struct srp_target_port *target)
 {
 	struct ib_cm_id *new_cm_id;
-	struct ib_qp_attr qp_attr;
 	struct srp_request *req, *tmp;
-	struct ib_wc wc;
 	int ret;
+	struct ib_cq *old_cq;
+	struct ib_qp *old_qp;
 
 	spin_lock_irq(target->scsi_host->host_lock);
 	if (target->state != SRP_TARGET_LIVE) {
@@ -522,17 +522,17 @@
 	ib_destroy_cm_id(target->cm_id);
 	target->cm_id = new_cm_id;
 
-	qp_attr.qp_state = IB_QPS_RESET;
-	ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE);
-	if (ret)
-		goto err;
-
-	ret = srp_init_qp(target, target->qp);
-	if (ret)
+	old_qp = target->qp;
+	old_cq = target->cq;
+	ret = srp_create_target_ib(target);
+	if (ret) {
+		target->qp = old_qp;
+		target->cq = old_cq;
 		goto err;
+	}
 
-	while (ib_poll_cq(target->cq, 1, &wc) > 0)
-		; /* nothing */
+	ib_destroy_qp(old_qp);
+	ib_destroy_cq(old_cq);
 
 	spin_lock_irq(target->scsi_host->host_lock);
 	list_for_each_entry_safe(req, tmp, &target->req_queue, list)
--
MST
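The attached patch replaces the QP reset with create-new-then-destroy-old. The C model below mirrors that ordering with hypothetical stand-in objects (struct res, res_create, create_target_ib and reconnect are illustrative names, not the ib_* verbs API): the new cm_id replaces and destroys the old one, a new CQ and QP are created first, the old pointers are restored if creation fails, and only on success is the old QP destroyed before its CQ. A fail_countdown hook lets a test force a creation failure.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for cm_id/QP/CQ; not the ib_* verbs API. */
struct res {
	int id;
};

struct target {
	struct res *cm_id;
	struct res *qp;
	struct res *cq;
};

static int next_id = 1;
static int live_objects;         /* objects currently allocated */
static int fail_countdown = -1;  /* test hook: 0 = fail next create, -1 = never */

static struct res *res_create(void)
{
	if (fail_countdown == 0) {
		fail_countdown = -1;
		return NULL;
	}
	if (fail_countdown > 0)
		fail_countdown--;

	struct res *r = malloc(sizeof(*r));
	if (r) {
		r->id = next_id++;
		live_objects++;
	}
	return r;
}

static void res_destroy(struct res *r)
{
	if (!r)
		return;
	free(r);
	live_objects--;
	assert(live_objects >= 0);
}

/* Writes straight into the target, so on failure t->qp/t->cq are
 * left clobbered and the caller has to restore them. */
static int create_target_ib(struct target *t)
{
	t->cq = res_create();
	if (!t->cq)
		return -1;
	t->qp = res_create();
	if (!t->qp) {
		res_destroy(t->cq);
		return -1;
	}
	return 0;
}

static int reconnect(struct target *t)
{
	struct res *new_cm_id = res_create();
	if (!new_cm_id)
		return -1;
	/* The old cm_id is destroyed here, so it is not leaked. */
	res_destroy(t->cm_id);
	t->cm_id = new_cm_id;

	struct res *old_qp = t->qp;
	struct res *old_cq = t->cq;
	if (create_target_ib(t)) {
		/* Keep the old objects if the new ones could not be created. */
		t->qp = old_qp;
		t->cq = old_cq;
		return -1;
	}
	/* Destroy the QP before the CQ it references. */
	res_destroy(old_qp);
	res_destroy(old_cq);
	return 0;
}
```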
Re: [ofa-general] Re: [ewg] OFED 1.3 rc4 update
Tziporet Koren wrote: Shirley Ma wrote: Thanks Tziporet. We will test it right after it's out. You can start using the latest build - http://www.openfabrics.org/builds/ofed-1.3/OFED-1.3-20080206-0751.tgz Tziporet
I have downloaded today's build mentioned above. I am still seeing the issue of ib_destroy_cq() failing for the rcq that I mentioned yesterday. Here are the steps that I follow:
1. On a freshly booted system, configure ib0
2. Switch to connected mode (on an HCA that supports SRQ)
3. Ping the remote interface
4. modprobe -r ib_ehca
5. I see the ib_destroy_cq() failures and the cascading failures that follow (the SRQ and PD cannot be destroyed)
6. If I then try modprobe ib_ehca, I get the error "Cannot allocate memory"
This also means someone is chewing up tons of memory. I realize that the QP and associated PD were not freed, so some memory is lost. However, this system has 8 GB of memory. Pradeep
[ewg] Re: [ofa-general] [ANNOUNCE] open iSCSI over iSER target RPM is available
Hi Erez Erez Zilber wrote: stgt (SCSI target) is an open-source framework for storage target drivers. It supports iSCSI over iSER among other storage target drivers. Voltaire added a git tree for stgt that will be added to OFED 1.4: http://www2.openfabrics.org/git/?p=~dorons/tgt.git;a=summary Until OFED 1.4 gets released, it is possible to install the stgt RPM on top of OFED 1.3. For more details about how to install and use stgt, please refer to https://wiki.openfabrics.org/tiki-index.php?page=ISER-target Some performance numbers that were measured by OSC (using SDR cards): Is there a 2TB limit on this target? It turns our 6TB partition into a 2TB lun. * READ: 920 MB/sec * WRITE: 850 MB/sec Not getting anything even remotely close to this. Are there more details on configuration somewhere? I followed the web page as indicated. Joe We hope to have DDR measurements numbers soon. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: [EMAIL PROTECTED] web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615
[ewg] Re: [Stgt-devel] [ofa-general] [ANNOUNCE] open iSCSI over iSER target RPM is available
On Wed, 06 Feb 2008 16:38:11 -0500 Joe Landman [EMAIL PROTECTED] wrote: Hi Erez Erez Zilber wrote: stgt (SCSI target) is an open-source framework for storage target drivers. It supports iSCSI over iSER among other storage target drivers. Voltaire added a git tree for stgt that will be added to OFED 1.4: http://www2.openfabrics.org/git/?p=~dorons/tgt.git;a=summary Until OFED 1.4 gets released, it is possible to install the stgt RPM on top of OFED 1.3. For more details about how to install and use stgt, please refer to https://wiki.openfabrics.org/tiki-index.php?page=ISER-target Some performance numbers that were measured by OSC (using SDR cards): Is there a 2TB limit on this target? It turns our 6TB partition into a 2TB lun. No, there isn't.
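For reference, 2 TiB is exactly what 32-bit block addresses can cover with 512-byte blocks (2^32 x 512 = 2,199,023,255,552 bytes), which is where 2 TB ceilings commonly come from, for example with the 10-byte SCSI READ CAPACITY command or MBR partition tables. Whether that explains the 6 TB to 2 TB truncation here is not established in the thread. A small C helper for the arithmetic; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with 32-bit block addresses (2^32 blocks). */
static uint64_t max_bytes_32bit_lba(uint32_t block_size)
{
	assert(block_size > 0);
	return ((uint64_t)UINT32_MAX + 1u) * block_size;
}

/* Nonzero if a device of `bytes` bytes cannot be fully addressed with
 * 32-bit block addresses and therefore needs 64-bit ones
 * (e.g. READ CAPACITY(16) on the SCSI side). */
static int needs_64bit_lba(uint64_t bytes, uint32_t block_size)
{
	return bytes > max_bytes_32bit_lba(block_size);
}
```

With 4096-byte blocks the same 32-bit limit moves out to 16 TiB.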
Re: [ofa-general] Re: [ewg] OFED 1.3 rc4 update
Pradeep Satyanarayana wrote: Tziporet Koren wrote: Shirley Ma wrote: Thanks Tziporet. We will test it right after it's out. You can start using the latest build - http://www.openfabrics.org/builds/ofed-1.3/OFED-1.3-20080206-0751.tgz Tziporet
I have downloaded today's build mentioned above. I am still seeing the issue of ib_destroy_cq() failing for the rcq that I mentioned yesterday. Here are the steps that I follow:
1. On a freshly booted system, configure ib0
2. Switch to connected mode (on an HCA that supports SRQ)
3. Ping the remote interface
4. modprobe -r ib_ehca
5. I see the ib_destroy_cq() failures and the cascading failures that follow (the SRQ and PD cannot be destroyed)
The ib_destroy_qp() fails because its refcnt is not zero. On my system it was set to 2. Pradeep
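In the core verbs layer, a QP takes a reference (usecnt) on the PD, CQs and SRQ it was created with, and destroying a CQ or SRQ, or deallocating a PD, is refused with -EBUSY while that count is nonzero. A QP that is never destroyed therefore blocks everything beneath it, which is consistent with the cascade described above. This is a simplified userspace model of that rule only: struct vobj and its helpers are hypothetical, use plain ints instead of atomic_t, and do not call any driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_DEPS 3

/* Illustrative model of a verbs object with a use count. */
struct vobj {
	const char *name;
	int usecnt;                  /* live objects referencing this one */
	int alive;
	struct vobj *deps[MAX_DEPS]; /* objects this one holds references on */
};

static void vobj_init(struct vobj *o, const char *name)
{
	o->name = name;
	o->usecnt = 0;
	o->alive = 1;
	for (int i = 0; i < MAX_DEPS; i++)
		o->deps[i] = NULL;
}

/* Record that o references dep, as creating a QP on a PD/CQ/SRQ does. */
static void vobj_ref(struct vobj *o, int slot, struct vobj *dep)
{
	assert(slot >= 0 && slot < MAX_DEPS && dep->alive);
	o->deps[slot] = dep;
	dep->usecnt++;
}

/* Refuse with -EBUSY while anything still uses o; otherwise drop the
 * references o holds and mark it destroyed. */
static int vobj_destroy(struct vobj *o)
{
	if (!o->alive)
		return -EINVAL;
	if (o->usecnt)
		return -EBUSY;
	for (int i = 0; i < MAX_DEPS; i++)
		if (o->deps[i])
			o->deps[i]->usecnt--;
	o->alive = 0;
	return 0;
}
```

Tearing down in dependency order (QP first, then CQ and SRQ, then PD) succeeds; skipping the QP makes every later step fail.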
Re: [ewg] with the ipoib patches, debug prints spam the system log
Or Gerlitz wrote: You have somehow left too many... debug prints in the last patches; please clean this up. See, for example, how the system log looks after less than a minute when ipoib debug prints are enabled: it has one original print (ib0: Send unicast ARP to 0023) and all the rest are yours.
Feb 6 14:39:23 kernel: ib0: posting zlen send, wrid = 4: head = 2756, tail = 2752
Feb 6 14:39:23 kernel: ib0: ipoib_ib_tx_timer_func-427: head = 2757
Hi Eli, Just a reminder to remove these for RC4; using last night's snapshot I still see them. Or.