[PATCH 0/1] ib/ipoib: Added adaptive moderation algorithm for better latency.

2011-08-14 Thread Erez Shitrit
Hi Roland,

The following patch adds a new algorithm for adaptive moderation.
The main idea is to change the CQ moderation (for RX only) according to the
traffic.
Adaptive moderation is controlled via ethtool: adaptive-rx on/off.
(More details are in the patch itself.)

Some results without/with adaptive-rx:
--
For latency I ran:
netperf -n 8 -H 11.134.14.1 -c -C -P 1 -t UDP_RR -l 10
For BW I ran:
iperf -c 11.134.14.1 -l 64K -P8

First setup:
no adaptive moderation, default moderation values:
 rx-usecs: 0 rx-frames: 0
This gives the following results:
latency 9 usec
BW 6.60 Gbits/sec

Second setup:
no adaptive moderation, manually set moderation values:
 rx-usecs: 10 rx-frames: 44
This gives the following results:
latency 18 usec
BW 8.50 Gbits/sec

Third setup:
adaptive moderation on (Adaptive RX: on)
This gives the following results:
latency 9 usec
BW 8.60 Gbits/sec


As you can see, adaptive moderation gets the better result in both tests
(each a different kind of traffic): the latency of the first setup and the
bandwidth of the second.

Thanks, Erez


[PATCH 1/1] ib/ipoib: Added adaptive moderation algorithm for better latency.

2011-08-14 Thread Erez Shitrit
From 8ea4a6d4387a07b4e0abfb92f164f5181cf636e4 Mon Sep 17 00:00:00 2001
From: Erez Shitrit ere...@mellanox.co.il
Date: Thu, 30 Jun 2011 09:58:09 +0300
Subject: [PATCH 1/2] ib/ipoib: Added adaptive moderation algorithm for better latency.

[PATCH V2]:
Adaptive moderation is controlled via ethtool: adaptive-rx on/off.

When adaptive moderation is enabled, the adaptive moderation task is started.
 The task runs on a new workqueue and reschedules itself for the next run.
 The task periodically (every 250 ms) samples the traffic (packet rate and
 average packet size) and runs an algorithm to define a new moderation time
 for the receive queue.
 The algorithm classifies the incoming traffic during each sampling interval
 into classes. The rx_usec value (i.e., moderation time) is then adjusted
 appropriately per class.
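
 The sampling step can be sketched in plain C as follows. This is only an
 illustration and not the patch code: the struct and function names are
 hypothetical, and timestamps are taken in milliseconds instead of jiffies
 so that the example builds in user space.

```c
#include <stdint.h>

/* Hypothetical snapshot of cumulative RX counters at one point in time. */
struct rx_counters {
	uint64_t packets;       /* total packets received so far */
	uint64_t bytes;         /* total bytes received so far */
	uint64_t timestamp_ms;  /* when the snapshot was taken */
};

/* Hypothetical result of one sampling interval. */
struct rx_sample {
	uint64_t pkt_rate;      /* packets per second */
	uint64_t avg_pkt_size;  /* bytes per packet */
};

/*
 * Derive packet rate and average packet size from the difference between
 * two counter snapshots. Both values stay 0 when the interval is empty,
 * which avoids dividing by zero.
 */
struct rx_sample sample_traffic(const struct rx_counters *prev,
				const struct rx_counters *now)
{
	struct rx_sample s = { 0, 0 };
	uint64_t elapsed_ms = now->timestamp_ms - prev->timestamp_ms;
	uint64_t pkts = now->packets - prev->packets;
	uint64_t bytes = now->bytes - prev->bytes;

	if (elapsed_ms > 0)
		s.pkt_rate = pkts * 1000 / elapsed_ms;
	if (pkts > 0)
		s.avg_pkt_size = bytes / pkts;

	return s;
}
```

 For example, 62500 packets carrying 93750000 bytes within one 250 ms
 interval give a rate of 250000 packets/s and an average size of 1500 bytes.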

 There are two classes defined:

  A.  Bulk traffic: for heavy traffic consisting of packets of normal size.
      This class is further divided into two sub-classes:
        1. Traffic that is mainly BW bound
           - This traffic will get maximum moderation.
        2. Traffic that is mostly latency bound
           - For situations where low latency is vital such as cluster or
             grid computing
           - For this traffic the rx_usec will be changed to a value in the
             range (ethtool.pkt_rate_low .. ethtool.pkt_rate_high) depending
             on sampled packet rate.

  B.  Low latency traffic: for minimal traffic, or traffic consisting almost
      completely of small packets.
        - This traffic will get minimum moderation.
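
 The classification above can be sketched as a small function. This is an
 illustration under stated assumptions, not the code in the patch: all names
 are hypothetical, "minimal traffic" is assumed to mean a packet rate at or
 below rate_thresh, "BW bound" is assumed to mean a rate at or above
 pkt_rate_high, and sub-class A.2 is assumed to scale the moderation time
 linearly between rx_usecs_low and rx_usecs_high as the rate moves from
 pkt_rate_low to pkt_rate_high.

```c
#include <stdint.h>

/* Hypothetical tunables, loosely named after the ethtool coalesce fields. */
struct moder_cfg {
	uint32_t pkt_rate_low;     /* packets/s where scaling starts */
	uint32_t pkt_rate_high;    /* packets/s where scaling ends */
	uint32_t rx_usecs_low;     /* minimum moderation time (usec) */
	uint32_t rx_usecs_high;    /* maximum moderation time (usec) */
	uint32_t rate_thresh;      /* assumed cutoff for "minimal traffic" */
	uint32_t small_pkt_bytes;  /* assumed cutoff for "small packets" */
};

/* Choose a moderation time (usec) for one sampling interval. */
uint32_t pick_rx_usecs(const struct moder_cfg *cfg,
		       uint64_t pkt_rate, uint64_t avg_pkt_size)
{
	/* Class B: minimal traffic or mostly small packets. */
	if (pkt_rate <= cfg->rate_thresh ||
	    avg_pkt_size < cfg->small_pkt_bytes)
		return cfg->rx_usecs_low;

	/* Unusable range: fall back to minimum moderation. */
	if (cfg->pkt_rate_high <= cfg->pkt_rate_low ||
	    cfg->rx_usecs_high < cfg->rx_usecs_low)
		return cfg->rx_usecs_low;

	/* Class A.1: treated as BW bound, maximum moderation. */
	if (pkt_rate >= cfg->pkt_rate_high)
		return cfg->rx_usecs_high;

	/* Bulk traffic below the scaling range keeps minimum moderation. */
	if (pkt_rate <= cfg->pkt_rate_low)
		return cfg->rx_usecs_low;

	/* Class A.2: linear interpolation inside the range. */
	return cfg->rx_usecs_low +
	       (uint32_t)((pkt_rate - cfg->pkt_rate_low) *
			  (cfg->rx_usecs_high - cfg->rx_usecs_low) /
			  (cfg->pkt_rate_high - cfg->pkt_rate_low));
}
```

 With example values pkt_rate_low = 100000, pkt_rate_high = 400000,
 rx_usecs_low = 0 and rx_usecs_high = 128, a sampled rate of 250000
 packets/s with 1500-byte packets gives 64 usec, while the same rate with
 64-byte packets falls into class B and gives 0 usec.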

Signed-off-by: Erez Shitrit ere...@mellanox.co.il
Reviewed-by: Eli Cohen e...@mellanox.co.il
---
 drivers/infiniband/ulp/ipoib/ipoib.h   |   39 ++-
 drivers/infiniband/ulp/ipoib/ipoib_ethtool.c   |   67 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c|6 +
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |  167 +++-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |8 +-
 5 files changed, 278 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 7b6985a..c58f231 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -91,6 +91,7 @@ enum {
IPOIB_STOP_REAPER = 7,
IPOIB_FLAG_ADMIN_CM   = 9,
IPOIB_FLAG_UMCAST = 10,
+   IPOIB_FLAG_AUTO_MODER = 11, /*indicates moderation is running*/
 
IPOIB_MAX_BACKOFF_SECONDS = 16,
 
@@ -253,9 +254,43 @@ struct ipoib_cm_dev_priv {
int num_frags;
 };
 
+/* adaptive moderation parameters: */
+enum {
+   /* Target number of packets to coalesce with interrupt moderation */
+   IPOIB_RX_COAL_TARGET= 44,
+   IPOIB_RX_COAL_TIME  = 16,
+   IPOIB_TX_COAL_PKTS  = 5,
+   IPOIB_TX_COAL_TIME  = 0x80,
+   IPOIB_RX_RATE_LOW   = 40,
+   IPOIB_RX_COAL_TIME_LOW  = 0,
+   IPOIB_RX_RATE_HIGH  = 45,
+   IPOIB_RX_COAL_TIME_HIGH = 128,
+   IPOIB_RX_SIZE_THRESH= 1024,
+   IPOIB_RX_RATE_THRESH= 100 / IPOIB_RX_COAL_TIME_HIGH,
+   IPOIB_SAMPLE_INTERVAL   = 0,
+   IPOIB_AVG_PKT_SMALL = 256,
+   IPOIB_AUTO_CONF = 0x,
+   ADAPT_MODERATION_DELAY  = HZ / 4,
+};
+
 struct ipoib_ethtool_st {
-   u16 coalesce_usecs;
+   __u32 rx_max_coalesced_frames;
+   __u32 rx_coalesce_usecs;
+/* u16 coalesce_usecs;
u16 max_coalesced_frames;
+*/
+   __u32   pkt_rate_low;
+   __u32   pkt_rate_high;
+   __u32   rx_coalesce_usecs_low;
+   __u32   rx_coalesce_usecs_high;
+   __u32   rate_sample_interval;
+   __u32   use_adaptive_rx_coalesce;
+   int last_moder_time;
+   u16 sample_interval;
+   unsigned long last_moder_jiffies;
+   unsigned long last_moder_packets;
+   unsigned long last_moder_tx_packets;
+   unsigned long last_moder_bytes;
 };
 
 /*
@@ -289,6 +324,7 @@ struct ipoib_dev_priv {
struct work_struct flush_heavy;
struct work_struct restart_task;
struct delayed_work ah_reap_task;
+   struct delayed_work adaptive_moder_task;
 
struct ib_device *ca;
u8port;
@@ -409,6 +445,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 
 extern struct workqueue_struct *ipoib_workqueue;
+extern struct workqueue_struct *ipoib_auto_moder_workqueue;
 
 /* functions */
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ethtool.c b/drivers/infiniband/ulp/ipoib/ipoib_ethtool.c
index 29bc7b5..b41c061 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ethtool.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ethtool.c
@@ -46,18 +46,30 @@ static int ipoib_get_coalesce(struct net_device *dev,
  struct ethtool_coalesce *coal)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
-
-