[PATCH] MII bitbang driver should generate MII bus phy_mask dynamically

2007-06-11 Thread Mark Zhan
Current MII bitbang bus driver hard-codes the phy mask of mii_bus to
~0x09, which is actually specific to the FSL boards. This patch will
make the bitbang driver to generate MII bus phy_mask dynamically, by the
PHY irq info provided by the platform.

Signed-off-by: Mark Zhan <[EMAIL PROTECTED]>
---
 b/drivers/net/fs_enet/mii-bitbang.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fs_enet/mii-bitbang.c
b/drivers/net/fs_enet/mii-bitbang.c
index d384010..3732d69 100644
--- a/drivers/net/fs_enet/mii-bitbang.c
+++ b/drivers/net/fs_enet/mii-bitbang.c
@@ -315,7 +315,7 @@ static int __devinit fs_enet_mdio_probe(
struct fs_mii_bb_platform_info *pdata;
struct mii_bus *new_bus;
struct bb_info *bitbang;
-   int err = 0;
+   int i, err = 0;
 
if (NULL == dev)
return -EINVAL;
@@ -336,14 +336,17 @@ static int __devinit fs_enet_mdio_probe(
new_bus->reset = &fs_enet_mii_bb_reset,
new_bus->id = pdev->id;
 
-   new_bus->phy_mask = ~0x9;
pdata = (struct fs_mii_bb_platform_info *)pdev->dev.platform_data;
-
if (NULL == pdata) {
printk(KERN_ERR "gfar mdio %d: Missing platform data!\n", 
pdev->id);
return -ENODEV;
}
 
+   new_bus->phy_mask = 0x;
+   for (i = 0; i < PHY_MAX_ADDR; i++)
+   if (pdata->irq[i] != -1)
+   new_bus->phy_mask &= ~(1 << i);
+
/*set up workspace*/
fs_mii_bitbang_init(bitbang, pdata);
 

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] network splice receive

2007-06-11 Thread Jens Axboe
On Fri, Jun 08 2007, Evgeniy Polyakov wrote:
> Size of the received file is bigger than file sent, file contains repeated
> blocks of data sometimes. Cloned skb usage is likely too big overhead,
> although for receiving fast clone is unused in most cases, so there
> might be some gain.

That was actually a new bug, here:

plen -= *offset;
poff += *offset;

in __skb_slice_bits(), we should only subtract the offset from plen, not
add to poff. Then we just create some weird hole without any meaning. So
remove those two poff additions in the two loops, and the size issue is
resolved at least.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix race in AF_UNIX

2007-06-11 Thread Miklos Szeredi
[CC'd Al Viro and Alan Cox, restored patch]

> > There are races involving the garbage collector, that can throw away
> > perfectly good packets with AF_UNIX sockets in them.
> > 
> > The problems arise when a socket goes from installed to in-flight or
> > vice versa during garbage collection.  Since gc is done with a
> > spinlock held, this only shows up on SMP.
> > 
> > Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
> 
> I'm going to hold off on this one for now.
> 
> Holding all of the read locks kind of defeats the purpose of using
> the per-socket lock.
> 
> Can't you just lock purely around the receive queue operation?

That's already protected by the receive queue spinlock.  The race
however happens _between_ pushing the root set and marking of the
in-flight but reachable sockets.

If in that space any of the AF_UNIX sockets goes from in-flight to
installed into a file descriptor, the garbage collector can miss it.
If we want to protect against this using unix_sk(s)->readlock, then we
have to hold all of them for the duration of the marking.

Al, Alan, you have more experience with this piece of code.  Do you
have better ideas about how to fix this?

Thanks,
Miklos

> Index: linux-2.6.22-rc2/net/unix/garbage.c
> ===
> --- linux-2.6.22-rc2.orig/net/unix/garbage.c  2007-06-03 23:58:11.0 
> +0200
> +++ linux-2.6.22-rc2/net/unix/garbage.c   2007-06-06 09:48:36.0 
> +0200
> @@ -186,7 +186,21 @@ void unix_gc(void)
>  
>   forall_unix_sockets(i, s)
>   {
> - unix_sk(s)->gc_tree = GC_ORPHAN;
> + struct unix_sock *u = unix_sk(s);
> +
> + u->gc_tree = GC_ORPHAN;
> +
> + /*
> +  * Hold ->readlock to protect against sockets going from
> +  * in-flight to installed
> +  *
> +  * Can't sleep on this, because
> +  *   a) we are under spinlock
> +  *   b) skb_recv_datagram() could be waiting for a packet that
> +  *  is to be sent by this thread
> +  */
> + if (!mutex_trylock(&u->readlock))
> + goto lock_failed;
>   }
>   /*
>*  Everything is now marked
> @@ -207,8 +221,6 @@ void unix_gc(void)
>  
>   forall_unix_sockets(i, s)
>   {
> - int open_count = 0;
> -
>   /*
>*  If all instances of the descriptor are not
>*  in flight we are in use.
> @@ -218,10 +230,20 @@ void unix_gc(void)
>*  In this case (see unix_create1()) we set artificial
>*  negative inflight counter to close race window.
>*  It is trick of course and dirty one.
> +  *
> +  *  Get the inflight counter first, then the open
> +  *  counter.  This avoids problems if racing with
> +  *  sendmsg
> +  *
> +  *  If just created socket is not yet attached to
> +  *  a file descriptor, assume open_count of 1
>*/
> + int inflight_count = atomic_read(&unix_sk(s)->inflight);
> + int open_count = 1;
> +
>   if (s->sk_socket && s->sk_socket->file)
>   open_count = file_count(s->sk_socket->file);
> - if (open_count > atomic_read(&unix_sk(s)->inflight))
> + if (open_count > inflight_count)
>   maybe_unmark_and_push(s);
>   }
>  
> @@ -300,6 +322,7 @@ void unix_gc(void)
>   spin_unlock(&s->sk_receive_queue.lock);
>   }
>   u->gc_tree = GC_ORPHAN;
> + mutex_unlock(&u->readlock);
>   }
>   spin_unlock(&unix_table_lock);
>  
> @@ -309,4 +332,19 @@ void unix_gc(void)
>  
>   __skb_queue_purge(&hitlist);
>   mutex_unlock(&unix_gc_sem);
> + return;
> +
> + lock_failed:
> + {
> + struct sock *s1;
> + forall_unix_sockets(i, s1) {
> + if (s1 == s) {
> + spin_unlock(&unix_table_lock);
> + mutex_unlock(&unix_gc_sem);
> + return;
> + }
> + mutex_unlock(&unix_sk(s1)->readlock);
> + }
> + BUG();
> + }
>  }
> 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] NetXen: Link status and multicast filter fixes

2007-06-11 Thread Mithlesh Thukral
Hi All,

I will be resending the patches related to link status and multicast
filter of NetXen's 1/10G Ethernet driver code after incorporating 
the feedback which we got.
These patches are wrt netdev#upstream-fixes.

Regards,
Mithlesh Thukral
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] NetXen: Fix link status messages

2007-06-11 Thread Mithlesh Thukral
NetXen: Fix incorrect link status even with switch turned OFF.
NetXen driver failed to accurately indicate when a link is up or down. 
This was encountered during failover testing, when the first port 
indicated that the link was up even when the 10G switch it was assigned
to in the Bladecenter was turned off completely. 

Signed-off by: Wen Xiong <[EMAIL PROTECTED]>
Signed-off by: Mithlesh Thukral <[EMAIL PROTECTED]>
---

 drivers/net/netxen/netxen_nic.h  |1 +
 drivers/net/netxen/netxen_nic_init.c |   21 +
 drivers/net/netxen/netxen_nic_isr.c  |   24 
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index ad6688e..a0b39ee 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -1048,6 +1048,7 @@ int netxen_rom_se(struct netxen_adapter 
 int netxen_do_rom_se(struct netxen_adapter *adapter, int addr);
 
 /* Functions from netxen_nic_isr.c */
+int netxen_nic_link_ok(struct netxen_adapter *adapter);
 void netxen_nic_isr_other(struct netxen_adapter *adapter);
 void netxen_indicate_link_status(struct netxen_adapter *adapter, u32 link);
 void netxen_handle_port_int(struct netxen_adapter *adapter, u32 enable);
diff --git a/drivers/net/netxen/netxen_nic_init.c 
b/drivers/net/netxen/netxen_nic_init.c
index a368924..06e4f5c 100644
--- a/drivers/net/netxen/netxen_nic_init.c
+++ b/drivers/net/netxen/netxen_nic_init.c
@@ -1036,18 +1036,23 @@ void netxen_watchdog_task(struct work_st
if ((adapter->portnum  == 0) && netxen_nic_check_temp(adapter))
return;
 
+   if (adapter->handle_phy_intr)
+   adapter->handle_phy_intr(adapter);
+
netdev = adapter->netdev;
-   if ((netif_running(netdev)) && !netif_carrier_ok(netdev)) {
-   printk(KERN_INFO "%s port %d, %s carrier is now ok\n",
-  netxen_nic_driver_name, adapter->portnum, netdev->name);
+   if ((netif_running(netdev)) && !netif_carrier_ok(netdev) &&
+   netxen_nic_link_ok(adapter) ) {
+   printk(KERN_INFO "%s %s (port %d), Link is up\n",
+  netxen_nic_driver_name, netdev->name, 
adapter->portnum);
netif_carrier_on(netdev);
-   }
-
-   if (netif_queue_stopped(netdev))
netif_wake_queue(netdev);
+   } else if(!(netif_running(netdev)) && netif_carrier_ok(netdev)) {
+   printk(KERN_ERR "%s %s Link is Down\n",
+   netxen_nic_driver_name, netdev->name);
+   netif_carrier_off(netdev);
+   netif_stop_queue(netdev);
+   }
 
-   if (adapter->handle_phy_intr)
-   adapter->handle_phy_intr(adapter);
mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ);
 }
 
diff --git a/drivers/net/netxen/netxen_nic_isr.c 
b/drivers/net/netxen/netxen_nic_isr.c
index b213b06..b2de6b6 100644
--- a/drivers/net/netxen/netxen_nic_isr.c
+++ b/drivers/net/netxen/netxen_nic_isr.c
@@ -169,6 +169,24 @@ void netxen_nic_gbe_handle_phy_intr(stru
netxen_nic_isr_other(adapter);
 }
 
+int netxen_nic_link_ok(struct netxen_adapter *adapter)
+{
+   switch (adapter->ahw.board_type) {
+   case NETXEN_NIC_GBE:
+   return ((adapter->ahw.qg_linksup) & 1);
+
+   case NETXEN_NIC_XGBE:
+   return ((adapter->ahw.xg_linkup) & 1);
+
+   default:
+   printk(KERN_ERR"%s: Function: %s, Unknown board type\n",
+   netxen_nic_driver_name, __FUNCTION__);
+   break;
+   }
+
+   return 0;
+}
+
 void netxen_nic_xgbe_handle_phy_intr(struct netxen_adapter *adapter)
 {
struct net_device *netdev = adapter->netdev;
@@ -183,6 +201,10 @@ void netxen_nic_xgbe_handle_phy_intr(str
printk(KERN_INFO "%s: %s NIC Link is down\n",
   netxen_nic_driver_name, netdev->name);
adapter->ahw.xg_linkup = 0;
+   if (netif_running(netdev)) {
+   netif_carrier_off(netdev);
+   netif_stop_queue(netdev);
+   }
/* read twice to clear sticky bits */
/* WINDOW = 0 */
netxen_nic_read_w0(adapter, NETXEN_NIU_XG_STATUS, &val1);
@@ -196,5 +218,7 @@ void netxen_nic_xgbe_handle_phy_intr(str
printk(KERN_INFO "%s: %s NIC Link is up\n",
   netxen_nic_driver_name, netdev->name);
adapter->ahw.xg_linkup = 1;
+   netif_carrier_on(netdev);
+   netif_wake_queue(netdev);
}
 }
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] NetXen: Add correct routines to setup multicast address

2007-06-11 Thread Mithlesh Thukral
NetXen: Add multi cast filter code
This patch adds multi cast filter code to NetXen NIC driver.
It also adds capabilities to setup the multicast address in hardware
from the host side.

Signed-off by: Mithlesh Thukral <[EMAIL PROTECTED]>
---

 drivers/net/netxen/netxen_nic.h |   24 
 drivers/net/netxen/netxen_nic_hdr.h |3 
 drivers/net/netxen/netxen_nic_hw.c  |  132 +-
 3 files changed, 156 insertions(+), 3 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index a0b39ee..2fddfd1 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -261,6 +261,27 @@ #define netxen_set_msg_ctxid(config_word
 #define netxen_set_msg_opcode(config_word, val)\
((config_word) &= ~(0xf<<28), (config_word) |= (val & 0xf) << 28)
 
+#define netxen_set_addr_ctl_id_pool0(config_word, val) \
+   ((config_word) &= ~3, (config_word) |= val & 0x3)
+#define netxen_set_addr_ctl_enable_xtnd_0(config_word) \
+   ((config_word) |= 1 << 2)
+#define netxen_set_addr_ctl_id_pool1(config_word, val) \
+   ((config_word) &= ~(0x3<<4), (config_word) |= (val & 0x3) << 4)
+#define netxen_set_addr_ctl_enable_xtnd_1(config_word) \
+   ((config_word) |= 1 << 6)
+#define netxen_set_addr_ctl_id_pool2(config_word, val) \
+   ((config_word) &= ~(0x3<<8), (config_word) |= (val & 0x3) << 8)
+#define netxen_set_addr_ctl_enable_xtnd_2(config_word) \
+   ((config_word) |= 1 << 10)
+#define netxen_set_addr_ctl_id_pool3(config_word, val) \
+   ((config_word) &= ~(0x3<<12), (config_word) |= (val & 0x3) << 12)
+#define netxen_set_addr_ctl_enable_xtnd_3(config_word) \
+   ((config_word) |= 1 << 14)
+#define netxen_set_addr_ctl_mode(config_word, val) \
+   ((config_word) &= ~(0x3<<26), (config_word) |= (val & 0x3) << 26)
+#define netxen_set_addr_ctl_enable_poll(config_word, val)  \
+   ((config_word) &= ~(0xf<<30), (config_word) |= (val & 0xf) << 30)
+
 struct netxen_rcv_context {
__le64 rcv_ring_addr;
__le32 rcv_ring_size;
@@ -883,6 +904,9 @@ struct netxen_adapter {
unsigned char mac_addr[ETH_ALEN];
int mtu;
int portnum;
+   u8 promisc;
+   u8 mc_enabled;
+   u8 max_mc_count;
 
spinlock_t tx_lock;
spinlock_t lock;
diff --git a/drivers/net/netxen/netxen_nic_hdr.h 
b/drivers/net/netxen/netxen_nic_hdr.h
index 608e37b..2bfecbc 100644
--- a/drivers/net/netxen/netxen_nic_hdr.h
+++ b/drivers/net/netxen/netxen_nic_hdr.h
@@ -545,6 +545,9 @@ #define NETXEN_MULTICAST_ADDR_HI_1  (NETX
 #define NETXEN_MULTICAST_ADDR_HI_2 (NETXEN_CRB_NIU + 0x1018)
 #define NETXEN_MULTICAST_ADDR_HI_3 (NETXEN_CRB_NIU + 0x101c)
 
+#define NETXEN_UNICAST_ADDR_BASE   (NETXEN_CRB_NIU + 0x1080)
+#define NETXEN_MULTICAST_ADDR_BASE (NETXEN_CRB_NIU + 0x1100)
+
 #defineNETXEN_NIU_GB_MAC_CONFIG_0(I)   \
(NETXEN_CRB_NIU + 0x3 + (I)*0x1)
 #defineNETXEN_NIU_GB_MAC_CONFIG_1(I)   \
diff --git a/drivers/net/netxen/netxen_nic_hw.c 
b/drivers/net/netxen/netxen_nic_hw.c
index baff17a..17ab06a 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -303,6 +303,110 @@ int netxen_nic_set_mac(struct net_device
return 0;
 }
 
+#define NETXEN_UNICAST_ADDR(port, index)   \
+   (NETXEN_UNICAST_ADDR_BASE+(port*32)+(index*8))
+
+int netxen_nic_enable_mcast_filter(struct netxen_adapter *adapter)
+{
+   u32 val = 0;
+   u16 port = physical_port[adapter->portnum];
+
+   if (adapter->mc_enabled)
+   return 0;
+   
+   netxen_set_addr_ctl_enable_poll(val, 0xf);
+
+   if (adapter->ahw.board_type == NETXEN_NIC_XGBE)
+   netxen_set_addr_ctl_mode(val, 0x3);
+   else
+   netxen_set_addr_ctl_mode(val, 0x0);
+
+   netxen_set_addr_ctl_id_pool0(val, 0x0);
+   netxen_set_addr_ctl_id_pool1(val, 0x1);
+   netxen_set_addr_ctl_id_pool2(val, 0x2);
+   netxen_set_addr_ctl_id_pool3(val, 0x3);
+
+   netxen_set_addr_ctl_enable_xtnd_0(val);
+   netxen_set_addr_ctl_enable_xtnd_1(val);
+   netxen_set_addr_ctl_enable_xtnd_2(val);
+   netxen_set_addr_ctl_enable_xtnd_3(val);
+   
+   netxen_crb_writelit_adapter(adapter, NETXEN_MAC_ADDR_CNTL_REG, val);
+   
+   val = 0xff;
+
+   netxen_crb_writelit_adapter(adapter, NETXEN_UNICAST_ADDR(port,0), val);
+   netxen_crb_writelit_adapter(adapter, NETXEN_UNICAST_ADDR(port,0)+4, 
+   val);
+   
+   val = 0x00;
+   val = (u32) adapter->mac_addr[0] | 
+   ((u32) adapter->mac_addr[1] << 8) |
+   ((u32) adapter->mac_addr[2] << 16);
+
+   netxen_crb_writelit_adapter(adapter, NETXEN_UNICAST_ADDR(port,1), val);
+
+   val = 0x00;
+   val = (u32) adapter->mac_addr[3] |
+   ((u32) adapter->mac_addr[4] << 8) |
+   ((u32) adapter->mac_addr[5] << 16);
+

[PATCH] qdisc_restart - readability changes (plus couple of optimizations)

2007-06-11 Thread Krishna Kumar
Hi Dave,

I am sorry to miss out during Jamal's original effort to make
qdisc_restart more readable, but I feel some more changes could
help in that direction. Plus there were some useful comments
in the original code which I updated (hopefully correctly) and
put back - it does seem useful for a new reader.

Other changes :

- netif_queue_stopped need not be called inside qdisc_restart as
  it has been called already in qdisc_run() before the first skb
  is sent, and in __qdisc_run() after each intermediate skb is 
  sent (note : we are the only sender, so the queue cannot get
  stopped while the tx lock was got in the ~LLTX case).

- BUG_ON((int) q->q.qlen < 0) was a relic from old times when -1
  meant more packets are available, and __qdisc_run used to loop
  when qdisc_restart() returned -1. During those days, it was
  necessary to make sure that qlen is never less than zero, since
  __qdisc_run would get into an infinite loop if no packets are on
  the queue and this bug in qdisc was there (and worse - no more
  skbs could ever get queue'd as we hold the queue lock too). With
  Herbert's recent change to return values, this check is not
  required and hopefully Herbert can validate this change. If at
  all the check is required, it should be added to skb_dequeue (in
  failure case), and not to qdisc_qlen.

- Convert to use neater switch/case code.

- "if (ret == NETDEV_TX_LOCKED && lockless)" can be changed to
  remove the lockless check, since driver will return
  NETDEV_TX_LOCKED only if lockless is true and driver has to do the
  locking. In the original code, this hack was fixing a driver bug
  where the driver was also getting tx lock despite ~LLTX being set.
  But that is anyway handled in the current code where
  netif_tx_unlock() is called, plus removing this check will catch
  these driver bugs instead of hiding the problem.

- Changed some names, like try_get_tx_pkt to dev_dequeue (as that is
  the work being done and easier to understand) and similarly
  do_dev_requeue to dev_requeue, merged handle_dev_cpu_collision and
  tx_islocked to dev_handle_collision (handle_dev_cpu_collision is a
  small routine with only one caller, so no need to have two separate
  routines thereby getting rid of two unnecessary macros), etc.

- Removed an XXX comment as it should never fail (I suspect this was
  related to batch skb WIP, Jamal ?). Converted some functions to
  original coding style of having return values and the function name
  on same line, eg prio2list.

I ran a 50 process netperf2 (both TCP & UDP) without issues.

Please provide your comments/feedback (cc'ing the original reviewers
also, as per Jamal's mail from May 25th).

Thanks,

- KK

Signed-off-by: Krishna Kumar <[EMAIL PROTECTED]>
---

diff -ruNp org/net/sched/sch_generic.c new/net/sched/sch_generic.c
--- org/net/sched/sch_generic.c 2007-06-11 13:12:11.0 +0530
+++ new/net/sched/sch_generic.c 2007-06-11 15:37:48.0 +0530
@@ -34,9 +34,6 @@
 #include 
 #include 
 
-#define SCHED_TX_DROP -2
-#define SCHED_TX_QUEUE -3
-
 /* Main transmission queue. */
 
 /* Modifications to data participating in scheduling must be protected with
@@ -64,45 +61,27 @@ void qdisc_unlock_tree(struct net_device
 
 static inline int qdisc_qlen(struct Qdisc *q)
 {
-   BUG_ON((int) q->q.qlen < 0);
return q->q.qlen;
 }
 
-static inline int handle_dev_cpu_collision(struct net_device *dev)
-{
-   if (unlikely(dev->xmit_lock_owner == smp_processor_id())) {
-   if (net_ratelimit())
-   printk(KERN_WARNING
-  "Dead loop on netdevice %s, fix it urgently!\n",
-  dev->name);
-   return SCHED_TX_DROP;
-   }
-   __get_cpu_var(netdev_rx_stat).cpu_collision++;
-   return SCHED_TX_QUEUE;
-}
-
-static inline int
-do_dev_requeue(struct sk_buff *skb, struct net_device *dev, struct Qdisc *q)
+static inline int requeue_skb(struct sk_buff *skb, struct net_device *dev,
+ struct Qdisc *q)
 {
-
if (unlikely(skb->next))
dev->gso_skb = skb;
else
q->ops->requeue(skb, q);
-   /* XXX: Could netif_schedule fail? Or is the fact we are
-* requeueing imply the hardware path is closed
-* and even if we fail, some interupt will wake us
-*/
+
netif_schedule(dev);
return 0;
 }
 
-static inline struct sk_buff *
-try_get_tx_pkt(struct net_device *dev, struct Qdisc *q)
+static inline struct sk_buff *dequeue_skb(struct net_device *dev,
+ struct Qdisc *q)
 {
-   struct sk_buff *skb = dev->gso_skb;
+   struct sk_buff *skb;
 
-   if (skb)
+   if ((skb = dev->gso_skb))
dev->gso_skb = NULL;
else
skb = q->dequeue(q);
@@ -110,92 +89,111 @@ try_get_tx_pkt(struct net_device *dev, s
return skb;
 }
 
-static inline int
-tx_islocked(struct sk_buff *skb, stru

Re: [GIT]: net-2.6.23 tree available

2007-06-11 Thread Patrick McHardy
David Miller wrote:
> Get it while it's hot:
> 
>   kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.23.git
> 
> We have the qdisc_restart cleanup from Jama, some TIPC work
> from Allan Stephens and co., and Patrick's rtnl_link work.


I hope this still gives me the chance to submit updated patches
since I've changed a few things since the last submission and
a single incremental update would mix things that don't really
belong together.

BTW, I'm back from my trip, I'll work through all the comments
regarding these patches now.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Virtual ethernet tunnel

2007-06-11 Thread Patrick McHardy
Pavel Emelianov wrote:
> Patrick McHardy wrote:
> 

>>>+skb->pkt_type = PACKET_HOST;
>>>+skb->protocol = eth_type_trans(skb, rcv);
>>>+if (dev->features & NETIF_F_NO_CSUM)
>>>+skb->ip_summed = rcv_priv->ip_summed;
>>>+
>>>+dst_release(skb->dst);
>>>+skb->dst = NULL;
>>>+
>>>+secpath_reset(skb);
>>>+nf_reset(skb);
>>
>>
>>Is skb->mark supposed to survive communication between different
>>namespaces?
> 
> 
> I guess it must not. Thanks.


I guess there are a few others that should be cleared as well,
like the tc related members, secmark, ipvs_property, ...

>>The rtnl_link codes looks fine. I don't like the VETH_INFO_MAC attribute
>>very much though, we already have a generic device attribute for MAC
>>addresses. Of course that only allows you to supply one MAC address, so
>>I'm wondering what you think of allocating only a single device per
>>newlink operation and binding them in a seperate enslave operation?
> 
> 
> I did this at the very first version, but Alexey showed me that this
> would be wrong. Look. When we create the second device it must be in
> the other namespace as it is useless to have them in one namespace.
> But if we have the device in the other namespace the RTNL_NEWLINK 
> message from kernel would come into this namespace thus confusing ip
> utility in the init namespace. Creating the device in the init ns and
> moving it into the new one is rather a complex task.
> 
> But with such approach the creation looks really logical. We send a 
> packet to the kernel and have a single response about the new device 
> appearance. At the same time we have a RTNL_NEWLINK message arrived at 
> the destination namespace informing that a new device has appeared 
> there as well.


The question is how to proceed. I haven't read all mails yet, but it
seems there is some disagreement about whether to create all devices
in the same namespace and move them later or create them directly in
their target namespace. For now I guess it doesn't matter much, so
can everyone agree to adding a IFLA_PARTNER attribute that includes
a complete ifinfomsg and the attributes and you later decide how to
handle namespaces?

>>>+enum {
>>>+VETH_INFO_UNSPEC,
>>>+VETH_INFO_MAC,
>>>+VETH_INFO_PEER,
>>>+VETH_INFO_PEER_MAC,
>>>+
>>>+VETH_INFO_MAX
>>>+};
>>
>>Please follow the
>>
>>#define VETH_INFO_MAX (__VETH_INFO_MAX - 1)
>>
>>convention here.
> 
> 
> Could you please clarify this point. I saw the lines
> enum {
>   ...
>   RTNL_NEWLINK
> #define RTNL_NEWLINK RTNL_NEWLINK
>   ...
> }
> and had my brains exploded imagining what this would mean :(


Thats just to make the new attributes visible as preprocessor
symbols so userspace can use them for #ifdefs. We usually use
it when adding new attributes/message types, but its not necessary
for the initial set of attributes if you already have some other
preprocessor-visisble symbol (like VETH_INFO_MAX) userspace can
use.

What I was refering to is this convention:

enum {
...
__IFLA_MAX
};

#define IFLA_MAX (__IFLA_MAX - 1)

Which is used to make sure that IFLA_MAX is really the max and
not max + 1 and additionally people won't forget to update it.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Virtual ethernet tunnel (v.2)

2007-06-11 Thread Patrick McHardy
Ben Greear wrote:
> Pavel Emelianov wrote:
> 
>>> I would also like some way to identify veth from other device types,
>>> preferably
>>> something like a value in sysfs.   However, that should not hold up
>>> 
>>
>>
>> We can do this with ethtool. It can get and print the driver name of
>> the device.
>>   
> 
> I think I'd like something in sysfs that we could query for any
> interface.  Possible return
> strings could be:
> VLAN
> VETH
> ETH
> PPP
> BRIDGE
> AP /* wifi access point interface */
> STA /* wifi station */
> 
> 
> I will cook up a patch for consideration after veth goes in.


The rtnl_link API gives you the name of the driver (IFLA_INFO_KIND).

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
jamal wrote:
> On Wed, 2007-06-06 at 17:11 +0200, Patrick McHardy wrote:
> 
> 
>>[...]
> The problem is the premise is _innacurate_.
> Since you havent followed the discussion, i will try to be brief (which
> is hard).
> If you want verbosity it is in my previous emails:
> 
> Consider a simple example of strict prio qdisc which is mirror
> configuration of a specific hardware. 
> Then for sake of discussion, assume two prio queues in the qdisc - PSL
> and PSH and two hardware queues/rings in a NIC which does strict prio
> with queues PHL and PHH.
> The mapping is as follows:
> PSL --- maps to --- PHL
> PSH --- maps to --- PHH
> 
> Assume the PxH has a higher prio than PxL.
> Strict prio will always favor H over L.
> 
> Two scenarios:
> a) a lot of packets for PSL arriving on the stack.
> They only get sent from PSL -> PHL if and only if there are no
> packets from PSH->PHH.
> b)a lot of packets for PSH arriving from the stack.
> They will always be favored over PSL in sending to the hardware.
> 
>>From the above:
> The only way PHL will ever shutdown the path to the hardware is when
> there are sufficient PHL packets.
> Corrollary,
> The only way PSL will ever shutdown the path to the hardware is when
> there are _NO_ PSH packets.


Thats not true. Assume PSL has lots of packets, PSH is empty. We
fill the PHL queue until their is no room left, so the driver
has to stop the queue. Now some PSH packets arrive, but the queue
is stopped, no packets will be sent. Now, you can argue that as
soon as the first PHL packet is sent there is room for more and
the queue will be activated again and we'll take PSH packets,
so it doesn't matter because we can't send two packets at once
anyway. Fine. Take three HW queues, prio 0-2. The prio 2 queue
is entirely full, prio 1 has some packets queued and prio 0 is
empty. Now, because prio 2 is completely full, the driver has to
stop the queue. Before it can start it again it has to send all
prio 1 packets and then at least one packet of prio 2. Until
this happens, no packets can be queued to prio 0.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][RFC] network splice receive v2

2007-06-11 Thread Jens Axboe
Hi,

Here's an updated implementation of tcp network splice receive support.
It actually works for me now, no data corruption seen.

For the original announcement and how to test it, see:

http://marc.info/?l=linux-netdev&m=118103093400770&w=2

I hang on to the original skb by creating a clone of it and holding
references to that. I really don't need a clone, but tcp_read_sock() is
not being very helpful by calling sk_eat_skb() which blissfully ignores
any reference counts and uncondtionally frees the skb - why is that??
The clone works around that issue.

The current code also gets rid of the data_ready hack, and it should now
handle linear/fragments/fraglist in the skb just fine. Thanks to Olaf
for providing review of that stuff!

Patches are against the #splice branch of the block repo, "official" url
of that is:

git://git.kernel.dk/data/git/linux-2.6-block.git/

and it's based on Linus main tree. Let me know if I should supply netdev
branch patches instead, or even just provide a rolled up patch (or patch
series) for anyone curious to test or play with it.

-- 
Jens Axboe

>From fd79bf84fdeb15b72f256c342609257ae8a56235 Mon Sep 17 00:00:00 2001
From: Jens Axboe <[EMAIL PROTECTED]>
Date: Mon, 11 Jun 2007 13:00:32 +0200
Subject: [PATCH] splice: don't assume regular pages in splice_to_pipe()

Allow caller to pass in a release function, there might be
other resources that need releasing as well. Needed for
network receive.

Signed-off-by: Jens Axboe <[EMAIL PROTECTED]>
---
 fs/splice.c|9 -
 include/linux/splice.h |1 +
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index f24e367..25ec9c8 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -247,11 +247,16 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
 	}
 
 	while (page_nr < spd->nr_pages)
-		page_cache_release(spd->pages[page_nr++]);
+		spd->spd_release(spd, page_nr++);
 
 	return ret;
 }
 
+static void spd_release_page(struct splice_pipe_desc *spd, unsigned int i)
+{
+	page_cache_release(spd->pages[i]);
+}
+
 static int
 __generic_file_splice_read(struct file *in, loff_t *ppos,
 			   struct pipe_inode_info *pipe, size_t len,
@@ -270,6 +275,7 @@ __generic_file_splice_read(struct file *in, loff_t *ppos,
 		.partial = partial,
 		.flags = flags,
 		.ops = &page_cache_pipe_buf_ops,
+		.spd_release = spd_release_page,
 	};
 
 	index = *ppos >> PAGE_CACHE_SHIFT;
@@ -1442,6 +1448,7 @@ static long vmsplice_to_pipe(struct file *file, const struct iovec __user *iov,
 		.partial = partial,
 		.flags = flags,
 		.ops = &user_page_pipe_buf_ops,
+		.spd_release = spd_release_page,
 	};
 
 	pipe = pipe_info(file->f_path.dentry->d_inode);
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 1a1182b..04c1068 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -53,6 +53,7 @@ struct splice_pipe_desc {
 	int nr_pages;			/* number of pages in map */
 	unsigned int flags;		/* splice flags */
 	const struct pipe_buf_operations *ops;/* ops associated with output pipe */
+	void (*spd_release)(struct splice_pipe_desc *, unsigned int);
 };
 
 typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *,
-- 
1.5.2.1.174.gcd03

>From 95a1ee277f2a6df5c95d786b9229ea0ffa46850d Mon Sep 17 00:00:00 2001
From: Jens Axboe <[EMAIL PROTECTED]>
Date: Mon, 4 Jun 2007 15:06:43 +0200
Subject: [PATCH] tcp_read_sock: alloc recv_actor() return return negative error value

Signed-off-by: Jens Axboe <[EMAIL PROTECTED]>
---
 net/ipv4/tcp.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index cd3c7e9..450f44b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1064,7 +1064,11 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 	break;
 			}
 			used = recv_actor(desc, skb, offset, len);
-			if (used <= len) {
+			if (used < 0) {
+if (!copied)
+	copied = used;
+break;
+			} else if (used <= len) {
 seq += used;
 copied += used;
 offset += used;
@@ -1086,7 +1090,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 	tcp_rcv_space_adjust(sk);
 
 	/* Clean up data we have read: This will do ACK frames. */
-	if (copied)
+	if (copied > 0)
 		tcp_cleanup_rbuf(sk, copied);
 	return copied;
 }
-- 
1.5.2.1.174.gcd03

>From f0329226c6c1f521c2069358699bae5ed48f5a43 Mon Sep 17 00:00:00 2001
From: Jens Axboe <[EMAIL PROTECTED]>
Date: Mon, 11 Jun 2007 13:51:57 +0200
Subject: [PATCH] TCP splice receive support

Support for network splice receive.

Signed-off-by: Jens Axboe <[EMAIL PROTECTED]>
---
 include/linux/net.h|3 +
 include/linux/skbuff.h |5 ++
 include/net/tcp.h  |3 +
 net/core/skbuff.c  |  175 
 net/ipv4/af_inet.c |1 +
 net/ipv4/tcp.c |  122 +
 net/socket.c   |   13 
 7 files changed, 322 insertions(+), 0 deletions(-)

diff --git a/include/linux/net.

Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
jamal wrote:
> On Wed, 2007-06-06 at 15:35 -0700, David Miller wrote:
> 
>>The problem with this line of thinking is that it ignores the fact
>>that it is bad to not queue to the device when there is space
>>available, _even_ for lower priority packets.
> 
> 
> So use a different scheduler. Dont use strict prio. Strict prio will
> guarantee starvation of low prio packets as long as there are high prio
> packets. Thats the intent.


With a single queue state _any_ full HW queue will starve all other
queues, independant of the software queueing discipline.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
Waskiewicz Jr, Peter P wrote:
>>>If they have multiple TX queues, independantly programmable, that 
>>>single lock is stupid.
>>>
>>>We could use per-queue TX locks for such hardware, but we can't 
>>>support that currently.
>>
>>There could be bad packet reordering with this (like some SMP 
>>routers used to do).
> 
> 
> My original multiqueue patches I submitted actually had a per-queue Tx
> lock, but it was removed since the asymmetry in the stack for locking
> was something people didn't like.  Locking a queue for ->enqueue(),
> unlocking, then locking for ->dequeue(), unlocking, was something people
> didn't like very much.  Also knowing what queue to lock on ->enqueue()
> was where the original ->map_queue() idea came from, since we wanted to
> lock before calling ->enqueue().


I guess there were a few more reasons why people (at least me) didn't
like it. IIRC It didn't include any sch_api locking changes, to it
was completely broken wrt. concurrent configuration changes (easy
fixable though). Additionally it assumed that classification was
deterministic and two classify calls would return the same result,
which is not necessarily true and might have resulted in locking
the wrong queue, and it didn't deal with TC actions doing stuff
to a packet during the first classification.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread jamal
On Mon, 2007-11-06 at 13:58 +0200, Patrick McHardy wrote:

> Thats not true. Assume PSL has lots of packets, PSH is empty. We
> fill the PHL queue until their is no room left, so the driver
> has to stop the queue. 

Sure. Packets stashed on the any DMA ring are considered "gone to the
wire". That is a very valid assumption to make.
 
> Now some PSH packets arrive, but the queue
> is stopped, no packets will be sent. 
> Now, you can argue that as
> soon as the first PHL packet is sent there is room for more and
> the queue will be activated again and we'll take PSH packets,

_exactly_ ;->

> so it doesn't matter because we can't send two packets at once
> anyway. Fine.

i can see your thought process building -
You are actually following what i am saying;->

>  Take three HW queues, prio 0-2. The prio 2 queue
> is entirely full, prio 1 has some packets queued and prio 0 is
> empty. Now, because prio 2 is completely full, the driver has to
> stop the queue. Before it can start it again it has to send all
> prio 1 packets and then at least one packet of prio 2. Until
> this happens, no packets can be queued to prio 0.

The assumption is packets gone to the DMA are gone to the wire, thats
it. 
If you have a strict prio scheduler, contention from the stack is only
valid if they both arrive at the same time.
If that happens then (assuming 0 is more important than 1 which is more
important than 2) then 0 always wins over 1 which wins over 2.
Same thing if you queue into hardware and the priorization is the same.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] [IPV4]: Add default config support after inetdev_init

2007-06-11 Thread Patrick McHardy
Herbert Xu wrote:
> [IPV4]: Add default config support after inetdev_init
> 
> Previously once inetdev_init has been called on a device any changes made
> to ipv4_devconf_dflt would have no effect on that device's configuration.


I noticed a few more side-effects from the original change that
seem to be undesired. Some code assumes that dev->ip_ptr != NULL
implies existance of IP addresses on the device. For example
fib_check_nh used to allow to add routes to devices only when a
in_device is present. We can now add routes without having any
IP addresses configured, which makes routing choose 0.0.0.0 as
source and invalidates the assumption of some other code that
the outgoing device of a packet always has an in_device present
(like MASQUERADE). fib_sync_up used to skip a nexthop when no
IP addresses was present on the device, now it will keep it.
There's probably more, I guess we need to audit all in_dev_get
calls.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] [IPV4]: Add default config support after inetdev_init

2007-06-11 Thread Herbert Xu
On Mon, Jun 11, 2007 at 02:26:58PM +0200, Patrick McHardy wrote:
> 
> I noticed a few more side-effects from the original change that
> seem to be undesired. Some code assumes that dev->ip_ptr != NULL
> implies existance of IP addresses on the device. For example
> fib_check_nh used to allow to add routes to devices only when a
> in_device is present. We can now add routes without having any
> IP addresses configured, which makes routing choose 0.0.0.0 as
> source and invalidates the assumption of some other code that
> the outgoing device of a packet always has an in_device present
> (like MASQUERADE). fib_sync_up used to skip a nexthop when no
> IP addresses was present on the device, now it will keep it.
> There's probably more, I guess we need to audit all in_dev_get
> calls.

Good catch.  I'll work through them.  Oh yeah I still need to do
the same thing for IPv6 too.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Support send-to-self over external interfaces (and veths).

2007-06-11 Thread Patrick McHardy
Ben Greear wrote:
> This should also be useful with the pending 'veth' driver, as it
> emulates two ethernet ports connected with a cross-over cable.
> 
> To make this work, you have to enable the sysctl (look Dave,
> no IOCTLS, there might be hope for me yet!! :)), and in your
> application you will need to use SO_BINDTODEVICE (and probably bind to
> the local IP as well).  Some applications such as traceroute already
> support this binding..others such as ping do not.
> 
> You most likely will also have to set up routing tables using
> source IPs as a rule to direct these connections to a particular
> routing table.
> 
> Comments welcome.


I would really prefer to simply make the prio 0 "lookup local"
rule deletable so you can rules with higher priority. That
allows to do send to self without any further code changes
and avoids the need to bind applications to a device.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
jamal wrote:
> On Mon, 2007-11-06 at 13:58 +0200, Patrick McHardy wrote:
> 
> 
>>Thats not true. Assume PSL has lots of packets, PSH is empty. We
>>fill the PHL queue until their is no room left, so the driver
>>has to stop the queue. 
> 
> 
> Sure. Packets stashed on the any DMA ring are considered "gone to the
> wire". That is a very valid assumption to make.


I disagree, its obviously not true and leads to the behaviour I
described. If it were true there would be no reason to use multiple
HW TX queues to begin with.


>>[...]
> 
> i can see your thought process building -
> You are actually following what i am saying;->


I am :)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread jamal
On Mon, 2007-11-06 at 14:39 +0200, Patrick McHardy wrote:
> jamal wrote:
> > On Mon, 2007-11-06 at 13:58 +0200, Patrick McHardy wrote:
> > 

> > Sure. Packets stashed on the any DMA ring are considered "gone to the
> > wire". That is a very valid assumption to make.
> 
> 
> I disagree, its obviously not true 

Patrick, you are making too strong a statement. Take a step back:
When you put a packet on the DMA ring, are you ever going to take it
away at some point before it goes to the wire? 

> and leads to the behaviour I
> described. If it were true there would be no reason to use multiple
> HW TX queues to begin with.

In the general case, they are totaly useless.
They are  useful when theres contention/congestion. Even in a shared
media like wireless. 
And if there is contention, the qdisc scheduler will do the right thing.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
jamal wrote:
> On Mon, 2007-11-06 at 14:39 +0200, Patrick McHardy wrote:
> 
>>>Sure. Packets stashed on the any DMA ring are considered "gone to the
>>>wire". That is a very valid assumption to make.
>>
>>
>>I disagree, its obviously not true 
> 
> 
> Patrick, you are making too strong a statement.


Well, its not.

> Take a step back:
> When you put a packet on the DMA ring, are you ever going to take it
> away at some point before it goes to the wire? 


No, but its nevertheless not on the wire yet and the HW scheduler
controls when it will get there. It might in theory even never get
there if higher priority queues are continously active.

>>and leads to the behaviour I
>>described. If it were true there would be no reason to use multiple
>>HW TX queues to begin with.
> 
> 
> In the general case, they are totaly useless.
> They are  useful when theres contention/congestion. Even in a shared
> media like wireless. 


The same is true for any work-conserving queue, software or hardware.

> And if there is contention, the qdisc scheduler will do the right thing.


That ignores a few points that were raised in this thread,

- you can treat each HW queue as an indivdual network device
- you can avoid synchronizing on a single queue lock for
  multiple TX queues
- it is desirable to keep all queues full
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread jamal
On Mon, 2007-11-06 at 15:03 +0200, Patrick McHardy wrote:
> jamal wrote:

> Well, its not.

I dont wanna go into those old style debates again; so lets drop this
point. 

> > Take a step back:
> > When you put a packet on the DMA ring, are you ever going to take it
> > away at some point before it goes to the wire? 
> 
> 
> No, 
> but its nevertheless not on the wire yet and the HW scheduler
> controls when it will get there. 
>
> It might in theory even never get
> there if higher priority queues are continously active.

Sure - but what is wrong with that? 
What would be wrong is in the case of contention for a resource like a
wire between a less important packet and a more important packet, the
more important packet gets favored.
Nothing like that ever happens in what i described.
Remember there is no issue if there is no congestion or contention for
local resources.

> > And if there is contention, the qdisc scheduler will do the right thing.
> 
> 
> That ignores a few points that were raised in this thread,
> 
> - you can treat each HW queue as an indivdual network device

You can treat a pair of tx/rx as a netdev. In which case none of this is
important. You instantiate a different netdev and it only holds the
appropriate locks.

> - you can avoid synchronizing on a single queue lock for
>   multiple TX queues

Unneeded if you do what i described. Zero changes to the qdisc code.

> - it is desirable to keep all queues full

It is desirable to keep resources fully utilized. Sometimes that is
achieved by keeping _all_ queues full. If i fill up a single queue full
and transmit at wire rate, there is no issue.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[WIP][DOC] Net tx batching

2007-06-11 Thread jamal

I have started writting a small howto for drivers. Hoping to get a wider
testing with more drivers.
So far i have changed e1000 and tun drivers as well as modified the
packetgen tool to do batching.

I will update this document as needed if something is unclear. 
Please contribute by asking questions, changing a driver and wide
testing. I may target tg3 next and write a tool to do testing from
UDP level.

cheers,
jamal


Heres the begining of a howto for driver author.
The current working tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/hadi/batch-lin26.git

The intended audience for this howto is people already
familiar with netdevices.

0) Hardware Pre-requisites:
---

You must have at least hardware that is capable of doing
DMA with many descriptors; i.e having hardware with a queue
length of 3 (as in some fscked ethernet hardware) is not
very useful in this case.

1) What is new in the driver API:
-

a) A new method called onto the driver by the net tx core to
batch packets. This method, dev->hard_batch_xmit(dev), 
is no different than dev->hard_start_xmit(dev) in terms of the 
arguements it takes. You just have to handle it differently 
(more below).

b) A new method, dev->hard_prep_xmit(), called onto the driver to 
massage the packet before it gets transmitted. 
This method is optional i.e if you dont specify it, you will
not be invoked(more below)

c) A new variable dev->xmit_win which provides suggestions to the
core calling into the driver a rough estimate of how many
packets can be batched onto the driver.

2) Driver pre-requisite


The typical driver tx state machine is:


--> +Core sends packets
+--> Driver puts packet onto hardware queue
+if hardware queue is full, netif_stop_queue(dev)
+
--> +core stops sending because of netif_stop_queue(dev)
..
.. time passes
..
..
--> +---> driver has transmitted packets, opens up tx path by
  invoking netif_wake_queue(dev)
--> +Core sends packets, and the cycle repeats.


The pre-requisite for batching changes is that the driver should 
provide a low threshold to open up the tx path.
This is a very important requirement in making batching useful.
Drivers such as tg3 and e1000 already do this.
So in the above annotation, as a driver author, before you
invoke netif_wake_queue(dev) you check if there are enough
entries left.

Heres an example of how i added it to tun driver
---
+#define NETDEV_LTT 4 /* the low threshold to open up the tx path */
..
..
u32 t = skb_queue_len(&tun->readq);
if (netif_queue_stopped(tun->dev) && t < NETDEV_LTT) {
tun->dev->xmit_win = tun->dev->tx_queue_len;
netif_wake_queue(tun->dev);
}
---

Heres how the batching e1000 driver does it (ignore the setting of
netdev->xmit_win, more on this later):

--
if (netif_queue_stopped(netdev)) {
   int rspace =  E1000_DESC_UNUSED(tx_ring) - (MAX_SKB_FRAGS +  2);
   netdev->xmit_win = rspace;
   netif_wake_queue(netdev);
   }
---

in tg3 code looks like:

-
if (netif_queue_stopped(tp->dev) &&
(tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH(tp)))
netif_wake_queue(tp->dev);
---

3) Driver Setup:

a) On initialization (before netdev registration)
 i) set NETIF_F_BTX in  dev->features 
  i.e dev->features |= NETIF_F_BTX
  This makes the core do proper initialization.

  ii) set dev->xmit_win to something reasonable like
  maybe half the tx DMA ring size etc.
  This is later used by the core to guess how much packets to send
  in one batch. 

  b) create proper pointer to the two new methods desribed above.


4) The new methods

  a) The batching method
  
Heres an example of a batch tx routine that is similar
to the one i added to tun driver


  static int xxx_net_bxmit(struct net_device *dev)
  {
  
  
while (skb_queue_len(dev->blist)) {
dequeue from dev->blist
enqueue onto hardware ring
if hardware ring full break
}
   
if (hardware ring full) {
  netif_stop_queue(dev);
  dev->xmit_win = 1;
}

   if we queued on hardware, tell it to chew
   ...
   ..
   .
  }
--

All return codes like NETDEV_TX_OK etc still apply.
In this method, if there are any IO operations that apply to a 
set of packets (such as kicking DMA) leave them to the end and apply
them once if you have successfully enqueued. For an example of this
look e1000 driver e1000_kick_DMA() function.

b) The dev->hard_prep_xmit() method

Use this method to only do pre-processing of the skb passed.
If in the current dev->hard_start_xmit() you are pre-processing
packets before holding any locks (eg formating them to be put in
any descriptor etc).
Look at e1000_prep_queue_frame() fo

Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
jamal wrote:
> On Mon, 2007-11-06 at 15:03 +0200, Patrick McHardy wrote:
> 
>>>Take a step back:
>>>When you put a packet on the DMA ring, are you ever going to take it
>>>away at some point before it goes to the wire? 
>>
>>
>>No, but its nevertheless not on the wire yet and the HW scheduler
>>controls when it will get there. 
>>
>>It might in theory even never get
>>there if higher priority queues are continously active.
> 
> 
> Sure - but what is wrong with that? 


Nothing, this was just to illustrate why I disagree with the assumption
that the packet has hit the wire. On second thought I do agree with your
assumption for the single HW queue case, at the point we hand the packet
to the HW the packet order is determined and is unchangeable. But this
is not the case if the hardware includes its own scheduler. The qdisc
is simply not fully in charge anymore.

> What would be wrong is in the case of contention for a resource like a
> wire between a less important packet and a more important packet, the
> more important packet gets favored.


Read again what I wrote about the n > 2 case. Low priority queues might
starve high priority queues when using a single queue state for a
maximum of the time it takes to service n - 2 queues with max_qlen - 1
packets queued plus the time for a single packet. Thats assuming the
worst case of n - 2 queues with max_qlen - 1 packets and the lowest
priority queue full, so the queue is stopped until we can send at
least one lowest priority packet, which requires to fully service
all higher priority queues previously.

> Nothing like that ever happens in what i described.
> Remember there is no issue if there is no congestion or contention for
> local resources.


Your basic assumption seems to be that the qdisc is still in charge
of when packets get sent. This isn't the case if there is another
scheduler after the qdisc and there is contention in the second
queue.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Cohen, Guy

Patrick McHardy wrote:
> jamal wrote:
> > Sure - but what is wrong with that?
> 
> 
> Nothing, this was just to illustrate why I disagree with the
assumption
> that the packet has hit the wire. On second thought I do agree with
your
> assumption for the single HW queue case, at the point we hand the
packet
> to the HW the packet order is determined and is unchangeable. But this
> is not the case if the hardware includes its own scheduler. The qdisc
> is simply not fully in charge anymore.

For WiFi devices the HW often implements the scheduling, especially when
QoS (WMM/11e/11n) is implemented. There are few traffic queues defined
by the specs and the selection of the next queue to transmit a packet
from, is determined in real time, just when there is a tx opportunity.
This cannot be predicted in advance since it depends on the medium usage
of other stations.

Hence, to make it possible for wireless devices to use the qdisc
mechanism properly, the HW queues should _ALL_ be non-empty at all
times, whenever data is available in the upper layers. Or in other
words, the upper layers should not block a specific queue because of the
usage of any other queue.

> 
> Your basic assumption seems to be that the qdisc is still in charge
> of when packets get sent. This isn't the case if there is another
> scheduler after the qdisc and there is contention in the second
> queue.

Which is often the case in wireless devices - transmission scheduling is
done in HW.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread jamal
On Mon, 2007-11-06 at 16:03 +0200, Patrick McHardy wrote:
> jamal wrote:

> > Sure - but what is wrong with that?
> 
> Nothing, this was just to illustrate why I disagree with the assumption
> that the packet has hit the wire. 

fair enough.

> On second thought I do agree with your
> assumption for the single HW queue case, at the point we hand the packet
> to the HW the packet order is determined and is unchangeable. But this
> is not the case if the hardware includes its own scheduler. The qdisc
> is simply not fully in charge anymore.

 i am making the case that it does not affect the overall results
as long as you use the same parameterization on qdisc and hardware.
If in fact the qdisc high prio packets made it to the driver before
the they make it out onto the wire, it is probably a good thing
that the hardware scheduler starves the low prio packets.

> Read again what I wrote about the n > 2 case. Low priority queues might
> starve high priority queues when using a single queue state for a
> maximum of the time it takes to service n - 2 queues with max_qlen - 1
> packets queued plus the time for a single packet. Thats assuming the
> worst case of n - 2 queues with max_qlen - 1 packets and the lowest
> priority queue full, so the queue is stopped until we can send at
> least one lowest priority packet, which requires to fully service
> all higher priority queues previously.

I didnt quiet follow the above - I will try retrieving reading your
other email to see if i can make sense of it. 

> Your basic assumption seems to be that the qdisc is still in charge
> of when packets get sent. This isn't the case if there is another
> scheduler after the qdisc and there is contention in the second
> queue.

My basic assumption is if you use the same scheduler in both the
hardware and qdisc, configured the same same number of queues and
mapped the same priorities then you dont need to make any changes
to the qdisc code. If i have a series of routers through which a packet
traveses to its destination with the same qos parameters i also achieve
the same results.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
Cohen, Guy wrote:
> Patrick McHardy wrote:
> 
>>jamal wrote:
>>
>>>Sure - but what is wrong with that?
>>
>>
>>Nothing, this was just to illustrate why I disagree with the
> 
> assumption
> 
>>that the packet has hit the wire. On second thought I do agree with
> 
> your
> 
>>assumption for the single HW queue case, at the point we hand the
> 
> packet
> 
>>to the HW the packet order is determined and is unchangeable. But this
>>is not the case if the hardware includes its own scheduler. The qdisc
>>is simply not fully in charge anymore.
> 
> 
> For WiFi devices the HW often implements the scheduling, especially when
> QoS (WMM/11e/11n) is implemented. There are few traffic queues defined
> by the specs and the selection of the next queue to transmit a packet
> from, is determined in real time, just when there is a tx opportunity.
> This cannot be predicted in advance since it depends on the medium usage
> of other stations.
> 
> Hence, to make it possible for wireless devices to use the qdisc
> mechanism properly, the HW queues should _ALL_ be non-empty at all
> times, whenever data is available in the upper layers. Or in other
> words, the upper layers should not block a specific queue because of the
> usage of any other queue.


Thats exactly what I'm saying. And its not possible with a single
queue state as I tried to explain in my last last.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread jamal
On Mon, 2007-11-06 at 17:30 +0300, Cohen, Guy wrote:

> 
> For WiFi devices the HW often implements the scheduling, especially when
> QoS (WMM/11e/11n) is implemented. There are few traffic queues defined
> by the specs and the selection of the next queue to transmit a packet
> from, is determined in real time, just when there is a tx opportunity.
> This cannot be predicted in advance since it depends on the medium usage
> of other stations.

WMM is a strict prio mechanism.
The parametrization very much favors the high prio packets when the
tx opportunity to send shows up.

> Hence, to make it possible for wireless devices to use the qdisc
> mechanism properly, the HW queues should _ALL_ be non-empty at all
> times, whenever data is available in the upper layers. 

agreed.

> Or in other
> words, the upper layers should not block a specific queue because of the
> usage of any other queue.

This is where we are going to disagree. 
There is no way the stack will send the driver packets which are low
prio if there are some which are high prio. There is therefore, on
contention between low and high prio, no way for low prio packets to
obstruct the high prio packets; however, it is feasible that high prio
packets will obstruct low prio packets (which is fine). 

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
jamal wrote:
> On Mon, 2007-11-06 at 16:03 +0200, Patrick McHardy wrote:
> 
>>Read again what I wrote about the n > 2 case. Low priority queues might
>>starve high priority queues when using a single queue state for a
>>maximum of the time it takes to service n - 2 queues with max_qlen - 1
>>packets queued plus the time for a single packet. Thats assuming the
>>worst case of n - 2 queues with max_qlen - 1 packets and the lowest
>>priority queue full, so the queue is stopped until we can send at
>>least one lowest priority packet, which requires to fully service
>>all higher priority queues previously.
> 
> 
> I didnt quiet follow the above - I will try retrieving reading your
> other email to see if i can make sense of it. 


Let me explain with some ASCII art :)

We have n empty HW queues with a maximum length of m packets per queue:

[0] empty
[1] empty
[2] empty
..
[n-1] empty

Now we receive m - 1 packets for each all priorities >= 1 and < n - 1,
so we have:

[0] empty
[1] m - 1 packets
[2] m - 1 packets
..
[n-2] m - 1 packets
[n] empty

Since no queue is completely full, the queue is still active.
Now we receive m packets of priorty n:

[0] empty
[1] m - 1 packets
[2] m - 1 packets
..
[n-2] m - 1 packets
[n-1] m packets

At this point the queue needs to be stopped since the highest
priority queue is entirely full. To start it again at least
one packet of queue n - 1 needs to be sent, which (assuming
strict priority) requires that queues 1 to n - 2 are serviced
first. So any prio 0 packets arriving during this period will
sit in the qdisc and will not reach the device for a possibly
quite long time. With multiple queue states we'd know that
queue 0 can still take packets.

>>Your basic assumption seems to be that the qdisc is still in charge
>>of when packets get sent. This isn't the case if there is another
>>scheduler after the qdisc and there is contention in the second
>>queue.
> 
> 
> My basic assumption is if you use the same scheduler in both the
> hardware and qdisc, configured the same same number of queues and
> mapped the same priorities then you dont need to make any changes
> to the qdisc code. If i have a series of routers through which a packet
> traveses to its destination with the same qos parameters i also achieve
> the same results.


Did my example above convince you that this is not the case?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Tomas Winkler

On 6/11/07, jamal <[EMAIL PROTECTED]> wrote:

On Mon, 2007-11-06 at 17:30 +0300, Cohen, Guy wrote:

>
> For WiFi devices the HW often implements the scheduling, especially when
> QoS (WMM/11e/11n) is implemented. There are few traffic queues defined
> by the specs and the selection of the next queue to transmit a packet
> from, is determined in real time, just when there is a tx opportunity.
> This cannot be predicted in advance since it depends on the medium usage
> of other stations.

WMM is a strict prio mechanism.
The parametrization very much favors the high prio packets when the
tx opportunity to send shows up.



This is not true, there is no simple priority order from 1 to 4 ,
rather set of parameters that dermises access to medium.  You have to
emulate medium behavior to schedule packets in correct order. That's
why this pushed to HW, otherwise nobody would invest money in this
part of silicon :)


> Hence, to make it possible for wireless devices to use the qdisc
> mechanism properly, the HW queues should _ALL_ be non-empty at all
> times, whenever data is available in the upper layers.

agreed.

> Or in other
> words, the upper layers should not block a specific queue because of the
> usage of any other queue.

This is where we are going to disagree.
There is no way the stack will send the driver packets which are low
prio if there are some which are high prio. There is therefore, on
contention between low and high prio, no way for low prio packets to
obstruct the high prio packets; however, it is feasible that high prio
packets will obstruct low prio packets (which is fine).

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread jamal
On Mon, 2007-11-06 at 16:49 +0200, Patrick McHardy wrote:

> Let me explain with some ASCII art :)

Ok ;-> 

> We have n empty HW queues with a maximum length of m packets per queue:
> 
> [0] empty
> [1] empty
> [2] empty
> ..
> [n-1] empty
> 

Asumming 0 i take it is higher prio than n-1.

> Now we receive m - 1 packets for each all priorities >= 1 and < n - 1,
> so we have:
> 
> [0] empty
> [1] m - 1 packets
> [2] m - 1 packets
> ..
> [n-2] m - 1 packets
> [n] empty
> 
> Since no queue is completely full, the queue is still active.

and packets are being fired on the wire by the driver etc ...

> Now we receive m packets of priorty n:

n-1 (i think?)

> [0] empty
> [1] m - 1 packets
> [2] m - 1 packets
> ..
> [n-2] m - 1 packets
> [n-1] m packets
> 
> At this point the queue needs to be stopped since the highest
> priority queue is entirely full. 

ok, so 0 is lower prio than n-1 

> To start it again at least
> one packet of queue n - 1 needs to be sent, 

following so far ...

> which (assuming
> strict priority) requires that queues 1 to n - 2 are serviced
> first. 

Ok, so let me revert that; 0 is higher prio than n-1.

> So any prio 0 packets arriving during this period will
> sit in the qdisc and will not reach the device for a possibly
> quite long time. 

"possibly long time" is where we diverge ;->
If you throw the burden to the driver (as i am recommending in all my
arguements so far), it should open up sooner based on priorities.
I didnt wanna bring this earlier because it may take the discussion in
the wrong direction. 
So in your example if n-1 shuts down the driver, then it is upto to the
driver to open it up if any higher prio packet makes it out.

> With multiple queue states we'd know that
> queue 0 can still take packets.

And with what i described you dont make any such changes to the core;
the burden is on the driver.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread jamal
On Mon, 2007-11-06 at 18:00 +0300, Tomas Winkler wrote:
> On 6/11/07, jamal <[EMAIL PROTECTED]> wrote:
> > On Mon, 2007-11-06 at 17:30 +0300, Cohen, Guy wrote:
> >
> > >
> > > For WiFi devices the HW often implements the scheduling, especially when
> > > QoS (WMM/11e/11n) is implemented. There are few traffic queues defined
> > > by the specs and the selection of the next queue to transmit a packet
> > > from, is determined in real time, just when there is a tx opportunity.
> > > This cannot be predicted in advance since it depends on the medium usage
> > > of other stations.
> >
> > WMM is a strict prio mechanism.
> > The parametrization very much favors the high prio packets when the
> > tx opportunity to send shows up.
> >
> 
> This is not true, there is no simple priority order from 1 to 4 ,
> rather set of parameters that dermises access to medium.  You have to
> emulate medium behavior to schedule packets in correct order. That's
> why this pushed to HW, otherwise nobody would invest money in this
> part of silicon :)
> 

I dont have the specs neither am i arguing the value of having the
scheduler in hardware. (I think the over radio contention clearly
needs the scheduler in hardware).

But i have read a couple of papers on people simulating this in s/ware.
And have seen people describe the parametrization that is default,
example Slide 43 on:
http://madwifi.org/attachment/wiki/ChipsetFeatures/WMM/qos11e.pdf?format=raw
seems to indicate the default parameters for the different timers
is clearly strictly in favor of you if you have higher prio.
If the info quoted is correct, it doesnt change anything i have said so
far.
i.e it is strict prio scheduling with some statistical chance a low prio
packet will make it. 

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
jamal wrote:
> On Mon, 2007-11-06 at 16:49 +0200, Patrick McHardy wrote:
> 
>>We have n empty HW queues with a maximum length of m packets per queue:
>>
>>[0] empty
>>[1] empty
>>[2] empty
>>..
>>[n-1] empty
>
> 
> Asumming 0 i take it is higher prio than n-1.


Yes.

>>Now we receive m - 1 packets for each all priorities >= 1 and < n - 1,
>>so we have:
>>
>>[0] empty
>>[1] m - 1 packets
>>[2] m - 1 packets
>>..
>>[n-2] m - 1 packets
>>[n] empty
>>
>>Since no queue is completely full, the queue is still active.
>>Now we receive m packets of priorty n:
> 
> 
> n-1 (i think?)


Right.

>>[0] empty
>>[1] m - 1 packets
>>[2] m - 1 packets
>>..
>>[n-2] m - 1 packets
>>[n-1] m packets
>>
>>At this point the queue needs to be stopped since the highest
>>priority queue is entirely full. 
> 
> 
> ok, so 0 is lower prio than n-1 


Higher priority. But we don't know what the priority of the
next packet is going to be, so we have to stop the entire
qdisc anyway.

>>To start it again at least one packet of queue n - 1 needs to be sent, 
> 
> 
> following so far ...
> 
> 
>>which (assuming
>>strict priority) requires that queues 1 to n - 2 are serviced
>>first. 
> 
> 
> Ok, so let me revert that; 0 is higher prio than n-1.


Yes.

>>So any prio 0 packets arriving during this period will
>>sit in the qdisc and will not reach the device for a possibly
>>quite long time. 
> 
> 
> "possibly long time" is where we diverge ;->

Worst cast is (n - 2) * (m - 1) + 1 full sized packet transmission
times.

You can do the math yourself, but we're talking about potentially
a lot of packets.

> If you throw the burden to the driver (as i am recommending in all my
> arguements so far), it should open up sooner based on priorities.
> I didnt wanna bring this earlier because it may take the discussion in
> the wrong direction. 
> So in your example if n-1 shuts down the driver, then it is upto to the
> driver to open it up if any higher prio packet makes it out.


How could it do that? n-1 is still completely full and you don't
know what the next packet is going to be. Are you proposing to
simply throw the packet away in the driver even though its within
the configured limits of the qdisc?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread jamal
On Mon, 2007-11-06 at 17:12 +0200, Patrick McHardy wrote:


> > Ok, so let me revert that; 0 is higher prio than n-1.
> 
> 
> Yes.
> 

Ok, gotcha.
 
> > "possibly long time" is where we diverge ;->
> 
> Worst cast is (n - 2) * (m - 1) + 1 full sized packet transmission
> times.
> 
> You can do the math yourself, but we're talking about potentially
> a lot of packets.

I agree if you use the strategy of "a ring shutdown down" implies
"dont wake up until the ring that caused the shutdown opens up"
What i am saying below is to make a change to that strategy.

> > If you throw the burden to the driver (as i am recommending in all my
> > arguements so far), it should open up sooner based on priorities.
> > I didnt wanna bring this earlier because it may take the discussion in
> > the wrong direction. 
> > So in your example if n-1 shuts down the driver, then it is upto to the
> > driver to open it up if any higher prio packet makes it out.
> 
> 
> How could it do that? n-1 is still completely full and you don't
> know what the next packet is going to be. Are you proposing to
> simply throw the packet away in the driver even though its within
> the configured limits of the qdisc?

No no Patrick - i am just saying the following:
- let the driver shutdown whenever a ring is full. Remember which ring X
shut it down.
- when you get a tx interupt or prun tx descriptors, if a ring <= X has
transmitted a packet (or threshold of packets), then wake up the driver
(i.e open up). 
In the meantime packets from the stack are sitting on the qdisc and will
be sent when the driver opens up.

Anyways, I have to run to work; thanks for keeping the discussion at the
level you did.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC airo : wpa support

2007-06-11 Thread Dan Williams
On Sat, 2007-05-05 at 00:52 +0200, matthieu castet wrote: 
> Hi,
> 
> I attach a diff against 2.6.21 for adding wpa support for airo driver.
> In then end of 2005 I manage to make work wpa but the code was really 
> ugly. I manage to find some time to clean it.
> 
> 
> To support wpa, a new interface of the firmware should be used. This 
> interface is incompatible with the old interface and using both 
> interface at the same time make the firmware hang.
> 
> Porting OPEN and WEP mode to new interface need some driver rewrite, and 
> the old interface should be keep for the older cards (or the cards that 
> doesn't have newer firmware).

What's involved in doing this?  Do you have any pointers or sample code
that could be used here?

I'd assume that we'd just use AUTH_CIPHER_WEP/AUTH_CIPHER_NONE and
AUTH_KEY_MGMT_NONE in ConfigRidExtra.  But where does the WEP key
material go, does it get stuffed into the WpaKeyRid too or somewhere
else?

Dan

> That's why I didn't do it, and I added a module parameter for choosing 
> between the old driver behavior (with no wpa support) or the driver with 
> wpa only support.
> 
> 
> The wireless extension handlers are a bit ugly, I will be very happy, if 
> somebody could help me to clean them.
> 
> Any comments are appreciated.
> 
> 
> Matthieu
> 
> 
> PS : the lastest version of the driver can be found in 
> http://svn.gna.org/viewcvs/airo-wpa/branches/kernel/
> 
> PS2 : There is some remaining trace in the driver for debug purpose, 
> that will be removed in the final version

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Cohen, Guy
Some more details inside regarding wireless QoS.

jamal wrote:
> On Mon, 2007-11-06 at 17:30 +0300, Cohen, Guy wrote:
> 
> >
> > For WiFi devices the HW often implements the scheduling, especially
when
> > QoS (WMM/11e/11n) is implemented. There are few traffic queues
defined
> > by the specs and the selection of the next queue to transmit a
packet
> > from, is determined in real time, just when there is a tx
opportunity.
> > This cannot be predicted in advance since it depends on the medium
usage
> > of other stations.
> 
> WMM is a strict prio mechanism.
> The parametrization very much favors the high prio packets when the
> tx opportunity to send shows up.

Sorry, but this not as simple as you describe it. WMM is much more
complicated. WMM defines the HW queues as virtually multiple clients
that compete on the medium access individually. Each implements a
contention-based medium access. The Access Point publishes to the
clients the medium access parameters (e.g. back off parameters) that are
different for each access category (virtual client). There is _not_ a
strict priority assigned to each access category. The behavior of each
access category totally depends on the medium usage of other clients and
is totally different for each access category. This cannot be predicated
at the host SW.

> > Hence, to make it possible for wireless devices to use the qdisc
> > mechanism properly, the HW queues should _ALL_ be non-empty at all
> > times, whenever data is available in the upper layers.
> 
> agreed.
> 
> > Or in other
> > words, the upper layers should not block a specific queue because of
the
> > usage of any other queue.
> 
> This is where we are going to disagree.
> There is no way the stack will send the driver packets which are low
> prio if there are some which are high prio. There is therefore, on
> contention between low and high prio, no way for low prio packets to
> obstruct the high prio packets;

And this is not the right behavior for a WLAN stack. QoS in WLAN doesn't
favor strictly one access category over another, but defines some softer
and smarter prioritization. This is implemented in the HW/Firmware. I
just think that providing a per-queue controls (start/stop) will allow
WLAN drivers/Firmware/HW to do that while still using qdisc (and it will
work properly even when one queue is full and others are empty).

> however, it is feasible that high prio
> packets will obstruct low prio packets (which is fine).

No this is _not_ fine. Just to emphasize again, WMM doesn't define
priority in the way it is implemented in airplane boarding (Pilots
first, Business passengers next, couch passengers at the end), but more
like _distributed_ weights prioritization (between all the multiple
queues of all the clients on the channel).

As a side note, in one of the WFA WMM certification tests, the AP
changes the medium access parameters of the access categories in a way
that favors a lower access category. This is something very soft that
cannot be reflected in any intuitive way in the host SW.

> cheers,
> jamal
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
jamal wrote:
> On Mon, 2007-11-06 at 17:12 +0200, Patrick McHardy wrote:
> 
>>Worst cast is (n - 2) * (m - 1) + 1 full sized packet transmission
>>times.
>>
>>You can do the math yourself, but we're talking about potentially
>>a lot of packets.
> 
> 
> I agree if you use the strategy of "a ring shutdown down" implies
> "dont wake up until the ring that caused the shutdown opens up"
> What i am saying below is to make a change to that strategy.


Glad we agree on something. Now all I have to do is convince you that
a change to this strategy is not a good idea :)

>>>If you throw the burden to the driver (as i am recommending in all my
>>>arguements so far), it should open up sooner based on priorities.
>>>I didnt wanna bring this earlier because it may take the discussion in
>>>the wrong direction. 
>>>So in your example if n-1 shuts down the driver, then it is upto to the
>>>driver to open it up if any higher prio packet makes it out.
>>
>>
>>How could it do that? n-1 is still completely full and you don't
>>know what the next packet is going to be. Are you proposing to
>>simply throw the packet away in the driver even though its within
>>the configured limits of the qdisc?
> 
> 
> No no Patrick - i am just saying the following:
> - let the driver shutdown whenever a ring is full. Remember which ring X
> shut it down.
> - when you get a tx interupt or prun tx descriptors, if a ring <= X has
> transmitted a packet (or threshold of packets), then wake up the driver
> (i.e open up). 


At this point the qdisc might send new packets. What do you do when a
packet for a full ring arrives?

I see three choices:

- drop it, even though its still within the qdiscs configured limits
- requeue it, which does not work because the qdisc is still active
  and might just hand you the same packet over and over again in a
  busy loop, until the ring has more room (which has the same worst
  case, just that we're sitting in a busy loop now).
- requeue and stop the queue: we're back to where we started since
  now higher priority packets will not get passed to the driver.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Support send-to-self over external interfaces (and veths).

2007-06-11 Thread Ben Greear

Patrick McHardy wrote:

Ben Greear wrote:
  

This should also be useful with the pending 'veth' driver, as it
emulates two ethernet ports connected with a cross-over cable.

To make this work, you have to enable the sysctl (look Dave,
no IOCTLS, there might be hope for me yet!! :)), and in your
application you will need to use SO_BINDTODEVICE (and probably bind to
the local IP as well).  Some applications such as traceroute already
support this binding..others such as ping do not.

You most likely will also have to set up routing tables using
source IPs as a rule to direct these connections to a particular
routing table.

Comments welcome.




I would really prefer to simply make the prio 0 "lookup local"
rule deletable so you can rules with higher priority. That
allows to do send to self without any further code changes
and avoids the need to bind applications to a device.
  
I am not against making that change as well, but it is often easier to 
just bind-to-device
than to set up specific host routes for every possible combination..as 
it appears your
method requires.  (I could have mis-understood the routing requirements, 
but it seemed to
if you wanted any 100 interfaces to send to any other, your method would 
required 100 * 100

routes.)

A decent set of programs already support bind-to-device, and others are 
easily patched

if they need the behaviour.

Thanks,
Ben



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  



--
Ben Greear <[EMAIL PROTECTED]> 
Candela Technologies Inc  http://www.candelatech.com



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)

2007-06-11 Thread Milton Miller


On Jun 6, 2007, at 4:28 AM, Milton Miller wrote:



On Jun 5, 2007, at 9:26 PM, Kok, Auke wrote:


Kok, Auke wrote:


Hmm git-revert seems to do the job right. I checked it with git-show 
| patch -p1 -R and the results look OK. The two patches on top of 
the one we want to revert are unrelated enough to apply (manually it 
shows some fuzz, but otherwise it's OK).
Jeff, please `git-revert d52df4a35af569071fda3f4eb08e47cc7023f094` 
to revert the following patch for now:

---
commit  d52df4a35af569071fda3f4eb08e47cc7023f094
Author: Scott Feldman <[EMAIL PROTECTED]>
Date:   Wed Nov 9 02:18:52 2005 -0500
 [netdrvr e100] experiment with doing RX in a similar manner to 
eepro100
 I was going to say that eepro100's speedo_rx_link() does the 
same DMA
 abuse as e100, but then I noticed one little detail: eepro100 
sets  both
 EL (end of list) and S (suspend) bits in the RFD as it chains 
it  to the
 RFD list.  e100 was only setting the EL bit.  Hmmm, that's  
interesting.
 That means that if HW reads a RFD with the S-bit set,  it'll 
process
 that RFD and then suspend the receive unit.  The  receive unit 
will
 resume when SW clears the S-bit.  There is no need  for SW to 
restart
 the receive unit.  Which means a lot of the receive  unit state 
tracking

 code in the driver goes away.
 So here's a patch against 2.6.14.  (Sorry for inlining it; the 
mailer
 I'm using now will mess with the word wrap).  I can't test this 
on
 XScale (unless someone has an e100 module for Gumstix :) .  It 
should
 be doing exactly what eepro100 does with RFDs.  I don't believe 
this
 change will introduce a performance hit because the S-bit and 
EL-bit  go
 hand-in-hand meaning if we're going to suspend because of the 
S- bit,
 we're on the last resource anyway, so we'll have to wait for SW 
 to

 replenish.
 (cherry picked from 29e79da9495261119e3b2e4e7c72507348e75976 
commit)

---


A little bit more is needed to explain why we're reverting it for 
now. Jeff, please insert this into the revert commit.


Auke

--
This patch attempted to fix e100 for non-cache coherent memory 
architectures by using the cb style code that eepro100 had and using 
the EL and s bits from the RFD list. Unfortunately the hardware 
doesn't work exactly like this and therefore this patch actually 
breaks e100 on those systems.


on all systems.  (Both the &| typo and the removed restart logic).

Reverting the change brings it back to the previously known good 
state for 2.6.22. The pending rewrite in progress to this code can 
then be safely merged later.


Signed-off-by: Auke Kok <[EMAIL PROTECTED]>



Jeff I don't see this in netdev upstream-fixes.  Are you waiting on 
Auke to respond to this?


milton

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Support send-to-self over external interfaces (and veths).

2007-06-11 Thread Patrick McHardy
Ben Greear wrote:
> Patrick McHardy wrote:
> 
>> I would really prefer to simply make the prio 0 "lookup local"
>> rule deletable so you can rules with higher priority. That
>> allows to do send to self without any further code changes
>> and avoids the need to bind applications to a device.
>>   
> 
> I am not against making that change as well, but it is often easier to
> just bind-to-device
> than to set up specific host routes for every possible combination..as
> it appears your
> method requires.  (I could have mis-understood the routing requirements,
> but it seemed to
> if you wanted any 100 interfaces to send to any other, your method would
> required 100 * 100
> routes.)


If you want arbitary combinations bases solely on routing, yes
(thats not different from not using send to self). But binding
explicitly should also work fine.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] [RFC] NET: Implement a standard ndev_printk family

2007-06-11 Thread Kok, Auke

Stephen Hemminger wrote:

On Fri, 08 Jun 2007 16:42:31 -0700
"Kok, Auke" <[EMAIL PROTECTED]> wrote:


Stephen Hemminger wrote:
 
+#define ndev_printk(kern_level, netif_level, netdev, format, arg...) \

+   do { if ((netdev)->msg_enable & NETIF_MSG_##netif_level) { \
+   printk(kern_level "%s: " format, \
+   (netdev)->name, ## arg); } } while (0)


My preference would be something more like dev_printk or even use that?
You want to show both device name, and physical attachment in the message.


yes, agreed, but currently netdev->dev->bus_id is bluntly overwritten by 
net-sysfs.c making this not trivial:


+471:strlcpy(dev->bus_id, net->name, BUS_ID_SIZE);

so now netdev->dev->bus_id contains eth%d as well. I vaguely remember that there 
is another way to get the bus_id back, so I can likely work around it, but this 
particular overwriting of information is a BUG imo, and should be fixed.


Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
PJ Waskiewicz wrote:
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index f28bb2d..b9dc2a6 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -123,7 +123,8 @@ static inline int qdisc_restart(struct net_device *dev)
>   /* And release queue */
>   spin_unlock(&dev->queue_lock);
>  
> - if (!netif_queue_stopped(dev)) {
> + if (!netif_queue_stopped(dev) &&
> + !netif_subqueue_stopped(dev, skb->queue_mapping)) {
>   int ret;
>  
>   ret = dev_hard_start_xmit(skb, dev);


Your patch doesn't update any other users of netif_queue_stopped().
The assumption that they can pass packets to the driver when the
queue is running is no longer valid since they don't know whether
the subqueue the packet will end up in is active (it might be
different from queue 0 if packets were redirected from a multiqueue
aware qdisc through TC actions). So they need to be changed to
check the subqueue state as well.

BTW, I couldn't find anything but a single netif_wake_subqueue
in your (old) e1000 patch. Why doesn't it stop subqueues?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] [RFC] NET: Implement a standard ndev_printk family

2007-06-11 Thread Stephen Hemminger
On Mon, 11 Jun 2007 10:30:26 -0700
"Kok, Auke" <[EMAIL PROTECTED]> wrote:

> Stephen Hemminger wrote:
> > On Fri, 08 Jun 2007 16:42:31 -0700
> > "Kok, Auke" <[EMAIL PROTECTED]> wrote:
> > 
> >> Stephen Hemminger wrote:
>   
>  +#define ndev_printk(kern_level, netif_level, netdev, format, arg...) \
>  +do { if ((netdev)->msg_enable & NETIF_MSG_##netif_level) { \
>  +printk(kern_level "%s: " format, \
>  +(netdev)->name, ## arg); } } while (0)
> > 
> > My preference would be something more like dev_printk or even use that?
> > You want to show both device name, and physical attachment in the message.
> 
> yes, agreed, but currently netdev->dev->bus_id is bluntly overwritten by 
> net-sysfs.c making this not trivial:
> 
> +471:strlcpy(dev->bus_id, net->name, BUS_ID_SIZE);

That was because the bus_id is used as the name in /sys/class/net. The term
bus_id is overloaded as part of the confusion about device vs pseudo-device
that was part of the original sysfs design.

> so now netdev->dev->bus_id contains eth%d as well. I vaguely remember that 
> there 
> is another way to get the bus_id back, so I can likely work around it, but 
> this 
> particular overwriting of information is a BUG imo, and should be fixed.
> 
> Auke

If it is a pci_device use pci_name(pdev).

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
PJ Waskiewicz wrote:
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index e7367c7..8bcd870 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -215,6 +215,7 @@ typedef unsigned char *sk_buff_data_t;
>   *   @pkt_type: Packet class
>   *   @fclone: skbuff clone status
>   *   @ip_summed: Driver fed us an IP checksum
> + *   @queue_mapping: Queue mapping for multiqueue devices
>   *   @priority: Packet queueing priority
>   *   @users: User count - see {datagram,tcp}.c
>   *   @protocol: Packet protocol from driver
> @@ -269,6 +270,7 @@ struct sk_buff {
>   __u16   csum_offset;
>   };
>   };
> + __u16   queue_mapping;
>   __u32   priority;
>   __u8local_df:1,
>   cloned:1,


I think we can reuse skb->priority. Assuming only real hardware
devices use multiqueue support, there should be no user of
skb->priority after egress qdisc classification. The only reason
to preserve it in the qdisc layer is for software devices.

Grepping through drivers/net shows a few users, bot most seem
to be using it on the RX path and some use it to store internal
data.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Waskiewicz Jr, Peter P
> PJ Waskiewicz wrote:
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 
> > e7367c7..8bcd870 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -215,6 +215,7 @@ typedef unsigned char *sk_buff_data_t;
> >   * @pkt_type: Packet class
> >   * @fclone: skbuff clone status
> >   * @ip_summed: Driver fed us an IP checksum
> > + * @queue_mapping: Queue mapping for multiqueue devices
> >   * @priority: Packet queueing priority
> >   * @users: User count - see {datagram,tcp}.c
> >   * @protocol: Packet protocol from driver
> > @@ -269,6 +270,7 @@ struct sk_buff {
> > __u16   csum_offset;
> > };
> > };
> > +   __u16   queue_mapping;
> > __u32   priority;
> > __u8local_df:1,
> > cloned:1,
> 
> 
> I think we can reuse skb->priority. Assuming only real 
> hardware devices use multiqueue support, there should be no user of
> skb->priority after egress qdisc classification. The only reason
> to preserve it in the qdisc layer is for software devices.

That would be oustanding.

> Grepping through drivers/net shows a few users, bot most seem 
> to be using it on the RX path and some use it to store internal data.

Thank you for hunting this down.  I will test on my little environment
here to see if I run into any issues.

-PJ
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Waskiewicz Jr, Peter P
> PJ Waskiewicz wrote:
> > diff --git a/net/sched/sch_generic.c 
> b/net/sched/sch_generic.c index 
> > f28bb2d..b9dc2a6 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -123,7 +123,8 @@ static inline int qdisc_restart(struct 
> net_device *dev)
> > /* And release queue */
> > spin_unlock(&dev->queue_lock);
> >  
> > -   if (!netif_queue_stopped(dev)) {
> > +   if (!netif_queue_stopped(dev) &&
> > +   !netif_subqueue_stopped(dev, 
> skb->queue_mapping)) {
> > int ret;
> >  
> > ret = dev_hard_start_xmit(skb, dev);
> 
> 
> Your patch doesn't update any other users of netif_queue_stopped().
> The assumption that they can pass packets to the driver when 
> the queue is running is no longer valid since they don't know 
> whether the subqueue the packet will end up in is active (it 
> might be different from queue 0 if packets were redirected 
> from a multiqueue aware qdisc through TC actions). So they 
> need to be changed to check the subqueue state as well.

I will look at all these cases and change them accordingly.  Thanks for
catching that.

> BTW, I couldn't find anything but a single 
> netif_wake_subqueue in your (old) e1000 patch. Why doesn't it 
> stop subqueues?

A previous e1000 patch stopped subqueues.  The last e1000 patch I sent
to the list doesn't stop them, and that's a problem with that patch; it
was sent purely to show how the alloc_etherdev_mq() stuff worked, but I
missed the subqueue control.  I can fix that and send an updated patch
if you'd like.  The reason I missed it is we maintain an out-of-tree
driver and an in-tree driver, and mixing/matching code between the two
becomes a bit of a juggling act sometimes when doing little engineering
snippits.

Thanks for reviewing these.  I'll repost something with updates from
your feedback.

Cheers,
-PJ Waskiewicz
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
Waskiewicz Jr, Peter P wrote:
>>I think we can reuse skb->priority. Assuming only real 
>>hardware devices use multiqueue support, there should be no user of
>>skb->priority after egress qdisc classification. The only reason
>>to preserve it in the qdisc layer is for software devices.
> 
> 
> That would be oustanding.
> 
> 
>>Grepping through drivers/net shows a few users, bot most seem 
>>to be using it on the RX path and some use it to store internal data.
> 
> 
> Thank you for hunting this down.  I will test on my little environment
> here to see if I run into any issues.


I think grepping will help more than testing :)

The only issue I can see is that packets going to a multiqueue device
that doesn't have a multiqueue aware qdisc attached will get a random
value. So you would have to conditionally reset it before ->enqueue.

Another question is what to do about other hard_start_xmit callers.
Independant of which field is used, should the classification that
may have happend on a different device be retained (TC actions again)?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
Waskiewicz Jr, Peter P wrote:
>>BTW, I couldn't find anything but a single 
>>netif_wake_subqueue in your (old) e1000 patch. Why doesn't it 
>>stop subqueues?
> 
> 
> A previous e1000 patch stopped subqueues.  The last e1000 patch I sent
> to the list doesn't stop them, and that's a problem with that patch; it
> was sent purely to show how the alloc_etherdev_mq() stuff worked, but I
> missed the subqueue control.  I can fix that and send an updated patch
> if you'd like.  The reason I missed it is we maintain an out-of-tree
> driver and an in-tree driver, and mixing/matching code between the two
> becomes a bit of a juggling act sometimes when doing little engineering
> snippits.
> 
> Thanks for reviewing these.  I'll repost something with updates from
> your feedback.


Thanks, I do have some more comments, but a repost with the patches
split up in infrastructure changes, qdisc changes one patch per qdisc
and the e1000 patch would make that easier.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/15] spidernet driver bug fixes

2007-06-11 Thread Linas Vepstas
On Fri, Jun 08, 2007 at 01:20:20PM -0400, Jeff Garzik wrote:
> On Fri, Jun 08, 2007 at 12:06:08PM -0500, Linas Vepstas wrote:
> > On Fri, Jun 08, 2007 at 11:12:31AM +1000, Michael Ellerman wrote:
> > > On Thu, 2007-06-07 at 14:17 -0500, Linas Vepstas wrote:
> > > > 
> > > > The major bug fixes are: 
> > > I realise it's late, but shouldn't "major bugfixes" be going into 22 ?
> > Yeah, I suppose, I admit I've lost track of the process. 
>
> You need to order your bug fixes first in the queue. 

OK, here are the patches, re-ordered. There is a different number
than last time, as I threw out one, merged one, and got cold feet
on a third one.  They still pass the tests.

The first five patches focus on three serious bugs, fixing crashes or
hangs.

-- patch 1 -- kernel crash when ifdown while receiving packets.
-- patch 2,3,4 -- device driver deadlocks on "RX ram full" mesgs.
  (kernel stays up, ifdown/up clear the problem).
-- patch 5 -- misconfigured TX interrupts results in 3x-4x per
  degradation for small packets.

-- patch 6 -- rx stats may be mangled
-- patch 7 -- hw checksum sometimes breaks ipv6 operation

-- patches 8-15 -- misc tweaks, and documentation.


I re-ran my stress tests with patches 1-7 applied; they pass.

I suggest that patches 1-5 or 1-7 be applied asap.

--linas
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Waskiewicz Jr, Peter P
> I think grepping will help more than testing :)
> 
> The only issue I can see is that packets going to a 
> multiqueue device that doesn't have a multiqueue aware qdisc 
> attached will get a random value. So you would have to 
> conditionally reset it before ->enqueue.

I currently clear queue_mapping before ->enqueue().  Perhaps keeping
queue_mapping in there might solve needing a conditional in the hotpath.
Let me think about this one.

> Another question is what to do about other hard_start_xmit callers.
> Independant of which field is used, should the classification 
> that may have happend on a different device be retained (TC 
> actions again)?

The two cases I can think of here are ip forwarding and bonding.  In the
case of bonding, things should be fine since the bonded device will only
have one "ring."  Therefore if the underlying slave devices are
heterogenous, there shouldn't be a problem retaining the previous TC
classification; if the device has its own qdisc and classifiers, it can
override it.

For ip forwarding, I believe it will also be ok since ultimately the
device doing the last transmit will have his classifiers applied and
remap skb's if necessary.  Either way, before it gets enqueued through
dev_queue_xmit(), the value will get cleared, so having an artifact
laying around won't be possible.

If that's not what you're referring to, please let me know.

Thanks,
-PJ Waskiewicz
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/15] spidernet: null out skb pointer after its been used.

2007-06-11 Thread Linas Vepstas

Avoid kernel crash in mm/slab.c due to double-free of pointer.

If the ethernet interface is brought down while there is still
RX traffic in flight, the device shutdown routine can end up
trying to double-free an skb, leading to a crash in mm/slab.c
Avoid the double-free by nulling out the skb pointer.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-08 
15:45:33.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-08 15:48:10.0 
-0500
@@ -1131,6 +1131,7 @@ spider_net_decode_one_descr(struct spide
 
/* Ok, we've got a packet in descr */
spider_net_pass_skb_up(descr, card);
+   descr->skb = NULL;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
return 1;
 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/15] spidernet: Don't terminate the RX ring

2007-06-11 Thread Linas Vepstas


The terminated RX ring will cause trouble during the RX ram full
conditions, leading to a hung driver, as the hardware can't find
the next descr.  There is no real reason to terminate the RX ring; 
it doesn't make the operation any smooother, and it does
require an extra sync. So don't do it.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-08 
17:35:33.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-08 17:36:19.0 
-0500
@@ -460,13 +460,9 @@ spider_net_prepare_rx_descr(struct spide
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
} else {
hwdescr->buf_addr = buf;
-   hwdescr->next_descr_addr = 0;
wmb();
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_CARDOWNED |
 SPIDER_NET_DMAC_NOINTR_COMPLETE;
-
-   wmb();
-   descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
}
 
return 0;
@@ -541,12 +537,16 @@ spider_net_refill_rx_chain(struct spider
 static int
 spider_net_alloc_rx_skbs(struct spider_net_card *card)
 {
-   int result;
-   struct spider_net_descr_chain *chain;
+   struct spider_net_descr_chain *chain = &card->rx_chain;
+   struct spider_net_descr *start = chain->tail;
+   struct spider_net_descr *descr = start;
 
-   result = -ENOMEM;
+   /* Link up the hardware chain pointers */
+   do {
+   descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
+   descr = descr->next;
+   } while (descr != start);
 
-   chain = &card->rx_chain;
/* Put at least one buffer into the chain. if this fails,
 * we've got a problem. If not, spider_net_refill_rx_chain
 * will do the rest at the end of this function. */
@@ -563,7 +563,7 @@ spider_net_alloc_rx_skbs(struct spider_n
 
 error:
spider_net_free_rx_chain_contents(card);
-   return result;
+   return -ENOMEM;
 }
 
 /**
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] myri10ge updates

2007-06-11 Thread Brice Goglin
Hi Jeff,

Here are 3 minor updates of myri10ge for 2.6.22. The first one is a
reduced version of the earlier nacked patch, it now limits the number of
recoveries (to detect bad nics) without making the limit tunable.

1. limit the number of recoveries
2. report when the link partner is running in Myrinet mode
3. update driver version

Please apply.
Thanks!
Brice

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] myri10ge: limit the number of recoveries

2007-06-11 Thread Brice Goglin
Limit the number of recoveries from a NIC hw watchdog reset to 1 by default.
It enables detection of defective NICs immediately since these memory parity
errors are expected to happen very rarely (less than once per century*NIC).

Signed-off-by: Brice Goglin <[EMAIL PROTECTED]>
---
 drivers/net/myri10ge/myri10ge.c |   12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Index: linux-rc/drivers/net/myri10ge/myri10ge.c
===
--- linux-rc.orig/drivers/net/myri10ge/myri10ge.c   2007-05-30 
20:58:22.0 +0200
+++ linux-rc/drivers/net/myri10ge/myri10ge.c2007-06-04 19:01:54.0 
+0200
@@ -279,6 +279,8 @@
 module_param(myri10ge_fill_thresh, int, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(myri10ge_fill_thresh, "Number of empty rx slots allowed\n");
 
+static int myri10ge_reset_recover = 1;
+
 static int myri10ge_wcfifo = 0;
 module_param(myri10ge_wcfifo, int, S_IRUGO);
 MODULE_PARM_DESC(myri10ge_wcfifo, "Enable WC Fifo when WC is enabled\n");
@@ -2730,8 +2732,14 @@
 * For now, just report it */
reboot = myri10ge_read_reboot(mgp);
printk(KERN_ERR
-  "myri10ge: %s: NIC rebooted (0x%x), resetting\n",
-  mgp->dev->name, reboot);
+  "myri10ge: %s: NIC rebooted (0x%x),%s resetting\n",
+  mgp->dev->name, reboot,
+  myri10ge_reset_recover ? " " : " not");
+   if (myri10ge_reset_recover == 0)
+   return;
+
+   myri10ge_reset_recover--;
+
/*
 * A rebooted nic will come back with config space as
 * it was after power was applied to PCIe bus.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] myri10ge: report when the link partner is running in Myrinet mode

2007-06-11 Thread Brice Goglin
Since Myri-10G boards may also run in Myrinet mode instead of Ethernet,
add a message when we detect that the link partner is not running in the
right mode.

Signed-off-by: Brice Goglin <[EMAIL PROTECTED]>
---
 drivers/net/myri10ge/myri10ge.c |   15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

Index: linux-rc/drivers/net/myri10ge/myri10ge.c
===
--- linux-rc.orig/drivers/net/myri10ge/myri10ge.c   2007-06-05 
08:25:05.0 +0200
+++ linux-rc/drivers/net/myri10ge/myri10ge.c2007-06-11 20:04:57.0 
+0200
@@ -1156,9 +1156,11 @@
struct mcp_irq_data *stats = mgp->fw_stats;
 
if (unlikely(stats->stats_updated)) {
-   if (mgp->link_state != stats->link_up) {
-   mgp->link_state = stats->link_up;
-   if (mgp->link_state) {
+   unsigned link_up = ntohl(stats->link_up);
+   if (mgp->link_state != link_up) {
+   mgp->link_state = link_up;
+
+   if (mgp->link_state == MXGEFW_LINK_UP) {
if (netif_msg_link(mgp))
printk(KERN_INFO
   "myri10ge: %s: link up\n",
@@ -1168,8 +1170,11 @@
} else {
if (netif_msg_link(mgp))
printk(KERN_INFO
-  "myri10ge: %s: link down\n",
-  mgp->dev->name);
+  "myri10ge: %s: link %s\n",
+  mgp->dev->name,
+  (link_up == MXGEFW_LINK_MYRINET ?
+   "mismatch (Myrinet detected)" :
+   "down"));
netif_carrier_off(mgp->dev);
mgp->link_changes++;
}


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
Waskiewicz Jr, Peter P wrote:
>>Another question is what to do about other hard_start_xmit callers.
>>Independant of which field is used, should the classification 
>>that may have happend on a different device be retained (TC 
>>actions again)?
> 
> 
> [...] Either way, before it gets enqueued through
> dev_queue_xmit(), the value will get cleared, so having an artifact
> laying around won't be possible.


You're right, I was thinking of a case where a packet would
be redirected from a multiqueue device to another one and
then not go through dev_queue_xmit but some other path to
hard_start_xmit that doesn't update the classification.
But there is no case like this.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] myri10ge: update driver version

2007-06-11 Thread Brice Goglin
Update myri10ge driver version to 1.3.1-1.248.

Signed-off-by: Brice Goglin <[EMAIL PROTECTED]>
---
 drivers/net/myri10ge/myri10ge.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-rc/drivers/net/myri10ge/myri10ge.c
===
--- linux-rc.orig/drivers/net/myri10ge/myri10ge.c   2007-06-11 
20:04:57.0 +0200
+++ linux-rc/drivers/net/myri10ge/myri10ge.c2007-06-11 20:05:09.0 
+0200
@@ -71,7 +71,7 @@
 #include "myri10ge_mcp.h"
 #include "myri10ge_mcp_gen_header.h"
 
-#define MYRI10GE_VERSION_STR "1.3.0-1.233"
+#define MYRI10GE_VERSION_STR "1.3.1-1.248"
 
 MODULE_DESCRIPTION("Myricom 10G driver (10GbE)");
 MODULE_AUTHOR("Maintainer: [EMAIL PROTECTED]");


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/15] spidernet: silence the ramfull messages

2007-06-11 Thread Linas Vepstas

Although the previous patch resolved issues with hangs when the
RX ram full interrupt is encountered, there are still situations
where lots of RX ramfull interrupts arrive, resulting in a noisy
log in syslog. There is no need for this.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   20 +++-
 drivers/net/spider_net.h |1 +
 2 files changed, 12 insertions(+), 9 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 
10:02:34.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 11:45:25.0 
-0500
@@ -1172,7 +1172,7 @@ spider_net_decode_one_descr(struct spide
goto bad_desc;
}
 
-   if (hwdescr->dmac_cmd_status & 0xfefe) {
+   if (hwdescr->dmac_cmd_status & 0xfcf4) {
pr_err("%s: bad status, cmd_status=x%08x\n",
   card->netdev->name,
   hwdescr->dmac_cmd_status);
@@ -1251,6 +1251,7 @@ spider_net_poll(struct net_device *netde
if (no_more_packets) {
netif_rx_complete(netdev);
spider_net_rx_irq_on(card);
+   card->ignore_rx_ramfull = 0;
return 0;
}
 
@@ -1521,15 +1522,15 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GRFBFLLINT: /* fallthrough */
case SPIDER_NET_GRFAFLLINT: /* fallthrough */
case SPIDER_NET_GRMFLLINT:
-   if (netif_msg_intr(card) && net_ratelimit())
-   pr_err("Spider RX RAM full, incoming packets "
-  "might be discarded!\n");
/* Could happen when rx chain is full */
-   spider_net_resync_head_ptr(card);
-   spider_net_refill_rx_chain(card);
-   spider_net_enable_rxdmac(card);
-   card->num_rx_ints ++;
-   netif_rx_schedule(card->netdev);
+   if (card->ignore_rx_ramfull == 0) {
+   card->ignore_rx_ramfull = 1;
+   spider_net_resync_head_ptr(card);
+   spider_net_refill_rx_chain(card);
+   spider_net_enable_rxdmac(card);
+   card->num_rx_ints ++;
+   netif_rx_schedule(card->netdev);
+   }
show_error = 0;
break;
 
@@ -2305,6 +2306,7 @@ spider_net_setup_netdev(struct spider_ne
 
netdev->irq = card->pdev->irq;
card->num_rx_ints = 0;
+   card->ignore_rx_ramfull = 0;
 
dn = pci_device_to_OF_node(card->pdev);
if (!dn)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h  2007-06-11 
10:02:25.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h   2007-06-11 11:45:50.0 
-0500
@@ -462,6 +462,7 @@ struct spider_net_card {
atomic_t tx_timeout_task_counter;
wait_queue_head_t waitq;
int num_rx_ints;
+   int ignore_rx_ramfull;
 
/* for ethtool */
int msg_enable;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/15] spidernet: turn off descriptor chain end interrupt.

2007-06-11 Thread Linas Vepstas

At some point, the transmit descriptor chain end interrupt (TXDCEINT)
was turned on. This is a mistake; and it damages small packet
transmit performance, as it results in a huge storm of interrupts.  
Turn it off.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h  2007-06-08 
17:40:02.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h   2007-06-08 17:40:05.0 
-0500
@@ -222,6 +222,7 @@ extern char spider_net_driver_name[];
 #define SPIDER_NET_GDTBSTA 0x0300
 #define SPIDER_NET_GDTDCEIDIS  0x0002
 #define SPIDER_NET_DMA_TX_VALUESPIDER_NET_TX_DMA_EN | \
+   SPIDER_NET_GDTDCEIDIS | \
SPIDER_NET_GDTBSTA
 
 #define SPIDER_NET_DMA_TX_FEND_VALUE   0x00030003
@@ -332,8 +333,7 @@ enum spider_net_int2_status {
SPIDER_NET_GRISPDNGINT
 };
 
-#define SPIDER_NET_TXINT   ( (1 << SPIDER_NET_GDTFDCINT) | \
- (1 << SPIDER_NET_GDTDCEINT) )
+#define SPIDER_NET_TXINT   (1 << SPIDER_NET_GDTFDCINT)
 
 /* We rely on flagged descriptor interrupts */
 #define SPIDER_NET_RXINT   ( (1 << SPIDER_NET_GDAFDCINT) )
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/15] spidernet: Cure RX ram full bug

2007-06-11 Thread Linas Vepstas

This patch fixes a rare deadlock that can occur when the kernel
is not able to empty out the RX ring quickly enough. Below follows
a detailed description of the bug and the fix.

As long as the OS can empty out the RX buffers at a rate faster than
the hardware can fill them, there is no problem. If, for some reason,
the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
pointer will catch up to the head, notice the not-empty condition,
ad stop. However, RX packets may still continue arriving on the wire.
The spidernet chip can save some limited number of these in local RAM.
When this local ram fills up, the spider chip will issue an interrupt
indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
will be set in GHIINT1STS).  When te RX ram full condition occurs, 
a certain bug/feature is triggered that has to be specially handled. 
This section describes the special handling for this condition.

When the OS finally has a chance to run, it will empty out the RX ring.
In particular, it will clear the descriptor on which the hardware had
stopped. However, once the hardware has decided that a certain
descriptor is invalid, it will not restart at that descriptor; instead
it will restart at the next descr. This potentially will lead to a 
deadlock condition, as the tail pointer will be pointing at this descr, 
which, from the OS point of view, is empty; the OS will be waiting for 
this descr to be filled. However, the hardware has skipped this descr, 
and is filling the next descrs. Since the OS doesn't see this, there
is a potential deadlock, with the OS waiting for one descr to fill, 
while the hardware is waiting for a differen set of descrs to become
empty.

A call to show_rx_chain() at this point indicates the nature of the
problem. A typical print when the network is hung shows the following:

net eth1: Spider RX RAM full, incoming packets might be discarded!
net eth1: Total number of descrs=256
net eth1: Chain tail located at descr=255
net eth1: Chain head is at 255
net eth1: HW curr desc (GDACTDPA) is at 0
net eth1: Have 1 descrs with stat=xa080
net eth1: HW next desc (GDACNEXTDA) is at 1
net eth1: Have 127 descrs with stat=x40800101
net eth1: Have 1 descrs with stat=x4081
net eth1: Have 126 descrs with stat=x40800101
net eth1: Last 1 descrs with stat=xa080

Both the tail and head pointers are pointing at descr 255, which is
marked xa... which is "empty". Thus, from the OS point of view, there
is nothing to be done. In particular, there is the implicit assumption
that everything in front of the "empty" descr must surely also be empty,
as explained in the last section. The OS is waiting for descr 255 to
become non-empty, which, in this case, will never happen.

The HW pointer is at descr 0. This descr is marked 0x4.. or "full". 
Since its already full, the hardware can do nothing more, and thus has
halted processing. Notice that descrs 0 through 254 are all marked
"full", while descr 254 and 255 are empty. (The "Last 1 descrs" is 
descr 254, since tail was at 255.) Thus, the system is deadlocked, 
and there can be no forward progress; the OS thinks there's nothing 
to do, and the hardware has nowhere to put incoming data.

This bug/feature is worked around with the spider_net_resync_head_ptr()
routine. When the driver receives RX interrupts, but an examination
of the RX chain seems to show it is empty, then it is probable that
the hardware has skipped a descr or two (sometimes dozens under heavy
network conditions). The spider_net_resync_head_ptr() subroutine will
search the ring for the next full descr, and the driver will resume
operations there.  Since this will leave "holes" in the ring, there
is also a spider_net_resync_tail_ptr() that will skip over such holes. 


Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   86 +++
 drivers/net/spider_net.h |3 +
 2 files changed, 82 insertions(+), 7 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-08 
15:48:10.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 10:02:12.0 
-0500
@@ -1051,6 +1051,66 @@ static void show_rx_chain(struct spider_
 #endif
 
 /**
+ * spider_net_resync_head_ptr - Advance head ptr past empty descrs
+ *
+ * If the driver fails to keep up and empty the queue, then the
+ * hardware wil run out of room to put incoming packets. This
+ * will cause the hardware to skip descrs that are full (instead
+ * of halting/retrying). Thus, once the driver runs, it wil need
+ * to "catch up" to where the hardware chain pointer is at.
+ */
+static void spider_net_resync_head_ptr(struct spider_net_card *card)
+{
+   unsigned long flags;
+   struct spider_net_descr_chain *chain = &card->rx_chain;
+   struct spider_net_descr *descr;
+   int i, sta

[PATCH 6/15] spidernet: skb used after netif_receive_skb

2007-06-11 Thread Linas Vepstas

From: Florin Malita <[EMAIL PROTECTED]>

The stats update code in spider_net_pass_skb_up() is touching the skb 
after it's been passed up to the stack. To avoid that, just update the 
stats first.

Signed-off-by: Florin Malita <[EMAIL PROTECTED]>
Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>



 drivers/net/spider_net.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/spider_net.c b/drivers/net/spider_net.c
index 108adbf..1df2f0b 100644
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-08 
17:40:02.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-08 17:40:09.0 
-0500
@@ -1014,12 +1014,12 @@ spider_net_pass_skb_up(struct spider_net
 */
}
 
-   /* pass skb up to stack */
-   netif_receive_skb(skb);
-
/* update netdevice statistics */
card->netdev_stats.rx_packets++;
card->netdev_stats.rx_bytes += skb->len;
+
+   /* pass skb up to stack */
+   netif_receive_skb(skb);
 }
 
 #ifdef DEBUG
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/15] spidernet: checksum and ethtool

2007-06-11 Thread Linas Vepstas

From: Stephen Hemminger <[EMAIL PROTECTED]>

It doesn't look like spidernet hardware can really checksum all protocols,
the code looks like it does IPV4 only.  If so, it should use NETIF_F_IP_CSUM
instead of NETIF_F_HW_CSUM.

The driver doesn't need it's own get/set for ethtool tx csum, and it
should use the standard ethtool_op_get_link.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>

-

 drivers/net/spider_net.c |4 ++--
 drivers/net/spider_net_ethtool.c |   21 +++--
 2 files changed, 5 insertions(+), 20 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-08 
17:28:55.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-08 17:28:58.0 
-0500
@@ -718,7 +718,7 @@ spider_net_prepare_tx_descr(struct spide
SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
spin_unlock_irqrestore(&chain->lock, flags);
 
-   if (skb->protocol == htons(ETH_P_IP) && skb->ip_summed == 
CHECKSUM_PARTIAL)
+   if (skb->ip_summed == CHECKSUM_PARTIAL)
switch (ip_hdr(skb)->protocol) {
case IPPROTO_TCP:
hwdescr->dmac_cmd_status |= SPIDER_NET_DMAC_TCP;
@@ -2300,7 +2300,7 @@ spider_net_setup_netdev(struct spider_ne
 
spider_net_setup_netdev_ops(netdev);
 
-   netdev->features = NETIF_F_HW_CSUM | NETIF_F_LLTX;
+   netdev->features = NETIF_F_IP_CSUM | NETIF_F_LLTX;
/* some time: NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX |
 *  NETIF_F_HW_VLAN_FILTER */
 
Index: linux-2.6.22-rc1/drivers/net/spider_net_ethtool.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net_ethtool.c  2007-06-08 
17:27:01.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net_ethtool.c   2007-06-08 
17:28:58.0 -0500
@@ -134,22 +134,6 @@ spider_net_ethtool_set_rx_csum(struct ne
return 0;
 }
 
-static uint32_t
-spider_net_ethtool_get_tx_csum(struct net_device *netdev)
-{
-return (netdev->features & NETIF_F_HW_CSUM) != 0;
-}
-
-static int
-spider_net_ethtool_set_tx_csum(struct net_device *netdev, uint32_t data)
-{
-if (data)
-netdev->features |= NETIF_F_HW_CSUM;
-else
-netdev->features &= ~NETIF_F_HW_CSUM;
-
-return 0;
-}
 
 static void
 spider_net_ethtool_get_ringparam(struct net_device *netdev,
@@ -200,11 +184,12 @@ const struct ethtool_ops spider_net_etht
.get_wol= spider_net_ethtool_get_wol,
.get_msglevel   = spider_net_ethtool_get_msglevel,
.set_msglevel   = spider_net_ethtool_set_msglevel,
+   .get_link   = ethtool_op_get_link,
.nway_reset = spider_net_ethtool_nway_reset,
.get_rx_csum= spider_net_ethtool_get_rx_csum,
.set_rx_csum= spider_net_ethtool_set_rx_csum,
-   .get_tx_csum= spider_net_ethtool_get_tx_csum,
-   .set_tx_csum= spider_net_ethtool_set_tx_csum,
+   .get_tx_csum= ethtool_op_get_tx_csum,
+   .set_tx_csum= ethtool_op_set_tx_csum,
.get_ringparam  = spider_net_ethtool_get_ringparam,
.get_strings= spider_net_get_strings,
.get_stats_count= spider_net_get_stats_count,
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 8/15] spidernet: beautify error messages

2007-06-11 Thread Linas Vepstas

Use dev_err() to print device error messages.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   64 ---
 1 file changed, 34 insertions(+), 30 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 
13:09:46.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 13:11:29.0 
-0500
@@ -434,7 +434,8 @@ spider_net_prepare_rx_descr(struct spide
  bufsize + SPIDER_NET_RXBUF_ALIGN - 1);
if (!descr->skb) {
if (netif_msg_rx_err(card) && net_ratelimit())
-   pr_err("Not enough memory to allocate rx buffer\n");
+   dev_err(&card->netdev->dev,
+   "Not enough memory to allocate rx buffer\n");
card->spider_stats.alloc_rx_skb_error++;
return -ENOMEM;
}
@@ -455,7 +456,7 @@ spider_net_prepare_rx_descr(struct spide
dev_kfree_skb_any(descr->skb);
descr->skb = NULL;
if (netif_msg_rx_err(card) && net_ratelimit())
-   pr_err("Could not iommu-map rx buffer\n");
+   dev_err(&card->netdev->dev, "Could not iommu-map rx 
buffer\n");
card->spider_stats.rx_iommu_map_error++;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
} else {
@@ -692,7 +693,7 @@ spider_net_prepare_tx_descr(struct spide
buf = pci_map_single(card->pdev, skb->data, skb->len, PCI_DMA_TODEVICE);
if (pci_dma_mapping_error(buf)) {
if (netif_msg_tx_err(card) && net_ratelimit())
-   pr_err("could not iommu-map packet (%p, %i). "
+   dev_err(&card->netdev->dev, "could not iommu-map packet 
(%p, %i). "
  "Dropping packet\n", skb->data, skb->len);
card->spider_stats.tx_iommu_map_error++;
return -ENOMEM;
@@ -832,9 +833,8 @@ spider_net_release_tx_chain(struct spide
case SPIDER_NET_DESCR_PROTECTION_ERROR:
case SPIDER_NET_DESCR_FORCE_END:
if (netif_msg_tx_err(card))
-   pr_err("%s: forcing end of tx descriptor "
-  "with status x%02x\n",
-  card->netdev->name, status);
+   dev_err(&card->netdev->dev, "forcing end of tx 
descriptor "
+  "with status x%02x\n", status);
card->netdev_stats.tx_errors++;
break;
 
@@ -1147,8 +1147,8 @@ spider_net_decode_one_descr(struct spide
 (status == SPIDER_NET_DESCR_PROTECTION_ERROR) ||
 (status == SPIDER_NET_DESCR_FORCE_END) ) {
if (netif_msg_rx_err(card))
-   pr_err("%s: dropping RX descriptor with state %d\n",
-  card->netdev->name, status);
+   dev_err(&card->netdev->dev,
+  "dropping RX descriptor with state %d\n", 
status);
card->netdev_stats.rx_dropped++;
goto bad_desc;
}
@@ -1156,8 +1156,8 @@ spider_net_decode_one_descr(struct spide
if ( (status != SPIDER_NET_DESCR_COMPLETE) &&
 (status != SPIDER_NET_DESCR_FRAME_END) ) {
if (netif_msg_rx_err(card))
-   pr_err("%s: RX descriptor with unknown state %d\n",
-  card->netdev->name, status);
+   dev_err(&card->netdev->dev,
+  "RX descriptor with unknown state %d\n", status);
card->spider_stats.rx_desc_unk_state++;
goto bad_desc;
}
@@ -1165,16 +1165,15 @@ spider_net_decode_one_descr(struct spide
/* The cases we'll throw away the packet immediately */
if (hwdescr->data_error & SPIDER_NET_DESTROY_RX_FLAGS) {
if (netif_msg_rx_err(card))
-   pr_err("%s: error in received descriptor found, "
+   dev_err(&card->netdev->dev,
+  "error in received descriptor found, "
   "data_status=x%08x, data_error=x%08x\n",
-  card->netdev->name,
   hwdescr->data_status, hwdescr->data_error);
goto bad_desc;
}
 
if (hwdescr->dmac_cmd_status & 0xfcf4) {
-   pr_err("%s: bad status, cmd_status=x%08x\n",
-  card->netdev->name,
+   dev_err(&card->netdev->dev, "bad status, cmd_status=x%08x\n",
   hwdescr->dmac_cmd_status);
pr_err("buf_add

[PATCH 9/15] spidernet: enhance the dump routine

2007-06-11 Thread Linas Vepstas

Crazy device problems are hard to debug, when one does not have
good trace info. This patch makes a major enhancement to the
device dump routine.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   78 ++-
 1 file changed, 70 insertions(+), 8 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 
11:50:03.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 11:51:19.0 
-0500
@@ -1022,34 +1022,94 @@ spider_net_pass_skb_up(struct spider_net
netif_receive_skb(skb);
 }
 
-#ifdef DEBUG
 static void show_rx_chain(struct spider_net_card *card)
 {
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *start= chain->tail;
struct spider_net_descr *descr= start;
+   struct spider_net_hw_descr *hwd = start->hwdescr;
+   struct device *dev = &card->netdev->dev;
+   u32 curr_desc, next_desc;
int status;
 
+   int tot = 0;
int cnt = 0;
-   int cstat = spider_net_get_descr_status(descr);
-   printk(KERN_INFO "RX chain tail at descr=%ld\n",
-(start - card->descr) - card->tx_chain.num_desc);
+   int off = start - chain->ring;
+   int cstat = hwd->dmac_cmd_status;
+
+   dev_info(dev, "Total number of descrs=%d\n",
+   chain->num_desc);
+   dev_info(dev, "Chain tail located at descr=%d, status=0x%x\n",
+   off, cstat);
+
+   curr_desc = spider_net_read_reg(card, SPIDER_NET_GDACTDPA);
+   next_desc = spider_net_read_reg(card, SPIDER_NET_GDACNEXTDA);
+
status = cstat;
do
{
-   status = spider_net_get_descr_status(descr);
+   hwd = descr->hwdescr;
+   off = descr - chain->ring;
+   status = hwd->dmac_cmd_status;
+
+   if (descr == chain->head)
+   dev_info(dev, "Chain head is at %d, head status=0x%x\n",
+off, status);
+
+   if (curr_desc == descr->bus_addr)
+   dev_info(dev, "HW curr desc (GDACTDPA) is at %d, 
status=0x%x\n",
+off, status);
+
+   if (next_desc == descr->bus_addr)
+   dev_info(dev, "HW next desc (GDACNEXTDA) is at %d, 
status=0x%x\n",
+off, status);
+
+   if (hwd->next_descr_addr == 0)
+   dev_info(dev, "chain is cut at %d\n", off);
+
if (cstat != status) {
-   printk(KERN_INFO "Have %d descrs with stat=x%08x\n", 
cnt, cstat);
+   int from = (chain->num_desc + off - cnt) % 
chain->num_desc;
+   int to = (chain->num_desc + off - 1) % chain->num_desc;
+   dev_info(dev, "Have %d (from %d to %d) descrs "
+"with stat=0x%08x\n", cnt, from, to, cstat);
cstat = status;
cnt = 0;
}
+
cnt ++;
+   tot ++;
+   descr = descr->next;
+   } while (descr != start);
+
+   dev_info(dev, "Last %d descrs with stat=0x%08x "
+"for a total of %d descrs\n", cnt, cstat, tot);
+
+#ifdef DEBUG
+   /* Now dump the whole ring */
+   descr = start;
+   do
+   {
+   struct spider_net_hw_descr *hwd = descr->hwdescr;
+   status = spider_net_get_descr_status(hwd);
+   cnt = descr - chain->ring;
+   dev_info(dev, "Descr %d stat=0x%08x skb=%p\n",
+cnt, status, descr->skb);
+   dev_info(dev, "bus addr=%08x buf addr=%08x sz=%d\n",
+descr->bus_addr, hwd->buf_addr, hwd->buf_size);
+   dev_info(dev, "next=%08x result sz=%d valid sz=%d\n",
+hwd->next_descr_addr, hwd->result_size,
+hwd->valid_size);
+   dev_info(dev, "dmac=%08x data stat=%08x data err=%08x\n",
+hwd->dmac_cmd_status, hwd->data_status,
+hwd->data_error);
+   dev_info(dev, "\n");
+
descr = descr->next;
} while (descr != start);
-   printk(KERN_INFO "Last %d descrs with stat=x%08x\n", cnt, cstat);
-}
 #endif
 
+}
+
 /**
  * spider_net_resync_head_ptr - Advance head ptr past empty descrs
  *
@@ -1197,6 +1257,8 @@ spider_net_decode_one_descr(struct spide
return 1;
 
 bad_desc:
+   if (netif_msg_rx_err(card))
+   show_rx_chain(card);
dev_kfree_skb_irq(descr->skb);
descr->skb = NULL;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message

[PATCH 10/15] spidernet: invalidate unused pointer.

2007-06-11 Thread Linas Vepstas

Invalidate a pointer as its pci_unmap'ed; this is a bit of 
paranoia to make sure hardware doesn't continue trying to 
DMA to it. 

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 
11:51:19.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 11:53:21.0 
-0500
@@ -1187,6 +1187,7 @@ spider_net_decode_one_descr(struct spide
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *descr = chain->tail;
struct spider_net_hw_descr *hwdescr = descr->hwdescr;
+   u32 hw_buf_addr;
int status;
 
status = spider_net_get_descr_status(hwdescr);
@@ -1200,7 +1201,9 @@ spider_net_decode_one_descr(struct spide
chain->tail = descr->next;
 
/* unmap descriptor */
-   pci_unmap_single(card->pdev, hwdescr->buf_addr,
+   hw_buf_addr = hwdescr->buf_addr;
+   hwdescr->buf_addr = 0x;
+   pci_unmap_single(card->pdev, hw_buf_addr,
SPIDER_NET_MAX_FRAME, PCI_DMA_FROMDEVICE);
 
if ( (status == SPIDER_NET_DESCR_RESPONSE_ERROR) ||
@@ -1237,7 +1240,7 @@ spider_net_decode_one_descr(struct spide
dev_err(&card->netdev->dev, "bad status, cmd_status=x%08x\n",
   card->netdev->name,
   hwdescr->dmac_cmd_status);
-   pr_err("buf_addr=x%08x\n", hwdescr->buf_addr);
+   pr_err("buf_addr=x%08x\n", hw_buf_addr);
pr_err("buf_size=x%08x\n", hwdescr->buf_size);
pr_err("next_descr_addr=x%08x\n", hwdescr->next_descr_addr);
pr_err("result_size=x%08x\n", hwdescr->result_size);
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] MII bitbang driver should generate MII bus phy_mask dynamically

2007-06-11 Thread Andy Fleming


On Jun 11, 2007, at 02:53, Mark Zhan wrote:


Current MII bitbang bus driver hard-codes the phy mask of mii_bus to
~0x09, which is actually specific to the FSL boards. This patch will
make the bitbang driver to generate MII bus phy_mask dynamically,  
by the

PHY irq info provided by the platform.

Signed-off-by: Mark Zhan <[EMAIL PROTECTED]>


[...]



+   new_bus->phy_mask = 0x;
+   for (i = 0; i < PHY_MAX_ADDR; i++)
+   if (pdata->irq[i] != -1)
+   new_bus->phy_mask &= ~(1 << i);
+



This doesn't work.  There are a couple of things wrong:

1) Don't use -1, use PHY_POLL

2) As you can tell from #1, the absence of a set interrupt for a  
particular PHY does not indicate it doesn't exist.  Rather, it  
indicates it doesn't have an interrupt.


I think you need to add a phy_mask to the platform data, and then set  
it appropriately.  We also need to look into adding that for the mdio  
nodes in the device tree, so it will get passed in properly.


Andy

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] MII bitbang driver should generate MII bus phy_mask dynamically

2007-06-11 Thread Vitaly Bordug
On Mon, Jun 11, 2007 at 03:53:37PM +0800, Mark Zhan wrote:
> Current MII bitbang bus driver hard-codes the phy mask of mii_bus to
> ~0x09, which is actually specific to the FSL boards. This patch will
> make the bitbang driver to generate MII bus phy_mask dynamically, by the
> PHY irq info provided by the platform.
> 
> Signed-off-by: Mark Zhan <[EMAIL PROTECTED]>
Acked-by: Vitaly Bordug <[EMAIL PROTECTED]>
> ---

>  b/drivers/net/fs_enet/mii-bitbang.c |9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/fs_enet/mii-bitbang.c
> b/drivers/net/fs_enet/mii-bitbang.c
> index d384010..3732d69 100644
> --- a/drivers/net/fs_enet/mii-bitbang.c
> +++ b/drivers/net/fs_enet/mii-bitbang.c
> @@ -315,7 +315,7 @@ static int __devinit fs_enet_mdio_probe(
>   struct fs_mii_bb_platform_info *pdata;
>   struct mii_bus *new_bus;
>   struct bb_info *bitbang;
> - int err = 0;
> + int i, err = 0;
>  
>   if (NULL == dev)
>   return -EINVAL;
> @@ -336,14 +336,17 @@ static int __devinit fs_enet_mdio_probe(
>   new_bus->reset = &fs_enet_mii_bb_reset,
>   new_bus->id = pdev->id;
>  
> - new_bus->phy_mask = ~0x9;
>   pdata = (struct fs_mii_bb_platform_info *)pdev->dev.platform_data;
> -
>   if (NULL == pdata) {
>   printk(KERN_ERR "gfar mdio %d: Missing platform data!\n", 
> pdev->id);
>   return -ENODEV;
>   }
>  
> + new_bus->phy_mask = 0x;
> + for (i = 0; i < PHY_MAX_ADDR; i++)
> + if (pdata->irq[i] != -1)
> + new_bus->phy_mask &= ~(1 << i);
> +
>   /*set up workspace*/
>   fs_mii_bitbang_init(bitbang, pdata);
>  
> 
> ___
> Linuxppc-dev mailing list
> [EMAIL PROTECTED]
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/15] spidernet: service TX later.

2007-06-11 Thread Linas Vepstas


When entering the netdev poll routine, empty out the RX
chain first, before cleaning up the TX chain. This should
help avoid RX buffer overflows.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 
11:53:21.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 11:53:24.0 
-0500
@@ -1287,7 +1287,6 @@ spider_net_poll(struct net_device *netde
int packets_to_do, packets_done = 0;
int no_more_packets = 0;
 
-   spider_net_cleanup_tx_ring(card);
packets_to_do = min(*budget, netdev->quota);
 
while (packets_to_do) {
@@ -1312,6 +1311,8 @@ spider_net_poll(struct net_device *netde
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
 
+   spider_net_cleanup_tx_ring(card);
+
/* if all packets are in the stack, enable interrupts and return 0 */
/* if not, return 1 */
if (no_more_packets) {
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 12/15] spidernet: increase the NAPI weight

2007-06-11 Thread Linas Vepstas

Another way of minimizing the likelyhood of RX ram from overflowing
is to empty out the entire rx ring every chance we get. Change
the crazy watchdog timeout from 50 seconds to 3 seconds, while
we're here.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.h |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h  2007-06-11 
11:50:03.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h   2007-06-11 11:53:26.0 
-0500
@@ -56,8 +56,13 @@ extern char spider_net_driver_name[];
 
 #define SPIDER_NET_RX_CSUM_DEFAULT 1
 
-#define SPIDER_NET_WATCHDOG_TIMEOUT50*HZ
-#define SPIDER_NET_NAPI_WEIGHT 64
+#define SPIDER_NET_WATCHDOG_TIMEOUT3*HZ
+
+/* We really really want to empty the ring buffer every time,
+ * so as to avoid the RX ram full bug. So set te napi wieght
+ * to the ring size.
+ */
+#define SPIDER_NET_NAPI_WEIGHT SPIDER_NET_RX_DESCRIPTORS_DEFAULT
 
 #define SPIDER_NET_FIRMWARE_SEQS   6
 #define SPIDER_NET_FIRMWARE_SEQWORDS   1024
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] MII bitbang driver should generate MII bus phy_mask dynamically

2007-06-11 Thread Vitaly Bordug
On Mon, Jun 11, 2007 at 03:53:37PM +0800, Mark Zhan wrote:
> Current MII bitbang bus driver hard-codes the phy mask of mii_bus to
> ~0x09, which is actually specific to the FSL boards. This patch will
> make the bitbang driver to generate MII bus phy_mask dynamically, by the
> PHY irq info provided by the platform.

just realized that haven't mentioned that it's OK from  me,
assuming Andy's comment taken into account...
> 
> Signed-off-by: Mark Zhan <[EMAIL PROTECTED]>
> ---
>  b/drivers/net/fs_enet/mii-bitbang.c |9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/fs_enet/mii-bitbang.c
> b/drivers/net/fs_enet/mii-bitbang.c
> index d384010..3732d69 100644
> --- a/drivers/net/fs_enet/mii-bitbang.c
> +++ b/drivers/net/fs_enet/mii-bitbang.c
> @@ -315,7 +315,7 @@ static int __devinit fs_enet_mdio_probe(
>   struct fs_mii_bb_platform_info *pdata;
>   struct mii_bus *new_bus;
>   struct bb_info *bitbang;
> - int err = 0;
> + int i, err = 0;
>  
>   if (NULL == dev)
>   return -EINVAL;
> @@ -336,14 +336,17 @@ static int __devinit fs_enet_mdio_probe(
>   new_bus->reset = &fs_enet_mii_bb_reset,
>   new_bus->id = pdev->id;
>  
> - new_bus->phy_mask = ~0x9;
>   pdata = (struct fs_mii_bb_platform_info *)pdev->dev.platform_data;
> -
>   if (NULL == pdata) {
>   printk(KERN_ERR "gfar mdio %d: Missing platform data!\n", 
> pdev->id);
>   return -ENODEV;
>   }
>  
> + new_bus->phy_mask = 0x;
> + for (i = 0; i < PHY_MAX_ADDR; i++)
> + if (pdata->irq[i] != -1)
> + new_bus->phy_mask &= ~(1 << i);
> +
>   /*set up workspace*/
>   fs_mii_bitbang_init(bitbang, pdata);
>  
> 
> ___
> Linuxppc-dev mailing list
> [EMAIL PROTECTED]
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 13/15] spidernet: move a block of code around

2007-06-11 Thread Linas Vepstas


Put the enable and disable routines next to one-another, 
as this makes verifying thier symmetry that much easier.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 
11:53:24.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 11:53:27.0 
-0500
@@ -501,6 +501,20 @@ spider_net_enable_rxdmac(struct spider_n
 }
 
 /**
+ * spider_net_disable_rxdmac - disables the receive DMA controller
+ * @card: card structure
+ *
+ * spider_net_disable_rxdmac terminates processing on the DMA controller
+ * by turing off the DMA controller, with the force-end flag set.
+ */
+static inline void
+spider_net_disable_rxdmac(struct spider_net_card *card)
+{
+   spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
+SPIDER_NET_DMA_RX_FEND_VALUE);
+}
+
+/**
  * spider_net_refill_rx_chain - refills descriptors/skbs in the rx chains
  * @card: card structure
  *
@@ -656,20 +670,6 @@ write_hash:
 }
 
 /**
- * spider_net_disable_rxdmac - disables the receive DMA controller
- * @card: card structure
- *
- * spider_net_disable_rxdmac terminates processing on the DMA controller by
- * turing off DMA and issueing a force end
- */
-static void
-spider_net_disable_rxdmac(struct spider_net_card *card)
-{
-   spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
-SPIDER_NET_DMA_RX_FEND_VALUE);
-}
-
-/**
  * spider_net_prepare_tx_descr - fill tx descriptor with skb data
  * @card: card structure
  * @descr: descriptor structure to fill out
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 14/15] spidernet: fix misnamed flag

2007-06-11 Thread Linas Vepstas

The transmit frame tail bit is stranglely misnamed as 
"no checksum". Fix the name to what it should be:
"transmit frame tail". No functional change, 
just a name change.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |2 +-
 drivers/net/spider_net.h |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 
11:53:27.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 11:53:29.0 
-0500
@@ -716,7 +716,7 @@ spider_net_prepare_tx_descr(struct spide
hwdescr->data_status = 0;
 
hwdescr->dmac_cmd_status =
-   SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
+   SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_TXFRMTL;
spin_unlock_irqrestore(&chain->lock, flags);
 
if (skb->ip_summed == CHECKSUM_PARTIAL)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h  2007-06-11 
11:53:26.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h   2007-06-11 11:53:29.0 
-0500
@@ -354,7 +354,7 @@ enum spider_net_int2_status {
 #define SPIDER_NET_GPRDAT_MASK 0x
 
 #define SPIDER_NET_DMAC_NOINTR_COMPLETE0x0080
-#define SPIDER_NET_DMAC_NOCS   0x0004
+#define SPIDER_NET_DMAC_TXFRMTL0x0004
 #define SPIDER_NET_DMAC_TCP0x0002
 #define SPIDER_NET_DMAC_UDP0x0003
 #define SPIDER_NET_TXDCEST 0x0800
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 15/15] spidernet: driver docmentation

2007-06-11 Thread Linas Vepstas

Documentation for the spidernet driver.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 Documentation/networking/spider_net.txt |  204 
 1 file changed, 204 insertions(+)

Index: linux-2.6.22-rc1/Documentation/networking/spider_net.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc1/Documentation/networking/spider_net.txt2007-06-11 
11:53:31.0 -0500
@@ -0,0 +1,204 @@
+
+The Spidernet Device Driver
+===
+
+Written by Linas Vepstas <[EMAIL PROTECTED]>
+
+Version of 7 June 2007
+
+Abstract
+
+This document sketches the structure of portions of the spidernet
+device driver in the Linux kernel tree. The spidernet is a gigabit
+ethernet device built into the Toshiba southbridge commonly used
+in the SONY Playstation 3 and the IBM QS20 Cell blade.
+
+The Structure of the RX Ring.
+=
+The receive (RX) ring is a circular linked list of RX descriptors,
+together with three pointers into the ring that are used to manage its
+contents.
+
+The elements of the ring are called "descriptors" or "descrs"; they
+describe the received data. This includes a pointer to a buffer
+containing the received data, the buffer size, and various status bits.
+
+There are three primary states that a descriptor can be in: "empty",
+"full" and "not-in-use".  An "empty" or "ready" descriptor is ready
+to receive data from the hardware. A "full" descriptor has data in it,
+and is waiting to be emptied and processed by the OS. A "not-in-use"
+descriptor is neither empty or full; it is simply not ready. It may
+not even have a data buffer in it, or is otherwise unusable.
+
+During normal operation, on device startup, the OS (specifically, the
+spidernet device driver) allocates a set of RX descriptors and RX
+buffers. These are all marked "empty", ready to receive data. This
+ring is handed off to the hardware, which sequentially fills in the
+buffers, and marks them "full". The OS follows up, taking the full
+buffers, processing them, and re-marking them empty.
+
+This filling and emptying is managed by three pointers, the "head"
+and "tail" pointers, managed by the OS, and a hardware current
+descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
+currently being filled. When this descr is filled, the hardware
+marks it full, and advances the GDACTDPA by one.  Thus, when there is
+flowing RX traffic, every descr behind it should be marked "full",
+and everything in front of it should be "empty".  If the hardware
+discovers that the current descr is not empty, it will signal an
+interrupt, and halt processing.
+
+The tail pointer tails or trails the hardware pointer. When the
+hardware is ahead, the tail pointer will be pointing at a "full"
+descr. The OS will process this descr, and then mark it "not-in-use",
+and advance the tail pointer.  Thus, when there is flowing RX traffic,
+all of the descrs in front of the tail pointer should be "full", and
+all of those behind it should be "not-in-use". When RX traffic is not
+flowing, then the tail pointer can catch up to the hardware pointer.
+The OS will then note that the current tail is "empty", and halt
+processing.
+
+The head pointer (somewhat mis-named) follows after the tail pointer.
+When traffic is flowing, then the head pointer will be pointing at
+a "not-in-use" descr. The OS will perform various housekeeping duties
+on this descr. This includes allocating a new data buffer and
+dma-mapping it so as to make it visible to the hardware. The OS will
+then mark the descr as "empty", ready to receive data. Thus, when there
+is flowing RX traffic, everything in front of the head pointer should
+be "not-in-use", and everything behind it should be "empty". If no
+RX traffic is flowing, then the head pointer can catch up to the tail
+pointer, at which point the OS will notice that the head descr is
+"empty", and it will halt processing.
+
+Thus, in an idle system, the GDACTDPA, tail and head pointers will
+all be pointing at the same descr, which should be "empty". All of the
+other descrs in the ring should be "empty" as well.
+
+The show_rx_chain() routine will print out the the locations of the
+GDACTDPA, tail and head pointers. It will also summarize the contents
+of the ring, starting at the tail pointer, and listing the status
+of the descrs that follow.
+
+A typical example of the output, for a nearly idle system, might be
+
+net eth1: Total number of descrs=256
+net eth1: Chain tail located at descr=20
+net eth1: Chain head is at 20
+net eth1: HW curr desc (GDACTDPA) is at 21
+net eth1: Have 1 descrs with stat=x40800101
+net eth1: HW next desc (GDACNEXTDA) is at 22
+net eth1: Last 255 descrs with stat=xa080
+
+In the above, the hardware has filled in one descr, number 20. Both
+head and tail are pointing at 20, because it has not yet been emptied

Re: [PATCH 1/2] [RFC] NET: Implement a standard ndev_printk family

2007-06-11 Thread Joe Perches
I like the ndev_printk idea.

I think a ndev_ printks should take a const netdev*
as the first argument.

Also, better for the non-debug use of ndev_dbg is to have a
static inline function so printf args are verified.

For instance, dev_dbg does:

static inline int __attribute__ ((format (printf, 2, 3)))
dev_dbg(struct device * dev, const char * fmt, ...)
{
return 0;
}

Perhaps ndev_dbg should look like this:

static inline int __attribute ((format (printf, 3, 4)))
__ndev_dbg(const struct net_device *netdev, u32 level, const char *format, ...)
{
return 0;
}

#define ndev_dbg(netdev, level, fmt, args...) \
__ndev_dbg(netdev, NETIF_MSG_##level, fmt, ##args)



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SKY2 vs SK98LIN performance on 88E8053 MAC

2007-06-11 Thread Philip Romanov


> > We are doing pure IPv4 forwarding between two
> Ethernet
> > interfaces:
> > 
> >  IXIA port A<--->System Under Test<--->IXIA Port B
> > 
> > Traffic has two IP destinations for each direction
> and
> > L4 protocol is UDP. There are two static ARP
> entries
> > and only interface routes. Two tests are identical
> > except that we switch from one driver to another. 
> > 
> > Ethernet ports on the SUT are oversubscribed --
> I'm
> > sending 60% of line rate (of 256-byte packets) and
> > measuring percentage of pass-through traffic which
> > makes to the IXIA port on the other side. Traffic
> is
> > bidirectional and system load is close to 100%.
> > 

> 
> Could you post the profiles. Hopefully, others have
> good ideas
> as well.
> 
> 256 bytes is the size where the copybreak
> optimization kicks in
> so you might want to experiment with the copybreak
> module option
> to the sky2 driver. copybreak=0 would no packets to
> be copied,
> copybreak=1514 would cause all packets to be copied.
>  Copying is
> an optimization that helps when receiving small
> packets locally,
> but may slow down forwarding path.
> 
 

Profiles were attached to previous posting in the
thread. I'm pasting them in plain text now at the end.
There are four profiles: two for the vmlinux and two
for sky2 and sk98lin drivers.

Regarding copybreak parameter: it appears that it
kicks in starting from 128 bytes by default??? 

...
static int copybreak __read_mostly = 128;
module_param(copybreak, int, 0);
MODULE_PARM_DESC(copybreak, "Receive copy threshold");
...

Anyway, I tried both copybreak settings of 0 and 1500:
there is significant slowdown when copybreak is set to
1500 with 256-byte traffic. Another clarification:
256-byte packets refer to entire Ethernet frame
including FCS, so when packets make into the driver
they become 252-byte long. I also tried to switch
driver to IRQ mode from MSI (SK98LIN is running is IRQ
mode) -- that did not have any significant effect on
forwarding performance.


Oprofile results:

profile for vmlinux 2.6.21.3 running with sk98lin
driver:

CPU: PIII, speed 2000.1 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is
not halted) with a unit mask of 0x00 (No unit mask)
count 10
samples  %symbol name
1626 14.3222  _raw_spin_trylock
935   8.2357  dev_hard_start_xmit
756   6.6590  sub_preempt_count
574   5.0559  __alloc_skb
507   4.4658  _raw_spin_unlock
462   4.0694  add_preempt_count
452   3.9813  dev_queue_xmit
432   3.8052  ip_output
416   3.6642  ip_rcv
406   3.5761  preempt_schedule
380   3.3471  netif_receive_skb
364   3.2062  __qdisc_run
283   2.4927  skb_release_data
274   2.4135  debug_smp_processor_id
265   2.3342  kfree
219   1.9290  kmem_cache_free
211   1.8585  __kmalloc
181   1.5943  ip_route_input
177   1.5591  pfifo_fast_dequeue
164   1.4446  ip_forward
150   1.3212  kmem_cache_alloc
141   1.2420  __kfree_skb
128   1.1275  ide_insw
121   1.0658  rt_hash_code
100   0.8808  pfifo_fast_requeue
960.8456  nf_iterate
940.8280  pfifo_fast_enqueue
910.8016  eth_type_trans
800.7047  nf_hook_slow
780.6870  cache_alloc_refill
720.6342  dev_kfree_skb_any
680.5990  local_bh_enable
580.5109  kfree_skb
580.5109  kfree_skbmem
520.4580  free_block
490.4316  selinux_ipv4_postroute_last
480.4228  delay_tsc
380.3347  page_fault
360.3171  kunmap_atomic
330.2907  memcpy
270.2378  __handle_mm_fault
270.2378  __netif_schedule
270.2378  cache_flusharray
260.2290  do_wp_page
250.2202  net_rx_action
210.1850  __d_lookup
160.1409  __copy_to_user_ll
160.1409  unmap_vmas
150.1321  default_idle
150.1321  kmap_atomic
140.1233  get_page_from_freelist
120.1057  __link_path_walk
120.1057  flush_tlb_mm
120.1057  strnlen_user
110.0969  avc_has_perm_noaudit
110.0969  do_page_fault
110.0969  sysenter_past_esp
100.0881  inode_has_perm
100.0881  net_tx_action
100.0881  selinux_inode_permission
9 0.0793  __might_sleep
9 0.0793  filemap_nopage
8 0.0705  cache_reap
8 0.0705  find_get_page
8 0.0705  find_vma
8 0.0705  local_bh_disable
7 0.0617  _atomic_dec_and_lock
6 0.0528  __copy_from_user_ll
6 0.0528  do_lookup
6 0.0528  do_timer
6 0.0528  free_hot_cold_page
6 0.0528  hrtimer_run_queues
6 0.0528  run_rebalance_domains
5 0.0440  apic_timer_interrupt
5 0.0440  error_code
5 0.0440  find_busiest_group
5 0.0440  task_rq_lock
4 0.0352  __do_softirq
4 0.0352  _spin_lock_irq
4 0.0352  copy_page_range
4 0.0352  do_mmap_pgoff
4

Re: [PATCH 1/2] [RFC] NET: Implement a standard ndev_printk family

2007-06-11 Thread Kok, Auke

Joe Perches wrote:

I like the ndev_printk idea.

I think a ndev_ printks should take a const netdev*
as the first argument.

Also, better for the non-debug use of ndev_dbg is to have a
static inline function so printf args are verified.

For instance, dev_dbg does:

static inline int __attribute__ ((format (printf, 2, 3)))
dev_dbg(struct device * dev, const char * fmt, ...)
{
return 0;
}

Perhaps ndev_dbg should look like this:

static inline int __attribute ((format (printf, 3, 4)))
__ndev_dbg(const struct net_device *netdev, u32 level, const char *format, ...)
{
return 0;
}

#define ndev_dbg(netdev, level, fmt, args...) \
__ndev_dbg(netdev, NETIF_MSG_##level, fmt, ##args)



I'll take both of these comments into consideration and post a new version in a 
bit...


Thanks.

Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT]: net-2.6.23 tree available

2007-06-11 Thread David Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Mon, 11 Jun 2007 13:12:08 +0200

> I hope this still gives me the chance to submit updated patches
> since I've changed a few things since the last submission and
> a single incremental update would mix things that don't really
> belong together.
> 
> BTW, I'm back from my trip, I'll work through all the comments
> regarding these patches now.

No problem, I can yank out the current set and put your new
stuff in.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT]: net-2.6.23 tree available

2007-06-11 Thread Patrick McHardy
David Miller wrote:
> From: Patrick McHardy <[EMAIL PROTECTED]>
> Date: Mon, 11 Jun 2007 13:12:08 +0200
> 
> 
>>I hope this still gives me the chance to submit updated patches
>>since I've changed a few things since the last submission and
>>a single incremental update would mix things that don't really
>>belong together.
>>
>>BTW, I'm back from my trip, I'll work through all the comments
>>regarding these patches now.
> 
> 
> No problem, I can yank out the current set and put your new
> stuff in.


Thanks. I'll send you an update once we've worked out the
IFLA_PARTNER thing.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread jamal
On Mon, 2007-11-06 at 17:44 +0200, Patrick McHardy wrote:
> jamal wrote:
[..]
> > - let the driver shutdown whenever a ring is full. Remember which ring X
> > shut it down.
> > - when you get a tx interupt or prun tx descriptors, if a ring <= X has
> > transmitted a packet (or threshold of packets), then wake up the driver
> > (i.e open up). 
> 
> 
> At this point the qdisc might send new packets. What do you do when a
> packet for a full ring arrives?
> 

Hrm... ok, is this a trick question or i am missing the obvious?;->
What is wrong with what any driver would do today - which is:
netif_stop and return BUSY; core requeues the packet?

> I see three choices:
> 
> - drop it, even though its still within the qdiscs configured limits
> - requeue it, which does not work because the qdisc is still active
>   and might just hand you the same packet over and over again in a
>   busy loop, until the ring has more room (which has the same worst
>   case, just that we're sitting in a busy loop now).
> - requeue and stop the queue: we're back to where we started since
>   now higher priority packets will not get passed to the driver.

Refer to choice #4 above. 

The patches are trivial - really; as soon as Peter posts the e1000
change for his version i should be able to cutnpaste and produce one
that will work with what i am saying.
I am going to try my best to do that this week - i am going to be a
little busy and have a few outstanding items (like the pktgen thing)
that i want to get out of the way...

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] [RFC] NET: Implement a standard ndev_printk family

2007-06-11 Thread Auke Kok
A lot of netdevices implement their own variant of printk and use
use variations of dev_printk, printk or others that use msg_enable,
which has been an eyesore with countless variations across drivers.

This patch implements a standard ndev_printk and derivatives
such as ndev_err, ndev_info, ndev_warn that allows drivers to
transparently use both the msg_enable and a generic netdevice
message layout. It moves the msg_enable over to the net_device
struct and allows drivers to obsolete ethtool handling code of
the msg_enable value.

The current code has each driver contain a copy of msg_enable and
handle the setting/changing through ethtool that way. Since the
netdev name is stored in the net_device struct, those two are
not coherently available in a uniform way across all drivers (a
single macro or function would not work since all drivers name
their net_device members differently). This makes netdevice
driver writes reinvent the wheel over and over again.

It thus makes sense to move msg_enable to the net_device. This
gives us the opportunity to (1) initialize it by default with a
globally sane value, (2) remove msg_enable handling code w/r
ethtool for drivers that know and use the msg_enable member
of the net_device struct. (3) Ethtool code can just modify the
net_device msg_enable for drivers that do not have custom
msg_enable get/set handlers so converted drivers lose some
code for that as well.

Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 include/linux/netdevice.h |   38 ++
 net/core/dev.c|5 +
 net/core/ethtool.c|   14 +++---
 3 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3a70f55..d185f41 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -540,6 +540,8 @@ struct net_device
struct device   dev;
/* space for optional statistics and wireless sysfs groups */
struct attribute_group  *sysfs_groups[3];
+
+   int msg_enable;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -838,6 +840,42 @@ enum {
NETIF_MSG_WOL   = 0x4000,
 };
 
+#define ndev_err(netdev, level, format, arg...) \
+   do { if ((netdev)->msg_enable & NETIF_MSG_##level) { \
+   printk(KERN_ERR "%s: %s: " format, (netdev)->name, \
+   (netdev)->dev.parent->bus_id, ## arg); } } while (0)
+
+#define ndev_warn(netdev, level, format, arg...) \
+   do { if ((netdev)->msg_enable & NETIF_MSG_##level) { \
+   printk(KERN_WARNING "%s: %s: " format, (netdev)->name, \
+   (netdev)->dev.parent->bus_id, ## arg); } } while (0)
+
+#define ndev_notice(netdev, level, format, arg...) \
+   do { if ((netdev)->msg_enable & NETIF_MSG_##level) { \
+   printk(KERN_NOTICE "%s: %s: " format, (netdev)->name, \
+   (netdev)->dev.parent->bus_id, ## arg); } } while (0)
+
+#define ndev_info(netdev, level, format, arg...) \
+   do { if ((netdev)->msg_enable & NETIF_MSG_##level) { \
+   printk(KERN_INFO "%s: %s: " format, (netdev)->name, \
+   (netdev)->dev.parent->bus_id, ## arg); } } while (0)
+
+#ifdef DEBUG
+#define ndev_debug(netdev, level, format, arg...) \
+   do { if ((netdev)->msg_enable & NETIF_MSG_##level) { \
+   printk(KERN_DEBUG "%s: %s: " format, (netdev)->name, \
+   (netdev)->dev.parent->bus_id, ## arg); } } while (0)
+#else
+/* Force type-checking on ndev_dbg() when DEBUG is not set */
+static inline int __attribute__ ((format (printf, 3, 4)))
+__ndev_dbg(struct net_device * netdev, u32 level, const char *format, ...)
+{
+   return 0;
+}
+#define ndev_dbg(netdev, level, fmt, args...) \
+   __ndev_dbg(netdev, NETIF_MSG_##level, fmt, ##args)
+#endif
+
 #define netif_msg_drv(p)   ((p)->msg_enable & NETIF_MSG_DRV)
 #define netif_msg_probe(p) ((p)->msg_enable & NETIF_MSG_PROBE)
 #define netif_msg_link(p)  ((p)->msg_enable & NETIF_MSG_LINK)
diff --git a/net/core/dev.c b/net/core/dev.c
index 2609062..316f250 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3378,6 +3378,11 @@ struct net_device *alloc_netdev(int sizeof_priv, const 
char *name,
dev->priv = netdev_priv(dev);
 
dev->get_stats = internal_stats;
+   dev->msg_enable = NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK;
+#ifdef DEBUG
+   /* put these to good use: */
+   dev->msg_enable |= NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR;
+#endif
setup(dev);
strcpy(dev->name, name);
return dev;
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 8d5e5a0..ff8d52f 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -234,9 +234,9 @@ static int ethtool_get_msglevel(struct net_device *dev, 
char __user *useraddr)
struct ethtool_value edata = { ETHTOOL_GMSGLVL };
 
if (!dev->ethtool_ops->get_msglevel)
-   

[PATCH 2/2] [RFC] NET: Convert several drivers to ndev_printk

2007-06-11 Thread Auke Kok
With the generic ndev_printk macros, we can now convert network
drivers to use this generic printk family for netdevices.

Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 drivers/net/e100.c|  135 -
 drivers/net/e1000/e1000.h |   15 
 drivers/net/e1000/e1000_ethtool.c |   39 +++
 drivers/net/e1000/e1000_main.c|  110 +++---
 drivers/net/e1000/e1000_param.c   |   67 +-
 drivers/net/ixgb/ixgb.h   |   14 
 drivers/net/ixgb/ixgb_ethtool.c   |   15 
 drivers/net/ixgb/ixgb_main.c  |   50 ++
 8 files changed, 184 insertions(+), 261 deletions(-)

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 6169663..8c09c9e 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -172,19 +172,12 @@ MODULE_AUTHOR(DRV_COPYRIGHT);
 MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_VERSION);
 
-static int debug = 3;
 static int eeprom_bad_csum_allow = 0;
 static int use_io = 0;
-module_param(debug, int, 0);
 module_param(eeprom_bad_csum_allow, int, 0);
 module_param(use_io, int, 0);
-MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
 MODULE_PARM_DESC(eeprom_bad_csum_allow, "Allow bad eeprom checksums");
 MODULE_PARM_DESC(use_io, "Force use of i/o access mode");
-#define DPRINTK(nlevel, klevel, fmt, args...) \
-   (void)((NETIF_MSG_##nlevel & nic->msg_enable) && \
-   printk(KERN_##klevel PFX "%s: %s: " fmt, nic->netdev->name, \
-   __FUNCTION__ , ## args))
 
 #define INTEL_8255X_ETHERNET_DEVICE(device_id, ich) {\
PCI_VENDOR_ID_INTEL, device_id, PCI_ANY_ID, PCI_ANY_ID, \
@@ -645,12 +638,12 @@ static int e100_self_test(struct nic *nic)
 
/* Check results of self-test */
if(nic->mem->selftest.result != 0) {
-   DPRINTK(HW, ERR, "Self-test failed: result=0x%08X\n",
+   ndev_err(nic->netdev, HW, "Self-test failed: result=0x%08X\n",
nic->mem->selftest.result);
return -ETIMEDOUT;
}
if(nic->mem->selftest.signature == 0) {
-   DPRINTK(HW, ERR, "Self-test failed: timed out\n");
+   ndev_err(nic->netdev, HW, "Self-test failed: timed out\n");
return -ETIMEDOUT;
}
 
@@ -754,7 +747,7 @@ static int e100_eeprom_load(struct nic *nic)
 * the sum of words should be 0xBABA */
checksum = le16_to_cpu(0xBABA - checksum);
if(checksum != nic->eeprom[nic->eeprom_wc - 1]) {
-   DPRINTK(PROBE, ERR, "EEPROM corrupted\n");
+   ndev_err(nic->netdev, PROBE, "EEPROM corrupted\n");
if (!eeprom_bad_csum_allow)
return -EAGAIN;
}
@@ -896,8 +889,7 @@ static u16 mdio_ctrl(struct nic *nic, u32 addr, u32 dir, 
u32 reg, u16 data)
udelay(20);
}
if (unlikely(!i)) {
-   printk("e100.mdio_ctrl(%s) won't go Ready\n",
-   nic->netdev->name );
+   dev_err(&nic->pdev->dev, "mdio_ctrl won't go Ready\n");
spin_unlock_irqrestore(&nic->mdio_lock, flags);
return 0;   /* No way to indicate timeout error */
}
@@ -909,7 +901,7 @@ static u16 mdio_ctrl(struct nic *nic, u32 addr, u32 dir, 
u32 reg, u16 data)
break;
}
spin_unlock_irqrestore(&nic->mdio_lock, flags);
-   DPRINTK(HW, DEBUG,
+   ndev_dbg(nic->netdev, HW,
"%s:addr=%d, reg=%d, data_in=0x%04X, data_out=0x%04X\n",
dir == mdi_read ? "READ" : "WRITE", addr, reg, data, data_out);
return (u16)data_out;
@@ -962,8 +954,8 @@ static void e100_get_defaults(struct nic *nic)
 static void e100_configure(struct nic *nic, struct cb *cb, struct sk_buff *skb)
 {
struct config *config = &cb->u.config;
-   u8 *c = (u8 *)config;
-
+   u8 *c;
+   
cb->command = cpu_to_le16(cb_config);
 
memset(config, 0, sizeof(struct config));
@@ -1023,12 +1015,16 @@ static void e100_configure(struct nic *nic, struct cb 
*cb, struct sk_buff *skb)
config->standard_stat_counter = 0x0;
}
 
-   DPRINTK(HW, DEBUG, "[00-07]=%02X:%02X:%02X:%02X:%02X:%02X:%02X:%02X\n",
-   c[0], c[1], c[2], c[3], c[4], c[5], c[6], c[7]);
-   DPRINTK(HW, DEBUG, "[08-15]=%02X:%02X:%02X:%02X:%02X:%02X:%02X:%02X\n",
-   c[8], c[9], c[10], c[11], c[12], c[13], c[14], c[15]);
-   DPRINTK(HW, DEBUG, "[16-23]=%02X:%02X:%02X:%02X:%02X:%02X:%02X:%02X\n",
-   c[16], c[17], c[18], c[19], c[20], c[21], c[22], c[23]);
+   c = (u8 *)config;
+   ndev_dbg(nic->netdev, HW,
+"[00-07]=%02X:%02X:%02X:%02X:%02X:%02X:%02X:%02X\n",
+c[0], c[1], c[2], c[3], c[4], c[5], c[6], c[7]);
+   ndev_dbg(nic->netdev, HW,
+"[08-15]=%02X:%02X:%02X:%02X:%02X:%02X:%02X:%02X\n",
+c[8], c[9], c[10], c[11], c[

Re: [PATCH 1/2] [RFC] NET: Implement a standard ndev_printk family

2007-06-11 Thread Kok, Auke

Auke Kok wrote:

A lot of netdevices implement their own variant of printk and use
use variations of dev_printk, printk or others that use msg_enable,
which has been an eyesore with countless variations across drivers.





+#define ndev_err(netdev, level, format, arg...) \
+   do { if ((netdev)->msg_enable & NETIF_MSG_##level) { \
+   printk(KERN_ERR "%s: %s: " format, (netdev)->name, \
+   (netdev)->dev.parent->bus_id, ## arg); } } while (0)


so now the only question is whether we want the ndev_printk() stuff to format as 
much as possible like dev_printk. dev_printk also prints dev_driver_string(dev), 
which is what above macro omits. To make this as much similar as possible, we

could do:

> +#define ndev_err(netdev, level, format, arg...) \
> +  do { if ((netdev)->msg_enable & NETIF_MSG_##level) { \
> +  printk(KERN_ERR "%s: %s %s: " format, (netdev)->name, \
> +  dev_driver_string(netdev)-dev), (netdev)->dev.parent->bus_id, \
> + ## arg); } } while (0)

it would (e.g. e1000) change the output from:
eth1: :00:19.0: NIC Link is Down

To:
eth1: e1000 :00:19.0: NIC Link is Down

But I am unsure whether the addition of the driver name is useful at this point 
- most drivers already printk some sort of association out that can be used to 
track the bus_id/netdev name back to the driver easily.


So, hence I omitted doing this in this patch.

Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] [RFC] NET: Implement a standard ndev_printk family

2007-06-11 Thread Joe Perches
On Mon, 2007-06-11 at 14:37 -0700, Auke Kok wrote:
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3a70f55..d185f41 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -540,6 +540,8 @@ struct net_device
>   struct device   dev;
>   /* space for optional statistics and wireless sysfs groups */
>   struct attribute_group  *sysfs_groups[3];
> +
> + int msg_enable;
>  };
>  #define to_net_dev(d) container_of(d, struct net_device, dev)
>  

msg_enable is more frequently defined in drivers/net as u32 not int.

egrep -r -w --include=*.[ch] "u32[[:space:]]+msg_enable" drivers/net | wc -l
egrep -r -w --include=*.[ch] "int[[:space:]]+msg_enable" drivers/net | wc -l

29 to 5

cheers, Joe

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] [RFC] NET: Implement a standard ndev_printk family

2007-06-11 Thread Randy Dunlap
On Mon, 11 Jun 2007 14:37:21 -0700 Auke Kok wrote:

>  include/linux/netdevice.h |   38 ++
>  net/core/dev.c|5 +
>  net/core/ethtool.c|   14 +++---
>  3 files changed, 50 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3a70f55..d185f41 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -540,6 +540,8 @@ struct net_device
>   struct device   dev;
>   /* space for optional statistics and wireless sysfs groups */
>   struct attribute_group  *sysfs_groups[3];
> +
> + int msg_enable;
>  };
>  #define to_net_dev(d) container_of(d, struct net_device, dev)
>  
> @@ -838,6 +840,42 @@ enum {
>   NETIF_MSG_WOL   = 0x4000,
>  };
>  
> +#define ndev_err(netdev, level, format, arg...) \
> + do { if ((netdev)->msg_enable & NETIF_MSG_##level) { \
> + printk(KERN_ERR "%s: %s: " format, (netdev)->name, \
> + (netdev)->dev.parent->bus_id, ## arg); } } while (0)
> +

I would still do what Steve H. suggested, i.e., only evaluate
macro args one time.  E.g. (not tested & reformatted :)

+#define ndev_err(netdev, level, format, arg...) \
+   do { \
+   struct net_device *__nd = netdev; \
+   if ((__nd)->msg_enable & NETIF_MSG_##level) \
+   printk(KERN_ERR "%s: %s: " format, (__nd)->name, \
+   (__nd)->dev.parent->bus_id, ## arg); \
+   } while (0)
+

Dropping the braces around the printk() too.

etc...

> +#define ndev_warn(netdev, level, format, arg...) \
> + do { if ((netdev)->msg_enable & NETIF_MSG_##level) { \
> + printk(KERN_WARNING "%s: %s: " format, (netdev)->name, \
> + (netdev)->dev.parent->bus_id, ## arg); } } while (0)
> +
> +#define ndev_notice(netdev, level, format, arg...) \
> + do { if ((netdev)->msg_enable & NETIF_MSG_##level) { \
> + printk(KERN_NOTICE "%s: %s: " format, (netdev)->name, \
> + (netdev)->dev.parent->bus_id, ## arg); } } while (0)
> +
> +#define ndev_info(netdev, level, format, arg...) \
> + do { if ((netdev)->msg_enable & NETIF_MSG_##level) { \
> + printk(KERN_INFO "%s: %s: " format, (netdev)->name, \
> + (netdev)->dev.parent->bus_id, ## arg); } } while (0)
> +
> +#ifdef DEBUG
> +#define ndev_debug(netdev, level, format, arg...) \
> + do { if ((netdev)->msg_enable & NETIF_MSG_##level) { \
> + printk(KERN_DEBUG "%s: %s: " format, (netdev)->name, \
> + (netdev)->dev.parent->bus_id, ## arg); } } while (0)
> +#else
> +/* Force type-checking on ndev_dbg() when DEBUG is not set */
> +static inline int __attribute__ ((format (printf, 3, 4)))
> +__ndev_dbg(struct net_device * netdev, u32 level, const char *format, ...)
> +{
> + return 0;
> +}
> +#define ndev_dbg(netdev, level, fmt, args...) \
> + __ndev_dbg(netdev, NETIF_MSG_##level, fmt, ##args)
> +#endif
> +
>  #define netif_msg_drv(p) ((p)->msg_enable & NETIF_MSG_DRV)
>  #define netif_msg_probe(p)   ((p)->msg_enable & NETIF_MSG_PROBE)
>  #define netif_msg_link(p)((p)->msg_enable & NETIF_MSG_LINK)


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] [RFC] NET: Implement a standard ndev_printk family

2007-06-11 Thread Kok, Auke

Joe Perches wrote:

On Mon, 2007-06-11 at 14:37 -0700, Auke Kok wrote:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3a70f55..d185f41 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -540,6 +540,8 @@ struct net_device
struct device   dev;
/* space for optional statistics and wireless sysfs groups */
struct attribute_group  *sysfs_groups[3];
+
+   int msg_enable;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 


msg_enable is more frequently defined in drivers/net as u32 not int.

egrep -r -w --include=*.[ch] "u32[[:space:]]+msg_enable" drivers/net | wc -l
egrep -r -w --include=*.[ch] "int[[:space:]]+msg_enable" drivers/net | wc -l

29 to 5


yes, we're only using the bottom 15 bits anyway, but the net_device struct 
consistently uses 'int' style members, leaving it up to the compiler to pick an 
appropriate size for each arch. That was the reason I chose int here.


Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] [RFC] NET: Implement a standard ndev_printk family

2007-06-11 Thread Kok, Auke

Randy Dunlap wrote:

On Mon, 11 Jun 2007 14:37:21 -0700 Auke Kok wrote:


 include/linux/netdevice.h |   38 ++
 net/core/dev.c|5 +
 net/core/ethtool.c|   14 +++---
 3 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3a70f55..d185f41 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -540,6 +540,8 @@ struct net_device
struct device   dev;
/* space for optional statistics and wireless sysfs groups */
struct attribute_group  *sysfs_groups[3];
+
+   int msg_enable;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -838,6 +840,42 @@ enum {

NETIF_MSG_WOL   = 0x4000,
 };
 
+#define ndev_err(netdev, level, format, arg...) \

+   do { if ((netdev)->msg_enable & NETIF_MSG_##level) { \
+   printk(KERN_ERR "%s: %s: " format, (netdev)->name, \
+   (netdev)->dev.parent->bus_id, ## arg); } } while (0)
+


I would still do what Steve H. suggested, i.e., only evaluate
macro args one time.  E.g. (not tested & reformatted :)

+#define ndev_err(netdev, level, format, arg...) \
+   do { \
+   struct net_device *__nd = netdev; \
+   if ((__nd)->msg_enable & NETIF_MSG_##level) \
+   printk(KERN_ERR "%s: %s: " format, (__nd)->name, \
+   (__nd)->dev.parent->bus_id, ## arg); \
+   } while (0)
+

Dropping the braces around the printk() too.


ahh, *now* I get it... :)

Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] [RFC] NET: Implement a standard ndev_printk family

2007-06-11 Thread Stephen Hemminger
On Mon, 11 Jun 2007 14:37:21 -0700
Auke Kok <[EMAIL PROTECTED]> wrote:

> A lot of netdevices implement their own variant of printk and use
> use variations of dev_printk, printk or others that use msg_enable,
> which has been an eyesore with countless variations across drivers.
> 
> This patch implements a standard ndev_printk and derivatives
> such as ndev_err, ndev_info, ndev_warn that allows drivers to
> transparently use both the msg_enable and a generic netdevice
> message layout. It moves the msg_enable over to the net_device
> struct and allows drivers to obsolete ethtool handling code of
> the msg_enable value.
> 
> The current code has each driver contain a copy of msg_enable and
> handle the setting/changing through ethtool that way. Since the
> netdev name is stored in the net_device struct, those two are
> not coherently available in a uniform way across all drivers (a
> single macro or function would not work since all drivers name
> their net_device members differently). This makes netdevice
> driver writes reinvent the wheel over and over again.
> 
> It thus makes sense to move msg_enable to the net_device. This
> gives us the opportunity to (1) initialize it by default with a
> globally sane value, (2) remove msg_enable handling code w/r
> ethtool for drivers that know and use the msg_enable member
> of the net_device struct. (3) Ethtool code can just modify the
> net_device msg_enable for drivers that do not have custom
> msg_enable get/set handlers so converted drivers lose some
> code for that as well.
> 
> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
> ---
> 
>  include/linux/netdevice.h |   38 ++
>  net/core/dev.c|5 +
>  net/core/ethtool.c|   14 +++---
>  3 files changed, 50 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3a70f55..d185f41 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -540,6 +540,8 @@ struct net_device
>   struct device   dev;
>   /* space for optional statistics and wireless sysfs groups */
>   struct attribute_group  *sysfs_groups[3];
> +
> + int msg_enable;
>  };

Since msg_enable is used as bits, it should be unsigned (probably unsigned 
long).
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread jamal

On Mon, 2007-11-06 at 18:34 +0300, Cohen, Guy wrote:

> jamal wrote:
[..]
> > WMM is a strict prio mechanism.
> > The parametrization very much favors the high prio packets when the
> > tx opportunity to send shows up.
> 
> Sorry, but this not as simple as you describe it. WMM is much more
> complicated. WMM defines the HW queues as virtually multiple clients
> that compete on the medium access individually. Each implements a
> contention-based medium access. The Access Point publishes to the
> clients the medium access parameters (e.g. back off parameters) that are
> different for each access category (virtual client). There is _not_ a
> strict priority assigned to each access category. 

You sound like you know this stuff well so please bear with me. I am
actually hoping i will learn from you.

I dont have access to the IEEE docs but i have been reasonably following
up on the qos aspect and i have a good feel for how the parameters work.
I posted a url to a pdf earlier which describes the WMM default
parameterization for each AC you refer to above - do you wanna comment
on the accuracy of that?

> The behavior of each
> access category totally depends on the medium usage of other clients and
> is totally different for each access category. This cannot be predicated
> at the host SW.

It could be estimated well by the host sw; but lets defer that to later
in case i am clueless on something or you misunderstood something i
said.

> QoS in WLAN doesn't
> favor strictly one access category over another, but defines some softer
> and smarter prioritization. This is implemented in the HW/Firmware. 

I understand.  Please correct me if am wrong:
The only reason AC_BK packet will go out instead of AC_VO when
contending in hardware is because of a statistical opportunity not the
firmware intentionaly trying to allow AC_BK out 
i.e it is influenced by the three variables:
1) The contention window 2) the backoff timer and 3)the tx opportunity
And if you look at the default IEEE parameters as in that url slide 43,
the only time AC_BK will win is luck.

> I
> just think that providing a per-queue controls (start/stop) will allow
> WLAN drivers/Firmware/HW to do that while still using qdisc (and it will
> work properly even when one queue is full and others are empty).

I dont see it the same way. But iam willing to see wireless in a
different light than wireless, more below.

> > however, it is feasible that high prio
> > packets will obstruct low prio packets (which is fine).
> 
> No this is _not_ fine. Just to emphasize again, WMM doesn't define
> priority in the way it is implemented in airplane boarding (Pilots
> first, Business passengers next, couch passengers at the end), but more
> like _distributed_ weights prioritization (between all the multiple
> queues of all the clients on the channel).


I am not trying to be obtuse in any way - but let me ask this for
wireless contention resolution:
When a bussiness passenger is trying to get into plane at the same time
as a couch passenger and the attendant notices i.e to resolve the
contention, who gets preferential treatment? There is the case of the
attendant statistically not noticing (but that accounts for luck)...

Heres a really dated paper before the standard was ratified:
http://www.mwnl.snu.ac.kr/~schoi/publication/Conferences/02-EW.pdf

a) looking at table 1 at the AIFS, CWmin/max and PF used in the
experiment I dont see how a low prio or mid prio ac will
ever beat something in the high prio just by virtue that they have
longer AIFS + CW values. Maybe you can explain (trust me i am trying to
resolve this in my mind and not trying to be difficult in any way; i am
a geek and these sorts of things intrigue me; i may curse but thats ok)
The only way it would happen is if there is no collision i.e stastical
"luck".
b) The paragraph between fig 4 and fig 5 talks about "virtual collision"
between two TCs within a station as _always_ favoring the higher prio.

Slide 43 on:
http://madwifi.org/attachment/wiki/ChipsetFeatures/WMM/qos11e.pdf?format=raw
also seems to indicate the default parameters for the different timers
is clearly strictly in favor of you if you have higher prio.
Do those numbers cross-reference with the IEEE doc you may have?

> As a side note, in one of the WFA WMM certification tests, the AP
> changes the medium access parameters of the access categories in a way
> that favors a lower access category. This is something very soft that
> cannot be reflected in any intuitive way in the host SW.

So essentially the test you mention changes priorities in real time. 
What is the purpose of this test? Is WMM expected to change its
priorities in real time?

cheers,
jamal


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SKY2 vs SK98LIN performance on 88E8053 MAC

2007-06-11 Thread Stephen Hemminger
On Mon, 11 Jun 2007 12:05:02 -0700 (PDT)
Philip Romanov <[EMAIL PROTECTED]> wrote:

> 
> 
> > > We are doing pure IPv4 forwarding between two
> > Ethernet
> > > interfaces:
> > > 
> > >  IXIA port A<--->System Under Test<--->IXIA Port B
> > > 
> > > Traffic has two IP destinations for each direction
> > and
> > > L4 protocol is UDP. There are two static ARP
> > entries
> > > and only interface routes. Two tests are identical
> > > except that we switch from one driver to another. 
> > > 
> > > Ethernet ports on the SUT are oversubscribed --
> > I'm
> > > sending 60% of line rate (of 256-byte packets) and
> > > measuring percentage of pass-through traffic which
> > > makes to the IXIA port on the other side. Traffic
> > is
> > > bidirectional and system load is close to 100%.
> > > 
> 
> > 
> > Could you post the profiles. Hopefully, others have
> > good ideas
> > as well.
> > 
> > 256 bytes is the size where the copybreak
> > optimization kicks in
> > so you might want to experiment with the copybreak
> > module option
> > to the sky2 driver. copybreak=0 would no packets to
> > be copied,
> > copybreak=1514 would cause all packets to be copied.
> >  Copying is
> > an optimization that helps when receiving small
> > packets locally,
> > but may slow down forwarding path.
> > 
>  
> 
> Profiles were attached to previous posting in the
> thread. I'm pasting them in plain text now at the end.
> There are four profiles: two for the vmlinux and two
> for sky2 and sk98lin drivers.
> 
> Regarding copybreak parameter: it appears that it
> kicks in starting from 128 bytes by default??? 
> 
> ...
> static int copybreak __read_mostly = 128;
> module_param(copybreak, int, 0);
> MODULE_PARM_DESC(copybreak, "Receive copy threshold");
> ...
> 
> Anyway, I tried both copybreak settings of 0 and 1500:
> there is significant slowdown when copybreak is set to
> 1500 with 256-byte traffic. Another clarification:
> 256-byte packets refer to entire Ethernet frame
> including FCS, so when packets make into the driver
> they become 252-byte long. I also tried to switch
> driver to IRQ mode from MSI (SK98LIN is running is IRQ
> mode) -- that did not have any significant effect on
> forwarding performance.
> 
> 
> Oprofile results:
> 
> profile for vmlinux 2.6.21.3 running with sk98lin
> driver:
> 
> CPU: PIII, speed 2000.1 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (clocks processor is
> not halted) with a unit mask of 0x00 (No unit mask)
> count 10
> samples  %symbol name
> 1626 14.3222  _raw_spin_trylock

Bogus extra locking in sk98lin, no surprise.
BTW. for a scare run lockdep on it...


> 935   8.2357  dev_hard_start_xmit
> 756   6.6590  sub_preempt_count
> 574   5.0559  __alloc_skb
> 507   4.4658  _raw_spin_unlock
> 462   4.0694  add_preempt_count
>
> ==
> profile for vmlinux 2.6.21.3 running with sky2 driver:
> 
> CPU: PIII, speed 2000.22 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (clocks processor is
> not halted) with a unit mask of 0x00 (No unit mask)
> count 10
> samples  %symbol name
> 7894  9.0213  __alloc_skb
> 6475  7.3997  skb_release_data
> 5706  6.5208  dev_hard_start_xmit
> 5656  6.4637  ip_output
> 5652  6.4591  eth_type_trans
> 5432  6.2077  ip_rcv
> 5278  6.0317  netif_receive_skb
> 3499  3.9987  kfree
> 3195  3.6513  _raw_spin_trylock

It looks like it is reallocating for each receive, not sure why?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] [RFC] NET: Implement a standard ndev_printk family

2007-06-11 Thread Joe Perches
On Mon, 2007-06-11 at 15:01 -0700, Kok, Auke wrote:
> > msg_enable is more frequently defined in drivers/net as u32 not int.
> yes, we're only using the bottom 15 bits anyway, but the net_device struct 
> consistently uses 'int' style members, leaving it up to the compiler to pick 
> an 
> appropriate size for each arch. That was the reason I chose int here.

I guess I'm from a "don't be larger than necessary" school.

As it's a bitfield and not a boolean, it should be unsigned.
Perhaps it could also be a good time to rename it to something
like netif_enabled_msgs?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] [RFC] NET: Implement a standard ndev_printk family

2007-06-11 Thread Kok, Auke

Stephen Hemminger wrote:

On Mon, 11 Jun 2007 14:37:21 -0700
Auke Kok <[EMAIL PROTECTED]> wrote:


A lot of netdevices implement their own variant of printk and use
use variations of dev_printk, printk or others that use msg_enable,
which has been an eyesore with countless variations across drivers.

This patch implements a standard ndev_printk and derivatives
such as ndev_err, ndev_info, ndev_warn that allows drivers to
transparently use both the msg_enable and a generic netdevice
message layout. It moves the msg_enable over to the net_device
struct and allows drivers to obsolete ethtool handling code of
the msg_enable value.

The current code has each driver contain a copy of msg_enable and
handle the setting/changing through ethtool that way. Since the
netdev name is stored in the net_device struct, those two are
not coherently available in a uniform way across all drivers (a
single macro or function would not work since all drivers name
their net_device members differently). This makes netdevice
driver writes reinvent the wheel over and over again.

It thus makes sense to move msg_enable to the net_device. This
gives us the opportunity to (1) initialize it by default with a
globally sane value, (2) remove msg_enable handling code w/r
ethtool for drivers that know and use the msg_enable member
of the net_device struct. (3) Ethtool code can just modify the
net_device msg_enable for drivers that do not have custom
msg_enable get/set handlers so converted drivers lose some
code for that as well.

Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 include/linux/netdevice.h |   38 ++
 net/core/dev.c|5 +
 net/core/ethtool.c|   14 +++---
 3 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3a70f55..d185f41 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -540,6 +540,8 @@ struct net_device
struct device   dev;
/* space for optional statistics and wireless sysfs groups */
struct attribute_group  *sysfs_groups[3];
+
+   int msg_enable;
 };


Since msg_enable is used as bits, it should be unsigned (probably unsigned 
long).


ack, but the long is really not needed here I think - we're only using the 
first 15.

Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
jamal wrote:
> On Mon, 2007-11-06 at 17:44 +0200, Patrick McHardy wrote:
> 
>>jamal wrote:
> 
> [..]
> 
>>>- let the driver shutdown whenever a ring is full. Remember which ring X
>>>shut it down.
>>>- when you get a tx interupt or prun tx descriptors, if a ring <= X has
>>>transmitted a packet (or threshold of packets), then wake up the driver
>>>(i.e open up). 
>>
>>
>>At this point the qdisc might send new packets. What do you do when a
>>packet for a full ring arrives?
>>
> 
> 
> Hrm... ok, is this a trick question or i am missing the obvious?;->
> What is wrong with what any driver would do today - which is:
> netif_stop and return BUSY; core requeues the packet?


That doesn't fix the problem, high priority queues may be starved
by low priority queues if you do that.

BTW, I missed something you said before:

--quote--
 i am making the case that it does not affect the overall results
as long as you use the same parameterization on qdisc and hardware.
--end quote--

I agree that multiple queue states wouldn't be necessary if they
would be parameterized the same, in that case we wouldn't even
need the qdisc at all (as you're saying). But one of the
parameters is the maximum queue length, and we want to be able
to parameterize the qdisc differently than the hardware here.
Which is the only reason for the possible starvation.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [WIP][DOC] Net tx batching

2007-06-11 Thread jamal
A small update on the e1000 

On Mon, 2007-11-06 at 09:52 -0400, jamal wrote:
> I have started writting a small howto for drivers. Hoping to get a wider
> testing with more drivers.
> So far i have changed e1000 and tun drivers as well as modified the
> packetgen tool to do batching.
> 
> I will update this document as needed if something is unclear. 
> Please contribute by asking questions, changing a driver and wide
> testing. I may target tg3 next and write a tool to do testing from
> UDP level.
> 
> cheers,
> jamal
> 

Heres the begining of a howto for driver author.
The current working tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/hadi/batch-lin26.git

The intended audience for this howto is people already
familiar with netdevices.

0) Hardware Pre-requisites:
---

You must have at least hardware that is capable of doing
DMA with many descriptors; i.e having hardware with a queue
length of 3 (as in some fscked ethernet hardware) is not
very useful in this case.

1) What is new in the driver API:
-

a) A new method called onto the driver by the net tx core to
batch packets. This method, dev->hard_batch_xmit(dev), 
is no different than dev->hard_start_xmit(dev) in terms of the 
arguements it takes. You just have to handle it differently 
(more below).

b) A new method, dev->hard_prep_xmit(), called onto the driver to 
massage the packet before it gets transmitted. 
This method is optional i.e if you dont specify it, you will
not be invoked(more below)

c) A new variable dev->xmit_win which provides suggestions to the
core calling into the driver a rough estimate of how many
packets can be batched onto the driver.

2) Driver pre-requisite


The typical driver tx state machine is:


--> +Core sends packets
+--> Driver puts packet onto hardware queue
+if hardware queue is full, netif_stop_queue(dev)
+
--> +core stops sending because of netif_stop_queue(dev)
..
.. time passes
..
..
--> +---> driver has transmitted packets, opens up tx path by
  invoking netif_wake_queue(dev)
--> +Core sends packets, and the cycle repeats.


The pre-requisite for batching changes is that the driver should 
provide a low threshold to open up the tx path.
This is a very important requirement in making batching useful.
Drivers such as tg3 and e1000 already do this.
So in the above annotation, as a driver author, before you
invoke netif_wake_queue(dev) you check if there are enough
entries left.

Heres an example of how i added it to tun driver
---
+#define NETDEV_LTT 4 /* the low threshold to open up the tx path */
..
..
u32 t = skb_queue_len(&tun->readq);
if (netif_queue_stopped(tun->dev) && t < NETDEV_LTT) {
tun->dev->xmit_win = tun->dev->tx_queue_len;
netif_wake_queue(tun->dev);
}
---

Heres how the batching e1000 driver does it (ignore the setting of
netdev->xmit_win, more on this later):

--
if (unlikely(cleaned && netif_carrier_ok(netdev) &&
 E1000_DESC_UNUSED(tx_ring) >= TX_WAKE_THRESHOLD)) {

if (netif_queue_stopped(netdev)) {
   int rspace =  E1000_DESC_UNUSED(tx_ring) - (MAX_SKB_FRAGS +  2);
   netdev->xmit_win = rspace;
   netif_wake_queue(netdev);
   }
---

in tg3 code looks like:

-
if (netif_queue_stopped(tp->dev) &&
(tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH(tp)))
netif_wake_queue(tp->dev);
---

3) Driver Setup:
---

a) On initialization (before netdev registration)
 i) set NETIF_F_BTX in  dev->features 
  i.e dev->features |= NETIF_F_BTX
  This makes the core do proper initialization.

  ii) set dev->xmit_win to something reasonable like
  maybe half the tx DMA ring size etc.
  This is later used by the core to guess how much packets to send
  in one batch. 

  b) create proper pointer to the two new methods desribed above.


4) The new methods


  a) The batching method
  
Heres an example of a batch tx routine that is similar
to the one i added to tun driver


  static int xxx_net_bxmit(struct net_device *dev)
  {
  
  
while (skb_queue_len(dev->blist)) {
dequeue from dev->blist
enqueue onto hardware ring
if hardware ring full break
}
   
if (hardware ring full) {
  netif_stop_queue(dev);
  dev->xmit_win = 1;
}

   if we queued on hardware, tell it to chew
   ...
   ..
   .
  }
--

All return codes like NETDEV_TX_OK etc still apply.
In this method, if there are any IO operations that apply to a 
set of packets (such as kicking DMA) leave them to the end and apply
them once if you have successfully enqueued. For an example of this
look e1000 driver e1000_kick_DMA() function.

b) The dev->hard_prep_xmi

[PATCH] [RFC -v3] NET: Implement a standard ndev_printk family

2007-06-11 Thread Auke Kok
A lot of netdevices implement their own variant of printk and use
use variations of dev_printk, printk or others that use msg_enable,
which has been an eyesore with countless variations across drivers.

This patch implements a standard ndev_printk and derivatives
such as ndev_err, ndev_info, ndev_warn that allows drivers to
transparently use both the msg_enable and a generic netdevice
message layout. It moves the msg_enable over to the net_device
struct and allows drivers to obsolete ethtool handling code of
the msg_enable value.

The current code has each driver contain a copy of msg_enable and
handle the setting/changing through ethtool that way. Since the
netdev name is stored in the net_device struct, those two are
not coherently available in a uniform way across all drivers (a
single macro or function would not work since all drivers name
their net_device members differently). This makes netdevice
driver writes reinvent the wheel over and over again.

It thus makes sense to move msg_enable to the net_device. This
gives us the opportunity to (1) initialize it by default with a
globally sane value, (2) remove msg_enable handling code w/r
ethtool for drivers that know and use the msg_enable member
of the net_device struct. (3) Ethtool code can just modify the
net_device msg_enable for drivers that do not have custom
msg_enable get/set handlers so converted drivers lose some
code for that as well. (4) non-modified drivers remain
functional without additional changes needed.

Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 include/linux/netdevice.h |   54 +
 net/core/dev.c|5 
 net/core/ethtool.c|   14 ++--
 3 files changed, 66 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3a70f55..a5a5fc8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -540,6 +540,8 @@ struct net_device
struct device   dev;
/* space for optional statistics and wireless sysfs groups */
struct attribute_group  *sysfs_groups[3];
+
+   unsigned intmsg_enable;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -838,6 +840,58 @@ enum {
NETIF_MSG_WOL   = 0x4000,
 };
 
+#define ndev_err(netdev, level, format, arg...) \
+   do { \
+   struct net_device *__nd = (netdev); \
+   if ((__nd)->msg_enable & NETIF_MSG_##level) \
+   printk(KERN_ERR "%s: %s: " format, (__nd)->name, \
+   (__nd)->dev.parent->bus_id, ## arg); \
+   } while (0)
+
+#define ndev_warn(netdev, level, format, arg...) \
+   do { \
+   struct net_device *__nd = (netdev); \
+   if ((__nd)->msg_enable & NETIF_MSG_##level) \
+   printk(KERN_WARNING "%s: %s: " format, (__nd)->name, \
+   (__nd)->dev.parent->bus_id, ## arg); \
+   } while (0)
+
+#define ndev_info(netdev, level, format, arg...) \
+   do { \
+   struct net_device *__nd = (netdev); \
+   if ((__nd)->msg_enable & NETIF_MSG_##level) \
+   printk(KERN_INFO "%s: %s: " format, (__nd)->name, \
+   (__nd)->dev.parent->bus_id, ## arg); \
+   } while (0)
+
+#define ndev_notice(netdev, level, format, arg...) \
+   do { \
+   struct net_device *__nd = (netdev); \
+   if ((__nd)->msg_enable & NETIF_MSG_##level) \
+   printk(KERN_NOTICE "%s: %s: " format, (__nd)->name, \
+   (__nd)->dev.parent->bus_id, ## arg); \
+   } while (0)
+
+#ifdef DEBUG
+#define ndev_dbg(netdev, level, format, arg...) \
+   do { \
+   struct net_device *__nd = (netdev); \
+   if ((__nd)->msg_enable & NETIF_MSG_##level) \
+   printk(KERN_DEBUG "%s: %s: " format, (__nd)->name, \
+   (__nd)->dev.parent->bus_id, ## arg); \
+   } while (0)
+
+#else
+/* Force type-checking on ndev_dbg() when DEBUG is not set */
+static inline int __attribute__ ((format (printf, 3, 4)))
+__ndev_dbg(struct net_device * netdev, u32 level, const char *format, ...)
+{
+   return 0;
+}
+#define ndev_dbg(netdev, level, fmt, args...) \
+   __ndev_dbg(netdev, NETIF_MSG_##level, fmt, ##args)
+#endif
+
 #define netif_msg_drv(p)   ((p)->msg_enable & NETIF_MSG_DRV)
 #define netif_msg_probe(p) ((p)->msg_enable & NETIF_MSG_PROBE)
 #define netif_msg_link(p)  ((p)->msg_enable & NETIF_MSG_LINK)
diff --git a/net/core/dev.c b/net/core/dev.c
index 2609062..316f250 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3378,6 +3378,11 @@ struct net_device *alloc_netdev(int sizeof_priv, const 
char *name,
dev->priv = netdev_priv(dev);
 
dev->get_stats = internal_stats;
+   dev->msg_enable = NETIF_MSG_DRV | NETIF_MSG_PROBE | 

Re: [PATCH] NET: Multiqueue network device support.

2007-06-11 Thread Patrick McHardy
jamal wrote:
> On Mon, 2007-11-06 at 17:44 +0200, Patrick McHardy wrote:
> 
>>At this point the qdisc might send new packets. What do you do when a
>>packet for a full ring arrives?
>>
> 
> 
> Hrm... ok, is this a trick question or i am missing the obvious?;->
> What is wrong with what any driver would do today - which is:
> netif_stop and return BUSY; core requeues the packet?
> 
> 
>>I see three choices:
>>
>>- drop it, even though its still within the qdiscs configured limits
>>- requeue it, which does not work because the qdisc is still active
>>  and might just hand you the same packet over and over again in a
>>  busy loop, until the ring has more room (which has the same worst
>>  case, just that we're sitting in a busy loop now).
>>- requeue and stop the queue: we're back to where we started since
>>  now higher priority packets will not get passed to the driver.
> 
> 
> Refer to choice #4 above. 


Replying again so we can hopefully move forward soon. Your choice #4
is exactly what I proposed as choice number 3.

Let me repeat my example why it doesn't work (well) without multiple
queue states (with typos etc fixed) and describe the possibilities.
If you still disagree I hope you can just change my example to show
how it gets fixed. As a thank you I will actually understand that
your solution works as well :)

We have n empty HW queues served in ascending priority order
with a maximum length of m packets per queue:

[0] empty
[1] empty
[2] empty
..
[n-1] empty

Now we receive m - 1 packets for all priorities >= 1 and < n - 1,
so we have:

[0] empty
[1] m - 1 packets
[2] m - 1 packets
..
[n-2] m - 1 packets
[n-1] empty

Since no HW queue is completely full, the queue is still active.
Now we receive m packets of priority n - 1:

[0] empty
[1] m - 1 packets
[2] m - 1 packets
..
[n-2] m - 1 packets
[n-1] m packets

At this point the queue needs to be stopped since the highest
priority queue is entirely full. To start it again at least
one packet of queue n - 1 needs to be sent, which requires
that queues 1 to n - 2 are serviced first. So any prio 0 packet
arriving during this period will sit in the qdisc and will not
reach the device for up to the time for (n - 2) * (m - 1) + 1
full sized packet transmissions. With multiple queue states
we'd know that queue 0 can still take packets.

You agreed that this is a problem and instead of keeping the
queue stopped until all rings can take at least one packet
again you proposed:

> - let the driver shutdown whenever a ring is full. Remember which
>   ring X shut it down.
> - when you get a tx interupt or prun tx descriptors, if a
>   ring <= X has transmitted a packet (or threshold of packets),
>   then wake up the driver (i.e open up).

At this point the queue is active, but at least one ring is already
full and the qdisc can still pass packets for it to the driver.
When this happens we can:

- drop it. This makes qdisc configured limit meaningless since
  the qdisc can't anticipate when the packet will make it through
  or get dropped.

- requeue it: this might result in a busy loop if the qdisc
  decides to hand out the packet again. The loop will be
  terminated once the ring has more room available an can
  eat the packet, which has the same worst case behaviour
  I described above.

- requeue (return BUSY) and stop the queue: thats what you
  proposed as option #4. The question is when to wake the
  queue again. Your suggestion was to wake it when some
  other queue with equal or higher priority got dequeued.
  Correcting my previous statement, you are correct that
  this will fix the starvation of higher priority queues
  because the qdisc has a chance to hand out either a packet
  of the same priority or higher priority, but at the cost of
  at worst (n - 1) * m unnecessary dequeues+requeues in case
  there is only a packet of lowest priority and we need to
  fully serve all higher priority HW queues before it can
  actually be dequeued. The other possibility would be to
  activate the queue again once all rings can take packets
  again, but that wouldn't fix the problem, which you can
  easily see if you go back to my example and assume we still
  have a low priority packet within the qdisc when the lowest
  priority ring fills up (and the queue is stopped), and after
  we tried to wake it and stopped it again the higher priority
  packet arrives.

Considering your proposal in combination with RR, you can see
the same problem of unnecessary dequeues+requeues. Since there
is no priority for waking the queue when a equal or higher
priority ring got dequeued as in the prio case, I presume you
would wake the queue whenever a packet was sent. For the RR
qdisc dequeue after requeue should hand out the same packet,
independantly of newly enqueued packets (which doesn't happen
and is a bug in Peter's RR version), so in the worst case the
HW has to make the entire round before a packet can get
dequeued in case the corresponding HW queue is full. This is
a bit

mac80211 fixes for 2.6.22

2007-06-11 Thread John W. Linville
Dave,

Here are a few mac80211 patches appropriate for 2.6.22.

Individual patches to follow, or you can pull at your leisure...

John

---

The following changes since commit 5ecd3100e695228ac5e0ce0e325e252c0f11806f:
  Linus Torvalds (1):
Linux 2.6.22-rc4

are found in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git 
mac80211-fixes

David Lamparter (1):
  cfg80211: fix signed macaddress in sysfs

Johannes Berg (1):
  mac80211: fix debugfs tx power reduction output

Mattias Nissler (1):
  mac80211: Don't stop tx queue on master device while scanning.

 net/mac80211/debugfs.c   |2 +-
 net/mac80211/ieee80211_sta.c |   12 
 net/wireless/sysfs.c |2 +-
 3 files changed, 14 insertions(+), 2 deletions(-)

-- 
John W. Linville
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


mac80211 patches for 2.6.23

2007-06-11 Thread John W. Linville
Dave,

Here are a few mac80211 patches appropriate for 2.6.23.

Individual patches to follow, or you can pull at your leisure...

John

---

The following changes since commit 5ecd3100e695228ac5e0ce0e325e252c0f11806f:
  Linus Torvalds (1):
Linux 2.6.22-rc4

are found in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git 
mac80211-upstream

Larry Finger (2):
  mac80211: Add support for SIOCGIWRATE ioctl to provide rate information
  mac80211: Set low initial rate in rc80211_simple

 net/mac80211/ieee80211_ioctl.c |   25 -
 net/mac80211/rc80211_simple.c  |   12 +++-
 2 files changed, 31 insertions(+), 6 deletions(-)

-- 
John W. Linville
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] mac80211: fix debugfs tx power reduction output

2007-06-11 Thread John W. Linville
From: Johannes Berg <[EMAIL PROTECTED]>

This patch fixes a typo in mac80211's debugfs.c.

Signed-off-by: Johannes Berg <[EMAIL PROTECTED]>
Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
---
 net/mac80211/debugfs.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index bb6c0fe..476c848 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -112,7 +112,7 @@ DEBUGFS_READONLY_FILE(wep_iv, 20, "%#06x",
  local->wep_iv & 0xff);
 DEBUGFS_READONLY_FILE(tx_power_reduction, 20, "%d.%d dBm",
  local->hw.conf.tx_power_reduction / 10,
- local->hw.conf.tx_power_reduction & 10);
+ local->hw.conf.tx_power_reduction % 10);
 DEBUGFS_READONLY_FILE(rate_ctrl_alg, 100, "%s",
  local->rate_ctrl ? local->rate_ctrl->ops->name : 
"");
 
-- 
1.5.0.6

-- 
John W. Linville
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] cfg80211: fix signed macaddress in sysfs

2007-06-11 Thread John W. Linville
From: David Lamparter <[EMAIL PROTECTED]>

Fix signedness mixup making mac addresses show up strangely
(like 00:11:22:33:44:ffaa) in /sys/class/ieee80211/*/macaddress.

Signed-off-by: David Lamparter <[EMAIL PROTECTED]>
Acked-by: Johannes Berg <[EMAIL PROTECTED]>
Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
---
 net/wireless/sysfs.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/wireless/sysfs.c b/net/wireless/sysfs.c
index 3ebae14..88aaacd 100644
--- a/net/wireless/sysfs.c
+++ b/net/wireless/sysfs.c
@@ -33,7 +33,7 @@ static ssize_t _show_permaddr(struct device *dev,
  struct device_attribute *attr,
  char *buf)
 {
-   char *addr = dev_to_rdev(dev)->wiphy.perm_addr;
+   unsigned char *addr = dev_to_rdev(dev)->wiphy.perm_addr;
 
return sprintf(buf, "%.2x:%.2x:%.2x:%.2x:%.2x:%.2x\n",
   addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
-- 
1.5.0.6

-- 
John W. Linville
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] mac80211: Don't stop tx queue on master device while scanning.

2007-06-11 Thread John W. Linville
From: Mattias Nissler <[EMAIL PROTECTED]>

mac80211 stops the tx queues during scans. This is wrong with respect
to the master deivce tx queue, since stopping it prevents any probes
from being sent during the scan. Instead, they accumulate in the queue
and are only sent after the scan is finished, which is obviously
wrong.

Signed-off-by: Mattias Nissler <[EMAIL PROTECTED]>
Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
---
 net/mac80211/ieee80211_sta.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/net/mac80211/ieee80211_sta.c b/net/mac80211/ieee80211_sta.c
index 9f30ae4..91b545c 100644
--- a/net/mac80211/ieee80211_sta.c
+++ b/net/mac80211/ieee80211_sta.c
@@ -2592,11 +2592,17 @@ void ieee80211_scan_completed(struct ieee80211_hw *hw)
 
read_lock(&local->sub_if_lock);
list_for_each_entry(sdata, &local->sub_if_list, list) {
+
+   /* No need to wake the master device. */
+   if (sdata->dev == local->mdev)
+   continue;
+
if (sdata->type == IEEE80211_IF_TYPE_STA) {
if (sdata->u.sta.associated)
ieee80211_send_nullfunc(local, sdata, 0);
ieee80211_sta_timer((unsigned long)sdata);
}
+
netif_wake_queue(sdata->dev);
}
read_unlock(&local->sub_if_lock);
@@ -2738,6 +2744,12 @@ static int ieee80211_sta_start_scan(struct net_device 
*dev,
 
read_lock(&local->sub_if_lock);
list_for_each_entry(sdata, &local->sub_if_list, list) {
+
+   /* Don't stop the master interface, otherwise we can't transmit
+* probes! */
+   if (sdata->dev == local->mdev)
+   continue;
+
netif_stop_queue(sdata->dev);
if (sdata->type == IEEE80211_IF_TYPE_STA &&
sdata->u.sta.associated)
-- 
1.5.0.6

-- 
John W. Linville
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >