Re: [PATCH] RFC: have tcp_recvmsg() check kthread_should_stop() and treat it as if it were signalled

2007-06-09 Thread Jeff Layton
On Sat, 09 Jun 2007 11:30:04 +1000
Herbert Xu <[EMAIL PROTECTED]> wrote:

> Please cc networking patches to [EMAIL PROTECTED]
> 
> Jeff Layton <[EMAIL PROTECTED]> wrote:
> > 
> > The following patch is a first stab at removing this need. It makes it
> > so that in tcp_recvmsg() we also check kthread_should_stop() at any
> > point where we currently check to see if the task was signalled. If
> > that returns true, then it acts as if it were signalled and returns to
> > the calling function.
> 
> This just doesn't seem to fit.  Why should networking care about kthreads?
> 
> Perhaps you can get kthread_stop to send a signal instead?
> 

The problem there is that we still have to make the kthread let signals
through. The nice thing about this approach is that we can make the
kthread ignore signals, but still allow it to break out of kernel_recvmsg
when a kthread_stop is done.

Though I will confess that you have a point about this feeling like a
layering violation...

-- 
Jeff Layton <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread Herbert Xu
On Fri, Jun 08, 2007 at 09:12:52AM -0400, jamal wrote:
> 
> To mimick that behavior in LLTX, a driver needs to use the same lock on
> both tx and receive. e1000 holds a different lock on tx path from rx
> path. Maybe theres something clever i am missing; but it seems to be a
> bug on e1000.

It's both actually :)

It takes the tx_lock in the xmit routine as well as in the clean-up
routine.  However, the lock is only taken when it updates the queue
status.

Thanks to the ring buffer structure the rest of the clean-up/xmit code
will run concurrently just fine.
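
To make the ring-buffer point concrete, here is a minimal single-threaded sketch of the idea, not e1000's actual code. All names (`tx_ring`, `ring_xmit`, `ring_clean`, `queue_stopped`, `RING_SIZE`) are illustrative assumptions. The xmit side only ever writes `next_to_use`, the clean-up side only ever writes `next_to_clean`, and the queue status is the one piece of state both sides update, which is where a lock would be needed. A real driver would also need memory barriers, which this model leaves out.

```c
#include <stdbool.h>

#define RING_SIZE 8  /* illustrative; must be a power of two */

struct tx_ring {
    unsigned int next_to_use;    /* written only by the xmit path */
    unsigned int next_to_clean;  /* written only by the clean-up path */
    bool queue_stopped;          /* shared: the part a lock would protect */
};

/* Free descriptors; one slot stays empty to tell "full" from "empty". */
static unsigned int ring_unused(const struct tx_ring *r)
{
    return (r->next_to_clean - r->next_to_use - 1) & (RING_SIZE - 1);
}

/* xmit side: claim one descriptor; returns false if the ring is full. */
static bool ring_xmit(struct tx_ring *r)
{
    if (ring_unused(r) == 0) {
        r->queue_stopped = true;   /* queue-status update */
        return false;
    }
    r->next_to_use = (r->next_to_use + 1) & (RING_SIZE - 1);
    return true;
}

/* clean-up side: reclaim up to 'done' completed descriptors. */
static void ring_clean(struct tx_ring *r, unsigned int done)
{
    while (done-- && r->next_to_clean != r->next_to_use)
        r->next_to_clean = (r->next_to_clean + 1) & (RING_SIZE - 1);

    if (r->queue_stopped && ring_unused(r) > 0)
        r->queue_stopped = false;  /* queue-status update */
}
```

Because each index has a single writer, the two paths never overwrite each other's index; only the `queue_stopped` transitions are read-modify-write on shared state.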

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH] iproute2: Format IPv6 tunnels endpoints nicely.

2007-06-09 Thread David Lamparter
Change formatting of IPv6 tunnel endpoints from hex chain to standard IPv6
representation.

Signed-off-by: David Lamparter <[EMAIL PROTECTED]>

---
 lib/ll_addr.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/lib/ll_addr.c b/lib/ll_addr.c
index 581487d..f558050 100644
--- a/lib/ll_addr.c
+++ b/lib/ll_addr.c
@@ -38,6 +38,9 @@ const char *ll_addr_n2a(unsigned char *addr, int alen, int type, char *buf, int
 	    (type == ARPHRD_TUNNEL || type == ARPHRD_SIT || type == ARPHRD_IPGRE)) {
 		return inet_ntop(AF_INET, addr, buf, blen);
 	}
+	if (alen == 16 && type == ARPHRD_TUNNEL6) {
+		return inet_ntop(AF_INET6, addr, buf, blen);
+	}
 	l = 0;
 	for (i=0; i
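
For illustration, a small user-space sketch of the formatting change. `format_endpoint` is a hypothetical helper, not an iproute2 function, and unlike the patch it selects the format from the address length alone rather than from the ARPHRD type. It relies only on the standard `inet_ntop()` call.

```c
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an address of 'alen' bytes.
 * 16-byte addresses are printed as IPv6, 4-byte ones as IPv4,
 * anything else as a colon-separated hex chain. */
static const char *format_endpoint(const unsigned char *addr, int alen,
                                   char *buf, size_t blen)
{
    size_t l = 0;
    int i;

    if (blen == 0)
        return NULL;
    if (alen == 16)
        return inet_ntop(AF_INET6, addr, buf, (socklen_t)blen);
    if (alen == 4)
        return inet_ntop(AF_INET, addr, buf, (socklen_t)blen);

    /* Fallback: hex chain, stopping before the buffer would overflow. */
    buf[0] = '\0';
    for (i = 0; i < alen && l + 3 < blen; i++)
        l += snprintf(buf + l, blen - l, i ? ":%02x" : "%02x", addr[i]);
    return buf;
}
```

With a buffer of at least INET6_ADDRSTRLEN bytes, the 16-byte endpoint 20 01 0d b8 00 ... 00 01 comes out as 2001:db8::1 instead of a 47-character hex chain.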


[RFC] [PATCHES] pktgen IPSEC 0/4

2007-06-09 Thread jamal

This is a set of patches that add ipsec functionality to pktgen. I have
lost these patches before - but they are now fully recovered and well
tested. Robert has glanced at the patches and seems to have no qualms
with them. I am soliciting feedback because i would like to push
them for 2.6.23 when Dave opens his tree.

[I think i may have figured out how the cool cats send their series of
patches using git format-patch but i am not sure i can trust my lil
soldier's config to do the right thing. So i will do them manually.
I am climbing up the git ladder folks, one step at a time!].


cheers,
jamal



[PKTGEN] Centralize packet overhead tracking

2007-06-09 Thread jamal
1 of 4.

cheers,
jamal

commit f7da845f37e3cd47be46697491210c126b37c8fc
Author: Jamal Hadi Salim <[EMAIL PROTECTED]>
Date:   Sat Jun 9 09:11:16 2007 -0400

[PKTGEN] Centralize packet overhead tracking
Track the extra packet overhead for VLAN tags, MPLS, IPSEC etc

Signed-off-by: Jamal Hadi Salim <[EMAIL PROTECTED]>

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 9cd3a1c..1352316 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -228,6 +228,7 @@ struct pktgen_dev {
 
int min_pkt_size;   /* = ETH_ZLEN; */
int max_pkt_size;   /* = ETH_ZLEN; */
+   int pkt_overhead;   /* overhead for MPLS, VLANs, IPSEC etc */
int nfrags;
__u32 delay_us; /* Default delay */
__u32 delay_ns;
@@ -2075,6 +2076,13 @@ static void spin(struct pktgen_dev *pkt_dev, __u64 
spin_until_us)
pkt_dev->idle_acc += now - start;
 }
 
+static inline void set_pkt_overhead(struct pktgen_dev *pkt_dev)
+{
+   pkt_dev->pkt_overhead += pkt_dev->nr_labels*sizeof(u32);
+   pkt_dev->pkt_overhead += VLAN_TAG_SIZE(pkt_dev);
+   pkt_dev->pkt_overhead += SVLAN_TAG_SIZE(pkt_dev);
+}
+
 /* Increment/randomize headers according to flags and current values
  * for IP src/dest, UDP src/dst port, MAC-Addr src/dst
  */
@@ -2323,9 +2331,7 @@ static struct sk_buff *fill_packet_ipv4(struct net_device 
*odev,
 
datalen = (odev->hard_header_len + 16) & ~0xf;
skb = alloc_skb(pkt_dev->cur_pkt_size + 64 + datalen +
-   pkt_dev->nr_labels*sizeof(u32) +
-   VLAN_TAG_SIZE(pkt_dev) + SVLAN_TAG_SIZE(pkt_dev),
-   GFP_ATOMIC);
+   pkt_dev->pkt_overhead, GFP_ATOMIC);
if (!skb) {
sprintf(pkt_dev->result, "No memory");
return NULL;
@@ -2368,7 +2374,7 @@ static struct sk_buff *fill_packet_ipv4(struct net_device 
*odev,
 
/* Eth + IPh + UDPh + mpls */
datalen = pkt_dev->cur_pkt_size - 14 - 20 - 8 -
- pkt_dev->nr_labels*sizeof(u32) - VLAN_TAG_SIZE(pkt_dev) - 
SVLAN_TAG_SIZE(pkt_dev);
+ pkt_dev->pkt_overhead;
if (datalen < sizeof(struct pktgen_hdr))
datalen = sizeof(struct pktgen_hdr);
 
@@ -2391,8 +2397,7 @@ static struct sk_buff *fill_packet_ipv4(struct net_device 
*odev,
iph->check = ip_fast_csum((void *)iph, iph->ihl);
skb->protocol = protocol;
skb->mac_header = (skb->network_header - ETH_HLEN -
-  pkt_dev->nr_labels * sizeof(u32) -
-  VLAN_TAG_SIZE(pkt_dev) - SVLAN_TAG_SIZE(pkt_dev));
+  pkt_dev->pkt_overhead);
skb->dev = odev;
skb->pkt_type = PACKET_HOST;
 
@@ -2662,9 +2667,7 @@ static struct sk_buff *fill_packet_ipv6(struct net_device 
*odev,
mod_cur_headers(pkt_dev);
 
skb = alloc_skb(pkt_dev->cur_pkt_size + 64 + 16 +
-   pkt_dev->nr_labels*sizeof(u32) +
-   VLAN_TAG_SIZE(pkt_dev) + SVLAN_TAG_SIZE(pkt_dev),
-   GFP_ATOMIC);
+   pkt_dev->pkt_overhead, GFP_ATOMIC);
if (!skb) {
sprintf(pkt_dev->result, "No memory");
return NULL;
@@ -2708,7 +2711,7 @@ static struct sk_buff *fill_packet_ipv6(struct net_device 
*odev,
/* Eth + IPh + UDPh + mpls */
datalen = pkt_dev->cur_pkt_size - 14 -
  sizeof(struct ipv6hdr) - sizeof(struct udphdr) -
- pkt_dev->nr_labels*sizeof(u32) - VLAN_TAG_SIZE(pkt_dev) - 
SVLAN_TAG_SIZE(pkt_dev);
+ pkt_dev->pkt_overhead;
 
if (datalen < sizeof(struct pktgen_hdr)) {
datalen = sizeof(struct pktgen_hdr);
@@ -2738,8 +2741,7 @@ static struct sk_buff *fill_packet_ipv6(struct net_device 
*odev,
ipv6_addr_copy(&iph->saddr, &pkt_dev->cur_in6_saddr);
 
skb->mac_header = (skb->network_header - ETH_HLEN -
-  pkt_dev->nr_labels * sizeof(u32) -
-  VLAN_TAG_SIZE(pkt_dev) - SVLAN_TAG_SIZE(pkt_dev));
+  pkt_dev->pkt_overhead);
skb->protocol = protocol;
skb->dev = odev;
skb->pkt_type = PACKET_HOST;
@@ -2857,6 +2859,7 @@ static void pktgen_run(struct pktgen_thread *t)
pkt_dev->started_at = getCurUs();
pkt_dev->next_tx_us = getCurUs();   /* Transmit 
immediately */
pkt_dev->next_tx_ns = 0;
+   set_pkt_overhead(pkt_dev);
 
strcpy(pkt_dev->result, "Starting");
started++;
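
A stand-alone sketch of the bookkeeping this patch centralizes. `struct overhead_cfg`, `set_overhead` and `alloc_len` are hypothetical stand-ins, and the 4-byte sizes for an MPLS label and for each VLAN tag are assumptions of the sketch. It also zeroes the total on every call so that repeated calls do not accumulate; the patch above does not do that itself, the IPSEC patch later in this series adds the reset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the pktgen_dev fields involved. */
struct overhead_cfg {
    unsigned int nr_labels;  /* MPLS labels, assumed 4 bytes each */
    int has_vlan;            /* one 802.1Q tag, assumed 4 bytes */
    int has_svlan;           /* one service tag, assumed 4 bytes */
    int pkt_overhead;        /* cached total, in bytes */
};

/* Compute the extra header bytes once, instead of at every use site. */
static void set_overhead(struct overhead_cfg *c)
{
    c->pkt_overhead = 0;  /* reset so a second call does not double-count */
    c->pkt_overhead += c->nr_labels * sizeof(uint32_t);
    c->pkt_overhead += c->has_vlan ? 4 : 0;
    c->pkt_overhead += c->has_svlan ? 4 : 0;
}

/* Buffer length in the shape used by the alloc_skb() calls in the diff. */
static size_t alloc_len(const struct overhead_cfg *c, size_t cur_pkt_size,
                        size_t datalen)
{
    return cur_pkt_size + 64 + datalen + c->pkt_overhead;
}
```

Every place that previously spelled out the three terms can then read the cached total, so adding another encapsulation means touching one function.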


[PKTGEN] Introduce sequential flows

2007-06-09 Thread jamal
2 of 4.

cheers,
jamal

commit d0d2c0c2e5539a54d66f07d2fa99bb52c19cc698
Author: Jamal Hadi Salim <[EMAIL PROTECTED]>
Date:   Sat Jun 9 09:12:21 2007 -0400

[PKTGEN] Introduce sequential flows

By default all flows in pktgen are randomly selected.
This patch introduces the ability to send all defined flows
sequentially. It also cleans up the small piece of code
associated with the change for readability.

Signed-off-by: Jamal Hadi Salim <[EMAIL PROTECTED]>

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 1352316..2e861d2 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -181,6 +181,7 @@
 #define F_MPLS_RND(1<<8)   /* Random MPLS labels */
 #define F_VID_RND (1<<9)   /* Random VLAN ID */
 #define F_SVID_RND(1<<10)  /* Random SVLAN ID */
+#define F_FLOW_RND(1<<11)  /* Random flows */
 
 /* Thread control flag bits */
 #define T_TERMINATE   (1<<0)
@@ -207,8 +208,12 @@ static struct proc_dir_entry *pg_proc_dir = NULL;
 struct flow_state {
__be32 cur_daddr;
int count;
+   __u32 flags;
 };
 
+/* flow flag bits */
+#define F_INIT   (1<<0)/* flow has been initialized */
+
 struct pktgen_dev {
/*
 * Try to keep frequent/infrequent used vars. separated.
@@ -342,6 +347,7 @@ struct pktgen_dev {
unsigned cflows;/* Concurrent flows (config) */
unsigned lflow; /* Flow length  (config) */
unsigned nflows;/* accumulated flows (stats) */
+   unsigned curfl; /* current sequenced flow (state)*/
 
char result[512];
 };
@@ -691,6 +697,11 @@ static int pktgen_if_show(struct seq_file *seq, void *v)
if (pkt_dev->flags & F_MPLS_RND)
seq_printf(seq,  "MPLS_RND  ");
 
+   if (pkt_dev->flags & F_FLOW_RND)
+   seq_printf(seq,  "FLOW_RND  ");
+   else
+   seq_printf(seq,  "FLOW_SEQ  "); /*in sequence flows*/
+
if (pkt_dev->flags & F_MACSRC_RND)
seq_printf(seq, "MACSRC_RND  ");
 
@@ -1182,6 +1193,9 @@ static ssize_t pktgen_if_write(struct file *file,
else if (strcmp(f, "!SVID_RND") == 0)
pkt_dev->flags &= ~F_SVID_RND;
 
+   else if (strcmp(f, "FLOW_RND") == 0)
+   pkt_dev->flags |= F_FLOW_RND;
+
else if (strcmp(f, "!IPV6") == 0)
pkt_dev->flags &= ~F_IPV6;
 
@@ -1190,7 +1204,7 @@ static ssize_t pktgen_if_write(struct file *file,
"Flag -:%s:- unknown\nAvailable flags, (prepend 
! to un-set flag):\n%s",
f,
"IPSRC_RND, IPDST_RND, UDPSRC_RND, UDPDST_RND, "
-   "MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, 
MPLS_RND, VID_RND, SVID_RND\n");
+   "MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, 
MPLS_RND, VID_RND, SVID_RND, FLOW_RND\n");
return count;
}
sprintf(pg_result, "OK: flags=0x%x", pkt_dev->flags);
@@ -2083,6 +2097,37 @@ static inline void set_pkt_overhead(struct pktgen_dev 
*pkt_dev)
pkt_dev->pkt_overhead += SVLAN_TAG_SIZE(pkt_dev);
 }
 
+static inline int f_seen(struct pktgen_dev *pkt_dev, int flow)
+{
+
+   if (pkt_dev->flows[flow].flags & F_INIT)
+   return 1;
+   else
+   return 0;
+}
+
+static inline int f_pick(struct pktgen_dev *pkt_dev)
+{
+   int flow = pkt_dev->curfl;
+
+   if (pkt_dev->flags & F_FLOW_RND) {
+   flow = random32() % pkt_dev->cflows;
+
+   if (pkt_dev->flows[flow].count > pkt_dev->lflow)
+   pkt_dev->flows[flow].count = 0;
+   } else {
+   if (pkt_dev->flows[flow].count >= pkt_dev->lflow) {
+   /* reset time */
+   pkt_dev->flows[flow].count = 0;
+   pkt_dev->curfl += 1;
+   if (pkt_dev->curfl >= pkt_dev->cflows)
+   pkt_dev->curfl = 0; /*reset */
+   }
+   }
+
+   return pkt_dev->curfl;
+}
+
 /* Increment/randomize headers according to flags and current values
  * for IP src/dest, UDP src/dst port, MAC-Addr src/dst
  */
@@ -2092,12 +2137,8 @@ static void mod_cur_headers(struct pktgen_dev *pkt_dev)
__u32 imx;
int flow = 0;
 
-   if (pkt_dev->cflows) {
-   flow = random32() % pkt_dev->cflows;
-
-   if (pkt_dev->flows[flow].count > pkt_dev->lflow)
-   pkt_dev->flows[flow].count = 0;
-   }
+   if (pkt_dev->cflows)
+   flow = f_pick(pkt_dev);
 
/*  Deal with source MAC */
if (pkt_dev->src_mac_count > 1) {
@@ -2213,7 +2254,7 @@ static void mod_cur_headers(struct pktgen_dev *pkt_dev)
pkt_dev->cur_saddr = htonl(t);
}
 
-   if (pkt_dev->cflows && pkt_dev->flows[flow].count 
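
The sequential mode can be pictured with the following sketch. It is not the patch's `f_pick()`: `struct flow_picker` and `pick_sequential` are hypothetical, and the per-flow packet count is incremented inside the picker here, whereas pktgen does that elsewhere. Each flow sends `lflow` packets, then the picker moves to the next flow and wraps around after the last one.

```c
struct flow {
    unsigned int count;   /* packets sent on this flow in the current round */
};

struct flow_picker {
    struct flow *flows;
    unsigned int cflows;  /* number of configured flows */
    unsigned int lflow;   /* packets per flow before moving on */
    unsigned int curfl;   /* current flow in sequential mode */
};

/* Return the flow index for the next packet. */
static unsigned int pick_sequential(struct flow_picker *p)
{
    if (p->flows[p->curfl].count >= p->lflow) {
        p->flows[p->curfl].count = 0;           /* this flow's round is over */
        p->curfl = (p->curfl + 1) % p->cflows;  /* advance, wrapping at the end */
    }
    p->flows[p->curfl].count++;
    return p->curfl;
}
```

With three flows and `lflow` set to 2, successive calls yield flows 0, 0, 1, 1, 2, 2 and then start again at 0.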

[XFRM] Introduce standalone SAD lookup

2007-06-09 Thread jamal
3 of 4.

cheers,
jamal
commit 923d6c49f9f513da41e4bfd8188304787a5c8093
Author: Jamal Hadi Salim <[EMAIL PROTECTED]>
Date:   Sat Jun 9 09:16:12 2007 -0400

[XFRM] Introduce standalone SAD lookup
This allows other in-kernel functions to do SAD lookups.
The only known user at the moment is pktgen.

Signed-off-by: Jamal Hadi Salim <[EMAIL PROTECTED]>

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 311f25a..79d2c37 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -920,6 +920,10 @@ extern struct xfrm_state *xfrm_state_find(xfrm_address_t 
*daddr, xfrm_address_t
  struct flowi *fl, struct xfrm_tmpl 
*tmpl,
  struct xfrm_policy *pol, int *err,
  unsigned short family);
+extern struct xfrm_state * xfrm_stateonly_find(xfrm_address_t *daddr,
+  xfrm_address_t *saddr,
+  unsigned short family,
+  u8 mode, u8 proto, u32 reqid);
 extern int xfrm_state_check_expire(struct xfrm_state *x);
 extern void xfrm_state_insert(struct xfrm_state *x);
 extern int xfrm_state_add(struct xfrm_state *x);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 85f3f43..b8562e4 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -686,6 +686,41 @@ out:
return x;
 }
 
+struct xfrm_state *
+xfrm_stateonly_find(xfrm_address_t *daddr, xfrm_address_t *saddr,
+   unsigned short family, u8 mode, u8 proto, u32 reqid)
+{
+   unsigned int h = xfrm_dst_hash(daddr, saddr, reqid, family);
+   struct xfrm_state *rx = NULL, *x = NULL;
+   struct hlist_node *entry;
+
+   spin_lock(&xfrm_state_lock);
+   hlist_for_each_entry(x, entry, xfrm_state_bydst+h, bydst) {
+   if (x->props.family == family &&
+   x->props.reqid == reqid &&
+   !(x->props.flags & XFRM_STATE_WILDRECV) &&
+   xfrm_state_addr_check(x, daddr, saddr, family) &&
+   mode == x->props.mode &&
+   proto == x->id.proto)  {
+
+   if (x->km.state != XFRM_STATE_VALID)
+   continue;
+   else {
+   rx = x;
+   break;
+   }
+   }
+   }
+
+   if (rx)
+   xfrm_state_hold(rx);
+   spin_unlock(&xfrm_state_lock);
+
+
+   return rx;
+}
+EXPORT_SYMBOL(xfrm_stateonly_find);
+
 static void __xfrm_state_insert(struct xfrm_state *x)
 {
unsigned int h;
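
The shape of the lookup, reduced to a user-space sketch. `struct sa`, `sa_find` and the `SA_VALID`/`SA_EXPIRED` states are hypothetical, addresses are IPv4 only, and there is no hashing or locking, so this shows only the matching rule: walk one bucket, skip entries whose selector differs or whose state is not valid, and take a reference on the first hit before returning it.

```c
#include <stddef.h>
#include <stdint.h>

enum sa_state { SA_VALID, SA_EXPIRED };  /* hypothetical states */

struct sa {
    uint32_t daddr, saddr;   /* IPv4 only, for brevity */
    uint8_t mode, proto;
    uint32_t reqid;
    enum sa_state state;
    int refcnt;
    struct sa *next;         /* next entry in the same bucket */
};

/* Return the first valid SA matching the selector, with a reference held. */
static struct sa *sa_find(struct sa *bucket, uint32_t daddr, uint32_t saddr,
                          uint8_t mode, uint8_t proto, uint32_t reqid)
{
    struct sa *x;

    for (x = bucket; x != NULL; x = x->next) {
        if (x->daddr != daddr || x->saddr != saddr ||
            x->mode != mode || x->proto != proto || x->reqid != reqid)
            continue;
        if (x->state != SA_VALID)
            continue;        /* matching but unusable: keep looking */
        x->refcnt++;         /* the caller now owns one reference */
        return x;
    }
    return NULL;
}
```

The caller is responsible for dropping the reference when it is done with the entry.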


[PKTGEN] IPSEC support

2007-06-09 Thread jamal
4 of 4

cheers,
jamal
commit d1d8ea490a517df484e6774c4f41123ccde52434
Author: Jamal Hadi Salim <[EMAIL PROTECTED]>
Date:   Sat Jun 9 09:46:52 2007 -0400

[PKTGEN] IPSEC support
Added transport mode ESP support for starters.
I will send more of these modes and types once i have resolved
the tunnel mode issues.

Signed-off-by: Jamal Hadi Salim <[EMAIL PROTECTED]>

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 2e861d2..2ef80aa 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -152,6 +152,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_XFRM
+#include 
+#endif
 #include 
 #include 
 #include 
@@ -182,6 +185,7 @@
 #define F_VID_RND (1<<9)   /* Random VLAN ID */
 #define F_SVID_RND(1<<10)  /* Random SVLAN ID */
 #define F_FLOW_RND(1<<11)  /* Random flows */
+#define F_IPSEC_ON(1<<12)  /* ipsec on for flows */
 
 /* Thread control flag bits */
 #define T_TERMINATE   (1<<0)
@@ -208,6 +212,9 @@ static struct proc_dir_entry *pg_proc_dir = NULL;
 struct flow_state {
__be32 cur_daddr;
int count;
+#ifdef CONFIG_XFRM
+   struct xfrm_state *x;
+#endif
__u32 flags;
 };
 
@@ -348,7 +355,10 @@ struct pktgen_dev {
unsigned lflow; /* Flow length  (config) */
unsigned nflows;/* accumulated flows (stats) */
unsigned curfl; /* current sequenced flow (state)*/
-
+#ifdef CONFIG_XFRM
+   __u8ipsmode;/* IPSEC mode (config) */
+   __u8ipsproto;   /* IPSEC type (config) */
+#endif
char result[512];
 };
 
@@ -702,6 +712,9 @@ static int pktgen_if_show(struct seq_file *seq, void *v)
else
seq_printf(seq,  "FLOW_SEQ  "); /*in sequence flows*/
 
+   if (pkt_dev->flags & F_IPSEC_ON)
+   seq_printf(seq,  "IPSEC  ");
+
if (pkt_dev->flags & F_MACSRC_RND)
seq_printf(seq, "MACSRC_RND  ");
 
@@ -1196,6 +1209,11 @@ static ssize_t pktgen_if_write(struct file *file,
else if (strcmp(f, "FLOW_RND") == 0)
pkt_dev->flags |= F_FLOW_RND;
 
+#ifdef CONFIG_XFRM
+   else if (strcmp(f, "IPSEC") == 0)
+   pkt_dev->flags |= F_IPSEC_ON;
+#endif
+
else if (strcmp(f, "!IPV6") == 0)
pkt_dev->flags &= ~F_IPV6;
 
@@ -1204,7 +1222,7 @@ static ssize_t pktgen_if_write(struct file *file,
"Flag -:%s:- unknown\nAvailable flags, (prepend 
! to un-set flag):\n%s",
f,
"IPSRC_RND, IPDST_RND, UDPSRC_RND, UDPDST_RND, "
-   "MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, 
MPLS_RND, VID_RND, SVID_RND, FLOW_RND\n");
+   "MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, 
MPLS_RND, VID_RND, SVID_RND, FLOW_RND, IPSEC\n");
return count;
}
sprintf(pg_result, "OK: flags=0x%x", pkt_dev->flags);
@@ -2092,6 +2110,7 @@ static void spin(struct pktgen_dev *pkt_dev, __u64 
spin_until_us)
 
 static inline void set_pkt_overhead(struct pktgen_dev *pkt_dev)
 {
+   pkt_dev->pkt_overhead = 0;
pkt_dev->pkt_overhead += pkt_dev->nr_labels*sizeof(u32);
pkt_dev->pkt_overhead += VLAN_TAG_SIZE(pkt_dev);
pkt_dev->pkt_overhead += SVLAN_TAG_SIZE(pkt_dev);
@@ -2128,6 +2147,31 @@ static inline int f_pick(struct pktgen_dev *pkt_dev)
return pkt_dev->curfl;
 }
 
+
+#ifdef CONFIG_XFRM
+/* If there was already an IPSEC SA, we keep it as is, else
+ * we go look for it ...
+*/
+inline
+void get_ipsec_sa(struct pktgen_dev *pkt_dev, int flow)
+{
+   struct xfrm_state *x = pkt_dev->flows[flow].x;
+   if (!x) {
+   /*slow path: we dont already have xfrm_state*/
+   x = xfrm_stateonly_find((xfrm_address_t *)&pkt_dev->cur_daddr,
+   (xfrm_address_t *)&pkt_dev->cur_saddr,
+   AF_INET,
+   pkt_dev->ipsmode,
+   pkt_dev->ipsproto, 0);
+   if (x) {
+   pkt_dev->flows[flow].x = x;
+   set_pkt_overhead(pkt_dev);
+   pkt_dev->pkt_overhead+=x->props.header_len;
+   }
+
+   }
+}
+#endif
 /* Increment/randomize headers according to flags and current values
  * for IP src/dest, UDP src/dst port, MAC-Addr src/dst
  */
@@ -2287,6 +2331,10 @@ static void mod_cur_headers(struct pktgen_dev *pkt_dev)
pkt_dev->flows[flow].flags |= F_INIT;
pkt_dev->flows[flow].cur_daddr =
pkt_dev->cur_daddr;
+#ifdef CONFIG_XFRM
+   if (pkt_dev->flags & F_IPSEC_ON)
+   get_ipsec_sa(pkt_dev, flow);
+#endif
pkt_dev->nflows++;
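
The per-flow caching in `get_ipsec_sa()` follows a common lazy-lookup pattern, sketched below under simplifying assumptions. `flow_slot`, `slow_lookup`, `flow_overhead` and the `lookups` counter are all hypothetical, and the "table" is a single entry. The slow path runs only while the flow has no cached entry; once one is found, later packets reuse it.

```c
#include <stddef.h>

struct sa {
    int header_len;          /* extra bytes this SA adds to each packet */
};

struct flow_slot {
    struct sa *x;            /* cached SA, NULL until the first hit */
};

static int lookups;          /* slow-path calls, counted for illustration */

/* Stand-in for the real lookup: returns the single table entry, if any. */
static struct sa *slow_lookup(struct sa *table)
{
    lookups++;
    return table;
}

/* Overhead for a packet on this flow, resolving the SA on first use. */
static int flow_overhead(struct flow_slot *f, struct sa *table,
                         int base_overhead)
{
    if (!f->x)
        f->x = slow_lookup(table);   /* slow path: nothing cached yet */
    return base_overhead + (f->x ? f->x->header_len : 0);
}
```

A flow with no matching SA keeps taking the slow path on every packet, which is also the behaviour of the patch as posted.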

Re: [XFRM] Introduce standalone SAD lookup

2007-06-09 Thread jamal
Sorry, meant to cc Herbert and James since they commented two
generations ago.
Gents, if you manage to have the cycles please look at this specific
one. Herbert, for tunnel mode i think i will agree with you and
introduce a dst struct; but i will defer that to some later patch.

cheers,
jamal

On Sat, 2007-09-06 at 10:18 -0400, jamal wrote:
> 3 of 4.
> 
> cheers,
> jamal



Re: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread jamal
On Sat, 2007-09-06 at 21:08 +1000, Herbert Xu wrote:

> It takes the tx_lock in the xmit routine as well as in the clean-up
> routine.  However, the lock is only taken when it updates the queue
> status.
> 
> Thanks to the ring buffer structure the rest of the clean-up/xmit code
> will run concurrently just fine.

I know you are a patient man Herbert - so please explain slowly (if that
doesn't make sense on email, then bear with me as usual) ;->

- it seems the cleverness is that some parts of the ring description are
written to on tx but not rx (and vice-versa), correct? For example, the
next_to_watch/use bits. If that's a yes - there at least should have been
a big fat comment in the code so nobody changes it;
- and even if that's the case,
a) then the tx_lock sounds unneeded, correct? (given the RUNNING
atomicity).
b) do you even need the adapter lock? ;-> Given the nature of the NAPI
poll, only one CPU can prune the descriptors.

I have tested with just getting rid of tx_lock and it worked fine. I
haven't tried removing the adapter lock.

cheers,
jamal





Re: networking busted in current -git ???

2007-06-09 Thread Trond Myklebust
On Fri, 2007-06-08 at 19:06 -0700, David Miller wrote:
> From: Trond Myklebust <[EMAIL PROTECTED]>
> Date: Fri, 08 Jun 2007 17:43:27 -0400
> 
> > It is not dhcp. I'm seeing the same bug with bog-standard ifup with a
> > static address on an FC-6 machine.
> > 
> > It appears to be something in the latest dump from davem to Linus, but I
> > haven't yet had time to identify what.
> 
> Linus's current tree should have this fixed.
> 
> Let us know if this is not the case.

It appears to be working for me again.

Trond



RE: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread Leonid Grossman


> -Original Message-
> From: [EMAIL PROTECTED] [mailto:netdev-
> [EMAIL PROTECTED] On Behalf Of Waskiewicz Jr, Peter P
> Sent: Wednesday, June 06, 2007 3:31 PM
> To: [EMAIL PROTECTED]; Patrick McHardy
> Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org; [EMAIL PROTECTED]; Kok,
> Auke-jan H
> Subject: RE: [PATCH] NET: Multiqueue network device support.
> 
> > [Which of course leads to the complexity (and not optimizing
> > for the common - which is single ring NICs)].
> 
> The common for 100 Mbit and older 1Gbit is single ring NICs.  Newer
> PCI-X and PCIe NICs from 1Gbit to 10Gbit support multiple rings in the
> hardware, and it's all headed in that direction, so it's becoming the
> common case.

IMHO, in addition to current Intel and Neterion NICs, some/most upcoming
NICs are likely to be multiqueue, since virtualization emerges as a
major driver for hw designs (there are other things of course that drive
hw, but these are complementary to multiqueue).

PCI-SIG IOV extensions for pci spec are almost done, and a typical NIC
(at least, typical 10GbE NIC that supports some subset of IOV) in the
near future is likely to have at least 8 independent channels, each with its
own tx/rx queue, MAC address, msi-x vector(s), reset that doesn't affect
other channels, etc.

Basically, each channel could be used as an independent NIC that just
happens to share pci bus and 10GbE PHY with other channels (but has
per-channel QoS and throughput guarantees).

In a non-virtualized system, such NICs could be used in a mode where each
channel runs on one core; this may eliminate some locking...  This mode
will, by the way, require deterministic session steering; the current hashing
approach in the patch is not sufficient. This is something we can
contribute once Peter's code is in.
In general, a consensus on kernel support for multiqueue NICs will be
beneficial since multiqueue HW is here and other stacks are already taking
advantage of it.


[PATCH] PHY fixed driver: rework release path and update phy_id notation

2007-06-09 Thread Vitaly Bordug

The device_bind_driver() error code is now returned correctly.
A release() function has been added to free resources
correctly; the release path is now clean.

Before the rework, it used to cause
 Device '[EMAIL PROTECTED]:1' does not have a release() function, it is broken
 and must be fixed.
 BUG: at drivers/base/core.c:104 device_release()
 
 Call Trace:  
  [] kobject_cleanup+0x53/0x7e
  [] kobject_release+0x0/0x9
  [] kref_put+0x74/0x81
  [] fixed_mdio_register_device+0x230/0x265
  [] fixed_init+0x1f/0x35
  [] init+0x147/0x2fb
  [] schedule_tail+0x36/0x92
  [] child_rip+0xa/0x12
  [] acpi_ds_init_one_object+0x0/0x83
  [] init+0x0/0x2fb
  [] child_rip+0x0/0x12  


Also changed the notation of the fixed phy definition on
mdio bus to the form of + to make it able to be used by
gianfar and ucc_geth that define phy_id strictly as "%d:%d"

Signed-off-by: Vitaly Bordug <[EMAIL PROTECTED]>

---

 drivers/net/phy/Kconfig |4 ++
 drivers/net/phy/fixed.c |   93 +++
 2 files changed, 57 insertions(+), 40 deletions(-)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 09b6f25..a938c48 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -71,4 +71,8 @@ config FIXED_MII_100_FDX
bool "Emulation for 100M Fdx fixed PHY behavior"
depends on FIXED_PHY
 
+config FIXED_MII_1000_FDX
+   bool "Emulation for 1000M Fdx fixed PHY behavior"
+   depends on FIXED_PHY
+
 endif # PHYLIB
diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed.c
index 68c99b4..34b9111 100644
--- a/drivers/net/phy/fixed.c
+++ b/drivers/net/phy/fixed.c
@@ -187,12 +187,29 @@ static struct phy_driver fixed_mdio_driver = {
.driver = { .owner = THIS_MODULE,},
 };
 
+static void fixed_mdio_release (struct device * dev)
+{
+   struct phy_device *phydev = container_of(dev, struct phy_device, dev);
+   struct mii_bus *bus = phydev->bus;
+   struct fixed_info *fixed = bus->priv;
+
+   kfree(phydev);
+   kfree(bus->dev);
+   kfree(bus);
+   kfree(fixed->regs);
+   kfree(fixed);
+}
+
 /*-
  *  This func is used to create all the necessary stuff, bind
  * the fixed phy driver and register all it on the mdio_bus_type.
- * speed is either 10 or 100, duplex is boolean.
+ * speed is either 10 or 100 or 1000, duplex is boolean.
  * number is used to create multiple fixed PHYs, so that several devices can
  * utilize them simultaneously.
+ *
+ * The device on mdio bus will look like :,
+ * bus_id = number 
+ * phy_id = speed+duplex.
  
*-*/
 static int fixed_mdio_register_device(int number, int speed, int duplex)
 {
@@ -221,6 +238,12 @@ static int fixed_mdio_register_device(int number, int 
speed, int duplex)
}
 
fixed->regs = kzalloc(MII_REGS_NUM*sizeof(int), GFP_KERNEL);
+   if (NULL == fixed->regs) {
+   kfree(dev);
+   kfree(new_bus);
+   kfree(fixed);
+   return -ENOMEM;
+   }
fixed->regs_num = MII_REGS_NUM;
fixed->phy_status.speed = speed;
fixed->phy_status.duplex = duplex;
@@ -249,57 +272,43 @@ static int fixed_mdio_register_device(int number, int 
speed, int duplex)
fixed->phydev = phydev;
 
if(NULL == phydev) {
-   err = -ENOMEM;
-   goto device_create_fail;
+   kfree(dev);
+   kfree(new_bus);
+   kfree(fixed->regs);
+   kfree(fixed);
+   return -ENOMEM;
}
 
phydev->irq = PHY_IGNORE_INTERRUPT;
phydev->dev.bus = &mdio_bus_type;
 
-   if(number)
-   snprintf(phydev->dev.bus_id, BUS_ID_SIZE,
-   "[EMAIL PROTECTED]:%d", number, speed, duplex);
-   else
-   snprintf(phydev->dev.bus_id, BUS_ID_SIZE,
-   "[EMAIL PROTECTED]:%d", speed, duplex);
+   snprintf(phydev->dev.bus_id, BUS_ID_SIZE,
+   "%d:%d", number, speed + duplex);
+
phydev->bus = new_bus;
 
+   phydev->dev.driver = &fixed_mdio_driver.driver;
+   phydev->dev.release = fixed_mdio_release;
+ 
+   err = phydev->dev.driver->probe(&phydev->dev);
+   if(err < 0) {
+   printk(KERN_ERR "Phy %s: problems with fixed driver\n",
+   phydev->dev.bus_id);
+   kfree(phydev);
+   kfree(dev);
+   kfree(new_bus);
+   kfree(fixed->regs);
+   kfree(fixed);
+   return err;
+   }
+ 
err = device_register(&phydev->dev);
if(err) {
printk(KERN_ERR "Phy %s failed to register\n",
phydev->dev.bus_id);
-   goto bus_register_fail;
-   }
-
-   /*
-  the mdio bu
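
As a small illustration of the new notation, the sketch below builds the "%d:%d" identifier the same way the patched snprintf() call does. `fixed_phy_id` is a hypothetical wrapper written for this example, and the buffer size is left to the caller.

```c
#include <stdio.h>

/* Hypothetical wrapper: the bus part is the instance number, the PHY part
 * is speed + duplex, so 100 Mbit full duplex on instance 0 gives "0:101". */
static int fixed_phy_id(char *buf, size_t len, int number, int speed,
                        int duplex)
{
    return snprintf(buf, len, "%d:%d", number, speed + duplex);
}
```

Because duplex is a boolean, adding it to the speed keeps the two settings distinguishable: 100 for half duplex and 101 for full duplex at 100 Mbit.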

[1/2] 2.6.22-rc4: known regressions with patches v2

2007-06-09 Thread Michal Piotrowski

Hi all,

Here is a list of some known regressions in 2.6.22-rc4
with patches available.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions



Unclassified

Subject    : kernel BUG at arch/i386/kernel/cpu/perfctr-watchdog.c:126!
References : http://lkml.org/lkml/2007/6/3/60
Submitter  : Udo A. Steinberg <[EMAIL PROTECTED]>
Handled-By : Björn Steinbrink <[EMAIL PROTECTED]>
Patch      : http://lkml.org/lkml/2007/6/8/23
Status     : patch available



Memory management

Subject    : bug in i386 MTRR initialization
References : http://lkml.org/lkml/2007/5/19/93
Submitter  : Andrea Righi <[EMAIL PROTECTED]>
Status     : patch available



Networking

Subject    : OOPS iproute2/tc/u32_destroy in 2.6.22-rc3-git6
References : http://lkml.org/lkml/2007/6/3/66
Submitter  : Strobl Anton <[EMAIL PROTECTED]>
Handled-By : Patrick McHardy <[EMAIL PROTECTED]>
Patch      : http://lkml.org/lkml/2007/6/3/137
Status     : patch available

Subject    : no irda0 interface (2.6.21 was OK), smsc does not find chip
References : http://lkml.org/lkml/2007/6/3/16
Submitter  : Andrey Borzenkov <[EMAIL PROTECTED]>
Handled-By : Samuel Ortiz <[EMAIL PROTECTED]>
             Bjorn Helgaas <[EMAIL PROTECTED]>
Patch      : http://lkml.org/lkml/2007/6/7/237
Status     : patch was suggested



Regards,
Michal

--
"Najbardziej brakowało mi twojego milczenia."
-- Andrzej Sapkowski "Coś więcej"
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread jamal
On Sat, 2007-09-06 at 10:58 -0400, Leonid Grossman wrote:

> IMHO, in addition to current Intel and Neterion NICs, some/most upcoming
> NICs are likely to be multiqueue, since virtualization emerges as a
> major driver for hw designs (there are other things of course that drive
> hw, but these are complementary to multiqueue).
> 
> PCI-SIG IOV extensions for pci spec are almost done, and a typical NIC
> (at least, typical 10GbE NIC that supports some subset of IOV) in the
> near future is likely to have at least 8  independent channels with its
> own tx/rx queue, MAC address, msi-x vector(s), reset that doesn't affect
> other channels, etc.

Leonid - any relation between that and data center ethernet? i.e
http://www.ieee802.org/3/ar/public/0503/wadekar_1_0503.pdf
It seems to desire to do virtualization as well. 
Is there any open spec for PCI-SIG IOV?

> Basically, each channel could be used as an independent NIC that just
> happens to share pci bus and 10GbE PHY with other channels (but has
> per-channel QoS and throughput guarantees).

Sounds very similar to data centre ethernet - except data centre
ethernet seems to map "channels" to rings; whereas the scheme you
describe maps a channel essentially to a virtual nic which seems to read
in the common case as a single tx, single rx ring. Is that right? If
yes, we should be able to do the virtual nics today without any changes
really since each one appears as a separate NIC. It will be a matter of
probably boot time partitioning and parametrization to create virtual
nics (ex of priorities of each virtual NIC etc).

> In a non-virtualized system, such NICs could be used in a mode when each
> channel runs on one core; this may eliminate some locking...  This mode
> will require btw deterministic session steering, current hashing
> approach in the patch is not sufficient; this is something we can
> contribute once Peter's code is in. 

I can actually see how the PCI-SIG approach using virtual NIC approach
could run on multiple CPUs (since each is no different from a NIC that
we have today). And our current Linux steering would also work just
fine.

In the case of non-virtual NICs, i am afraid i dont think it is as easy
as simple session steering - if you want to be generic that is; you may
wanna consider a more complex connection tracking i.e a grouping of
sessions as the basis for steering to a tx ring (and therefore tying to
a specific CPU).
If you are an ISP or a data center with customers partitioned based on
simple subnets, then i can see a simple classification based on subnets
being tied to a hw ring/CPU. And in such cases simple flow control on a
per ring basis makes sense.
Have you guys experimented on the non-virtual case? And are you
doing the virtual case as a pair of tx/rx being a single virtual nic?
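
To make the subnet-based steering idea concrete, here is a minimal
userspace sketch in C. It is not kernel code and not taken from any
posted patch: struct subnet_rule, prefix_mask() and classify_ring() are
hypothetical names, and it assumes the rule table is small enough for a
linear longest-prefix scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule: destinations inside prefix/len use tx ring `ring`. */
struct subnet_rule {
    uint32_t prefix;   /* network address, host byte order */
    uint8_t  len;      /* prefix length, 0..32 */
    unsigned ring;     /* tx ring (and therefore CPU) for this customer */
};

static uint32_t prefix_mask(uint8_t len)
{
    assert(len <= 32);
    /* Shifting a 32-bit value by 32 is undefined, so /0 is handled apart. */
    return len == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - len));
}

/* Longest-prefix match; returns default_ring when no rule matches. */
static unsigned classify_ring(const struct subnet_rule *rules, size_t n,
                              uint32_t daddr, unsigned default_ring)
{
    int best_len = -1;
    unsigned ring = default_ring;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(rules[i].len);

        if ((daddr & mask) == (rules[i].prefix & mask) &&
            (int)rules[i].len > best_len) {
            best_len = rules[i].len;
            ring = rules[i].ring;
        }
    }
    return ring;
}
```

With rules for 10.0.0.0/8 on ring 1 and 10.1.0.0/16 on ring 2, a packet
to 10.1.2.3 would land on ring 2 and anything unmatched on the default
ring. Per-ring flow control then only has to stop the customers mapped
to that ring.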

> In general, a consensus on kernel support for multiqueue NICs will be
> beneficial since multiqueue HW is here and other stacks already taking
> advantage of it. 

My main contention with Peter's approach has been the propagation of
flow control back to the qdisc queues. However, if this PCI-SIG standard
also desires such an approach, then that sheds a different light on it.

cheers,
jamal



RE: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread Leonid Grossman


> -Original Message-
> From: J Hadi Salim [mailto:[EMAIL PROTECTED] On Behalf Of jamal
> Sent: Saturday, June 09, 2007 12:23 PM
> To: Leonid Grossman
> Cc: Waskiewicz Jr, Peter P; Patrick McHardy; [EMAIL PROTECTED];
> netdev@vger.kernel.org; [EMAIL PROTECTED]; Kok, Auke-jan H; Ramkrishna
> Vepa; Alex Aizman
> Subject: RE: [PATCH] NET: Multiqueue network device support.
> 
> On Sat, 2007-09-06 at 10:58 -0400, Leonid Grossman wrote:
> 
> > IMHO, in addition to current Intel and Neterion NICs, some/most upcoming
> > NICs are likely to be multiqueue, since virtualization emerges as a
> > major driver for hw designs (there are other things of course that drive
> > hw, but these are complementary to multiqueue).
> >
> > PCI-SIG IOV extensions for pci spec are almost done, and a typical NIC
> > (at least, typical 10GbE NIC that supports some subset of IOV) in the
> > near future is likely to have at least 8  independent channels with its
> > own tx/rx queue, MAC address, msi-x vector(s), reset that doesn't affect
> > other channels, etc.
> 
> Leonid - any relation between that and data center ethernet? i.e
> http://www.ieee802.org/3/ar/public/0503/wadekar_1_0503.pdf
> It seems to desire to do virtualization as well.

Not really. This is a very old presentation; you probably saw some newer
PR on Convergence Enhanced Ethernet, Congestion Free Ethernet etc. 
These efforts are in very early stages and arguably orthogonal to
virtualization, but in general having per channel QoS (flow control is
just a part of it) is a good thing. 

> Is there any open spec for PCI-SIG IOV?

I don't think so, the actual specs and event presentations at
www.pcisig.org are members-only, although there are many PRs about early
IOV support that may shed some light on the features.  

But my point was that while virtualization capabilities of upcoming NICs
may be not even relevant to Linux, the multi-channel hw designs (a side
effect of virtualization push, if you will) will be there and a
non-virtualized stack can take advantage of them.

Actually, our current 10GbE NICs have most of such multichannel
framework already shipping (in pre-IOV fashion), so the programming
manual on the website can probably give you a pretty good idea about how
multi-channel 10GbE NICs may look like. 

> 
> > Basically, each channel could be used as an independent NIC that just
> > happens to share pci bus and 10GbE PHY with other channels (but has
> > per-channel QoS and throughput guarantees).
> 
> Sounds very similar to data centre ethernet - except data centre
> ethernet seems to map "channels" to rings; whereas the scheme you
> describe maps a channel essentially to a virtual nic which seems to read
> in the common case as a single tx, single rx ring. Is that right? If
> yes, we should be able to do the virtual nics today without any changes
> really since each one appears as a separate NIC. It will be a matter of
> probably boot time partitioning and parametrization to create virtual
> nics (ex of priorities of each virtual NIC etc).

Right, this is one deployment scenario for a multi-channel NIC, and it
will require very few changes in the stack (couple extra IOCTLS would be
nice).
There are two reasons why you still may want to have a generic
multi-channel support/awareness in the stack: 
1. Some users may want to have single ip interface with multiple
channels.
2. While there will likely be many multi-channel NICs, only "best-in-class"
will make the hw "channels" completely independent and able to operate
as a separate nic. Other implementations may have some limitations, and
will work as multi-channel API compliant devices but not necessarily as
independent mac devices.
I agree though that supporting multi-channel APIs is a bigger effort.

> 
> > In a non-virtualized system, such NICs could be used in a mode when each
> > channel runs on one core; this may eliminate some locking...  This mode
> > will require btw deterministic session steering, current hashing
> > approach in the patch is not sufficient; this is something we can
> > contribute once Peter's code is in.
> 
> I can actually see how the PCI-SIG approach using virtual NIC approach
> could run on multiple CPUs (since each is no different from a NIC that
> we have today). And our current Linux steering would also work just
> fine.
> 
> In the case of non-virtual NICs, i am afraid i dont think it is as easy
> as simple session steering - if you want to be generic that is; you may
> wanna consider a more complex connection tracking i.e a grouping of
> sessions as the basis for steering to a tx ring (and therefore tying to
> a specific CPU).
> If you are an ISP or a data center with customers partitioned based on
> simple subnets, then i can see a simple classification based on subnets
> being tied to a hw ring/CPU. And in such cases simple flow control on a
> per ring basis makes sense.
> Have you guys experimented on the the non-virtual case? And are you
> doing the vir

Re: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread Jeff Garzik

Leonid Grossman wrote:

But my point was that while virtualization capabilities of upcoming NICs
may be not even relevant to Linux, the multi-channel hw designs (a side
effect of virtualization push, if you will) will be there and a
non-virtualized stack can take advantage of them.



I'm looking at the current hardware virtualization efforts, and often 
grimacing.  A lot of these efforts assume that "virtual PCI devices" 
will be wonderful virtualization solutions, without stopping to think 
about global events that affect all such devices, such as silicon resets 
or errata workarounds.  In the real world, you wind up having to 
un-virtualize to deal with certain exceptional events.


But as you point out, these hardware virt efforts can bestow benefits on 
non-virtualized stacks.


Jeff


Re: [PATCH 1/4] NetXen: Fix link status messages

2007-06-09 Thread Jeff Garzik

Mithlesh Thukral wrote:

-   if ((netif_running(netdev)) && !netif_carrier_ok(netdev)) {
-   printk(KERN_INFO "%s port %d, %s carrier is now ok\n",
-  netxen_nic_driver_name, adapter->portnum, netdev->name);
+   if ((netdev->flags & IFF_UP) && !netif_carrier_ok(netdev) &&
+   netxen_nic_link_ok(adapter) ) {
+   printk(KERN_INFO "%s %s (port %d), Link is up\n",
+  netxen_nic_driver_name, netdev->name, 
adapter->portnum);
netif_carrier_on(netdev);
-   }
-
-   if (netif_queue_stopped(netdev))
netif_wake_queue(netdev);
+   } else if(!(netdev->flags & IFF_UP) && netif_carrier_ok(netdev)) {
+   printk(KERN_ERR "%s %s Link is Down\n",
+   netxen_nic_driver_name, netdev->name);
+   netif_carrier_off(netdev);
+   netif_stop_queue(netdev);



Most of the patch is OK, but by substituting IFF_UP tests for 
netif_running(), you are removing race-free, correct tests and replacing 
them with incorrect, racy tests.


NAK the IFF_UP changes.  the rest looks OK.

Jeff




Re: [PATCH 2.6.21.3] bonding: Fix 802.3ad no carrier on "no partner found" instance

2007-06-09 Thread Jeff Garzik

Laurent Chavey wrote:

Remove the requirement to have at least one configured partner to
enable the operation of links. The latter is necessary to have the code
in compliance with section 43.3.9 of IEEE 802.3.

Signed-off-by: Laurent Chavey <[EMAIL PROTECTED]>


Looks OK but patch is corrupted:

[EMAIL PROTECTED] netdev-2.6]$ git-am --signoff --utf8 /g/tmp/mbox

Applying 'bonding: Fix 802.3ad no carrier on "no partner found" instance'

fatal: patch fragment without header at line 7: @@ -2303,19 +2303,17 @@
Patch failed at 0001.
When you have resolved this problem run "git-am --resolved".
If you would prefer to skip this patch, instead run "git-am --skip".



Re: [PATCH 1/2] ibmveth: Fix h_free_logical_lan error on pool resize

2007-06-09 Thread Jeff Garzik

Brian King wrote:

When attempting to activate additional rx buffer pools on an ibmveth interface
that was not yet up, the error below was seen. The patch fixes this by only
closing and opening the interface to activate the resize if the interface is
already opened.


applied 1-2 to #upstream-fixes




Re: typo in via-velocity.c

2007-06-09 Thread Jeff Garzik

Dave Jones wrote:

http://bugzilla.kernel.org/show_bug.cgi?id=8160

Signed-off-by: Dave Jones <[EMAIL PROTECTED]>


applied




Re: [PATCH 2/4] NetXen: Fix ping issue after reboot on Blades with 3.4.19 firmware

2007-06-09 Thread Jeff Garzik

Mithlesh Thukral wrote:

NetXen: Fix initialization and subsequent ping issue on 3.4.19 firmware
This patch fixes the ping problem seen on X/PBlades after the adapter's
firmware was moved to 3.4.19. After the interface was configured up, ping
failed.
The NetXen adapter couldn't accept ARP broadcast packets. Manually adding
the MAC address to the ARP table made ping work.
The NetXen adapter should finish initialization after system boot, but it
looks like the adapter didn't initialize correctly after boot-up.
So the firmware has to be loaded again in the probe routine.
Also re-initialize the netxen_config_0 and netxen_config_1 registers.

Signed-off by: Wen Xiong <[EMAIL PROTECTED]>
Signed-off by: Mithlesh Thukral <[EMAIL PROTECTED]>


applied




Re: [PATCH 4/4] NetXen: Fix compile failure seen on PPC architecture

2007-06-09 Thread Jeff Garzik

Mithlesh Thukral wrote:

NetXen: Add NETXEN prefixes to macros to clean them up.
This is a cleanup patch which adds NETXEN prefix to some stand-alone
macro names.

These posed compile errors when NetXen driver was backported to 2.6.9
on PPC architecture as macros like USER_START are defined in file
arch/ppc64/mm/hash_utils.c

Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]>
Signed-off by: Wen Xiong <[EMAIL PROTECTED]>
Acked-off by: Mithlesh Thukral <[EMAIL PROTECTED]>


applied




Re: [PATCH 2.6.22-rc4] ehea: Fixed possible kernel panic on VLAN packet recv

2007-06-09 Thread Jeff Garzik

Thomas Klein wrote:

This patch fixes a possible kernel panic due to not checking the vlan group
when processing received VLAN packets and a malfunction in VLAN/hypervisor
registration.


Signed-off-by: Thomas Klein <[EMAIL PROTECTED]>


applied
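
For readers following the VLAN registration part of this fix: the
corresponding hunks in the pull request replace `1 << (vid & 0x3F)` with
a right shift of a constant whose digits appear truncated in this archive
copy. The sketch below is a userspace illustration only. It assumes a
64-bit, MSB-first filter word (lowest VID in the top bit), which is what
a right shift suggests but is not confirmed here; vid_mask(),
filter_add_vid(), filter_kill_vid() and filter_has_vid() are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_FILTER_WORDS 64   /* 4096 VLAN IDs / 64 bits per word */

/* Assumed layout: within each 64-bit word the lowest VID is the MSB. */
static uint64_t vid_mask(unsigned vid)
{
    /*
     * The constant is 64 bits wide before the shift happens. A plain
     * int `1 << n` is undefined once n reaches the sign bit, so casting
     * the result to u64 afterwards cannot repair it.
     */
    return UINT64_C(0x8000000000000000) >> (vid & 0x3F);
}

static void filter_add_vid(uint64_t *filter, unsigned vid)
{
    assert(vid < 4096);
    filter[vid / 64] |= vid_mask(vid);
}

static void filter_kill_vid(uint64_t *filter, unsigned vid)
{
    assert(vid < 4096);
    filter[vid / 64] &= ~vid_mask(vid);
}

static int filter_has_vid(const uint64_t *filter, unsigned vid)
{
    assert(vid < 4096);
    return (filter[vid / 64] & vid_mask(vid)) != 0;
}
```

Under that assumption VID 0 maps to the top bit of word 0 and VID 63 to
its lowest bit, so every one of the 64 positions in a word is reachable.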




Re: [PATCH] phylib: add RGMII-ID mode to the Marvell m88e1111 PHY to fix broken ucc_geth

2007-06-09 Thread Jeff Garzik

Li Yang wrote:

From: Kim Phillips <[EMAIL PROTECTED]>

Support for configuring RGMII-ID (RGMII with internal delay) mode on the
88e1111 and 88e1145.  Ucc_geth on MPC8360EMDS (the main user of ucc_geth)
has been broken since it was changed to use phylib.  It is fixed by adding
this internal delay.

Also renamed 88e1111s -> 88e1111 (no references to an 88e1111s part were
found), and fixed some whitespace.

Signed-off-by: Kim Phillips <[EMAIL PROTECTED]>
Signed-off-by: Li Yang <[EMAIL PROTECTED]>
---
Please push this to Linus before 2.6.22 rc phase ends.  The regression
has caused serious breakage to ucc_geth driver.

drivers/net/phy/marvell.c |   62 +++--
1 files changed, 54 insertions(+), 8 deletions(-)


applied




[PATCH] net: fix typo in drivers/net/usb/Kconfig

2007-06-09 Thread Sam Ravnborg
Replace invisible character with a space.

The diff looks like this on my terminal:
-Choose this option if you're using a host-to-host cable
-with one of these chips.
+ Choose this option if you're using a host-to-host cable
+ with one of these chips.

Reported by: Massimo Maiurana <[EMAIL PROTECTED]>

Signed-off-by: Sam Ravnborg <[EMAIL PROTECTED]>
Cc: Massimo Maiurana <[EMAIL PROTECTED]>
---
diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
index 3de564b..8dc09a3 100644
--- a/drivers/net/usb/Kconfig
+++ b/drivers/net/usb/Kconfig
@@ -313,8 +313,8 @@ config USB_KC2190
boolean "KT Technology KC2190 based cables (InstaNet)"
depends on USB_NET_CDC_SUBSET && EXPERIMENTAL
help
- Choose this option if you're using a host-to-host cable
- with one of these chips.
+ Choose this option if you're using a host-to-host cable
+ with one of these chips.
 
 config USB_NET_ZAURUS
tristate "Sharp Zaurus (stock ROMs) and compatible"


RE: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread jamal
On Sat, 2007-09-06 at 17:23 -0400, Leonid Grossman wrote:

> Not really. This is a very old presentation; you probably saw some newer
> PR on Convergence Enhanced Ethernet, Congestion Free Ethernet etc.

Not been keeping up to date in that area.

> These efforts are in very early stages and arguably orthogonal to
> virtualization, but in general having per channel QoS (flow control is
> just a part of it) is a good thing. 

our definition of "channel" on linux so far is a netdev
(not a DMA ring). A netdev is the entity that can be bound to a CPU.
Link layer flow control terminates (and emanates) from the netdev.

> But my point was that while virtualization capabilities of upcoming NICs
> may be not even relevant to Linux, the multi-channel hw designs (a side
> effect of virtualization push, if you will) will be there and a
> non-virtualized stack can take advantage of them.

Makes sense...

> Actually, our current 10GbE NICs have most of such multichannel
> framework already shipping (in pre-IOV fashion), so the programming
> manual on the website can probably give you a pretty good idea about how
> multi-channel 10GbE NICs may look like. 

Ok, thanks.

> Right, this is one deployment scenario for a multi-channel NIC, and it
> will require very few changes in the stack (couple extra IOCTLS would be
> nice).

Essentially a provisioning interface.

> There are two reasons why you still may want to have a generic
> multi-channel support/awareness in the stack: 
> 1. Some users may want to have single ip interface with multiple
> channels.
> 2. While there will likely be many multi-channel NICs, only "best-in-class"
> will make the hw "channels" completely independent and able to operate
> as a separate nic. Other implementations may have some limitations, and
> will work as multi-channel API compliant devices but not necessarily as
> independent mac devices.
> I agree though that supporting multi-channel APIs is a bigger effort.

IMO, the challenges you describe above are solvable via a parent
netdevice (similar to bonding) with children being the virtual NICs. The
IP address is attached to the parent. Of course the other model is not
to show the parent device at all.
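
A rough userspace sketch of that parent/child arrangement follows. It
only illustrates the data layout: struct parent_dev, struct child_nic,
parent_select_child() and parent_xmit() are hypothetical names, not
kernel or bonding APIs, and picking the child by `flow_hash % n_children`
is an assumed placeholder policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 8

/* One hardware channel presented as a child (virtual) NIC. */
struct child_nic {
    unsigned channel;      /* hw channel / ring pair index */
    uint64_t tx_packets;   /* per-child counter */
};

/* The parent carries the IP address; children carry the traffic. */
struct parent_dev {
    uint32_t ip_addr;      /* configured on the parent only */
    size_t n_children;
    struct child_nic children[MAX_CHILDREN];
};

/* Placeholder policy: the same flow hash always maps to the same child. */
static struct child_nic *parent_select_child(struct parent_dev *p,
                                             uint32_t flow_hash)
{
    assert(p->n_children > 0 && p->n_children <= MAX_CHILDREN);
    return &p->children[flow_hash % p->n_children];
}

/* Transmit through the parent: account the packet on the chosen child. */
static struct child_nic *parent_xmit(struct parent_dev *p, uint32_t flow_hash)
{
    struct child_nic *c = parent_select_child(p, flow_hash);

    c->tx_packets++;
    return c;
}
```

Users would configure only the parent, while each child keeps its own
counters and could keep its own queue state. Hiding the parent, the
other model mentioned above, would instead expose each child_nic as a
device in its own right.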

> To a degree. We have quite a bit of testing done in non-virtual OS (not
> in Linux though), using channels with tx/rx rings, msi-x etc as
> independent NICs. Flow control was not a focus since the fabric
> typically was not congested in these tests, but in theory per-channel
> flow control should work reasonably well. Of course, flow control is
> only part of resource sharing problem. 

In the current model - flow control to the s/ware queueing level (qdisc)
is implicit. i.e hardware receives pause frames - stops sending; ring
becomes full as hardware sends, netdev tx path gets shut until things
open up when 
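
The implicit back-pressure described in this paragraph can be modelled
in a few lines of userspace C. This is a sketch, not driver code. It
assumes the tx path is reopened as soon as completions free at least one
ring slot; struct tx_ring, ring_xmit() and ring_clean() are hypothetical
names, and the queue_stopped flag merely stands in for the state a real
driver toggles with netif_stop_queue() and netif_wake_queue().

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 4u   /* deliberately tiny, for illustration */

struct tx_ring {
    unsigned head;        /* next slot the stack fills */
    unsigned tail;        /* next slot the hardware completes */
    bool queue_stopped;   /* true while the qdisc must hold packets */
};

static unsigned ring_used(const struct tx_ring *r)
{
    return r->head - r->tail;
}

/* xmit path: returns false when the packet has to stay in the qdisc. */
static bool ring_xmit(struct tx_ring *r)
{
    if (r->queue_stopped)
        return false;
    r->head++;
    if (ring_used(r) == RING_SIZE)
        r->queue_stopped = true;    /* ring full: shut the tx path */
    return true;
}

/* completion path: the hardware has sent n packets. */
static void ring_clean(struct tx_ring *r, unsigned n)
{
    assert(n <= ring_used(r));
    r->tail += n;
    if (r->queue_stopped && ring_used(r) < RING_SIZE)
        r->queue_stopped = false;   /* room again: reopen the tx path */
}
```

If pause frames keep the hardware from sending, ring_clean() is never
called, the ring fills, and the qdisc backs up without any explicit
flow-control message reaching it.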

> This is not what I'm saying :-). The IEEE link you sent shows that
> per-link flow control is a separate effort, and it will likely take
> time to become a standard. 

Ok, my impression was it was happening already or it will happen
tomorrow morning ;->

> Also, (besides the shared link) the channels will share pci bus.
> 
> One solution could be to provide a generic API for QoS level to a
> channel 
> (and also to a generic NIC!). 
> Internally, device driver can translate QoS requirements into flow
> control, pci bus bandwidth, and whatever else is shared on the physical
> NIC between the channels.
> As always, as some of that code becomes common between the drivers it
> can migrate up.

indeed. 

cheers,
jamal




[git patches] net driver fixes

2007-06-09 Thread Jeff Garzik

A big batch of fixes for the newly added libertas wireless driver is
coming soon, too.

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream-linus

to receive the following updates:

 drivers/net/ehea/ehea.h |2 +-
 drivers/net/ehea/ehea_main.c|   12 ++---
 drivers/net/ibmveth.c   |   80 +--
 drivers/net/netxen/netxen_nic.h |   47 +-
 drivers/net/netxen/netxen_nic_ethtool.c |8 ++--
 drivers/net/netxen/netxen_nic_hw.c  |   12 ++--
 drivers/net/netxen/netxen_nic_init.c|   23 +
 drivers/net/netxen/netxen_nic_main.c|7 +++
 drivers/net/netxen/netxen_nic_niu.c |8 +--
 drivers/net/phy/marvell.c   |   62 +---
 drivers/net/usb/Kconfig |4 +-
 drivers/net/via-velocity.c  |2 +-
 12 files changed, 172 insertions(+), 95 deletions(-)

Brian King (2):
  ibmveth: Fix h_free_logical_lan error on pool resize
  ibmveth: Automatically enable larger rx buffer pools for larger mtu

Dave Jones (1):
  typo in via-velocity.c

Kim Phillips (1):
  phylib: add RGMII-ID mode to the Marvell m88e1111 PHY to fix broken ucc_geth

Mithlesh Thukral (2):
  NetXen: Fix ping issue after reboot on Blades with 3.4.19 firmware
  NetXen: Fix compile failure seen on PPC architecture

Sam Ravnborg (1):
  net: fix typo in drivers/net/usb/Kconfig

Thomas Klein (1):
  ehea: Fixed possible kernel panic on VLAN packet recv

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index e85a933..c0f81b5 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,7 @@
 #include 
 
 #define DRV_NAME   "ehea"
-#define DRV_VERSION"EHEA_0061"
+#define DRV_VERSION"EHEA_0064"
 
 #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
| NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 152bb20..9e13433 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -451,7 +451,8 @@ static struct ehea_cqe *ehea_proc_rwqes(struct net_device *dev,
processed_rq3++;
}
 
-   if (cqe->status & EHEA_CQE_VLAN_TAG_XTRACT)
+   if ((cqe->status & EHEA_CQE_VLAN_TAG_XTRACT)
+   && port->vgrp)
vlan_hwaccel_receive_skb(skb, port->vgrp,
 cqe->vlan_tag);
else
@@ -1910,10 +1911,7 @@ static void ehea_vlan_rx_register(struct net_device *dev,
goto out;
}
 
-   if (grp)
-   memset(cb1->vlan_filter, 0, sizeof(cb1->vlan_filter));
-   else
-   memset(cb1->vlan_filter, 0xFF, sizeof(cb1->vlan_filter));
+   memset(cb1->vlan_filter, 0, sizeof(cb1->vlan_filter));
 
hret = ehea_h_modify_ehea_port(adapter->handle, port->logical_port_id,
   H_PORT_CB1, H_PORT_CB1_ALL, cb1);
@@ -1947,7 +1945,7 @@ static void ehea_vlan_rx_add_vid(struct net_device *dev, unsigned short vid)
}
 
index = (vid / 64);
-   cb1->vlan_filter[index] |= ((u64)(1 << (vid & 0x3F)));
+   cb1->vlan_filter[index] |= ((u64)(0x8000 >> (vid & 0x3F)));
 
hret = ehea_h_modify_ehea_port(adapter->handle, port->logical_port_id,
   H_PORT_CB1, H_PORT_CB1_ALL, cb1);
@@ -1982,7 +1980,7 @@ static void ehea_vlan_rx_kill_vid(struct net_device *dev, unsigned short vid)
}
 
index = (vid / 64);
-   cb1->vlan_filter[index] &= ~((u64)(1 << (vid & 0x3F)));
+   cb1->vlan_filter[index] &= ~((u64)(0x8000 >> (vid & 0x3F)));
 
hret = ehea_h_modify_ehea_port(adapter->handle, port->logical_port_id,
   H_PORT_CB1, H_PORT_CB1_ALL, cb1);
diff --git a/drivers/net/ibmveth.c b/drivers/net/ibmveth.c
index 3bec0f7..6ec3d50 100644
--- a/drivers/net/ibmveth.c
+++ b/drivers/net/ibmveth.c
@@ -915,17 +915,36 @@ static int ibmveth_change_mtu(struct net_device *dev, int new_mtu)
 {
struct ibmveth_adapter *adapter = dev->priv;
int new_mtu_oh = new_mtu + IBMVETH_BUFF_OH;
-   int i;
+   int reinit = 0;
+   int i, rc;
 
if (new_mtu < IBMVETH_MAX_MTU)
return -EINVAL;
 
+   for (i = 0; i < IbmVethNumBufferPools; i++)
+   if (new_mtu_oh < adapter->rx_buff_pool[i].buff_size)
+   break;
+
+   if (i == IbmVethNumBufferPools)
+   return -EINVAL;
+
/* Look for an active buffer pool that can hold the new MTU */
for(i = 0; irx_buff_pool[i].active)
-   continue;
+   if (!adapter->rx_buff_pool[i].active) {
+   

Re: Linux 2.6.22-rc4

2007-06-09 Thread Avuton Olrich

On 6/4/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:


So -rc4 is out there now, hopefully shrinking the regression list further.


(CCd net device MAINTAINERs, I'm not sure, but nv_alloc_rx is forcedeth)

This server has been up for about a day now and I'm starting to get
some bad looking messages when plenty gets transferred over NFS:

(please excuse the links, without them this message would certainly be
too large)
Linux version 2.6.22-rc4 ([EMAIL PROTECTED]) (gcc version 4.2.0
(Gentoo 4.2.0)) #6 PREEMPT Fri Jun 8 18:54:00 PDT 2007
(it's actually with the latest rsdl patch)

http://avuton.googlepages.com/config
http://avuton.googlepages.com/lspci-vvv
http://avuton.googlepages.com/ioports
http://avuton.googlepages.com/iomem

===
Mem-info:
DMA per-cpu:
CPU0: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
Normal per-cpu:
CPU0: Hot: hi:  186, btch:  31 usd:  87   Cold: hi:   62, btch:  15 usd:  57
Active:43224 inactive:168748 dirty:2679 writeback:24182 unstable:0
free:4334 slab:6997 mapped:6927 pagetables:286 bounce:0
DMA free:3520kB min:68kB low:84kB high:100kB active:1064kB inactive:5492kB present:16256kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 873
Normal free:13816kB min:3744kB low:4680kB high:5616kB active:171832kB inactive:669500kB present:894080kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
DMA: 38*4kB 1*8kB 0*16kB 3*32kB 1*64kB 1*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 3520kB
Normal: 3278*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 13816kB
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap  = 498004kB
Total swap = 498004kB
Free swap:   498004kB
229376 pages of RAM
0 pages of HIGHMEM
3350 reserved pages
119197 pages shared
0 pages swap cached
2679 pages dirty
24182 pages writeback
6927 pages mapped
6997 pages slab
286 pages pagetables
swapper: page allocation failure. order:1, mode:0x4020
[] __alloc_pages+0x239/0x2f0
[] nv_alloc_rx+0xf6/0x1a0
[] __slab_alloc+0x422/0x500
[] nv_alloc_rx+0xf6/0x1a0
[] __kmalloc_track_caller+0x65/0x70
[] nv_alloc_rx+0xf6/0x1a0
[] __alloc_skb+0x55/0x120
[] nv_alloc_rx+0xf6/0x1a0
[] getnstimeofday+0x2f/0xe0
[] nv_nic_irq+0x2a8/0x590
[] nv_nic_irq+0x39/0x590
[] handle_IRQ_event+0x25/0x50
[] handle_fasteoi_irq+0x5b/0xe0
[] do_IRQ+0x4a/0x80
[] __netdev_alloc_skb+0x22/0x50
[] __netdev_alloc_skb+0x22/0x50
[] common_interrupt+0x23/0x28
[] __netdev_alloc_skb+0x22/0x50
[] __kmalloc_track_caller+0x3e/0x70
[] __netdev_alloc_skb+0x22/0x50
[] __alloc_skb+0x55/0x120
[] __netdev_alloc_skb+0x22/0x50
[] skge_poll+0x2c4/0x600
[] get_next_timer_interrupt+0x1a7/0x230
[] skge_poll+0x128/0x600
[] net_rx_action+0x61/0x170
[] __do_softirq+0x42/0x90
[] do_softirq+0x27/0x30
[] irq_exit+0x65/0x70
[] do_IRQ+0x4f/0x80
[] do_IRQ+0x4f/0x80
[] common_interrupt+0x23/0x28
[] default_idle+0x0/0x40
[] default_idle+0x2a/0x40
[] cpu_idle+0x50/0x70
[] start_kernel+0x215/0x260
[] unknown_bootoption+0x0/0x260
===
--
avuton
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.