date:20071213

[PATCH 2.6.25] [IPV4] Thresholds in fib_trie.c are used as consts, so make them const.

2007-12-13 Thread Denis V. Lunev

[IPV4] Thresholds in fib_trie.c are used as consts, so make them const.

There are several thresholds for trie fib hash management. They are used
in the code as constants. Make them constants from the compiler's point of
view.

Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>
---
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -299,10 +299,10 @@ static inline void check_tnode(const struct tnode *tn)
WARN_ON(tn && tn->pos+tn->bits > 32);
 }
 
-static int halve_threshold = 25;
-static int inflate_threshold = 50;
-static int halve_threshold_root = 8;
-static int inflate_threshold_root = 15;
+static const int halve_threshold = 25;
+static const int inflate_threshold = 50;
+static const int halve_threshold_root = 8;
+static const int inflate_threshold_root = 15;
 
 
 static void __alias_free_mem(struct rcu_head *head)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
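
A minimal user-space sketch of what the const qualifier buys here. The two
threshold names and values come from the patch; the helper functions and the
percentage comparison are simplified assumptions for illustration and are not
the actual fib_trie.c resize logic. With "static const int", an accidental
assignment to a threshold becomes a compile error, and the compiler is free to
fold the values into the comparisons.

```c
#include <assert.h>

/* Names and values as in the patch; everything below them is hypothetical. */
static const int halve_threshold = 25;
static const int inflate_threshold = 50;

/*
 * Simplified fill-ratio check (an assumption, not the kernel's code):
 * a node with `used` of `total` child slots occupied is grown once it is
 * at least inflate_threshold percent full.
 */
static int node_should_inflate(int used, int total)
{
	assert(total > 0 && used >= 0 && used <= total);
	return 100 * used >= inflate_threshold * total;
}

/* The node is shrunk once it is less than halve_threshold percent full. */
static int node_should_halve(int used, int total)
{
	assert(total > 0 && used >= 0 && used <= total);
	return 100 * used < halve_threshold * total;
}
```

Calling node_should_inflate(8, 16) reports that a half-full node qualifies for
growth, while a statement such as "inflate_threshold = 60;" would no longer
compile.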

[PATCH] MACB: clear transmit buffers properly on TX Underrun

2007-12-13 Thread Gregory CLEMENT


Hi,
I generated this patch against Linux 2.6.24-rc5 and tested it on an AT91SAM9263
with iperf.


From: Gregory CLEMENT <[EMAIL PROTECTED]>
Date: Wed, 12 Dec 2007 18:10:14 +0100
Subject: [PATCH] MACB: clear transmit buffers properly on TX Underrun

Previously, only the transmit buffer pointers were reset. The buffer descriptors
could still be marked as ready, and the buffers held by the upper layer were not
freed. This caused the driver to hang under heavy load.
Now the reset properly cleans the buffer descriptors and frees the upper-layer buffers.

Signed-off-by: Gregory CLEMENT <[EMAIL PROTECTED]>
---
drivers/net/macb.c |   26 +-
1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/drivers/net/macb.c b/drivers/net/macb.c
index 047ea7b..2ee1dab 100644
--- a/drivers/net/macb.c
+++ b/drivers/net/macb.c
@@ -307,9 +307,33 @@ static void macb_tx(struct macb *bp)
(unsigned long)status);

if (status & MACB_BIT(UND)) {
+int i;
printk(KERN_ERR "%s: TX underrun, resetting buffers\n",
-   bp->dev->name);
+   bp->dev->name);
+
+head = bp->tx_head;
+
+/* free transmit buffer in upper layer*/
+for (tail = bp->tx_tail; tail != head; tail = NEXT_TX(tail)) {
+struct ring_info *rp = &bp->tx_skb[tail];
+struct sk_buff *skb = rp->skb;
+
+BUG_ON(skb == NULL);
+
+rmb();
+
+dma_unmap_single(&bp->pdev->dev, rp->mapping, skb->len,
+ DMA_TO_DEVICE);
+rp->skb = NULL;
+dev_kfree_skb_irq(skb);
+}
+
+/*Mark all the buffer as used to avoid sending a lost buffer*/
+for (i = 0; i < RX_RING_SIZE; i++)
+bp->tx_ring[i].ctrl = MACB_BIT(TX_USED);
+
bp->tx_head = bp->tx_tail = 0;
+
}

if (!(status & MACB_BIT(COMP)))
--
1.5.3.7


--
Gregory CLEMENT
Adeneo
Adetel Group
2, chemin du Ruisseau
69134 ECULLY - FRANCE
Tél. : +33 (0)4 72 18 08 40 - Fax : +33 (0)4 72 18 08 41

www.adetelgroup.com


Re: [kvm-devel] [virtio-net][PATCH] Don't arm tx hrtimer with a constant 500us each transmit

2007-12-13 Thread Dor Laor


Christian Borntraeger wrote:


On Wednesday, 12 December 2007, Dor Laor wrote:
> Christian Borntraeger wrote:
> >
> > On Wednesday, 12 December 2007, Dor Laor wrote:
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -406,10 +405,10 @@ again:
> >
> > Hmm, while I agree in general with the patch, I fail to find the proper
> > version of virtio_net where this patch applies. I tried kvm.git and
> > linux-2.6.git from kernel.org. Can you give me a pointer to the repository
> > where you work on virtio?
> >
> Sorry for that, I added some debug prints of my own.
> Here it is: git clone git://kvm.qumranet.com/home/dor/src/linux-2.6-nv
> and use the branch 'virtio'.


Ah, ok. I will look into that branch.

> BTW: what git repository do you use?

I use Avi's git from kernel.org:
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm


I patch it with Anthony's http://hg.codemonkey.ws/linux-virtio/ tree
and send my patches on top of that. One can also use my repository directly.


Christian




Re: [PATCH] r6040 various cleanups

2007-12-13 Thread Florian Fainelli

Hi Francois,

Francois Romieu wrote:
> Thanks, I have split it into parts. The series should be available shortly at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/romieu/netdev-2.6.git r6040

You are welcome, and thank you for taking care of this driver!

> 
> Please note that:
> 1. TIMER_WUT has been removed as it was not used any more
> 2. I have kept the difference below. Was the patch really supposed to update
>    the same error counter twice?

You are right, I did not pay attention to this. Thanks for fixing it!


net-2.6.25 section warnings

2007-12-13 Thread Andrew Morton

WARNING: vmlinux.o(.init.text+0x1cfc4): Section mismatch: reference to 
.exit.text:tcpv6_exit (between 'inet6_init' and 'ac6_proc_init')
WARNING: vmlinux.o(.init.text+0x1cfc9): Section mismatch: reference to 
.exit.text:udplitev6_exit (between 'inet6_init' and 'ac6_proc_init')
WARNING: vmlinux.o(.init.text+0x1cfce): Section mismatch: reference to 
.exit.text:udpv6_exit (between 'inet6_init' and 'ac6_proc_init')
WARNING: vmlinux.o(.init.text+0x1cfdd): Section mismatch: reference to 
.exit.text:addrconf_cleanup (between 'inet6_init' and 'ac6_proc_init')
WARNING: vmlinux.o(.init.text+0x1d03c): Section mismatch: reference to 
.exit.text:rawv6_exit (between 'inet6_init' and 'ac6_proc_init')

http://userweb.kernel.org/~akpm/config-x.txt (x86_64)

RE: [PATCH 1/7] [NETDEV]: e1000 Fix possible causing oops of net_rx_action

2007-12-13 Thread Joonwoo Park

2007/12/12, Joonwoo Park <[EMAIL PROTECTED]>:
> [NETDEV]: e1000 Fix possible causing oops of net_rx_action
> returning work_done == weight as true after calling netif_rx_complete will 
> cause oops in net_rx_action.
> 

I first tried two kinds of patches for the oops and for the ifconfig down hang on e1000.
I do not think that just removing the netif_running check is the best solution; it makes
ifconfig down hang, at least on e1000.
I would be glad to hear other suggestions, so please enlighten me :-)

The first:
- if !netif_running, stop receive processing; up to 64 (e1000) packets in the
queue would be dropped.
---
 drivers/net/e1000/e1000_main.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 4f37506..664312b 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3938,12 +3938,12 @@ e1000_clean(struct napi_struct *napi, int budget)
spin_unlock(&adapter->tx_queue_lock);
}
 
-   adapter->clean_rx(adapter, &adapter->rx_ring[0],
- &work_done, budget);
+   if (likely(netif_running(poll_dev)))
+   adapter->clean_rx(adapter, &adapter->rx_ring[0],
+   &work_done, budget);
 
/* If no Tx and not enough Rx work done, exit the polling mode */
-   if ((!tx_cleaned && (work_done == 0)) ||
-  !netif_running(poll_dev)) {
+   if ((!tx_cleaned && (work_done == 0))) {
 quit_polling:
if (likely(adapter->itr_setting & 3))
e1000_set_itr(adapter);
---

The second:
- if !netif_running, receive up to weight - 1 packets; one packet in the queue
can be dropped.
---
 drivers/net/e1000/e1000_main.c |9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 4f37506..8e53c5b 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3919,7 +3919,7 @@ e1000_clean(struct napi_struct *napi, int budget)
 {
struct e1000_adapter *adapter = container_of(napi, struct 
e1000_adapter, napi);
struct net_device *poll_dev = adapter->netdev;
-   int tx_cleaned = 0, work_done = 0;
+   int tx_cleaned = 0, work_done = 0, running;
 
/* Must NOT use netdev_priv macro here. */
adapter = poll_dev->priv;
@@ -3938,12 +3938,13 @@ e1000_clean(struct napi_struct *napi, int budget)
spin_unlock(&adapter->tx_queue_lock);
}
 
+   running = netif_running(poll_dev);
+
adapter->clean_rx(adapter, &adapter->rx_ring[0],
- &work_done, budget);
+ &work_done, budget - !running);
 
/* If no Tx and not enough Rx work done, exit the polling mode */
-   if ((!tx_cleaned && (work_done == 0)) ||
-  !netif_running(poll_dev)) {
+   if ((!tx_cleaned && (work_done == 0)) || !running) {
 quit_polling:
if (likely(adapter->itr_setting & 3))
e1000_set_itr(adapter);
---


Thanks,
Joonwoo
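
For readers following the thread, the constraint behind both variants can be
modelled in user space: once the poll routine has left polling mode, it must
not report that it used its full budget. The sketch below follows the second
variant (budget - !running). The struct, the function names and the
"completed" flag are hypothetical stand-ins for the driver state and for
netif_rx_complete(), and the tx_cleaned part of the real condition is left out
for brevity.

```c
#include <assert.h>

struct fake_nic {
	int pending;    /* packets waiting in the RX ring */
	int running;    /* stand-in for netif_running() */
	int completed;  /* set when the poll routine leaves polling mode */
};

/* Stand-in for adapter->clean_rx(): consume at most `limit` packets. */
static int fake_clean_rx(struct fake_nic *nic, int limit)
{
	int done;

	if (limit < 0)
		limit = 0;
	done = nic->pending < limit ? nic->pending : limit;
	nic->pending -= done;
	return done;
}

static int nic_poll(struct fake_nic *nic, int budget)
{
	int running = nic->running;
	/* Hold back one unit of budget when the device is going down. */
	int work_done = fake_clean_rx(nic, budget - !running);

	if (work_done == 0 || !running)
		nic->completed = 1;  /* stand-in for netif_rx_complete() */

	/* The invariant discussed above: never "completed" and "full budget". */
	assert(!(nic->completed && work_done == budget));
	return work_done;
}
```

With 100 packets pending and a budget of 64, a device that is going down
processes 63 packets and completes, while a running device processes 64 and
stays in polling mode.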


Re: net-2.6.25 section warnings

2007-12-13 Thread Daniel Lezcano


Andrew Morton wrote:

> WARNING: vmlinux.o(.init.text+0x1cfc4): Section mismatch: reference to
> .exit.text:tcpv6_exit (between 'inet6_init' and 'ac6_proc_init')
> WARNING: vmlinux.o(.init.text+0x1cfc9): Section mismatch: reference to
> .exit.text:udplitev6_exit (between 'inet6_init' and 'ac6_proc_init')
> WARNING: vmlinux.o(.init.text+0x1cfce): Section mismatch: reference to
> .exit.text:udpv6_exit (between 'inet6_init' and 'ac6_proc_init')
> WARNING: vmlinux.o(.init.text+0x1cfdd): Section mismatch: reference to
> .exit.text:addrconf_cleanup (between 'inet6_init' and 'ac6_proc_init')
> WARNING: vmlinux.o(.init.text+0x1d03c): Section mismatch: reference to
> .exit.text:rawv6_exit (between 'inet6_init' and 'ac6_proc_init')
>
> http://userweb.kernel.org/~akpm/config-x.txt (x86_64)



My fault. I will fix that.
Sorry.

  -- Daniel


[PATCH][NET-2.6.25][IPV6] fix section mismatch warnings

2007-12-13 Thread Daniel Lezcano



Subject: fix section mismatch warnings
From: Daniel Lezcano <[EMAIL PROTECTED]>

Removed the useless and buggy __exit annotations in the different
ipv6 subsystems. Otherwise these functions would be called from an
init section when rolling back after an error in the
protocol initialization.

Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]>
---
 net/ipv6/addrconf.c |2 +-
 net/ipv6/raw.c  |2 +-
 net/ipv6/tcp_ipv6.c |2 +-
 net/ipv6/udp.c  |2 +-
 net/ipv6/udplite.c  |2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

Index: net-2.6.25/net/ipv6/addrconf.c
===
--- net-2.6.25.orig/net/ipv6/addrconf.c
+++ net-2.6.25/net/ipv6/addrconf.c
@@ -4223,7 +4223,7 @@ errout:
 	return err;
 }
 
-void __exit addrconf_cleanup(void)
+void addrconf_cleanup(void)
 {
 	struct net_device *dev;
 	struct inet6_ifaddr *ifa;
Index: net-2.6.25/net/ipv6/raw.c
===
--- net-2.6.25.orig/net/ipv6/raw.c
+++ net-2.6.25/net/ipv6/raw.c
@@ -1321,7 +1321,7 @@ out:
 	return ret;
 }
 
-void __exit rawv6_exit(void)
+void rawv6_exit(void)
 {
 	inet6_unregister_protosw(&rawv6_protosw);
 }
Index: net-2.6.25/net/ipv6/tcp_ipv6.c
===
--- net-2.6.25.orig/net/ipv6/tcp_ipv6.c
+++ net-2.6.25/net/ipv6/tcp_ipv6.c
@@ -2194,7 +2194,7 @@ out_tcpv6_protosw:
 	goto out;
 }
 
-void __exit tcpv6_exit(void)
+void tcpv6_exit(void)
 {
 	sock_release(tcp6_socket);
 	inet6_unregister_protosw(&tcpv6_protosw);
Index: net-2.6.25/net/ipv6/udp.c
===
--- net-2.6.25.orig/net/ipv6/udp.c
+++ net-2.6.25/net/ipv6/udp.c
@@ -1035,7 +1035,7 @@ out_udpv6_protocol:
 	goto out;
 }
 
-void __exit udpv6_exit(void)
+void udpv6_exit(void)
 {
 	inet6_unregister_protosw(&udpv6_protosw);
 	inet6_del_protocol(&udpv6_protocol, IPPROTO_UDP);
Index: net-2.6.25/net/ipv6/udplite.c
===
--- net-2.6.25.orig/net/ipv6/udplite.c
+++ net-2.6.25/net/ipv6/udplite.c
@@ -96,7 +96,7 @@ out_udplitev6_protocol:
 	goto out;
 }
 
-void __exit udplitev6_exit(void)
+void udplitev6_exit(void)
 {
 	inet6_unregister_protosw(&udplite6_protosw);
 	inet6_del_protocol(&udplitev6_protocol, IPPROTO_UDPLITE);
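
The rollback pattern that makes the __exit annotation wrong can be sketched in
user space as follows. All names here are hypothetical and the three steps do
not mirror the real ordering in inet6_init(); the point is only that the
cleanup routine of an earlier step is reachable from the init path whenever a
later step fails, so it cannot live in a section that the init code is not
allowed to reference.

```c
#include <assert.h>

static int log_buf[8];  /* records +id on init and -id on cleanup */
static int log_len;
static int fail_at;     /* hypothetical knob: step that fails (0 = none) */

static int step_init(int id)
{
	if (id == fail_at)
		return -1;
	assert(log_len < 8);
	log_buf[log_len++] = id;
	return 0;
}

/* Called from the init rollback path, so it must not be exit-only code. */
static void step_exit(int id)
{
	assert(log_len < 8);
	log_buf[log_len++] = -id;
}

static int proto_init(void)
{
	int err;

	err = step_init(1);
	if (err)
		goto out;
	err = step_init(2);
	if (err)
		goto out_step1;
	err = step_init(3);
	if (err)
		goto out_step2;
	return 0;

out_step2:
	step_exit(2);  /* undo in reverse order of initialization */
out_step1:
	step_exit(1);
out:
	return err;
}
```

If the third step fails, the log reads 1, 2, -2, -1: both cleanup routines ran
from inside the init function.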

[PATCH][XFRM] Fix potential race vs xfrm_state(only)_find and xfrm_hash_resize.

2007-12-13 Thread Pavel Emelyanov

The _find calls calculate the hash value using
xfrm_state_hmask without holding the xfrm_state_lock. But the
value of this mask can change in the _resize call under
the state_lock, so we risk failing to find the desired
entry in the hash.

I think it is better to calculate the hash value
under the state lock.

Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

---

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 1af522b..1face71 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -759,7 +759,7 @@ xfrm_state_find(xfrm_address_t *daddr, xfrm_address_t 
*saddr,
struct xfrm_policy *pol, int *err,
unsigned short family)
 {
-   unsigned int h = xfrm_dst_hash(daddr, saddr, tmpl->reqid, family);
+   unsigned int h;
struct hlist_node *entry;
struct xfrm_state *x, *x0;
int acquire_in_progress = 0;
@@ -767,6 +767,7 @@ xfrm_state_find(xfrm_address_t *daddr, xfrm_address_t 
*saddr,
struct xfrm_state *best = NULL;
 
spin_lock_bh(&xfrm_state_lock);
+   h = xfrm_dst_hash(daddr, saddr, tmpl->reqid, family);
hlist_for_each_entry(x, entry, xfrm_state_bydst+h, bydst) {
if (x->props.family == family &&
x->props.reqid == tmpl->reqid &&
@@ -868,11 +869,12 @@ struct xfrm_state *
 xfrm_stateonly_find(xfrm_address_t *daddr, xfrm_address_t *saddr,
unsigned short family, u8 mode, u8 proto, u32 reqid)
 {
-   unsigned int h = xfrm_dst_hash(daddr, saddr, reqid, family);
+   unsigned int h;
struct xfrm_state *rx = NULL, *x = NULL;
struct hlist_node *entry;
 
spin_lock(&xfrm_state_lock);
+   h = xfrm_dst_hash(daddr, saddr, reqid, family);
hlist_for_each_entry(x, entry, xfrm_state_bydst+h, bydst) {
if (x->props.family == family &&
x->props.reqid == reqid &&
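
A single-threaded model of the idea, for illustration only. The "locked" flag
stands in for xfrm_state_lock and "hmask" for xfrm_state_hmask; all function
names are hypothetical. The assertion in bucket_of() encodes the rule the patch
establishes: the mask is read only while the lock is held, so the bucket index
and the bucket walk always see the same table size.

```c
#include <assert.h>

struct state_table {
	int locked;          /* models xfrm_state_lock (no real concurrency) */
	unsigned int hmask;  /* models xfrm_state_hmask; changes on resize */
};

static void table_lock(struct state_table *t)
{
	assert(!t->locked);
	t->locked = 1;
}

static void table_unlock(struct state_table *t)
{
	assert(t->locked);
	t->locked = 0;
}

/* Reading the mask is only meaningful while the lock is held. */
static unsigned int bucket_of(const struct state_table *t, unsigned int key)
{
	assert(t->locked);
	return key & t->hmask;
}

static void table_resize(struct state_table *t)
{
	table_lock(t);
	t->hmask = (t->hmask << 1) | 1;  /* double the number of buckets */
	table_unlock(t);
}

static unsigned int lookup_bucket(struct state_table *t, unsigned int key)
{
	unsigned int h;

	table_lock(t);
	h = bucket_of(t, key);  /* hash and walk share one critical section */
	/* ... the bucket at index h would be walked here ... */
	table_unlock(t);
	return h;
}
```

Computing the index before table_lock(), as the old code did, would trip the
assertion in this model; in the kernel it would silently use a stale mask if a
resize ran in between.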

Re: [IPSEC]: Fix reversed ICMP6 policy check

2007-12-13 Thread Jarek Poplawski

On 13-12-2007 03:49, Herbert Xu wrote:
> On Thu, Dec 13, 2007 at 09:58:56AM +0800, Herbert Xu wrote:
>> [IPSEC]: Fix reversed ICMP6 policy check
> 
> While that won't crash anymore, it's still logically wrong.
> 
> Here's a more complete fix.

...even more than this!

For more than a year, each time I read your patches I have wondered what
kind of special attachments you use, since they are so "unreadable" (blurred)
in Mozilla Thunderbird (though I was never desperate enough to study all
those mail RFCs). And now - BINGO! They are simply treated as
signatures! Nice trick! But I see you forgot about something this
time, and now it's all clear! (Actually, I will probably need one more
year to find out how to turn off this sh...)

Thanks,
Jarek P.

Re: libnl - netlink library: Memory leak in address cache?

2007-12-13 Thread Thomas Graf

* Joerg Pommnitz <[EMAIL PROTECTED]> 2007-12-11 06:52
> I think the leak comes from addr_msg_parser. The newly created address object
> gets added to the cache with nl_cache_add, which takes a reference, so the
> reference in addr_msg_parser should be dropped, e.g. the following patch
> might be correct:

That's correct, thanks for catching this.

Re: [PATCH] MACB: clear transmit buffers properly on TX Underrun

2007-12-13 Thread Haavard Skinnemoen

On Thu, 13 Dec 2007 08:51:57 +0100
Gregory CLEMENT <[EMAIL PROTECTED]> wrote:

> Hi,
> I generated this patch against Linux 2.6.24-rc5 and tested it on an AT91SAM9263
> with iperf.
> 
> From: Gregory CLEMENT <[EMAIL PROTECTED]>
> Date: Wed, 12 Dec 2007 18:10:14 +0100
> Subject: [PATCH] MACB: clear transmit buffers properly on TX Underrun
> 
> Previously, only the transmit buffer pointers were reset. The buffer descriptors
> could still be marked as ready, and the buffers held by the upper layer were not
> freed. This caused the driver to hang under heavy load.
> Now the reset properly cleans the buffer descriptors and frees the upper-layer buffers.

Nice. I think we want this for 2.6.24.

But the patch is a bit mangled, so I don't think it will apply. Please
have a look in Documentation/email-clients.txt for information on how
to set up Thunderbird to avoid this.

> Signed-off-by: Gregory CLEMENT <[EMAIL PROTECTED]>
> ---
>  drivers/net/macb.c |   26 +-
>  1 files changed, 25 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/macb.c b/drivers/net/macb.c
> index 047ea7b..2ee1dab 100644
> --- a/drivers/net/macb.c
> +++ b/drivers/net/macb.c
> @@ -307,9 +307,33 @@ static void macb_tx(struct macb *bp)
>  (unsigned long)status);
>  
>  if (status & MACB_BIT(UND)) {
> +int i;
>  printk(KERN_ERR "%s: TX underrun, resetting buffers\n",
> -   bp->dev->name);
> +   bp->dev->name);
> +
> +head = bp->tx_head;
> +
> +/* free transmit buffer in upper layer*/
> +for (tail = bp->tx_tail; tail != head; tail = NEXT_TX(tail)) {
> +struct ring_info *rp = &bp->tx_skb[tail];
> +struct sk_buff *skb = rp->skb;
> +
> +BUG_ON(skb == NULL);
> +
> +rmb();
> +
> +dma_unmap_single(&bp->pdev->dev, rp->mapping, skb->len,
> + DMA_TO_DEVICE);
> +rp->skb = NULL;
> +dev_kfree_skb_irq(skb);
> +}
> +
> +/*Mark all the buffer as used to avoid sending a lost buffer*/
> +for (i = 0; i < RX_RING_SIZE; i++)
> +bp->tx_ring[i].ctrl = MACB_BIT(TX_USED);

That should be TX_RING_SIZE, shouldn't it?

I also think this should be done as part of the previous loop, before
they are freed. Having free buffers in the ring sounds dangerous, even
if it's only for a short time.

> +
>  bp->tx_head = bp->tx_tail = 0;

Now, I wonder if it would be better to just scan the ring descriptors,
find the one that failed and just move the DMA pointer to the next
entry. The hardware resets the DMA pointer when an underrun happens, so
I think your code will work, but we're losing more packets than
strictly necessary. In any case, it's better than the existing code.

Perhaps we need a check in macb_start_xmit() as well to avoid starting
a transmission when the ring has just been reset.

Haavard
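
The two review points above (bound the loop by TX_RING_SIZE, and mark the
descriptors as used before the buffers are freed) can be sketched in user
space roughly as follows. This is an illustration under assumptions, not the
driver code: the ring size, the TX_USED value, the struct layout and
release_buf() are hypothetical stand-ins for the macb definitions and for
dma_unmap_single() plus dev_kfree_skb_irq().

```c
#include <assert.h>
#include <stddef.h>

#define TX_RING_SIZE 8  /* hypothetical; a power of two */
#define NEXT_TX(n)   (((n) + 1) & (TX_RING_SIZE - 1))
#define TX_USED      0x80000000u  /* stand-in for MACB_BIT(TX_USED) */

struct tx_desc {
	unsigned int ctrl;
};

struct tx_ring {
	struct tx_desc desc[TX_RING_SIZE];
	void *buf[TX_RING_SIZE];  /* stand-in for the queued skbs */
	unsigned int head, tail;
	unsigned int freed;       /* buffers released during recovery */
};

/* Stand-in for unmapping and freeing one queued buffer. */
static void release_buf(struct tx_ring *r, unsigned int i)
{
	assert(r->buf[i] != NULL);
	r->buf[i] = NULL;
	r->freed++;
}

static void tx_underrun_recover(struct tx_ring *r)
{
	unsigned int i;

	/* First hand every TX descriptor back to software. */
	for (i = 0; i < TX_RING_SIZE; i++)
		r->desc[i].ctrl = TX_USED;

	/* Only then release the buffers that were still queued. */
	for (i = r->tail; i != r->head; i = NEXT_TX(i))
		release_buf(r, i);

	r->head = r->tail = 0;
}
```

The tail-to-head walk wraps around the end of the ring, so a ring with tail 6
and head 1 releases entries 6, 7 and 0.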

Re: [RFC] ehea: kdump support using new shutdown hook

2007-12-13 Thread Subrata Modak

Do you think we can improve our own LTP-KDUMP test cases, housed
here:
http://ltp.cvs.sourceforge.net/ltp/ltp/testcases/kdump/

in light of your changes below?

--Subrata

On Wed, 2007-12-12 at 17:53 +0100, Thomas Klein wrote:
> This patch adds kdump support using the new PPC crash shutdown hook to the
> ehea driver. The driver now keeps a list of firmware handles which have to
> be freed in case of a crash. The crash handler does the minimum required: it
> frees the firmware resource handles plus broadcast/multicast registrations.
> 
> Please comment.
> 
> Shutdown hook patches:
>   http://ozlabs.org/pipermail/linuxppc-dev/2007-December/048058.html
>   http://ozlabs.org/pipermail/linuxppc-dev/2007-December/048059.html
> 
> 
> Signed-off-by: Thomas Klein <[EMAIL PROTECTED]>
> 
> ---
> diff -Nurp -X dontdiff linux-2.6.24-rc5/drivers/net/ehea/ehea.h 
> patched_kernel/drivers/net/ehea/ehea.h
> --- linux-2.6.24-rc5/drivers/net/ehea/ehea.h  2007-12-11 04:48:43.0 
> +0100
> +++ patched_kernel/drivers/net/ehea/ehea.h2007-12-12 17:30:53.0 
> +0100
> @@ -40,7 +40,7 @@
>  #include 
> 
>  #define DRV_NAME "ehea"
> -#define DRV_VERSION  "EHEA_0083"
> +#define DRV_VERSION  "EHEA_0084"
> 
>  /* eHEA capability flags */
>  #define DLPAR_PORT_ADD_REM 1
> @@ -386,6 +386,7 @@ struct ehea_port_res {
> 
> 
>  #define EHEA_MAX_PORTS 16
> +#define EHEA_MAX_RES_HANDLES (100 * EHEA_MAX_PORTS + 10)
>  struct ehea_adapter {
>   u64 handle;
>   struct of_device *ofdev;
> @@ -397,6 +398,7 @@ struct ehea_adapter {
>   u64 max_mc_mac;/* max number of multicast mac addresses */
>   int active_ports;
>   struct list_head list;
> + u64 res_handles[EHEA_MAX_RES_HANDLES];
>  };
> 
> 
> diff -Nurp -X dontdiff linux-2.6.24-rc5/drivers/net/ehea/ehea_main.c 
> patched_kernel/drivers/net/ehea/ehea_main.c
> --- linux-2.6.24-rc5/drivers/net/ehea/ehea_main.c 2007-12-11 
> 04:48:43.0 +0100
> +++ patched_kernel/drivers/net/ehea/ehea_main.c   2007-12-12 
> 17:30:53.0 +0100
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include 
> 
> @@ -2256,6 +2257,33 @@ static int ehea_clean_all_portres(struct
>   return ret;
>  }
> 
> +static void ehea_update_adapter_handles(struct ehea_adapter *adapter)
> +{
> + int i, k;
> + int j = 0;
> +
> + memset(adapter->res_handles, sizeof(adapter->res_handles), 0);
> +
> + for (k = 0; k < EHEA_MAX_PORTS; k++) {
> + struct ehea_port *port = adapter->port[k];
> +
> + if (!port || (port->state != EHEA_PORT_UP))
> + continue;
> +
> + for(i = 0; i < port->num_def_qps + port->num_add_tx_qps; i++) {
> + struct ehea_port_res *pr = &port->port_res[i];
> +
> + adapter->res_handles[j++] = pr->qp->fw_handle;
> + adapter->res_handles[j++] = pr->send_cq->fw_handle;
> + adapter->res_handles[j++] = pr->recv_cq->fw_handle;
> + adapter->res_handles[j++] = pr->eq->fw_handle;
> + adapter->res_handles[j++] = pr->send_mr.handle;
> + adapter->res_handles[j++] = pr->recv_mr.handle;
> + }
> + adapter->res_handles[j++] = port->qp_eq->fw_handle;
> + }
> +}
> +
>  static void ehea_remove_adapter_mr(struct ehea_adapter *adapter)
>  {
>   if (adapter->active_ports)
> @@ -2318,6 +2346,7 @@ static int ehea_up(struct net_device *de
> 
>   ret = 0;
>   port->state = EHEA_PORT_UP;
> + ehea_update_adapter_handles(port->adapter);
>   goto out;
> 
>  out_free_irqs:
> @@ -2387,6 +2416,8 @@ static int ehea_down(struct net_device *
>   ehea_info("Failed freeing resources for %s. ret=%i",
> dev->name, ret);
> 
> + ehea_update_adapter_handles(port->adapter);
> +
>   return ret;
>  }
> 
> @@ -3302,6 +,71 @@ static int __devexit ehea_remove(struct 
>   return 0;
>  }
> 
> +void ehea_crash_deregister(void)
> +{
> + struct ehea_adapter *adapter;
> + int i;
> + u64 hret;
> + u8 reg_type;
> +
> + list_for_each_entry(adapter, &adapter_list, list) {
> + for (i = 0; i < EHEA_MAX_PORTS; i++) {
> + struct ehea_port *port = adapter->port[i];
> + if (port->state == EHEA_PORT_UP) {
> + struct ehea_mc_list *mc_entry = port->mc_list;
> + struct list_head *pos;
> + struct list_head *temp;
> +
> + /* Undo multicast registrations */
> + list_for_each_safe(pos, temp,
> +&(port->mc_list->list)) {
> + mc_entry = list_entry(pos,
> + struct ehea_mc_list,
> + li
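
One detail in the quoted ehea_update_adapter_handles() may deserve a second
look: memset() takes its arguments as (destination, fill byte, byte count), so
the call memset(adapter->res_handles, sizeof(adapter->res_handles), 0) as
written has a length of 0 and clears nothing. A small user-space sketch of the
intended call, with a hypothetical struct and array size standing in for
struct ehea_adapter and EHEA_MAX_RES_HANDLES:

```c
#include <assert.h>
#include <string.h>

#define MAX_HANDLES 4  /* hypothetical; the patch uses EHEA_MAX_RES_HANDLES */

struct adapter_model {
	unsigned long long res_handles[MAX_HANDLES];
};

/* Clear the handle table: destination, fill byte 0, size in bytes. */
static void clear_handles(struct adapter_model *a)
{
	assert(a != NULL);
	memset(a->res_handles, 0, sizeof(a->res_handles));
}
```

After clear_handles() every entry of res_handles is zero, which is presumably
what the crash handler relies on to tell valid handles from unused slots.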

Re: [RFC] mac80211: clean up frame receive handling

2007-12-13 Thread Johannes Berg


> > This cleans up the frame receive handling. After this patch
> >  * EAPOL frames addressed to us or the EAPOL group address are
> >always accepted regardless of whether they are encrypted or not
> 
> why? userspace (wpa_supplicant) tries to control this depending on the
> network settings.

No, wpa_supplicant actually requires EAPOL frames to go through. hostapd
currently wants them on the management interface but on the data
interface makes much more sense and doesn't matter for any sort of
control since we only let link-local and unicast packets through the
port control/drop_unencrypted.

johannes



[PATCH 0/6] PS3: gelic: gelic updates for 2.6.25

2007-12-13 Thread Masakazu Mokuno

Hi,

Here is a set of updates for PS3 gelic network driver.
This patch set requires other patches which were already submitted by
Geert (http://marc.info/?l=linux-kernel&m=119626095605487).

[1] PS3: gelic: Fix the wrong dev_id passed
[2] PS3: gelic: Add endianness macros
[3] PS3: gelic: Code cleanup
[4] PS3: gelic: Remove duplicated ethtool handlers
[5] PS3: gelic: Add support for port link status
[6] PS3: gelic: Add support for dual network interface

This set is also a prerequisite for the new wireless driver for PS3, which
I'll submit later.

Thanks for reviewing!

-- 
Masakazu MOKUNO


[PATCH 1/6] PS3: gelic: Fix the wrong dev_id passed

2007-12-13 Thread Masakazu Mokuno

PS3: gelic: Fix the wrong dev_id passed

The device id passed to lv1_net_set_interrupt_status_indicator() was wrong.
This path is invoked only when initialization fails.

Signed-off-by: Masakazu Mokuno <[EMAIL PROTECTED]>
---
 drivers/net/ps3_gelic_net.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ps3_gelic_net.c
+++ b/drivers/net/ps3_gelic_net.c
@@ -1512,7 +1512,7 @@ static int ps3_gelic_driver_probe (struc
 
 fail_setup_netdev:
lv1_net_set_interrupt_status_indicator(bus_id(card),
-  bus_id(card),
+  dev_id(card),
   0 , 0);
 fail_status_indicator:
ps3_dma_region_free(dev->d_region);


[PATCH 4/6] PS3: gelic: remove duplicated ethtool handlers

2007-12-13 Thread Masakazu Mokuno

PS3: gelic: remove duplicated ethtool handlers

Remove some ethtool handlers whose functionality the common ethtool handlers
already provide.

Signed-off-by: Masakazu Mokuno <[EMAIL PROTECTED]>
---
 drivers/net/ps3_gelic_net.c |   43 +++
 1 file changed, 3 insertions(+), 40 deletions(-)

--- a/drivers/net/ps3_gelic_net.c
+++ b/drivers/net/ps3_gelic_net.c
@@ -1196,28 +1196,6 @@ static int gelic_ether_get_settings(stru
return 0;
 }
 
-static u32 gelic_ether_get_link(struct net_device *netdev)
-{
-   struct gelic_card *card = netdev_priv(netdev);
-   int status;
-   u64 v1, v2;
-   int link;
-
-   status = lv1_net_control(bus_id(card), dev_id(card),
-GELIC_LV1_GET_ETH_PORT_STATUS,
-GELIC_LV1_VLAN_TX_ETHERNET, 0, 0,
-&v1, &v2);
-   if (status)
-   return 0; /* link down */
-
-   if (v1 & GELIC_LV1_ETHER_LINK_UP)
-   link = 1;
-   else
-   link = 0;
-
-   return link;
-}
-
 static int gelic_net_nway_reset(struct net_device *netdev)
 {
if (netif_running(netdev)) {
@@ -1227,21 +1205,6 @@ static int gelic_net_nway_reset(struct n
return 0;
 }
 
-static u32 gelic_net_get_tx_csum(struct net_device *netdev)
-{
-   return (netdev->features & NETIF_F_IP_CSUM) != 0;
-}
-
-static int gelic_net_set_tx_csum(struct net_device *netdev, u32 data)
-{
-   if (data)
-   netdev->features |= NETIF_F_IP_CSUM;
-   else
-   netdev->features &= ~NETIF_F_IP_CSUM;
-
-   return 0;
-}
-
 static u32 gelic_net_get_rx_csum(struct net_device *netdev)
 {
struct gelic_card *card = netdev_priv(netdev);
@@ -1260,10 +1223,10 @@ static int gelic_net_set_rx_csum(struct 
 static struct ethtool_ops gelic_net_ethtool_ops = {
.get_drvinfo= gelic_net_get_drvinfo,
.get_settings   = gelic_ether_get_settings,
-   .get_link   = gelic_ether_get_link,
+   .get_link   = ethtool_op_get_link,
.nway_reset = gelic_net_nway_reset,
-   .get_tx_csum= gelic_net_get_tx_csum,
-   .set_tx_csum= gelic_net_set_tx_csum,
+   .get_tx_csum= ethtool_op_get_tx_csum,
+   .set_tx_csum= ethtool_op_set_tx_csum,
.get_rx_csum= gelic_net_get_rx_csum,
.set_rx_csum= gelic_net_set_rx_csum,
 };


[PATCH 5/6] PS3: gelic: add support for port link status

2007-12-13 Thread Masakazu Mokuno

PS3: gelic: add support for port link status

Add support for interrupt driven port link status detection.

Signed-off-by: Masakazu Mokuno <[EMAIL PROTECTED]>
---
 drivers/net/ps3_gelic_net.c |   77 
 drivers/net/ps3_gelic_net.h |2 +
 2 files changed, 52 insertions(+), 27 deletions(-)

--- a/drivers/net/ps3_gelic_net.c
+++ b/drivers/net/ps3_gelic_net.c
@@ -87,6 +87,28 @@ static inline void gelic_card_rx_irq_off
 {
gelic_card_set_irq_mask(card, card->ghiintmask & ~GELIC_CARD_RXINT);
 }
+
+static void
+gelic_card_get_ether_port_status(struct gelic_card *card, int inform)
+{
+   u64 v2;
+   struct net_device *ether_netdev;
+
+   lv1_net_control(bus_id(card), dev_id(card),
+   GELIC_LV1_GET_ETH_PORT_STATUS,
+   GELIC_LV1_VLAN_TX_ETHERNET, 0, 0,
+   &card->ether_port_status, &v2);
+
+   if (inform) {
+   ether_netdev = card->netdev;
+   if (card->ether_port_status & GELIC_LV1_ETHER_LINK_UP)
+   netif_carrier_on(ether_netdev);
+   else
+   netif_carrier_off(ether_netdev);
+   }
+}
+
+
 /**
  * gelic_descr_get_status -- returns the status of a descriptor
  * @descr: descriptor to look at
@@ -1032,6 +1054,10 @@ static irqreturn_t gelic_card_interrupt(
gelic_card_kick_txdma(card, card->tx_chain.tail);
spin_unlock_irqrestore(&card->tx_dma_lock, flags);
}
+
+   /* ether port status changed */
+   if (status & GELIC_CARD_PORT_STATUS_CHANGED)
+   gelic_card_get_ether_port_status(card, 1);
return IRQ_HANDLED;
 }
 
@@ -1128,13 +1154,14 @@ static int gelic_net_open(struct net_dev
napi_enable(&card->napi);
 
card->tx_dma_progress = 0;
-   card->ghiintmask = GELIC_CARD_RXINT | GELIC_CARD_TXINT;
+   card->ghiintmask = GELIC_CARD_RXINT | GELIC_CARD_TXINT |
+   GELIC_CARD_PORT_STATUS_CHANGED;
 
gelic_card_set_irq_mask(card, card->ghiintmask);
gelic_card_enable_rxdmac(card);
 
netif_start_queue(netdev);
-   netif_carrier_on(netdev);
+   gelic_card_get_ether_port_status(card, 1);
 
return 0;
 
@@ -1157,39 +1184,35 @@ static int gelic_ether_get_settings(stru
struct ethtool_cmd *cmd)
 {
struct gelic_card *card = netdev_priv(netdev);
-   int status;
-   u64 v1, v2;
-   int speed, duplex;
 
-   speed = duplex = -1;
-   status = lv1_net_control(bus_id(card), dev_id(card),
-GELIC_LV1_GET_ETH_PORT_STATUS,
-GELIC_LV1_VLAN_TX_ETHERNET, 0, 0,
-&v1, &v2);
-   if (status) {
-   /* link down */
-   } else {
-   if (v1 & GELIC_LV1_ETHER_FULL_DUPLEX) {
-   duplex = DUPLEX_FULL;
-   } else {
-   duplex = DUPLEX_HALF;
-   }
+   gelic_card_get_ether_port_status(card, 0);
 
-   if (v1 & GELIC_LV1_ETHER_SPEED_10) {
-   speed = SPEED_10;
-   } else if (v1 & GELIC_LV1_ETHER_SPEED_100) {
-   speed = SPEED_100;
-   } else if (v1 & GELIC_LV1_ETHER_SPEED_1000) {
-   speed = SPEED_1000;
-   }
+   if (card->ether_port_status & GELIC_LV1_ETHER_FULL_DUPLEX)
+   cmd->duplex = DUPLEX_FULL;
+   else
+   cmd->duplex = DUPLEX_HALF;
+
+   switch (card->ether_port_status & GELIC_LV1_ETHER_SPEED_MASK) {
+   case GELIC_LV1_ETHER_SPEED_10:
+   cmd->speed = SPEED_10;
+   break;
+   case GELIC_LV1_ETHER_SPEED_100:
+   cmd->speed = SPEED_100;
+   break;
+   case GELIC_LV1_ETHER_SPEED_1000:
+   cmd->speed = SPEED_1000;
+   break;
+   default:
+   pr_info("%s: speed unknown\n", __func__);
+   cmd->speed = SPEED_10;
+   break;
}
+
cmd->supported = SUPPORTED_TP | SUPPORTED_Autoneg |
SUPPORTED_10baseT_Half | SUPPORTED_10baseT_Full |
SUPPORTED_100baseT_Half | SUPPORTED_100baseT_Full |
SUPPORTED_1000baseT_Half | SUPPORTED_1000baseT_Full;
cmd->advertising = cmd->supported;
-   cmd->speed = speed;
-   cmd->duplex = duplex;
cmd->autoneg = AUTONEG_ENABLE; /* always enabled */
cmd->port = PORT_TP;
 
--- a/drivers/net/ps3_gelic_net.h
+++ b/drivers/net/ps3_gelic_net.h
@@ -261,6 +261,8 @@ struct gelic_card {
atomic_t tx_timeout_task_counter;
wait_queue_head_t waitq;
 
+   u64 ether_port_status;
+
struct gelic_descr *tx_top, *rx_top;
struct gelic_descr descr[0];
 };
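
The speed and duplex decoding in the ethtool hunk above can be tried outside the kernel. This is a minimal sketch, not the driver's code: the bit values and the names PORT_*, link_settings and decode_port_status() are hypothetical, and only the structure (mask the speed bits, switch on the result, fall back to 10 Mbit/s) follows the patch.

```c
#include <stdint.h>

/* Hypothetical bit layout for this sketch; the real GELIC_LV1_* values differ. */
#define PORT_FULL_DUPLEX  0x01u
#define PORT_SPEED_10     0x10u
#define PORT_SPEED_100    0x20u
#define PORT_SPEED_1000   0x40u
#define PORT_SPEED_MASK   (PORT_SPEED_10 | PORT_SPEED_100 | PORT_SPEED_1000)

struct link_settings {
	int speed_mbps;
	int full_duplex;
};

/* Decode a cached port status word into speed and duplex. */
static inline struct link_settings decode_port_status(uint64_t status)
{
	struct link_settings ls;

	ls.full_duplex = (status & PORT_FULL_DUPLEX) != 0;

	switch (status & PORT_SPEED_MASK) {
	case PORT_SPEED_10:
		ls.speed_mbps = 10;
		break;
	case PORT_SPEED_100:
		ls.speed_mbps = 100;
		break;
	case PORT_SPEED_1000:
		ls.speed_mbps = 1000;
		break;
	default:
		/* Unknown speed: fall back to 10, as the patch does. */
		ls.speed_mbps = 10;
		break;
	}
	return ls;
}
```

Masking before the switch means a status word with no speed bit set, or more than one, lands in the default branch instead of silently matching the first test.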


[PATCH 6/6] PS3: gelic: Add support for dual network interface

2007-12-13 Thread Masakazu Mokuno

PS3: gelic: Add support for dual network interface

Add support for dual network (net_device) interfaces so that ethernet
and wireless can each own a separate ethX interface.
  - Export functions that are useful to both interfaces
  - Move irq allocation/release code to the driver probe/remove handlers
    because the interfaces share interrupts.
  - Allocate skbs with dev_alloc_skb() instead of netdev_alloc_skb()
    as the interfaces share the hardware rx queue.
  - Add gelic_port struct in order to abstract dual interface handling
  - Change handlers for hardware queues so that they can handle dual
    {source,destination} interfaces.
  - Use new NAPI functions
This is a prerequisite for the new PS3 wireless support.

Signed-off-by: Masakazu Mokuno <[EMAIL PROTECTED]>
---
 drivers/net/ps3_gelic_net.c |  721 ++--
 drivers/net/ps3_gelic_net.h |  107 +-
 2 files changed, 525 insertions(+), 303 deletions(-)
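
The shared bring-up and tear-down that this patch adds in gelic_card_up() and gelic_card_down() follows a "first user starts, last user stops" pattern. The sketch below shows that pattern in plain C. It is an illustration under assumptions: shared_card and its fields are hypothetical names, and the locking the driver does with updown_lock is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shared card state; not the driver's struct. */
struct shared_card {
	int users;        /* number of interfaces currently up */
	bool hw_running;  /* whether irq and DMA are enabled */
	int hw_starts;    /* how many times the hardware was really started */
};

/* Bring one interface up; only the first user starts the hardware. */
static inline void shared_card_up(struct shared_card *card)
{
	if (++card->users == 1) {
		card->hw_running = true;
		card->hw_starts++;
	}
}

/* Take one interface down; only the last user stops the hardware. */
static inline void shared_card_down(struct shared_card *card)
{
	if (card->users == 0)
		return;	/* unbalanced call: nothing to do */
	if (--card->users == 0)
		card->hw_running = false;
}
```

With two interfaces sharing one rx queue, this keeps the hardware running until both have been closed, whichever order they are closed in.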

--- a/drivers/net/ps3_gelic_net.c
+++ b/drivers/net/ps3_gelic_net.c
@@ -48,27 +48,19 @@
 #include "ps3_gelic_net.h"
 
 #define DRV_NAME "Gelic Network Driver"
-#define DRV_VERSION "1.0"
+#define DRV_VERSION "1.1"
 
 MODULE_AUTHOR("SCE Inc.");
 MODULE_DESCRIPTION("Gelic Network driver");
 MODULE_LICENSE("GPL");
 
-static inline struct device *ctodev(struct gelic_card *card)
-{
-   return &card->dev->core;
-}
-static inline u64 bus_id(struct gelic_card *card)
-{
-   return card->dev->bus_id;
-}
-static inline u64 dev_id(struct gelic_card *card)
-{
-   return card->dev->dev_id;
-}
+
+static inline void gelic_card_enable_rxdmac(struct gelic_card *card);
+static inline void gelic_card_disable_rxdmac(struct gelic_card *card);
+static inline void gelic_card_disable_txdmac(struct gelic_card *card);
 
 /* set irq_mask */
-static int gelic_card_set_irq_mask(struct gelic_card *card, u64 mask)
+int gelic_card_set_irq_mask(struct gelic_card *card, u64 mask)
 {
int status;
 
@@ -76,20 +68,23 @@ static int gelic_card_set_irq_mask(struc
mask, 0);
if (status)
dev_info(ctodev(card),
-"lv1_net_set_interrupt_mask failed %d\n", status);
+"%s failed %d\n", __func__, status);
return status;
 }
+
 static inline void gelic_card_rx_irq_on(struct gelic_card *card)
 {
-   gelic_card_set_irq_mask(card, card->ghiintmask | GELIC_CARD_RXINT);
+   card->irq_mask |= GELIC_CARD_RXINT;
+   gelic_card_set_irq_mask(card, card->irq_mask);
 }
 static inline void gelic_card_rx_irq_off(struct gelic_card *card)
 {
-   gelic_card_set_irq_mask(card, card->ghiintmask & ~GELIC_CARD_RXINT);
+   card->irq_mask &= ~GELIC_CARD_RXINT;
+   gelic_card_set_irq_mask(card, card->irq_mask);
 }
 
-static void
-gelic_card_get_ether_port_status(struct gelic_card *card, int inform)
+static void gelic_card_get_ether_port_status(struct gelic_card *card,
+int inform)
 {
u64 v2;
struct net_device *ether_netdev;
@@ -100,7 +95,7 @@ gelic_card_get_ether_port_status(struct 
&card->ether_port_status, &v2);
 
if (inform) {
-   ether_netdev = card->netdev;
+   ether_netdev = card->netdev[GELIC_PORT_ETHERNET];
if (card->ether_port_status & GELIC_LV1_ETHER_LINK_UP)
netif_carrier_on(ether_netdev);
else
@@ -108,6 +103,46 @@ gelic_card_get_ether_port_status(struct 
}
 }
 
+void gelic_card_up(struct gelic_card *card)
+{
+   pr_debug("%s: called\n", __func__);
+   down(&card->updown_lock);
+   if (atomic_inc_return(&card->users) == 1) {
+   pr_debug("%s: real do\n", __func__);
+   /* enable irq */
+   gelic_card_set_irq_mask(card, card->irq_mask);
+   /* start rx */
+   gelic_card_enable_rxdmac(card);
+
+   napi_enable(&card->napi);
+   }
+   up(&card->updown_lock);
+   pr_debug("%s: done\n", __func__);
+}
+
+void gelic_card_down(struct gelic_card *card)
+{
+   u64 mask;
+   pr_debug("%s: called\n", __func__);
+   down(&card->updown_lock);
+   if (atomic_dec_if_positive(&card->users) == 0) {
+   pr_debug("%s: real do\n", __func__);
+   napi_disable(&card->napi);
+   /*
+* Disable irq. Wireless interrupts will
+* be disabled later if any
+*/
+   mask = card->irq_mask & (GELIC_CARD_WLAN_EVENT_RECEIVED |
+GELIC_CARD_WLAN_COMMAND_COMPLETED);
+   gelic_card_set_irq_mask(card, mask);
+   /* stop rx */
+   gelic_card_disable_rxdmac(card);
+   /* stop tx */
+   gelic_card_disable_txdmac(card);
+   }
+   up(&card->updown_lock);
+   pr_debug("%s: done\n", __func__);
+}
 
 /**
  * gelic_descr_get_status

[PATCH 3/6] PS3: gelic: code cleanup

2007-12-13 Thread Masakazu Mokuno

PS3: gelic: code cleanup

Code cleanup:
 - Use appropriate prefixes for names instead of the fixed 'gelic_net'
   so that the objects that functions, variables and constants refer to
   can be inferred from the name.
 - Remove definitions for IPSec offload of the gelic hardware.  This
   functionality is never supported on PS3.
 - Group constants with enum.
 - Use bitwise constants for interrupt status, instead of bit numbers, to
   eliminate shift operations.
 - Style fixes.
Signed-off-by: Masakazu Mokuno <[EMAIL PROTECTED]>
---
 drivers/net/ps3_gelic_net.c |  464 +---
 drivers/net/ps3_gelic_net.h |  283 +++---
 2 files changed, 389 insertions(+), 358 deletions(-)
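
The move from shift-based to mask-based status handling in gelic_descr_get_status() and gelic_descr_set_status() can be sketched as follows. This is a host-order illustration only: DESCR_DMA_STAT_MASK is a hypothetical name, its value assumes the status occupies the upper 4 bits (as the removed comment "clean the upper 4 bits" suggests), and the byte swapping done by the driver is omitted.

```c
#include <stdint.h>

/* Assumed layout: the DMA status lives in the upper 4 bits of the word. */
#define DESCR_DMA_STAT_MASK 0xf0000000u

/* Status constants are stored pre-shifted, so reading is a single AND. */
static inline uint32_t descr_get_status(uint32_t cmd_status)
{
	return cmd_status & DESCR_DMA_STAT_MASK;
}

/* Replace the status bits and leave every other bit untouched. */
static inline uint32_t descr_set_status(uint32_t cmd_status, uint32_t status)
{
	return status | (cmd_status & ~DESCR_DMA_STAT_MASK);
}
```

Because the enum values already sit in their final bit positions, neither function needs a shift, which is the point of the "bitwise constants" item in the changelog.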

--- a/drivers/net/ps3_gelic_net.c
+++ b/drivers/net/ps3_gelic_net.c
@@ -54,21 +54,21 @@ MODULE_AUTHOR("SCE Inc.");
 MODULE_DESCRIPTION("Gelic Network driver");
 MODULE_LICENSE("GPL");
 
-static inline struct device *ctodev(struct gelic_net_card *card)
+static inline struct device *ctodev(struct gelic_card *card)
 {
return &card->dev->core;
 }
-static inline u64 bus_id(struct gelic_net_card *card)
+static inline u64 bus_id(struct gelic_card *card)
 {
return card->dev->bus_id;
 }
-static inline u64 dev_id(struct gelic_net_card *card)
+static inline u64 dev_id(struct gelic_card *card)
 {
return card->dev->dev_id;
 }
 
 /* set irq_mask */
-static int gelic_net_set_irq_mask(struct gelic_net_card *card, u64 mask)
+static int gelic_card_set_irq_mask(struct gelic_card *card, u64 mask)
 {
int status;
 
@@ -79,51 +79,40 @@ static int gelic_net_set_irq_mask(struct
 "lv1_net_set_interrupt_mask failed %d\n", status);
return status;
 }
-static inline void gelic_net_rx_irq_on(struct gelic_net_card *card)
+static inline void gelic_card_rx_irq_on(struct gelic_card *card)
 {
-   gelic_net_set_irq_mask(card, card->ghiintmask | GELIC_NET_RXINT);
+   gelic_card_set_irq_mask(card, card->ghiintmask | GELIC_CARD_RXINT);
 }
-static inline void gelic_net_rx_irq_off(struct gelic_net_card *card)
+static inline void gelic_card_rx_irq_off(struct gelic_card *card)
 {
-   gelic_net_set_irq_mask(card, card->ghiintmask & ~GELIC_NET_RXINT);
+   gelic_card_set_irq_mask(card, card->ghiintmask & ~GELIC_CARD_RXINT);
 }
 /**
- * gelic_net_get_descr_status -- returns the status of a descriptor
+ * gelic_descr_get_status -- returns the status of a descriptor
  * @descr: descriptor to look at
  *
  * returns the status as in the dmac_cmd_status field of the descriptor
  */
-static enum gelic_net_descr_status
-gelic_net_get_descr_status(struct gelic_net_descr *descr)
+static enum gelic_descr_dma_status
+gelic_descr_get_status(struct gelic_descr *descr)
 {
-   u32 cmd_status;
-
-   cmd_status = be32_to_cpu(descr->dmac_cmd_status);
-   cmd_status >>= GELIC_NET_DESCR_IND_PROC_SHIFT;
-   return cmd_status;
+   return be32_to_cpu(descr->dmac_cmd_status) & GELIC_DESCR_DMA_STAT_MASK;
 }
 
 /**
- * gelic_net_set_descr_status -- sets the status of a descriptor
+ * gelic_descr_set_status -- sets the status of a descriptor
  * @descr: descriptor to change
  * @status: status to set in the descriptor
  *
  * changes the status to the specified value. Doesn't change other bits
  * in the status
  */
-static void gelic_net_set_descr_status(struct gelic_net_descr *descr,
-  enum gelic_net_descr_status status)
+static void gelic_descr_set_status(struct gelic_descr *descr,
+  enum gelic_descr_dma_status status)
 {
-   u32 cmd_status;
-
-   /* read the status */
-   cmd_status = be32_to_cpu(descr->dmac_cmd_status);
-   /* clean the upper 4 bits */
-   cmd_status &= GELIC_NET_DESCR_IND_PROC_MASKO;
-   /* add the status to it */
-   cmd_status |= ((u32)status) << GELIC_NET_DESCR_IND_PROC_SHIFT;
-   /* and write it back */
-   descr->dmac_cmd_status = cpu_to_be32(cmd_status);
+   descr->dmac_cmd_status = cpu_to_be32(status |
+   (be32_to_cpu(descr->dmac_cmd_status) &
+~GELIC_DESCR_DMA_STAT_MASK));
/*
 * dma_cmd_status field is used to indicate whether the descriptor
 * is valid or not.
@@ -134,24 +123,24 @@ static void gelic_net_set_descr_status(s
 }
 
 /**
- * gelic_net_free_chain - free descriptor chain
+ * gelic_card_free_chain - free descriptor chain
  * @card: card structure
  * @descr_in: address of desc
  */
-static void gelic_net_free_chain(struct gelic_net_card *card,
-struct gelic_net_descr *descr_in)
+static void gelic_card_free_chain(struct gelic_card *card,
+ struct gelic_descr *descr_in)
 {
-   struct gelic_net_descr *descr;
+   struct gelic_descr *descr;
 
for (descr = descr_in; descr && descr->bus_addr; descr = descr->next) {
dma_unmap_single(ctodev(card), descr->bus_addr,
-GELIC_NET_DE

[PATCH 2/6] PS3: gelic: Add endianness macros

2007-12-13 Thread Masakazu Mokuno

PS3: gelic: Add endianness macros

Mark the members of the DMA descriptor structure with the proper
endianness and use the appropriate accessor macros.
As the gelic driver works only on PS3, which is big-endian, all these
macros expand to no-ops.

Signed-off-by: Masakazu Mokuno <[EMAIL PROTECTED]>
---
 drivers/net/ps3_gelic_net.c |   70 
 drivers/net/ps3_gelic_net.h |   16 +-
 2 files changed, 47 insertions(+), 39 deletions(-)
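
What the accessor macros express can be shown with a portable userspace sketch. The helpers load_be32() and store_be32() are hypothetical names, not kernel API; they do by hand what be32_to_cpu() and cpu_to_be32() do for a descriptor field that the hardware defines as big-endian.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value from raw descriptor bytes. */
static inline uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit value to raw descriptor bytes in big-endian order. */
static inline void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}
```

On a big-endian host the in-memory layout already matches, so the conversion costs nothing there. The annotations still document which fields belong to the hardware and let tools such as sparse check them.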

--- a/drivers/net/ps3_gelic_net.c
+++ b/drivers/net/ps3_gelic_net.c
@@ -98,7 +98,7 @@ gelic_net_get_descr_status(struct gelic_
 {
u32 cmd_status;
 
-   cmd_status = descr->dmac_cmd_status;
+   cmd_status = be32_to_cpu(descr->dmac_cmd_status);
cmd_status >>= GELIC_NET_DESCR_IND_PROC_SHIFT;
return cmd_status;
 }
@@ -117,13 +117,13 @@ static void gelic_net_set_descr_status(s
u32 cmd_status;
 
/* read the status */
-   cmd_status = descr->dmac_cmd_status;
+   cmd_status = be32_to_cpu(descr->dmac_cmd_status);
/* clean the upper 4 bits */
cmd_status &= GELIC_NET_DESCR_IND_PROC_MASKO;
/* add the status to it */
cmd_status |= ((u32)status) << GELIC_NET_DESCR_IND_PROC_SHIFT;
/* and write it back */
-   descr->dmac_cmd_status = cmd_status;
+   descr->dmac_cmd_status = cpu_to_be32(cmd_status);
/*
 * dma_cmd_status field is used to indicate whether the descriptor
 * is valid or not.
@@ -193,7 +193,7 @@ static int gelic_net_init_chain(struct g
/* chain bus addr of hw descriptor */
descr = start_descr;
for (i = 0; i < no; i++, descr++) {
-   descr->next_descr_addr = descr->next->bus_addr;
+   descr->next_descr_addr = cpu_to_be32(descr->next->bus_addr);
}
 
chain->head = start_descr;
@@ -245,7 +245,7 @@ static int gelic_net_prepare_rx_descr(st
 "%s:allocate skb failed !!\n", __func__);
return -ENOMEM;
}
-   descr->buf_size = bufsize;
+   descr->buf_size = cpu_to_be32(bufsize);
descr->dmac_cmd_status = 0;
descr->result_size = 0;
descr->valid_size = 0;
@@ -256,9 +256,10 @@ static int gelic_net_prepare_rx_descr(st
if (offset)
skb_reserve(descr->skb, GELIC_NET_RXBUF_ALIGN - offset);
/* io-mmu-map the skb */
-   descr->buf_addr = dma_map_single(ctodev(card), descr->skb->data,
-GELIC_NET_MAX_MTU,
-DMA_FROM_DEVICE);
+   descr->buf_addr = cpu_to_be32(dma_map_single(ctodev(card),
+descr->skb->data,
+GELIC_NET_MAX_MTU,
+DMA_FROM_DEVICE));
if (!descr->buf_addr) {
dev_kfree_skb_any(descr->skb);
descr->skb = NULL;
@@ -284,7 +285,7 @@ static void gelic_net_release_rx_chain(s
do {
if (descr->skb) {
dma_unmap_single(ctodev(card),
-descr->buf_addr,
+be32_to_cpu(descr->buf_addr),
 descr->skb->len,
 DMA_FROM_DEVICE);
descr->buf_addr = 0;
@@ -353,10 +354,11 @@ static void gelic_net_release_tx_descr(s
 {
struct sk_buff *skb = descr->skb;
 
-   BUG_ON(!(descr->data_status & (1 << GELIC_NET_TXDESC_TAIL)));
+   BUG_ON(!(be32_to_cpu(descr->data_status) &
+(1 << GELIC_NET_TXDESC_TAIL)));
 
-   dma_unmap_single(ctodev(card), descr->buf_addr, skb->len,
-DMA_TO_DEVICE);
+   dma_unmap_single(ctodev(card),
+be32_to_cpu(descr->buf_addr), skb->len, DMA_TO_DEVICE);
dev_kfree_skb_any(skb);
 
descr->buf_addr = 0;
@@ -610,28 +612,29 @@ static void gelic_net_set_txdescr_cmdsta
  struct sk_buff *skb)
 {
if (skb->ip_summed != CHECKSUM_PARTIAL)
-   descr->dmac_cmd_status = GELIC_NET_DMAC_CMDSTAT_NOCS |
-   GELIC_NET_DMAC_CMDSTAT_END_FRAME;
+   descr->dmac_cmd_status =
+   cpu_to_be32(GELIC_NET_DMAC_CMDSTAT_NOCS |
+   GELIC_NET_DMAC_CMDSTAT_END_FRAME);
else {
/* is packet ip?
 * if yes: tcp? udp? */
if (skb->protocol == htons(ETH_P_IP)) {
if (ip_hdr(skb)->protocol == IPPROTO_TCP)
descr->dmac_cmd_status =
-   GELIC_NET_DMAC_CMDSTAT_TCPCS |
-   GELIC_NET_DMAC_CMDSTAT_END_FRAME;
+   cpu_to_be32(GELIC_NET_DMAC_CMDSTAT_TCPCS |
+

Re: [DECNET]: Fix inverted wait flag in xfrm_lookup call

2007-12-13 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 13:25:16 +0800

> [DECNET]: Fix inverted wait flag in xfrm_lookup call
> 
> My previous patch made the wait flag take the opposite value to what it
> should be.  This patch fixes that.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied, thanks.

Re: [kvm-devel] [PATCH resent] virtio_net: Fix stalled inbound traffic on early packets

2007-12-13 Thread Dor Laor


Christian Borntraeger wrote:
> On Wednesday, 12 December 2007, Dor Laor wrote:
>> I think the change below handles the race. Otherwise please detail the
>> use case.
>
> [...]
>
>>> @@ -292,6 +292,9 @@ static int virtnet_open(struct net_devic
>>> return -ENOMEM;
>>>
>>> napi_enable(&vi->napi);
>>> +
>>> +   vi->rvq->vq_ops->enable(vi->rvq);
>>> +   vi->svq->vq_ops->enable(vi->svq);
>>
>> If you change it to:
>> if (!vi->rvq->vq_ops->enable(vi->rvq))
>> vi->rvq->vq_ops->kick(vi->rvq);
>> if (!vi->svq->vq_ops->enable(vi->svq))
>> vi->svq->vq_ops->kick(vi->svq);
>>
>> You solve the race of packets already waiting in the queue without
>> triggering the irq.
>
> Hmm, I don't fully understand your point. I think this will work as long as
> the host has not consumed all inbound buffers. It will also require that
> the host sends an additional packet, no? If no additional packet comes the
> host has no reason to send an interrupt just because it got a notify
> hypercall. kick inside a guest also does not trigger the poll routine.
>
> It also won't work in the following scenario:
> In virtnet open we will allocate buffers and send them to the host using the
> kick callback. The host can now use _all_ buffers for incoming data while
> interrupts are still disabled and the guest is not running. (Let's say the
> host bridge has lots of multicast traffic and the guest does not get
> scheduled for a while.) When the guest now continues and enables the
> interrupts nothing happens. Doing a kick does not help, as the host code
> will bail out with "no dma memory for transfer".
>
> Christian

You're right I got confused somehow.
So in that case setting the driver status field on open in addition to 
your enable will do the trick.
On DRIVER_OPEN the host will trigger an interrupt if the queue is not 
empty..
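
The race discussed above (buffers filled before interrupts are enabled, so no interrupt ever arrives) can also be closed on the guest side by re-checking the queue right after enabling notifications. The sketch below illustrates that general pattern only. It is not what this thread settled on, and rx_queue_sketch, rxq_enable_irq() and rxq_open() are hypothetical names, not the virtio API.

```c
#include <stdbool.h>

/* Hypothetical model of a receive queue; not a virtio structure. */
struct rx_queue_sketch {
	int pending;          /* buffers the host has already filled */
	bool irq_enabled;     /* whether the guest wants interrupts */
	int polls_scheduled;  /* how often the guest scheduled its own poll */
};

/*
 * Enable interrupts and report whether the queue was idle.
 * A false return means data arrived before the enable, so no
 * interrupt will be delivered for it.
 */
static inline bool rxq_enable_irq(struct rx_queue_sketch *q)
{
	q->irq_enabled = true;
	return q->pending == 0;
}

/* Open path: if work was missed, poll instead of waiting for an irq. */
static inline void rxq_open(struct rx_queue_sketch *q)
{
	if (!rxq_enable_irq(q)) {
		q->irq_enabled = false;	/* polling mode until drained */
		q->polls_scheduled++;
	}
}
```

The sketch does not depend on the host sending a further packet, which was the weakness Christian pointed out in the kick-based variant.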

Thanks,
Dor

Re: [PATCH][NET-2.6.25][IPV6] fix section mismatch warnings

2007-12-13 Thread David Miller

From: Daniel Lezcano <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 11:39:26 +0100

> Removed useless and buggy __exit section in the different 
> ipv6 subsystems. Otherwise they will be called inside an
> init section during rollbacking in case of an error in the
> protocol initialization.
> 
> Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]>

Applied, thanks.

[patch 0/2] rebinding tunnels to interfaces

2007-12-13 Thread mschmidt

These two patches allow rebinding of GRE and SIT tunnels to interfaces using
ip tun change  dev 
A similar change was already done for IPIP tunnels.

Michal

[patch 1/2] ip_gre: Rebinding of GRE tunnels to other interfaces

2007-12-13 Thread mschmidt

This is similar to the change already done for IPIP tunnels.

Once created, a GRE tunnel can't be bound to another device.
To reproduce:

# create a tunnel:
ip tunnel add tunneltest0 mode gre remote 10.0.0.1 dev eth0
# try to change the bound device from eth0 to eth1:
ip tunnel change tunneltest0 dev eth1
# show the result:
ip tunnel show tunneltest0

tunneltest0: gre/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit

Notice the bound device has not changed from eth0 to eth1.

This patch fixes it. When changing the binding, it also recalculates the
MTU according to the new bound device's MTU.

Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>
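
The MTU recalculation mentioned above comes down to subtracting the encapsulation overhead from the MTU of the new underlying device. A minimal sketch of that arithmetic, assuming a 20-byte IPv4 header without options: the SK_GRE_* flag bits and both function names are hypothetical and exist only for this example, since the kernel's GRE_CSUM, GRE_KEY and GRE_SEQ have different values.

```c
#define IPV4_HDR_LEN 20   /* sizeof(struct iphdr), no IP options */
#define GRE_BASE_LEN 4    /* mandatory GRE header */

/* Hypothetical flag bits for this sketch only. */
#define SK_GRE_CSUM 0x1u
#define SK_GRE_KEY  0x2u
#define SK_GRE_SEQ  0x4u

/* Bytes added in front of the payload for the given output flags. */
static inline int gre_overhead(unsigned int o_flags)
{
	int addend = IPV4_HDR_LEN + GRE_BASE_LEN;

	if (o_flags & SK_GRE_CSUM)
		addend += 4;
	if (o_flags & SK_GRE_KEY)
		addend += 4;
	if (o_flags & SK_GRE_SEQ)
		addend += 4;
	return addend;
}

/* Tunnel MTU derived from the MTU of the device it is bound to. */
static inline int gre_tunnel_mtu(int lower_mtu, unsigned int o_flags)
{
	return lower_mtu - gre_overhead(o_flags);
}
```

For a plain tunnel over a 1500-byte link this gives 1476. Rebinding to a device with a different MTU changes the result, which is why the patch recomputes it on every rebind.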

Index: linus-kernel.git/net/ipv4/ip_gre.c
===
--- linus-kernel.git.orig/net/ipv4/ip_gre.c
+++ linus-kernel.git/net/ipv4/ip_gre.c
@@ -896,6 +896,59 @@ tx_error:
return 0;
 }
 
+static void ipgre_tunnel_bind_dev(struct net_device *dev)
+{
+   struct net_device *tdev = NULL;
+   struct ip_tunnel *tunnel;
+   struct iphdr *iph;
+   int hlen = LL_MAX_HEADER;
+   int mtu = ETH_DATA_LEN;
+   int addend = sizeof(struct iphdr) + 4;
+
+   tunnel = netdev_priv(dev);
+   iph = &tunnel->parms.iph;
+
+   /* Guess output device to choose reasonable mtu and hard_header_len */
+
+   if (iph->daddr) {
+   struct flowi fl = { .oif = tunnel->parms.link,
+   .nl_u = { .ip4_u =
+ { .daddr = iph->daddr,
+   .saddr = iph->saddr,
+   .tos = RT_TOS(iph->tos) } },
+   .proto = IPPROTO_GRE };
+   struct rtable *rt;
+   if (!ip_route_output_key(&rt, &fl)) {
+   tdev = rt->u.dst.dev;
+   ip_rt_put(rt);
+   }
+   dev->flags |= IFF_POINTOPOINT;
+   }
+
+   if (!tdev && tunnel->parms.link)
+   tdev = __dev_get_by_index(&init_net, tunnel->parms.link);
+
+   if (tdev) {
+   hlen = tdev->hard_header_len;
+   mtu = tdev->mtu;
+   }
+   dev->iflink = tunnel->parms.link;
+
+   /* Precalculate GRE options length */
+   if (tunnel->parms.o_flags&(GRE_CSUM|GRE_KEY|GRE_SEQ)) {
+   if (tunnel->parms.o_flags&GRE_CSUM)
+   addend += 4;
+   if (tunnel->parms.o_flags&GRE_KEY)
+   addend += 4;
+   if (tunnel->parms.o_flags&GRE_SEQ)
+   addend += 4;
+   }
+   dev->hard_header_len = hlen + addend;
+   dev->mtu = mtu - addend;
+   tunnel->hlen = addend;
+
+}
+
 static int
 ipgre_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 {
@@ -983,6 +1036,11 @@ ipgre_tunnel_ioctl (struct net_device *d
t->parms.iph.ttl = p.iph.ttl;
t->parms.iph.tos = p.iph.tos;
t->parms.iph.frag_off = p.iph.frag_off;
+   if (t->parms.link != p.link) {
+   t->parms.link = p.link;
+   ipgre_tunnel_bind_dev(dev);
+   netdev_state_change(dev);
+   }
}
if (copy_to_user(ifr->ifr_ifru.ifru_data, &t->parms, sizeof(p)))
err = -EFAULT;
@@ -1162,12 +1220,8 @@ static void ipgre_tunnel_setup(struct ne
 
 static int ipgre_tunnel_init(struct net_device *dev)
 {
-   struct net_device *tdev = NULL;
struct ip_tunnel *tunnel;
struct iphdr *iph;
-   int hlen = LL_MAX_HEADER;
-   int mtu = ETH_DATA_LEN;
-   int addend = sizeof(struct iphdr) + 4;
 
tunnel = netdev_priv(dev);
iph = &tunnel->parms.iph;
@@ -1178,23 +1232,9 @@ static int ipgre_tunnel_init(struct net_
memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4);
memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4);
 
-   /* Guess output device to choose reasonable mtu and hard_header_len */
+   ipgre_tunnel_bind_dev(dev);
 
if (iph->daddr) {
-   struct flowi fl = { .oif = tunnel->parms.link,
-   .nl_u = { .ip4_u =
- { .daddr = iph->daddr,
-   .saddr = iph->saddr,
-   .tos = RT_TOS(iph->tos) } },
-   .proto = IPPROTO_GRE };
-   struct rtable *rt;
-   if (!ip_route_output_key(&rt, &fl)) {
-   tdev = rt->u.dst.dev;
-   ip_rt_put(rt);
-   }
-
-   dev->flags |= IFF_POINTOPOINT;
-
 #ifdef CONFIG_NET_IPGRE_BROADCAST

[patch 2/2] ipv6/sit: Rebinding of SIT tunnels to other interfaces

2007-12-13 Thread mschmidt

This is similar to the change already done for IPIP tunnels.

Once created, a SIT tunnel can't be bound to another device.
To reproduce:

# create a tunnel:
ip tunnel add tunneltest0 mode sit remote 10.0.0.1 dev eth0
# try to change the bound device from eth0 to eth1:
ip tunnel change tunneltest0 dev eth1
# show the result:
ip tunnel show tunneltest0

tunneltest0: ipv6/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit

Notice the bound device has not changed from eth0 to eth1.

This patch fixes it. When changing the binding, it also recalculates the
MTU according to the new bound device's MTU.

Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>
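
For SIT the recalculated MTU is the underlying device's MTU minus one IPv4 header, clamped so that it never drops below the IPv6 minimum. A small sketch of that rule: sit_tunnel_mtu() is a hypothetical helper, and the constants assume a 20-byte IPv4 header and the 1280-byte IPv6 minimum MTU.

```c
#define IPV4_HDR_LEN       20    /* sizeof(struct iphdr), no IP options */
#define IPV6_MIN_MTU_BYTES 1280  /* same value as the kernel's IPV6_MIN_MTU */

/* MTU of a SIT tunnel bound to a device with the given MTU. */
static inline int sit_tunnel_mtu(int lower_mtu)
{
	int mtu = lower_mtu - IPV4_HDR_LEN;

	/* IPv6 requires every link to carry at least 1280 bytes. */
	if (mtu < IPV6_MIN_MTU_BYTES)
		mtu = IPV6_MIN_MTU_BYTES;
	return mtu;
}
```

Over a standard 1500-byte ethernet device this yields 1480. The clamp only takes effect when the underlying MTU is below 1300.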

Index: linus-kernel.git/net/ipv6/sit.c
===
--- linus-kernel.git.orig/net/ipv6/sit.c
+++ linus-kernel.git/net/ipv6/sit.c
@@ -669,6 +669,42 @@ tx_error:
return 0;
 }
 
+static void ipip6_tunnel_bind_dev(struct net_device *dev)
+{
+   struct net_device *tdev = NULL;
+   struct ip_tunnel *tunnel;
+   struct iphdr *iph;
+
+   tunnel = netdev_priv(dev);
+   iph = &tunnel->parms.iph;
+
+   if (iph->daddr) {
+   struct flowi fl = { .nl_u = { .ip4_u =
+ { .daddr = iph->daddr,
+   .saddr = iph->saddr,
+   .tos = RT_TOS(iph->tos) } },
+   .oif = tunnel->parms.link,
+   .proto = IPPROTO_IPV6 };
+   struct rtable *rt;
+   if (!ip_route_output_key(&rt, &fl)) {
+   tdev = rt->u.dst.dev;
+   ip_rt_put(rt);
+   }
+   dev->flags |= IFF_POINTOPOINT;
+   }
+
+   if (!tdev && tunnel->parms.link)
+   tdev = __dev_get_by_index(&init_net, tunnel->parms.link);
+
+   if (tdev) {
+   dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
+   dev->mtu = tdev->mtu - sizeof(struct iphdr);
+   if (dev->mtu < IPV6_MIN_MTU)
+   dev->mtu = IPV6_MIN_MTU;
+   }
+   dev->iflink = tunnel->parms.link;
+}
+
 static int
 ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 {
@@ -740,6 +776,11 @@ ipip6_tunnel_ioctl (struct net_device *d
if (cmd == SIOCCHGTUNNEL) {
t->parms.iph.ttl = p.iph.ttl;
t->parms.iph.tos = p.iph.tos;
+   if (t->parms.link != p.link) {
+   t->parms.link = p.link;
+   ipip6_tunnel_bind_dev(dev);
+   netdev_state_change(dev);
+   }
}
if (copy_to_user(ifr->ifr_ifru.ifru_data, &t->parms, sizeof(p)))
err = -EFAULT;
@@ -808,12 +849,9 @@ static void ipip6_tunnel_setup(struct ne
 
 static int ipip6_tunnel_init(struct net_device *dev)
 {
-   struct net_device *tdev = NULL;
struct ip_tunnel *tunnel;
-   struct iphdr *iph;
 
tunnel = netdev_priv(dev);
-   iph = &tunnel->parms.iph;
 
tunnel->dev = dev;
strcpy(tunnel->parms.name, dev->name);
@@ -821,31 +859,7 @@ static int ipip6_tunnel_init(struct net_
memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4);
memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4);
 
-   if (iph->daddr) {
-   struct flowi fl = { .nl_u = { .ip4_u =
- { .daddr = iph->daddr,
-   .saddr = iph->saddr,
-   .tos = RT_TOS(iph->tos) } },
-   .oif = tunnel->parms.link,
-   .proto = IPPROTO_IPV6 };
-   struct rtable *rt;
-   if (!ip_route_output_key(&rt, &fl)) {
-   tdev = rt->u.dst.dev;
-   ip_rt_put(rt);
-   }
-   dev->flags |= IFF_POINTOPOINT;
-   }
-
-   if (!tdev && tunnel->parms.link)
-   tdev = __dev_get_by_index(&init_net, tunnel->parms.link);
-
-   if (tdev) {
-   dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
-   dev->mtu = tdev->mtu - sizeof(struct iphdr);
-   if (dev->mtu < IPV6_MIN_MTU)
-   dev->mtu = IPV6_MIN_MTU;
-   }
-   dev->iflink = tunnel->parms.link;
+   ipip6_tunnel_bind_dev(dev);
 
return 0;
 }


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller

From: Jarek Poplawski <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 14:49:53 +0100

> As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
> I wonder what is the main reason or gain of leaving such a tricky
> exception, instead of letting drivers always decide which is the
> best moment for napi_complete()? (Or maybe even, in such a case, they
> should call some function with this list_move_tail() if it's so
> useful?)

It is the only sane way to synchronize the list manipulations.

There has to be a way for ->poll() to tell net_rx_action() two things:

1) How much work was completed, so we can adjust 'budget'
2) Was the NAPI quota exhausted?  So that we know that
   net_rx_action() still "owns" the polling context and
   thus can do the list manipulation safely.

And these both need to be encoded into one single return value, thus
the adopted convention that "work == weight" means that the device has
not done a NAPI complete.
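
The convention described above can be modelled in a few lines of plain C. This is a sketch under assumptions, not the kernel's net_rx_action(): napi_sketch, sketch_poll() and caller_should_requeue() are hypothetical names, and the list handling is reduced to a yes/no answer.

```c
/* Hypothetical model of one NAPI instance. */
struct napi_sketch {
	int weight;     /* per-poll quota */
	int backlog;    /* packets waiting in the device */
	int completed;  /* set when the driver did its own "complete" */
};

/* Driver side: process up to 'budget' packets and return the work done. */
static inline int sketch_poll(struct napi_sketch *n, int budget)
{
	int work = n->backlog < budget ? n->backlog : budget;

	n->backlog -= work;
	if (work < budget)
		n->completed = 1;	/* ran dry: driver leaves polling mode */
	return work;
}

/*
 * Core side: one return value carries both pieces of information.
 * The work count adjusts the global budget, and work == weight tells
 * the caller that it still owns the polling context and may requeue.
 */
static inline int caller_should_requeue(struct napi_sketch *n, int *budget)
{
	int work = sketch_poll(n, n->weight);

	*budget -= work;
	return work == n->weight;
}
```

A device with exactly 'weight' packets pending is reported as "not complete" once more and completes on the next poll with zero work, which keeps the ownership rule unambiguous.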

[PATCH 2.6.25] [IPV6] Always pass a valid nl_info to inet6_rt_notify.

2007-12-13 Thread Denis V. Lunev

[IPV6] Always pass a valid nl_info to inet6_rt_notify.

This makes the code in inet6_rt_notify more straightforward and lays the
groundwork for namespace passing.

Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>
---
 net/ipv6/ip6_fib.c |3 ++-
 net/ipv6/route.c   |   27 +--
 2 files changed, 15 insertions(+), 15 deletions(-)
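
The idea of the change, shown in isolation: when callers always hand in a valid (possibly zeroed) info structure, the callee no longer needs a NULL branch for the structure itself. The sketch uses hypothetical types, notify_info and msg_hdr, in place of struct nl_info and struct nlmsghdr.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct nlmsghdr and struct nl_info. */
struct msg_hdr {
	uint32_t seq;
};

struct notify_info {
	uint32_t pid;
	const struct msg_hdr *hdr;
};

/*
 * The callee may dereference 'info' unconditionally; only the
 * optional header inside it still needs a check.
 */
static inline uint32_t notify_seq(const struct notify_info *info)
{
	return info->hdr != NULL ? info->hdr->seq : 0;
}
```

A caller with nothing to report passes a zero-initialised structure, for example `struct notify_info info = {0};`, and gets pid 0 and sequence 0, the same values the old code assumed for a NULL pointer.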

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 5fae045..df05c6f 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1315,6 +1315,7 @@ static int fib6_walk(struct fib6_walker_t *w)
 
 static int fib6_clean_node(struct fib6_walker_t *w)
 {
+   struct nl_info info = {};
int res;
struct rt6_info *rt;
struct fib6_cleaner_t *c = container_of(w, struct fib6_cleaner_t, w);
@@ -1323,7 +1324,7 @@ static int fib6_clean_node(struct fib6_walker_t *w)
res = c->func(rt, c->arg);
if (res < 0) {
w->leaf = rt;
-   res = fib6_del(rt, NULL);
+   res = fib6_del(rt, &info);
if (res) {
 #if RT6_DEBUG >= 2
printk(KERN_DEBUG "fib6_clean_node: del failed: rt=%p@%p err=%d\n", rt, rt->rt6i_node, res);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 1530934..02354a7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -604,7 +604,8 @@ static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info)
 
 int ip6_ins_rt(struct rt6_info *rt)
 {
-   return __ip6_ins_rt(rt, NULL);
+   struct nl_info info = {};
+   return __ip6_ins_rt(rt, &info);
 }
 
 static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort, struct in6_addr *daddr,
@@ -1261,7 +1262,8 @@ static int __ip6_del_rt(struct rt6_info *rt, struct nl_info *info)
 
 int ip6_del_rt(struct rt6_info *rt)
 {
-   return __ip6_del_rt(rt, NULL);
+   struct nl_info info = {};
+   return __ip6_del_rt(rt, &info);
 }
 
 static int ip6_route_del(struct fib6_config *cfg)
@@ -2238,29 +2240,26 @@ errout:
 void inet6_rt_notify(int event, struct rt6_info *rt, struct nl_info *info)
 {
struct sk_buff *skb;
-   u32 pid = 0, seq = 0;
-   struct nlmsghdr *nlh = NULL;
-   int err = -ENOBUFS;
-
-   if (info) {
-   pid = info->pid;
-   nlh = info->nlh;
-   if (nlh)
-   seq = nlh->nlmsg_seq;
-   }
+   u32 seq;
+   int err;
+
+   err = -ENOBUFS;
+   seq = info->nlh != NULL ? info->nlh->nlmsg_seq : 0;
 
skb = nlmsg_new(rt6_nlmsg_size(), gfp_any());
if (skb == NULL)
goto errout;
 
-   err = rt6_fill_node(skb, rt, NULL, NULL, 0, event, pid, seq, 0, 0);
+   err = rt6_fill_node(skb, rt, NULL, NULL, 0,
+   event, info->pid, seq, 0, 0);
if (err < 0) {
/* -EMSGSIZE implies BUG in rt6_nlmsg_size() */
WARN_ON(err == -EMSGSIZE);
kfree_skb(skb);
goto errout;
}
-   err = rtnl_notify(skb, &init_net, pid, RTNLGRP_IPV6_ROUTE, nlh, gfp_any());
+   err = rtnl_notify(skb, &init_net, info->pid,
+   RTNLGRP_IPV6_ROUTE, info->nlh, gfp_any());
 errout:
if (err < 0)
rtnl_set_sk_err(&init_net, RTNLGRP_IPV6_ROUTE, err);
-- 
1.5.3.rc5


Re: [PATCH 1/7] [NETDEV]: e1000 Fix possible causing oops of net_rx_action

2007-12-13 Thread David Miller

From: "Joonwoo Park" <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 19:18:56 +0900

> Just blowing netif_running up is not best solution I think, it makes
> ifconfig down hang at least for e1000.

It hangs because the packet receive rate is so high that NAPI
poll never exits.

I think we need a cheap solution to something so obscure and
almost not worth explicitly even coding for.  Really, if you
setup silly situations like that, you get what you asked for.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski

On 12-12-2007 19:41, Kok, Auke wrote:
> David Miller wrote:
>> From: Andrew Gallatin <[EMAIL PROTECTED]>
>> Date: Wed, 12 Dec 2007 12:29:23 -0500
>>
>>> Is the netif_running() check even required?
>> No, it is not.
>>
>> When a device is brought down, one of the first things
>> that happens is that we wait for all pending NAPI polls
>> to complete, then block any new polls from starting.
> 
> I think this was previously (pre-2.6.24) not the case, which is why e1000 et al
> has this check as well and that's exactly what is causing most of the
> net_rx_action oopses in the first place. Without the netif_running() check
> previously the drivers were just unusable with NAPI and prone to many races with
> down (i.e. touching some ethtool ioctl which wants to do a reset while routing
> small packets at high numbers). That's why we added the netif_running() check in
> the first place :)
> 
> There might be more drivers lurking that need this change...
> 

As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
I wonder what is the main reason or gain of leaving such a tricky
exception, instead of letting drivers always decide which is the
best moment for napi_complete()? (Or maybe even, in such a case, they
should call some function with this list_move_tail() if it's so
useful?)

Regards,
Jarek P.

Re: 2.6.24-rc5-mm1

2007-12-13 Thread Pierre Peiffer

Hi,

My config does not link any more:

...
  CHK include/linux/compile.h
  UPD include/linux/compile.h
  CC  init/version.o
  LD  init/built-in.o
  LD  .tmp_vmlinux1
net/built-in.o: In function `xs_udp_data_ready':
/home/peifferp/containers/kernel/linux-2.6.24-rc5-mm1/net/sunrpc/xprtsock.c:842: undefined reference to `udp_stats_in6'
/home/peifferp/containers/kernel/linux-2.6.24-rc5-mm1/net/sunrpc/xprtsock.c:846: undefined reference to `udp_stats_in6'
make[1]: *** [.tmp_vmlinux1] Error 1
make: *** [sub-make] Error 2

After a first look, udp_stats_in6 seems to be defined in ipv6 (file
net/ipv6/udp.c) but I have

CONFIG_IPV6=m
and
CONFIG_SUNRPC=y

So, SUNRPC uses something defined in a module in my case ?

... looking more, this dependency seems to have been introduced by the patch
[UDP]: Restore missing inDatagrams increments
( http://thread.gmane.org/gmane.linux.network/79716/focus=79831 )

(I cc netdev)

I don't know what is the right way to fix this ... ?

P.
Andrew Morton wrote:
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc5/2.6.24-rc5-mm1/
> 
> - If something goes wrong with a PCI device's probing or initialisation, try
>   reverting pci-disable-decoding-during-sizing-of-bars.patch.
> 
> - git-sched was dropped due to breaking suspend-to-RAM.
> 
> - git-block has been restored after having had a few problems
> 
> - git-newsetup.patch was dropped due to conflicts with git-x86
> 
> - git-perfmon.patch is still dropped for the same reason
> 
> - git-kgdb.patch is still dropped for the same reason
> 
> - Please do try to cc the correct developer and mailing list when
>   reporting problems - I'm just buried in bugs over here.
> 
> 
> 
> Boilerplate:
> 
> - See the `hot-fixes' directory for any important updates to this patchset.
> 
> - To fetch an -mm tree using git, use (for example)
> 
>   git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
>   git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1
> 
> - -mm kernel commit activity can be reviewed by subscribing to the
>   mm-commits mailing list.
> 
> echo "subscribe mm-commits" | mail [EMAIL PROTECTED]
> 
> - If you hit a bug in -mm and it is not obvious which patch caused it, it is
>   most valuable if you can perform a bisection search to identify which patch
>   introduced the bug.  Instructions for this process are at
> 
> http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
> 
>   But beware that this process takes some time (around ten rebuilds and
>   reboots), so consider reporting the bug first and if we cannot immediately
>   identify the faulty patch, then perform the bisection search.
> 
> - When reporting bugs, please try to Cc: the relevant maintainer and mailing
>   list on any email.
> 
> - When reporting bugs in this kernel via email, please also rewrite the
>   email Subject: in some manner to reflect the nature of the bug.  Some
>   developers filter by Subject: when looking for messages to read.
> 
> - Occasional snapshots of the -mm lineup are uploaded to
>   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
>   the mm-commits list.  These probably are at least compilable.
> 
> - More-than-daily -mm snapshots may be found at
>   http://userweb.kernel.org/~akpm/mmotm/.  These are almost certainly not
>   compileable.
> 
> 
> 
> Changes since 2.6.24-rc4-mm1:
> 
> 
>  origin.patch
>  git-acpi.patch
>  git-alsa.patch
>  git-agpgart.patch
>  git-arm.patch
>  git-arm-master.patch
>  git-avr32.patch
>  git-cpufreq.patch
>  git-powerpc.patch
>  git-drm.patch
>  git-dvb.patch
>  git-hwmon.patch
>  git-gfs2-nmw.patch
>  git-hid.patch
>  git-hrt.patch
>  git-ieee1394.patch
>  git-infiniband.patch
>  git-input.patch
>  git-jfs.patch
>  git-kbuild.patch
>  git-kvm.patch
>  git-lblnet.patch
>  git-leds.patch
>  git-libata-all.patch
>  git-md-accel.patch
>  git-mips.patch
>  git-mmc.patch
>  git-mtd.patch
>  git-ubi.patch
>  git-net.patch
>  git-netdev-all.patch
>  git-battery.patch
>  git-nfs.patch
>  git-nfsd.patch
>  git-ocfs2.patch
>  git-s390.patch
>  git-sh.patch
>  git-scsi-misc.patch
>  git-scsi-rc-fixes.patch
>  git-block.patch
>  git-unionfs.patch
>  git-v9fs.patch
>  git-watchdog.patch
>  git-wireless.patch
>  git-ipwireless_cs.patch
>  git-x86.patch
>  git-xfs.patch
>  git-cryptodev.patch
>  git-xtensa.patch
> 
>  git trees
> 
> -aio-only-account-i-o-wait-time-in-read_events-if-there-are-active-requests.patch
> -fix-cloneclone_newpid.patch
> -rtc-assure-proper-memory-ordering-with-respect-to-rtc_dev_busy-flag.patch
> -ufs-fix-nexstep-dir-block-size.patch
> -ufs-fix-nexstep-dir-block-size-checkpatch-fixes.patch
> -aoe-properly-initialise-the-request_queues-backing_dev_info.patch
> -mm-backing-devc-fix-percpu_counter_destroy-call-bug-in-bdi_init.patch
> -add-export_symbolksize.patch
> -spi-use-mutex-not-semaphore.patch
> -spi-at25-driver-is-for-eeprom-not-flash.patch
> -

Re: [RFC] net: napi fix

2007-12-13 Thread David Miller

From: Andrew Gallatin <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 09:13:54 -0500

> If the netif_running() check is indeed required to make a device break
> out of napi polling and respond to an ifconfig down, then I think the
> netif_running() check should be moved up into net_rx_action() to avoid
> potential for driver complexity and bugs like the ones you found.

That, or something like it, definitely sounds reasonable and much
better than putting the check into every driver :-)

Re: [RFC] net: napi fix

2007-12-13 Thread Andrew Gallatin


Joonwoo Park wrote:
> 2007/12/13, Kok, Auke <[EMAIL PROTECTED]>:
>> David Miller wrote:
>>> From: Andrew Gallatin <[EMAIL PROTECTED]>
>>> Date: Wed, 12 Dec 2007 12:29:23 -0500
>>>
>>>> Is the netif_running() check even required?
>>> No, it is not.
>>>
>>> When a device is brought down, one of the first things
>>> that happens is that we wait for all pending NAPI polls
>>> to complete, then block any new polls from starting.
>> I think this was previously (pre-2.6.24) not the case, which is why e1000 et al
>> has this check as well and that's exactly what is causing most of the
>> net_rx_action oopses in the first place. Without the netif_running() check
>> previously the drivers were just unusable with NAPI and prone to many races with
>> down (i.e. touching some ethtool ioctl which wants to do a reset while routing
>> small packets at high numbers). that's why we added the netif_running() check in
>> the first place :)
>>
>> There might be more drivers lurking that need this change...
>>
>> Auke
>>
>
> Also in my case, without netif_running() check, I cannot do ifconfig down.
> It stucked if packet generator was sending packets.

If the netif_running() check is indeed required to make a device break
out of napi polling and respond to an ifconfig down, then I think the
netif_running() check should be moved up into net_rx_action() to avoid
potential for driver complexity and bugs like the ones you found.

Drew
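
The suggestion above, doing the "is the device still running?" test once
in the core poll loop instead of in every driver, can be sketched as a
userspace toy. This is only an illustration under stated assumptions:
toy_dev, toy_driver_poll() and toy_core_poll() are hypothetical names, the
"running" flag merely stands in for netif_running(), and the real
net_rx_action() would also have to take the device off its poll list
safely, which is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a network device; not a kernel structure. */
struct toy_dev {
    bool running;   /* stands in for netif_running() */
    int  pending;   /* packets waiting to be processed */
    int  polls;     /* how many times the driver poll routine ran */
};

/* Driver side: process at most 'budget' packets, no running check here. */
static int toy_driver_poll(struct toy_dev *d, int budget)
{
    int work = d->pending < budget ? d->pending : budget;

    d->pending -= work;
    d->polls++;
    return work;
}

/*
 * Core side: the running check lives here, once, for all drivers.
 * Returns true if the device should stay on the poll list.
 */
static bool toy_core_poll(struct toy_dev *d, int weight)
{
    if (!d->running)
        return false;   /* going down: stop polling, driver not called */

    return toy_driver_poll(d, weight) == weight;
}
```

With this shape a device that is being brought down stops being polled even
while packets keep arriving, and no individual driver has to remember the
check.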

[PATCH 2.6.25] [IPV4] Reduce whitespaces in fib_rules.h.

2007-12-13 Thread Denis V. Lunev

There are too many spaces between type and function name in the declaration
of fib rules manipulation routines. Eat them and save a couple of lines.

Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>
---
 include/net/fib_rules.h |   16 +++-
 1 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 2364db1..d20db25 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -101,14 +101,12 @@ static inline u32 frh_get_table(struct fib_rule_hdr *frh, struct nlattr **nla)
return frh->table;
 }
 
-extern int fib_rules_register(struct fib_rules_ops *);
-extern int fib_rules_unregister(struct fib_rules_ops *);
-extern void fib_rules_cleanup_ops(struct fib_rules_ops *);
+extern int fib_rules_register(struct fib_rules_ops *);
+extern int fib_rules_unregister(struct fib_rules_ops *);
+extern void fib_rules_cleanup_ops(struct fib_rules_ops *);
 
-extern int fib_rules_lookup(struct fib_rules_ops *,
-struct flowi *, int flags,
-struct fib_lookup_arg *);
-extern int fib_default_rule_add(struct fib_rules_ops *,
-u32 pref, u32 table,
-u32 flags);
+extern int fib_rules_lookup(struct fib_rules_ops *, struct flowi *, int flags,
+   struct fib_lookup_arg *);
+extern int fib_default_rule_add(struct fib_rules_ops *, u32 pref, u32 table,
+   u32 flags);
 #endif
-- 
1.5.3.rc5


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski

On Thu, Dec 13, 2007 at 05:50:13AM -0800, David Miller wrote:
> From: Jarek Poplawski <[EMAIL PROTECTED]>
> Date: Thu, 13 Dec 2007 14:49:53 +0100
> 
> > As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
> > I wonder what is the main reason or gain of leaving such a tricky
> > exception, instead of letting drivers always decide which is the
> > best moment for napi_complete()? (Or maybe even, in such a case, they
> > should call some function with this list_move_tail() if it's so
> > useful?)
> 
> It is the only sane way to synchronize the list manipulations.
> 
> There has to be a way for ->poll() to tell net_rx_action() two things:
> 
> 1) How much work was completed, so we can adjust 'budget'
> 2) Was the NAPI quota exhausted?  So that we know that
>    net_rx_action() still "owns" the polling context and
>    thus can do the list manipulation safely.
> 
> And these both need to be encoded into one single return value, thus
> the adopted convention that "work == weight" means that the device has
> not done a NAPI complete.

Thanks! So, I've to rethink this all...

Jarek P.
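
The return-value convention quoted above ("work == weight" means the device
has not done a NAPI complete) can be sketched as a small userspace toy. It
is not kernel code: toy_napi, toy_poll() and toy_rx_action_once() are
hypothetical names, and the "scheduled" flag only models whether
napi_complete() has been called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a NAPI context; not struct napi_struct. */
struct toy_napi {
    int  weight;     /* per-poll quota */
    int  pending;    /* packets waiting in the device ring */
    bool scheduled;  /* true until the driver "completes" */
};

/*
 * Driver side: process up to 'budget' packets. Only when less than the
 * full budget was used does the driver complete on its own.
 */
static int toy_poll(struct toy_napi *n, int budget)
{
    int work = n->pending < budget ? n->pending : budget;

    n->pending -= work;
    if (work < budget)
        n->scheduled = false;   /* models napi_complete() */
    return work;
}

/*
 * Core side: one poll pass. The single return value of toy_poll() both
 * adjusts the remaining budget and tells the core whether it still owns
 * the polling context (work == weight), in which case it may requeue.
 */
static int toy_rx_action_once(struct toy_napi *n, int budget, bool *requeue)
{
    int work = toy_poll(n, n->weight);

    assert(work <= n->weight);
    *requeue = (work == n->weight);
    return budget - work;
}
```

In this model the core touches the poll list only when the quota was
exhausted, and the driver touches it only when it was not, so the two sides
never manipulate the list for the same pass.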

[DCCP] [Announce]: Ack Vector Implementation Notes

2007-12-13 Thread Gerrit Renker

I've been working on making DCCP Ack Vectors more robust, dealing more
gracefully with buffer overflow, and fixing two cases which will lead to
corrupted buffer state.

The encountered problems and implementation strategy are documented on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ack_vectors/

I'd be glad for feedback, in particular if there are any errors or points
which may have been overlooked.

Re: [PATCH 2.6.25] [IPV4] Reduce whitespaces in fib_rules.h.

2007-12-13 Thread Denis V. Lunev

David Miller wrote:
> From: "Denis V. Lunev" <[EMAIL PROTECTED]>
> Date: Thu, 13 Dec 2007 17:18:42 +0300
> 
>> There are too many spaces between type and function name in the declaration
>> of fib rules manipulation routines. Eat them and save a couple of lines.
>>
>> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>
> 
> This is just noise and serves nothing other than to invite potential
> patch conflicts which makes development harder.
> 
> If you happened to be changing these for other reasons, I'd say OK,
> but not like this.
> 
I am going to add a parameter to these calls, and the lines will be too long
after that. I'd like to separate the meaningful changes from the (you are
perfectly correct) cosmetic ones :(

Could you still apply it, or will I need to send you the fully functional set
including this?

Re: [PATCH 2.6.25] [IPV4] Reduce whitespaces in fib_rules.h.

2007-12-13 Thread David Miller

From: "Denis V. Lunev" <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 18:01:32 +0300

> Could you still apply it, or will I need to send you the fully functional set
> including this?

Please combine the changes so that when you change the args
you fixup the whitespace as well.

Re: [PATCH 2.6.25] [IPV4] Reduce whitespaces in fib_rules.h.

2007-12-13 Thread David Miller

From: "Denis V. Lunev" <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 17:18:42 +0300

> There are too many spaces between type and function name in the declaration
> of fib rules manipulation routines. Eat them and save a couple of lines.
> 
> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>

This is just noise and serves nothing other than to invite potential
patch conflicts which makes development harder.

If you happened to be changing these for other reasons, I'd say OK,
but not like this.

Re: [Devel] [PATCH 2.6.25] [IPV4] Reduce whitespaces in fib_rules.h.

2007-12-13 Thread David Miller

From: Alexey Dobriyan <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 17:47:07 +0300

> On Thu, Dec 13, 2007 at 05:18:42PM +0300, Denis V. Lunev wrote:
> > There are too many spaces between type and function name in the declaration
> > of fib rules manipulation routines. Eat them and save a couple of lines.
> 
> If this patch is going in, it would be nice to get rid of "extern" as
> well.

The convention in the networking headers is to use extern, and
this is pretty consistently done across the board.

If we are going to do this, which I personally see no reason
for, we should do it across the whole networking.

Consistency is much more important than whatever reason you
could come up with to get rid of the 'extern'.
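
For readers wondering what is at stake technically: in C a function
declaration has external linkage whether or not it carries "extern", so the
keyword is purely a style choice on prototypes. A minimal sketch, where
toy_register() is a hypothetical function and not one from fib_rules.h:

```c
#include <assert.h>

/* Networking-header style: prototype with an explicit extern. */
extern int toy_register(int id);

/* The same prototype without extern; both declarations are compatible. */
int toy_register(int id);

/* A single definition satisfies both declarations above. */
int toy_register(int id)
{
    return id + 1;
}
```

Because the two forms declare the same thing, the choice between them only
affects consistency across headers, which is the point being argued here.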

Re: 2.6.24-rc5-mm1

2007-12-13 Thread Benjamin Thery

The problem comes from the new macro UDPX_INC_STATS_BH introduced
by Herbert. It was a nice addition for incrementing the correct
UDP MIB depending on the socket family, but unfortunately using
this macro from built-in kernel code (I mean code not compiled
as a module) requires IPv6 to be built into the kernel as well
(CONFIG_IPV6=y) so that udp_stats_in6 is defined at link time.
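
A rough userspace sketch of why the macro creates a link-time dependency
follows. The shape of the macro is an assumption made for illustration
(the real UDPX_INC_STATS_BH is not shown in this thread), and every toy_
or TOY_ name is hypothetical. The point is that the macro body names both
counters, so every caller references the IPv6 one even if it only ever
handles IPv4 sockets at run time.

```c
#include <assert.h>

#define TOY_AF_INET   2
#define TOY_AF_INET6  10

struct toy_udp_mib {
    unsigned long in_datagrams;
};

/* In the report above, the IPv6 counter lives in the ipv6 module. */
static struct toy_udp_mib toy_udp_stats_in4;
static struct toy_udp_mib toy_udp_stats_in6;

/* Assumed shape: pick the MIB by socket family at run time. */
#define TOY_UDPX_INC_STATS(family)                    \
    do {                                              \
        if ((family) == TOY_AF_INET)                  \
            toy_udp_stats_in4.in_datagrams++;         \
        else                                          \
            toy_udp_stats_in6.in_datagrams++;         \
    } while (0)

/* Any caller of the macro references both symbols at link time. */
static void toy_data_ready(int family)
{
    TOY_UDPX_INC_STATS(family);
}
```

If the second counter were defined only in a separately loaded module, a
built-in caller such as this one would fail to link, which matches the
undefined reference reported for xs_udp_data_ready.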

Benjamin

Pierre Peiffer wrote:
> Hi,
> 
>   My config does not link any more:
> 
> ...
>   CHK include/linux/compile.h
>   UPD include/linux/compile.h
>   CC  init/version.o
>   LD  init/built-in.o
>   LD  .tmp_vmlinux1
> net/built-in.o: In function `xs_udp_data_ready':
> /home/peifferp/containers/kernel/linux-2.6.24-rc5-mm1/net/sunrpc/xprtsock.c:842: undefined reference to `udp_stats_in6'
> /home/peifferp/containers/kernel/linux-2.6.24-rc5-mm1/net/sunrpc/xprtsock.c:846: undefined reference to `udp_stats_in6'
> make[1]: *** [.tmp_vmlinux1] Error 1
> make: *** [sub-make] Error 2
> 
> After a first look, udp_stats_in6 seems to be defined in ipv6 (file
> net/ipv6/udp.c) but I have
> 
> CONFIG_IPV6=m
> and
> CONFIG_SUNRPC=y
> 
> So, SUNRPC uses something defined in a module in my case ?
> 
> ... looking more, this dependency seems to have been introduced by the patch
> [UDP]: Restore missing inDatagrams increments
> ( http://thread.gmane.org/gmane.linux.network/79716/focus=79831 )
> 
> (I cc netdev)
> 
> I don't know what is the right way to fix this ... ?
> 
> P.

[PATCH 02/12] [DCCP]: Shift the retransmit timer for active-close into output.c

2007-12-13 Thread Arnaldo Carvalho de Melo

From: Gerrit Renker <[EMAIL PROTECTED]>

When performing active close, RFC 4340, 8.3 requires retransmitting the
Close/CloseReq with a backoff retransmit timer that initially starts at 2 RTTs.

This patch shifts the existing code for active-close retransmit timer
into output.c, so that the retransmit timer is started when the first
Close/CloseReq is sent. Previously, the timer was started when, after
releasing the socket in dccp_close(), the actively-closing side had not yet
reached the CLOSED/TIMEWAIT state.

The patch further reduces the initial timeout from 3 seconds to the required
2 RTTs, where - in absence of a known RTT - the fallback value specified in
RFC 4340, 3.4 is used.
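
As a worked illustration of the "2 RTTs with fallback" rule described in
this changelog, here is a small sketch. The helper names are hypothetical,
the 200 ms fallback is the RFC 4340, 3.4 value mentioned in the patch
comment, and the doubling step and the 64 second cap are assumptions made
for the example. Note that the diff below does not compute the timeout this
way: it arms the timer with DCCP_TIMEOUT_INIT and DCCP_RTO_MAX.

```c
#include <assert.h>

#define TOY_FALLBACK_RTT_MS  200u    /* fallback RTT from RFC 4340, 3.4 */
#define TOY_RTO_MAX_MS       64000u  /* assumed upper bound for the example */

/* Initial Close/CloseReq retransmit timeout: 2 RTTs, or 2 fallback RTTs. */
static unsigned int toy_close_initial_timeout_ms(unsigned int rtt_ms)
{
    if (rtt_ms == 0)                 /* 0 means "no RTT sample known" */
        rtt_ms = TOY_FALLBACK_RTT_MS;
    return 2 * rtt_ms;
}

/* Assumed backoff step: double the timeout, capped at the maximum. */
static unsigned int toy_close_backoff_ms(unsigned int timeout_ms)
{
    unsigned int next = timeout_ms * 2;

    return next > TOY_RTO_MAX_MS ? TOY_RTO_MAX_MS : next;
}
```

With no RTT sample the first retransmission would fire after 400 ms under
these assumptions, then after 800 ms, 1600 ms and so on up to the cap.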

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/output.c |   13 -
 net/dccp/proto.c  |   18 --
 2 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/net/dccp/output.c b/net/dccp/output.c
index 7caa7f5..e97584a 100644
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -574,7 +574,18 @@ void dccp_send_close(struct sock *sk, const int active)
dccp_write_xmit(sk, 1);
dccp_skb_entail(sk, skb);
dccp_transmit_skb(sk, skb_clone(skb, prio));
-   /* FIXME do we need a retransmit timer here? */
+   /*
+* Retransmission timer for active-close: RFC 4340, 8.3 requires
+* to retransmit the Close/CloseReq until the CLOSING/CLOSEREQ
+* state can be left. The initial timeout is 2 RTTs.
+* Since RTT measurement is done by the CCIDs, there is no easy
+* way to get an RTT sample. The fallback RTT from RFC 4340, 3.4
+* is too low (200ms); we use a high value to avoid unnecessary
+* retransmissions when the link RTT is > 0.2 seconds.
+* FIXME: Let main module sample RTTs and use that instead.
+*/
+   inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
+ DCCP_TIMEOUT_INIT, DCCP_RTO_MAX);
} else
dccp_transmit_skb(sk, skb);
 }
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 60f40ec..8a73c8f 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -996,24 +996,6 @@ adjudge_to_death:
if (state != DCCP_CLOSED && sk->sk_state == DCCP_CLOSED)
goto out;
 
-   /*
-* The last release_sock may have processed the CLOSE or RESET
-* packet moving sock to CLOSED state, if not we have to fire
-* the CLOSE/CLOSEREQ retransmission timer, see "8.3. Termination"
-* in draft-ietf-dccp-spec-11. -acme
-*/
-   if (sk->sk_state == DCCP_CLOSING) {
-   /* FIXME: should start at 2 * RTT */
-   /* Timer for repeating the CLOSE/CLOSEREQ until an answer. */
-   inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
- inet_csk(sk)->icsk_rto,
- DCCP_RTO_MAX);
-#if 0
-   /* Yeah, we should use sk->sk_prot->orphan_count, etc */
-   dccp_set_state(sk, DCCP_CLOSED);
-#endif
-   }
-
if (sk->sk_state == DCCP_CLOSED)
inet_csk_destroy_sock(sk);
 
-- 
1.5.3.4


[PATCHES 0/12]: DCCP patches for 2.6.25

2007-12-13 Thread Arnaldo Carvalho de Melo

Hi David,

Please consider pulling from:

master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25

Best Regards,

- Arnaldo

 Documentation/networking/dccp.txt |6 +
 include/linux/dccp.h  |   24 ++-
 net/dccp/dccp.h   |   10 ++-
 net/dccp/feat.c   |   29 
 net/dccp/feat.h   |   26 ---
 net/dccp/input.c  |   28 +---
 net/dccp/ipv4.c   |8 +-
 net/dccp/ipv6.c   |8 +-
 net/dccp/minisocks.c  |   33 +
 net/dccp/options.c|  126 +-
 net/dccp/output.c |   21 +-
 net/dccp/proto.c  |   34 +++---
 12 files changed, 208 insertions(+), 145 deletions(-)

[PATCH 06/12] [DCCP]: Allow to parse options on Request Sockets

2007-12-13 Thread Arnaldo Carvalho de Melo

From: Gerrit Renker <[EMAIL PROTECTED]>

The option parsing code currently only parses on full sk's. This causes a
problem for options sent during the initial handshake (in particular
timestamps and feature-negotiation options). Therefore, this patch extends
the option parsing code with an additional argument for request_socks: if it
is non-NULL, options are parsed on the request socket, otherwise the normal
path (parsing on the sk) is used.

Subsequent patches, which implement feature negotiation during connection
setup, make use of this facility.
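
The NULL/non-NULL dispatch described above can be pictured with a reduced
sketch. The structures and toy_store_timestamp() are hypothetical and much
simpler than struct sock, struct dccp_request_sock and dccp_parse_options();
only the selection of the target object is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the two socket types. */
struct toy_sock {
    unsigned int timestamp_echo;
};

struct toy_request_sock {
    unsigned int timestamp_echo;
};

/*
 * Store a received timestamp. If 'dreq' is non-NULL the value goes to the
 * request socket (handshake in progress), otherwise to the full socket.
 * Returns 0 on success, mirroring the "non-zero means discard" convention
 * visible in the callers in the diff below.
 */
static int toy_store_timestamp(struct toy_sock *sk,
                               struct toy_request_sock *dreq,
                               unsigned int tstamp)
{
    if (dreq != NULL)
        dreq->timestamp_echo = tstamp;
    else
        sk->timestamp_echo = tstamp;
    return 0;
}
```

Established-state callers would pass NULL for the request socket, while the
connection-request paths pass the freshly initialised request socket.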

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 include/linux/dccp.h |5 +++--
 net/dccp/input.c |6 +++---
 net/dccp/ipv4.c  |8 
 net/dccp/ipv6.c  |8 
 net/dccp/options.c   |   34 +++---
 5 files changed, 37 insertions(+), 24 deletions(-)

diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index c676021..7214031 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -407,8 +407,6 @@ struct dccp_opt_pend {
 
 extern void dccp_minisock_init(struct dccp_minisock *dmsk);
 
-extern int dccp_parse_options(struct sock *sk, struct sk_buff *skb);
-
 struct dccp_request_sock {
struct inet_request_sock dreq_inet_rsk;
__u64dreq_iss;
@@ -423,6 +421,9 @@ static inline struct dccp_request_sock *dccp_rsk(const struct request_sock *req)
 
 extern struct inet_timewait_death_row dccp_death_row;
 
+extern int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
+ struct sk_buff *skb);
+
 struct dccp_options_received {
u32 dccpor_ndp; /* only 24 bits */
u32 dccpor_timestamp;
diff --git a/net/dccp/input.c b/net/dccp/input.c
index dacd4fd..08392ed 100644
--- a/net/dccp/input.c
+++ b/net/dccp/input.c
@@ -369,7 +369,7 @@ int dccp_rcv_established(struct sock *sk, struct sk_buff *skb,
if (dccp_check_seqno(sk, skb))
goto discard;
 
-   if (dccp_parse_options(sk, skb))
+   if (dccp_parse_options(sk, NULL, skb))
goto discard;
 
if (DCCP_SKB_CB(skb)->dccpd_ack_seq != DCCP_PKT_WITHOUT_ACK_SEQ)
@@ -427,7 +427,7 @@ static int dccp_rcv_request_sent_state_process(struct sock *sk,
goto out_invalid_packet;
}
 
-   if (dccp_parse_options(sk, skb))
+   if (dccp_parse_options(sk, NULL, skb))
goto out_invalid_packet;
 
/* Obtain usec RTT sample from SYN exchange (used by CCID 3) */
@@ -609,7 +609,7 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
/*
 * Step 8: Process options and mark acknowledgeable
 */
-   if (dccp_parse_options(sk, skb))
+   if (dccp_parse_options(sk, NULL, skb))
goto discard;
 
if (dcb->dccpd_ack_seq != DCCP_PKT_WITHOUT_ACK_SEQ)
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index db17b83..02fc91c 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -600,11 +600,12 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
if (req == NULL)
goto drop;
 
-   if (dccp_parse_options(sk, skb))
-   goto drop_and_free;
-
dccp_reqsk_init(req, skb);
 
+   dreq = dccp_rsk(req);
+   if (dccp_parse_options(sk, dreq, skb))
+   goto drop_and_free;
+
if (security_inet_conn_request(sk, skb, req))
goto drop_and_free;
 
@@ -621,7 +622,6 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 * In fact we defer setting S.GSR, S.SWL, S.SWH to
 * dccp_create_openreq_child.
 */
-   dreq = dccp_rsk(req);
dreq->dreq_isr = dcb->dccpd_seq;
dreq->dreq_iss = dccp_v4_init_sequence(skb);
dreq->dreq_service = service;
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index a08e2cb..f42b75c 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -415,11 +415,12 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
if (req == NULL)
goto drop;
 
-   if (dccp_parse_options(sk, skb))
-   goto drop_and_free;
-
dccp_reqsk_init(req, skb);
 
+   dreq = dccp_rsk(req);
+   if (dccp_parse_options(sk, dreq, skb))
+   goto drop_and_free;
+
if (security_inet_conn_request(sk, skb, req))
goto drop_and_free;
 
@@ -449,7 +450,6 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 *   In fact we defer setting S.GSR, S.SWL, S.SWH to
 *   dccp_create_openreq_child.
 */
-   dreq = dccp_rsk(req);
dreq->dreq_isr = dcb->dccpd_seq;
dreq->dreq_iss = dccp_v6_init_sequence(skb);
dreq->dreq_servi

[PATCH 09/12] [DCCP]: Support inserting options during the 3-way handshake

2007-12-13 Thread Arnaldo Carvalho de Melo

From: Gerrit Renker <[EMAIL PROTECTED]>

This provides a separate routine to insert options during the initial
handshake. The main purpose is to conduct feature negotiation; for the
moment the only user is the timestamp echo needed for the (CCID3) handshake
RTT sample.

Padding of options has been put into a small separate routine, to be shared
between the two functions. This could also be used as a generic routine to
finish inserting options.
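
The padding rule is simple arithmetic: round the accumulated option length
up to the next multiple of 4. A standalone sketch of just that computation,
where toy_option_padding() is a hypothetical helper that mirrors the modulo
logic in the diff below without touching an skb:

```c
#include <assert.h>

/* Number of zero bytes needed to make opt_len a multiple of 4. */
static int toy_option_padding(int opt_len)
{
    int rem = opt_len % 4;

    return rem != 0 ? 4 - rem : 0;
}
```

For example, 5 bytes of options need 3 bytes of padding, 10 bytes need 2,
and lengths of 0 or 8 need none.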

Also removed an `XXX' comment since its content was obvious.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/dccp.h|1 +
 net/dccp/options.c |   32 ++--
 net/dccp/output.c  |2 +-
 3 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index 3af3320..b138e20 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -428,6 +428,7 @@ static inline int dccp_ack_pending(const struct sock *sk)
 }
 
 extern int dccp_insert_options(struct sock *sk, struct sk_buff *skb);
+extern int dccp_insert_options_rsk(struct dccp_request_sock*, struct sk_buff*);
 extern int dccp_insert_option_elapsed_time(struct sock *sk,
struct sk_buff *skb,
u32 elapsed_time);
diff --git a/net/dccp/options.c b/net/dccp/options.c
index 0c996d8..bedb5da 100644
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -537,6 +537,18 @@ static int dccp_insert_options_feat(struct sock *sk, struct sk_buff *skb)
return 0;
 }
 
+/* The length of all options needs to be a multiple of 4 (5.8) */
+static void dccp_insert_option_padding(struct sk_buff *skb)
+{
+   int padding = DCCP_SKB_CB(skb)->dccpd_opt_len % 4;
+
+   if (padding != 0) {
+   padding = 4 - padding;
+   memset(skb_push(skb, padding), 0, padding);
+   DCCP_SKB_CB(skb)->dccpd_opt_len += padding;
+   }
+}
+
 int dccp_insert_options(struct sock *sk, struct sk_buff *skb)
 {
struct dccp_sock *dp = dccp_sk(sk);
@@ -580,18 +592,18 @@ int dccp_insert_options(struct sock *sk, struct sk_buff *skb)
dccp_insert_option_timestamp_echo(dp, NULL, skb))
return -1;
 
-   /* XXX: insert other options when appropriate */
+   dccp_insert_option_padding(skb);
+   return 0;
+}
 
-   if (DCCP_SKB_CB(skb)->dccpd_opt_len != 0) {
-   /* The length of all options has to be a multiple of 4 */
-   int padding = DCCP_SKB_CB(skb)->dccpd_opt_len % 4;
+int dccp_insert_options_rsk(struct dccp_request_sock *dreq, struct sk_buff *skb)
+{
+   DCCP_SKB_CB(skb)->dccpd_opt_len = 0;
 
-   if (padding != 0) {
-   padding = 4 - padding;
-   memset(skb_push(skb, padding), 0, padding);
-   DCCP_SKB_CB(skb)->dccpd_opt_len += padding;
-   }
-   }
+   if (dreq->dreq_timestamp_echo != 0 &&
+   dccp_insert_option_timestamp_echo(NULL, dreq, skb))
+   return -1;
 
+   dccp_insert_option_padding(skb);
return 0;
 }
diff --git a/net/dccp/output.c b/net/dccp/output.c
index b2e1791..5589a5e 100644
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -303,7 +303,7 @@ struct sk_buff *dccp_make_response(struct sock *sk, struct dst_entry *dst,
DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_RESPONSE;
DCCP_SKB_CB(skb)->dccpd_seq  = dreq->dreq_iss;
 
-   if (dccp_insert_options(sk, skb)) {
+   if (dccp_insert_options_rsk(dreq, skb)) {
kfree_skb(skb);
return NULL;
}
-- 
1.5.3.4


Re: [Devel] [PATCH 2.6.25] [IPV4] Reduce whitespaces in fib_rules.h.

2007-12-13 Thread Alexey Dobriyan

On Thu, Dec 13, 2007 at 05:18:42PM +0300, Denis V. Lunev wrote:
> There are too many spaces between type and function name in the declaration
> of fib rules manipulation routines. Eat them and save a couple of lines.

If this patch is going in, it would be nice to get rid of "extern" as
well.

Alexey, who once removed all externs from prototypes
and got 4 seconds compilation speedup.

> --- a/include/net/fib_rules.h
> +++ b/include/net/fib_rules.h
> @@ -101,14 +101,12 @@ static inline u32 frh_get_table(struct fib_rule_hdr *frh, struct nlattr **nla)
>   return frh->table;
>  }
>  
> -extern int   fib_rules_register(struct fib_rules_ops *);
> -extern int   fib_rules_unregister(struct fib_rules_ops *);
> -extern void fib_rules_cleanup_ops(struct fib_rules_ops *);
> +extern int fib_rules_register(struct fib_rules_ops *);
> +extern int fib_rules_unregister(struct fib_rules_ops *);
> +extern void fib_rules_cleanup_ops(struct fib_rules_ops *);
>  
> -extern int   fib_rules_lookup(struct fib_rules_ops *,
> -  struct flowi *, int flags,
> -  struct fib_lookup_arg *);
> -extern int   fib_default_rule_add(struct fib_rules_ops *,
> -  u32 pref, u32 table,
> -  u32 flags);
> +extern int fib_rules_lookup(struct fib_rules_ops *, struct flowi *, int flags,
> + struct fib_lookup_arg *);
> +extern int fib_default_rule_add(struct fib_rules_ops *, u32 pref, u32 table,
> + u32 flags);
>  #endif

[PATCH 10/12] [DCCP]: Remove unused and redundant validation functions

2007-12-13 Thread Arnaldo Carvalho de Melo

From: Gerrit Renker <[EMAIL PROTECTED]>

This removes two inlines which were both called in a single function only:

 1) dccp_feat_change() is always called with either DCCPO_CHANGE_L or
    DCCPO_CHANGE_R as argument
    * from dccp_set_socktopt_change() via do_dccp_setsockopt() with
      DCCP_SOCKOPT_CHANGE_R/L
    * from __dccp_feat_init() via dccp_feat_init() also with
      DCCP_SOCKOPT_CHANGE_R/L.

    Hence the dccp_feat_is_valid_type() is completely unnecessary and always
    returns true.

 2) Due to (1), the length test reduces to 'len >= 4', which in turn makes
dccp_feat_is_valid_length() unnecessary.

Furthermore, the inline function dccp_feat_is_reserved() was unfolded,
since it is only called in a single place.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/feat.c |   12 
 net/dccp/feat.h |   26 --
 2 files changed, 4 insertions(+), 34 deletions(-)

diff --git a/net/dccp/feat.c b/net/dccp/feat.c
index 5ebdd86..084744e 100644
--- a/net/dccp/feat.c
+++ b/net/dccp/feat.c
@@ -24,11 +24,7 @@ int dccp_feat_change(struct dccp_minisock *dmsk, u8 type, u8 
feature,
 
dccp_feat_debug(type, feature, *val);
 
-   if (!dccp_feat_is_valid_type(type)) {
-   DCCP_WARN("option type %d invalid in negotiation\n", type);
-   return 1;
-   }
-   if (!dccp_feat_is_valid_length(type, feature, len)) {
+   if (len > 3) {
DCCP_WARN("invalid length %d\n", len);
return 1;
}
@@ -637,12 +633,12 @@ const char *dccp_feat_name(const u8 feat)
[DCCPF_MIN_CSUM_COVER]  = "Min. Csum Coverage",
[DCCPF_DATA_CHECKSUM]   = "Send Data Checksum",
};
+   if (feat > DCCPF_DATA_CHECKSUM && feat < DCCPF_MIN_CCID_SPECIFIC)
+   return feature_names[DCCPF_RESERVED];
+
if (feat >= DCCPF_MIN_CCID_SPECIFIC)
return "CCID-specific";
 
-   if (dccp_feat_is_reserved(feat))
-   return feature_names[DCCPF_RESERVED];
-
return feature_names[feat];
 }
 
diff --git a/net/dccp/feat.h b/net/dccp/feat.h
index 177f7de..e27 100644
--- a/net/dccp/feat.h
+++ b/net/dccp/feat.h
@@ -14,32 +14,6 @@
 #include 
 #include "dccp.h"
 
-static inline int dccp_feat_is_valid_length(u8 type, u8 feature, u8 len)
-{
-   /* sec. 6.1: Confirm has at least length 3,
-* sec. 6.2: Change  has at least length 4 */
-   if (len < 3)
-   return 1;
-   if (len < 4  && (type == DCCPO_CHANGE_L || type == DCCPO_CHANGE_R))
-   return 1;
-   /* XXX: add per-feature length validation (sec. 6.6.8) */
-   return 0;
-}
-
-static inline int dccp_feat_is_reserved(const u8 feat)
-{
-   return (feat > DCCPF_DATA_CHECKSUM &&
-   feat < DCCPF_MIN_CCID_SPECIFIC) ||
-   feat == DCCPF_RESERVED;
-}
-
-/* feature negotiation knows only these four option types (RFC 4340, sec. 6) */
-static inline int dccp_feat_is_valid_type(const u8 optnum)
-{
-   return optnum >= DCCPO_CHANGE_L && optnum <= DCCPO_CONFIRM_R;
-
-}
-
 #ifdef CONFIG_IP_DCCP_DEBUG
 extern const char *dccp_feat_typename(const u8 type);
 extern const char *dccp_feat_name(const u8 feat);
-- 
1.5.3.4
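
For readers following the dccp_feat_name() hunk above, here is a minimal
standalone sketch of the lookup order after dccp_feat_is_reserved() has been
unfolded. The FEAT_* constants and feat_name() are hypothetical stand-ins for
the kernel's DCCPF_* constants and dccp_feat_name(); the numeric values are
assumed from the feature numbering in RFC 4340 and the strings are
illustrative only.

```c
#include <stdint.h>

/* Hypothetical stand-ins for DCCPF_*; values assumed from RFC 4340. */
enum {
	FEAT_RESERVED          = 0,
	FEAT_DATA_CHECKSUM     = 9,
	FEAT_MIN_CCID_SPECIFIC = 128,
};

/* Map a feature number to a printable name (illustrative strings). */
const char *feat_name(uint8_t feat)
{
	static const char *const names[] = {
		[FEAT_RESERVED]      = "Reserved",
		[1]                  = "CCID",
		[2]                  = "Allow Short Seqnos",
		[3]                  = "Sequence Window",
		[4]                  = "ECN Incapable",
		[5]                  = "Ack Ratio",
		[6]                  = "Send Ack Vector",
		[7]                  = "Send NDP Count",
		[8]                  = "Min. Csum Coverage",
		[FEAT_DATA_CHECKSUM] = "Send Data Checksum",
	};

	/* 10..127 is the reserved gap: test it before indexing the table. */
	if (feat > FEAT_DATA_CHECKSUM && feat < FEAT_MIN_CCID_SPECIFIC)
		return names[FEAT_RESERVED];

	/* 128..255 are CCID-specific and have no table entry. */
	if (feat >= FEAT_MIN_CCID_SPECIFIC)
		return "CCID-specific";

	/* 0..9 index the table directly; 0 already maps to "Reserved". */
	return names[feat];
}
```

Feature 0 needs no separate test in this arrangement because the table entry
for it already holds the reserved name, which is presumably why the
`feat == DCCPF_RESERVED` clause could be dropped when unfolding.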

[PATCH 12/12] [DCCP]: Ignore feature negotiation on Data packets

2007-12-13 Thread Arnaldo Carvalho de Melo

From: Gerrit Renker <[EMAIL PROTECTED]>

This implements [RFC 4340, p. 32]: "any feature negotiation options received
on DCCP-Data packets MUST be ignored".

Also added a FIXME for further processing, since the code currently (wrongly)
classifies empty Confirm options as invalid - this needs to be resolved in
a separate patch.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/options.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/net/dccp/options.c b/net/dccp/options.c
index bedb5da..d2a84a2 100644
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -132,6 +132,8 @@ int dccp_parse_options(struct sock *sk, struct 
dccp_request_sock *dreq,
case DCCPO_CHANGE_L:
/* fall through */
case DCCPO_CHANGE_R:
+   if (pkt_type == DCCP_PKT_DATA)
+   break;
if (len < 2)
goto out_invalid_option;
rc = dccp_feat_change_recv(sk, opt, *value, value + 1,
@@ -148,7 +150,9 @@ int dccp_parse_options(struct sock *sk, struct 
dccp_request_sock *dreq,
case DCCPO_CONFIRM_L:
/* fall through */
case DCCPO_CONFIRM_R:
-   if (len < 2)
+   if (pkt_type == DCCP_PKT_DATA)
+   break;
+   if (len < 2)/* FIXME this disallows empty confirm */
goto out_invalid_option;
if (dccp_feat_confirm_recv(sk, opt, *value,
   value + 1, len - 1))
-- 
1.5.3.4
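
The ordering of the two tests in the hunks above matters: the packet-type
check comes before the length check, so a malformed feature-negotiation option
on a Data packet is ignored rather than rejected. A minimal sketch of that
decision follows; the names (classify_feat_option, PKT_*, OPT_*, ACT_*) are
hypothetical, and the numeric values are assumed from RFC 4340.

```c
#include <stdint.h>

/* Hypothetical constants; values assumed from RFC 4340. */
enum { PKT_DATA = 2, PKT_ACK = 3 };
enum { OPT_CHANGE_L = 32, OPT_CONFIRM_L = 33, OPT_CHANGE_R = 34, OPT_CONFIRM_R = 35 };

enum feat_action {
	ACT_NOT_FEAT,	/* not a feature-negotiation option */
	ACT_IGNORE,	/* skip silently (Data packet) */
	ACT_INVALID,	/* treat as an invalid option */
	ACT_PROCESS,	/* hand over to feature negotiation */
};

/* len is the length of the option value, as in the hunks above. */
enum feat_action classify_feat_option(uint8_t pkt_type, uint8_t opt, uint8_t len)
{
	if (opt < OPT_CHANGE_L || opt > OPT_CONFIRM_R)
		return ACT_NOT_FEAT;

	/* "MUST be ignored" on Data packets: checked first, before any validation. */
	if (pkt_type == PKT_DATA)
		return ACT_IGNORE;

	/* Same threshold as the patch; see the FIXME regarding empty Confirm. */
	if (len < 2)
		return ACT_INVALID;

	return ACT_PROCESS;
}
```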

[PATCH 08/12] [DCCP]: Handle timestamps on Request/Response exchange separately

2007-12-13 Thread Arnaldo Carvalho de Melo

From: Gerrit Renker <[EMAIL PROTECTED]>

In DCCP, timestamps can occur on packets at any time; CCID3 uses a
timestamp(/echo) on the Request/Response exchange. This patch addresses the
following situation:
 * timestamps are recorded on the listening socket;
 * Responses are sent from dccp_request_sockets;
 * suppose two connections reach the listening socket with very small
   time in between:
 * the first timestamp value gets overwritten by the second connection
   request.

This is not really good, so this patch separates timestamps into
 * those which are received by the server during the initial handshake (on
   dccp_request_sock);
 * those which are received by the client or the server after connection
   establishment.

As before, a timestamp of 0 is regarded as indicating that no (meaningful)
timestamp has been received (in addition, a warning message is printed if
hosts send 0-valued timestamps).

The timestamp-echoing now works as follows:
 * when a timestamp is present on the initial Request, it is placed into dreq,
   due to the call to dccp_parse_options in dccp_v{4,6}_conn_request;
 * when a timestamp is present on the Ack leading from RESPOND => OPEN, it is
   copied over from the request_sock into the child socket in
   dccp_create_openreq_child;
 * timestamps received on an (established) dccp_sock are treated as before.

Since Elapsed Time is measured in hundredths of milliseconds (13.2), the new
dccp_timestamp() function is used, as it is expected that the time between
receiving the timestamp and sending the timestamp echo will be very small
against the wrap-around time. As a byproduct, this allows smaller
timestamping-time fields.

Furthermore, inserting the Timestamp Echo option has been taken out of the
block starting with '!dccp_packet_without_ack()', since Timestamp Echo can be
carried on any packet (5.8 and 13.3).

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Acked-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 include/linux/dccp.h |   16 -
 net/dccp/minisocks.c |   21 +++---
 net/dccp/options.c   |   56 -
 3 files changed, 63 insertions(+), 30 deletions(-)

diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index 7214031..484e45c 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -407,11 +407,23 @@ struct dccp_opt_pend {
 
 extern void dccp_minisock_init(struct dccp_minisock *dmsk);
 
+/**
+ * struct dccp_request_sock  -  represent DCCP-specific connection request
+ * @dreq_inet_rsk: structure inherited from
+ * @dreq_iss: initial sequence number sent on the Response (RFC 4340, 7.1)
+ * @dreq_isr: initial sequence number received on the Request
+ * @dreq_service: service code present on the Request (there is just one)
+ * The following two fields are analogous to the ones in dccp_sock:
+ * @dreq_timestamp_echo: last received timestamp to echo (13.1)
+ * @dreq_timestamp_time: the time of receiving the last @dreq_timestamp_echo
+ */
 struct dccp_request_sock {
struct inet_request_sock dreq_inet_rsk;
__u64dreq_iss;
__u64dreq_isr;
__be32   dreq_service;
+   __u32dreq_timestamp_echo;
+   __u32dreq_timestamp_time;
 };
 
 static inline struct dccp_request_sock *dccp_rsk(const struct request_sock 
*req)
@@ -477,8 +489,8 @@ struct dccp_ackvec;
  * @dccps_gar - greatest valid ack number received on a non-Sync; initialized 
to %dccps_iss
  * @dccps_service - first (passive sock) or unique (active sock) service code
  * @dccps_service_list - second .. last service code on passive socket
- * @dccps_timestamp_time - time of latest TIMESTAMP option
  * @dccps_timestamp_echo - latest timestamp received on a TIMESTAMP option
+ * @dccps_timestamp_time - time of receiving latest @dccps_timestamp_echo
  * @dccps_l_ack_ratio - feature-local Ack Ratio
  * @dccps_r_ack_ratio - feature-remote Ack Ratio
  * @dccps_pcslen - sender   partial checksum coverage (via sockopt)
@@ -514,8 +526,8 @@ struct dccp_sock {
__u64   dccps_gar;
__be32  dccps_service;
struct dccp_service_list*dccps_service_list;
-   ktime_t dccps_timestamp_time;
__u32   dccps_timestamp_echo;
+   __u32   dccps_timestamp_time;
__u16   dccps_l_ack_ratio;
__u16   dccps_r_ack_ratio;
__u16   dccps_pcslen;
diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
index b1d5da6..027d181 100644
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -117,11 +117,13 @@ struct sock *dccp_create_openreq_child(struct sock *sk,
struct dccp_sock *newdp = dccp_sk(newsk);
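
The per-request bookkeeping described in the commit message can be sketched
outside the kernel as follows. This is only an illustration under stated
assumptions: struct req_ts and the req_ts_*() helpers are hypothetical, and
the clock is assumed to be a free-running 32-bit counter in units of 10
microseconds, in the spirit of the dccp_timestamp() function mentioned above.

```c
#include <stdint.h>

/* Hypothetical per-request state, mirroring the two new dreq_* fields. */
struct req_ts {
	uint32_t echo;	/* last timestamp value received; 0 means "none" */
	uint32_t time;	/* local clock reading when it was received */
};

/* Record a received timestamp on this request only, not on the listener. */
void req_ts_record(struct req_ts *r, uint32_t ts_value, uint32_t now)
{
	r->echo = ts_value;
	r->time = now;
}

/* A timestamp of 0 is treated as "no meaningful timestamp received". */
int req_ts_pending(const struct req_ts *r)
{
	return r->echo != 0;
}

/*
 * Elapsed time between receiving the timestamp and sending the echo.
 * Unsigned subtraction stays correct across one wrap of the 32-bit clock.
 */
uint32_t req_ts_elapsed(const struct req_ts *r, uint32_t now)
{
	return now - r->time;
}
```

Because each connection request carries its own pair of fields, two Requests
arriving back to back no longer overwrite each other's timestamp.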

[PATCH 07/12] [DCCP]: Add (missing) option parsing to request_sock processing

2007-12-13 Thread Arnaldo Carvalho de Melo

From: Gerrit Renker <[EMAIL PROTECTED]>

This adds option-parsing code to processing of Acks in the listening state
on request_socks on the server, serving two purposes
 (i)  resolves a FIXME (removed);
 (ii) paves the way for feature-negotiation during connection-setup.

There is an intended subtlety here with regard to dccp_check_req:

 Parsing options happens only after testing whether the received packet is
 a retransmitted Request.  Otherwise, if the Request contained (a possibly
 large number of) feature-negotiation options, recomputing state would have to
 happen each time a retransmitted Request arrives, which opens the door to an
 easy DoS attack.  Since in a genuine retransmission the options should not be
 different from the original, reusing the already computed state seems better.

 The other point is - if there are timestamp options on the Request, they will
 not be answered; which means that in the presence of retransmission (likely
 due to loss and/or other problems), the use of Request/Response RTT sampling
 is suspended, so that startup problems here do not propagate.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/minisocks.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
index 831b76e..b1d5da6 100644
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -200,10 +200,10 @@ struct sock *dccp_check_req(struct sock *sk, struct 
sk_buff *skb,
struct request_sock **prev)
 {
struct sock *child = NULL;
+   struct dccp_request_sock *dreq = dccp_rsk(req);
 
/* Check for retransmitted REQUEST */
if (dccp_hdr(skb)->dccph_type == DCCP_PKT_REQUEST) {
-   struct dccp_request_sock *dreq = dccp_rsk(req);
 
if (after48(DCCP_SKB_CB(skb)->dccpd_seq, dreq->dreq_isr)) {
dccp_pr_debug("Retransmitted REQUEST\n");
@@ -227,22 +227,22 @@ struct sock *dccp_check_req(struct sock *sk, struct 
sk_buff *skb,
goto drop;
 
/* Invalid ACK */
-   if (DCCP_SKB_CB(skb)->dccpd_ack_seq != dccp_rsk(req)->dreq_iss) {
+   if (DCCP_SKB_CB(skb)->dccpd_ack_seq != dreq->dreq_iss) {
dccp_pr_debug("Invalid ACK number: ack_seq=%llu, "
  "dreq_iss=%llu\n",
  (unsigned long long)
  DCCP_SKB_CB(skb)->dccpd_ack_seq,
- (unsigned long long)
- dccp_rsk(req)->dreq_iss);
+ (unsigned long long) dreq->dreq_iss);
goto drop;
}
 
+   if (dccp_parse_options(sk, dreq, skb))
+goto drop;
+
child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL);
if (child == NULL)
goto listen_overflow;
 
-   /* FIXME: deal with options */
-
inet_csk_reqsk_queue_unlink(sk, req, prev);
inet_csk_reqsk_queue_removed(sk, req);
inet_csk_reqsk_queue_add(sk, req, child);
-- 
1.5.3.4

[PATCH 05/12] [DCCP]: Collapse repeated `len' statements into one

2007-12-13 Thread Arnaldo Carvalho de Melo

From: Gerrit Renker <[EMAIL PROTECTED]>

This replaces 4 individual assignments for `len' with a single
one, placed where the control flow of those 4 leads to.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/proto.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index cc87c50..0bed4a6 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -657,19 +657,15 @@ static int do_dccp_getsockopt(struct sock *sk, int level, 
int optname,
   (__be32 __user *)optval, optlen);
case DCCP_SOCKOPT_GET_CUR_MPS:
val = dp->dccps_mss_cache;
-   len = sizeof(val);
break;
case DCCP_SOCKOPT_SERVER_TIMEWAIT:
val = dp->dccps_server_timewait;
-   len = sizeof(val);
break;
case DCCP_SOCKOPT_SEND_CSCOV:
val = dp->dccps_pcslen;
-   len = sizeof(val);
break;
case DCCP_SOCKOPT_RECV_CSCOV:
val = dp->dccps_pcrlen;
-   len = sizeof(val);
break;
case 128 ... 191:
return ccid_hc_rx_getsockopt(dp->dccps_hc_rx_ccid, sk, optname,
@@ -681,6 +677,7 @@ static int do_dccp_getsockopt(struct sock *sk, int level, 
int optname,
return -ENOPROTOOPT;
}
 
+   len = sizeof(val);
if (put_user(len, optlen) || copy_to_user(optval, &val, len))
return -EFAULT;
 
-- 
1.5.3.4

[PATCH 04/12] [DCCP]: Support for server holding timewait state

2007-12-13 Thread Arnaldo Carvalho de Melo

From: Gerrit Renker <[EMAIL PROTECTED]>

This adds a socket option and signalling support for the case where the server
holds timewait state on closing the connection, as described in RFC 4340, 8.3.

Since holding timewait state at the server is the non-usual case, it is enabled
via a socket option. Documentation for this socket option has been added.

The setsockopt statement has been made resilient against different possible
cases of expressing boolean `true' values using a suggestion by Ian McDonald.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 Documentation/networking/dccp.txt |6 ++
 include/linux/dccp.h  |3 +++
 net/dccp/output.c |6 --
 net/dccp/proto.c  |   13 -
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/dccp.txt 
b/Documentation/networking/dccp.txt
index d76905a..39131a3 100644
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.txt
@@ -57,6 +57,12 @@ can be set before calling bind().
 DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet
 size (application payload size) in bytes, see RFC 4340, section 14.
 
+DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold
+timewait state when closing the connection (RFC 4340, 8.3). The usual case is
+that the closing server sends a CloseReq, whereupon the client holds timewait
+state. When this boolean socket option is on, the server sends a Close instead
+and will enter TIMEWAIT. This option must be set after accept() returns.
+
 DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the
 partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums
 always cover the entire packet and that only fully covered application data is
diff --git a/include/linux/dccp.h b/include/linux/dccp.h
index 312b989..c676021 100644
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -205,6 +205,7 @@ struct dccp_so_feat {
 #define DCCP_SOCKOPT_CHANGE_L  3
 #define DCCP_SOCKOPT_CHANGE_R  4
 #define DCCP_SOCKOPT_GET_CUR_MPS   5
+#define DCCP_SOCKOPT_SERVER_TIMEWAIT   6
 #define DCCP_SOCKOPT_SEND_CSCOV10
 #define DCCP_SOCKOPT_RECV_CSCOV11
 #define DCCP_SOCKOPT_CCID_RX_INFO  128
@@ -492,6 +493,7 @@ struct dccp_ackvec;
  * @dccps_role - role of this sock, one of %dccp_role
  * @dccps_hc_rx_insert_options - receiver wants to add options when acking
  * @dccps_hc_tx_insert_options - sender wants to add options when sending
+ * @dccps_server_timewait - server holds timewait state on close (RFC 4340, 
8.3)
  * @dccps_xmit_timer - timer for when CCID is not ready to send
  * @dccps_syn_rtt - RTT sample from Request/Response exchange (in usecs)
  */
@@ -528,6 +530,7 @@ struct dccp_sock {
enum dccp_role  dccps_role:2;
__u8dccps_hc_rx_insert_options:1;
__u8dccps_hc_tx_insert_options:1;
+   __u8dccps_server_timewait:1;
struct timer_list   dccps_xmit_timer;
 };
 
diff --git a/net/dccp/output.c b/net/dccp/output.c
index e97584a..b2e1791 100644
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -567,8 +567,10 @@ void dccp_send_close(struct sock *sk, const int active)
 
/* Reserve space for headers and prepare control bits. */
skb_reserve(skb, sk->sk_prot->max_header);
-   DCCP_SKB_CB(skb)->dccpd_type = dp->dccps_role == DCCP_ROLE_CLIENT ?
-   DCCP_PKT_CLOSE : DCCP_PKT_CLOSEREQ;
+   if (dp->dccps_role == DCCP_ROLE_SERVER && !dp->dccps_server_timewait)
+   DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_CLOSEREQ;
+   else
+   DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_CLOSE;
 
if (active) {
dccp_write_xmit(sk, 1);
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 8a73c8f..cc87c50 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -551,6 +551,12 @@ static int do_dccp_setsockopt(struct sock *sk, int level, 
int optname,
 (struct dccp_so_feat 
__user *)
 optval);
break;
+   case DCCP_SOCKOPT_SERVER_TIMEWAIT:
+   if (dp->dccps_role != DCCP_ROLE_SERVER)
+   err = -EOPNOTSUPP;
+   else
+   dp->dccps_server_timewait = (val != 0);
+   break;
case DCCP_SOCKOPT_SEND_CSCOV:   /* sender side, RFC 4340, sec. 9.2 */
if (val < 0 || val > 15)
err = -EINVAL;
@@ -653,6 +659,10 @@ static int do_dccp_getsockopt(struct sock *sk, int level, 
int optname,
val = dp->dccps_mss_cache;
len = siz
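
The effect of the new option on the closing packet type can be summarised in
a small standalone sketch. The enums and function names below are
hypothetical; the packet-type values are assumed from RFC 4340, and the logic
follows the dccp_send_close() and do_dccp_setsockopt() hunks above.

```c
/* Hypothetical stand-ins for the kernel's role and packet-type constants. */
enum role { ROLE_CLIENT, ROLE_SERVER };
enum pkt  { PKT_CLOSEREQ = 5, PKT_CLOSE = 6 };

/* Normalise any non-zero user value to 1, as `(val != 0)` does in the patch. */
int timewait_flag_from_user(int val)
{
	return val != 0;
}

/*
 * Which packet starts the close?
 *  - server without the option: CloseReq, so the client holds timewait state;
 *  - client, or server with the option set: Close, so the sender holds it.
 */
enum pkt close_pkt_type(enum role role, int server_timewait)
{
	if (role == ROLE_SERVER && !server_timewait)
		return PKT_CLOSEREQ;
	return PKT_CLOSE;
}
```

Normalising the flag first means that values such as -1 or 42 passed from
user space all enable the option, since it is stored in a 1-bit field.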

[PATCH 01/12] [DCCP]: Perform SHUT_RD and SHUT_WR on receiving close

2007-12-13 Thread Arnaldo Carvalho de Melo

From: Gerrit Renker <[EMAIL PROTECTED]>

This patch performs two changes:

1) Close the write-end in addition to the read-end when a fin-like segment
  (Close or CloseReq) is received by DCCP. This accounts for the fact that DCCP,
  in contrast to TCP, does not have a half-close. RFC 4340 says in this respect
  that when a fin-like segment has been sent there is no guarantee at all that
  any further data will be processed.
  Thus this patch performs SHUT_WR in addition to the SHUT_RD when a fin-like
  segment is encountered.

2) Minor change: I noted that code appears twice in different places and think
   it makes sense to put this into a self-contained function (dccp_enqueue()).

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/input.c |   22 +++---
 1 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/net/dccp/input.c b/net/dccp/input.c
index decf2f2..dacd4fd 100644
--- a/net/dccp/input.c
+++ b/net/dccp/input.c
@@ -22,16 +22,27 @@
 /* rate-limit for syncs in reply to sequence-invalid packets; RFC 4340, 7.5.4 
*/
 int sysctl_dccp_sync_ratelimit __read_mostly = HZ / 8;
 
-static void dccp_fin(struct sock *sk, struct sk_buff *skb)
+static void dccp_enqueue_skb(struct sock *sk, struct sk_buff *skb)
 {
-   sk->sk_shutdown |= RCV_SHUTDOWN;
-   sock_set_flag(sk, SOCK_DONE);
__skb_pull(skb, dccp_hdr(skb)->dccph_doff * 4);
__skb_queue_tail(&sk->sk_receive_queue, skb);
skb_set_owner_r(skb, sk);
sk->sk_data_ready(sk, 0);
 }
 
+static void dccp_fin(struct sock *sk, struct sk_buff *skb)
+{
+   /*
+* On receiving Close/CloseReq, both RD/WR shutdown are performed.
+* RFC 4340, 8.3 says that we MAY send further Data/DataAcks after
+* receiving the closing segment, but there is no guarantee that such
+* data will be processed at all.
+*/
+   sk->sk_shutdown = SHUTDOWN_MASK;
+   sock_set_flag(sk, SOCK_DONE);
+   dccp_enqueue_skb(sk, skb);
+}
+
 static int dccp_rcv_close(struct sock *sk, struct sk_buff *skb)
 {
int queued = 0;
@@ -282,10 +293,7 @@ static int __dccp_rcv_established(struct sock *sk, struct 
sk_buff *skb,
 * - sk_shutdown == RCV_SHUTDOWN, use Code 1, "Not Listening"
 * - sk_receive_queue is full, use Code 2, "Receive Buffer"
 */
-   __skb_pull(skb, dh->dccph_doff * 4);
-   __skb_queue_tail(&sk->sk_receive_queue, skb);
-   skb_set_owner_r(skb, sk);
-   sk->sk_data_ready(sk, 0);
+   dccp_enqueue_skb(sk, skb);
return 0;
case DCCP_PKT_ACK:
goto discard;
-- 
1.5.3.4
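
The difference between the old and the new dccp_fin() comes down to which
shutdown bits are set. A minimal sketch, assuming the flag values used by the
kernel's socket layer (RCV_SHUTDOWN = 1, SEND_SHUTDOWN = 2, SHUTDOWN_MASK = 3);
the two helper functions are hypothetical names for illustration:

```c
/* Flag values assumed to match the kernel's definitions. */
#define RCV_SHUTDOWN	1
#define SEND_SHUTDOWN	2
#define SHUTDOWN_MASK	3

/* Old behaviour: only the read side is shut down (TCP-like half-close). */
unsigned char fin_old(unsigned char sk_shutdown)
{
	return sk_shutdown | RCV_SHUTDOWN;
}

/* New behaviour: both directions are shut down, since DCCP has no half-close. */
unsigned char fin_new(unsigned char sk_shutdown)
{
	(void)sk_shutdown;	/* the previous state no longer matters */
	return SHUTDOWN_MASK;
}
```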

[PATCH 03/12] [DCCP]: Use maximum-RTO backoff from DCCP spec

2007-12-13 Thread Arnaldo Carvalho de Melo

From: Gerrit Renker <[EMAIL PROTECTED]>

This removes another FIXME, which concerned using the TCP maximum RTO rather
than the value specified by the DCCP specification. Across the sections in
RFC 4340, 64 seconds is consistently suggested as maximum RTO backoff value;
and this is the value which is now used.

I have checked both termination cases for retransmissions of Close/CloseReq:
with the default value 15 of `retries2', and an initial icsk_retransmit = 0,
it takes about 614 seconds to declare a non-responding peer as dead, after
which the final terminating Reset is sent. With the TCP maximum RTO value of
120 seconds it takes (as might be expected) almost twice as long, about 23
minutes.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/dccp.h |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index 07dcbe7..3af3320 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -72,7 +72,14 @@ extern void dccp_time_wait(struct sock *sk, int state, int 
timeo);
 /* RFC 1122, 4.2.3.1 initial RTO value */
 #define DCCP_TIMEOUT_INIT ((unsigned)(3 * HZ))
 
-#define DCCP_RTO_MAX ((unsigned)(120 * HZ)) /* FIXME: using TCP value */
+/*
+ * The maximum back-off value for retransmissions. This is needed for
+ *  - retransmitting client-Requests (sec. 8.1.1),
+ *  - retransmitting Close/CloseReq when closing (sec. 8.3),
+ *  - feature-negotiation retransmission (sec. 6.6.3),
+ *  - Acks in client-PARTOPEN state (sec. 8.1.5).
+ */
+#define DCCP_RTO_MAX ((unsigned)(64 * HZ))
 
 /*
  * RTT sampling: sanity bounds and fallback RTT value from RFC 4340, section 
3.4
-- 
1.5.3.4
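
To get a feel for how the cap influences the time until a peer is declared
dead, the capped exponential backoff can be summed in a few lines. This is a
rough sketch under stated assumptions, not the kernel's timer code:
backoff_total() is a hypothetical helper, the timeout doubles after each
expiry, and all values are in seconds.

```c
/*
 * Sum of `timeouts` consecutive retransmission timeouts, starting at
 * init_rto and doubling after each expiry, but never exceeding max_rto.
 */
unsigned int backoff_total(unsigned int init_rto, unsigned int max_rto,
			   unsigned int timeouts)
{
	unsigned int rto = init_rto, total = 0, i;

	for (i = 0; i < timeouts; i++) {
		total += rto;
		rto = (rto * 2 > max_rto) ? max_rto : rto * 2;
	}
	return total;
}
```

Assuming an initial value of 3 seconds (DCCP_TIMEOUT_INIT) and 16 timeouts,
backoff_total(3, 120, 16) is 1389 seconds, close to the 23 minutes quoted
above for the TCP cap, while backoff_total(3, 64, 16) is 797 seconds. The
latter does not reproduce the 614 seconds measured in the commit message,
which presumably started from a different initial RTO, so the sketch should be
read as an illustration of the trend only.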

[PATCH 11/12] [DCCP]: Make code assumptions explicit

2007-12-13 Thread Arnaldo Carvalho de Melo

From: Gerrit Renker <[EMAIL PROTECTED]>

This removes several `XXX' references which indicate missing support
for non-1-byte feature values: this is unnecessary, as all currently known
(standardised) SP feature values are 1-byte quantities.

Signed-off-by: Gerrit Renker <[EMAIL PROTECTED]>
Signed-off-by: Ian McDonald <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
---
 net/dccp/feat.c |   17 ++---
 1 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/dccp/feat.c b/net/dccp/feat.c
index 084744e..4a4f6ce 100644
--- a/net/dccp/feat.c
+++ b/net/dccp/feat.c
@@ -4,10 +4,16 @@
  *  An implementation of the DCCP protocol
  *  Andrea Bittau <[EMAIL PROTECTED]>
  *
- *  This program is free software; you can redistribute it and/or
- *  modify it under the terms of the GNU General Public License
- *  as published by the Free Software Foundation; either version
- *  2 of the License, or (at your option) any later version.
+ *  ASSUMPTIONS
+ *  ---
+ *  o All currently known SP features have 1-byte quantities. If in the future
+ *extensions of RFCs 4340..42 define features with item lengths larger than
+ *one byte, a feature-specific extension of the code will be required.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
  */
 
 #include 
@@ -95,7 +101,6 @@ static int dccp_feat_update_ccid(struct sock *sk, u8 type, 
u8 new_ccid_nr)
return 0;
 }
 
-/* XXX taking only u8 vals */
 static int dccp_feat_update(struct sock *sk, u8 type, u8 feat, u8 val)
 {
dccp_feat_debug(type, feat, val);
@@ -140,7 +145,6 @@ static int dccp_feat_reconcile(struct sock *sk, struct 
dccp_opt_pend *opt,
/* FIXME sanity check vals */
 
/* Are values in any order?  XXX Lame "algorithm" here */
-   /* XXX assume values are 1 byte */
for (i = 0; i < slen; i++) {
for (j = 0; j < rlen; j++) {
if (spref[i] == rpref[j]) {
@@ -175,7 +179,6 @@ static int dccp_feat_reconcile(struct sock *sk, struct 
dccp_opt_pend *opt,
}
 
/* need to put result and our preference list */
-   /* XXX assume 1 byte vals */
rlen = 1 + opt->dccpop_len;
rpref = kmalloc(rlen, GFP_ATOMIC);
if (rpref == NULL)
-- 
1.5.3.4

Re: 2.6.24-rc5-mm1

2007-12-13 Thread Borislav Petkov

On Thu, Dec 13, 2007 at 04:01:34PM +0100, Benjamin Thery wrote:
> The problem comes from the new macro UDPX_INC_STATS_BH introduced
> by Herbert, which was a nice addition to increment the correct 
> UDP MIB depending on the socket family, but unfortunately 
> the use of this macro from kernel code (I mean code not compiled 
> as module) requires that IPv6 is also compiled in kernel 
> (CONFIG_IPV6=y) in order to have udp_stats_in6 defined at link 
> time.
> 
> Benjamin
> 
> Pierre Peiffer wrote:
> > Hi,
> > 
> > My config does not link any more:
> > 
> > ...
> >   CHK include/linux/compile.h
> >   UPD include/linux/compile.h
> >   CC  init/version.o
> >   LD  init/built-in.o
> >   LD  .tmp_vmlinux1
> > net/built-in.o: In function `xs_udp_data_ready':
> > /home/peifferp/containers/kernel/linux-2.6.24-rc5-mm1/net/sunrpc/xprtsock.c:842:
> > undefined reference to `udp_stats_in6'
> > /home/peifferp/containers/kernel/linux-2.6.24-rc5-mm1/net/sunrpc/xprtsock.c:846:
> > undefined reference to `udp_stats_in6'
> > make[1]: *** [.tmp_vmlinux1] Error 1
> > make: *** [sub-make] Error 2
> > 
> > After a first look, udp_stats_in6 seems to be defined in ipv6 (file
> > net/ipv6/udp.c) but I have
> > 
> > CONFIG_IPV6=m
> > and
> > CONFIG_SUNRPC=y
> > 
> > So, SUNRPC uses something defined in a module in my case ?
> > 
> > ... looking more, this dependency seems to have been introduced by the patch
> > [UDP]: Restore missing inDatagrams increments
> > ( http://thread.gmane.org/gmane.linux.network/79716/focus=79831 )
> > 
> > (I cc netdev)
> > 
> > I don't know what is the right way to fix this ... ?

you might do "select IPV6" in the SUNRPC section of fs/Kconfig, however select
is evil...

-- 
Regards/Gruß,
Boris.
Re: [IPSEC]: Fix zero return value in xfrm_lookup on error

2007-12-13 Thread David Miller

From: Herbert Xu <[EMAIL PROTECTED]>
Date: Fri, 14 Dec 2007 00:44:48 +0800

> [IPSEC]: Fix zero return value in xfrm_lookup on error
> 
> Further testing shows that my ICMP relookup patch can cause xfrm_lookup
> to return zero on error which isn't very nice since it leads to the caller
> dying on null pointer dereference.  The bug is due to not setting err
> to ENOENT just before we leave xfrm_lookup in case of no policy.
> 
> This patch moves the err setting to where it should be.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied, thanks.
Re: kernel 2.6.23.8: KERNEL: assertion in net/ipv4/tcp_input.c

2007-12-13 Thread Wolfgang Walter

Hello Ilpo,

it happened again with your patch applied:

WARNING: at net/ipv4/tcp_input.c:1018 tcp_sacktag_write_queue()

Call Trace:
  [] tcp_sacktag_write_queue+0x7d0/0xa60
[] add_partial+0x19/0x60
[] tcp_ack+0x5a4/0x1d70
[] tcp_rcv_established+0x485/0x7b0
[] tcp_v4_do_rcv+0xed/0x3e0
[] tcp_v4_rcv+0x947/0x970
[] ip_local_deliver+0xac/0x290
[] ip_rcv+0x362/0x6c0
[] netif_receive_skb+0x323/0x420
[] tg3_poll+0x630/0xa50
[] net_rx_action+0x8a/0x140
[] __do_softirq+0x69/0xe0
[] call_softirq+0x1c/0x30
[] do_softirq+0x35/0x90
[] irq_exit+0x55/0x60
[] do_IRQ+0x80/0x100
[] ret_from_intr+0x0/0xa



Am Montag, 3. Dezember 2007 14:34 schrieb Ilpo Järvinen:
> On Mon, 3 Dec 2007, Wolfgang Walter wrote:
> > with kernel 2.6.23.8 we saw a
> >
> > KERNEL: assertion ((int)tcp_packets_in_flight(tp) >= 0) failed at
> > net/ipv4/tcp_input.c (1292)
>
> Is this the only message? Are there any Leak printouts?
> Any tweaking done to TCP related sysctls?

net/core/somaxconn=2048
net/ipv4/tcp_syncookies=1
net/ipv4/tcp_max_syn_backlog=8192
net/ipv4/tcp_max_tw_buckets=180
net/ipv4/tcp_window_scaling=0
net/ipv4/tcp_timestamps=0

>
> Most likely I broke the manual synchronization for left_out in sacktag by
> skipping over it when packets_out == 0 but so far I haven't been able to
> figure out how such state could develop in the first place... Ie., I
> couldn't find a case where tcp_fastretrans_alert wouldn't be called if
> left_out was non-zero (and it did the sync_left_out after modifying
> either sacked_out or lost_out, IIRC).
>
> ...If you can reproduce it, you could try if this patch below changes
> anything (should silence the assert and trigger earlier a WARN_ON or
> two :-)). ...If this triggers, then I'm sure we can pollute TCP code
> by a larger number of more costly checks to catch it in early.
>
> This might reveal a long-standing inconsistency of left_out in some
> case I just couldn't come up with by code review. Left_out will be
> (is) anyway dropped as unnecessary in 2.6.24. In 2.6.23 sync for
> left_out occurs quite soon after that BUG_TRAP anyway so the effect
> won't be too dramatic, prior_in_flight would be once stale, won't
> lead to big problems (either missed cwnd or cwnd_cnt increment, or
> failure to do application limited check at that particular ACK).
>
> Thanks anyway for the report. ...If I figure something out here, I'll
> let you know.
>
> --
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index c9298a7..0c5194d 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -1012,8 +1012,12 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>   if (before(TCP_SKB_CB(ack_skb)->ack_seq, prior_snd_una - tp->max_window))
>   return 0;
>
> - if (!tp->packets_out)
> + if (!tp->packets_out) {
> + WARN_ON(tp->sacked_out);
> + WARN_ON(tp->lost_out);
> + WARN_ON(tp->left_out);
>   goto out;
> + }
>
>   /* SACK fastpath:
>* if the only SACK change is the increase of the end_seq of
> @@ -1277,14 +1281,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>   }
>   }
>
> +out:
> +
>   tp->left_out = tp->sacked_out + tp->lost_out;
>
>   if ((reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss &&
>   (!tp->frto_highmark || after(tp->snd_una, tp->frto_highmark)))
>   tcp_update_reordering(sk, ((tp->fackets_out + 1) - reord), 0);
>
> -out:
> -
>  #if FASTRETRANS_DEBUG > 0
>   BUG_TRAP((int)tp->sacked_out >= 0);
>   BUG_TRAP((int)tp->lost_out >= 0);
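
For readers following the assertion, here is a minimal user-space sketch of the counter relationship discussed above. It assumes that 2.6.23 computes tcp_packets_in_flight() as packets_out - left_out + retrans_out; the struct and function names are hypothetical stand-ins, not kernel code. It only illustrates how a stale left_out could make the in-flight count go negative until the next sync.

```c
/* Hypothetical stand-in for the few tcp_sock counters involved. */
struct tcp_counters {
	unsigned int packets_out;
	unsigned int sacked_out;
	unsigned int lost_out;
	unsigned int left_out;	/* cached sum, must be re-synced by hand */
	unsigned int retrans_out;
};

/* Manual synchronization, as done at the "out:" label in the patch. */
static void sync_left_out(struct tcp_counters *tp)
{
	tp->left_out = tp->sacked_out + tp->lost_out;
}

/* Assumed formula; the cast mirrors the (int) in the failing assertion. */
static int packets_in_flight(const struct tcp_counters *tp)
{
	return (int)(tp->packets_out - tp->left_out + tp->retrans_out);
}
```

With packets_out == 0 and a left_out left over from an earlier ACK, packets_in_flight() is negative; calling sync_left_out() first brings it back to zero, which is why the patch moves the label above the sync.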

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts

Re: [RFC] net: napi fix

2007-12-13 Thread Kok, Auke

David Miller wrote:
> From: Andrew Gallatin <[EMAIL PROTECTED]>
> Date: Thu, 13 Dec 2007 09:13:54 -0500
> 
>> If the netif_running() check is indeed required to make a device break
>> out of napi polling and respond to an ifconfig down, then I think the
>> netif_running() check should be moved up into net_rx_action() to avoid
>> potential for driver complexity and bugs like the ones you found.
> 
> That, or something like it, definitely sounds reasonable and much
> better than putting the check into every driver :-)

hear hear!

Auke

[IPSEC]: Fix zero return value in xfrm_lookup on error

2007-12-13 Thread Herbert Xu

Hi Dave:

Found another silly bug in my ICMP relookup patch.

[IPSEC]: Fix zero return value in xfrm_lookup on error

Further testing shows that my ICMP relookup patch can cause xfrm_lookup
to return zero on error which isn't very nice since it leads to the caller
dying on null pointer dereference.  The bug is due to not setting err
to ENOENT just before we leave xfrm_lookup in case of no policy.

This patch moves the err setting to where it should be.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index b04d88c..d2084b1 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1469,8 +1469,6 @@ restart:
goto dropdst;
}
 
-   err = -ENOENT;
-
if (!policy) {
/* To accelerate a bit...  */
if ((dst_orig->flags & DST_NOXFRM) ||
@@ -1492,6 +1490,7 @@ restart:
npols ++;
xfrm_nr += pols[0]->xfrm_nr;
 
+   err = -ENOENT;
if ((flags & XFRM_LOOKUP_ICMP) && !(policy->flags & XFRM_POLICY_ICMP))
goto error;
 
@@ -1657,6 +1656,7 @@ dropdst:
return err;
 
 nopol:
+   err = -ENOENT;
if (flags & XFRM_LOOKUP_ICMP)
goto dropdst;
return 0;
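
The failure mode described above can be sketched outside the kernel. This is a simplified model, not the xfrm code: policy_model, step_that_succeeds() and both lookup functions are hypothetical names, and the control flow is reduced to the one point that matters, namely that err must be set after the last call that can overwrite it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, much reduced stand-in for a policy entry. */
struct policy_model {
	int allows_icmp;
};

/* Stands for any intermediate call that stores its result in err. */
static int step_that_succeeds(void)
{
	return 0;
}

/* Buggy shape: err is set early, then clobbered before the error exit. */
static int lookup_buggy(const struct policy_model *pol, int icmp_lookup)
{
	int err = -ENOENT;		/* assigned too early */

	err = step_that_succeeds();	/* overwrites -ENOENT with 0 */
	if (err)
		goto error;

	if (icmp_lookup && (!pol || !pol->allows_icmp))
		goto error;		/* err is 0 here */
	return 0;
error:
	return err;			/* caller sees "success" */
}

/* Fixed shape: err is set right before the checks that rely on it. */
static int lookup_fixed(const struct policy_model *pol, int icmp_lookup)
{
	int err;

	err = step_that_succeeds();
	if (err)
		goto error;

	err = -ENOENT;
	if (icmp_lookup && (!pol || !pol->allows_icmp))
		goto error;
	return 0;
error:
	return err;
}
```

A caller that treats a zero return as "the output pointer is valid" would dereference garbage after lookup_buggy(NULL, 1), while lookup_fixed(NULL, 1) reports -ENOENT.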

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: 2.6.24-rc5-mm1

2007-12-13 Thread Adrian Bunk

On Thu, Dec 13, 2007 at 05:07:44PM +0100, Borislav Petkov wrote:
> On Thu, Dec 13, 2007 at 04:01:34PM +0100, Benjamin Thery wrote:
> > The problem comes from the new macro UDPX_INC_STATS_BH introduced
> > by Herbert, which was a nice addition to increment the correct 
> > UDP MIB depending on the socket family, but unfortunately 
> > the use of this macro from kernel code (I mean code not compiled 
> > as module) requires that IPv6 is also compiled in kernel 
> > > (CONFIG_IPV6=y) in order to have udp_stats_in6 defined at link
> > time.
> > 
> > Benjamin
> > 
> > Pierre Peiffer wrote:
> > > Hi,
> > > 
> > >   My config does not link any more:
> > > 
> > > ...
> > >   CHK include/linux/compile.h
> > >   UPD include/linux/compile.h
> > >   CC  init/version.o
> > >   LD  init/built-in.o
> > >   LD  .tmp_vmlinux1
> > > net/built-in.o: In function `xs_udp_data_ready':
> > > /home/peifferp/containers/kernel/linux-2.6.24-rc5-mm1/net/sunrpc/xprtsock.c:842:
> > > undefined reference to `udp_stats_in6'
> > > /home/peifferp/containers/kernel/linux-2.6.24-rc5-mm1/net/sunrpc/xprtsock.c:846:
> > > undefined reference to `udp_stats_in6'
> > > make[1]: *** [.tmp_vmlinux1] Error 1
> > > make: *** [sub-make] Error 2
> > > 
> > > After a first look, udp_stats_in6 seems to be defined in ipv6 (file
> > > net/ipv6/udp.c) but I have
> > > 
> > > CONFIG_IPV6=m
> > > and
> > > CONFIG_SUNRPC=y
> > > 
> > > So, SUNRPC uses something defined in a module in my case ?
> > > 
> > > ... looking more, this dependency seems to have been introduced by the 
> > > patch
> > > [UDP]: Restore missing inDatagrams increments
> > > ( http://thread.gmane.org/gmane.linux.network/79716/focus=79831 )
> > > 
> > > (I cc netdev)
> > > 
> > > I don't know what is the right way to fix this ... ?
> 
> you might do "select IPV6" in the SUNRPC section of fs/Kconfig, however 
> select is
> evil...

select itself isn't evil.

Nonsensical selects like the one you suggest (sunrpc does not require 
IPV6) are evil.

> Regards/Gruß,
> Boris.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


Re: [PATCH 2.6.25] [IPV4] Thresholds in fib_trie.c are used as consts, so make them const.

2007-12-13 Thread David Miller

From: "Denis V. Lunev" <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 11:06:18 +0300

> [IPV4] Thresholds in fib_trie.c are used as consts, so make them const.
> 
> There are several thresholds for trie fib hash management. They are used
> in the code as a constants. Make them constants from the compiler point of
> view.
> 
> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>

Applied, thanks for fixing this patch up.

Re: [patch 1/2] ip_gre: Rebinding of GRE tunnels to other interfaces

2007-12-13 Thread David Miller

From: [EMAIL PROTECTED]
Date: Thu, 13 Dec 2007 14:37:49 +0100

> This is similar to the change already done for IPIP tunnels.
> 
> Once created, a GRE tunnel can't be bound to another device.
> To reproduce:
> 
> # create a tunnel:
> ip tunnel add tunneltest0 mode gre remote 10.0.0.1 dev eth0
> # try to change the bounding device from eth0 to eth1:
> ip tunnel change tunneltest0 dev eth1
> # show the result:
> ip tunnel show tunneltest0
> 
> tunneltest0: gre/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit
> 
> Notice the bound device has not changed from eth0 to eth1.
> 
> This patch fixes it. When changing the binding, it also recalculates the
> MTU according to the new bound device's MTU.
> 
> Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>

Applied to net-2.6.25, thanks.

Re: [PATCH 2.6.25] [IPV6] Always pass a valid nl_info to inet6_rt_notify.

2007-12-13 Thread David Miller

From: "Denis V. Lunev" <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 16:58:54 +0300

> [IPV6] Always pass a valid nl_info to inet6_rt_notify.
> 
> This makes the code in the inet6_rt_notify more straightforward and provides
> ground for namespace passing.
> 
> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>

Applied, thanks!

tcp_sacktag_one() WARNING (was Re: 2.6.24-rc4-mm1)

2007-12-13 Thread Cedric Le Goater

Cedric Le Goater wrote:
> Ilpo Järvinen wrote:
>> On Wed, 5 Dec 2007, Andrew Morton wrote:
>>
>>> On Thu, 06 Dec 2007 17:59:37 +1100 Reuben Farrelly <[EMAIL PROTECTED]> 
>>> wrote:
>>>
>>>> This non fatal oops which I have just noticed may be related to this
>>>> change then - certainly looks networking related.
>>> yep, but it isn't e1000.  It's core TCP.
>>>
>>>> WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
>>>> Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1
>>> Ilpo, Reuben's kernel is talking to you ;)
>> ...Please try the patch below. Andrew, this probably fixes your problem 
>> (the packets <= tp->packets_out) as well.
> 
> nah. I got the WARNINGs again with this patch.

I got this new one on 2.6.24-rc5-mm1. Does it look similar?

C.

WARNING: at /home/legoater/linux/2.6.24-rc5-mm1/net/ipv4/tcp_input.c:1280 tcp_sacktag_one()
Pid: 0, comm: swapper Not tainted 2.6.24-rc5-mm1 #1

Call Trace:
   [] tcp_sacktag_walk+0x2bc/0x62a
 [] tcp_sacktag_write_queue+0x595/0xa7c
 [] kfree+0xd4/0xe0
 [] tcp_ack+0x2a7/0xfc7
 [] mark_held_locks+0x47/0x6a
 [] trace_hardirqs_on+0xfe/0x139
 [] tcp_rcv_established+0x66a/0x76d
 [] tcp_v4_do_rcv+0x37/0x3aa
 [] tcp_v4_rcv+0x9a9/0xa76
 [] ip_local_deliver_finish+0x161/0x23c
 [] ip_local_deliver+0x72/0x77
 [] ip_rcv_finish+0x371/0x3b5
 [] ip_rcv+0x292/0x2c6
 [] netif_receive_skb+0x267/0x340
 [] :tg3:tg3_poll+0x5d2/0x89e
 [] net_rx_action+0xd5/0x1ad
 [] __do_softirq+0x5f/0xe3
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x39/0x9f
 [] irq_exit+0x4e/0x50
 [] do_IRQ+0xb7/0xd7
 [] mwait_idle+0x0/0x52
 [] ret_from_intr+0x0/0xf
   [] __atomic_notifier_call_chain+0x20/0x83
 [] mwait_idle+0x48/0x52
 [] enter_idle+0x22/0x24
 [] cpu_idle+0xa1/0xc5
 [] start_secondary+0x3b9/0x3c5

[PATCH 1/2] tulip: napi full quantum bug

2007-12-13 Thread Stephen Hemminger

This should fix the kernel warn/oops reported while routing.

The tulip driver has a fencepost bug with the new NAPI in 2.6.24:
it is off by one when a full quantum is reached.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- a/drivers/net/tulip/interrupt.c 2007-12-13 09:20:27.0 -0800
+++ b/drivers/net/tulip/interrupt.c 2007-12-13 09:23:34.0 -0800
@@ -151,7 +151,8 @@ int tulip_poll(struct napi_struct *napi,
if (tulip_debug > 5)
printk(KERN_DEBUG "%s: In tulip_rx(), entry %d %8.8x.\n",
   dev->name, entry, status);
-  if (work_done++ >= budget)
+
+  if (++work_done >= budget)
goto not_done;
 
if ((status & 0x38008300) != 0x0300) {
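
The off-by-one can be reproduced with a small user-space model of the loop. This is only a sketch: the two functions are hypothetical stand-ins for the RX loop in tulip_poll(), with "pending" playing the role of ready descriptors, and the budget check sits at the top of the loop as in the hunk above.

```c
/* Old check: post-increment, so the counter is bumped even on exit. */
static int poll_post_increment(int pending, int budget)
{
	int work_done = 0;

	while (pending > 0) {
		if (work_done++ >= budget)
			break;		/* corresponds to "goto not_done" */
		pending--;		/* "receive" one packet */
	}
	return work_done;
}

/* New check: pre-increment, so the reported value never exceeds budget. */
static int poll_pre_increment(int pending, int budget)
{
	int work_done = 0;

	while (pending > 0) {
		if (++work_done >= budget)
			break;		/* corresponds to "goto not_done" */
		pending--;		/* "receive" one packet */
	}
	return work_done;
}
```

With a budget of 16 and more than 16 packets pending, the old form returns 17, one more than the budget the core handed to the driver, while the new form returns exactly 16. When fewer packets than the budget are pending, both forms return the same count.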

[PATCH 2/2] tulip: hardware mitigation simplify

2007-12-13 Thread Stephen Hemminger

The hardware mitigation in tulip can be simplified.
1. The budget with new NAPI will always be less than RX_RING_SIZE
   because RX_RING_SIZE is 128 and weight is 16.
2. The received counter is redundant, just use the work_done value.
3. Only one value is used from the mit_table[].

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
This can wait for 2.6.25

--- a/drivers/net/tulip/interrupt.c 2007-12-13 09:26:35.0 -0800
+++ b/drivers/net/tulip/interrupt.c 2007-12-13 09:27:29.0 -0800
@@ -21,45 +21,6 @@
 int tulip_rx_copybreak;
 unsigned int tulip_max_interrupt_work;
 
-#ifdef CONFIG_TULIP_NAPI_HW_MITIGATION
-#define MIT_SIZE 15
-#define MIT_TABLE 15 /* We use 0 or max */
-
-static unsigned int mit_table[MIT_SIZE+1] =
-{
-/*  CRS11 21143 hardware Mitigation Control Interrupt
-We use only RX mitigation we other techniques for
-TX intr. mitigation.
-
-   31Cycle Size (timer control)
-   30:27 TX timer in 16 * Cycle size
-   26:24 TX No pkts before Int.
-   23:20 RX timer in Cycle size
-   19:17 RX No pkts before Int.
-   16   Continues Mode (CM)
-*/
-
-0x0, /* IM disabled */
-0x8015,  /* RX time = 1, RX pkts = 2, CM = 1 */
-0x8015,
-0x8027,
-0x8037,
-0x8049,
-0x8059,
-0x8069,
-0x807B,
-0x808B,
-0x809D,
-0x80AD,
-0x80BD,
-0x80CF,
-0x80DF,
-//   0x80FF  /* RX time = 16, RX pkts = 7, CM = 1 */
-0x80F1  /* RX time = 16, RX pkts = 0, CM = 1 */
-};
-#endif
-
-
 int tulip_refill_rx(struct net_device *dev)
 {
struct tulip_private *tp = netdev_priv(dev);
@@ -113,21 +74,10 @@ int tulip_poll(struct napi_struct *napi,
struct net_device *dev = tp->dev;
int entry = tp->cur_rx % RX_RING_SIZE;
int work_done = 0;
-#ifdef CONFIG_TULIP_NAPI_HW_MITIGATION
-   int received = 0;
-#endif
 
if (!netif_running(dev))
goto done;
 
-#ifdef CONFIG_TULIP_NAPI_HW_MITIGATION
-
-/* that one buffer is needed for mit activation; or might be a
-   bug in the ring buffer code; check later -- JHS*/
-
-if (budget >=RX_RING_SIZE) budget--;
-#endif
-
if (tulip_debug > 4)
printk(KERN_DEBUG " In tulip_rx(), entry %d %8.8x.\n", entry,
   tp->rx_ring[entry].status);
@@ -239,9 +189,6 @@ int tulip_poll(struct napi_struct *napi,
tp->stats.rx_packets++;
tp->stats.rx_bytes += pkt_len;
}
-#ifdef CONFIG_TULIP_NAPI_HW_MITIGATION
-  received++;
-#endif
 
entry = (++tp->cur_rx) % RX_RING_SIZE;
if (tp->cur_rx - tp->dirty_rx > RX_RING_SIZE/4)
@@ -279,14 +226,14 @@ done:
 ON:  More then 1 pkt received (per intr.) OR we are dropping
  OFF: Only 1 pkt received
 
- Note. We only use min and max (0, 15) settings from mit_table */
-
+ Note. We only use max or min values for mit */
 
   if( tp->flags &  HAS_INTR_MITIGATION) {
- if( received > 1 ) {
+ if( work_done > 1 ) {
  if( ! tp->mit_on ) {
  tp->mit_on = 1;
- iowrite32(mit_table[MIT_TABLE], tp->base_addr + CSR11);
+/* RX time = 16, RX pkts = 0, CM = 1 */
+ iowrite32(0x80F1, tp->base_addr + CSR11);
  }
   }
  else {

[PATCH] e1000: remove no longer used code for pci read/write cfg

2007-12-13 Thread Auke Kok

From: Adrian Bunk <[EMAIL PROTECTED]>

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 drivers/net/e1000/e1000_hw.h   |2 --
 drivers/net/e1000/e1000_main.c |   16 
 2 files changed, 0 insertions(+), 18 deletions(-)

diff --git a/drivers/net/e1000/e1000_hw.h b/drivers/net/e1000/e1000_hw.h
index a2a86c5..930554b 100644
--- a/drivers/net/e1000/e1000_hw.h
+++ b/drivers/net/e1000/e1000_hw.h
@@ -421,8 +421,6 @@ void e1000_tbi_adjust_stats(struct e1000_hw *hw, struct e1000_hw_stats *stats, u
 void e1000_get_bus_info(struct e1000_hw *hw);
 void e1000_pci_set_mwi(struct e1000_hw *hw);
 void e1000_pci_clear_mwi(struct e1000_hw *hw);
-void e1000_read_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t * value);
-void e1000_write_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t * value);
 int32_t e1000_read_pcie_cap_reg(struct e1000_hw *hw, uint32_t reg, uint16_t *value);
 void e1000_pcix_set_mmrbc(struct e1000_hw *hw, int mmrbc);
 int e1000_pcix_get_mmrbc(struct e1000_hw *hw);
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 724f067..efd8c2d 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -4866,22 +4866,6 @@ e1000_pci_clear_mwi(struct e1000_hw *hw)
pci_clear_mwi(adapter->pdev);
 }
 
-void
-e1000_read_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t *value)
-{
-   struct e1000_adapter *adapter = hw->back;
-
-   pci_read_config_word(adapter->pdev, reg, value);
-}
-
-void
-e1000_write_pci_cfg(struct e1000_hw *hw, uint32_t reg, uint16_t *value)
-{
-   struct e1000_adapter *adapter = hw->back;
-
-   pci_write_config_word(adapter->pdev, reg, *value);
-}
-
 int
 e1000_pcix_get_mmrbc(struct e1000_hw *hw)
 {


Re: [PATCHES 0/12]: DCCP patches for 2.6.25

2007-12-13 Thread David Miller

From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 13:05:53 -0200

>   Please consider pulling from:
> 
> master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25

Looks good, pulled and pushed back out to net-2.6.25

Thanks!

Re: 2.6.24-rc4-mm1 - BUG in tcp_fragment

2007-12-13 Thread Cedric Le Goater

Andrew Morton wrote:
> Temporarily at
> 
>   http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/
> 
> Will appear later at
> 
>   
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/

I got this one while compiling on NFS.

C.

kernel BUG at /home/legoater/linux/2.6.24-rc4-mm1/include/net/tcp.h:1480!
invalid opcode:  [1] SMP 
last sysfs file: /sys/devices/pci:00/:00:1e.0/:01:01.0/local_cpus
CPU 1 
Modules linked in: autofs4 nfs lockd sunrpc tg3 sg joydev ext3 jbd ehci_hcd 
ohci_hcd uhci_hcd
Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #3
RIP: 0010:[]  [] tcp_fragment+0x5ee/0x6f7
RSP: 0018:810147c9f9e0  EFLAGS: 00010217
RAX: 1526c311 RBX: 8100c2ce1d00 RCX: 810143cc6aa0
RDX: 0001 RSI: 810102b37b00 RDI: 810102b37b50
RBP: 810147c9fa50 R08: 004a R09: 0001
R10: 0b50 R11: 0001 R12: 81013a575700
R13:  R14: 810143cc6400 R15: 81013a575750
FS:  () GS:810147c57140() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 2ad5d294b000 CR3: bd11b000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process swapper (pid: 0, threadinfo 810147c98000, task 810147c89040)
Stack:  810147c9fa00  05a843cc6400 810143cc6400
 810147c9fa70 8100c2ce1d50 810143cc6590 810143cc6aa0
 15265421 810143cc6400 810143cc6400 81013a575700
Call Trace:
   [] tcp_retransmit_skb+0xd6/0x713
 [] tcp_xmit_retransmit_queue+0xd0/0x330
 [] tcp_fastretrans_alert+0xb92/0xbf2
 [] tcp_ack+0xdf3/0xfbe
 [] tcp_rcv_established+0x66a/0x76d
 [] tcp_v4_do_rcv+0x37/0x3aa
 [] tcp_v4_rcv+0x9a9/0xa76
 [] ip_local_deliver_finish+0x161/0x23c
 [] ip_local_deliver+0x72/0x77
 [] ip_rcv_finish+0x371/0x3b5
 [] ip_rcv+0x292/0x2c6
 [] netif_receive_skb+0x267/0x340
 [] :tg3:tg3_poll+0x5d2/0x89e
 [] net_rx_action+0xd5/0x1ad
 [] __do_softirq+0x5f/0xe3
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x39/0x9f
 [] irq_exit+0x4e/0x50
 [] do_IRQ+0xb7/0xd7
 [] mwait_idle+0x0/0x55
 [] ret_from_intr+0x0/0xf
   [] __atomic_notifier_call_chain+0x20/0x83
 [] mwait_idle+0x48/0x55
 [] enter_idle+0x22/0x24
 [] cpu_idle+0xa1/0xc5
 [] start_secondary+0x3b9/0x3c5


Code: 0f 0b eb fe 48 85 f6 74 08 8b 46 6c 3b 41 68 75 55 48 8d 41 
RIP  [] tcp_fragment+0x5ee/0x6f7
 RSP 
Kernel panic - not syncing: Aiee, killing interrupt handler!

Re: 2.6.24-rc5-mm1

2007-12-13 Thread David Miller

From: Benjamin Thery <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 16:01:34 +0100

> The problem comes from the new macro UDPX_INC_STATS_BH introduced
> by Herbert, which was a nice addition to increment the correct 
> UDP MIB depending on the socket family, but unfortunately 
> the use of this macro from kernel code (I mean code not compiled 
> as module) requires that IPv6 is also compiled in kernel 
> (CONFIG_IPV6=y) in order to have udp_stats_in6 defined at link
> time.

Herbert, please take a look at this, thanks!

> Benjamin
> 
> Pierre Peiffer wrote:
> > Hi,
> > 
> > My config does not link any more:
> > 
> > ...
> >   CHK include/linux/compile.h
> >   UPD include/linux/compile.h
> >   CC  init/version.o
> >   LD  init/built-in.o
> >   LD  .tmp_vmlinux1
> > net/built-in.o: In function `xs_udp_data_ready':
> > /home/peifferp/containers/kernel/linux-2.6.24-rc5-mm1/net/sunrpc/xprtsock.c:842:
> > undefined reference to `udp_stats_in6'
> > /home/peifferp/containers/kernel/linux-2.6.24-rc5-mm1/net/sunrpc/xprtsock.c:846:
> > undefined reference to `udp_stats_in6'
> > make[1]: *** [.tmp_vmlinux1] Error 1
> > make: *** [sub-make] Error 2
> > 
> > After a first look, udp_stats_in6 seems to be defined in ipv6 (file
> > net/ipv6/udp.c) but I have
> > 
> > CONFIG_IPV6=m
> > and
> > CONFIG_SUNRPC=y
> > 
> > So, SUNRPC uses something defined in a module in my case ?
> > 
> > ... looking more, this dependency seems to have been introduced by the patch
> > [UDP]: Restore missing inDatagrams increments
> > ( http://thread.gmane.org/gmane.linux.network/79716/focus=79831 )
> > 
> > (I cc netdev)
> > 
> > I don't know what is the right way to fix this ... ?
> > 
> > P.
> > Andrew Morton wrote:
> >> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc5/2.6.24-rc5-mm1/
> >>
> >> - If something goes wrong with a PCI device's probing or initialisation, 
> >> try
> >>   reverting pci-disable-decoding-during-sizing-of-bars.patch.
> >>
> >> - git-sched was dropped due to breaking suspend-to-RAM.
> >>
> >> - git-block has been restored after having had a few problems
> >>
> >> - git-newsetup.patch was dropped due to conflicts with git-x86
> >>
> >> - git-perfmon.patch is still dropped for the same reason
> >>
> >> - git-kgdb.patch is still dropped for the same reason
> >>
> >> - Please do try to cc the correct developer and mailing list when
> >>   reporting problems - I'm just buried in bugs over here.
> >>
> >>
> >>
> >> Boilerplate:
> >>
> >> - See the `hot-fixes' directory for any important updates to this patchset.
> >>
> >> - To fetch an -mm tree using git, use (for example)
> >>
> >>   git-fetch 
> >> git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag 
> >> v2.6.16-rc2-mm1
> >>   git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1
> >>
> >> - -mm kernel commit activity can be reviewed by subscribing to the
> >>   mm-commits mailing list.
> >>
> >> echo "subscribe mm-commits" | mail [EMAIL PROTECTED]
> >>
> >> - If you hit a bug in -mm and it is not obvious which patch caused it, it 
> >> is
> >>   most valuable if you can perform a bisection search to identify which 
> >> patch
> >>   introduced the bug.  Instructions for this process are at
> >>
> >> 
> >> http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
> >>
> >>   But beware that this process takes some time (around ten rebuilds and
> >>   reboots), so consider reporting the bug first and if we cannot 
> >> immediately
> >>   identify the faulty patch, then perform the bisection search.
> >>
> >> - When reporting bugs, please try to Cc: the relevant maintainer and 
> >> mailing
> >>   list on any email.
> >>
> >> - When reporting bugs in this kernel via email, please also rewrite the
> >>   email Subject: in some manner to reflect the nature of the bug.  Some
> >>   developers filter by Subject: when looking for messages to read.
> >>
> >> - Occasional snapshots of the -mm lineup are uploaded to
> >>   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced 
> >> on
> >>   the mm-commits list.  These probably are at least compilable.
> >>
> >> - More-than-daily -mm snapshots may be found at
> >>   http://userweb.kernel.org/~akpm/mmotm/.  These are almost certainly not
> >>   compileable.
> >>
> >>
> >>
> >> Changes since 2.6.24-rc4-mm1:
> >>
> >>
> >>  origin.patch
> >>  git-acpi.patch
> >>  git-alsa.patch
> >>  git-agpgart.patch
> >>  git-arm.patch
> >>  git-arm-master.patch
> >>  git-avr32.patch
> >>  git-cpufreq.patch
> >>  git-powerpc.patch
> >>  git-drm.patch
> >>  git-dvb.patch
> >>  git-hwmon.patch
> >>  git-gfs2-nmw.patch
> >>  git-hid.patch
> >>  git-hrt.patch
> >>  git-ieee1394.patch
> >>  git-infiniband.patch
> >>  git-input.patch
> >>  git-jfs.patch
> >>  git-kbuild.patch
> >>  git-kvm.patch
> >>  git-lblnet.patch
> >>  git-leds.patch
> >>  git-libata-all.patch
> >>  git-md-accel.patch
> >>  git-mips.patch
> >>  git-mmc.patch
> >>  git-mtd.

Re: [patch 2/2] ipv6/sit: Rebinding of SIT tunnels to other interfaces

2007-12-13 Thread David Miller

From: [EMAIL PROTECTED]
Date: Thu, 13 Dec 2007 14:37:50 +0100

> This is similar to the change already done for IPIP tunnels.
> 
> Once created, a SIT tunnel can't be bound to another device.
> To reproduce:
> 
> # create a tunnel:
> ip tunnel add tunneltest0 mode sit remote 10.0.0.1 dev eth0
> # try to change the bounding device from eth0 to eth1:
> ip tunnel change tunneltest0 dev eth1
> # show the result:
> ip tunnel show tunneltest0
> 
> tunneltest0: ipv6/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit
> 
> Notice the bound device has not changed from eth0 to eth1.
> 
> This patch fixes it. When changing the binding, it also recalculates the
> MTU according to the new bound device's MTU.
> 
> Signed-off-by: Michal Schmidt <[EMAIL PROTECTED]>

Also applied, thanks.

Re: [RFC] net: napi fix

2007-12-13 Thread Stephen Hemminger

On Thu, 13 Dec 2007 06:19:38 -0800 (PST)
David Miller <[EMAIL PROTECTED]> wrote:

> From: Andrew Gallatin <[EMAIL PROTECTED]>
> Date: Thu, 13 Dec 2007 09:13:54 -0500
> 
> > If the netif_running() check is indeed required to make a device break
> > out of napi polling and respond to an ifconfig down, then I think the
> > netif_running() check should be moved up into net_rx_action() to avoid
> > potential for driver complexity and bugs like the ones you found.
> 
> That, or something like it, definitely sounds reasonable and much
> better than putting the check into every driver :-)

It is not possible to do netif_running() check in generic code as currently
written because of the case of devices where a single NAPI object is
being used to handle two devices. The association between napi and netdevice
is M to N.  There are cases like niu that have multiple NAPI's and one
netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's.

The existing pointer from napi to netdevice is only used by netconsole
now. For devices like sky2 it means that netconsole can't work on the
second port, which is not a big problem. But adding a netif_running()
check would be a big issue.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

Re: [kvm-devel] [PATCH resent] virtio_net: Fix stalled inbound traffic on early packets

2007-12-13 Thread Christian Borntraeger

On Thursday, 13 December 2007, Dor Laor wrote:
> You're right I got confused somehow.
> So in that case setting the driver status field on open in addition to 
> your enable will do the trick.
> On DRIVER_OPEN the host will trigger an interrupt if the queue is not 
> empty..
> Thanks,
> Dor

After looking into some other drivers, I prefer my first patch (moving 
napi_enable) ;-)

There are some drivers like xen-netfront, b44, which call napi_enable before 
the buffers are passed to the hardware. So it seems that moving napi is 
also a valid option.

But maybe I can just wait until Rusty returns from vacation (I will leave 
next week) so everything might be wonderful when I return ;-)

Rusty, if you decide to apply my patch, there is one downside: The debugging 
code in virtio_ring sometimes triggers with a false positive:

try_fill_recv calls vring_kick. Here we do a notify to the host. This might 
cause an immediate interrupt, triggering the poll routine before vring_kick 
can run END_USE. This is no real problem, as we don't touch the vq after 
notify.

Christian, facing 64 guest cpus uncovering all kinds of races


Re: reading the tcp headers within the write queue

2007-12-13 Thread David Miller

From: Gavin McCullagh <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 18:56:26 +

> I'm trying to hack together something which will run through the
> retransmit queue looking at the tcp headers.

The packets in the retransmit queue are headerless, the
header only gets added to clones of the retransmit queue
frames during the actual transmit.

And this question belongs on netdev not linux-net.

Re: reading the tcp headers within the write queue

2007-12-13 Thread Gavin McCullagh

Hi,

thanks for the swift reply.

On Thu, 13 Dec 2007, David Miller wrote:

> > I'm trying to hack together something which will run through the
> > retransmit queue looking at the tcp headers.
> 
> The packets in the retransmit queue are headerless, the
> header only gets added to clones of the retransmit queue
> frames during the actual transmit.

Thought that might be it. I presume there isn't any other residue of the
tcp options elsewhere, that one could look at when the packet gets
acknowledged?  I'm particularly interested in the timestamp.

> And this question belongs on netdev not linux-net.

Oops, sorry.

Gavin


Re: [RFC] net: napi fix

2007-12-13 Thread David Miller

From: Andrew Gallatin <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 14:02:25 -0500

> Or perhaps we should just leave things as is.

We should probably add a "disabling" state bit to the
napi struct flags, this will be set by napi_disable()
before it loops trying to set the sched bit.

net_rx_action() can then check this.
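
A rough user-space sketch of that idea, using C11 atomics in place of the kernel bit operations. Everything here is an assumption made for illustration: the bit names, the struct, and the helper functions are hypothetical and do not correspond to an existing kernel interface.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NAPI_SCHED	(1u << 0)	/* poll scheduled or running */
#define MODEL_NAPI_DISABLE	(1u << 1)	/* hypothetical "disabling" bit */

/* Hypothetical stand-in for the state word in the napi struct. */
struct napi_model {
	atomic_uint state;
};

/* First step of a disable: announce it before waiting for the sched bit. */
static void model_disable_begin(struct napi_model *n)
{
	atomic_fetch_or(&n->state, MODEL_NAPI_DISABLE);
}

/*
 * Decision the generic RX loop would make after a poll call returns:
 * re-poll only if the full weight was used and nobody is disabling.
 */
static bool model_should_repoll(struct napi_model *n, int work, int weight)
{
	if (work < weight)
		return false;
	return !(atomic_load(&n->state) & MODEL_NAPI_DISABLE);
}
```

The point of the design is that the check lives in one place and does not need to know which netdevice, or how many, the NAPI instance serves.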

Re: [RFC] net: napi fix

2007-12-13 Thread Andrew Gallatin

Stephen Hemminger wrote:

> On Thu, 13 Dec 2007 06:19:38 -0800 (PST)
> David Miller <[EMAIL PROTECTED]> wrote:
>
>> From: Andrew Gallatin <[EMAIL PROTECTED]>
>> Date: Thu, 13 Dec 2007 09:13:54 -0500
>>
>>> If the netif_running() check is indeed required to make a device break
>>> out of napi polling and respond to an ifconfig down, then I think the
>>> netif_running() check should be moved up into net_rx_action() to avoid
>>> potential for driver complexity and bugs like the ones you found.
>>
>> That, or something like it, definitely sounds reasonable and much
>> better than putting the check into every driver :-)
>
> It is not possible to do netif_running() check in generic code as currently
> written because of the case of devices where a single NAPI object is
> being used to handle two devices. The association between napi and netdevice
> is M to N.  There are cases like niu that have multiple NAPI's and one
> netdevice; and devices like sky2 that can have one NAPI and 2 netdevice's.

Ah, now I see.  I forgot that not every device has a 1:1::napi:netdev
relationship.

Could we make an optional *dev_state field in the napi structure?
It would be initialized to __LINK_STATE_START.  Devices which have
a 1:1 NAPI:netdevice relationship would set it to &netdev->state.
The generic code would then do a
test_bit(__LINK_STATE_START, napi->dev_state), and 1:1 drivers could
remove this check.

M:N drivers would pay for a useless (to them) test_bit, and would
have to provide their own netif_running check to get termination
under heavy load.
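
A userspace sketch of that idea, under one explicit assumption: "initialized to __LINK_STATE_START" is read here as "points by default at a word whose START bit is always set". All names ending in _model, and MODEL_LINK_STATE_START itself, are hypothetical stand-ins rather than kernel definitions.

```c
#include <stdbool.h>

#define MODEL_LINK_STATE_START 0	/* stand-in for __LINK_STATE_START */

struct netdev_model {
	unsigned long state;
};

struct napi_model {
	const unsigned long *dev_state;
};

/* Default target: a word whose START bit is always set. */
static const unsigned long model_always_running =
	1UL << MODEL_LINK_STATE_START;

static void napi_init_model(struct napi_model *n)
{
	n->dev_state = &model_always_running;
}

/* 1:1 drivers opt in by pointing at their device's state word. */
static void napi_bind_model(struct napi_model *n,
			    const struct netdev_model *dev)
{
	n->dev_state = &dev->state;
}

/* The single check the generic code would perform. */
static bool napi_dev_running_model(const struct napi_model *n)
{
	return (*n->dev_state >> MODEL_LINK_STATE_START) & 1UL;
}
```

An M:N driver that never binds keeps seeing "running" from the generic check, which matches the cost described above: a test that is useless to it, plus its own termination logic.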

Just an idea, perhaps there is a better way which is less hacky.

Or perhaps we should just leave things as is.

Drew

Re: reading the tcp headers within the write queue

2007-12-13 Thread David Miller

From: Gavin McCullagh <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 19:08:59 +

> Thought that might be it. I presume there isn't any other residue of the
> tcp options elsewhere, that one could look at when the packet gets
> acknowledged?  I'm particularly interested in the timestamp.

Every time we transmit, the timestamp will be different.

We store the jiffies at transmit time in TCP_SKB_CB(skb)->when,
so you can use that.  This is the value we use to compute the
timestamp.
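
The pattern described here, stamping a per-packet control block at transmit time and reading it back later, can be illustrated in userspace. This is a sketch only: struct skb_model, struct tcp_cb_model and TCP_CB_MODEL are hypothetical, a union replaces the kernel's cast of the cb[] byte array, and a plain 32-bit tick counter stands in for jiffies.

```c
#include <stdint.h>

/* Hypothetical per-packet control data, kept inside the buffer itself. */
struct tcp_cb_model {
	uint32_t seq;
	uint32_t when;	/* clock tick at (re)transmit time */
};

struct skb_model {
	union {
		char raw[48];			/* opaque scratch area */
		struct tcp_cb_model tcp;	/* TCP's view of it */
	} cb;
};

#define TCP_CB_MODEL(skb) (&(skb)->cb.tcp)

/* Called on every (re)transmission, so the stamp changes each time. */
static void stamp_on_transmit_model(struct skb_model *skb, uint32_t now)
{
	TCP_CB_MODEL(skb)->when = now;
}

/* Unsigned subtraction stays correct across counter wraparound. */
static uint32_t ticks_since_transmit_model(const struct skb_model *skb,
					   uint32_t now)
{
	return now - TCP_CB_MODEL(skb)->when;
}
```

Because the stamp is overwritten on each retransmission, a reader at ACK time sees only the most recent transmit time.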


[PATCH] [RFC] New driver "sfc" for Solarstorm SFC4000 controller - 2nd try

2007-12-13 Thread Robert Stonehouse

This is a resubmission of a new driver for Solarflare network controllers.

The driver supports many types of PHY (10Gbase-T, XFP, CX4) on five
different 10G reference designs and one 1G NIC ref design.

The previous thread was:
  "[PATCH] [RFC] New driver "sfc" for Solarstorm SFC4000 controller"
  http://marc.info/?l=linux-netdev&m=119583775622559&w=2


Since the 1st patch we have addressed the review comments we received
 - removed usage of __LINK_STATE_START as unnecessary with latest NAPI
 - cleaned up many checkpatch violations
 - used the generic drv_* logging
   (macros were kept so that the network device name can be printed
   consistently in all messaging but these can be dissolved if necessary)
 - Reduced over-use of docbook type comments
 - replaced uintN_t with uN types
 - merged some small headers to reduce the file count

It is still quite a large driver at ~25k LOC. The main body of the driver
is within efx.c, falcon.c, tx.c and rx.c if this helps direct review effort.

We welcome more review comments and will try and respond to them more
quickly than last time.

The patch (against net-2.6.25) is at:
https://support.solarflare.com/netdev/2/net-2.6.25-sfc-2.2.0026.patch

The new files may also be downloaded as a tarball:
https://support.solarflare.com/netdev/2/net-2.6.25-sfc-2.2.0026.tgz

And for verification there is:
https://support.solarflare.com/netdev/2/MD5SUMS


Regards

-- 
Rob Stonehouse

Re: [RFC] net: napi fix

2007-12-13 Thread Stephen Hemminger

David Miller wrote:

From: Andrew Gallatin <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 14:02:25 -0500

Or perhaps we should just leave things as is.

We should probably add a "disabling" state bit to the
napi struct flags, this will be set by napi_disable()
before it loops trying to set the sched bit.

net_rx_action() can then check this.

How about allowing a return value of -1 from napi_poll and letting the device
check itself?

Re: 2.6.24-rc5-mm1 regression - kernel warning on tcp_fastretrans_alert()

2007-12-13 Thread Andrew Morton

On Thu, 13 Dec 2007 20:26:21 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:

> Hi Andrew,

Hi.  Please do try to cc netdev@vger.kernel.org on net-related problems. 
Doing so will often save multiple hours latency and will optimise away one
entire email (ie: this one).

> Following call trace is seen in 2.6.24-rc5-mm1 kernel also,it was reported
> for 2.6.24-rc4-mm1 kernel http://lkml.org/lkml/2007/12/6/22
> 
> ls21b kernel: [ 7530.313408] WARNING: at net/ipv4/tcp_input.c:2533 tcp_fastretrans_alert()
> ls21b kernel: [ 7530.354051] Pid: 0, comm: swapper Not tainted 2.6.24-rc5-mm1 #1
> ls21b kernel: [ 7530.389487]
> ls21b kernel: [ 7530.389488] Call Trace:
> ls21b kernel: [ 7530.413030]  [] tcp_fastretrans_alert+0x127/0xdaf
> ls21b kernel: [ 7530.454295]  [] tcp_ack+0xf2f/0x10fe
> ls21b kernel: [ 7530.485066]  [] tcp_rcv_established+0x695/0x79a
> ls21b kernel: [ 7530.521542]  [] trace_hardirqs_off+0x39/0xdc
> ls21b kernel: [ 7530.556468]  [] tcp_v4_do_rcv+0x37/0x3e1
> ls21b kernel: [ 7530.589317]  [] tcp_v4_rcv+0xac7/0xb93
> ls21b kernel: [ 7530.621126]  [] ip_local_deliver_finish+0x54/0x20f
> ls21b kernel: [ 7530.659168]  [] ip_local_deliver_finish+0x134/0x20f
> ls21b kernel: [ 7530.697724]  [] ip_local_deliver+0x72/0x7a
> ls21b kernel: [ 7530.731609]  [] ip_rcv_finish+0x3c0/0x430
> ls21b kernel: [ 7530.764977]  [] netif_receive_skb+0x10e/0x44d
> ls21b kernel: [ 7530.800422]  [] ip_rcv+0x326/0x35d
> ls21b kernel: [ 7530.830148]  [] netif_receive_skb+0x3df/0x44d
> ls21b kernel: [ 7530.865603]  [] :bnx2:bnx2_poll+0x1262/0x14a4
> ls21b kernel: [ 7530.901039]  [] __next_cpu+0x19/0x28
> ls21b kernel: [ 7530.931805]  [] find_busiest_group+0x252/0x6da
> ls21b kernel: [ 7530.967768]  [] trace_hardirqs_off+0x39/0xdc
> ls21b kernel: [ 7531.002693]  [] trace_hardirqs_off+0x39/0xdc
> ls21b kernel: [ 7531.037612]  [] check_chain_key+0x9c/0x15f
> ls21b kernel: [ 7531.071501]  [] __lock_acquire+0xdee/0xf06
> ls21b kernel: [ 7531.105386]  [] net_rx_action+0x75/0x234
> ls21b kernel: [ 7531.138233]  [] net_rx_action+0x75/0x234
> ls21b kernel: [ 7531.171074]  [] net_rx_action+0xec/0x234
> ls21b kernel: [ 7531.203920]  [] __do_softirq+0x5f/0xe3
> ls21b kernel: [ 7531.235721]  [] call_softirq+0x1c/0x28
> ls21b kernel: [ 7531.267528]  [] do_softirq+0x45/0x108
> ls21b kernel: [ 7531.298811]  [] irq_exit+0x4e/0x50
> ls21b kernel: [ 7531.328540]  [] do_IRQ+0x171/0x194
> ls21b kernel: [ 7531.358267]  [] ret_from_intr+0x0/0xf
> ls21b kernel: [ 7531.389549]  [] default_idle+0x58/0x8a
> ls21b kernel: [ 7531.425096]  [] default_idle+0x56/0x8a
> ls21b kernel: [ 7531.456900]  [] default_idle+0x0/0x8a
> ls21b kernel: [ 7531.488186]  [] cpu_idle+0xb5/0xec
> ls21b kernel: [ 7531.517913]  [] start_secondary+0x3ca/0x3da
> 

That is

	if (WARN_ON(!tp->sacked_out && tp->fackets_out))
		tp->fackets_out = 0;


Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski

David Miller wrote, On 12/13/2007 02:50 PM:

> From: Jarek Poplawski <[EMAIL PROTECTED]>
> Date: Thu, 13 Dec 2007 14:49:53 +0100
> 
>> As a matter of fact, since it's "unlikely()" in net_rx_action() anyway,
>> I wonder what is the main reason or gain of leaving such a tricky
>> exception, instead of letting drivers to always decide which is the
>> best moment for napi_complete()? (Or maybe even, in such a case, they
>> should call some function with this list_move_tail() if it's so
>> useful?)
> 
> It is the only sane way to synchronize the list manipulations.
> 
> There has to be a way for ->poll() to tell net_rx_action() two things:
> 
> 1) How much work was completed, so we can adjust 'budget'


The 'budget' line would stay where it is. IMHO, it's only about this
list_move_tail(). (Probably also doing netpoll_poll_unlock()
during n->poll() could be considered to let the driver even destroy
napi just after napi_complete() - but it's another subject.)

> 2) Was the NAPI quota exhausted?  So that we know that
>net_rx_action() still "owns" the polling context and
>thus can do the list manipulation safely.
> 
> And these both need to be encoded into one single return value, thus
> the adopted convention that "work == weight" means that the device has
> not done a NAPI complete.
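
The convention quoted above can be sketched as a small userspace model. It is an illustration only: struct napi_model, model_poll() and model_rx_action() are hypothetical names, the poll list is a plain array, and interrupts and locking are left out.

```c
#include <stddef.h>

struct napi_model {
	int weight;	/* per-poll quota */
	int pending;	/* packets waiting in the device */
	int scheduled;	/* 1 while the instance is on the poll list */
};

/*
 * Driver side: handle at most 'weight' packets. Returning less than
 * 'weight' means the driver has completed (left the list) by itself.
 */
static int model_poll(struct napi_model *n)
{
	int work = n->pending < n->weight ? n->pending : n->weight;

	n->pending -= work;
	if (work < n->weight)
		n->scheduled = 0;	/* models napi_complete() */
	return work;
}

/* Core side: returns the budget that is left when it stops. */
static int model_rx_action(struct napi_model *list[], size_t count,
			   int budget)
{
	size_t live = count;

	while (live > 0 && budget > 0) {
		struct napi_model *cur = list[0];
		int work = model_poll(cur);
		size_t i;

		budget -= work;		/* 1) account for completed work */

		for (i = 1; i < live; i++)
			list[i - 1] = list[i];

		if (work == cur->weight)
			list[live - 1] = cur;	/* 2) still owned: move to tail */
		else
			live--;			/* driver completed: hands off */
	}
	return budget;
}
```

The single integer return value carries both facts: how much to subtract from the budget, and whether the core may still touch the list entry.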

Of course, with some care and explanations to driver maintainers, like in
this case, this all should probably work like it is. But IMHO it would be
easier to remember and maintain if there are some simple rules with no
exceptions, so here e.g. driver always "owns" (with functions like
napi_schedule(), napi_complete(), and maybe napi_move_tail()), and
net_rx_action() only reads the list and runs these functions?!

I see in a nearby thread you would prefer to save some work to drivers
(like this netif_running() check), but I think this all is at the cost
of flexibility, and there will probably appear new problems, when a
driver simply can't wait till the next poll (which btw. looks strange
with all these hotplugging, usb and powersaving).

Regards,
Jarek P.

Re: [RFC] net: napi fix

2007-12-13 Thread David Miller

From: Jarek Poplawski <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 21:16:12 +0100

> I see in a nearby thread you would prefer to save some work to drivers
> (like this netif_running() check), but I think this all is at the cost
> of flexibility, and there will probably appear new problems, when a
> driver simply can't wait till the next poll (which btw. looks strange
> with all these hotplugging, usb and powersaving).

As someone who has actually had to edit the NAPI support of _EVERY_
single driver in the tree I can tell you that code duplication and
subtle semantic differences are a huge issue.

And when you talk about driver flexibility, it's wise to mention that
this comes at the expense of flexibility in the core implementation.
For example, if we export the list handling widget into the ->poll()
routines, god help the person who wants to change how the poll list is
managed in net_rx_action() :-/

So we don't want to export datastructure details like that to the
driver.

Re: [RFC] net: napi fix

2007-12-13 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 11:35:07 -0800

> How about allowing a return value of -1 from napi_poll and letting
> device check itself.

It doesn't avoid the code duplication in the ->poll() fast paths.

I don't care, on the other hand, if crap accumulates in non-critical
slow paths like napi_disable() and dev_close().  That's why I'm
suggesting solutions in that area.

Re: [RFC] net: napi fix

2007-12-13 Thread Stephen Hemminger

David Miller wrote:

From: Jarek Poplawski <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 21:16:12 +0100

I see in a nearby thread you would prefer to save some work to drivers
(like this netif_running() check), but I think this all is at the cost
of flexibility, and there will probably appear new problems, when a
driver simply can't wait till the next poll (which btw. looks strange
with all these hotplugging, usb and powersaving).

As someone who has actually had to edit the NAPI support of _EVERY_
single driver in the tree I can tell you that code duplication and
subtle semantic differences are a huge issue.

And when you talk about driver flexibility, it's wise to mention that
this comes at the expense of flexibility in the core implementation.
For example, if we export the list handling widget into the ->poll()
routines, god help the person who wants to change how the poll list is
managed in net_rx_action() :-/

So we don't want to export datastructure details like that to the
driver.

Also, most of the drivers should/could be doing the same thing. It seems
that driver writers just want to get "creative" and do things differently.
The code is cleaner, safer, and less buggy if every device uses the
interface in the same way.

When I did the initial pass on this, I didn't see a single variation on
NAPI usage that was better than the simple "get N packets and return"
variation.  But Dave did way more detailed grunt work on this.

Re: [3/4] DST: Network state machine.

2007-12-13 Thread Dmitry Monakhov

On 14:47 Mon 10 Dec , Evgeniy Polyakov wrote:
> 
> Network state machine.
> 
> Includes network async processing state machine and related tasks.
Hi, I've tried to play a little bit with DST and discovered a huge memory
leak. Every read request from a remote node results in a leak of the bio
and the bio's pages.

Data flow:
->kst_export_ready                ## prepare and submit bio
  ->generic_make_request(bio)     ## submit it

->kst_export_read_end_io          ## block layer calls bio_end_io callback

->kst_thread_process_state        ## process ready requests
  ->kst_data_callback
    ->kst_data_process_bio        ## submit pages to network layer
  ->kst_complete_req
    ->kst_bio_endio
      ->kst_export_read_end_io    ## WoW, we are calling the same bio_end_io
                                  ## callback twice
    ->dst_free_request(req);      ## request will be destroyed but its bio
                                  ## and all the bio's pages aren't released.
We may release the bio's pages after they have been sent to the network; it is
safe because sendpage() has already called get_page(). I've attached a simple
patch which does this.
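
The reference-counting argument can be illustrated with a toy model. This is a sketch under stated assumptions, not the attached patch: struct page_model and all *_model helpers are hypothetical, and "freed" merely records that the last reference was dropped.

```c
#include <stdbool.h>

struct page_model {
	int refcount;
	bool freed;
};

static void get_page_model(struct page_model *p)
{
	p->refcount++;
}

static void put_page_model(struct page_model *p)
{
	if (--p->refcount == 0)
		p->freed = true;
}

/* The network layer takes its own reference before queueing the page. */
static void sendpage_model(struct page_model *p)
{
	get_page_model(p);
}

/* Called once the network layer is done with the page. */
static void tx_done_model(struct page_model *p)
{
	put_page_model(p);
}
```

If the submitter drops its reference right after sendpage_model(), the page survives until tx_done_model() drops the last one, which is why releasing early is safe in this model.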
> 
> Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>
> 
> 
> diff --git a/drivers/block/dst/kst.c b/drivers/block/dst/kst.c
> new file mode 100644
> index 000..8fa3387
> --- /dev/null
> +++ b/drivers/block/dst/kst.c
> @@ -0,0 +1,1513 @@
> +/*
> + * 2007+ Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]>
> + * All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +struct kst_poll_helper
> +{
> + poll_table  pt;
> + struct kst_state*st;
> +};
> +
> +static LIST_HEAD(kst_worker_list);
> +static DEFINE_MUTEX(kst_worker_mutex);
> +
> +/*
> + * This function creates bound socket for local export node.
> + */
> +static int kst_sock_create(struct kst_state *st, struct saddr *addr,
> + int type, int proto, int backlog)
> +{
> + int err;
> +
> + err = sock_create(addr->sa_family, type, proto, &st->socket);
> + if (err)
> + goto err_out_exit;
> +
> + err = st->socket->ops->bind(st->socket, (struct sockaddr *)addr,
> + addr->sa_data_len);
> +
> + err = st->socket->ops->listen(st->socket, backlog);
> + if (err)
> + goto err_out_release;
> +
> + st->socket->sk->sk_allocation = GFP_NOIO;
> +
> + return 0;
> +
> +err_out_release:
> + sock_release(st->socket);
> +err_out_exit:
> + return err;
> +}
> +
> +static void kst_sock_release(struct kst_state *st)
> +{
> + if (st->socket) {
> + sock_release(st->socket);
> + st->socket = NULL;
> + }
> +}
> +
> +void kst_wake(struct kst_state *st)
> +{
> + if (st) {
> + struct kst_worker *w = st->node->w;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&w->ready_lock, flags);
> + if (list_empty(&st->ready_entry))
> + list_add_tail(&st->ready_entry, &w->ready_list);
> + spin_unlock_irqrestore(&w->ready_lock, flags);
> +
> + wake_up(&w->wait);
> + }
> +}
> +EXPORT_SYMBOL_GPL(kst_wake);
> +
> +/*
> + * Polling machinery.
> + */
> +static int kst_state_wake_callback(wait_queue_t *wait, unsigned mode,
> + int sync, void *key)
> +{
> + struct kst_state *st = container_of(wait, struct kst_state, wait);
> + kst_wake(st);
> + return 1;
> +}
> +
> +static void kst_queue_func(struct file *file, wait_queue_head_t *whead,
> +  poll_table *pt)
> +{
> + struct kst_state *st = container_of(pt, struct kst_poll_helper, pt)->st;
> +
> + st->whead = whead;
> + init_waitqueue_func_entry(&st->wait, kst_state_wake_callback);
> + add_wait_queue(whead, &st->wait);
> +}
> +
> +static void kst_poll_exit(struct kst_state *st)
> +{
> + if (st->whead) {
> + remove_wait_queue(st->whead, &st->wait);
> + st->whead = NULL;
> + }
> +}
> +
> +/*
> + * This function removes request from state tree and ordering list.
> + */
> +void kst_del_req(struct dst_request *req)
> +{
> + list_del_init(&req->request_list_entry);
> +}
> +EXPORT_SYMBOL_GPL(kst_del_req);
> +
> +static struct dst_request *kst_req_first(struct kst_state *st)
> +{
> + struct dst_request *req = NULL;
> +
> + if (!list_empty(&st->request_

What was the reason for 2.6.22 SMP kernels to change how sendmsg is called?

2007-12-13 Thread Kevin Wilson

In SMP kernels 2.6.21 and prior you could use a SOCK's sendmsg() call via the 
PROTO structure directly. e.g., sock->sk_prot->sendmsg().

Now in 2.6.22 and later kernels you must use the higher level SOCKET to make a 
call to PROTO_OPS then to sendmsg(). e.g., socket->ops->sendmsg().

Would someone please clue me in as to what source changes caused previously 
working driver code to go belly up? (ref original post below) I tried finding 
it in git but I don't think this was intentional but rather a side effect of 
some other change made between .21 & .22.

The 2nd method fixes the kernel oops I reported. Thanks to all those that 
assisted me (0) with my first post to this list (see below) ... uh, oh yeah, 
did I mention that number would tally up to ZERO people. ;-}

Thanks,

Kevin

***
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Kevin Wilson
Sent: Tuesday, December 11, 2007 16:51
To: netdev@vger.kernel.org
Subject: sk_prot->sendmsg(...) giving me a kernel oops on kernels past
2.6.20


I've searched everywhere (including this list) for a report but couldn't
find anything that can tell me what implementation step(s) I am missing.
Essentially all the kernels past 2.6.20 give me a kernel oops in the
sendmsg code.

I know the sk_buff changes worked things over quite a bit but I have (or
thought I did) those changes accounted for in our driver. Are there any
specific implementation changes we need to add that we didn't need prior
before we now use sendmsg()? Below is the general gist of the code I am
executing, any pointers to what I am missing would be very welcome.

Thanks a bunch.

Kevin

***
done in userland and passed via ioctl:
***

sk = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

tcp_params.boxid = boxIndex;
tcp_params.sk = sk;
tcp_params.rbuf = malloc(4096);
tcp_params.rbuf_size = 4096;
tcp_params.wbuf = malloc(1460);
tcp_params.wbuf_size = 1460;

rc = ioctl(fd, SI_SET_TCP, &tcp_params);

***
done in kernel space:
***

struct tcp_sock *tp;
struct sock *mysock;

tp = sockfd_lookup(tcp_params.sk, &i);

mysock = tp->sk;

mysock->sk_prot->sendmsg(...);  <-- OOPS HERE


Re: What was the reason for 2.6.22 SMP kernels to change how sendmsg is called?

2007-12-13 Thread David Miller

From: "Kevin Wilson" <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 14:54:31 -0600

> The 2nd method fixes the kernel oops I reported. Thanks to all those
> that assisted me (0) with my first post to this list (see below)
> ... uh, oh yeah, did I mention that number would tally up to ZERO
> people. ;-}

Because the knowledgeable folks here have zero interest in helping
people with out-of-tree-likely-binary-only modules, because in such
cases you have our code but we don't have yours.

Have a nice day.

Re: [RFC] mac80211: clean up frame receive handling

2007-12-13 Thread John W. Linville

On Wed, Dec 12, 2007 at 07:24:04PM +0100, Johannes Berg wrote:

> @@ -1014,6 +992,24 @@ ieee80211_drop_unencrypted(struct ieee80
>   return 0;
>  }
>  
> +static bool ieee80211_frame_allowed(struct ieee80211_txrx_data *rx)
> +{
> + static const u8 pae_group_addr[ETH_ALEN]
> + = { 0x01, 0x80, 0xC2, 0x00, 0x00, 0x03 };
> + struct ethhdr *ehdr = (struct ethhdr *)rx->skb->data;
> +
> + if (rx->skb->protocol == htons(ETH_P_PAE) &&
> + (compare_ether_addr(ehdr->h_dest, pae_group_addr) == 0 ||
> +  compare_ether_addr(ehdr->h_dest, rx->dev->dev_addr) == 0))
> + return true;

Should you reverse these two compare_ether_addr calls?
rx->dev->dev_addr seems more likely for any given packet.  It probably
makes little difference but it seems like checking for that first
would still be better.
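
A userspace sketch of the suggested ordering, relying only on the short-circuit evaluation of ||. The helper names and MODEL_ETH_ALEN are hypothetical, and memcmp() stands in for compare_ether_addr().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETH_ALEN 6

static bool addr_equal_model(const uint8_t *a, const uint8_t *b)
{
	return memcmp(a, b, MODEL_ETH_ALEN) == 0;
}

/*
 * Destination check for PAE frames: test the device's own address
 * first, since it is assumed to be the common case, and fall back to
 * the PAE group address only if that comparison fails.
 */
static bool pae_dest_allowed_model(const uint8_t dest[MODEL_ETH_ALEN],
				   const uint8_t own[MODEL_ETH_ALEN])
{
	static const uint8_t pae_group_addr[MODEL_ETH_ALEN] =
		{ 0x01, 0x80, 0xC2, 0x00, 0x00, 0x03 };

	return addr_equal_model(dest, own) ||
	       addr_equal_model(dest, pae_group_addr);
}
```

The result is identical for either order; only the number of comparisons on the common path changes.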

John
-- 
John W. Linville
[EMAIL PROTECTED]

Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski

Stephen Hemminger wrote, On 12/13/2007 09:41 PM:

> David Miller wrote:
>> From: Jarek Poplawski <[EMAIL PROTECTED]>
>> Date: Thu, 13 Dec 2007 21:16:12 +0100
>>
>>   
>>> I see in a nearby thread you would prefer to save some work to drivers
>>> (like this netif_running() check), but I think this all is at the cost
>>> of flexibility, and there will probably appear new problems, when a
>>> driver simply can't wait till the next poll (which btw. looks strange
>>> with all these hotplugging, usb and powersaving).
>>> 
>> As someone who has actually had to edit the NAPI support of _EVERY_
>> single driver in the tree I can tell you that code duplication and
>> subtle semantic differences are a huge issue.
>>
>> And when you talk about driver flexibility, it's wise to mention that
>> this comes at the expense of flexibility in the core implementation.
>> For example, if we export the list handling widget into the ->poll()
>> routines, god help the person who wants to change how the poll list is
>> managed in net_rx_action() :-/
>>
>> So we don't want to export datastructure details like that to the
>> driver.
   

(I hope you both don't mind I save some 'paper' and do this
2 in 1...)

So, you've seen a few drivers, know this much better than me, and
maybe even thought about why they are all so unnecessarily different... Of
course, if you think that despite those differences they can all
work with a simpler NAPI API then OK (as long as they don't have to do
any cheating, like with this 'work' here).

> Also, most of the drivers should/could be doing the same thing. It seems
> that driver writers just want to get "creative" and do things differently.
> The code is cleaner, safer, and less buggy if every device uses the
> interface in the same way.
>
> When I did the initial pass on this, I didn't see a single variation on
> NAPI usage that was better than the simple "get N packets and return"
> variation.  But Dave did way more detailed grunt work on this.

It seems there are some differences in thinking what is simple/complex.
I think drivers' developers are used to controlling their devices, so
they know better when to turn on/off interrupts. So, maybe similar model
could be appropriate here. Sometimes doing more looks simpler than doing
less and remembering how and when the rest will be done (like
this netif_running() test). But I hope I'm wrong here, and this will
work after all!

Cheers,
Jarek P.

RE: What was the reason for 2.6.22 SMP kernels to change how sendmsg is called?

2007-12-13 Thread Kevin Wilson

I see your point but it just so happens it is a GPL'd driver, as is all of our 
Linux code we produce for our hardware. Granted it is out of tree, and after 
you saw it you would want it to stay that way. However, I would have sent you 
the whole thing if that is a pre-req to cordial exchanges on this list.

Nonetheless, a somewhat recent change in your tree, that I could not pinpoint 
on my own, caused the driver to stop functioning properly. So after much 
searching in git/google/sources with no luck, I decided to ask for a little 
assistance, maybe just a hint as to where the culprit may be in the tree so I 
could investigate for myself. For SNGs I tried the method that now works but I 
am still at a loss as to (can't find) what changes in the tree caused it to 
fail.

Does that now clarify/help matters any?

Regards,

Kevin

-Original Message-
From: David Miller [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 13, 2007 15:18
To: Kevin Wilson
Cc: netdev@vger.kernel.org
Subject: Re: What was the reason for 2.6.22 SMP kernels to change how
sendmsg is called?


From: "Kevin Wilson" <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 14:54:31 -0600

> The 2nd method fixes the kernel oops I reported. Thanks to all those
> that assisted me (0) with my first post to this list (see below)
> ... uh, oh yeah, did I mention that number would tally up to ZERO
> people. ;-}

Because the knowledgeable folks here have zero interest in helping
people with out-of-tree-likely-binary-only modules, because in such
cases you have our code but we don't have yours.

Have a nice day.

Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski

David Miller wrote, On 12/13/2007 09:37 PM:
...

> For example, if we export the list handling widget into the ->poll()
> routines, god help the person who wants to change how the poll list is
> managed in net_rx_action() :-/

...I'm afraid I can't understand: I mean doing the same but without
passing this info with 'work == weight': if driver sends this info,
why it can't instead call something like napi_continue() with
this list_move_tail() (and probably additional local_irq_disable()/
enable() - but since it's unlikely()?) which looks much more readable,
and saves one whole unlikely if ()?

Jarek P.

Re: [RFC] net: napi fix

2007-12-13 Thread David Miller

From: Jarek Poplawski <[EMAIL PROTECTED]>
Date: Thu, 13 Dec 2007 23:28:41 +0100

> ...I'm afraid I can't understand: I mean doing the same but without
> passing this info with 'work == weight': if driver sends this info,
> why it can't instead call something like napi_continue() with
> this list_move_tail() (and probably additional local_irq_disable()/
> enable() - but since it's unlikely()?) which looks much more readable,
> and saves one whole unlikely if ()?

Because the poll list is private to net_rx_action() and we don't
want to expose implementation details like that to every
->poll() implementation.

Re: [RFC] net: napi fix

2007-12-13 Thread Jarek Poplawski

David Miller wrote, On 12/13/2007 11:34 PM:

> From: Jarek Poplawski <[EMAIL PROTECTED]>
> Date: Thu, 13 Dec 2007 23:28:41 +0100
> 
>> ...I'm afraid I can't understand: I mean doing the same but without
>> passing this info with 'work == weight': if driver sends this info,
>> why it can't instead call something like napi_continue() with
>> this list_move_tail() (and probably additional local_irq_disable()/
>> enable() - but since it's unlikely()?) which looks much more readable,
>> and saves one whole unlikely if ()?
> 
> Because the poll list is private to net_rx_action() and we don't
> want to expose implementation details like that to every
> ->poll() implementation.

So, it seems 'we' failed e.g. exposing napi_complete()...
OK, no offense, I'll only mention at the end that there is
always a possibility to redefine such a function to {} with any
change of implementation.

Jarek P.

Re: 2.6.24-rc4-mm1 - BUG in tcp_fragment

2007-12-13 Thread Ilpo Järvinen

On Thu, 13 Dec 2007, Cedric Le Goater wrote:

> I got this one while compiling on NFS.
> 
> C.
> 
> kernel BUG at /home/legoater/linux/2.6.24-rc4-mm1/include/net/tcp.h:1480!

I'm not exactly sure what patches you have applied and which patches are 
not, with rc4-mm1 there are two patches (first one was incomplete, I 
assume you had at least that one based on your other mail) to really fix 
the issues in (__|)tcp_reset_fack_counts(...). However, there seems to be 
so much breakage that I have a bit trouble to decide where to start...
The situation seems bit scary :-).

So, I might soon prepare a revert patch for most of the questionable 
TCP parts and ask Dave to apply it (and drop them fully during next 
rebase) unless I suddenly figure something out soon which explains 
all/most of the problems, then return to drawing board. ...As it seems 
that the cumulative ACK processing problem discovered later on (having 
rather cumbersome solution with skbs only) will make part of the work 
that's currently in net-2.6.25 quite useless/duplicate effort. But thanks 
anyway for reporting these.

-- 
 i.

[PATCH net-2.6.25 7/8] drivers/infiniband: Use ipv4_is_

2007-12-13 Thread Joe Perches


Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
---
 drivers/infiniband/core/addr.c |4 ++--
 drivers/infiniband/core/cma.c  |5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 5381c80..0802b79 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -265,11 +265,11 @@ static int addr_resolve_local(struct sockaddr_in *src_in,
if (!dev)
return -EADDRNOTAVAIL;
 
-   if (ZERONET(src_ip)) {
+   if (ipv4_is_zeronet(src_ip)) {
src_in->sin_family = dst_in->sin_family;
src_in->sin_addr.s_addr = dst_ip;
ret = rdma_copy_addr(addr, dev, dev->dev_addr);
-   } else if (LOOPBACK(src_ip)) {
+   } else if (ipv4_is_loopback(src_ip)) {
ret = rdma_translate_ip((struct sockaddr *)dst_in, addr);
if (!ret)
memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 0751697..b37045c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -624,7 +624,8 @@ static inline int cma_zero_addr(struct sockaddr *addr)
struct in6_addr *ip6;
 
if (addr->sa_family == AF_INET)
-   return ZERONET(((struct sockaddr_in *) addr)->sin_addr.s_addr);
+   return ipv4_is_zeronet(
+   ((struct sockaddr_in *)addr)->sin_addr.s_addr);
else {
ip6 = &((struct sockaddr_in6 *) addr)->sin6_addr;
return (ip6->s6_addr32[0] | ip6->s6_addr32[1] |
@@ -634,7 +635,7 @@ static inline int cma_zero_addr(struct sockaddr *addr)
 
 static inline int cma_loopback_addr(struct sockaddr *addr)
 {
-   return LOOPBACK(((struct sockaddr_in *) addr)->sin_addr.s_addr);
+   return ipv4_is_loopback(((struct sockaddr_in *) addr)->sin_addr.s_addr);
 }
 
 static inline int cma_any_addr(struct sockaddr *addr)
-- 
1.5.3.7.949.g2221a6
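
[Editor's sketch, not part of the patch] The diff above only swaps the
ZERONET()/LOOPBACK() macros for ipv4_is_zeronet()/ipv4_is_loopback(); the
helpers themselves are defined elsewhere in the series and are not shown
here. As a rough user-space illustration of what such prefix tests look
like, assuming addresses in network byte order as with __be32, something
like the following could be used. The sketch_ names are hypothetical and
are not kernel API.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htonl() */

/*
 * Hypothetical user-space stand-ins for the helpers used in the patch.
 * All arguments are IPv4 addresses in network byte order.
 */

/* 0.0.0.0/8: the "this network" block. */
static inline int sketch_is_zeronet(uint32_t addr_be)
{
	return (addr_be & htonl(0xff000000)) == htonl(0x00000000);
}

/* 127.0.0.0/8: loopback. */
static inline int sketch_is_loopback(uint32_t addr_be)
{
	return (addr_be & htonl(0xff000000)) == htonl(0x7f000000);
}
```

Masking both sides in network byte order avoids a byte swap of the
address itself, since htonl() on a constant can be folded at compile time.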


[PATCH net-2.6.25 4/8] net/ipv4: Use ipv4_is_

2007-12-13 Thread Joe Perches


Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
---
 net/ipv4/arp.c  |2 +-
 net/ipv4/datagram.c |2 +-
 net/ipv4/devinet.c  |4 +-
 net/ipv4/fib_frontend.c |6 ++--
 net/ipv4/igmp.c |   12 +-
 net/ipv4/ip_gre.c   |   23 +++-
 net/ipv4/ipmr.c |6 ++--
 net/ipv4/raw.c  |2 +-
 net/ipv4/route.c|   52 ++
 net/ipv4/udp.c  |2 +-
 10 files changed, 60 insertions(+), 51 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 48e7bf6..80bf2d0 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -778,7 +778,7 @@ static int arp_process(struct sk_buff *skb)
  * Check for bad requests for 127.x.x.x and requests for multicast
  * addresses.  If this is one such, delete it.
  */
-   if (LOOPBACK(tip) || MULTICAST(tip))
+   if (ipv4_is_loopback(tip) || ipv4_is_multicast(tip))
goto out;
 
 /*
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 0301dd4..0c0c73f 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -40,7 +40,7 @@ int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 
oif = sk->sk_bound_dev_if;
saddr = inet->saddr;
-   if (MULTICAST(usin->sin_addr.s_addr)) {
+   if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
if (!oif)
oif = inet->mc_index;
if (!saddr)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 0e2a829..fda7414 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -401,7 +401,7 @@ static int inet_set_ifa(struct net_device *dev, struct in_ifaddr *ifa)
in_dev_hold(in_dev);
ifa->ifa_dev = in_dev;
}
-   if (LOOPBACK(ifa->ifa_local))
+   if (ipv4_is_loopback(ifa->ifa_local))
ifa->ifa_scope = RT_SCOPE_HOST;
return inet_insert_ifa(ifa);
 }
@@ -580,7 +580,7 @@ static __inline__ int inet_abc_len(__be32 addr)
 {
int rc = -1;/* Something else, probably a multicast. */
 
-   if (ZERONET(addr))
+   if (ipv4_is_zeronet(addr))
rc = 0;
else {
__u32 haddr = ntohl(addr);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 7962830..b03c4c6 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -167,9 +167,9 @@ static inline unsigned __inet_dev_addr_type(const struct net_device *dev,
unsigned ret = RTN_BROADCAST;
struct fib_table *local_table;
 
-   if (ZERONET(addr) || BADCLASS(addr))
+   if (ipv4_is_zeronet(addr) || ipv4_is_badclass(addr))
return RTN_BROADCAST;
-   if (MULTICAST(addr))
+   if (ipv4_is_multicast(addr))
return RTN_MULTICAST;
 
 #ifdef CONFIG_IP_MULTIPLE_TABLES
@@ -710,7 +710,7 @@ void fib_add_ifaddr(struct in_ifaddr *ifa)
if (ifa->ifa_broadcast && ifa->ifa_broadcast != htonl(0xFFFFFFFF))
fib_magic(RTM_NEWROUTE, RTN_BROADCAST, ifa->ifa_broadcast, 32, prim);
 
-   if (!ZERONET(prefix) && !(ifa->ifa_flags&IFA_F_SECONDARY) &&
+   if (!ipv4_is_zeronet(prefix) && !(ifa->ifa_flags&IFA_F_SECONDARY) &&
(prefix != addr || ifa->ifa_prefixlen < 32)) {
fib_magic(RTM_NEWROUTE, dev->flags&IFF_LOOPBACK ? RTN_LOCAL :
  RTN_UNICAST, prefix, ifa->ifa_prefixlen, prim);
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index c560a93..1bb4d0d 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1742,7 +1742,7 @@ int ip_mc_join_group(struct sock *sk , struct ip_mreqn *imr)
int ifindex;
int count = 0;
 
-   if (!MULTICAST(addr))
+   if (!ipv4_is_multicast(addr))
return -EINVAL;
 
rtnl_lock();
@@ -1855,7 +1855,7 @@ int ip_mc_source(int add, int omode, struct sock *sk, struct
int leavegroup = 0;
int i, j, rv;
 
-   if (!MULTICAST(addr))
+   if (!ipv4_is_multicast(addr))
return -EINVAL;
 
rtnl_lock();
@@ -1985,7 +1985,7 @@ int ip_mc_msfilter(struct sock *sk, struct ip_msfilter *msf, int ifindex)
struct ip_sf_socklist *newpsl, *psl;
int leavegroup = 0;
 
-   if (!MULTICAST(addr))
+   if (!ipv4_is_multicast(addr))
return -EINVAL;
if (msf->imsf_fmode != MCAST_INCLUDE &&
msf->imsf_fmode != MCAST_EXCLUDE)
@@ -2068,7 +2068,7 @@ int ip_mc_msfget(struct sock *sk, struct ip_msfilter *msf,
struct inet_sock *inet = inet_sk(sk);
struct ip_sf_socklist *psl;
 
-   if (!MULTICAST(addr))
+   if (!ipv4_is_multicast(addr))
return -EINVAL;
 
rtnl_lock();
@@ -2130,7 +2130,7 @@ int ip_mc_gsfget(struct sock *sk, struct group_filter *gsf,
if (psin->sin_family != AF_INET)
return -EINVAL;
addr = psin->sin_addr.s_addr;
-   if (!MULTICAST(addr))
+   if (!ipv4_is_multicast(addr))

[PATCH net-2.6.25 6/8] sctp: Use ipv4_is_

2007-12-13 Thread Joe Perches


Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
---
 include/net/sctp/constants.h |   36 ++--
 net/sctp/protocol.c  |   12 +++-
 2 files changed, 13 insertions(+), 35 deletions(-)

diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
index 05f22a6..fefcba6 100644
--- a/include/net/sctp/constants.h
+++ b/include/net/sctp/constants.h
@@ -365,36 +365,12 @@ typedef enum {
  * Also, RFC 8.4, non-unicast addresses are not considered valid SCTP
  * addresses.
  */
-#define IS_IPV4_UNUSABLE_ADDRESS(a) \
-   ((htonl(INADDR_BROADCAST) == *a) || \
-   (MULTICAST(*a)) || \
-   (((unsigned char *)(a))[0] == 0) || \
-   ((((unsigned char *)(a))[0] == 198) && \
-   (((unsigned char *)(a))[1] == 18) && \
-   (((unsigned char *)(a))[2] == 0)) || \
-   ((((unsigned char *)(a))[0] == 192) && \
-   (((unsigned char *)(a))[1] == 88) && \
-   (((unsigned char *)(a))[2] == 99)))
-
-/* IPv4 Link-local addresses: 169.254.0.0/16.  */
-#define IS_IPV4_LINK_ADDRESS(a) \
-   ((((unsigned char *)(a))[0] == 169) && \
-   (((unsigned char *)(a))[1] == 254))
-
-/* RFC 1918 "Address Allocation for Private Internets" defines the IPv4
- * private address space as the following:
- *
- * 10.0.0.0 - 10.255.255.255 (10/8 prefix)
- * 172.16.0.0.0 - 172.31.255.255 (172.16/12 prefix)
- * 192.168.0.0 - 192.168.255.255 (192.168/16 prefix)
- */
-#define IS_IPV4_PRIVATE_ADDRESS(a) \
-   ((((unsigned char *)(a))[0] == 10) || \
-   ((((unsigned char *)(a))[0] == 172) && \
-   (((unsigned char *)(a))[1] >= 16) && \
-   (((unsigned char *)(a))[1] < 32)) || \
-   ((((unsigned char *)(a))[0] == 192) && \
-   (((unsigned char *)(a))[1] == 168)))
+#define IS_IPV4_UNUSABLE_ADDRESS(a)\
+   ((htonl(INADDR_BROADCAST) == a) ||  \
+ipv4_is_multicast(a) ||\
+ipv4_is_zeronet(a) ||  \
+ipv4_is_test_198(a) || \
+ipv4_is_anycast_6to4(a))
 
 /* Flags used for the bind address copy functions.  */
 #define SCTP_ADDR6_ALLOWED 0x0001  /* IPv6 address is allowed by
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index d50f610..dc22d71 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -359,7 +359,7 @@ static int sctp_v4_addr_valid(union sctp_addr *addr,
  const struct sk_buff *skb)
 {
/* Is this a non-unicast address or a unusable SCTP address? */
-   if (IS_IPV4_UNUSABLE_ADDRESS(&addr->v4.sin_addr.s_addr))
+   if (IS_IPV4_UNUSABLE_ADDRESS(addr->v4.sin_addr.s_addr))
return 0;
 
/* Is this a broadcast address? */
@@ -408,13 +408,15 @@ static sctp_scope_t sctp_v4_scope(union sctp_addr *addr)
 */
 
/* Check for unusable SCTP addresses. */
-   if (IS_IPV4_UNUSABLE_ADDRESS(&addr->v4.sin_addr.s_addr)) {
+   if (IS_IPV4_UNUSABLE_ADDRESS(addr->v4.sin_addr.s_addr)) {
retval =  SCTP_SCOPE_UNUSABLE;
-   } else if (LOOPBACK(addr->v4.sin_addr.s_addr)) {
+   } else if (ipv4_is_loopback(addr->v4.sin_addr.s_addr)) {
retval = SCTP_SCOPE_LOOPBACK;
-   } else if (IS_IPV4_LINK_ADDRESS(&addr->v4.sin_addr.s_addr)) {
+   } else if (ipv4_is_linklocal_169(addr->v4.sin_addr.s_addr)) {
retval = SCTP_SCOPE_LINK;
-   } else if (IS_IPV4_PRIVATE_ADDRESS(&addr->v4.sin_addr.s_addr)) {
+   } else if (ipv4_is_private_10(addr->v4.sin_addr.s_addr) ||
+  ipv4_is_private_172(addr->v4.sin_addr.s_addr) ||
+  ipv4_is_private_192(addr->v4.sin_addr.s_addr)) {
retval = SCTP_SCOPE_PRIVATE;
} else {
retval = SCTP_SCOPE_GLOBAL;
-- 
1.5.3.7.949.g2221a6
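
[Editor's sketch, not part of the patch] The removed macros took a pointer
and compared individual bytes, while the replacement calls take the
address by value. The scope decision in sctp_v4_scope() can be
approximated in user space as below, assuming the prefixes spelled out in
the removed macros and comments (198.18.0.0/24, 192.88.99.0/24,
169.254.0.0/16, and the RFC 1918 blocks). The enum, in_prefix(), and
sketch_v4_scope() are hypothetical names, and the actual ipv4_is_*
helpers, which are not shown in this mail, may use different masks.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htonl(), ntohl() */

/* Hypothetical scope values mirroring the SCTP_SCOPE_* cases in the diff. */
enum sketch_scope {
	SKETCH_UNUSABLE,
	SKETCH_LOOPBACK,
	SKETCH_LINK,
	SKETCH_PRIVATE,
	SKETCH_GLOBAL
};

/* True if addr_be (network order) lies in net/mask (both host order). */
static int in_prefix(uint32_t addr_be, uint32_t net, uint32_t mask)
{
	return (ntohl(addr_be) & mask) == net;
}

/* Classify an IPv4 address in the same order as the if/else chain above. */
static enum sketch_scope sketch_v4_scope(uint32_t addr_be)
{
	if (addr_be == htonl(0xffffffff) ||		  /* limited broadcast */
	    in_prefix(addr_be, 0xe0000000, 0xf0000000) || /* 224.0.0.0/4 */
	    in_prefix(addr_be, 0x00000000, 0xff000000) || /* 0.0.0.0/8 */
	    in_prefix(addr_be, 0xc6120000, 0xffffff00) || /* 198.18.0.0/24 */
	    in_prefix(addr_be, 0xc0586300, 0xffffff00))	  /* 192.88.99.0/24 */
		return SKETCH_UNUSABLE;
	if (in_prefix(addr_be, 0x7f000000, 0xff000000))	  /* 127.0.0.0/8 */
		return SKETCH_LOOPBACK;
	if (in_prefix(addr_be, 0xa9fe0000, 0xffff0000))	  /* 169.254.0.0/16 */
		return SKETCH_LINK;
	if (in_prefix(addr_be, 0x0a000000, 0xff000000) || /* 10.0.0.0/8 */
	    in_prefix(addr_be, 0xac100000, 0xfff00000) || /* 172.16.0.0/12 */
	    in_prefix(addr_be, 0xc0a80000, 0xffff0000))	  /* 192.168.0.0/16 */
		return SKETCH_PRIVATE;
	return SKETCH_GLOBAL;
}
```

The order of the tests matters only for the unusable check, which has to
run first; the remaining prefixes do not overlap.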
