Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver

2008-02-26 Thread Jarek Poplawski
James Chapman wrote, On 02/26/2008 01:14 PM:
...
> Luckily, I'm in the lab where my two borrowed servers are today so I 
> have access to their consoles. Hopefully I'll be able to find out why 
> they are hanging. Btw, they don't hang if I disable irqs around the 
> ppp_input() call.

Maybe you've found the same thing, or there is yet another reason, but
IMHO this locking break around ppp_input() is wrong. A more advanced
solution is probably needed, but this should fix the problem if it
really exists (isn't a race possible, e.g. between receiving from the
socket and from the network card?).
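
A minimal sketch of the pattern the patch below moves to, as I read the diff: unlink the head packet while the queue lock is held, drop the lock before passing the packet up, then rescan from the head. This is a userspace toy, not driver code. struct pkt, struct reorder_queue, q_lock(), q_unlock() and deliver_unlocked() are hypothetical stand-ins, the lock is modelled as a plain flag, and the sequence bookkeeping is simplified.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel objects; all names are hypothetical. */
struct pkt {
	struct pkt *next;
	int seq;
};

struct reorder_queue {
	struct pkt *head;
	int locked;	/* models spin_lock()/spin_unlock() on the queue */
	int next_seq;	/* sequence number expected next */
};

static void q_lock(struct reorder_queue *q)   { q->locked = 1; }
static void q_unlock(struct reorder_queue *q) { q->locked = 0; }

/* Delivery callback; returns 0 if it was called with the lock held. */
typedef int (*deliver_fn)(struct reorder_queue *q, struct pkt *p);

static int deliver_unlocked(struct reorder_queue *q, struct pkt *p)
{
	(void)p;
	return !q->locked;
}

/* Deliver every in-sequence packet at the head of the queue.
 * Returns the number delivered, or -1 if delivery ran under the lock. */
static int drain_in_sequence(struct reorder_queue *q, deliver_fn deliver)
{
	int delivered = 0;

	for (;;) {
		struct pkt *p;

		q_lock(q);
		p = q->head;
		if (p == NULL || p->seq != q->next_seq) {
			q_unlock(q);
			return delivered;
		}
		/* Unlink while the lock is still held ... */
		q->head = p->next;
		p->next = NULL;
		q->next_seq++;
		q_unlock(q);

		/* ... then hand the packet up without the lock and rescan
		 * from the head, because the queue may have changed. */
		if (!deliver(q, p))
			return -1;
		delivered++;
	}
}
```

The point of restarting from the head is that no iterator state survives the window in which the lock is dropped, so a concurrent enqueue or dequeue cannot leave the walker holding a stale pointer.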

Jarek P.
---

 drivers/net/pppol2tp.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/pppol2tp.c b/drivers/net/pppol2tp.c
index e0b072d..7c6fcb9 100644
--- a/drivers/net/pppol2tp.c
+++ b/drivers/net/pppol2tp.c
@@ -363,18 +363,17 @@ out:
spin_unlock(&session->reorder_q.lock);
 }
 
-/* Dequeue a single skb.
+/* Requeue a single skb.
  */
-static void pppol2tp_recv_dequeue_skb(struct pppol2tp_session *session, struct sk_buff *skb)
+static void pppol2tp_recv_requeue_skb(struct pppol2tp_session *session, struct sk_buff *skb)
 {
struct pppol2tp_tunnel *tunnel = session->tunnel;
int length = PPPOL2TP_SKB_CB(skb)->length;
struct sock *session_sock = NULL;
 
-   /* We're about to requeue the skb, so unlink it and return resources
+   /* We're about to requeue the skb, so return resources
 * to its current owner (a socket receive buffer).
 */
-   skb_unlink(skb, &session->reorder_q);
skb_orphan(skb);
 
tunnel->stats.rx_packets++;
@@ -436,14 +435,14 @@ static void pppol2tp_recv_dequeue_skb(struct pppol2tp_session *session, struct s
 static void pppol2tp_recv_dequeue(struct pppol2tp_session *session)
 {
struct sk_buff *skb;
-   struct sk_buff *tmp;
 
/* If the pkt at the head of the queue has the nr that we
 * expect to send up next, dequeue it and any other
 * in-sequence packets behind it.
 */
+again:
spin_lock(&session->reorder_q.lock);
-   skb_queue_walk_safe(&session->reorder_q, skb, tmp) {
+   skb_queue_walk(&session->reorder_q, skb) {
if (time_after(jiffies, PPPOL2TP_SKB_CB(skb)->expires)) {
session->stats.rx_seq_discards++;
session->stats.rx_errors++;
@@ -469,9 +468,10 @@ static void pppol2tp_recv_dequeue(struct pppol2tp_session *session)
goto out;
}
}
+   __skb_unlink(skb, &session->reorder_q);
spin_unlock(&session->reorder_q.lock);
-   pppol2tp_recv_dequeue_skb(session, skb);
-   spin_lock(&session->reorder_q.lock);
+   pppol2tp_recv_requeue_skb(session, skb);
+   goto again;
}
 
 out:
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC/PATCH] ipg: add jumbo frame support kconfig option

2008-02-26 Thread Pekka J Enberg
From: Pekka Enberg <[EMAIL PROTECTED]>

Convert the internal JUMBO_FRAME #ifdef to CONFIG_IP1000_JUMBO_FRAME proper and
fix compilation errors.

Cc: Francois Romieu <[EMAIL PROTECTED]>
Cc: Sorbica Shieh <[EMAIL PROTECTED]>
Cc: Jesse Huang <[EMAIL PROTECTED]>
Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 drivers/net/Kconfig |8 
 drivers/net/ipg.c   |   21 ++---
 drivers/net/ipg.h   |   11 ++-
 3 files changed, 24 insertions(+), 16 deletions(-)

Index: linux-2.6/drivers/net/Kconfig
===
--- linux-2.6.orig/drivers/net/Kconfig
+++ linux-2.6/drivers/net/Kconfig
@@ -2029,6 +2029,14 @@ config IP1000
  To compile this driver as a module, choose M here: the module
  will be called ipg.  This is recommended.
 
+config IP1000_JUMBO_FRAME
+   bool "Support for jumbo frames (EXPERIMENTAL)"
+   depends on IP1000 && EXPERIMENTAL
+   help
+ This option enables jumbo frame support for the IP1000 driver.
+
+ If in doubt, say N.
+
 config IGB
tristate "Intel(R) 82575 PCI-Express Gigabit Ethernet support"
depends on PCI
Index: linux-2.6/drivers/net/ipg.c
===
--- linux-2.6.orig/drivers/net/ipg.c
+++ linux-2.6/drivers/net/ipg.c
@@ -42,7 +42,6 @@
 #define ipg_r16(reg)   ioread16(ioaddr + (reg))
 #define ipg_r8(reg)ioread8(ioaddr + (reg))
 
-#define JUMBO_FRAME_4k_ONLY
 enum {
netdev_io_size = 128
 };
@@ -1079,7 +1078,7 @@ static int ipg_nic_rxrestore(struct net_
return 0;
 }
 
-#ifdef JUMBO_FRAME
+#ifdef CONFIG_IP1000_JUMBO_FRAME
 
 /* use jumboindex and jumbosize to control jumbo frame status
  * initial status is jumboindex=-1 and jumbosize=0
@@ -1274,7 +1273,7 @@ static void ipg_nic_rx_with_end(struct n
 
framelen = le64_to_cpu(rxfd->rfs) & IPG_RFS_RXFRAMELEN;
 
-   endframeLen = framelen - jumbo->current_size;
+   endframelen = framelen - jumbo->current_size;
/*
if (framelen > IPG_RXFRAG_SIZE)
framelen=IPG_RXFRAG_SIZE;
@@ -1282,8 +1281,8 @@ static void ipg_nic_rx_with_end(struct n
if (framelen > IPG_RXSUPPORT_SIZE)
dev_kfree_skb_irq(jumbo->skb);
else {
-   memcpy(skb_put(jumbo->skb, endframeLen),
-  skb->data, endframeLen);
+   memcpy(skb_put(jumbo->skb, endframelen),
+  skb->data, endframelen);
 
jumbo->skb->protocol =
eth_type_trans(jumbo->skb, dev);
@@ -1355,16 +1354,16 @@ static int ipg_nic_rx(struct net_device 
 
switch (ipg_nic_rx_check_frame_type(dev)) {
case FRAME_WITH_START_WITH_END:
-   ipg_nic_rx_with_start_and_end(dev, tp, rxfd, entry);
+   ipg_nic_rx_with_start_and_end(dev, sp, rxfd, entry);
break;
case FRAME_WITH_START:
-   ipg_nic_rx_with_start(dev, tp, rxfd, entry);
+   ipg_nic_rx_with_start(dev, sp, rxfd, entry);
break;
case FRAME_WITH_END:
-   ipg_nic_rx_with_end(dev, tp, rxfd, entry);
+   ipg_nic_rx_with_end(dev, sp, rxfd, entry);
break;
case FRAME_NO_START_NO_END:
-   ipg_nic_rx_no_start_no_end(dev, tp, rxfd, entry);
+   ipg_nic_rx_no_start_no_end(dev, sp, rxfd, entry);
break;
}
}
@@ -1595,7 +1594,7 @@ static irqreturn_t ipg_interrupt_handler
 
IPG_DEBUG_MSG("_interrupt_handler\n");
 
-#ifdef JUMBO_FRAME
+#ifdef CONFIG_IP1000_JUMBO_FRAME
ipg_nic_rxrestore(dev);
 #endif
spin_lock(&sp->lock);
@@ -1807,7 +1806,7 @@ static int ipg_nic_open(struct net_devic
if (ipg_config_autoneg(dev) < 0)
printk(KERN_INFO "%s: Auto-negotiation error.\n", dev->name);
 
-#ifdef JUMBO_FRAME
+#ifdef CONFIG_IP1000_JUMBO_FRAME
/* initialize JUMBO Frame control variable */
sp->jumbo.found_start = 0;
sp->jumbo.current_size = 0;
Index: linux-2.6/drivers/net/ipg.h
===
--- linux-2.6.orig/drivers/net/ipg.h
+++ linux-2.6/drivers/net/ipg.h
@@ -536,7 +536,7 @@ enum ipg_regs {
  */
 #defineIPG_FRAMESBETWEENTXDMACOMPLETES 0x1
 
-#ifdef JUMBO_FRAME
+#ifdef CONFIG_IP1000_JUMBO_FRAME
 
 # ifdef JUMBO_FRAME_SIZE_2K
 # define JUMBO_FRAME_SIZE 2048
@@ -575,6 +575,7 @@ enum ipg_regs {
 # define __IPG_RXFRAG_SIZE 4088
 # else
 # define JUMBO_FRAME_SIZE

Re: [Bluez-devel] forcing SCO connection patch

2008-02-26 Thread Marcel Holtmann

Hi Louis,


--- linux-2.6.23/net/bluetooth/hci_event.c.orig 2008-02-25 17:17:11.0 +0900
+++ linux-2.6.23/net/bluetooth/hci_event.c 2008-02-25 17:30:23.0 +0900
@@ -1313,8 +1313,17 @@
hci_dev_lock(hdev);

conn = hci_conn_hash_lookup_ba(hdev, ev->link_type, &ev->bdaddr);
- if (!conn)
- goto unlock;
+ if (!conn) {
+ if (ev->link_type != ACL_LINK) {
+ __u8 link_type = (ev->link_type == ESCO_LINK) ? SCO_LINK : ESCO_LINK;
+
+ conn = hci_conn_hash_lookup_ba(hdev, link_type, &ev->bdaddr);
+ if (conn)
+ conn->type = ev->link_type;
+ }
+ if (!conn)
+ goto unlock;
+ }


NAK. There is no need to check for ACL_LINK. The sync_complete will
only be called for SCO or eSCO connections.

I see. I removed this check line in the patch.

Thanks.
Louis JANG
Signed-off-by: Louis JANG <[EMAIL PROTECTED]>
--- linux-2.6.23/net/bluetooth/hci_event.c.orig	2008-02-26 12:46:36.0 +0900
+++ linux-2.6.23/net/bluetooth/hci_event.c	2008-02-26 12:47:23.0 +0900
@@ -1313,8 +1313,15 @@
hci_dev_lock(hdev);

conn = hci_conn_hash_lookup_ba(hdev, ev->link_type, &ev->bdaddr);
-   if (!conn)
-   goto unlock;
+   if (!conn) {
+		__u8 link_type = (ev->link_type == ESCO_LINK) ? SCO_LINK : ESCO_LINK;
+
+   conn = hci_conn_hash_lookup_ba(hdev, link_type, &ev->bdaddr);
+   if (conn)
+   conn->type = ev->link_type;
+   else
+   goto unlock;
+   }

if (!ev->status) {
conn->handle = __le16_to_cpu(ev->handle);


do something like this:

if (!conn) {


conn = 
if (!conn)
goto unlock;

conn->type = ev->link_type;
}

And include a description when submitting a patch.

Regards

Marcel



Re: problems with e1000 and flow control

2008-02-26 Thread Kok, Auke
Brandeburg, Jesse wrote:
> Wolfgang Walter wrote:
>> it seems that e1000 enables flow-control (rx pause frames) even if
>> the switch does not advertise flow control. This seems to get a
>> problem as (at least some) switches then forward pause frames
>> directed to the card from other hosts. We think there are hosts which
>> indeed do this in the lans of our student halls.
>>
>> I think flow control should be completely disabled by default if the
>> switch does not advertise it. It still can be forced with ethtool.
> 
> We agree, and our latest standalone drivers have taken this into
> account, but the kernel drivers have not been updated all the way yet to
> fix this issue.

ok, that explains what is going on :)

I'll take a look into getting these changes upstream. Perhaps Wolfgang can
confirm that the driver on e1000.sf.net is working properly for him?

Auke


Re: problems with e1000 and flow control

2008-02-26 Thread Kok, Auke
Wolfgang Walter wrote:
> Hello,
> 
> it seems that e1000 enables flow-control (rx pause frames) even if the switch 
> does not advertise flow control. This seems to get a problem as (at least 
> some) switches then forward pause frames directed to the card from other 
> hosts. We think there are hosts which indeed do this in the lans of our 
> student halls.
> 
> I think flow control should be completely disabled by default if the switch 
> does not advertise it. It still can be forced with ethtool.

Are you sure that the switch actually advertises the flow control "disabled"
setting properly? Perhaps you can include the e1000 ethtool flow control
settings and dmesg output (2.6.24 will print out FC status when link comes up).

There's a lengthy discussion, including spec references, in the e1000/e1000e
driver code on how FC is handled (look for "IEEE 802.3ab"). Can you take a
look at that and see what might be happening?
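
For reference, a small sketch of how pause settings are normally resolved from the two sides' autonegotiation advertisements, following the PAUSE/ASM_DIR resolution rules of IEEE 802.3 Annex 28B as I understand them. This is not the e1000 code; the struct and function names are made up for the example.

```c
#include <stdbool.h>

/* Pause advertisement bits; names are local to this sketch. */
struct pause_adv {
	bool pause;	/* PAUSE: symmetric pause capable */
	bool asm_dir;	/* ASM_DIR: asymmetric pause capable */
};

struct pause_result {
	bool rx;	/* honour received pause frames */
	bool tx;	/* may send pause frames */
};

/* Resolve flow control for a full-duplex link from both advertisements. */
static struct pause_result resolve_pause(struct pause_adv local,
					 struct pause_adv partner)
{
	struct pause_result r = { false, false };

	if (local.pause && partner.pause) {
		/* Both sides symmetric capable: pause in both directions. */
		r.rx = true;
		r.tx = true;
	} else if (local.asm_dir && partner.asm_dir) {
		/* Asymmetric: only the PAUSE-capable side receives pause. */
		if (local.pause)
			r.rx = true;
		else if (partner.pause)
			r.tx = true;
	}
	return r;
}
```

Under these rules a link partner that advertises neither bit always resolves to flow control disabled in both directions, which is the behaviour Wolfgang is asking for as the default.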

Changing this behaviour might give other people issues, which is something we
need to be very cautious about, obviously :)

Cheers,

Auke


Re: [patch 0/6] pasemi_mac updates for 2.6.26

2008-02-26 Thread Olof Johansson
On Tue, Feb 26, 2008 at 01:21:00PM -0500, Jeff Garzik wrote:
> Olof Johansson wrote:
> > On Tue, Feb 26, 2008 at 08:49:58PM +1100, Paul Mackerras wrote:

> >> What route do you think these should take upstream?  I'm happy to take
> >> them if Jeff is OK with that.
> > 
> > I've sent them through Jeff in the past, that's been convenient when
> > there's been churn in the network APIs. I'm not sure if there's much of
> > that for .26 though.
> > 
> > If Jeff prefers to ACK, I'll just add it to my git and ask you to pull
> > that. But I was originally planning to just feed it through him.
> > 
> > (Note: I'll repost the patch set later today or tomorrow with a couple
> > of tweaks).
> 
> Not much networking churn for 2.6.26, and IMO this patchset have 
> above-average ppc changes/dependencies, so   ACK

Ok, thanks Jeff.

Paul: I'll commit to my tree and ask you to pull later.


-Olof


RE: problems with e1000 and flow control

2008-02-26 Thread Brandeburg, Jesse
Wolfgang Walter wrote:
> it seems that e1000 enables flow-control (rx pause frames) even if
> the switch does not advertise flow control. This seems to get a
> problem as (at least some) switches then forward pause frames
> directed to the card from other hosts. We think there are hosts which
> indeed do this in the lans of our student halls.
> 
> I think flow control should be completely disabled by default if the
> switch does not advertise it. It still can be forced with ethtool.

We agree, and our latest standalone drivers have taken this into
account, but the kernel drivers have not been updated all the way yet to
fix this issue.

Thanks for the input, 
 Jesse


Re: [patch 0/6] pasemi_mac updates for 2.6.26

2008-02-26 Thread Jeff Garzik

Olof Johansson wrote:
> On Tue, Feb 26, 2008 at 08:49:58PM +1100, Paul Mackerras wrote:
>> Olof Johansson writes:
>>
>>> Here's a set of updates for pasemi_mac for 2.6.26. Some of them touch
>>> the dma_lib in the platform code as well, but it's easier if it's all
>>> merged through netdev to avoid dependencies.
>>>
>>> Major highlights are jumbo frame support and ethtool basics, the rest
>>> is mostly minor plumbing around it.
>>
>> What route do you think these should take upstream?  I'm happy to take
>> them if Jeff is OK with that.
>
> I've sent them through Jeff in the past, that's been convenient when
> there's been churn in the network APIs. I'm not sure if there's much of
> that for .26 though.
>
> If Jeff prefers to ACK, I'll just add it to my git and ask you to pull
> that. But I was originally planning to just feed it through him.
>
> (Note: I'll repost the patch set later today or tomorrow with a couple
> of tweaks).

Not much networking churn for 2.6.26, and IMO this patchset have
above-average ppc changes/dependencies, so   ACK





Re: [PATCH 00/28] Swap over NFS -v16

2008-02-26 Thread Andrew Morton
On Tue, 26 Feb 2008 11:50:42 +0100 Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> On Tue, 2008-02-26 at 17:03 +1100, Neil Brown wrote:
> > On Saturday February 23, [EMAIL PROTECTED] wrote:
>  
> > > What is the NFS and net people's take on all of this?
> > 
> > Well I'm only vaguely an NFS person, barely a net person, sporadically
> > an mm person, but I've had a look and it seems to mostly make sense.
> 
> Thanks for taking a look, and giving such elaborate feedback. I'll try
> and address these issues asap, but first let me reply to a few points
> here.

Neil's overview of what-all-this-is and how-it-all-works is really good. 
I'd suggest that you take it over, flesh it out and attach it firmly to the
patchset.  It really helps.



Re: [PATCH 00/28] Swap over NFS -v16

2008-02-26 Thread Miklos Szeredi
> > > > mm-page_file_methods.patch
> > > > 
> > > > This makes page_offset and others more expensive by adding a
> > > > conditional jump to a function call that is not usually made.
> > > > 
> > > > Why do swap pages have a different index to everyone else?
> > > 
> > > Because the page->index of an anonymous page is related to its (anon)vma
> > > so that it satisfies the constraints for vm_normal_page().
> > > 
> > > > The index in the swap file is totally unrelated and quite random. Hence
> > > the swap-cache uses page->private to store it in.
> > 
> > Yeah, and putting the condition into page_offset() will confuse code
> > which uses it for finding the offset in the VMA or in a tmpfs file.
> > 
> > So why not just have a separate page_swap_offset() function, used
> > exclusively by swap_in/out()?
> 
> Ah, we can do the page_file_offset() to match page_file_index() and
> page_file_mapping(). And convert NFS to use page_file_offset() where
> appropriate, as I already did for these others.
> 
> That would sort out the mess, right?

Yes, that sounds perfect.

Miklos


Re: [PATCH 00/28] Swap over NFS -v16

2008-02-26 Thread Peter Zijlstra

On Tue, 2008-02-26 at 16:29 +0100, Miklos Szeredi wrote:
> > > mm-page_file_methods.patch
> > > 
> > > This makes page_offset and others more expensive by adding a
> > > conditional jump to a function call that is not usually made.
> > > 
> > > Why do swap pages have a different index to everyone else?
> > 
> > Because the page->index of an anonymous page is related to its (anon)vma
> > so that it satisfies the constraints for vm_normal_page().
> > 
> > > The index in the swap file is totally unrelated and quite random. Hence
> > the swap-cache uses page->private to store it in.
> 
> Yeah, and putting the condition into page_offset() will confuse code
> which uses it for finding the offset in the VMA or in a tmpfs file.
> 
> So why not just have a separate page_swap_offset() function, used
> exclusively by swap_in/out()?

Ah, we can do the page_file_offset() to match page_file_index() and
page_file_mapping(). And convert NFS to use page_file_offset() where
appropriate, as I already did for these others.

That would sort out the mess, right?
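
A toy model of that idea, only to make the split concrete: page_offset() keeps using the index unconditionally, while a separate page_file_offset() picks the swap location for swap-cache pages. Nothing here is kernel code. struct toy_page, its fields and TOY_PAGE_SHIFT are assumptions for the example, and the swap slot is stored directly rather than decoded from page->private.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Toy model of the fields under discussion; not the kernel's struct page. */
struct toy_page {
	uint64_t index;		/* position within the mapping (or anon vma) */
	uint64_t swap_slot;	/* stands in for what the swap cache keeps in page->private */
	bool in_swap_cache;
};

/* Unchanged helper: always derived from index, no extra branch. */
static uint64_t toy_page_offset(const struct toy_page *p)
{
	return p->index << TOY_PAGE_SHIFT;
}

/* Separate helper for callers that may be handed swap-cache pages. */
static uint64_t toy_page_file_index(const struct toy_page *p)
{
	return p->in_swap_cache ? p->swap_slot : p->index;
}

static uint64_t toy_page_file_offset(const struct toy_page *p)
{
	return toy_page_file_index(p) << TOY_PAGE_SHIFT;
}
```

Callers that only ever see file or anonymous pages keep the cheap helper, and only the swap-aware paths (NFS in this patch set) pay for the conditional.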



Re: [PATCH 00/28] Swap over NFS -v16

2008-02-26 Thread Peter Zijlstra

On Tue, 2008-02-26 at 16:29 +0100, Miklos Szeredi wrote:
> > > mm-page_file_methods.patch
> > > 
> > > This makes page_offset and others more expensive by adding a
> > > conditional jump to a function call that is not usually made.
> > > 
> > > Why do swap pages have a different index to everyone else?
> > 
> > Because the page->index of an anonymous page is related to its (anon)vma
> > so that it satisfies the constraints for vm_normal_page().
> > 
> > > The index in the swap file is totally unrelated and quite random. Hence
> > the swap-cache uses page->private to store it in.
> 
> Yeah, and putting the condition into page_offset() will confuse code
> which uses it for finding the offset in the VMA 

Right, do we do that anywhere?

> or in a tmpfs file.

Good point. I really should go read tmpfs some day, it's really a blind
spot for me.

> So why not just have a separate page_swap_offset() function, used
> exclusively by swap_in/out()?

That would require duplicating quite a lot of NFS code from what I can
see.



Re: [PATCH] Can not send icmp netunreach packet

2008-02-26 Thread Stephen Hemminger
On Tue, 26 Feb 2008 18:59:08 +0900
Wei Yongjun <[EMAIL PROTECTED]> wrote:

> Jarek Poplawski wrote:
> 
> Maybe ip_error() does not handle the ESRCH error. Is ESRCH equivalent
> to ENETUNREACH here?
> 
> static int ip_error(struct sk_buff *skb)
> {
>   struct rtable *rt = (struct rtable*)skb->dst;
>   unsigned long now;
>   int code;
> 
>   switch (rt->u.dst.error) {
>   case EINVAL:
>   default:
>   goto out;
>   case EHOSTUNREACH:
>   code = ICMP_HOST_UNREACH;
>   break;
>   case ENETUNREACH:
>   code = ICMP_NET_UNREACH;
>   break;
>   case EACCES:
>   code = ICMP_PKT_FILTERED;
>   break;
>   }
> ...snip
> }
> 
> 
> 
> > On 26-02-2008 07:34, Li Yewang wrote:
> >   
> >> Hi All
> >>
> >>There is a bug about icmp netunreach.
> >>If the kernel does not find a route for a packet, 
> >>it must send a icmp netunreach packet to the source host, 
> >>and  discard  the packet. But the  kernel  does not send 
> >>a icmp netunreach packet because of the  fib_lookup
> >>return value  of -ESRCH when a route  is not found. 
> >> 
> >
> > ...or because some function doesn't handle -ESRCH return from
> > fib_lookup? It seems changing this to -ESRCH was needed in some cases.
> > And you don't explain enough why it can't be handled later (like in
> > ipv4/route.c: ip_route_input_slow)?
> >   
> 
> 
> > Regards,
> > Jarek P.
> >
> >   
> >> Signed-off-by: Li Yewang <[EMAIL PROTECTED]>
> >>
> >> diff -Nurp net/core_back/fib_rules.c net/core/fib_rules.c
> >> --- net/core_back/fib_rules.c   2008-02-25 13:15:37.0 +0800
> >> +++ net/core/fib_rules.c2008-02-25 13:16:01.0 +0800
> >> @@ -188,7 +188,7 @@ jumped:
> >>}
> >>}
> >>  
> >> -  err = -ESRCH;
> >> +  err = -ENETUNREACH;
> >>  out:
> >>rcu_read_unlock();
> >>
> >> 

The switch shouldn't see a problem because ENETUNREACH is already substituted
for ESRCH in:


static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
   u8 tos, struct net_device *dev)
{
...
/*
 *  Now we are ready to route packet.
 */
if ((err = fib_lookup(net, &fl, &res)) != 0) {
if (!IN_DEV_FORWARD(in_dev))
goto e_hostunreach;
goto no_route;

...
no_route:
RT_CACHE_STAT_INC(in_no_route);
spec_dst = inet_select_addr(dev, 0, RT_SCOPE_UNIVERSE);
res.type = RTN_UNREACHABLE;
if (err == -ESRCH)
err = -ENETUNREACH;
goto local_input;
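
Put together, the two quoted fragments form a two-step chain. Here is a userspace sketch of it, assuming (as in the excerpts) that the lookup error is negative and the error handler switches on the positive value. The CODE_* names and both function names are invented for the sketch; the numeric values are the standard ICMP destination unreachable codes.

```c
#include <errno.h>

/* ICMP destination unreachable codes; CODE_NONE is a sketch-only marker. */
enum {
	CODE_NET_UNREACH  = 0,
	CODE_HOST_UNREACH = 1,
	CODE_PKT_FILTERED = 13,
	CODE_NONE         = -1	/* no ICMP error is sent */
};

/* Step 1: the routing slow path rewrites -ESRCH from the lookup. */
static int normalize_route_error(int err)
{
	return (err == -ESRCH) ? -ENETUNREACH : err;
}

/* Step 2: the error handler maps the stored positive error to a code. */
static int icmp_code_for(int dst_error)
{
	switch (dst_error) {
	case EHOSTUNREACH:
		return CODE_HOST_UNREACH;
	case ENETUNREACH:
		return CODE_NET_UNREACH;
	case EACCES:
		return CODE_PKT_FILTERED;
	default:
		return CODE_NONE;
	}
}
```

A raw ESRCH reaching step 2 would fall into the default case and produce no ICMP message, which is why the substitution in step 1 matters.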




Re: [PATCH 00/28] Swap over NFS -v16

2008-02-26 Thread Miklos Szeredi
> > mm-page_file_methods.patch
> > 
> > This makes page_offset and others more expensive by adding a
> > conditional jump to a function call that is not usually made.
> > 
> > Why do swap pages have a different index to everyone else?
> 
> Because the page->index of an anonymous page is related to its (anon)vma
> so that it satisfies the constraints for vm_normal_page().
> 
> The index in the swap file is totally unrelated and quite random. Hence
> the swap-cache uses page->private to store it in.

Yeah, and putting the condition into page_offset() will confuse code
which uses it for finding the offset in the VMA or in a tmpfs file.

So why not just have a separate page_swap_offset() function, used
exclusively by swap_in/out()?

Miklos


[PATCH][EBTABLES] Fix alignment checks in ebt_among.ko module.

2008-02-26 Thread Pavel Emelyanov
When trying to do

# ebtables -A FORWARD --among-src 0:12:34:56:78:9a=192.168.0.10 -j ACCEPT

on x86_64 box the ebt_among->check() callback warns me that

ebtables: among: wrong size: 1060 against expected 1056, rounded to 1056

Checking the ebtables sources, I found that the alignment is done
differently in the tool and the kernel. The tool does it like this:

EBT_ALIGN(sizeof(struct ebt_among_info)) + X

while the kernel module does it like this:

EBT_ALIGN(sizeof(struct ebt_among_info) + X)

So the suggested fix is to move the alignment in the kernel. After
the fix the rule is added and appears in the ebtables -L output.
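
The difference between the two formulas is easy to reproduce with plain arithmetic. The sketch below assumes 8-byte alignment and uses a 12-byte header with a 1044-byte variable part; those two sizes are illustrative values picked because they reproduce the 1060 and 1056 from the warning, not sizes taken from the structure definitions.

```c
#include <stddef.h>

/* Round n up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
	return (n + align - 1) & ~(align - 1);
}

/* Userspace tool: align the fixed header, then add the variable part. */
static size_t tool_size(size_t header, size_t payload, size_t align)
{
	return align_up(header, align) + payload;
}

/* Kernel before the fix: add everything, then align the sum. */
static size_t old_kernel_size(size_t header, size_t payload, size_t align)
{
	return align_up(header + payload, align);
}
```

With these inputs tool_size(12, 1044, 8) is 16 + 1044 = 1060, while old_kernel_size(12, 1044, 8) is 1056 rounded up to 1056, so the two sides disagree by the header padding.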

Originally developed by Evgeny Kravtsunov.

Prepared against net-2.6 tree.

Signed-off-by: Evgeny Kravtsunov <[EMAIL PROTECTED]>
Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

---

diff --git a/net/bridge/netfilter/ebt_among.c b/net/bridge/netfilter/ebt_among.c
index 70b6dca..349a543 100644
--- a/net/bridge/netfilter/ebt_among.c
+++ b/net/bridge/netfilter/ebt_among.c
@@ -182,7 +182,7 @@ static int ebt_among_check(const char *tablename, unsigned int hookmask,
   unsigned int datalen)
 {
const struct ebt_among_info *info = data;
-   int expected_length = sizeof(struct ebt_among_info);
+   int expected_length = EBT_ALIGN(sizeof(struct ebt_among_info));
const struct ebt_mac_wormhash *wh_dst, *wh_src;
int err;
 
@@ -191,7 +191,7 @@ static int ebt_among_check(const char *tablename, unsigned int hookmask,
expected_length += ebt_mac_wormhash_size(wh_dst);
expected_length += ebt_mac_wormhash_size(wh_src);
 
-   if (datalen != EBT_ALIGN(expected_length)) {
+   if (datalen != expected_length) {
printk(KERN_WARNING
   "ebtables: among: wrong size: %d "
   "against expected %d, rounded to %Zd\n",


Re: [patch 0/6] pasemi_mac updates for 2.6.26

2008-02-26 Thread Olof Johansson
On Tue, Feb 26, 2008 at 08:49:58PM +1100, Paul Mackerras wrote:
> Olof Johansson writes:
> 
> > Here's a set of updates for pasemi_mac for 2.6.26. Some of them touch
> > the dma_lib in the platform code as well, but it's easier if it's all
> > merged through netdev to avoid dependencies.
> > 
> > Major highlights are jumbo frame support and ethtool basics, the rest
> > is mostly minor plumbing around it.
> 
> What route do you think these should take upstream?  I'm happy to take
> them if Jeff is OK with that.

I've sent them through Jeff in the past, that's been convenient when
there's been churn in the network APIs. I'm not sure if there's much of
that for .26 though.

If Jeff prefers to ACK, I'll just add it to my git and ask you to pull
that. But I was originally planning to just feed it through him.

(Note: I'll repost the patch set later today or tomorrow with a couple
of tweaks).


-Olof



Re: [patch 1/6] pasemi_mac: Move RX/TX section enablement to dma_lib

2008-02-26 Thread Olof Johansson
Hi,

On Tue, Feb 26, 2008 at 10:46:06PM +1100, Michael Ellerman wrote:
> On Wed, 2008-02-20 at 20:57 -0600, Olof Johansson wrote:
> > +   i = 1000;
> > +   pasemi_write_dma_reg(PAS_DMA_COM_RXCMD, 0);
> > +   while ((i > 0) && (pasemi_read_dma_reg(PAS_DMA_COM_RXSTA) & 1))
> > +   i--;
> > +   if (i < 0)
> > +   printk(KERN_INFO "Warning: Could not disable RX section\n");
> > +
> > +   i = 1000;
> > +   pasemi_write_dma_reg(PAS_DMA_COM_TXCMD, 0);
> > +   while ((i > 0) && (pasemi_read_dma_reg(PAS_DMA_COM_TXSTA) & 1))
> > +   i--;
> 
> This kind of caught my eye, is it still going to work when the next core
> is twice as fast?

Actually, I added the variable right before posting, I used to have an
infinite loop there while testing the patch. I've never seen it do more
than a few rounds, so I'm not that worried.

We already have a similar loop in the channel shutdown code, but it runs
a bit longer. I might bring that over instead. Thanks for pointing it
out.
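
For what it's worth, a sketch of a bounded poll that reports the timeout explicitly. In the loop quoted above the counter stops at 0, so a check for i < 0 cannot trigger; returning the result from the loop avoids relying on the counter's final value. This is a userspace illustration: read_status_fn, wait_until_idle() and fake_status() are made-up names, and real driver code would probably also want a delay per iteration so the limit does not depend on how fast the core is.

```c
#include <stdbool.h>

/* Hypothetical status reader; bit 0 set means the section is still busy. */
typedef unsigned int (*read_status_fn)(void *ctx);

/* Poll until the busy bit clears or max_polls reads have been made.
 * Returns true if the section went idle, false on timeout. */
static bool wait_until_idle(read_status_fn read_status, void *ctx,
			    int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++) {
		if (!(read_status(ctx) & 1))
			return true;
	}
	return false;
}

/* Fake register for demonstration: reports busy for *ctx more reads. */
static unsigned int fake_status(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 1;
	}
	return 0;
}
```

A caller would then print the warning only when wait_until_idle() returns false.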


-Olof



Re: [PATCH 22/28] mm: add support for non block device backed swap files

2008-02-26 Thread Miklos Szeredi
Starting review in the middle, because this is the part I'm most
familiar with.

> New address_space_operations methods are added:
>   int swapfile(struct address_space *, int);

Separate ->swapon() and ->swapoff() methods would be so much cleaner IMO.

Also is there a reason why 'struct file *' cannot be supplied to these
functions?

[snip]

> +int swap_set_page_dirty(struct page *page)
> +{
> + struct swap_info_struct *sis = page_swap_info(page);
> +
> + if (sis->flags & SWP_FILE) {
> + const struct address_space_operations *a_ops =
> + sis->swap_file->f_mapping->a_ops;
> + int (*spd)(struct page *) = a_ops->set_page_dirty;
> +#ifdef CONFIG_BLOCK
> + if (!spd)
> + spd = __set_page_dirty_buffers;
> +#endif

This ifdef is not really needed.  Just require ->set_page_dirty() be
filled in by filesystems which want swapfiles (and others too, in the
longer term, the fallback is just historical crud).

Here's an incremental patch addressing these issues and beautifying
the new code.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>

Index: linux/mm/page_io.c
===
--- linux.orig/mm/page_io.c 2008-02-26 11:15:58.0 +0100
+++ linux/mm/page_io.c  2008-02-26 13:40:55.0 +0100
@@ -106,8 +106,10 @@ int swap_writepage(struct page *page, st
}
 
if (sis->flags & SWP_FILE) {
-   ret = sis->swap_file->f_mapping->
-   a_ops->swap_out(sis->swap_file, page, wbc);
+   struct file *swap_file = sis->swap_file;
+   struct address_space *mapping = swap_file->f_mapping;
+
+   ret = mapping->a_ops->swap_out(swap_file, page, wbc);
if (!ret)
count_vm_event(PSWPOUT);
return ret;
@@ -136,12 +138,13 @@ void swap_sync_page(struct page *page)
struct swap_info_struct *sis = page_swap_info(page);
 
if (sis->flags & SWP_FILE) {
-   const struct address_space_operations *a_ops =
-   sis->swap_file->f_mapping->a_ops;
-   if (a_ops->sync_page)
-   a_ops->sync_page(page);
-   } else
+   struct address_space *mapping = sis->swap_file->f_mapping;
+
+   if (mapping->a_ops->sync_page)
+   mapping->a_ops->sync_page(page);
+   } else {
block_sync_page(page);
+   }
 }
 
 int swap_set_page_dirty(struct page *page)
@@ -149,17 +152,12 @@ int swap_set_page_dirty(struct page *pag
struct swap_info_struct *sis = page_swap_info(page);
 
if (sis->flags & SWP_FILE) {
-   const struct address_space_operations *a_ops =
-   sis->swap_file->f_mapping->a_ops;
-   int (*spd)(struct page *) = a_ops->set_page_dirty;
-#ifdef CONFIG_BLOCK
-   if (!spd)
-   spd = __set_page_dirty_buffers;
-#endif
-   return (*spd)(page);
-   }
+   struct address_space *mapping = sis->swap_file->f_mapping;
 
-   return __set_page_dirty_nobuffers(page);
+   return mapping->a_ops->set_page_dirty(page);
+   } else {
+   return __set_page_dirty_nobuffers(page);
+   }
 }
 
 int swap_readpage(struct file *file, struct page *page)
@@ -172,8 +170,10 @@ int swap_readpage(struct file *file, str
BUG_ON(PageUptodate(page));
 
if (sis->flags & SWP_FILE) {
-   ret = sis->swap_file->f_mapping->
-   a_ops->swap_in(sis->swap_file, page);
+   struct file *swap_file = sis->swap_file;
+   struct address_space *mapping = swap_file->f_mapping;
+
+   ret = mapping->a_ops->swap_in(swap_file, page);
if (!ret)
count_vm_event(PSWPIN);
return ret;
Index: linux/include/linux/fs.h
===
--- linux.orig/include/linux/fs.h   2008-02-26 11:15:58.0 +0100
+++ linux/include/linux/fs.h2008-02-26 13:29:40.0 +0100
@@ -485,7 +485,8 @@ struct address_space_operations {
/*
 * swapfile support
 */
-   int (*swapfile)(struct address_space *, int);
+   int (*swapon)(struct file *file);
+   int (*swapoff)(struct file *file);
int (*swap_out)(struct file *file, struct page *page,
struct writeback_control *wbc);
int (*swap_in)(struct file *file, struct page *page);
Index: linux/mm/swapfile.c
===
--- linux.orig/mm/swapfile.c2008-02-26 12:43:57.0 +0100
+++ linux/mm/swapfile.c 2008-02-26 13:34:57.0 +0100
@@ -1014,9 +1014,11 @@ static void destroy_swap_extents(struct 
}
 
if (sis->flags & SWP_FILE) {
+   struct file *swap_file = sis->swap

Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver

2008-02-26 Thread Jarek Poplawski
On Tue, Feb 26, 2008 at 01:03:34PM +, Jarek Poplawski wrote:
> On Tue, Feb 26, 2008 at 12:14:26PM +, James Chapman wrote:
...
> > there are hanging. Btw, they don't hang if I disable irqs around the  
> > ppp_input() call.
> 
> ...and disabling bh instead isn't enough, BTW?

I guess not: they are mostly disabled by ppp_input() itself...

So, it looks like a network card could interfere here?

Jarek P.



Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver

2008-02-26 Thread Jarek Poplawski
On Tue, Feb 26, 2008 at 12:14:26PM +, James Chapman wrote:
> Jarek Poplawski wrote:
>> Jarek Poplawski wrote, On 02/25/2008 02:39 PM:
>> ...
>>> Hmm... Wait a minute! But on the other hand David has written about
>>> his cons here, and it looks reasonable: this place would be fixed,
>>> but some others can start reports like this. Maybe, it's better to
>>> analyze yet if it's really so hard to eliminate taking this lock
>>> on the xmit path?
>>
>> James, I wonder if you could try to test this patch below?
>> ip_queue_xmit() seems to do proper things with __sk_dst_check(), and
>> if some other functions don't behave similarly lockdep should tell.
>> I think, you could test it with your "locks to _bh" patch (without
>> pppol2tp_xmit() part), and maybe with my sock.c lockdep patch (it
>> should help lockdep to see these locks a bit more distinctly).
>
> I found the same thing and was running a variant of your patch last  
> night; rather than set skb->dst to NULL though, I use __sk_dst_get() and  
> let ip_queue_xmit() do the route lookup if it returns NULL. But this has  
> the same symptoms as the code I tried a few days ago - no lockdep errors  
> but a system lockup after up to several hours. Nothing is logged in the  
> syslog.

I guess you are going to try this together with the sk_dst_lock _bh
patch too. If possible, I'd suggest trying skb->dst = NULL as well
(using __sk_dst_get() instead of __sk_dst_check() seems too racy).

> Luckily, I'm in the lab where my two borrowed servers are today so I  
> have access to their consoles. Hopefully I'll be able to find out why  
> there are hanging. Btw, they don't hang if I disable irqs around the  
> ppp_input() call.

...and disabling bh instead isn't enough, BTW?

> Will update you later.

Thanks,
Jarek P.


Re: [PATCH 22/28] mm: add support for non block device backed swap files

2008-02-26 Thread Peter Zijlstra

On Tue, 2008-02-26 at 13:45 +0100, Miklos Szeredi wrote:
> Starting review in the middle, because this is the part I'm most
> familiar with.
> 
> > New addres_space_operations methods are added:
> >   int swapfile(struct address_space *, int);
> 
> Separate ->swapon() and ->swapoff() methods would be so much cleaner IMO.

I'm ok with that, but it's a_ops bloat; do we care about that? I guess
since it has limited instances - typically one per filesystem - there is
no issue here.

> Also is there a reason why 'struct file *' cannot be supplied to these
> functions?

No real reason here. I guess it's cleaner indeed. Thanks.

> > +int swap_set_page_dirty(struct page *page)
> > +{
> > +   struct swap_info_struct *sis = page_swap_info(page);
> > +
> > +   if (sis->flags & SWP_FILE) {
> > +   const struct address_space_operations *a_ops =
> > +   sis->swap_file->f_mapping->a_ops;
> > +   int (*spd)(struct page *) = a_ops->set_page_dirty;
> > +#ifdef CONFIG_BLOCK
> > +   if (!spd)
> > +   spd = __set_page_dirty_buffers;
> > +#endif
> 
> This ifdef is not really needed.  Just require ->set_page_dirty() be
> filled in by filesystems which want swapfiles (and others too, in the
> longer term, the fallback is just historical crud).

Agreed. This is a good motivation to clean up that stuff.

> Here's an incremental patch addressing these issues and beautifying
> the new code.

Thanks, I'll fold it into the patch and update the documentation. I'll
put your creds in akpm style.




Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver

2008-02-26 Thread James Chapman

Jarek Poplawski wrote:
> Jarek Poplawski wrote, On 02/25/2008 02:39 PM:
> ...
>> Hmm... Wait a minute! But on the other hand David has written about
>> his cons here, and it looks reasonable: this place would be fixed,
>> but some others can start reports like this. Maybe, it's better to
>> analyze yet if it's really so hard to eliminate taking this lock
>> on the xmit path?
>
> James, I wonder if you could try to test this patch below?
> ip_queue_xmit() seems to do proper things with __sk_dst_check(), and
> if some other functions don't behave similarly lockdep should tell.
> I think, you could test it with your "locks to _bh" patch (without
> pppol2tp_xmit() part), and maybe with my sock.c lockdep patch (it
> should help lockdep to see these locks a bit more distinctly).


I found the same thing and was running a variant of your patch last 
night; rather than set skb->dst to NULL though, I use __sk_dst_get() and 
let ip_queue_xmit() do the route lookup if it returns NULL. But this has 
the same symptoms as the code I tried a few days ago - no lockdep errors 
but a system lockup after up to several hours. Nothing is logged in the 
syslog.


Luckily, I'm in the lab where my two borrowed servers are today so I
have access to their consoles. Hopefully I'll be able to find out why
they are hanging. Btw, they don't hang if I disable irqs around the
ppp_input() call.


Will update you later.

/james


PS: Since ppp_generic isn't endangered for now I removed Paul from CC.

---

diff --git a/drivers/net/pppol2tp.c b/drivers/net/pppol2tp.c
index e0b072d..b94659a 100644
--- a/drivers/net/pppol2tp.c
+++ b/drivers/net/pppol2tp.c
@@ -1058,7 +1058,7 @@ static int pppol2tp_xmit(struct ppp_channel *chan, struct 
sk_buff *skb)
 
 	/* Get routing info from the tunnel socket */

dst_release(skb->dst);
-   skb->dst = sk_dst_get(sk_tun);
+   skb->dst = NULL;
skb_orphan(skb);
skb->sk = sk_tun;
 

--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development



Re: [PATCH 00/28] Swap over NFS -v16

2008-02-26 Thread Peter Zijlstra

On Tue, 2008-02-26 at 11:50 +0100, Peter Zijlstra wrote:

> > mm-reserve.patch
> > 
> >I'm confused by __mem_reserve_add.
> > 
> > +   reserve = mem_reserve_root.pages;
> > +   __calc_reserve(res, pages, 0);
> > +   reserve = mem_reserve_root.pages - reserve;
> > 
> >__calc_reserve will always add 'pages' to mem_reserve_root.pages.
> >So this is a complex way of doing
> > reserve = pages;
> > __calc_reserve(res, pages, 0);
> > 
> > And as you can calculate reserve before calling __calc_reserve
> > (which seems odd when stated that way), the whole function looks
> > like it could become:
> > 
> >ret = adjust_memalloc_reserve(pages);
> >if (!ret)
> > __calc_reserve(res, pages, limit);
> >return ret;
> > 
> > What am I missing?
> 
> Probably the horrible twist my brain has. Looking at it makes me doubt
> my own sanity. I think you're right - it would also clean up
> __calc_reserve() a little.
> 
> This is what review for :-)

Ah, you confused me. Well, I confused me - this does deserve a comment;
it's tricksy.

It's correct. The trick is, the mem_reserve in question (res) need not be
connected to mem_reserve_root.

In that case, mem_reserve_root.pages will not change, but we do
propagate the change as far up as possible, so that
mem_reserve_connect() can just observe the parent and child without
being bothered by the rest of the hierarchy.
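
To make the point concrete, here is a toy userspace model of that
propagation. It is not the code from the patch: struct reserve,
reserve_add() and reserve_connect() are simplified, hypothetical
stand-ins, and locking and limits are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy reserve node; each node holds the total of its own subtree. */
struct reserve {
	struct reserve *parent;	/* NULL when detached or at the root */
	long pages;
};

/* Add pages to a node and propagate as far up as the links reach. */
static void reserve_add(struct reserve *res, long pages)
{
	for (; res; res = res->parent)
		res->pages += pages;
}

/* Attach a subtree: only the child's current total has to be looked at. */
static void reserve_connect(struct reserve *child, struct reserve *parent)
{
	assert(child->parent == NULL);
	child->parent = parent;
	reserve_add(parent, child->pages);
}
```

While a subtree is detached, reserve_add() stops at the top of that
subtree and the root total stays unchanged; once connected, the root picks
up the subtree total in one step.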





Re: [Netem] Fixed delay patch for netem

2008-02-26 Thread Julio Kriger
Hi Stephen,
Answering your question: there is no difference if (and only if) there
is no gap reordering. If there is gap reordering, then without the patch
packets will get a delay of either 0ms or 80ms +/- 15ms. With the patch
I sent, packets will get a delay of either 50ms or 80ms +/- 15ms.
I have read about other network emulators, like NS-2, and they have
the option of gap reordering.
As I said, I need a fixed delay of 50ms plus all the other useful
stuff that comes with netem, and netem cannot be nested within
itself.
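
As a rough illustration of the arithmetic (this is not the netem
implementation): the sketch below ignores the +/- 15ms jitter and assumes
that every gap-th packet skips the variable delay, as in the "gap 5"
description quoted below. packet_delay_ms() and its parameters are
hypothetical.

```c
#include <assert.h>

/*
 * Delay in ms for the n-th packet (1-based), jitter ignored.
 * Every gap-th packet skips the variable delay; the fixed delay,
 * if any, is applied to all packets. gap == 0 disables reordering.
 */
static int packet_delay_ms(unsigned int n, unsigned int gap,
			   int fixed_ms, int variable_ms)
{
	int reordered;

	assert(n > 0);
	reordered = gap > 0 && n % gap == 0;

	return fixed_ms + (reordered ? 0 : variable_ms);
}
```

With gap 5, a plain 80ms delay gives 0ms for the 5th packet and 80ms for
the others, while 50ms fixed plus 30ms variable gives 50ms and 80ms.
Without a gap the two settings are indistinguishable.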
Regards,
Julio


On 2/25/08, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> On Sun, 24 Feb 2008 12:11:16 -0200
> "Julio Kriger" <[EMAIL PROTECTED]> wrote:
>
> > Hi!
> > I have created this patch to add a fixed delay on packet filtered by
> > netem. Soon I will send the patch to iproute2.
> > This patch comes from a need I have to delay all packets 50ms, beside
> > the actual delay setting, like 30ms +- 15 ms.
>
> Why is 50ms + 30m +/- 15ms any different than 80ms +/- 15ms
>
> > This strike, IMMHO, a
> > missing point on gap reordering. If I set "gap 5 delay 10ms" every 5th
> > (10th, 15th, ...) packet to go to be sent immediately and every other
> > packet to be delayed by 10ms. This  is ok, but I also need a "fixed"
> > delay of 50ms to be applied to all packets. Since netem can't be
> > nested with himself (so I can do a fixed delay), I needed this new
> > feature on netem.
>
> The gap stuff is an awkward interface that should/could have been
> done better.
>
> > This patch was create with linux kernel version 2.6.24.2.
> > I hope you like it, and it would be great if it goes shiped with the
> > next version of the kernel :-))
> > Regards,
> > Julio Kriger
>
> Maybe, but it is getting confusing with all the growth of parameters.
> Probably time for a rethink.
>


-- 
--
From the moment I picked your book up until I laid it down, I was
convulsed with laughter. Someday I intend reading it.
Groucho Marx

Julio Kriger
mailto:[EMAIL PROTECTED]


Re: [patch 1/6] pasemi_mac: Move RX/TX section enablement to dma_lib

2008-02-26 Thread Michael Ellerman
On Wed, 2008-02-20 at 20:57 -0600, Olof Johansson wrote:
> plain text document attachment (in-progress)
> Also stop both rx and tx sections before changing the configuration of
> the dma device during init.
> 
> Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>
> 
> Index: k.org/arch/powerpc/platforms/pasemi/dma_lib.c
> ===
> --- k.org.orig/arch/powerpc/platforms/pasemi/dma_lib.c
> +++ k.org/arch/powerpc/platforms/pasemi/dma_lib.c
> @@ -478,6 +478,30 @@ int pasemi_dma_init(void)
>   for (i = 0; i < MAX_RXCH; i++)
>   __set_bit(i, rxch_free);
>  
> + i = 1000;
> + pasemi_write_dma_reg(PAS_DMA_COM_RXCMD, 0);
> + while ((i > 0) && (pasemi_read_dma_reg(PAS_DMA_COM_RXSTA) & 1))
> + i--;
> + if (i < 0)
> + printk(KERN_INFO "Warning: Could not disable RX section\n");
> +
> + i = 1000;
> + pasemi_write_dma_reg(PAS_DMA_COM_TXCMD, 0);
> + while ((i > 0) && (pasemi_read_dma_reg(PAS_DMA_COM_TXSTA) & 1))
> + i--;

This kind of caught my eye, is it still going to work when the next core
is twice as fast?

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person




[PATCH] changes required to flush all routing table entries in side kernel

2008-02-26 Thread dur...@it.iitb.ac.in phani
Problem: We need to clean up routing table entries inside the Linux
kernel, but the kernel does not provide a single command to clear all
routing table entries at once.

I modified the kernel to take care of this. I defined a new netlink
message, RTM_FLUSHROUTE, under the NETLINK_ROUTE family. At the
receiving end inside the kernel, all routing table entries related to
each device are removed.

The code takes a reference only on the corresponding device, flushes
the entries related to that device, and then drops the reference.
fib_sync_flush() is the same as fib_sync_down(), except that it also
checks the protocol: it checks whether the route entry was installed by
the given protocol and marks only those entries as dead.


These are the changes (Linux kernel version 2.6.14.2).

 --- linux26/include/net/ip_fib.h 2008-01-04 04:41:45.326857000 -0800
 +++ linux26/include/net/modified_ip_fib.h   2008-01-21 04:46:55.0 -0800
 @@ -233,6 +233,9 @@ extern void ip_fib_init(void);

 +extern int inet_rtm_flushroute(struct sk_buff *skb, struct nlmsghdr*
 nlh, void *arg);

 +extern int fib_sync_flush(u32 local, struct net_device *dev, int
 force, int protocol);



 --- linux26/include/linux/rtnetlink.h   2008-01-04 02:57:57.487754000 -0800
 +++ linux26/include/linux/modified_rtnetlink.h  2008-01-21
 04:46:56.0 -0800
 @@ -35,7 +35,11 @@ enum {
  #define RTM_DELROUTE   RTM_DELROUTE
RTM_GETROUTE,
  #define RTM_GETROUTE   RTM_GETROUTE
 -
 +RTM_FLUSHROUTE,
 +#define RTM_FLUSHROUTE RTM_FLUSHROUTE
 +
RTM_NEWNEIGH= 28,
  #define RTM_NEWNEIGH   RTM_NEWNEIGH
RTM_DELNEIGH,
 @@ -199,7 +203,9 @@ enum
 ~


 --- linux26/net/ipv4/fib_frontend.c 2008-01-04 03:07:17.964607000 -0800
 +++ linux26/net/ipv4/modified_fib_frontend.c2008-01-21
 04:46:53.0 -0800
 +/*
 + * Added For flushing all the routes when clear ip route is issued
 from user space
 + */
 +int inet_rtm_flushroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 +{
 +struct net_device *dev;
 +struct in_device *in_dev;
 +struct rtattr **rta = arg;
 +struct rtmsg *r = NLMSG_DATA(nlh);
 +
 +if (inet_check_attr(r, rta))
 +  return -EINVAL;
 +
 +for (dev = dev_base; dev; dev = dev->next) {
 +in_dev = in_dev_get(dev);
 +if (!in_dev)
 +continue;
 +if(fib_sync_flush(0, dev, 0, r->rtm_protocol)) {
 +fib_flush();
 +rt_cache_flush(0);
 +}
 +in_dev_put(in_dev);
 +}
 +
 +return 0;
 +}

 fib_sync_flush() is the same as fib_sync_down(), except that for each
 entry it also compares the protocol type of the routing entry to be removed.

 --- linux26/net/ipv4/fib_semantics.c2008-01-09 02:22:49.819492000 -0800
 +++ linux26/net/ipv4/modified_fib_semantics.c1  2008-01-21
 04:46:52.0 -0800
 +/*
 + * FLUSH all  the routing table entries related to a
 + * device
 + */
 +
 +int fib_sync_flush(u32 local, struct net_device *dev, int force, int protocol)
 +{
 +int ret = 0;
 +int scope = RT_SCOPE_NOWHERE;
 +
 +if (force)
 +scope = -1;
 +
 +if (local && fib_info_laddrhash) {
 +unsigned int hash = fib_laddr_hashfn(local);
 +struct hlist_head *head = &fib_info_laddrhash[hash];
 +struct hlist_node *node;
 +struct fib_info *fi;
 +hlist_for_each_entry(fi, node, head, fib_lhash) {
 +if (fi->fib_prefsrc == local) {
 +fi->fib_flags |= RTNH_F_DEAD;
 +ret++;
 +}
 +}
 +}
 +
 +if (dev) {
 +struct fib_info *prev_fi = NULL;
 +unsigned int hash = fib_devindex_hashfn(dev->ifindex);
 +struct hlist_head *head = &fib_info_devhash[hash];
 +struct hlist_node *node;
 +struct fib_nh *nh;
 +hlist_for_each_entry(nh, node, head, nh_hash) {
 +struct fib_info *fi = nh->nh_parent;
 +if(fi->fib_protocol == protocol) {
 +int dead;
 +BUG_ON(!fi->fib_nhs);
 +if (nh->nh_dev != dev || fi == prev_fi)
 +continue;
 +prev_fi = fi;
 +dead = 0;
 +change_nexthops(fi) {
 +if (nh->nh_flags&RTNH_F_DEAD)
 +dead++;
 +else if (nh->nh_dev == dev &&
 +nh->nh_scope != scope) {
 +nh->nh_flags |= RTNH_F_DEAD;
 +#ifdef CONFIG_IP_ROUTE_MULTIPATH
 +spin_lock_bh(&fib_multipath_lock);
 +fi->fib_power -= nh->nh_power;
 +nh->nh_power = 0;
 +spin_unlock_bh(&fib_multipath_lock);
 +#endif
 +dead++;
 +}
 +#ifdef CONFIG_IP_ROUTE_MULTIPATH
 +if (force > 1 && nh->nh_dev == dev) {
 +dead = fi->fib_nhs;
 +break;
 + 

Re: [PATCH] Can not send icmp netunreach packet

2008-02-26 Thread Jarek Poplawski
On Tue, Feb 26, 2008 at 06:59:08PM +0900, Wei Yongjun wrote:
> Jarek Poplawski wrote:
>
> Maybe ip_error() does not handle the ESRCH error. In this place ESRCH eq  
> to ENETUNREACH?

It doesn't handle ESRCH for sure... The current solution seems to expect
that it has been changed to ENETUNREACH earlier. That looks reasonable,
because otherwise all other places checking for this would have to be
updated too.

But, IMHO, it could be tested whether such a change here helps with the
current problem, and then maybe we could find where the conversion was
skipped? On the other hand, checking with grep for all such ENETUNREACH
cases and adding ESRCH where needed could probably be much simpler and
safer...
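
The two options can be compared with a small userspace sketch of the
mapping. This is not kernel code: route_error_to_icmp() and the SKETCH_*
names are hypothetical, the ICMP code values are the standard
destination-unreachable codes, and treating ESRCH like ENETUNREACH is only
the idea under discussion, not current behaviour.

```c
#include <assert.h>
#include <errno.h>

/* ICMP destination-unreachable codes (RFC 792 / RFC 1812 values). */
enum {
	SKETCH_ICMP_NET_UNREACH  = 0,
	SKETCH_ICMP_HOST_UNREACH = 1,
	SKETCH_ICMP_PKT_FILTERED = 13,
	SKETCH_ICMP_NONE         = -1,	/* no ICMP error is sent */
};

/*
 * Map a positive routing errno to an ICMP code. With map_esrch set,
 * ESRCH is treated like ENETUNREACH; otherwise it falls into the
 * default case and no ICMP error is generated.
 */
static int route_error_to_icmp(int err, int map_esrch)
{
	assert(err >= 0);
	if (map_esrch && err == ESRCH)
		err = ENETUNREACH;

	switch (err) {
	case EHOSTUNREACH:
		return SKETCH_ICMP_HOST_UNREACH;
	case ENETUNREACH:
		return SKETCH_ICMP_NET_UNREACH;
	case EACCES:
		return SKETCH_ICMP_PKT_FILTERED;
	case EINVAL:
	default:
		return SKETCH_ICMP_NONE;
	}
}
```

Doing the conversion in one place keeps every switch like this unchanged;
the alternative is to add an ESRCH case to each of them.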

Jarek P.

>
> static int ip_error(struct sk_buff *skb)
> {
>   struct rtable *rt = (struct rtable*)skb->dst;
>   unsigned long now;
>   int code;
>
>   switch (rt->u.dst.error) {
>   case EINVAL:
>   default:
>   goto out;
>   case EHOSTUNREACH:
>   code = ICMP_HOST_UNREACH;
>   break;
>   case ENETUNREACH:
>   code = ICMP_NET_UNREACH;
>   break;
>   case EACCES:
>   code = ICMP_PKT_FILTERED;
>   break;
>   }
> ...snip
> }
>
>
>
>> On 26-02-2008 07:34, Li Yewang wrote:
>>   
>>> Hi All
>>>
>>>There is a bug about icmp netunreach.
>>>If the kernel does not find a route for a packet,it must send 
>>> a icmp netunreach packet to the source host,and  discard  the 
>>> packet. But the  kernel  does not senda icmp netunreach packet 
>>> because of the  fib_lookup
>>>return value  of -ESRCH when a route  is not found. 
>>
>> ...or because some function doesn't handle -ESRCH return from
>> fib_lookup? It seems changing this to -ESRCH was needed in some cases.
>> And you don't explain enough why it can't be handled later (like in
>> ipv4/route.c: ip_route_input_slow)?
>>   
>
>
>> Regards,
>> Jarek P.
>>
>>   
>>> Signed-off-by: Li Yewang <[EMAIL PROTECTED]>
>>>
>>> diff -Nurp net/core_back/fib_rules.c net/core/fib_rules.c
>>> --- net/core_back/fib_rules.c   2008-02-25 13:15:37.0 +0800
>>> +++ net/core/fib_rules.c2008-02-25 13:16:01.0 +0800
>>> @@ -188,7 +188,7 @@ jumped:
>>> }
>>> }
>>>  -  err = -ESRCH;
>>> +   err = -ENETUNREACH;
>>>  out:
>>> rcu_read_unlock();
>>>
>>> 
>


Re: [PATCH 00/28] Swap over NFS -v16

2008-02-26 Thread Peter Zijlstra
Hi Neil,


On Tue, 2008-02-26 at 17:03 +1100, Neil Brown wrote:
> On Saturday February 23, [EMAIL PROTECTED] wrote:
 
> > What is the NFS and net people's take on all of this?
> 
> Well I'm only vaguely an NFS person, barely a net person, sporadically
> an mm person, but I've had a look and it seems to mostly make sense.

Thanks for taking a look, and giving such elaborate feedback. I'll try
and address these issues asap, but first let me reply to a few points
here.

> We introduce a new "emergency" concept for page allocation.
> The size of the emergency pool is set by various reservations by
> different potential users.
> If the number of free pages is below the "emergency" size, then only
> users with a "MEMALLOC" flag get to allocate pages.  Further, those
> pages get a "reserve" flag set which propagates into slab/slub so
> kmalloc/kmemalloc only return memory from those pages to MEMALLOC
> users. 
> MEMALLOC users are those that set PF_MEMALLOC.  A socket can get
> SOCK_MEMALLOC set which will cause certain pieces of code to
> temporarily set PF_MEMALLOC while working on that socket.

Small detail, there is also __GFP_MEMALLOC, this is used for single
allocations to avoid setting and unsetting PF_MEMALLOC - like in the skb
alloc once we have determined we otherwise fail and still have room.

> The upshot is that providing any MEMALLOC user reserves an appropriate
> amount of emergency space, returns the emergency memory promptly, and
> sets PF_MEMALLOC whenever allocating memory, it's memory allocations
> should never fail.
> 
> As memory is requested is small units, but allocated as pages, there
> needs to be a conversion from small-units to pages.  One of the
> patches does this and appears to err on the side of be over-generous,
> which is the right thing to do.
> 
> 
> Memory reservations are organised in a tree.  I really don't
> understand the tree.  Is it just to make /proc/reserve_info look more
> helpful?
> Certainly all the individual reservations need to be recorded, and the
> cumulative reservation needs also to be recorded (currently in the
> root of the tree) but what are all the other levels used for?

Ah, there is a little trick there, I hint at that in the reserve.c
description comment:

+ * As long as a subtree has the same usage unit, an aggregate node can be used
+ * to charge against, instead of the leaf nodes. However, do be consistent with
+ * who is charged, resource usage is not propagated up the tree (for
+ * performance reasons).

And I actually use that, if we show a little of the tree (which andrew
rightly dislikes for not being machine parseable - will fix):

+ * localhost ~ # cat /proc/reserve_info
+ * total reserve                  8156K (0/544817)
+ *   total network reserve        8156K (0/544817)
+ *     network TX reserve          196K (0/49)
+ *       protocol TX pages         196K (0/49)
+ *     network RX reserve         7960K (0/544768)
+ *       IPv6 route cache         1372K (0/4096)
+ *       IPv4 route cache         5468K (0/16384)
+ *       SKB data reserve         1120K (0/524288)
+ *         IPv6 fragment cache     560K (0/262144)
+ *         IPv4 fragment cache     560K (0/262144)

We see that the 'SKB data reserve' is built up from the IPv4 and IPv6
fragment cache reserves.

I use the 'SKB data reserve' to charge memory against and account usage,
but use its children to grow/shrink the actual reserve.

This allows you to see the individual reserves, but still use an
aggregate.
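
A toy model of that split, for illustration only: the children set the
size of the reserve, while usage is charged against the aggregate. struct
agg_reserve, agg_grow() and agg_charge() are hypothetical names, not the
interface from reserve.c, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct agg_reserve {
	struct agg_reserve *parent;
	long limit;	/* size of the reserve, summed up from children */
	long usage;	/* only tracked on the node that is charged */
};

/* Grow (or shrink, with a negative delta) a leaf; ancestors follow. */
static void agg_grow(struct agg_reserve *res, long delta)
{
	for (; res; res = res->parent)
		res->limit += delta;
}

/* Charge against one node only; usage is not propagated up the tree. */
static bool agg_charge(struct agg_reserve *res, long amount)
{
	assert(amount >= 0);
	if (res->usage + amount > res->limit)
		return false;
	res->usage += amount;
	return true;
}
```

With two children of 262144 each, the aggregate ends up with a limit of
524288, so a charge larger than either child alone can still succeed
against the aggregate.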

The tree form is the simplest structure that allowed such things,
another nice thing is that you can easily detach whole sub-trees to stop
actually reserving the memory, but continue tracking its potential
needs. 

This is done when there are no SOCK_MEMALLOC sockets around. The 'total
network reserve' is detached, reducing the 'total reserve' to 0
(assuming no other reserve trees) but the individual reserves are still
tracking their potential need for when it will be re-attached.

With only a single user this might seem a little too much, but I have
hopes for more users.

> Reservations are used for all the transient memory that might be used
> by the network stack.  This particularly includes the route cache and
> skbs for incoming messages.  I have no idea if there is anything else
> that needs to be allowed for.

This is something I'd like feedback on from the network gurus. In my
reading there weren't many other allocation sites, but hey, I'm not much
of a net person myself. (I did write some instrumentation to track
allocations, but I'm sure I didn't get full coverage of the stack with
my simple usage).

> Filesystems can advertise (via address_space_operations) that files
> may be used as swap file.  They then provide swapout/swapin methods
> which are like writepage/readpage but may behave differently and have
> a different way to get credentials from a 'struct file'.

Yes, the added benefit is that even regular blockdev files

Re: [PATCH] Can not send icmp netunreach packet

2008-02-26 Thread Wei Yongjun

Jarek Poplawski wrote:

Maybe ip_error() does not handle the ESRCH error. In this place, is ESRCH
equivalent to ENETUNREACH?


static int ip_error(struct sk_buff *skb)
{
	struct rtable *rt = (struct rtable*)skb->dst;
	unsigned long now;
	int code;

	switch (rt->u.dst.error) {
	case EINVAL:
	default:
		goto out;
	case EHOSTUNREACH:
		code = ICMP_HOST_UNREACH;
		break;
	case ENETUNREACH:
		code = ICMP_NET_UNREACH;
		break;
	case EACCES:
		code = ICMP_PKT_FILTERED;
		break;
	}
...snip
}




> On 26-02-2008 07:34, Li Yewang wrote:
>> Hi All
>>
>>    There is a bug about icmp netunreach.
>>    If the kernel does not find a route for a packet,
>>    it must send a icmp netunreach packet to the source host,
>>    and  discard  the packet. But the  kernel  does not send
>>    a icmp netunreach packet because of the  fib_lookup
>>    return value  of -ESRCH when a route  is not found.
>
> ...or because some function doesn't handle -ESRCH return from
> fib_lookup? It seems changing this to -ESRCH was needed in some cases.
> And you don't explain enough why it can't be handled later (like in
> ipv4/route.c: ip_route_input_slow)?
>
> Regards,
> Jarek P.
>
>> Signed-off-by: Li Yewang <[EMAIL PROTECTED]>
>>
>> diff -Nurp net/core_back/fib_rules.c net/core/fib_rules.c
>> --- net/core_back/fib_rules.c   2008-02-25 13:15:37.0 +0800
>> +++ net/core/fib_rules.c2008-02-25 13:16:01.0 +0800
>> @@ -188,7 +188,7 @@ jumped:
>> }
>> }
>>
>> -   err = -ESRCH;
>> +   err = -ENETUNREACH;
>>  out:
>> rcu_read_unlock();






Re: [patch 0/6] pasemi_mac updates for 2.6.26

2008-02-26 Thread Paul Mackerras
Olof Johansson writes:

> Here's a set of updates for pasemi_mac for 2.6.26. Some of them touch
> the dma_lib in the platform code as well, but it's easier if it's all
> merged through netdev to avoid dependencies.
> 
> Major highlights are jumbo frame support and ethtool basics, the rest
> is mostly minor plumbing around it.

What route do you think these should take upstream?  I'm happy to take
them if Jeff is OK with that.

Paul.


Re: include/linux/pcounter.h

2008-02-26 Thread Ingo Molnar

* David Miller <[EMAIL PROTECTED]> wrote:

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Sat, 16 Feb 2008 11:26:18 -0800
> 
> > On Sat, 16 Feb 2008 13:03:54 +0100 Eric Dumazet <[EMAIL PROTECTED]> wrote:
> > 
> > > Yes, per connection basis. Some workloads want to open/close more 
> > > than 1000 sockets per second.
> > 
> > ie: slowpath
> 
> Definitely not slow path in the networking.
> 
> Connection rates are definitely as, or more, important than packet 
> rates for certain workloads.

but the main and fundamental question still remains unanswered (more 
than 3 weeks after Andrew asked that question): why was this piece of 
general infrastructure merged via net.git and not submitted to lkml 
ever? The code touching -mm does _not_ count as "review".

Now that there was review of it and there is clearly controversy, the 
code should be reverted/undone and resubmitted after all review 
observations have been addressed. Just sitting around and ignoring 
objections, hoping for the code to hit v2.6.25 is rather un-nice ...

Ingo