Re: UDP packet loss when running lsof

2007-05-22 Thread Eric Dumazet

John Miller wrote:

Hi Eric,


I CCed netdev since this stuff is about network and not
lkml.


Ok, dropped the CC...


What kind of machine do you have? SMP or not?


It's an HP system with two dual-core CPUs at 3 GHz; the
storage system is connected through a QLogic FC HBA. It should
really be fast enough to handle a data stream of 50 MB/s...


Then you might try to bind the network IRQ to one CPU
(echo 1 > /proc/irq/XX/smp_affinity)

XX being your NIC interrupt (cat /proc/interrupts to find it)

and bind your user program to the other CPU(s).

You might be hitting a cond_resched_softirq() bug that Ingo and others are
sorting out right now. Using separate CPUs for softirq handling and your
programs should help a lot here.
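
The value written to smp_affinity is a hexadecimal CPU bitmask, so the "echo 1" above selects CPU 0. A minimal sketch of building and writing such a mask follows; the function names are hypothetical, and the sketch assumes a mask that fits in one 32-bit word:

```c
#include <stdio.h>

/*
 * Format the hex bitmask that selects a single CPU, in the form
 * /proc/irq/XX/smp_affinity expects: bit N set means "CPU N may
 * service this interrupt".  Returns 0 on success, -1 if the CPU
 * number is out of range for this sketch or the buffer is too small.
 */
int irq_affinity_mask(unsigned int cpu, char *buf, size_t len)
{
        int n;

        if (cpu >= 32)          /* keep the sketch to one 32-bit word */
                return -1;
        n = snprintf(buf, len, "%x", 1u << cpu);
        return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write the mask for `cpu` to an smp_affinity file (needs root). */
int irq_bind_to_cpu(const char *path, unsigned int cpu)
{
        char mask[16];
        FILE *f;
        int rc;

        if (irq_affinity_mask(cpu, mask, sizeof(mask)) != 0)
                return -1;
        f = fopen(path, "w");
        if (!f)
                return -1;
        rc = (fprintf(f, "%s\n", mask) < 0) ? -1 : 0;
        if (fclose(f) != 0)
                rc = -1;
        return rc;
}
```

With the NIC interrupt bound to CPU 0 this way, the receiving program would be restricted to the remaining CPUs.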





If you have many sockets on this machine, lsof can be
very slow reading /proc/net/tcp and/or /proc/net/udp,
locking some tables long enough to drop packets.


First I tried with one UDP socket and during tests I switched
to 16 sockets with no effect. As I removed nearly all daemons
there aren't many open sockets.

/proc/net/tcp seems to be one cause of the problem: a simple
cat /proc/net/tcp nearly always leads to immediate UDP packet
loss. So it seems that reading TCP statistics blocks UDP
packet processing.

As it isn't my goal to collect statistics all the time, I could
live with disabling access to /proc/net/tcp, but I wouldn't call
this a good solution...


If you have a low count of tcp sockets, you might want to
boot with thash_entries=2048 or so, to reduce tcp hash
table size.


This helped a lot: I tried thash_entries=10, and now only a
while loop around the cat ...tcp triggers packet loss. Tests
are running now and I can say more tomorrow.


I don't understand this: does using a small thash_entries make the bug
always appear?



Information about thash_entries is really hard to find, even
the default value: for a system with 2 GB of RAM
it could be around 10.


No RcvbufErrors errors either?


The kernel is a bit too old (2.6.18). Looking at the patch
from 2.6.18 to 2.6.19, I found that RcvbufErrors is only
increased when InErrors is increased. So my answer would be
yes.
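
On kernels that export them, the InErrors and RcvbufErrors counters appear in the "Udp:" lines of /proc/net/snmp as a header line of field names followed by a line of values. A small sketch of looking up one counter by name, assuming that two-line layout; the function name is hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Given the "Udp:" header line and the "Udp:" value line from
 * /proc/net/snmp, look up one counter by field name.  Returns 0 and
 * stores the value in *out, or -1 if the field is not present
 * (for example on a kernel that does not export it).
 */
int snmp_udp_field(const char *header, const char *values,
                   const char *name, unsigned long *out)
{
        char field[64], value[64];
        int hn, vn;

        /* Walk both lines in step, one whitespace-separated token at a time. */
        while (sscanf(header, "%63s%n", field, &hn) == 1 &&
               sscanf(values, "%63s%n", value, &vn) == 1) {
                if (strcmp(field, name) == 0) {
                        *out = strtoul(value, NULL, 10);
                        return 0;
                }
                header += hn;
                values += vn;
        }
        return -1;
}
```

Sampling both counters before and after a "cat /proc/net/tcp" would show whether the lost datagrams are counted as receive-buffer overruns.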


 - Network card is handled by bnx2 kernel module



I don't know this NIC. Does it support ethtool?


It is a Broadcom Corporation NetXtreme II BCM5708S
Gigabit Ethernet (rev 12), and it seems ethtool is supported.

The output below was captured after packet loss (I don't see
any hints, but maybe you do):


ethtool -S eth0


NIC statistics:
 rx_bytes: 155481467364
 rx_error_bytes: 0
 tx_bytes: 5492161
 tx_error_bytes: 0
 rx_ucast_packets: 18341
 rx_mcast_packets: 137321933
 rx_bcast_packets: 2380
 tx_ucast_packets: 14416
 tx_mcast_packets: 190
 tx_bcast_packets: 8
 tx_mac_errors: 0
 tx_carrier_errors: 0
 rx_crc_errors: 0
 rx_align_errors: 0
 tx_single_collisions: 0
 tx_multi_collisions: 0
 tx_deferred: 0
 tx_excess_collisions: 0
 tx_late_collisions: 0
 tx_total_collisions: 0
 rx_fragments: 0
 rx_jabbers: 0
 rx_undersize_packets: 0
 rx_oversize_packets: 0
 rx_64_byte_packets: 244575
 rx_65_to_127_byte_packets: 6828
 rx_128_to_255_byte_packets: 167
 rx_256_to_511_byte_packets: 94
 rx_512_to_1023_byte_packets: 393
 rx_1024_to_1522_byte_packets: 137090597
 rx_1523_to_9022_byte_packets: 0
 tx_64_byte_packets: 52
 tx_65_to_127_byte_packets: 7547
 tx_128_to_255_byte_packets: 3304
 tx_256_to_511_byte_packets: 399
 tx_512_to_1023_byte_packets: 897
 tx_1024_to_1522_byte_packets: 2415
 tx_1523_to_9022_byte_packets: 0
 rx_xon_frames: 0
 rx_xoff_frames: 0
 tx_xon_frames: 0
 tx_xoff_frames: 0
 rx_mac_ctrl_frames: 0
 rx_filtered_packets: 158816
 rx_discards: 0
 rx_fw_discards: 0


ethtool -c eth0


Coalesce parameters for eth1:
Adaptive RX: off  TX: off
stats-block-usecs: 36
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 18
rx-frames: 6
rx-usecs-irq: 18
rx-frames-irq: 6

tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 80
tx-frames-irq: 20

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0


ethtool -g eth0


Ring parameters for eth1:
Pre-set maximums:
RX:             1020
RX Mini:        0
RX Jumbo:       0
TX:             255
Current hardware settings:
RX:             100
RX Mini:        0
RX Jumbo:       0
TX:             255


Just to make sure, does your application set a large
enough SO_RCVBUF value?


Yes, my first try with one socket was 5 MB, but I also tested
with 10 and even 25 MB. With 16 sockets I also set it to 5 MB.
When I pause the application, netstat shows the buffers filled.
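
A request such as 5 MB is not necessarily what the kernel grants: on Linux the value is capped by net.core.rmem_max, and the size read back includes bookkeeping overhead. A minimal sketch that sets SO_RCVBUF and reads back the effective size (the helper name is hypothetical):

```c
#include <sys/socket.h>

/*
 * Request a receive buffer of `bytes` and return the size the kernel
 * actually granted, or -1 on error.  The granted size can differ from
 * the request, so the caller should compare the two.
 */
int set_rcvbuf(int fd, int bytes)
{
        int granted = 0;
        socklen_t len = sizeof(granted);

        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) != 0)
                return -1;
        if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) != 0)
                return -1;
        return granted;
}
```

If the value returned is well below the request, raising net.core.rmem_max would be the first thing to check.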


What values do you have in /proc/sys/net/ipv4/tcp_rmem ?


I kept the default values there:
4096    43689   87378


cat /proc/meminfo


MemTotal:      2060664 kB
MemFree:        146536 kB
Buffers:         10984 kB
Cached:        1667740 kB
SwapCached:  

Re: UDP packet loss when running lsof

2007-05-22 Thread Eric Dumazet

Eric Dumazet wrote:

John Miller wrote:

Hi Eric,


I CCed netdev since this stuff is about network and not
lkml.


Ok, dropped the CC...


What kind of machine do you have? SMP or not?


It's an HP system with two dual-core CPUs at 3 GHz; the
storage system is connected through a QLogic FC HBA. It should
really be fast enough to handle a data stream of 50 MB/s...


Then you might try to bind the network IRQ to one CPU
(echo 1 > /proc/irq/XX/smp_affinity)

XX being your NIC interrupt (cat /proc/interrupts to find it)

and bind your user program to the other CPU(s).

You might be hitting a cond_resched_softirq() bug that Ingo and others are
sorting out right now. Using separate CPUs for softirq handling and your
programs should help a lot here.


You might try this patch, now that Ingo has signed off on it.

http://marc.info/?l=linux-kernel&m=117981607429875&w=2


I guess that with a correct softirq resched, there is no need to play with
IRQ affinities, unless you really want to push performance.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] [Net] Support Xen accelerated network plugin modules

2007-05-22 Thread Kieran Mansley
On Mon, 2007-05-21 at 10:50 -0700, Stephen Hemminger wrote:

 Your mailer is word wrapping the patch so it won't apply as is.

Apologies - I'll make sure it doesn't for the next revision.  There
should also have been a copy attached to the email that I would not
expect to be wrapped.

Thanks

Kieran



Re: [PATCH 3/4] [Net] Support Xen accelerated network plugin modules

2007-05-22 Thread Kieran Mansley
On Mon, 2007-05-21 at 10:52 -0700, Stephen Hemminger wrote:
 On Fri, 18 May 2007 14:16:48 +0100
 Kieran Mansley [EMAIL PROTECTED] wrote:
 
  Add support to Xen netfront for accelerated plugin module
  
 
  
  +/*
  + * List of all netfront accelerator plugin modules available.  Each
  + * list entry is of type struct netfront_accelerator.
  + */ 
  +static struct list_head accelerators_list;
  +/*
  + * Lock to protect access to accelerators_list, and also used to
  + * protect the hooks_usecount field in struct netfront_accelerator
  + * against concurrent access 
  + */
  +static spinlock_t accelerators_lock;
  +
 
 
 Your locking model is more complex than it needs to be.
 If you just used RCU for access, and depended on the existing RT netlink
 mutex (a.k.a big network lock), for setup; then you wouldn't need to
 do any of your own locking.

The complexity arises from the requirement to not take additional locks
on the data path, and not hold a lock during the calls from netfront
into the accelerated plugin.  The lock and ref count aren't really
providing mutual exclusion, but ensuring the function pointers persist
and are consistent while they're in use.  RCU on its own wouldn't
prevent the accelerated plugin being unloaded while netfront was using
one of the hooks.  Wrapping each one of these calls with the big network
lock would be OK, but then we'd be holding locks while calling into the
hooks.
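
The lock-plus-use-count pattern described above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: the struct and function names are hypothetical, a C11 atomic_flag stands in for the kernel spinlock, and the patch's real code may differ. The point it illustrates is that the lock is held only long enough to fetch the hooks and bump the count, never across the call into the plugin:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the netfront structures in the patch. */
struct accel_hooks {
        int (*start_xmit)(int pkt);
};

struct accelerator {
        atomic_flag lock;               /* plays the role of accelerators_lock */
        struct accel_hooks *hooks;      /* NULL once the plugin is going away */
        int hooks_usecount;             /* callers currently inside a hook */
};

static void accel_lock(struct accelerator *a)
{
        while (atomic_flag_test_and_set_explicit(&a->lock, memory_order_acquire))
                ;                       /* spin */
}

static void accel_unlock(struct accelerator *a)
{
        atomic_flag_clear_explicit(&a->lock, memory_order_release);
}

/* Take a reference on the hooks; the lock is NOT held on return. */
struct accel_hooks *accel_hooks_get(struct accelerator *a)
{
        struct accel_hooks *h;

        accel_lock(a);
        h = a->hooks;
        if (h)
                a->hooks_usecount++;
        accel_unlock(a);
        return h;
}

/* Drop a reference taken by accel_hooks_get(). */
void accel_hooks_put(struct accelerator *a)
{
        accel_lock(a);
        a->hooks_usecount--;
        accel_unlock(a);
}

/*
 * Detach the hooks so no new caller can pick them up.  Returns the
 * number of callers still inside a hook; unloading is only safe once
 * that has dropped to zero.
 */
int accel_hooks_remove(struct accelerator *a)
{
        int busy;

        accel_lock(a);
        a->hooks = NULL;
        busy = a->hooks_usecount;
        accel_unlock(a);
        return busy;
}

/* Example hook a plugin might provide. */
int demo_start_xmit(int pkt)
{
        return pkt + 1;
}
```

A caller would use accel_hooks_get(), invoke the hook if the result is non-NULL, then call accel_hooks_put().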

I could remove the lock and ref count as you suggest if I increased the
plugin module's use count to prevent it being unloaded, but then we'd
need some mechanism to signal that a network interface should no longer
be accelerated so that the hooks can be removed, module use count
decreased, and module safely unloaded.

This is something I'm happy to change if necessary, but I thought I
should explain why it's complex before going ahead.  If you still think
it needs changing, then let me know.

Kieran



Re: [Xen-devel] Re: [PATCH 3/4] [Net] Support Xen accelerated network plugin modules

2007-05-22 Thread Kieran Mansley
On Tue, 2007-05-22 at 08:15 +0100, Kieran Mansley wrote:
 RCU on its own wouldn't
 prevent the accelerated plugin being unloaded while netfront was using
 one of the hooks. 

Hmm, actually I think it could be used to do that.  I'll take a look.

Kieran



Re: [Xen-devel] Re: [PATCH 3/4] [Net] Support Xen accelerated network plugin modules

2007-05-22 Thread Keir Fraser



On 22/5/07 08:28, Kieran Mansley [EMAIL PROTECTED] wrote:

 On Tue, 2007-05-22 at 08:15 +0100, Kieran Mansley wrote:
 RCU on its own wouldn't
 prevent the accelerated plugin being unloaded while netfront was using
 one of the hooks.
 
 Hmm, actually I think it could be used to do that.  I'll take a look.

Eagerly zap the function pointers, then wait one RCU period so every CPU
goes through a quiescent point before unloading the module?

 -- Keir



Re: [Xen-devel] Re: [PATCH 3/4] [Net] Support Xen accelerated network plugin modules

2007-05-22 Thread Kieran Mansley
On Tue, 2007-05-22 at 08:48 +0100, Keir Fraser wrote:
 
 
 On 22/5/07 08:28, Kieran Mansley [EMAIL PROTECTED] wrote:
 
  On Tue, 2007-05-22 at 08:15 +0100, Kieran Mansley wrote:
  RCU on its own wouldn't
  prevent the accelerated plugin being unloaded while netfront was using
  one of the hooks.
  
  Hmm, actually I think it could be used to do that.  I'll take a look.
 
 Eagerly zap the function pointers, then wait one RCU period so every CPU
 goes through a quiescent point before unloading the module?

Yes, that's what I was going to try.  

Kieran



Re: TCP_MD5 and Intel e1000

2007-05-22 Thread Eric Dumazet
On Tue, 22 May 2007 09:33:29 +0200
Marc Donner [EMAIL PROTECTED] wrote:

 Hi,
 
 I have tried to set up quagga with tcp-md5 support from the kernel. All seems
 ok with an Intel e100 NIC, but when I tested with an Intel e1000 NIC the tcp
 packets have an invalid md5 digest.
 If I run tcpdump on the machine where the packets are generated, it shows
 invalid md5 digests on the outgoing interface.
 Are there known issues with tcp-md5 and e1000 NICs?
 

Hi Marc

CCed netdev, as it is the more appropriate place to discuss network stuff.

It would be nice if you sent some tcpdump samples to share with us, and told
us which exact Linux version you tried.

You could try ethtool -K ethX tx off, and/or other ethtool -K settings



Re: [RTNETLINK]: Remove remains of wireless extensions over rtnetlink

2007-05-22 Thread Johannes Berg

 [RTNETLINK]: Remove remains of wireless extensions over rtnetlink
 
 Remove some unused variables and function arguments related to the recently
 removed wireless extensions over rtnetlink.

Still more! Sorry about that and thanks!

 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

Since I did the removal in the first place,
Acked-by: Johannes Berg [EMAIL PROTECTED]

johannes




Re: TCP_MD5 and Intel e1000

2007-05-22 Thread YOSHIFUJI Hideaki / 吉藤英明
In article [EMAIL PROTECTED] (at Tue, 22 May 2007 10:57:38 +0200), Eric 
Dumazet [EMAIL PROTECTED] says:

  I have tried to set up quagga with tcp-md5 support from the kernel. All seems
  ok with an Intel e100 NIC, but when I tested with an Intel e1000 NIC the tcp
  packets have an invalid md5 digest.
  If I run tcpdump on the machine where the packets are generated, it shows
  invalid md5 digests on the outgoing interface.
  Are there known issues with tcp-md5 and e1000 NICs?
:
 You could try ethtool -K ethX tx off, and/or other ethtool -K settings

Disabling offloading should help; currently the tcp-md5 stack
blindly copies the md5 signature from the first segment,
which is not appropriate for the rest of the segments.
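
The reason a copied signature cannot be valid is that the TCP MD5 option (RFC 2385) is computed over header fields, including the sequence number, plus the segment data and the key, so every segment produced by segmentation offload needs its own digest. The sketch below illustrates this with a toy FNV-1a hash standing in for MD5; it is not a cryptographic hash, it covers only the sequence number rather than the full header, and the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a mixing step: a toy stand-in for MD5, NOT cryptographic. */
static uint32_t mix(uint32_t h, const void *data, size_t len)
{
        const unsigned char *p = data;

        while (len--) {
                h ^= *p++;
                h *= 16777619u;
        }
        return h;
}

/*
 * Toy per-segment signature covering the sequence number, the payload
 * and the shared key, in that order.  Two segments of one offloaded
 * send differ in sequence number and payload, so their signatures
 * differ as well.
 */
uint32_t toy_segment_sig(const char *key, size_t keylen, uint32_t seq,
                         const void *payload, size_t len)
{
        unsigned char seqbuf[4] = {
                (unsigned char)(seq >> 24), (unsigned char)(seq >> 16),
                (unsigned char)(seq >> 8), (unsigned char)seq
        };
        uint32_t h = 2166136261u;

        h = mix(h, seqbuf, sizeof(seqbuf));
        h = mix(h, payload, len);
        h = mix(h, key, keylen);
        return h;
}
```

A receiver recomputing the signature for the second segment would therefore reject a digest that was copied from the first.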

--yoshfuji


Re: TCP_MD5 and Intel e1000

2007-05-22 Thread Dunc
Eric Dumazet wrote:
 On Tue, 22 May 2007 09:33:29 +0200
 Marc Donner [EMAIL PROTECTED] wrote:
 
 Hi,

 I have tried to set up quagga with tcp-md5 support from the kernel. All seems
 ok with an Intel e100 NIC, but when I tested with an Intel e1000 NIC the tcp
 packets have an invalid md5 digest.
 If I run tcpdump on the machine where the packets are generated, it shows
 invalid md5 digests on the outgoing interface.
 Are there known issues with tcp-md5 and e1000 NICs?

 
 Hi Marc
 
 CCed netdev, as it is the more appropriate place to discuss network stuff.
 
 It would be nice if you sent some tcpdump samples to share with us, and told
 us which exact Linux version you tried.
 
 You could try ethtool -K ethX tx off, and/or other ethtool -K settings
 

I had this with e1000 NICs and it was just because I had TSO on.

It can be disabled with ethtool, as Eric suggests.

Cheers,

Dunc


[PATCH] [-mm] ACPI: export ACPI events via netlink

2007-05-22 Thread Zhang Rui
From: Zhang Rui [EMAIL PROTECTED]

Export ACPI events via netlink.
A netlink message is broadcast when an ACPI event is generated.

Note: The way ACPI events are delivered today is not changed.
Netlink is meant to replace /proc/acpi/event for exporting ACPI events
someday, but not now.
This patch only adds the sending of netlink messages
when an ACPI event is generated.
The following is an example of how to receive ACPI event messages.

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Netlink protocol number added by this patch. */
#define NETLINK_ACPI_EVENT  20

/* Must match struct acpi_bus_netlink_event on the kernel side. */
struct acpi_event {
        char device_class[20];
        char bus_id[15];
        unsigned int type;
        unsigned int data;
};

struct sockaddr_nl src_addr, dest_addr;
int sock_fd;
struct msghdr msg;
struct iovec iov;

int main(void)
{
        int i = 10;
        int result;
        struct acpi_event event;

        sock_fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_ACPI_EVENT);
        if (sock_fd == -1) {
                perror("Socket failed");
                return 1;
        }

        src_addr.nl_family = AF_NETLINK;
        src_addr.nl_pid = getpid();
        /* Join multicast group 1, the group the kernel broadcasts to. */
        src_addr.nl_groups = 1;

        result = bind(sock_fd, (struct sockaddr *)&src_addr, sizeof(src_addr));
        if (result) {
                perror("Bind failed");
                return 1;
        }

        /* The kernel sends the bare event structure as the payload. */
        iov.iov_base = &event;
        iov.iov_len = sizeof(struct acpi_event);

        msg.msg_name = &dest_addr;
        msg.msg_namelen = sizeof(dest_addr);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        /* Print the next ten events, then exit. */
        while (i > 0) {
                printf("Wait...\n");
                result = recvmsg(sock_fd, &msg, 0);
                if (result == -1) {
                        perror("recvmsg failed");
                        return 1;
                }
                printf("%20s %15s %08x %08x\n",
                       event.device_class, event.bus_id, event.type,
                       event.data);
                i--;
        }

        close(sock_fd);
        return 0;
}

Signed-off-by: Zhang Rui [EMAIL PROTECTED]
---
 drivers/acpi/bus.c  |   42 ++
 drivers/acpi/event.c|   25 +
 include/acpi/acpi_bus.h |4 +++-
 include/linux/netlink.h |1 +
 4 files changed, 71 insertions(+), 1 deletion(-)

Index: linux-2.6.22-rc1/drivers/acpi/bus.c
===
--- linux-2.6.22-rc1.orig/drivers/acpi/bus.c  2007-05-21 10:18:58.0 +0800
+++ linux-2.6.22-rc1/drivers/acpi/bus.c 2007-05-21 15:38:06.0 +0800
@@ -37,6 +37,7 @@
 #endif
 #include <acpi/acpi_bus.h>
 #include <acpi/acpi_drivers.h>
+#include <linux/netlink.h>
 
 #define _COMPONENT ACPI_BUS_COMPONENT
 ACPI_MODULE_NAME("bus");
@@ -275,6 +276,43 @@
 /* --------------------------------------------------------------------------
                                 Event Management
    -------------------------------------------------------------------------- */
+#ifdef CONFIG_NET
+struct acpi_bus_netlink_event {
+   acpi_device_class device_class;
+   char bus_id[15];
+   u32 type;
+   u32 data;
+};
+
+static int acpi_bus_generate_netlink_event(struct acpi_device *device,
+   u8 type, int data)
+{
+   struct sk_buff *skb = NULL;
+   struct acpi_bus_netlink_event *event = NULL;
+
+   skb = alloc_skb(sizeof(struct acpi_bus_event), GFP_ATOMIC);
+   if (!skb)
+   return -ENOMEM;
+
+   event = (struct acpi_bus_netlink_event *)
+   skb_put(skb, sizeof(struct acpi_bus_netlink_event));
+   strcpy(event->device_class, device->pnp.device_class);
+   strcpy(event->bus_id, device->dev.bus_id);
+   event->type = type;
+   event->data = data;
+
+   NETLINK_CB(skb).dst_group = 1;
+
+   netlink_broadcast(acpi_event_sock, skb, 0, 1, GFP_ATOMIC);
+   return 0;
+}
+#else
+static int acpi_bus_generate_netlink_event(struct acpi_device *device,
+   u8 type, int data)
+{
+   return 0;
+}
+#endif
 
 static DEFINE_SPINLOCK(acpi_bus_event_lock);
 
@@ -292,6 +330,10 @@
if (!device)
return -EINVAL;
 
+   if (acpi_bus_generate_netlink_event(device, type, data))
+   printk(KERN_WARNING PREFIX
+   "Failed to generate a netlink message for ACPI event!\n");
+
/* drop event on the floor if no one's listening */
if (!event_is_open)
return 0;
Index: linux-2.6.22-rc1/drivers/acpi/event.c
===
--- linux-2.6.22-rc1.orig/drivers/acpi/event.c  2007-05-16 16:12:46.0 +0800
+++ linux-2.6.22-rc1/drivers/acpi/event.c   2007-05-21 15:38:32.0 +0800
@@ -11,6 +11,7 @@
 #include <linux/init.h>
 

Re: [PATCH] [-mm] ACPI: export ACPI events via netlink

2007-05-22 Thread Samuel Ortiz

On 5/22/2007, Zhang Rui [EMAIL PROTECTED] wrote:
Index: linux-2.6.22-rc1/include/linux/netlink.h
===
--- linux-2.6.22-rc1.orig/include/linux/netlink.h  2007-05-21 10:19:00.0 +0800
+++ linux-2.6.22-rc1/include/linux/netlink.h   2007-05-21 15:26:14.0 +0800
@@ -24,6 +24,7 @@
 /* leave room for NETLINK_DM (DM Events) */
 #define NETLINK_SCSITRANSPORT 18  /* SCSI Transports */
 #define NETLINK_ECRYPTFS  19
+#define NETLINK_ACPI_EVENT  20  /* acpi event notifications */

I think it is recommended to use the generic netlink layer instead of
having everyone add their own netlink ID.

Cheers,
Samuel.


Re: [PATCH] [-mm] ACPI: export ACPI events via netlink

2007-05-22 Thread Evgeniy Polyakov
On Tue, May 22, 2007 at 10:05:00AM -, Samuel Ortiz ([EMAIL PROTECTED]) 
wrote:
 
 On 5/22/2007, Zhang Rui [EMAIL PROTECTED] wrote:
 Index: linux-2.6.22-rc1/include/linux/netlink.h
 ===
 --- linux-2.6.22-rc1.orig/include/linux/netlink.h  2007-05-21 10:19:00.0 +0800
 +++ linux-2.6.22-rc1/include/linux/netlink.h 2007-05-21 15:26:14.0 +0800
 @@ -24,6 +24,7 @@
  /* leave room for NETLINK_DM (DM Events) */
  #define NETLINK_SCSITRANSPORT   18  /* SCSI Transports */
  #define NETLINK_ECRYPTFS19
 +#define NETLINK_ACPI_EVENT  20  /* acpi event notifications */
 I think it is recommended to use the generic netlink layer instead of
 having everyone add their own netlink ID.

It is possible to allocate your own netlink protocol number, but Samuel
is right, it is better to use existing delivery mechanisms like
genetlink and connector.

 Cheers,
 Samuel.

-- 
Evgeniy Polyakov


Re: TCP_MD5 and Intel e1000

2007-05-22 Thread David Miller
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]
Date: Tue, 22 May 2007 18:36:47 +0900 (JST)

 In article [EMAIL PROTECTED] (at Tue, 22 May 2007 10:57:38 +0200), Eric 
 Dumazet [EMAIL PROTECTED] says:
 
   I have tried to set up quagga with tcp-md5 support from the kernel. All seems
   ok with an Intel e100 NIC, but when I tested with an Intel e1000 NIC the tcp
   packets have an invalid md5 digest.
   If I run tcpdump on the machine where the packets are generated, it shows
   invalid md5 digests on the outgoing interface.
   Are there known issues with tcp-md5 and e1000 NICs?
 :
  You could try ethtool -K ethX tx off, and/or other ethtool -K settings
 
 Disabling offloading should help; currently the tcp-md5 stack
 blindly copies the md5 signature from the first segment,
 which is not appropriate for the rest of the segments.

It is clear we should disable TSO for sockets making use of TCP-MD5.


Re: TCP_MD5 and Intel e1000

2007-05-22 Thread Marc Donner
On Tuesday 22 May 2007, David Miller wrote:
 From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]
 Date: Tue, 22 May 2007 18:36:47 +0900 (JST)

  In article [EMAIL PROTECTED] (at Tue, 22 May 
2007 10:57:38 +0200), Eric Dumazet [EMAIL PROTECTED] says:
I have tried to set up quagga with tcp-md5 support from the kernel. All
seems ok with an Intel e100 NIC, but when I tested with an Intel e1000
NIC the tcp packets have an invalid md5 digest.
If I run tcpdump on the machine where the packets are generated, it shows
invalid md5 digests on the outgoing interface.
Are there known issues with tcp-md5 and e1000 NICs?
  
   You could try ethtool -K ethX tx off, and/or other ethtool -K settings
 
  Disabling offloading should help; currently the tcp-md5 stack
  blindly copies the md5 signature from the first segment,
  which is not appropriate for the rest of the segments.

 It is clear we should disable TSO for sockets making use of TCP-MD5.

Disabling TSO works. Thanks.


Re: [PATCH] [-mm] ACPI: export ACPI events via netlink

2007-05-22 Thread jamal
Hi Zhang Rui,

Really cool stuff. Can you instead use genetlink?
http://linux-net.osdl.org/index.php/Generic_Netlink_HOWTO 
should help. And if you have more questions post on netdev (not lk).

cheers,
jamal

On Tue, 2007-22-05 at 17:47 +0800, Zhang Rui wrote:
 From: Zhang Rui [EMAIL PROTECTED]
 
 Export ACPI events via netlink.
 A netlink message is broadcast when an ACPI event is generated.
 




[IPSEC]: Fix warnings with casting int to pointer

2007-05-22 Thread Herbert Xu
Hi Dave:

Here's a patch to fix the warnings.

[IPSEC]: Fix warnings with casting int to pointer

This patch adds some casts to shut up the warnings introduced by my
last patch that added a common iterator function for xfrm algorithms.

Signed-off-by: Herbert Xu [EMAIL PROTECTED] 

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index 94e3588..ffa1515 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -390,27 +390,27 @@ static struct xfrm_algo_desc *xfrm_find_algo(
 static int xfrm_alg_id_match(const struct xfrm_algo_desc *entry,
 const void *data)
 {
-   return entry->desc.sadb_alg_id == (int)data;
+   return entry->desc.sadb_alg_id == (unsigned long)data;
 }
 
 struct xfrm_algo_desc *xfrm_aalg_get_byid(int alg_id)
 {
return xfrm_find_algo(xfrm_aalg_list, xfrm_alg_id_match,
- (void *)alg_id, 1);
+ (void *)(unsigned long)alg_id, 1);
 }
 EXPORT_SYMBOL_GPL(xfrm_aalg_get_byid);
 
 struct xfrm_algo_desc *xfrm_ealg_get_byid(int alg_id)
 {
return xfrm_find_algo(xfrm_ealg_list, xfrm_alg_id_match,
- (void *)alg_id, 1);
+ (void *)(unsigned long)alg_id, 1);
 }
 EXPORT_SYMBOL_GPL(xfrm_ealg_get_byid);
 
 struct xfrm_algo_desc *xfrm_calg_get_byid(int alg_id)
 {
return xfrm_find_algo(xfrm_calg_list, xfrm_alg_id_match,
- (void *)alg_id, 1);
+ (void *)(unsigned long)alg_id, 1);
 }
 EXPORT_SYMBOL_GPL(xfrm_calg_get_byid);
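
The casts in this patch follow a common pattern: an integer ID is smuggled through a `void *` callback argument, and widening it through `unsigned long` first avoids the size-mismatch warning on 64-bit builds. A self-contained sketch of the same pattern, with hypothetical names that only mirror the shape of xfrm_find_algo() and assuming an ABI where `unsigned long` is as wide as a pointer (as on Linux):

```c
#include <stddef.h>

/* Hypothetical table entry; stands in for struct xfrm_algo_desc. */
struct algo_desc {
        int id;
        const char *name;
};

typedef int (*match_fn)(const struct algo_desc *entry, const void *data);

/* Match callback: recover the integer ID from the opaque pointer. */
int id_match(const struct algo_desc *entry, const void *data)
{
        return (unsigned long)entry->id == (unsigned long)data;
}

/* Generic iterator: return the first entry the callback accepts. */
const struct algo_desc *find_algo(const struct algo_desc *list, size_t n,
                                  match_fn match, const void *data)
{
        size_t i;

        for (i = 0; i < n; i++)
                if (match(&list[i], data))
                        return &list[i];
        return NULL;
}

const struct algo_desc *get_byid(const struct algo_desc *list, size_t n,
                                 int id)
{
        /* int -> unsigned long -> void *: widths match, so no warning. */
        return find_algo(list, n, id_match, (void *)(unsigned long)id);
}
```

In portable userspace code, `uintptr_t` from `<stdint.h>` would be the usual choice for the intermediate type.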
 


[PATCH 1/2] ucc_geth: Fix MODULE_DEVICE_TABLE() duplication

2007-05-22 Thread Li Yang

Fix MODULE_DEVICE_TABLE() duplication in ucc_geth.c and ucc_geth_mii.c
for ucc_geth to be compiled as module.

Signed-off-by: Li Yang [EMAIL PROTECTED]
---
drivers/net/ucc_geth_mii.c |2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c
index f96966d..7bcb82f 100644
--- a/drivers/net/ucc_geth_mii.c
+++ b/drivers/net/ucc_geth_mii.c
@@ -260,8 +260,6 @@ static struct of_device_id uec_mdio_match[] = {
{},
};

-MODULE_DEVICE_TABLE(of, uec_mdio_match);
-
static struct of_platform_driver uec_mdio_driver = {
.name   = DRV_NAME,
.probe  = uec_mdio_probe,



[PATCH 2/2] ucc_geth:trivial fix

2007-05-22 Thread Li Yang

Remove redundant includes.

Signed-off-by: Li Yang [EMAIL PROTECTED]
---
drivers/net/ucc_geth.c |3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 864b1aa..3b27b6d 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -23,11 +23,8 @@
#include <linux/skbuff.h>
#include <linux/spinlock.h>
#include <linux/mm.h>
-#include <linux/ethtool.h>
-#include <linux/delay.h>
#include <linux/dma-mapping.h>
#include <linux/fsl_devices.h>
-#include <linux/ethtool.h>
#include <linux/mii.h>
#include <linux/phy.h>
#include <linux/workqueue.h>



Re: [Xen-devel] Re: [PATCH 3/4] [Net] Support Xen accelerated network plugin modules

2007-05-22 Thread Kieran Mansley
On Tue, 2007-05-22 at 08:48 +0100, Keir Fraser wrote:
 
 
 On 22/5/07 08:28, Kieran Mansley [EMAIL PROTECTED] wrote:
 
  On Tue, 2007-05-22 at 08:15 +0100, Kieran Mansley wrote:
  RCU on its own wouldn't
  prevent the accelerated plugin being unloaded while netfront was using
  one of the hooks.
  
  Hmm, actually I think it could be used to do that.  I'll take a look.
 
 Eagerly zap the function pointers, then wait one RCU period so every CPU
 goes through a quiescent point before unloading the module?
 
  -- Keir

Am I right in thinking that if one of the functions that was protected
by RCU was to block, that would be a bad thing?  Clearly the data path
hooks can't/don't block, but I'm not sure it's so obvious for things
like probing a new device. 

Kieran



Re: select(0, ..) is valid ?

2007-05-22 Thread Steve Fox
On Wed, 2007-05-16 at 17:59 -0700, Badari Pulavarty wrote:

 Here it is ..
 
 Should I do one for poll() also ?
 
 Thanks,
 Badari
 
 Optimize select by using stack space for small fd sets.
 core_sys_select() already has this optimization. This is
 for the compat version.
 
 Signed-off-by: Badari Pulavarty [EMAIL PROTECTED]
 ---
  fs/compat.c |   17 +++--
  1 file changed, 11 insertions(+), 6 deletions(-)
 
 Index: linux-2.6.22-rc1/fs/compat.c
 ===
 --- linux-2.6.22-rc1.orig/fs/compat.c 2007-05-12 18:45:56.0 -0700
 +++ linux-2.6.22-rc1/fs/compat.c  2007-05-16 17:50:39.0 -0700
 @@ -1544,9 +1544,10 @@ int compat_core_sys_select(int n, compat
   compat_ulong_t __user *outp, compat_ulong_t __user *exp, s64 *timeout)
  {
   fd_set_bits fds;
 - char *bits;
 + void *bits;
   int size, max_fds, ret = -EINVAL;
   struct fdtable *fdt;
 + long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
 
   if (n < 0)
   goto out_nofds;
 @@ -1564,11 +1565,14 @@ int compat_core_sys_select(int n, compat
* since we used fdset we need to allocate memory in units of
* long-words.
*/
 - ret = -ENOMEM;
   size = FDS_BYTES(n);
 - bits = kmalloc(6 * size, GFP_KERNEL);
 - if (!bits)
 - goto out_nofds;
 + bits = stack_fds;
 + if (size > sizeof(stack_fds) / 6) {
 + bits = kmalloc(6 * size, GFP_KERNEL);
 + ret = -ENOMEM;
 + if (!bits)
 + goto out_nofds;
 + }
   fds.in  = (unsigned long *)  bits;
   fds.out = (unsigned long *) (bits +   size);
   fds.ex  = (unsigned long *) (bits + 2*size);
 @@ -1600,7 +1604,8 @@ int compat_core_sys_select(int n, compat
   compat_set_fd_set(n, exp, fds.res_ex))
   ret = -EFAULT;
  out:
 - kfree(bits);
 + if (bits != stack_fds)
 + kfree(bits);
  out_nofds:
   return ret;
  }
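
The optimization in the patch above is a general pattern: use a fixed on-stack buffer when the request is small and fall back to the heap otherwise, taking care to free only what was allocated. A userspace sketch of the same idea, where STACK_ALLOC and the function name are hypothetical and malloc() stands in for kmalloc():

```c
#include <stdlib.h>
#include <string.h>

#define STACK_ALLOC 256         /* hypothetical; plays the role of SELECT_STACK_ALLOC */

/*
 * Zero six bitmaps of `size` bytes each, using a stack buffer when all
 * six fit and the heap otherwise.  Sets *used_heap to show which path
 * was taken.  Returns 0 on success, -1 if the allocation fails.
 */
int with_six_bitmaps(size_t size, int *used_heap)
{
        long stack_buf[STACK_ALLOC / sizeof(long)];
        void *bits = stack_buf;

        *used_heap = 0;
        if (size > sizeof(stack_buf) / 6) {
                bits = malloc(6 * size);
                if (!bits)
                        return -1;
                *used_heap = 1;
        }

        memset(bits, 0, 6 * size);      /* stand-in for the real work */

        if (bits != stack_buf)          /* never free() the stack buffer */
                free(bits);
        return 0;
}
```

The common case of a handful of descriptors then costs no allocation at all.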

Andy put this through a couple of machines on test.kernel.org and elm3b6
was fixed; however, elm3b239 still had a boot error.

BUG: at mm/slab.c:777 __find_general_cachep()

Call Trace:
 [802729c6] __kmalloc+0xa6/0xe0
 [8021d21b] cache_k8_northbridges+0x9b/0x120
 [80688af3] gart_iommu_init+0x33/0x5b0
 [802211a3] __wake_up+0x43/0x70
 [80453b90] genl_rcv+0x0/0x70
 [80452175] netlink_kernel_create+0x155/0x170
 [80684029] pci_iommu_init+0x9/0x20
 [8067e6f4] kernel_init+0x154/0x330
 [8020a8d8] child_rip+0xa/0x12
 [80348e10] acpi_ds_init_one_object+0x0/0x7c
 [8067e5a0] kernel_init+0x0/0x330
 [8020a8ce] child_rip+0x0/0x12

See the 2.6.22-rc2-git1 +1 row at http://test.kernel.org/ for full
logs.

-- 

Steve Fox
IBM Linux Technology Center



Re: [Xen-devel] Re: [PATCH 3/4] [Net] Support Xen accelerated network plugin modules

2007-05-22 Thread Kieran Mansley
On Tue, 2007-05-22 at 15:07 +0100, Keir Fraser wrote:
 On 22/5/07 13:44, Kieran Mansley [EMAIL PROTECTED] wrote:
 
  Eagerly zap the function pointers, then wait one RCU period so every CPU
  goes through a quiescent point before unloading the module?
  
   -- Keir
  
  Am I right in thinking that if one of the functions that was protected
  by RCU was to block, that would be a bad thing?  Clearly the data path
  hooks can't/don't block, but I'm not sure it's so obvious for things
  like probing a new device.
 
 Are there still module reference counts? If so, functions which may block
 can manipulate their module's reference count.
 
 Or if not, I guess the accelerator module can have a private reference count
 checked by whatever unload function gets called from the RCU subsystem. So
 that unload becomes deferred until *both* an RCU phase has passed *and* a
 reference count has fallen to zero.

That's true I suppose, but it replaces the current spinlock and ref
count with an RCU and a ref count, so does little to address the
complexity that Stephen Hemminger was rightly concerned about.  It does
I suppose put the complexity in the plugin module rather than netfront,
and only have it when necessary, which might make it better, but makes
the job of writing the plugin modules harder and more prone to bugs. 

Kieran



Re: [Xen-devel] Re: [PATCH 3/4] [Net] Support Xen accelerated network plugin modules

2007-05-22 Thread Keir Fraser
On 22/5/07 13:44, Kieran Mansley [EMAIL PROTECTED] wrote:

 Eagerly zap the function pointers, then wait one RCU period so every CPU
 goes through a quiescent point before unloading the module?
 
  -- Keir
 
 Am I right in thinking that if one of the functions that was protected
 by RCU was to block, that would be a bad thing?  Clearly the data path
 hooks can't/don't block, but I'm not sure it's so obvious for things
 like probing a new device.

Are there still module reference counts? If so, functions which may block
can manipulate their module's reference count.

Or if not, I guess the accelerator module can have a private reference count
checked by whatever unload function gets called from the RCU subsystem. So
that unload becomes deferred until *both* an RCU phase has passed *and* a
reference count has fallen to zero.

 -- Keir



Re: select(0, ..) is valid ?

2007-05-22 Thread Nishanth Aravamudan
On 22.05.2007 [09:16:37 -0500], Steve Fox wrote:
 On Wed, 2007-05-16 at 17:59 -0700, Badari Pulavarty wrote:
 
  Here it is ..
  
  Should I do one for poll() also ?
  
  Thanks,
  Badari
  
  Optimize select by using stack space for small fd sets.
  core_sys_select() already has this optimization. This is
  for compat version. 
  
  Signed-off-by: Badari Pulavarty [EMAIL PROTECTED]
  ---
   fs/compat.c |   17 +++--
   1 file changed, 11 insertions(+), 6 deletions(-)
  
  Index: linux-2.6.22-rc1/fs/compat.c
  ===
  --- linux-2.6.22-rc1.orig/fs/compat.c   2007-05-12 18:45:56.0 -0700
  +++ linux-2.6.22-rc1/fs/compat.c   2007-05-16 17:50:39.0 -0700
  @@ -1544,9 +1544,10 @@ int compat_core_sys_select(int n, compat
  compat_ulong_t __user *outp, compat_ulong_t __user *exp, s64 *timeout)
   {
  fd_set_bits fds;
  -   char *bits;
  +   void *bits;
  int size, max_fds, ret = -EINVAL;
  struct fdtable *fdt;
  +   long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
  
  if (n < 0)
  goto out_nofds;
  @@ -1564,11 +1565,14 @@ int compat_core_sys_select(int n, compat
   * since we used fdset we need to allocate memory in units of
   * long-words.
   */
  -   ret = -ENOMEM;
  size = FDS_BYTES(n);
  -   bits = kmalloc(6 * size, GFP_KERNEL);
  -   if (!bits)
  -   goto out_nofds;
  +   bits = stack_fds;
  +   if (size > sizeof(stack_fds) / 6) {
  +   bits = kmalloc(6 * size, GFP_KERNEL);
  +   ret = -ENOMEM;
  +   if (!bits)
  +   goto out_nofds;
  +   }
  fds.in  = (unsigned long *)  bits;
  fds.out = (unsigned long *) (bits +   size);
  fds.ex  = (unsigned long *) (bits + 2*size);
  @@ -1600,7 +1604,8 @@ int compat_core_sys_select(int n, compat
  compat_set_fd_set(n, exp, fds.res_ex))
  ret = -EFAULT;
   out:
  -   kfree(bits);
  +   if (bits != stack_fds)
  +   kfree(bits);
   out_nofds:
  return ret;
   }
 
 Andy put this through a couple machines on test.kernel.org and elm3b6
 was fixed, however elm3b239 still had a boot error.
 
 BUG: at mm/slab.c:777 __find_general_cachep()
 
 Call Trace:
  [802729c6] __kmalloc+0xa6/0xe0
  [8021d21b] cache_k8_northbridges+0x9b/0x120

I believe this is fixed by:

http://lkml.org/lkml/2007/5/18/19

Care to stack it on top and retest?

Thanks,
Nish

-- 
Nishanth Aravamudan [EMAIL PROTECTED]
IBM Linux Technology Center


Re: [Xen-devel] Re: [PATCH 3/4] [Net] Support Xen accelerated network plugin modules

2007-05-22 Thread Stephen Hemminger
On Tue, 22 May 2007 13:44:28 +0100
Kieran Mansley [EMAIL PROTECTED] wrote:

 On Tue, 2007-05-22 at 08:48 +0100, Keir Fraser wrote:
  
  
  On 22/5/07 08:28, Kieran Mansley [EMAIL PROTECTED] wrote:
  
   On Tue, 2007-05-22 at 08:15 +0100, Kieran Mansley wrote:
   RCU on its own wouldn't
   prevent the accelerated plugin being unloaded while netfront was using
   one of the hooks.
   
   Hmm, actually I think it could be used to do that.  I'll take a look.
  
  Eagerly zap the function pointers, then wait one RCU period so every CPU
  goes through a quiescent point before unloading the module?
  
   -- Keir
 
 Am I right in thinking that if one of the functions that was protected
 by RCU was to block, that would be a bad thing?  Clearly the data path
 hooks can't/don't block, but I'm not sure it's so obvious for things
 like probing a new device. 
 
 Kieran
 

The same thing is already done to handle network protocols.
RCU is used for the object handle (including function pointers).
You need to:
  * put an rcu structure in the accelerator list member
    and initialize it to the callback
  * on addition, increase the refcount
  * on deletion, call list_del_rcu()
  * in the rcu callback, do the last step,
    like dropping the module refcount and freeing.

-- 
Stephen Hemminger [EMAIL PROTECTED]


[patch 29/33] xen: Add the Xen virtual network device driver.

2007-05-22 Thread Jeremy Fitzhardinge
The network device frontend driver allows the kernel to access network
devices exported by a virtual machine containing a physical
network device driver.



Signed-off-by: Jeremy Fitzhardinge [EMAIL PROTECTED]
Signed-off-by: Chris Wright [EMAIL PROTECTED]
Cc: Ian Pratt [EMAIL PROTECTED]
Cc: Christian Limpach [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Cc: Stephen Hemminger [EMAIL PROTECTED]
Cc: Christoph Hellwig [EMAIL PROTECTED]
Cc: Rusty Russell [EMAIL PROTECTED]
Cc: Herbert Xu [EMAIL PROTECTED]
Cc: Keir Fraser [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org

---
 drivers/net/Kconfig|   12 
 drivers/net/Makefile   |1 
 drivers/net/xen-netfront.c | 1995 
 3 files changed, 2008 insertions(+)

===
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2583,6 +2583,18 @@ source "drivers/atm/Kconfig"
 
 source "drivers/s390/net/Kconfig"
 
+config XEN_NETDEV_FRONTEND
+   tristate "Xen network device frontend driver"
+   depends on XEN
+   default y
+   help
+ The network device frontend driver allows the kernel to
+ access network devices exported by a virtual
+ machine containing a physical network device driver. The
+ frontend driver is intended for unprivileged guest domains;
+ if you are compiling a kernel for a Xen guest, you almost
+ certainly want to enable this.
+
 config ISERIES_VETH
tristate "iSeries Virtual Ethernet driver support"
depends on PPC_ISERIES
===
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -229,3 +229,4 @@ obj-$(CONFIG_FS_ENET) += fs_enet/
 
 obj-$(CONFIG_NETXEN_NIC) += netxen/
 obj-$(CONFIG_LGUEST_GUEST) += lguest_net.o
+obj-$(CONFIG_XEN_NETDEV_FRONTEND) += xen-netfront.o
===
--- /dev/null
+++ b/drivers/net/xen-netfront.c
@@ -0,0 +1,1995 @@
+/*
+ * Virtual network driver for conversing with remote driver backends.
+ *
+ * Copyright (c) 2002-2005, K A Fraser
+ * Copyright (c) 2005, XenSource Ltd
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/skbuff.h>
+#include <linux/ethtool.h>
+#include <linux/if_ether.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
+#include <linux/moduleparam.h>
+#include <linux/mm.h>
+#include <net/ip.h>
+
+#include <xen/xenbus.h>
+#include <xen/events.h>
+#include <xen/page.h>
+#include <xen/grant_table.h>
+
+#include <xen/interface/io/netif.h>
+#include <xen/interface/memory.h>
+#include <xen/interface/grant_table.h>
+
+static struct ethtool_ops xennet_ethtool_ops;
+
+struct netfront_cb {
+   struct page *page;
+   unsigned offset;
+};
+
+#define NETFRONT_SKB_CB(skb)   ((struct netfront_cb *)((skb)->cb))
+
+/*
+ * Mutually-exclusive module options to select receive data path:
+ *  copy : Packets are copied by network backend into local memory
+ *  flip : Page containing packet data is transferred to our ownership
+ * For fully-virtualised guests there is no option - copying must be used.
+ * For paravirtualised guests, flipping is the default.
+ */
+typedef enum rx_mode {
+   RX_COPY = 0,
+   RX_FLIP = 1,
+} rx_mode_t;
+
+static enum rx_mode rx_mode = RX_FLIP;
+
+#define param_check_rx_mode_t(name, p) __param_check(name, p, rx_mode_t)
+
+static int param_set_rx_mode_t(const char *val, struct kernel_param *kp)
+{
+   enum rx_mode *rxmp = kp->arg;
+   int ret = 0;
+

Re: [Xen-devel] Re: [PATCH 3/4] [Net] Support Xen accelerated network plugin modules

2007-05-22 Thread Kieran Mansley
On Tue, 2007-05-22 at 08:05 -0700, Stephen Hemminger wrote:

 The same thing is already done to handle network protocols.
 RCU is used for the object handle (including function pointers).
 You need to:
   * put an rcu structure in the accelerator list member
     and initialize it to the callback
   * on addition, increase the refcount
   * on deletion, call list_del_rcu()
   * in the rcu callback, do the last step,
     like dropping the module refcount and freeing.

Apologies for coming back to this, but I want to make sure this is going
to work before I write the code.

The current scheme uses a spin lock to protect the list and a reference
count for each item on that list.  This reference count is initialised
to 1 when the accelerator module is loaded, incremented before each call
into the accelerator, decremented after it, and decremented when the
module's exit function is called as a result of rmmod being called on
the module.  rmmod is then blocked.  When the ref count reaches zero the
function pointers are set to NULL, resulting in no more calls into the
accelerator module, and the rmmod is unblocked.  The accelerator now
exits safely.
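
A minimal user-space sketch of the counting scheme described above,
assuming C11 atomics in place of the spin lock that netfront uses (a
swapped-in technique for illustration, not what the current code does).
The names accel_hooks, accelerator, accel_register, accel_get and
accel_put are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook table; the real netfront hooks are not shown here. */
struct accel_hooks {
    int (*start_xmit)(int pkt);
};

struct accelerator {
    atomic_int refcount;                        /* 1 while loaded, +1 per call in flight */
    _Atomic(const struct accel_hooks *) hooks;  /* NULL once the count hits zero */
};

/* Module load: publish the hooks and take the initial reference. */
void accel_register(struct accelerator *a, const struct accel_hooks *h)
{
    atomic_store(&a->hooks, h);
    atomic_store(&a->refcount, 1);
}

/* Before a call into the accelerator: take a reference unless the
 * count has already reached zero. Returns NULL if it has. */
const struct accel_hooks *accel_get(struct accelerator *a)
{
    int old = atomic_load(&a->refcount);

    while (old > 0) {
        /* On failure `old` is reloaded and the loop re-checks it. */
        if (atomic_compare_exchange_weak(&a->refcount, &old, old + 1))
            return atomic_load(&a->hooks);
    }
    return NULL;
}

/* After a call, or from the module exit function: drop a reference.
 * Returns true when this was the last one and the hooks were cleared. */
bool accel_put(struct accelerator *a)
{
    if (atomic_fetch_sub(&a->refcount, 1) == 1) {
        atomic_store(&a->hooks, NULL);
        return true;
    }
    return false;
}
```

accel_put() returning true corresponds to the point where the blocked
rmmod may continue.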

The critical bits I don't understand about your suggested scheme are:
 i) how deletion/list_del_rcu() is triggered (see below);
 ii) how it prevents the accelerated module being unloaded in the middle
of a call into that module.

I assume you're suggesting using the module use count to solve (ii), but
this essentially causes (i):  if we increase the module use count for
each interface using the accelerator we can never unload the module
because there's no mechanism to request that an interface stop being
accelerated (and so decrease the ref count).

If you're suggesting using RCU to protect against the hooks being
modified during a call into them, that's only allowed if the protected
region doesn't block, and I'm not convinced that the protected regions
here (the calls into the accelerator module) will never block.

Apologies again if I've misinterpreted your suggestion,

Kieran



Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)

2007-05-22 Thread Milton Miller


On May 21, 2007, at 12:45 PM, Kok, Auke wrote:


Milton Miller wrote:

On May 18, 2007, at 12:11 PM, David Acker wrote:

Kok, Auke wrote:
First impression just came in: It seems RX performance is dropped 
to 10mbit. TX is unaffected and runs at 94mbit/tcp, but RX the new 
code seems to misbehave and  fluctuate, dropping below 10mbit after 
a few netperf runs and staying there...

ideas?
I found the problem.  Another casualty of working with two different 
kernels at once...arg.
The blank rfd needs to have its el-bit clear now.  Here is the new 
and improved patch.

...

Proceeding with the review:
Coding style:
(1) if body on separate line.
(2) space after if before (
(3) The other enums in this driver are not ALL_CAPS
(4) This driver doesn't do CONSTANT != value but value != enum
 (see nic-mac for examples)


I sent Milton my copy of this patch which has these style issues 
corrected and
applies cleanly to a recent git tree. If anyone else specifically 
wants a copy

let me know.

Auke


It addressed 1 and 2, and applies, but did not address 3 and 4.


But the bigger point is it didn't address the holes I identified.

I think we need to change the logic to reclaim the size from 0
only if we are restarting, and make rx_indicate look ahead to
rx-next if it encounters a !EL size 0 buffer.  Without this we
are doing a prohibited rx_start to a running machine.  The
device can still see this size 0 !EL state.  Also we will get
stuck when the device finds the window between the two writes.

We can remove some register pressure by finding old_before_last_rfd
when we are ready to use it, just comparing old_before_last_rx
to new.

Also, as I pointed out, the rx_to_start change to start_receiver
is complicated and unnecessary, as rx_to_clean can always be used
and it was the starting point before the changes.

As far as the RU_SUSPENDED, I don't think we need it, instead
we should poll the device.

Here is my proposal:
rx_indicate can stop when it hits the packet with EL.  If it
hits a packet with size 0, look ahead to rx-next to see if
it is complete, if so complete this one otherwise leave it
as next to clean.  After the rx_indicate loop, try to allocate
more skbs.  If we are successful, then fixup the before-next
as we do now.  Then check the device status for RNR, and if
it's stopped then set rx_to_clean rfd size back to ETH_LEN
and restart the receiver.

This does have a small hole: if we only add one packet at
a time we will end up with all size 0 descriptors in the
loop.   We can detect that and not remove EL from the old
before-next unless we are restarting.  That would imply
moving the status poll before we allocate the list.

milton



Re: [PATCH] e1000: Don't enable polling in open() (was: e1000: assertion hit in e1000_clean(), kernel 2.6.21.1)

2007-05-22 Thread Chuck Ebbert
Herbert Xu wrote:
 On Mon, May 21, 2007 at 07:42:39PM -0400, Jeff Garzik wrote:
 applied, though as a poster (DaveJ?) noted, I'm not sure it completely 
 fixes the bug
 
 It should fix the problem completely in 2.6.22.  For 2.6.21, we need
 a different fix because e1000_open is directly calling e1000_up.
 

Is there going to be a 2.6.21-stable fix for this?

Fedora is going to backport a big patchset but that won't work
for -stable.


Please pull 'upstream-fixes' branch of wireless-2.6

2007-05-22 Thread John W. Linville
The following changes since commit 55b637c6a003a8c4850b41a2c2fd6942d8a7f530:
  Linus Torvalds (1):
Linux v2.6.22-rc2

are found in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-fixes

Eugene Teo (2):
  drivers/net/wireless/libertas/fw.c: fix use-before-check
  drivers/net/wireless/libertas/rx.c: fix use-after-free

Florin Malita (1):
  libertas: skb dereferenced after netif_rx

 drivers/net/wireless/libertas/decl.h |2 +-
 drivers/net/wireless/libertas/fw.c   |   14 +-
 drivers/net/wireless/libertas/rx.c   |   24 +---
 3 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/drivers/net/wireless/libertas/decl.h b/drivers/net/wireless/libertas/decl.h
index 606bdd0..dfe2764 100644
--- a/drivers/net/wireless/libertas/decl.h
+++ b/drivers/net/wireless/libertas/decl.h
@@ -46,7 +46,7 @@ u32 libertas_index_to_data_rate(u8 index);
 u8 libertas_data_rate_to_index(u32 rate);
 void libertas_get_fwversion(wlan_adapter * adapter, char *fwversion, int maxlen);
 
-int libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb);
+void libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb);
 
 /** The proc fs interface */
 int libertas_process_rx_command(wlan_private * priv);
diff --git a/drivers/net/wireless/libertas/fw.c b/drivers/net/wireless/libertas/fw.c
index 441123c..5c63c9b 100644
--- a/drivers/net/wireless/libertas/fw.c
+++ b/drivers/net/wireless/libertas/fw.c
@@ -333,18 +333,22 @@ static void command_timer_fn(unsigned long data)
unsigned long flags;
 
ptempnode = adapter->cur_cmd;
+   if (ptempnode == NULL) {
+   lbs_pr_debug(1, "PTempnode Empty\n");
+   return;
+   }
+
cmd = (struct cmd_ds_command *)ptempnode->bufvirtualaddr;
+   if (!cmd) {
+   lbs_pr_debug(1, "cmd is NULL\n");
+   return;
+   }
 
lbs_pr_info("command_timer_fn fired (%x)\n", cmd->command);
 
if (!adapter->fw_ready)
return;
 
-   if (ptempnode == NULL) {
-   lbs_pr_debug(1, "PTempnode Empty\n");
-   return;
-   }
-
spin_lock_irqsave(&adapter->driver_lock, flags);
adapter->cur_cmd = NULL;
spin_unlock_irqrestore(&adapter->driver_lock, flags);
diff --git a/drivers/net/wireless/libertas/rx.c b/drivers/net/wireless/libertas/rx.c
index d17924f..96619a3 100644
--- a/drivers/net/wireless/libertas/rx.c
+++ b/drivers/net/wireless/libertas/rx.c
@@ -136,7 +136,7 @@ static void wlan_compute_rssi(wlan_private * priv, struct rxpd *p_rx_pd)
LEAVE();
 }
 
-int libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb)
+void libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb)
 {
lbs_pr_debug(1, "skb->data=%p\n", skb->data);
 
@@ -148,8 +148,6 @@ int libertas_upload_rx_packet(wlan_private * priv, struct sk_buff *skb)
skb->ip_summed = CHECKSUM_UNNECESSARY;
 
netif_rx(skb);
-
-   return 0;
 }
 
 /**
@@ -269,15 +267,11 @@ int libertas_process_rxed_packet(wlan_private * priv, struct sk_buff *skb)
wlan_compute_rssi(priv, p_rx_pd);
 
lbs_pr_debug(1, "RX Data: size of actual packet = %d\n", skb->len);
-   if (libertas_upload_rx_packet(priv, skb)) {
-   lbs_pr_debug(1, "RX error: libertas_upload_rx_packet"
-   " returns failure\n");
-   ret = -1;
-   goto done;
-   }
priv->stats.rx_bytes += skb->len;
priv->stats.rx_packets++;
 
+   libertas_upload_rx_packet(priv, skb);
+
ret = 0;
 done:
LEAVE();
@@ -438,22 +432,14 @@ static int process_rxed_802_11_packet(wlan_private * priv, struct sk_buff *skb)
wlan_compute_rssi(priv, prxpd);
 
lbs_pr_debug(1, "RX Data: size of actual packet = %d\n", skb->len);
-
-   if (libertas_upload_rx_packet(priv, skb)) {
-   lbs_pr_debug(1, "RX error: libertas_upload_rx_packet "
-   "returns failure\n");
-   ret = -1;
-   goto done;
-   }
-
priv->stats.rx_bytes += skb->len;
priv->stats.rx_packets++;
 
+   libertas_upload_rx_packet(priv, skb);
+
ret = 0;
 done:
LEAVE();
 
-   skb->protocol = __constant_htons(0x0019);   /* ETH_P_80211_RAW */
-
return (ret);
 }
-- 
John W. Linville
[EMAIL PROTECTED]
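
The rx.c hunks above move the statistics update ahead of the call that
hands the skb to the stack, and drop the write to skb->protocol that
followed it. A minimal user-space sketch of that ordering rule, assuming
a consumer that frees the buffer immediately; struct buf, struct stats,
hand_off() and receive() are hypothetical stand-ins, not libertas code:

```c
#include <stdlib.h>

struct buf   { size_t len; };
struct stats { size_t rx_bytes, rx_packets; };

/* Stand-in for netif_rx(): takes ownership and may free the buffer at once. */
void hand_off(struct buf *b)
{
    free(b);
}

void receive(struct stats *st, struct buf *b)
{
    st->rx_bytes += b->len;   /* read everything needed while we still own b */
    st->rx_packets++;
    hand_off(b);              /* b must not be dereferenced past this point */
}
```

Swapping the two halves of receive() would read b->len after the free,
which is the use-after-free pattern the patches address.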



Re: [PATCH] e1000: Don't enable polling in open() (was: e1000: assertion hit in e1000_clean(), kernel 2.6.21.1)

2007-05-22 Thread Kok, Auke

Chuck Ebbert wrote:

Herbert Xu wrote:

On Mon, May 21, 2007 at 07:42:39PM -0400, Jeff Garzik wrote:
applied, though as a poster (DaveJ?) noted, I'm not sure it completely 
fixes the bug

It should fix the problem completely in 2.6.22.  For 2.6.21, we need
a different fix because e1000_open is directly calling e1000_up.



Is there going to be a 2.6.21-stable fix for this?

Fedora is going to backport a big patchset but that won't work
for -stable.


I've posted that before and it's up to the stable team. I think that it's a 
seriously too large change unless Herbert posts his short version of the fix for 
2.6.21.1. I would be OK with that.


BTW this bug is present in most recent kernels, certainly before 2.6.20...

Auke


Re: select(0, ..) is valid ?

2007-05-22 Thread Steve Fox
On Tue, 2007-05-22 at 07:34 -0700, Nishanth Aravamudan wrote:
 On 22.05.2007 [09:16:37 -0500], Steve Fox wrote:
  
  Andy put this through a couple machines on test.kernel.org and elm3b6
  was fixed, however elm3b239 still had a boot error.
  
  BUG: at mm/slab.c:777 __find_general_cachep()
  
  Call Trace:
   [802729c6] __kmalloc+0xa6/0xe0
   [8021d21b] cache_k8_northbridges+0x9b/0x120
 
 I believe this is fixed by:
 
 http://lkml.org/lkml/2007/5/18/19
 
 Care to stack it on top and retest?

Looks good. See the 2.6.22-rc2-git1 +1 +1 row on tko. Thanks.

-- 

Steve Fox
IBM Linux Technology Center



Re: r8169: hard freezes on TX

2007-05-22 Thread Francois Romieu
Rolf Eike Beer [EMAIL PROTECTED] :
[...]
 I often see freezes when I do much outgoing transfer. I have never seen this 
 happening on incoming transfers. When this happens the system locks up hard, 
 I don't see anything in the log. Since this is my laptop I have trouble 
 debugging it: there is no serial console and debugging this via netconsole 
 doesn't look like a good idea.

Keyboard leds are dead afterwards I guess, right ?

If you are experiencing bugs related to networking, I suggest staying
away from netconsole. It is no fun to analyze several bugs at the same
time.

 When I say much outgoing transfer this means several megabytes. If I copy 
 out 30 MB I almost every time get this. I usually copy that much only at home 
 when I feed my gentoo server. That host only has a 10 MBit connection. 
 Nevertheless I've also seen that on different hosts using different files on 
 different protocols (ftp, scp, smb).

:o/

So it can be reproduced with a simple ftp put of several megabytes of
data completely cached in memory (no disk access) ?

[...]
 This is the output of lspci for my NIC:
 
 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E PCI 
 Express Fast Ethernet controller (rev 01)

I have not seen a lot of reports for this one. Either it is perfect or
it is barely used.

[...]
 Hm, is there a reason why we don't use MSI here?

A request for testers was posted (netdev + lk) on 16/03/2007 which contained
MSI code for the 8168. I did not enable it for the 8101 because it had only
received (positive) reports from 8168 users.

Afair, the RFT got no feedback.

 Ah, one thing is missing: I've not tested it with current kernel, latest I 
 tested was 2.6.21-rc7. But I've seen this on many previous versions, although 
 I thought it became better some versions ago. I won't bet on it, it might just 
 have been luck.

You can/should try:
http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.22-rc2 (patch-kit)
or:
http://www.fr.zoreil.com/people/francois/misc/20070522-2.6.22-rc2-r8169.patch

If you are fluent with git and you do not mind rebasing, you can try
git://electric-eye.fr.zoreil.com/home/romieu/linux/linux-2.6-out r8169

(don't do the initial clone from here, thanks)

As an option, akpm includes the git branch for you in -mm.

-- 
Ueimor


UDP checksum broken since 2.6.18?

2007-05-22 Thread Thomas B. Rücker
hi,

a friend of mine recently contacted me about what he at first thought
were IPv6 issues with some java software.

As it turns out it probably is a general IP issue with the Linux kernel:

He wrote this piece of C which sends a UDP packet to 127.28.50.50 -
http://www2.futureware.at/~philipp/udp-problem.c
Packets generated by this code were captured by tcpdump and wireshark.
When feeding the dump into wireshark it says:
Checksum: 0x62fd [incorrect, should be 0xe4f3] for the udp packet.

We've tested this on several kernel versions.
Wireshark reports checksum broken:
Linux version 2.6.18-4-vserver-686 (Debian 2.6.18.dfsg.1-12)
([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian
4.1.1-21)) #1 SMP Mon Mar 26 19:55:22 UTC 2007
Linux version 2.6.19-dm8tbr-1 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061028
(prerelease) (Debian 4.1.1-19)) #3 SMP PREEMPT Sun Dec 3 18:31:00 CET
2006 - (that's vanilla)
Linux version 2.6.21.1-dm8tbr-1 ([EMAIL PROTECTED]) (gcc version 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)) #3 SMP Fri May 18 09:04:55 CEST
2007 - (that's vanilla + dscape patch)

Wireshark reports checksum ok:
Linux version 2.6.16.13-4-default ([EMAIL PROTECTED]) (gcc version 4.1.0
(SUSE Linux)) #1 Wed May 3 04:53:23 UTC 2006
Linux version 2.6.17-11-386 ([EMAIL PROTECTED]) (gcc version 4.1.2 20060928
(prerelease) (Ubuntu 4.1.1-13ubuntu5)) #2 Tue Mar 13 23:30:30 UTC 2007
(Ubuntu 2.6.17-11.37-386)

So my guess is something between 2.6.17 and 2.6.18 broke.

Second option is: The way you are supposed to send UDP packets changed
in 2.6.18 and the Sun Java VM and that piece of C are broken for the same reason.

Third option: everything is perfectly ok, the UDP checksum is computed
in a different way since 2.6.18 - due to some reason I don't know - and
Wireshark is broken.
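
For anyone wanting to check the value in the capture by hand, here is a
minimal sketch of the UDP-over-IPv4 checksum as defined in RFC 768: the
ones'-complement sum over a pseudo-header (source, destination, protocol
17, UDP length) followed by the UDP header and payload. The function
names sum16() and udp4_checksum() are made up for this example; it is not
taken from the kernel or from Wireshark:

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum, taken as 16-bit big-endian words. */
static uint32_t sum16(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                       /* odd trailing byte is padded with zero */
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/*
 * src/dst are the 4-byte IPv4 addresses in network order; udp points at
 * the UDP header followed by the payload, udp_len covers both.
 * With the checksum field zeroed, the result is the value to transmit
 * (RFC 768: a computed 0 is sent as 0xffff). With the field filled in,
 * a valid datagram yields 0.
 */
uint16_t udp4_checksum(const uint8_t src[4], const uint8_t dst[4],
                       const uint8_t *udp, size_t udp_len)
{
    uint32_t sum = 0;

    sum = sum16(sum, src, 4);
    sum = sum16(sum, dst, 4);
    sum += 17;                     /* zero byte + protocol number for UDP */
    sum += (uint32_t)udp_len;      /* UDP length field of the pseudo-header */
    sum = sum16(sum, udp, udp_len);

    while (sum >> 16)              /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Running it over the captured datagram with bytes 6 and 7 of the UDP
header zeroed gives the value a receiver would expect to find there.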

We'd be grateful for some enlightenment.

Cheers

Thomas

PS: please keep me CCed - I'm not subscribed to the netdev ml.






Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)

2007-05-22 Thread David Acker

Milton Miller wrote:

Proceeding with the review:
Coding style:
(1) if body on separate line.
(2) space after if before (
(3) The other enums in this driver are not ALL_CAPS
(4) This driver doesn't do CONSTANT != value but value != enum
 (see nic-mac for examples)


I sent Milton my copy of this patch which has these style issues 
corrected and
applies cleanly to a recent git tree. If anyone else specifically 
wants a copy

let me know.

Auke


It addressed 1 and 2, and applies, but did not address 3 and 4.



Sorry about the style bugs.  I will be more careful about that next time.

Many of the issues you bring have been in the e100 for some time.  If 
you ignore the s-bit patch, I basically did the following:
moved the el-bit to before the last buffer so that the last buffer was 
protected during chaining.
set the el-bit buffer (next-to-last) size to 0 so that it is not written 
to while we are writing to it.


This seemed the only way to protect the buffers we are changing while 
having code to stay ahead of the hardware.


(3) The driver had these all caps constants before my patch.  They went 
away with the s-bit patch.  I just put them back.  I agree they stick 
out but I wanted to leave style changes out of a bug fix patch.


I agree about the name of the constant, RU_SUSPENDED.  It is not 
accurate and I had a patch to fix this when I experimented with using 
the S-bit.




But the bigger point is it didn't address the holes I identified.



I also see that the driver sets the size from 0 back to frame
size.  While there is a wmb() between the removal of EL and the
restore of the frame size, there is only one pci_sync for both
of them.  Since the size is in a separate word, depending on skb
alignment it may be in a different cache line and we may end up
with all orderings being visible to the device. 
Hmmm...interesting.  The start of the data of the skb may not be aligned 
on a cache line.  Ok DMA experts...what happens when you sync across 
cache lines?  It should dump both of them, right?  I guess the orderings 
could vary...hence why I saw completions with size set but the el-bit 
still set.  So perhaps the skb alloc needs to be aligned by cache line? 
This issue existed before my patches as well.




This patch adds a lot of code that checks if the next RFD has
EL set and assumes that if it sees the next frame to clean has
EL set then the device will have seen it and entered the stopped
state.   In theory, the device might be held off on the PCI bus
and not have yet read the descriptor and stopped.   In addition,
while it anticipates the RNR interrupt and restarts the receiver
from the clean process, it doesn't clear the pending interrupt
and the cpu will still take the interrupt, although it will read
the status and not actually restart the RU. 
So the device is just before the el-bit buffer and stops.  We get a poll 
call with interrupts disabled and see the el-bit buffer and decide we 
need to restart.  First we alloc new buffers and move the el-bit/size 0 
down.  The restart occurs on the next poll (interrupts are not turned on 
if we did any work).  In theory, the hardware could have read the buffer 
that no longer has a size of 0 and tried to use it when we hit it with 
the start.  I am not sure how the hardware handles this.  Perhaps it is 
ignored and thus harmless.  We could poll the hardware to check before 
actually sending the start...it adds an extra pci operation but would 
avoid a command that is illegal in the manual.


If we ignore a buffer with the el-bit set but without the complete bit, 
we must wait for the hardware to RNR interrupt.  I have found that when 
a buffer has both the el-bit set and the size set to 0, it will not set 
the complete bit on that buffer.  This was part of the reason why my 
first patch had hardware just using the list of buffers given all the 
way up until the end.



I think we need to change the logic to reclaim the size from 0
only if we are restarting, and make rx_indicate look ahead to
rx-next if it encounters a !EL size 0 buffer.  Without this we
are doing a prohibited rx_start to a running machine.
The hardware is only restarted when we enter the RU_SUSPENDED state. 
This only happens when we:

1) get an RNR interrupt
2) get an EL-bit buffer without a completion
3) get an el-bit buffer with a completion (hardware saw size set but not 
el-bit clear)


State 2) seems to be the problem.  Get rid of that and we wait for an 
interrupt to tell us it saw the buffer in most cases.  Of course, then 
we always wait.


If we do not reclaim the size from 0, but clear the el-bit, that buffer 
will always return an error.


The device can still see this size 0 !EL state.  Also we will get
stuck when the device finds the window between the two writes.


 This window also means that the device may have skipped this
 previously 0-length descriptor and gone on to the next one,
 so we will stick waiting for the device to write to this
 

Re: [RFC] New driver API to speed up small packets xmits

2007-05-22 Thread David Miller
From: Shirley Ma [EMAIL PROTECTED]
Date: Tue, 22 May 2007 15:22:35 -0700

  Yep, for any NIC that supports SG but not TSO then software GSO will
  be a big win.  When the NIC doesn't support SG then the win is mostly
  offset by the need to copy the packet again.
 
  Cheers,
  --
 
   We could avoid packet copy by allocating number of 64K/MTU skb
 buffers instead of one big 64K buffer size. Is it a possible approach?

I think you misunderstand the problem.

If the device does not support scatter gather, it is impossible to
avoid the copy.

SKB's from TSO are composed of discontiguous page chunks represented
by the skb_shared_info() array.  These can come either from sendmsg()
user data (which the kernel fills into a per-socket data page which is
reallocated once filled), or from sendfile() page cache pages.

Therefore, there has to be a copy somewhere to be able to give
that packet to a non-scatter-gather device.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: UDP packet loss when running lsof

2007-05-22 Thread John Miller


Hi Eric,


 It's a HP system with two dual core CPUs at 3GHz, the



Then you might try to bind network IRQ to one CPU
(echo 1 > /proc/irq/XX/smp_affinity)



XX being your NIC interrupt (cat /proc/interrupts to catch it)



and bind your user program to another cpu(s)


The NIC was already fixed at CPU0, and the irq_balancer switched
the timer interrupt between all CPUs and the storage HBA between
CPU1 and CPU4. Stopping the balancer, leaving the NIC alone on CPU0
and moving the other interrupts and my program to CPU2-4 did not
improve the situation.
At least I could not see an improvement over just adding
thash_entries=2048.
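
For reference, the value written to /proc/irq/XX/smp_affinity is a
hexadecimal CPU bitmask, so the "echo 1" suggested above selects CPU0
only. A minimal sketch of how such a mask can be built follows; the
helper names are made up for illustration and are not part of any
kernel or libc API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: bitmask selecting a single CPU (bit n = CPU n). */
static unsigned long affinity_mask_for_cpu(unsigned int cpu)
{
	assert(cpu < 8 * sizeof(unsigned long));
	return 1UL << cpu;
}

/*
 * Hypothetical helper: format a mask as plain hex digits (no 0x prefix),
 * the form usually echoed into /proc/irq/XX/smp_affinity.
 * Returns the snprintf() result.
 */
static int format_affinity_mask(unsigned long mask, char *buf, size_t len)
{
	return snprintf(buf, len, "%lx", mask);
}
```

Under these assumptions, OR-ing the masks for CPUs 1, 2 and 3 gives the
string "e", which would keep an interrupt away from CPU0.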


You might hit a cond_resched_softirq() bug that Ingo and others
are sorting out right now. Using separate CPU for softirq
handling and your programs should help a lot here.


Shouldn't I get some syslog messages if this bug is triggered?

Nevertheless I also opened a call with Novell about this issue,
as the current cond_resched_softirq() looks completely
different from the one in 2.6.18.


 This did help a lot, I tried thash_entries=10 and now only a
 while loop around the cat ...tcp triggers packet loss. Tests



I dont understand here : using a small thash_entries makes
the bug always appear ?


No. thash_entries=10 improves the situation. Without the param
nearly every look at /proc/net/tcp leads to packet loss; with
thash_entries=10 (or 2048, it does not matter) I have to start a
while true; do cat /proc/net/tcp ; done to get packet loss
every minute.

But even with thash_entries=10, and if I leave my program alone
on the system, I get packet loss every few hours.
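
One way to tell socket-buffer drops from drops elsewhere is to watch
the Udp counters in /proc/net/snmp, which is where the RcvbufErrors
counter asked about earlier in the thread is reported on kernels that
export it. The sketch below uses a hypothetical helper name and pulls
one named counter out of a header/value line pair; the sample lines in
the comment are invented and are not measurements from this system.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one named counter in a header/value line pair such as
 *   "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors"
 *   "Udp: 1200 3 7 900 5 0"
 * Both lines are walked token by token in parallel.
 * Returns the counter value, or -1 if the field is not present.
 */
static long snmp_counter(const char *header, const char *values,
			 const char *field)
{
	size_t flen;

	assert(header && values && field);
	flen = strlen(field);

	while (*header && *values) {
		size_t hlen = strcspn(header, " \n");
		size_t vlen = strcspn(values, " \n");

		if (hlen == flen && strncmp(header, field, flen) == 0)
			return strtol(values, NULL, 10);

		/* advance both lines to the start of the next token */
		header += hlen;
		header += strspn(header, " \n");
		values += vlen;
		values += strspn(values, " \n");
	}
	return -1;
}
```

Sampling the counter before and after a test run would show whether
the lost datagrams were counted as receive-buffer overruns.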

Regards,
John






Re: [PATCH 1/2 v4] s2io: add PCI error recovery support

2007-05-22 Thread Linas Vepstas
On Mon, May 21, 2007 at 06:51:45PM -0400, Jeff Garzik wrote:
 
 The part that confuses me is that I'd gotten a message from Jeff
 back in March (well before 2.6.21 came out), saying it was in his
 development tree; yet, the patch is not in 2.6.22-rc; Torvalds
 hasn't yet pulled from it?
 
 It only appeared in my tree on May 14.  I tend to drop patches that are 
 repeatedly revised, allowing the dust to settle.

OK, a new patch is coming. I did not want to pester you until after -rc1
came out, but perhaps that was the wrong strategy.

--linas



Re: [PATCH 1/2 v4] s2io: add PCI error recovery support

2007-05-22 Thread Jeff Garzik

Linas Vepstas wrote:

On Mon, May 21, 2007 at 06:51:45PM -0400, Jeff Garzik wrote:

The part that confuses me is that I'd gotten a message from Jeff
back in March (well before 2.6.21 came out), saying it was in his
development tree; yet, the patch is not in 2.6.22-rc; Torvalds
hasn't yet pulled from it?
It only appeared in my tree on May 14.  I tend to drop patches that are 
repeatedly revised, allowing the dust to settle.


OK, a new patch is coming. I did not want to pester you until after -rc1
came out, but perhaps that was the wrong strategy.


You should be sending patches as soon as they are available, so that 
they do not miss merge window + 1.


Jeff





[PATCH] s2io: don't run MSI handlers if device is offline.

2007-05-22 Thread Linas Vepstas

Don't run any of the MSI handlers if the channel is off;
also don't gather device statistics. Also, netif_wake is
not needed, per suggestions from
Sivakumar Subramani [EMAIL PROTECTED].

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Cc: Ramkrishna Vepa [EMAIL PROTECTED]
Cc: Sivakumar Subramani [EMAIL PROTECTED]
Cc: Sreenivasa Honnur [EMAIL PROTECTED]
Cc: Rastapur Santosh [EMAIL PROTECTED]
Cc: Wen Xiong [EMAIL PROTECTED]


diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index e46e164..871c37c 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -4202,6 +4202,9 @@ static irqreturn_t s2io_msi_handle(int i
 	struct mac_info *mac_control;
 	struct config_param *config;
 
+	if (pci_channel_offline(sp->pdev))
+		return IRQ_NONE;
+
 	atomic_inc(&sp->isr_cnt);
 	mac_control = &sp->mac_control;
 	config = &sp->config;
@@ -4232,6 +4235,9 @@ static irqreturn_t s2io_msix_ring_handle
 	struct ring_info *ring = (struct ring_info *)dev_id;
 	struct s2io_nic *sp = ring->nic;
 
+	if (pci_channel_offline(sp->pdev))
+		return IRQ_NONE;
+
 	atomic_inc(&sp->isr_cnt);
 
 	rx_intr_handler(ring);
@@ -4246,6 +4252,9 @@ static irqreturn_t s2io_msix_fifo_handle
 	struct fifo_info *fifo = (struct fifo_info *)dev_id;
 	struct s2io_nic *sp = fifo->nic;
 
+	if (pci_channel_offline(sp->pdev))
+		return IRQ_NONE;
+
 	atomic_inc(&sp->isr_cnt);
 	tx_intr_handler(fifo);
 	atomic_dec(&sp->isr_cnt);
@@ -4428,6 +4437,9 @@ static void s2io_updt_stats(struct s2io_
 	u64 val64;
 	int cnt = 0;
 
+	if (pci_channel_offline(sp->pdev))
+		return;
+
 	if (atomic_read(&sp->card_state) == CARD_UP) {
 		/* Apprx 30us on a 133 MHz bus */
 		val64 = SET_UPDT_CLICKS(10) |
@@ -8122,5 +8134,4 @@ static void s2io_io_resume(struct pci_de
 	}
 
 	netif_device_attach(netdev);
-	netif_wake_queue(netdev);
 }


Re: [RFC] New driver API to speed up small packets xmits

2007-05-22 Thread David Miller
From: Shirley Ma [EMAIL PROTECTED]
Date: Tue, 22 May 2007 15:58:05 -0700

 Sorry for the confusion. I am thinking to avoid copy in skb_segment() for
 GSO. The way could be in tcp_sendmsg() to allocate small discontiguous
 buffers (equal = MTU) instead of allocating pages.

The SKB splitting algorithm in TCP's transmit engine depends upon the
skb_shared_info() array being splittable at arbitrary points with only
page counts to manage.  This is the only way I found to make SKB
splitting at transmit time extremely inexpensive.

SACK block processing needs to perform these kinds of splits
as well, so it really really has to be cheap.

The invariant is that every TCP TSO packet must have its header
at skb->data and all of its data in the paged skb_shared_info().
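
To make the "splittable at arbitrary points" argument concrete, here
is a small userspace sketch. It is an illustration under stated
assumptions, not the kernel's code: struct frag, struct frag_list,
MAX_FRAGS and frag_list_split() are hypothetical stand-ins for the
paged fragment array. The point it shows is that a split only adjusts
offsets and lengths; no payload bytes are copied, and at most one page
ends up referenced from both halves.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAGS 18	/* illustrative limit, not the kernel's constant */

/* Simplified stand-in for a paged fragment: a reference to part of a page. */
struct frag {
	int page_id;		/* which page the bytes live in */
	unsigned int offset;	/* start of the data within that page */
	unsigned int len;	/* number of bytes */
};

struct frag_list {
	struct frag frags[MAX_FRAGS];
	size_t nr;
};

/*
 * Split 'src' at byte position 'at': the first 'at' bytes stay in 'src',
 * the rest move to 'tail'. A fragment that straddles the split point is
 * referenced from both halves, which in a real kernel would cost one
 * extra page reference. Returns the number of shared pages (0 or 1).
 */
static int frag_list_split(struct frag_list *src, struct frag_list *tail,
			   unsigned int at)
{
	size_t i, keep = 0;
	int shared = 0;

	assert(src && tail);
	tail->nr = 0;
	for (i = 0; i < src->nr; i++) {
		struct frag f = src->frags[i];

		if (tail->nr == 0 && at >= f.len) {
			at -= f.len;		/* entirely before the split */
			keep = i + 1;
			continue;
		}
		if (at > 0) {			/* straddles the split point */
			src->frags[i].len = at;
			f.offset += at;
			f.len -= at;
			at = 0;
			keep = i + 1;
			shared = 1;
		}
		tail->frags[tail->nr++] = f;	/* at or after the split */
	}
	src->nr = keep;
	return shared;
}
```

Splitting three fragments of 1000, 1000 and 500 bytes at byte 1500
leaves two fragments in each half, with the middle page shared.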


[PATCH 1/10] spidernet: skb used after netif_receive_skb

2007-05-22 Thread Linas Vepstas
From: Florin Malita [EMAIL PROTECTED]

The stats update code in spider_net_pass_skb_up() is touching the skb 
after it's been passed up to the stack. To avoid that, just update the 
stats first.

Signed-off-by: Florin Malita [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 drivers/net/spider_net.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/spider_net.c b/drivers/net/spider_net.c
index 108adbf..1df2f0b 100644
Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c	2007-05-21 17:40:49.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c	2007-05-22 18:03:16.0 -0500
@@ -1014,12 +1014,12 @@ spider_net_pass_skb_up(struct spider_net
 		 */
 	}
 
-	/* pass skb up to stack */
-	netif_receive_skb(skb);
-
 	/* update netdevice statistics */
 	card->netdev_stats.rx_packets++;
 	card->netdev_stats.rx_bytes += skb->len;
+
+	/* pass skb up to stack */
+	netif_receive_skb(skb);
 }
 
 #ifdef DEBUG
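
The ordering rule behind this fix can be shown outside the kernel.
The following is a userspace sketch with hypothetical stand-in types
(struct packet for the skb, hand_to_stack() for netif_receive_skb());
it assumes only that the receiving side takes ownership and may free
the buffer at any time.

```c
#include <assert.h>
#include <stdlib.h>

struct packet { unsigned int len; };	/* stand-in for an skb */
struct stats { unsigned long rx_packets, rx_bytes; };

/* Stand-in for netif_receive_skb(): takes ownership, may free the buffer. */
static void hand_to_stack(struct packet *pkt)
{
	free(pkt);
}

/* Read everything that is needed from the packet, then give it away. */
static void pass_up(struct stats *st, struct packet *pkt)
{
	assert(st && pkt);
	st->rx_packets++;
	st->rx_bytes += pkt->len;	/* last access to pkt */
	hand_to_stack(pkt);		/* pkt must not be touched after this */
}
```

With the two steps in the opposite order, the pkt->len read would hit
freed memory in this sketch.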


Re: [RFC] New driver API to speed up small packets xmits

2007-05-22 Thread Herbert Xu
On Tue, May 22, 2007 at 03:36:36PM -0700, David Miller wrote:
 
   Yep, for any NIC that supports SG but not TSO then software GSO will
   be a big win.  When the NIC doesn't support SG then the win is mostly
   offset by the need to copy the packet again.

...
 
 SKB's from TSO are composed of discontiguous page chunks represented
 by the skb_shared_info() array.  These can come either from sendmsg()
 user data (which the kernel fills into a per-socket data page which is
 reallocated once filled), or from sendfile() page cache pages.

Yes sendmsg() is the case where it's almost even whether GSO is
turned on or off because GSO does an extra copy which offsets the
win in reduced per-packet cost.

For sendfile() it's still a win though since we have to copy it
anyway and doing it in GSO avoids the per-packet cost.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [IPSEC]: Fix warnings with casting int to pointer

2007-05-22 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Tue, 22 May 2007 21:27:03 +1000

 Hi Dave:
 
 Here's patch to fix the warnings.
 
 [IPSEC]: Fix warnings with casting int to pointer
 
 This patch adds some casts to shut up the warnings introduced by my
 last patch that added a common iterator function for xfrm algorithms.
 
 Signed-off-by: Herbert Xu [EMAIL PROTECTED] 

Applied, thanks Herbert.


[PATCH 2/10] spidernet: beautify error messages

2007-05-22 Thread Linas Vepstas

Make error messages print which interface they apply to.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 drivers/net/spider_net.c |   10 ++
 drivers/net/spider_net.h |2 +-
 2 files changed, 7 insertions(+), 5 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c	2007-05-22 18:03:16.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c	2007-05-22 18:03:24.0 -0500
@@ -434,7 +434,8 @@ spider_net_prepare_rx_descr(struct spide
 				     bufsize + SPIDER_NET_RXBUF_ALIGN - 1);
 	if (!descr->skb) {
 		if (netif_msg_rx_err(card) && net_ratelimit())
-			pr_err("Not enough memory to allocate rx buffer\n");
+			pr_err("%s: Not enough memory to allocate rx buffer\n",
+			       card->netdev->name);
 		card->spider_stats.alloc_rx_skb_error++;
 		return -ENOMEM;
 	}
@@ -455,7 +456,8 @@ spider_net_prepare_rx_descr(struct spide
 		dev_kfree_skb_any(descr->skb);
 		descr->skb = NULL;
 		if (netif_msg_rx_err(card) && net_ratelimit())
-			pr_err("Could not iommu-map rx buffer\n");
+			pr_err("%s: Could not iommu-map rx buffer\n",
+			       card->netdev->name);
 		card->spider_stats.rx_iommu_map_error++;
 		hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
 	} else {
@@ -1455,8 +1457,8 @@ spider_net_handle_error_irq(struct spide
 	case SPIDER_NET_GRFAFLLINT: /* fallthrough */
 	case SPIDER_NET_GRMFLLINT:
 		if (netif_msg_intr(card) && net_ratelimit())
-			pr_err("Spider RX RAM full, incoming packets "
-			       "might be discarded!\n");
+			pr_err("%s: Spider RX RAM full, incoming packets "
+			       "might be discarded!\n", card->netdev->name);
 		spider_net_rx_irq_off(card);
 		netif_rx_schedule(card->netdev);
 		show_error = 0;
Index: netdev-2.6/drivers/net/spider_net.h
===
--- netdev-2.6.orig/drivers/net/spider_net.h	2007-05-21 17:40:49.0 -0500
+++ netdev-2.6/drivers/net/spider_net.h	2007-05-22 18:03:24.0 -0500
@@ -25,7 +25,7 @@
 #ifndef _SPIDER_NET_H
 #define _SPIDER_NET_H
 
-#define VERSION "2.0 A"
+#define VERSION "2.0 B"
 
 #include "sungem_phy.h"
 


Re: UDP checksum broken since 2.6.18?

2007-05-22 Thread Stephen Hemminger
On Tue, 22 May 2007 21:47:22 +
Thomas B. Rücker [EMAIL PROTECTED] wrote:

 hi,
 
 a friend of mine recently contacted me about what he at first thought
 were IPv6 issues with some java software.
 
 As it turns out it probably is a general IP issue with the Linux kernel:
 
 He wrote this piece of c which sends an UDP packet to 127.28.50.50 -
 http://www2.futureware.at/~philipp/udp-problem.c
 Packets generated by this code were captured by tcpdump and wireshark.
 When feeding the dump into wireshark it says:
 Checksum: 0x62fd [incorrect, should be 0xe4f3] for the udp packet.
 
 We've tested this on several kernel versions.
 Wireshark reports checksum broken:
 Linux version 2.6.18-4-vserver-686 (Debian 2.6.18.dfsg.1-12)
 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian
 4.1.1-21)) #1 SMP Mon Mar 26 19:55:22 UTC 2007
 Linux version 2.6.19-dm8tbr-1 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061028
 (prerelease) (Debian 4.1.1-19)) #3 SMP PREEMPT Sun Dec 3 18:31:00 CET
 2006 - (that's vanilla)
 Linux version 2.6.21.1-dm8tbr-1 ([EMAIL PROTECTED]) (gcc version 4.1.2
 20061115 (prerelease) (Debian 4.1.1-21)) #3 SMP Fri May 18 09:04:55 CEST
 2007 - (that's vanilla + dscape patch)
 
 Wireshark reports checksum ok:
 Linux version 2.6.16.13-4-default ([EMAIL PROTECTED]) (gcc version 4.1.0
 (SUSE Linux)) #1 Wed May 3 04:53:23 UTC 2006
 Linux version 2.6.17-11-386 ([EMAIL PROTECTED]) (gcc version 4.1.2 20060928
 (prerelease) (Ubuntu 4.1.1-13ubuntu5)) #2 Tue Mar 13 23:30:30 UTC 2007
 (Ubuntu 2.6.17-11.37-386)
 
 So my guess is something between 2.6.17 and 2.6.18 broke.
 
 Second option is: The way you are supposed to send UDP packets changed
 in 2.6.18 and sun javavm and that piece of c are broken for the same reason.
 
 Third option: everything is perfectly ok, the UDP checksum is computed
 in a different way since 2.6.18 - due to some reason I don't know - and
 Wireshark is broken.
 
 We'd be grateful for some enlightenment.
 
 Cheers
 
 Thomas


The packet passed to packet capture programs may not have a valid checksum
if you have checksum offload configured on the device.  What kind of hardware
do you have on sender and receiver?  Try disabling checksum offload with
ethtool.

If you are getting bad UDP checksums then the counters in 'netstat -s'
will be increasing.
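
For anyone who wants to check a captured value by hand, the following
is a minimal sketch of the UDP checksum as described in RFC 768 for
IPv4: a one's-complement sum over the pseudo-header, the UDP header and
the payload. The function names are made up for this example. A capture
that shows a different value does not by itself prove the packet was
bad on the wire, for the offload reason given above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add 'len' bytes to a running sum as big-endian 16-bit words. */
static uint32_t csum_add(uint32_t sum, const uint8_t *data, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* pad odd byte with zero */
	return sum;
}

/* Fold the carries back into 16 bits and take the complement. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * UDP checksum over the IPv4 pseudo-header plus UDP header and payload.
 * 'udp' points at the UDP header with its checksum field set to zero;
 * 'udp_len' covers header and payload; addresses are in host byte order.
 */
static uint16_t udp4_checksum(uint32_t saddr, uint32_t daddr,
			      const uint8_t *udp, uint16_t udp_len)
{
	uint32_t sum = 0;
	uint16_t csum;

	assert(udp && udp_len >= 8);
	sum += (saddr >> 16) + (saddr & 0xffff);
	sum += (daddr >> 16) + (daddr & 0xffff);
	sum += 17;			/* IP protocol number of UDP */
	sum += udp_len;
	sum = csum_add(sum, udp, udp_len);
	csum = csum_fold(sum);
	return csum ? csum : 0xffff;	/* zero on the wire means "no checksum" */
}
```

The source and destination addresses would be those seen in the
capture, for example 127.28.50.50 as the destination in the test
program mentioned in this thread.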


[PATCH 3/10] spidernet: move a block of code around

2007-05-22 Thread Linas Vepstas

Put the enable and disable routines next to one another,
as this makes verifying their symmetry that much easier.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 drivers/net/spider_net.c |   28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c	2007-05-22 18:03:24.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c 2007-05-22 18:03:30.0 -0500
@@ -506,6 +506,20 @@ spider_net_enable_rxdmac(struct spider_n
 }
 
 /**
+ * spider_net_disable_rxdmac - disables the receive DMA controller
+ * @card: card structure
+ *
+ * spider_net_disable_rxdmac terminates processing on the DMA controller
+ * by turing off the DMA controller, with the force-end flag set.
+ */
+static inline void
+spider_net_disable_rxdmac(struct spider_net_card *card)
+{
+   spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
+SPIDER_NET_DMA_RX_FEND_VALUE);
+}
+
+/**
  * spider_net_refill_rx_chain - refills descriptors/skbs in the rx chains
  * @card: card structure
  *
@@ -657,20 +671,6 @@ write_hash:
 }
 
 /**
- * spider_net_disable_rxdmac - disables the receive DMA controller
- * @card: card structure
- *
- * spider_net_disable_rxdmac terminates processing on the DMA controller by
- * turing off DMA and issueing a force end
- */
-static void
-spider_net_disable_rxdmac(struct spider_net_card *card)
-{
-   spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
-SPIDER_NET_DMA_RX_FEND_VALUE);
-}
-
-/**
  * spider_net_prepare_tx_descr - fill tx descriptor with skb data
  * @card: card structure
  * @descr: descriptor structure to fill out


Re: UDP checksum broken since 2.6.18?

2007-05-22 Thread Thomas B. Rücker
Stephen Hemminger wrote:
 The packet passed to packet capture programs may not have a valid checksum
 if you have checksum offload configured on the device.  What kind of hardware
 do you have on sender and receiver? 
The c-snippet uses 127.x.x.x -- loopback
I think Philipp tested this on real NICs too.

I just proxied this information to this list because I don't have
in-depth knowledge of IP, UDP and the kernel. I had a chat with Ralf
Baechle and he recommended sending a report to [EMAIL PROTECTED] Maybe Philipp
can give some more information. He discovered this.

Cheers

Thomas


Re: [Cbe-oss-dev] [PATCH 4/10] spidernet: zero out a pointer.

2007-05-22 Thread Linas Vepstas
On Thu, May 17, 2007 at 09:32:56AM +1000, Michael Ellerman wrote:
  +   hwdescr->buf_addr = 0x0;
 
 If you're going to be paranoid, shouldn't you do something here to make
 sure the value's hit the device?

I thought the whole point of paranoia is that it's inexplicable.

Here's a delusional reply: I didn't see any point to it. 
1) a wmb would add overhead
2) the hardware is supposed to be looking at the status flag,
   anyway, and not misbehaving.
3) there is a wmb when the descr is actually refilled in such
   a way as to actually mean something to the hardware.

All that I really accomplished here is a minor trick to
aid in debug printing when looking for something bad.

--linas



[PATCH 4/10] spidernet: zero out a pointer.

2007-05-22 Thread Linas Vepstas

Invalidate a pointer as it's pci_unmap'ed; this is a bit of
paranoia to make sure hardware doesn't continue trying to
DMA to it.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 drivers/net/spider_net.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c	2007-05-22 18:03:30.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c	2007-05-22 18:03:32.0 -0500
@@ -1069,6 +1069,7 @@ spider_net_decode_one_descr(struct spide
 	struct spider_net_descr_chain *chain = &card->rx_chain;
 	struct spider_net_descr *descr = chain->tail;
 	struct spider_net_hw_descr *hwdescr = descr->hwdescr;
+	u32 hw_buf_addr;
 	int status;
 
 	status = spider_net_get_descr_status(hwdescr);
@@ -1082,7 +1083,9 @@ spider_net_decode_one_descr(struct spide
 	chain->tail = descr->next;
 
 	/* unmap descriptor */
-	pci_unmap_single(card->pdev, hwdescr->buf_addr,
+	hw_buf_addr = hwdescr->buf_addr;
+	hwdescr->buf_addr = 0x0;
+	pci_unmap_single(card->pdev, hw_buf_addr,
 			SPIDER_NET_MAX_FRAME, PCI_DMA_FROMDEVICE);
 
 	if ( (status == SPIDER_NET_DESCR_RESPONSE_ERROR) ||
@@ -1118,7 +1121,7 @@ spider_net_decode_one_descr(struct spide
 		pr_err("%s: bad status, cmd_status=x%08x\n",
 			       card->netdev->name,
 			       hwdescr->dmac_cmd_status);
-		pr_err("buf_addr=x%08x\n", hwdescr->buf_addr);
+		pr_err("buf_addr=x%08x\n", hw_buf_addr);
 		pr_err("buf_size=x%08x\n", hwdescr->buf_size);
 		pr_err("next_descr_addr=x%08x\n", hwdescr->next_descr_addr);
 		pr_err("result_size=x%08x\n", hwdescr->result_size);


[PATCH 6/10] spidernet: Don't terminate the RX ring

2007-05-22 Thread Linas Vepstas

There is no real reason to terminate the RX ring; it
doesn't make the operation any smoother, and it does
require an extra sync. So don't do it.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 drivers/net/spider_net.c |   18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c	2007-05-22 18:03:34.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c	2007-05-22 18:03:35.0 -0500
@@ -462,13 +462,9 @@ spider_net_prepare_rx_descr(struct spide
 		hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
 	} else {
 		hwdescr->buf_addr = buf;
-		hwdescr->next_descr_addr = 0;
 		wmb();
 		hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_CARDOWNED |
 					 SPIDER_NET_DMAC_NOINTR_COMPLETE;
-
-		wmb();
-		descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
 	}
 
 	return 0;
@@ -557,12 +553,16 @@ spider_net_refill_rx_chain(struct spider
 static int
 spider_net_alloc_rx_skbs(struct spider_net_card *card)
 {
-	int result;
-	struct spider_net_descr_chain *chain;
+	struct spider_net_descr_chain *chain = &card->rx_chain;
+	struct spider_net_descr *start= chain->tail;
+	struct spider_net_descr *descr = start;
 
-	result = -ENOMEM;
+	/* Link up the hardware chain pointers */
+	do {
+		descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
+		descr = descr->next;
+	} while (descr != start);
 
-	chain = &card->rx_chain;
 	/* Put at least one buffer into the chain. if this fails,
 	 * we've got a problem. If not, spider_net_refill_rx_chain
 	 * will do the rest at the end of this function. */
@@ -579,7 +579,7 @@ spider_net_alloc_rx_skbs(struct spider_n
 
 error:
 	spider_net_free_rx_chain_contents(card);
-	return result;
+	return -ENOMEM;
 }
 
 /**
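
The "link up the hardware chain pointers" loop in this patch can be
tried in userspace. The sketch below assumes a much reduced,
hypothetical descriptor layout (only the fields the loop touches) and
invented bus addresses; it is meant to show that after one pass every
hardware descriptor points at its successor and the last one wraps
around to the first, so the ring is never terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much reduced descriptor layout for illustration only. */
struct hw_descr { uint32_t next_descr_addr; };
struct descr {
	struct hw_descr *hwdescr;
	uint32_t bus_addr;		/* address the device would use */
	struct descr *next, *prev;	/* software ring links */
};

/*
 * Build the software ring over 'n' descriptors, then link the hardware
 * next pointers the same way the patch does: each descriptor writes its
 * own bus address into its predecessor.
 */
static void ring_init(struct descr *ring, struct hw_descr *hw, size_t n,
		      uint32_t base)
{
	struct descr *start = ring, *d = start;
	size_t i;

	assert(ring && hw && n > 0);
	for (i = 0; i < n; i++) {
		ring[i].hwdescr = &hw[i];
		ring[i].bus_addr = base + (uint32_t)(i * sizeof(struct hw_descr));
		ring[i].next = &ring[(i + 1) % n];
		ring[i].prev = &ring[(i + n - 1) % n];
	}

	do {
		d->prev->hwdescr->next_descr_addr = d->bus_addr;
		d = d->next;
	} while (d != start);
}
```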


[PATCH 7/10] spidernet: enhance the dump routine

2007-05-22 Thread Linas Vepstas

Crazy device problems are hard to debug, when one does not have
good trace info. This patch makes a major enhancement to the
device dump routine.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 drivers/net/spider_net.c |   62 ---
 1 file changed, 54 insertions(+), 8 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c	2007-05-22 18:03:35.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c	2007-05-22 18:03:37.0 -0500
@@ -1024,34 +1024,78 @@ spider_net_pass_skb_up(struct spider_net
 	netif_receive_skb(skb);
 }
 
-#ifdef DEBUG
 static void show_rx_chain(struct spider_net_card *card)
 {
 	struct spider_net_descr_chain *chain = &card->rx_chain;
 	struct spider_net_descr *start= chain->tail;
 	struct spider_net_descr *descr= start;
+	struct spider_net_hw_descr *hwd = start->hwdescr;
+	char *iface = card->netdev->name;
+	u32 curr_desc, next_desc;
 	int status;
 
 	int cnt = 0;
-	int cstat = spider_net_get_descr_status(descr);
-	printk(KERN_INFO "RX chain tail at descr=%ld\n",
-	     (start - card->descr) - card->tx_chain.num_desc);
+	int off = 0;
+	int cstat = hwd->dmac_cmd_status;
+
+	printk(KERN_INFO "%s: Total number of descrs=%d\n",
+		iface, chain->num_desc);
+	printk(KERN_INFO "%s: Chain tail located at descr=%d\n",
+		iface, (int) (start - chain->ring));
+
+	curr_desc = spider_net_read_reg(card, SPIDER_NET_GDACTDPA);
+	next_desc = spider_net_read_reg(card, SPIDER_NET_GDACNEXTDA);
+
 	status = cstat;
 	do
 	{
-		status = spider_net_get_descr_status(descr);
+		hwd = descr->hwdescr;
+		off = descr - chain->ring;
+		if (descr==chain->head)
+			printk(KERN_INFO "%s: chain head is at %d\n", iface, off);
+		if (curr_desc == descr->bus_addr)
+			printk(KERN_INFO "%s: hw curr desc is at %d\n", iface, off);
+		if (next_desc == descr->bus_addr)
+			printk(KERN_INFO "%s: hw next desc is at %d\n", iface, off);
+		if (hwd->next_descr_addr == 0)
+			printk(KERN_INFO "%s: chain is cut at %d\n", iface, off);
+		status = hwd->dmac_cmd_status;
 		if (cstat != status) {
-			printk(KERN_INFO "Have %d descrs with stat=x%08x\n", cnt, cstat);
+			printk(KERN_INFO "%s: Have %d descrs with stat=x%08x\n",
+				iface, cnt, cstat);
 			cstat = status;
 			cnt = 0;
 		}
 		cnt ++;
 		descr = descr->next;
 	} while (descr != start);
-	printk(KERN_INFO "Last %d descrs with stat=x%08x\n", cnt, cstat);
-}
+	printk(KERN_INFO "%s: Last %d descrs with stat=x%08x\n",
+		iface, cnt, cstat);
+
+#ifdef DEBUG
+	/* Now dump the whole ring */
+	descr = start;
+	do
+	{
+		struct spider_net_hw_descr *hwd = descr->hwdescr;
+		status = spider_net_get_descr_status(hwd);
+		cnt = descr - chain->ring;
+		printk(KERN_INFO "Descr %d stat=0x%08x skb=%p\n",
+			cnt, status, descr->skb);
+		printk(KERN_INFO "bus addr=%08x buf addr=%08x sz=%d\n",
+			descr->bus_addr, hwd->buf_addr, hwd->buf_size);
+		printk(KERN_INFO "next=%08x result sz=%d valid sz=%d\n",
+			hwd->next_descr_addr, hwd->result_size, hwd->valid_size);
+		printk(KERN_INFO "dmac=%08x data stat=%08x data err=%08x\n",
+			hwd->dmac_cmd_status, hwd->data_status, hwd->data_error);
+		printk(KERN_INFO "\n");
+
+		descr = descr->next;
+	} while (descr != start);
 #endif
 
+}
+
 /**
  * spider_net_decode_one_descr - processes an RX descriptor
  * @card: card structure
@@ -1141,6 +1185,8 @@ spider_net_decode_one_descr(struct spide
 	return 1;
 
 bad_desc:
+	if (netif_msg_rx_err(card))
+		show_rx_chain(card);
 	dev_kfree_skb_irq(descr->skb);
 	descr->skb = NULL;
 	hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
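
The summary part of the dump ("Have N descrs with stat=...") is a
run-length grouping of the status words around the ring. A standalone
sketch of that grouping is given below; struct run and
summarize_status() are hypothetical names, and the status values used
to try it out would be arbitrary rather than real dmac_cmd_status
encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct run { uint32_t status; int count; };

/*
 * Collapse consecutive equal status words into (status, count) runs.
 * Returns the number of runs written, which is at most max_runs.
 */
static size_t summarize_status(const uint32_t *status, size_t n,
			       struct run *runs, size_t max_runs)
{
	size_t i, nr = 0;

	assert(status && runs);
	for (i = 0; i < n; i++) {
		if (nr > 0 && runs[nr - 1].status == status[i]) {
			runs[nr - 1].count++;	/* same status: extend the run */
			continue;
		}
		if (nr == max_runs)
			break;			/* no room for another run */
		runs[nr].status = status[i];
		runs[nr].count = 1;
		nr++;
	}
	return nr;
}
```

A healthy ring would be expected to collapse into very few runs; many
short runs hint at descriptors in unexpected states.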


[PATCH 8/10] spidernet: reset the card when an rxramfull is seen

2007-05-22 Thread Linas Vepstas

Some versions of the spider have a firmware bug, where the
RX ring sequencer goes crazy when the RX RAM on the device
fills up. Apparently the only viable workaround is a soft
reset of the card.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 drivers/net/spider_net.c |   14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c	2007-05-22 18:03:37.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c	2007-05-22 18:03:39.0 -0500
@@ -1506,11 +1506,17 @@ spider_net_handle_error_irq(struct spide
 	case SPIDER_NET_GRFBFLLINT: /* fallthrough */
 	case SPIDER_NET_GRFAFLLINT: /* fallthrough */
 	case SPIDER_NET_GRMFLLINT:
-		if (netif_msg_intr(card) && net_ratelimit())
-			pr_err("%s: Spider RX RAM full, incoming packets "
-			       "might be discarded!\n", card->netdev->name);
+		if (netif_msg_intr(card) && net_ratelimit()) {
+			pr_err("%s: Spider RX RAM full, reseting device.\n",
+			       card->netdev->name);
+			show_rx_chain(card);
+		}
 		spider_net_rx_irq_off(card);
 		netif_rx_schedule(card->netdev);
+
+		/* If the card is spewing rxramfulls, then reset */
+		atomic_inc(&card->tx_timeout_task_counter);
+		schedule_work(&card->tx_timeout_task);
 		show_error = 0;
 		break;
 
@@ -2087,6 +2093,8 @@ spider_net_workaround_rxramfull(struct s
 {
 	int i, sequencer = 0;
 
+	printk(KERN_INFO "%s: calling rxramfull workaround\n", card->netdev->name);
+
 	/* cancel reset */
 	spider_net_write_reg(card, SPIDER_NET_CKRCTRL,
 			     SPIDER_NET_CKRCTRL_RUN_VALUE);


[PATCH 9/10] spidernet: service TX later.

2007-05-22 Thread Linas Vepstas

When entering the netdev poll routine, empty out the RX
chain first, before cleaning up the TX chain. This should
help avoid RX buffer overflows.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 drivers/net/spider_net.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c	2007-05-22 18:03:39.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c	2007-05-22 18:03:41.0 -0500
@@ -1212,7 +1212,6 @@ spider_net_poll(struct net_device *netde
 	int packets_to_do, packets_done = 0;
 	int no_more_packets = 0;
 
-	spider_net_cleanup_tx_ring(card);
 	packets_to_do = min(*budget, netdev->quota);
 
 	while (packets_to_do) {
@@ -1231,6 +1230,8 @@ spider_net_poll(struct net_device *netde
 	spider_net_refill_rx_chain(card);
 	spider_net_enable_rxdmac(card);
 
+	spider_net_cleanup_tx_ring(card);
+
 	/* if all packets are in the stack, enable interrupts and return 0 */
 	/* if not, return 1 */
 	if (no_more_packets) {


[PATCH 10/10] spidernet: increase the NAPI weight

2007-05-22 Thread Linas Vepstas

Another way of minimizing the likelihood of the RX RAM overflowing
is to empty out the entire rx ring every chance we get. Change
the crazy watchdog timeout from 50 seconds to 3 seconds, while
we're here.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]


 drivers/net/spider_net.h |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.h
===
--- netdev-2.6.orig/drivers/net/spider_net.h	2007-05-22 18:03:24.0 -0500
+++ netdev-2.6/drivers/net/spider_net.h	2007-05-22 18:03:43.0 -0500
@@ -56,8 +56,13 @@ extern char spider_net_driver_name[];
 
 #define SPIDER_NET_RX_CSUM_DEFAULT 1
 
-#define SPIDER_NET_WATCHDOG_TIMEOUT	50*HZ
-#define SPIDER_NET_NAPI_WEIGHT		64
+#define SPIDER_NET_WATCHDOG_TIMEOUT	3*HZ
+
+/* We really really want to empty the ring buffer every time,
+ * so as to avoid the RX ram full bug. So set the napi weight
+ * to the ring size.
+ */
+#define SPIDER_NET_NAPI_WEIGHT		SPIDER_NET_RX_DESCRIPTORS_DEFAULT
 
 #define SPIDER_NET_FIRMWARE_SEQS   6
 #define SPIDER_NET_FIRMWARE_SEQWORDS   1024
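
The effect of the weight can be illustrated with a toy model of a
single poll call. This is only a sketch: poll_once() and struct
poll_result are invented for the example, the real driver additionally
bounds the work by the budget and quota values, and the ring size of
256 used in the note below is an assumption, since the value of
SPIDER_NET_RX_DESCRIPTORS_DEFAULT is not shown in this patch.

```c
#include <assert.h>

/* Result of one simulated poll call. */
struct poll_result {
	int done;	/* packets processed in this call */
	int more;	/* nonzero if packets were left in the ring */
};

/* Simulate one poll: at most 'weight' of the 'pending' packets are taken. */
static struct poll_result poll_once(int pending, int weight)
{
	struct poll_result r;

	assert(pending >= 0 && weight > 0);
	r.done = pending < weight ? pending : weight;
	r.more = pending > r.done;
	return r;
}
```

With a full ring of 256 pending packets, a weight of 64 leaves work
behind after the first call, while a weight equal to the ring size
drains it in one call.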


Re: [RTNETLINK]: Allow changing of subsets of netdevice flags in rtnl_setlink

2007-05-22 Thread David Miller

Applied, thanks for finding this interface deficiency.
:-)


Re: [RTNETLINK]: Remove remains of wireless extensions over rtnetlink

2007-05-22 Thread David Miller
From: Johannes Berg [EMAIL PROTECTED]
Date: Tue, 22 May 2007 11:27:46 +0200

 
  [RTNETLINK]: Remove remains of wireless extensions over rtnetlink
  
  Remove some unused variables and function arguments related to the recently
  removed wireless extensions over rtnetlink.
 
 Still more! Sorry about that and thanks!
 
  Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
 
 Since I did the removal in the first place,
 Acked-by: Johannes Berg [EMAIL PROTECTED]

Applied, thanks everyone.


Re: IFF_PROMISC again

2007-05-22 Thread Ben Greear

Martín Ferrari wrote:

Hi, for the nth time I send this email, hoping that majordomo won't eat
it again.


I know this was discussed extensively circa 2001, but I found that
there are still problems: in Debian (at least) neither ifconfig nor ip
can tell that the interface is in promiscuous mode.

I know about the deprecation of IFF_PROMISC, but I couldn't find out
which is the current way of knowing the real state of the interface. I
want to fix ifconfig, so this is not an issue of
PACKET_(ADD|REMOVE)_MEMBERSHIP, I need to query the real device state.


I have the same problem.  I think you can tell by looking at bit 0x100
in /sys/class/net/[ethX]/flags

Not exactly fun to use, but it seems to work.
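
A minimal sketch of that check in C, assuming the flags file holds a
single hex value such as "0x1103" and that bit 0x100 reflects the
promiscuous state as described above. The helper names are made up for
illustration.

```c
#include <stdio.h>
#include <stdlib.h>

#define FLAG_PROMISC 0x100  /* same bit value as IFF_PROMISC */

/* Parse the text read from /sys/class/net/<ifname>/flags and report
 * whether bit 0x100 is set. Returns 1, 0, or -1 on a parse error. */
static int flags_text_is_promisc(const char *text)
{
	char *end;
	unsigned long flags = strtoul(text, &end, 16);

	if (end == text)
		return -1;
	return (flags & FLAG_PROMISC) ? 1 : 0;
}

/* Read the flags file for one interface; returns -1 if it cannot be
 * opened or read. */
static int iface_is_promisc(const char *ifname)
{
	char path[128], buf[32];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/class/net/%s/flags", ifname);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return flags_text_is_promisc(buf);
}
```

Calling iface_is_promisc("eth0") would then give 1 when the bit is set,
0 when it is clear, and -1 if the interface has no readable flags file.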

Anyone know the reasoning for masking out the PROMISC flag
in dev_get_flags() ?

Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com



Re: IFF_PROMISC again

2007-05-22 Thread David Miller
From: Ben Greear [EMAIL PROTECTED]
Date: Tue, 22 May 2007 17:08:18 -0700

 Anyone know the reasoning for masking out the PROMISC flag
 in dev_get_flags() ?

Because promiscuous status is a counter, not a binary
on-off state.

You can't expect to just clear it and expect all the
other promiscuous users to just go away and be ok
with the device leaving promiscuous mode.

Since you can't sanely set it, we don't provide it
either.
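
The counter semantics can be sketched in user-space C as follows. This
is an illustrative model only, not the kernel's implementation:
fake_dev and set_promiscuity are hypothetical names, and the clamp at
zero is an assumption added to keep the model well-behaved.

```c
#define FLAG_PROMISC 0x100  /* same bit value as IFF_PROMISC */

/* Hypothetical user-space model of a device; not struct net_device. */
struct fake_dev {
	unsigned int flags;
	int promiscuity;   /* number of users that asked for promiscuous mode */
};

/* Add `inc` (+1 or -1) to the user count. The flag follows the count,
 * so the device leaves promiscuous mode only when the last user is gone. */
static void set_promiscuity(struct fake_dev *dev, int inc)
{
	dev->promiscuity += inc;
	if (dev->promiscuity < 0)
		dev->promiscuity = 0;   /* guard against unbalanced decrements */
	if (dev->promiscuity > 0)
		dev->flags |= FLAG_PROMISC;
	else
		dev->flags &= ~FLAG_PROMISC;
}
```

With two users holding the device promiscuous, one decrement leaves the
flag set. A plain "clear the flag" operation has no way to express that.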


Re: IFF_PROMISC again

2007-05-22 Thread Ben Greear

David Miller wrote:

From: Ben Greear [EMAIL PROTECTED]
Date: Tue, 22 May 2007 17:08:18 -0700


Anyone know the reasoning for masking out the PROMISC flag
in dev_get_flags() ?


Because promiscuous status is a counter, not a binary
on-off state.

You can't expect to just clear it and expect all the
other promiscuous users to just go away and be ok
with the device leaving promiscuous mode.


Yes, I understand why you wouldn't let a user set promisc in this
manner.


Since you can't sanely set it, we don't provide it
either.


What harm is there letting the user know if their hardware is PROMISC
or not, regardless of how it got that way?

Also, it seems you *can* at least turn it on with ifconfig, and
you can decrement at least once with ifconfig as well.  If nothing
else has promiscuity set, then this will indeed toggle the state, right?

Thanks,
Ben


--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com



Re: [Cbe-oss-dev] [PATCH 4/10] spidernet: zero out a pointer.

2007-05-22 Thread Segher Boessenkool
If you're going to be paranoid, shouldn't you do something here to make
sure the value's hit the device?


I thought the whole point of paranoia is that it's inexplicable.

Here's a delusional reply: I didn't see any point to it.
1) a wmb would add overhead


A wmb() doesn't guarantee the write has reached the device.


2) the hardware is supposed to be looking at the status flag,
   anyway, and not misbehaving.


But you're paranoid, right?  Can't trust that device! :-)


Segher



[no subject]

2007-05-22 Thread Inaky Perez-Gonzalez
subscribe [EMAIL PROTECTED]


Re: [PATCH] [-mm] ACPI: export ACPI events via netlink

2007-05-22 Thread Zhang Rui
On Tue, 2007-05-22 at 07:03 -0400, jamal wrote:
 Hi Zhang Rui,
 
 Really cool stuff. Can you instead use genetlink?
 http://linux-net.osdl.org/index.php/Generic_Netlink_HOWTO 
 should help. And if you have more questions post on netdev (not lk).
 
That's really helpful, thanks.
Will post the second version soon. :)

Thanks,
Rui

 On Tue, 2007-22-05 at 17:47 +0800, Zhang Rui wrote:
  From: Zhang Rui [EMAIL PROTECTED]
  
  Export ACPI events via netlink.
  A netlink message is broadcast when an ACPI event is generated.
  


Re: [Cbe-oss-dev] [PATCH 4/10] spidernet: zero out a pointer.

2007-05-22 Thread Benjamin Herrenschmidt

 Here's a delusional reply: I didn't see any point to it. 
 1) a wmb would add overhead
 2) the hardware is supposed to be looking at the status flag,
anyway, and not misbehaving.
 3) there is a wmb when the descr is actually refilled in such
a way as to actually mean something to the hardware.
 
 All that I really accomplished here is a minor trick to
 aid in debug printing when looking for something bad.

And the whole thing is moot because 0 is actually a perfectly valid DMA
address :-) I suspect spider will end up trying to hit some internal
register or its PCIe or whatever it has mapped at 0 internally and will
blow up ... At least on spider, it's not RAM there

Ben.




RE: [PATCH] s2io: don't run MSI handlers if device is offline.

2007-05-22 Thread Sivakumar Subramani
Fix looks good. No comments.
~Siva 

-----Original Message-----
From: Linas Vepstas [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 23, 2007 4:20 AM
To: Jeff Garzik; Andrew Morton
Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org;
Ramkrishna Vepa; Sivakumar Subramani; Sreenivasa Honnur; Rastapur
Santosh; Wen Xiong
Subject: [PATCH] s2io: don't run MSI handlers if device is offline.


Don't run any of the MSI handlers if the channel is off; also don't
gather device statistics. Also, netif_wake_queue is not needed, per suggestions
from Sivakumar Subramani [EMAIL PROTECTED].

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Cc: Ramkrishna Vepa [EMAIL PROTECTED]
Cc: Sivakumar Subramani [EMAIL PROTECTED]
Cc: Sreenivasa Honnur [EMAIL PROTECTED]
Cc: Rastapur Santosh [EMAIL PROTECTED]
Cc: Wen Xiong [EMAIL PROTECTED]


diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index e46e164..871c37c 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -4202,6 +4202,9 @@ static irqreturn_t s2io_msi_handle(int i
struct mac_info *mac_control;
struct config_param *config;
 
+   if (pci_channel_offline(sp->pdev))
+           return IRQ_NONE;
+
atomic_inc(&sp->isr_cnt);
mac_control = &sp->mac_control;
config = &sp->config;
@@ -4232,6 +4235,9 @@ static irqreturn_t s2io_msix_ring_handle
struct ring_info *ring = (struct ring_info *)dev_id;
struct s2io_nic *sp = ring->nic;
 
+   if (pci_channel_offline(sp->pdev))
+           return IRQ_NONE;
+
atomic_inc(&sp->isr_cnt);
 
rx_intr_handler(ring);
@@ -4246,6 +4252,9 @@ static irqreturn_t s2io_msix_fifo_handle
struct fifo_info *fifo = (struct fifo_info *)dev_id;
struct s2io_nic *sp = fifo->nic;
 
+   if (pci_channel_offline(sp->pdev))
+           return IRQ_NONE;
+
atomic_inc(&sp->isr_cnt);
tx_intr_handler(fifo);
atomic_dec(&sp->isr_cnt);
@@ -4428,6 +4437,9 @@ static void s2io_updt_stats(struct s2io_
u64 val64;
int cnt = 0;
 
+   if (pci_channel_offline(sp->pdev))
+           return;
+
if (atomic_read(&sp->card_state) == CARD_UP) {
/* Apprx 30us on a 133 MHz bus */
val64 = SET_UPDT_CLICKS(10) |
@@ -8122,5 +8134,4 @@ static void s2io_io_resume(struct pci_de
}
 
netif_device_attach(netdev);
-   netif_wake_queue(netdev);
 }