Re: RED + ECN not working

2007-01-15 Thread Jarek Poplawski
On 09-01-2007 17:08, [EMAIL PROTECTED] wrote:
 Hello,
 
 I have been trying to get the RED qdisc and ECN to work for the past few weeks
 and all my experiments have failed. Here is the setup I am using. 
 
 Src -- R1 -- R2 -- Dst
 
 Between Src and R1 is a 100Mbps link, and between R1 and R2 a 10Mbps link. I
 set up the qdisc at R1 as follows:
 
 tc qdisc add dev eth3 root handle 1: prio
 tc qdisc add dev eth3 parent 1:1 handle 10: sfq
 tc qdisc add dev eth3 parent 1:2 handle 20: sfq
 tc qdisc add dev eth3 parent 1:3 handle 30: red limit 1 min 3000 max 5000 \
     avpkt 1000 burst 5 probability 0.5 bandwidth 256kbit ecn
 
 I also inserted printk statements inside the code to print the calculated queue
 average (RED param), the backlog (Qdisc param) and the queue length
 (sk_buff_head param). I also inserted print statements for each action of RED,
 i.e. DONT_MARK, PROB_MARK, HARD_MARK and DROP.
 
 For the purpose of my experiments, I transferred a 25 MB file. I also did 100
 simultaneous TCP transfers for 2 mins using iperf. In all cases none of the
 packets were either marked or dropped by the RED code. This was verified by the
 print statements in the logs. For all runs, qavg and backlog were 0 and qlen
 was 1.
 
 I have even tried classless RED and got the same results. Does anyone know
 whether RED+ECN works? Are there any tests and setups that someone has used
 with successful results? I would really appreciate any input or suggestions
 on this.

Hi,

Did you try the lartc.org list?

Also, more details would probably help, such as:
- where eth3 is located in the setup
- output of tc -s -d qdisc show dev eth3 (before and after the transfer)
- kernel and iproute versions
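
As background for reading those statistics, a simplified model of the RED
decision may help. This is only a sketch under stated assumptions, not the
kernel's code (which uses fixed-point arithmetic): the struct, the function
names and the floating-point EWMA weight w are made up for illustration, and
only the action names follow the ones mentioned in the report. With an
averaged queue that stays at 0, the model always answers DONT_MARK, which is
consistent with the logs described above.

```c
#include <assert.h>

/* Outcomes, named after the actions mentioned in the report. */
enum red_action { DONT_MARK, PROB_MARK, HARD_MARK };

/* Simplified RED state (hypothetical layout, floating point for clarity). */
struct red_model {
    double qavg;   /* averaged queue size in bytes */
    double w;      /* EWMA weight, 0 < w <= 1 (assumed value) */
    double min_th; /* "min" threshold in bytes */
    double max_th; /* "max" threshold in bytes */
    double max_p;  /* "probability" reached at max_th */
};

/* Fold the current backlog into the running average. */
static double red_update_avg(struct red_model *r, double backlog)
{
    assert(r->w > 0.0 && r->w <= 1.0);
    r->qavg = (1.0 - r->w) * r->qavg + r->w * backlog;
    return r->qavg;
}

/* Classify the averaged queue against the two thresholds. */
static enum red_action red_classify(const struct red_model *r)
{
    if (r->qavg < r->min_th)
        return DONT_MARK;
    if (r->qavg >= r->max_th)
        return HARD_MARK;
    return PROB_MARK;
}

/* Marking probability: linear ramp from 0 at min_th to max_p at max_th. */
static double red_mark_prob(const struct red_model *r)
{
    if (r->qavg < r->min_th)
        return 0.0;
    if (r->qavg >= r->max_th)
        return 1.0;
    return r->max_p * (r->qavg - r->min_th) / (r->max_th - r->min_th);
}
```

With min 3000, max 5000 and probability 0.5 as in the tc command, an averaged
queue of 4000 bytes lands in the PROB_MARK region with probability 0.25.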

Regards,

Jarek P.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Two Dual Core processors and NICS (not handling interrupts on one CPU / assigning a CPU to a NIC)

2007-01-15 Thread Mark Ryden

Hello,


I have a machine with 2 dual core CPUs. This machine runs Fedora Core 6.
I have two Intel e1000 Gigabit network cards on this machine; I use bonding so
that the machine assigns the same IP address to both NICs.
It seems to me that bonding is configured OK, because when running:
cat /proc/net/bonding/bond0
I get:

Ethernet Channel Bonding Driver: v3.0.3 (March 23, 2006)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: .

Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 

(And the Permanent HW addr is different in these two entries).

I send a large number of packets to this machine (more than 20,000 per
second).

cat /proc/interrupts shows something like this:
CPU0   CPU1 CPU2 CPU3
50:3359337  0  0  0 PCI-MSI  eth0
58: 493396136  0  0 PCI-MSI  eth1

As far as I understand, CPU0 and CPU1 belong to the first physical CPU, so
this means that the second physical CPU (which has CPU2 and CPU3) does not
handle interrupts for the arriving packets. Can I somehow change this so that
the second CPU also handles network interrupts for packets received on the NIC?

Can I assign one CPU to eth0 and the second CPU to eth1?

Regards,
Mark


Re: watchdog timeout panic in e1000 driver

2007-01-15 Thread Kenzo Iwami
Hi,

During the holiday season, I posted a patch that fixes this problem without
using spinlocks or disabling interrupts.
  http://marc.theaimsgroup.com/?l=linux-netdev&m=116649413613845&w=2

With this patch applied, I confirmed that the system doesn't panic.
I think this patch can fix this problem.
Does this patch have any problems?

I welcome any comments.

--
  Kenzo Iwami ([EMAIL PROTECTED])


[PATCH 1/2] [IrDA] irda-usb TX path optimization (was Re: IrDA spams logfiles - since 2.6.19)

2007-01-15 Thread Samuel Ortiz
Hi Dave,

Since we stopped using dev_alloc_skb on the IrDA TX frame, we constantly run
into the case of the skb headroom being 0, and thus we call skb_cow for
every IrDA TX frame.
This patch uses a local buffer and memcpys the skb to it, saving us a
kmalloc for each of those IrDA TX frames.

Signed-off-by: Samuel Ortiz [EMAIL PROTECTED]

---
 drivers/net/irda/irda-usb.c |   43 ---
 drivers/net/irda/irda-usb.h |1 +
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/drivers/net/irda/irda-usb.c b/drivers/net/irda/irda-usb.c
index 3ca1082..8381c04 100644
--- a/drivers/net/irda/irda-usb.c
+++ b/drivers/net/irda/irda-usb.c
@@ -441,25 +441,13 @@ static int irda_usb_hard_xmit(struct sk_buff *skb, struct net_device *netdev)
 		goto drop;
 	}
 
-	/* Make sure there is room for IrDA-USB header. The actual
-	 * allocation will be done lower in skb_push().
-	 * Also, we don't use directly skb_cow(), because it require
-	 * headroom >= 16, which force unnecessary copies - Jean II */
-	if (skb_headroom(skb) < self->header_length) {
-		IRDA_DEBUG(0, "%s(), Insuficient skb headroom.\n", __FUNCTION__);
-		if (skb_cow(skb, self->header_length)) {
-			IRDA_WARNING("%s(), failed skb_cow() !!!\n", __FUNCTION__);
-			goto drop;
-		}
-	}
+	memcpy(self->tx_buff + self->header_length, skb->data, skb->len);
 
 	/* Change setting for next frame */
-
 	if (self->capability & IUC_STIR421X) {
 		__u8 turnaround_time;
-		__u8* frame;
+		__u8* frame = self->tx_buff;
 		turnaround_time = get_turnaround_time( skb );
-		frame= skb_push(skb, self->header_length);
 		irda_usb_build_header(self, frame, 0);
 		frame[2] = turnaround_time;
 		if ((skb->len != 0) &&
@@ -472,17 +460,17 @@ static int irda_usb_hard_xmit(struct sk_buff *skb, struct net_device *netdev)
 			frame[1] = 0;
 		}
 	} else {
-		irda_usb_build_header(self, skb_push(skb, self->header_length), 0);
+		irda_usb_build_header(self, self->tx_buff, 0);
 	}
 
 	/* FIXME: Make macro out of this one */
 	((struct irda_skb_cb *)skb->cb)->context = self;
 
-	usb_fill_bulk_urb(urb, self->usbdev, 
+	usb_fill_bulk_urb(urb, self->usbdev,
 		      usb_sndbulkpipe(self->usbdev, self->bulk_out_ep),
-		      skb->data, IRDA_SKB_MAX_MTU,
+		      self->tx_buff, skb->len + self->header_length,
 		      write_bulk_callback, skb);
-	urb->transfer_buffer_length = skb->len;
+
 	/* This flag (URB_ZERO_PACKET) indicates that what we send is not
 	 * a continuous stream of data but separate packets.
 	 * In this case, the USB layer will insert an empty USB frame (TD)
@@ -1455,6 +1443,9 @@ static inline void irda_usb_close(struct irda_usb_cb *self)
 	/* Remove the speed buffer */
 	kfree(self->speed_buff);
 	self->speed_buff = NULL;
+
+	kfree(self->tx_buff);
+	self->tx_buff = NULL;
 }
 
 /** USB CONFIG SUBROUTINES **/
@@ -1753,9 +1744,14 @@ static int irda_usb_probe(struct usb_interface *intf,
 
 	memset(self->speed_buff, 0, IRDA_USB_SPEED_MTU);
 
+	self->tx_buff = kzalloc(IRDA_SKB_MAX_MTU + self->header_length,
+				GFP_KERNEL);
+	if (self->tx_buff == NULL)
+		goto err_out_4;
+
 	ret = irda_usb_open(self);
 	if (ret) 
-		goto err_out_4;
+		goto err_out_5;
 
 	IRDA_MESSAGE("IrDA: Registered device %s\n", net->name);
 	usb_set_intfdata(intf, self);
@@ -1766,14 +1762,14 @@ static int irda_usb_probe(struct usb_interface *intf,
 		self->needspatch = (ret < 0);
 		if (self->needspatch) {
 			IRDA_ERROR("STIR421X: Couldn't upload patch\n");
-			goto err_out_5;
+			goto err_out_6;
 		}
 
 		/* replace IrDA class descriptor with what patched device is now reporting */
 		irda_desc = irda_usb_find_class_desc (self->usbintf);
 		if (irda_desc == NULL) {
 			ret = -ENODEV;
-			goto err_out_5;
+			goto err_out_6;
 		}
 		if (self->irda_desc)
 			kfree (self->irda_desc);
@@ -1782,9 +1778,10 @@ static int irda_usb_probe(struct usb_interface *intf,
 	}
 
 	return 0;
-
-err_out_5:
+err_out_6:
 	unregister_netdev(self->netdev);
+err_out_5:
+	kfree(self->tx_buff);
 err_out_4:
 	kfree(self->speed_buff);
 err_out_3:
diff --git a/drivers/net/irda/irda-usb.h b/drivers/net/irda/irda-usb.h
index 6b2271f..e846c38 100644
--- a/drivers/net/irda/irda-usb.h
+++ 

[PATCH 2/2] [IrDA] Removed incorrect IRDA_ASSERT()

2007-01-15 Thread Samuel Ortiz
With USB2.0 bulk out MTU can be 512 bytes, so checking it only for 64 bytes is
incorrect.

Signed-off-by: Samuel Ortiz [EMAIL PROTECTED]

---
 drivers/net/irda/irda-usb.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/irda/irda-usb.c b/drivers/net/irda/irda-usb.c
index 8381c04..a66aacf 100644
--- a/drivers/net/irda/irda-usb.c
+++ b/drivers/net/irda/irda-usb.c
@@ -1515,8 +1515,6 @@ static inline int irda_usb_parse_endpoints(struct irda_usb_cb *self, struct usb_
 
 	IRDA_DEBUG(0, "%s(), And our endpoints are : in=%02X, out=%02X (%d), int=%02X\n",
 		__FUNCTION__, self->bulk_in_ep, self->bulk_out_ep, self->bulk_out_mtu, self->bulk_int_ep);
-	/* Should be 8, 16, 32 or 64 bytes */
-	IRDA_ASSERT(self->bulk_out_mtu == 64, ;);
 
 	return((self->bulk_in_ep != 0) && (self->bulk_out_ep != 0));
 }
-- 
1.4.4.4



Re: rare bad TCP checksum with 2.6.19?

2007-01-15 Thread Herbert Xu
Michael Tokarev [EMAIL PROTECTED] wrote:

 Note there's no funny/interesting hardware involved, like network cards with
 tcp checksumming offload capabilities (this is plain dumb 8139 card).

The 8139 card might be dumb, but the driver isn't :) It emulates
checksum offload in software, meaning that tcpdump will show bogus
checksums.

So please disable hardware checksum offload with ethtool -K and
then try again.
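
For anyone wanting to check a "bad tcp cksum" report by hand, here is a
minimal sketch of the 16-bit one's-complement Internet checksum described in
RFC 1071, which is the arithmetic TCP uses. The function name is made up for
illustration. A real TCP check must also feed in the pseudo-header (addresses,
protocol, length) ahead of the segment; that part is left out here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum of 16-bit big-endian words, folded to 16 bits and
 * inverted. The 32-bit accumulator is sufficient for buffers well below
 * 128 KiB, which covers any single IP packet.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)                    /* pad a trailing odd byte with zero */
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)               /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Running the function over data that already contains a correct checksum field
yields 0, which is how a receiver (or tcpdump) decides whether the checksum is
good.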

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Two Dual Core processors and NICS (not handling interrupts on one CPU / assigning a CPU to a NIC)

2007-01-15 Thread Robert Iakobashvili

Hi Mark,

On 1/15/07, Mark Ryden [EMAIL PROTECTED] wrote:

I have a machine with 2 dual core CPUs. This machine runs Fedora Core 6.
I have two Intel e1000 GigaBit network cards on this machine; I use bonding so
that the machine assigns the same IP address to both NICs ;



cat /proc/interrupts shows something like this:
CPU0   CPU1 CPU2 CPU3
 50:3359337  0  0  0 PCI-MSI  eth0
 58: 493396136  0  0 PCI-MSI  eth1

 As far as I understand, CPU0 and CPU1 belong to the first physical CPU, so
 this means that the second physical CPU (which has CPU2 and CPU3) does not
 handle interrupts for the arriving packets. Can I somehow change this so that
 the second CPU also handles network interrupts for packets received on the NIC?

 Can I assign one CPU to eth0 and the second CPU to eth1?


How will it help you?
You can set the smp_affinity mask for each IRQ in /proc/irq/<irq-number>/;
google for 'linux smp_affinity'.

The subject is discussed in more detail at:
http://linux-net.osdl.org/index.php/TODO#TCP
and thread
http://marc.theaimsgroup.com/?t=11669529061r=1w=2, read from
bottom.
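
A minimal sketch of setting such a mask from C follows. The helper names are
hypothetical; the only interface assumed is the /proc/irq/<irq-number>/smp_affinity
file, which takes a hexadecimal CPU bitmask (bit N set means CPU N may service
the IRQ). Writing it requires root, and the sketch handles only the first 32
CPUs.

```c
#include <stdio.h>

/* Format the hex bitmask that selects a single CPU (bit N = CPU N). */
static int affinity_mask(unsigned cpu, char *buf, size_t len)
{
    if (cpu >= 32)
        return -1;              /* sketch covers one 32-bit word only */
    return snprintf(buf, len, "%x", 1u << cpu);
}

/* Build the procfs path of an IRQ's affinity mask. */
static int affinity_path(unsigned irq, char *buf, size_t len)
{
    return snprintf(buf, len, "/proc/irq/%u/smp_affinity", irq);
}

/* Write the mask for one IRQ. Returns 0 on success, -1 on failure. */
static int set_irq_cpu(unsigned irq, unsigned cpu)
{
    char path[64], mask[16];
    FILE *f;
    int ok;

    if (affinity_mask(cpu, mask, sizeof mask) < 0)
        return -1;
    affinity_path(irq, path, sizeof path);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%s\n", mask) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

With the IRQ numbers from the quoted /proc/interrupts output, set_irq_cpu(50, 0)
and set_irq_cpu(58, 2) would direct eth0 to CPU0 and eth1 to CPU2. A running
irqbalance daemon, if present, may rewrite these masks.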


--
Sincerely,
Robert Iakobashvili,
coroberti %x40 gmail %x2e com
...
Navigare necesse est, vivere non est necesse
...
http://sourceforge.net/projects/curl-loader
A powerful open-source HTTP/S, FTP/S traffic
generating, loading and testing tool.


Re: [IPROUTE 04/05]: Replace usec by time in function names

2007-01-15 Thread Jarek Poplawski
On 10-01-2007 11:01, Patrick McHardy wrote:
 [IPROUTE]: Replace usec by time in function names
 
 Rename functions containing usec since they don't necessarily return
 usec units anymore.
 
 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
 
 ---
...
 diff --git a/tc/q_cbq.c b/tc/q_cbq.c
 index a56..913b26a 100644
 --- a/tc/q_cbq.c
 +++ b/tc/q_cbq.c
 @@ -500,17 +500,17 @@ static int cbq_print_opt(struct qdisc_ut
   if (lss && show_details) {
   fprintf(f, "\nlevel %u ewma %u avpkt %ub ", lss->level, lss->ewma_log, lss->avpkt);
   if (lss->maxidle) {
 - fprintf(f, "maxidle %luus ", tc_core_tick2usec(lss->maxidle>>lss->ewma_log));
 + fprintf(f, "maxidle %luus ", tc_core_tick2time(lss->maxidle>>lss->ewma_log));

If the value is not necessarily in usec, the %luus format could be misleading
here and later.

...
 diff --git a/tc/q_netem.c b/tc/q_netem.c
 index cfd1799..24fb95e 100644
 --- a/tc/q_netem.c
 +++ b/tc/q_netem.c
 @@ -108,15 +108,15 @@ static int get_ticks(__u32 *ticks, const
  {
   unsigned t;
  
 - if(get_usecs(&t, str))
 + if(get_time(&t, str))
   return -1;
  
 - if (tc_core_usec2big(t)) {
 + if (tc_core_time2big(t)) {
   fprintf(stderr, "Illegal %d usecs (too large)\n", t);

As above, but for the "usecs" in this message.

...
 diff --git a/tc/tc_core.c b/tc/tc_core.c
 index 07dc4ba..e27254e 100644
 --- a/tc/tc_core.c
 +++ b/tc/tc_core.c
 @@ -27,21 +27,21 @@ static __u32 t2us=1;
  static __u32 us2t=1;
  static double tick_in_usec = 1;
  
 -int tc_core_usec2big(long usec)
 +int tc_core_time2big(long time)
  {
 - __u64 t = usec;
 + __u64 t = time;
  
   t *= tick_in_usec;
   return (t >> 32) != 0;
  }
  
  
 -long tc_core_usec2tick(long usec)
 +long tc_core_time2tick(long time)
  {
 - return usec*tick_in_usec;
 + return time*tick_in_usec;
  }
  
 -long tc_core_tick2usec(long tick)
 +long tc_core_tick2time(long tick)
  {
   return tick/tick_in_usec;
  }

Similarly (tick_in_time)?
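
To illustrate the naming point, a unit-neutral version of these helpers could
look like the sketch below. The names (ticks_per_unit and so on) are made up
here and are not what iproute2 uses; the logic mirrors the quoted functions,
with an added guard for negative input.

```c
#include <stdint.h>

/* Hypothetical scale factor: ticks per time unit, whatever the unit is. */
static double ticks_per_unit = 1.0;

/* Nonzero if the tick count for this time would not fit in 32 bits. */
static int time_too_big(long time)
{
    uint64_t ticks;

    if (time < 0)
        return 1;               /* negative durations are rejected */
    ticks = (uint64_t)((double)time * ticks_per_unit);
    return (ticks >> 32) != 0;
}

/* Convert time units to ticks. */
static long time_to_tick(long time)
{
    return (long)(time * ticks_per_unit);
}

/* Convert ticks back to time units. */
static long tick_to_time(long tick)
{
    return (long)(tick / ticks_per_unit);
}
```

Nothing in these names ties the value to microseconds, so messages and format
strings that print the result would need their "us"/"usecs" suffixes reviewed
separately.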

Regards,
Jarek P.


Re: [PATCH] Remove CONFIG_NET_WIRELESS

2007-01-15 Thread Johannes Berg
On Sat, 2007-01-13 at 18:17 +0100, Maarten Lankhorst wrote:
 Remove CONFIG_NET_WIRELESS
 Nothing uses this, and it breaks the kernel build if a wireless device is 
 used with an unsupported type of bus.
 Verified this with a grep.

I don't really care about the symbol and I'm in favour of removing it if
it is useless, but I don't understand the rationale. How does enabling
this cause anything to fail?

johannes


signature.asc
Description: This is a digitally signed message part


Re: [IPROUTE 02/05]: Introduce tc_calc_xmitsize and use where appropriate

2007-01-15 Thread Jarek Poplawski
On 10-01-2007 11:01, Patrick McHardy wrote:
 [IPROUTE]: Introduce tc_calc_xmitsize and use where appropriate
 
 Add tc_calc_xmitsize() as complement to tc_calc_xmittime(), which calculates
 the size that can be transmitted at a given rate during a given time.
 
 Replace all expressions of the form size = 
 rate*tc_core_tick2usec(time))/100
 by tc_calc_xmitsize() calls.
 
 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
 
 ---
...
 +unsigned tc_calc_xmitsize(unsigned rate, unsigned ticks)
 +{
 + return ((double)rate*tc_core_tick2usec(ticks))/100;
 +}
 +

Actually, besides replacing the expression, this function
also changes its type to unsigned.

Regards,
Jarek P. 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Remove CONFIG_NET_WIRELESS

2007-01-15 Thread Maarten Lankhorst
Johannes Berg wrote:
 On Sat, 2007-01-13 at 18:17 +0100, Maarten Lankhorst wrote:
   
 Remove CONFIG_NET_WIRELESS
 Nothing uses this, and it breaks the kernel build if a wireless device is 
 used with an unsupported type of bus.
 Verified this with a grep.
 

 I don't really care about the symbol and I'm in favour of removing it if
 it is useless, but I don't understand the rationale. How does enabling
 this cause anything to fail?

 johannes
   
Enabling this doesn't cause anything to fail, but my wireless router
doesn't have a pci bus, but instead a native SSB, so CONFIG_NET_WIRELESS
isn't selected. This in turn causes wext-common.o to not be built, so I
get missing symbols and a build breakage. That's why I made
wext-common.o depend on CONFIG_WIRELESS_EXT instead of
CONFIG_NET_WIRELESS. Since nothing else uses CONFIG_NET_WIRELESS I
decided to kill that symbol.

maarten


Re: [PATCH] Remove CONFIG_NET_WIRELESS

2007-01-15 Thread Johannes Berg
On Mon, 2007-01-15 at 13:55 +0100, Maarten Lankhorst wrote:
 Johannes Berg wrote:
  On Sat, 2007-01-13 at 18:17 +0100, Maarten Lankhorst wrote:

  Remove CONFIG_NET_WIRELESS
  Nothing uses this, and it breaks the kernel build if a wireless device is 
  used with an unsupported type of bus.
  Verified this with a grep.
  
 
  I don't really care about the symbol and I'm in favour of removing it if
  it is useless, but I don't understand the rationale. How does enabling
  this cause anything to fail?
 
  johannes

 Enabling this doesn't cause anything to fail, but my wireless router
 doesn't have a pci bus, but instead a native SSB, so CONFIG_NET_WIRELESS
 isn't selected. This in turn causes wext-common.o to not be built, so I
 get missing symbols and a build breakage. That's why I made
 wext-common.o depend on CONFIG_WIRELESS_EXT instead of
 CONFIG_NET_WIRELESS. Since nothing else uses CONFIG_NET_WIRELESS I
 decided to kill that symbol.

Ok, that makes sense to me. Let's put this in but with this better
description rather than the original one.

johannes




Re: rare bad TCP checksum with 2.6.19?

2007-01-15 Thread Michael Tokarev
Herbert Xu wrote:
 Michael Tokarev [EMAIL PROTECTED] wrote:
 Note there's no funny/interesting hardware involved, like network cards with
 tcp checksumming offload capabilities (this is plain dumb 8139 card).
 
 The 8139 card might be dumb, but the driver isn't :) It emulates
 checksum offload in software, meaning that tcpdump will show bogus
 checksums.
 
 So please disable hardware checksum offload with ethtool -K and
 then try again.

# ethtool -k eth0
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
Cannot get device tcp segmentation offload settings: Operation not supported
no offload info available

# ethtool -K eth0 rx off tx off tso off
Cannot set device rx csum settings: Operation not supported

So I guess the problem is not related to hw checksumming offloading.

Meanwhile, I tried many times to reproduce the problem - with little
success.  With different sizings, options, et al - I can't force the
sending side to send some data within a FIN packet.  I.e, most of the
time, the thing just works, because no data goes with FIN packet.
But once every 50..100 tries, I see single FIN-with-data packet, and
that one ALWAYS has bad checksum.

I was never able to reproduce the problem on a LAN, only when going from
a distant host.  And even with that distant host, it's very difficult to
reproduce.

At least one network (also distant) triggers this problem on every 2nd
try or so (the one I experimented with yesterday).  But I've no access
to that network - I kindly asked for help yesterday, but I can't abuse
their willingness to help more.

And another thing I noticed.  Right now I'm experimenting with another
machine, running 2.6.17(.13) - it also shows similar behavior with bad
csums, but MUCH rarer than this 2.6.19.  Like this:

16:29:32.490976 IP (tos 0x60, ttl  48, id 14110, offset 0, flags [DF], length: 80)
 69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum f4b4 (->c1cc)!] ack 93407 win 9821
 <nop,nop,timestamp 1046528199 5497679,nop,nop,sack sack 3 {104991:109335}{110783:112231}{104991:109335} >
16:29:32.525988 IP (tos 0x60, ttl  48, id 14112, offset 0, flags [DF], length: 80)
 69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum 3fb1 (->1819)!] ack 93407 win 9821
 <nop,nop,timestamp 1046528202 5497679,nop,nop,sack sack 3 {110783:113679}{122367:123815}{110783:113679} >
16:29:32.561407 IP (tos 0x60, ttl  48, id 14116, offset 0, flags [DF], length: 80)
 69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum 87c0 (->2610)!] ack 93407 win 9821
 <nop,nop,timestamp 1046528205 5497679,nop,nop,sack sack 3 {122367:127103}{128551:129572}{122367:127103} >

Here, 69.42.67.34 is 2.6.17 from which I'm requesting data, and
81.13.94.6 is the sender.  This behavior so far is demonstrated with
sack packets only, but I've seen it in other direction too (also with
sack), at least once.

Any idea how to force sending FIN-with-data?

Thanks!

/mjt


Re: [PATCH] Remove CONFIG_NET_WIRELESS

2007-01-15 Thread Jiri Benc
On Mon, 15 Jan 2007 13:31:06 +, Johannes Berg wrote:
 On Mon, 2007-01-15 at 13:55 +0100, Maarten Lankhorst wrote:
  Enabling this doesn't cause anything to fail, but my wireless router
  doesn't have a pci bus, but instead a native SSB, so CONFIG_NET_WIRELESS
  isn't selected. This in turn causes wext-common.o to not be built, so I
  get missing symbols and a build breakage. That's why I made
  wext-common.o depend on CONFIG_WIRELESS_EXT instead of
  CONFIG_NET_WIRELESS. Since nothing else uses CONFIG_NET_WIRELESS I
  decided to kill that symbol.
 
 Ok, that makes sense to me. Let's put this in but with this better
 description rather than the original one.

The original mail with patch apparently didn't get to netdev (I haven't
received it and it's not in netdev archive). Maarten, could you resend
it please?

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs


network failures w. r8169 (RTL8111/RTL8168B)

2007-01-15 Thread Jens Stroebel

Hello.

I am trying to get a RTL8111 (RealTek ethernet controller) running w.
the r8169 kernel module. I am using kernel 2.6.19.2 on a
LinuxFromScratch system; the motherboard on which said RTL8111 sits is
an Asus P5B.

lspci says (regarding the ethernet chip):
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)

During the use of network connections, we experience network transfer
stops during which a transfer seems to stall completely for many
seconds, after which the transfer runs as if nothing happened.

This is reproducible every time w.
 svn co http://svnserver/svn/tree
   (hangs VERY LONG)

and w. LWP::Parallel::UserAgent. With the latter, I fired 100 runs of
100 requests, with 7 clients making parallel requests.
Of these 100 runs, at least one, sometimes 2, stall for about 90 secs,
after which the run continues and ends successfully, although a time
of more than 90 secs for 100 requests can't really be called successful.

Both the subversion checkout and the performance testing via
LWP::Parallel::UserAgent run as expected (without stalling anywhere)
on our other machines which do not have RTL8111.
They also run as expected with kernels 2.6.18.x and
the realtek driver r1000.

With kernel 2.6.19.x, the r1000 driver is unusable as it has enormous
packet loss (used version: r1000_v1.05.tgz).
The r8169 SEEMS to have no packet loss (ping, ping -f) but above
mentioned phenomenon seems to indicate otherwise.

Has someone experienced similar effects w. r8169.ko and RTL8111?
(I searched the archives but didn't quite find anything like this)

I also tried kernel 2.6.20-rc5 to see if the problem eventually went
away, but unfortunately the scenario remains the same.

Greets,
  Jens
-- 
[EMAIL PROTECTED]
23.56...drifting 


Re: [IPROUTE 02/05]: Introduce tc_calc_xmitsize and use where appropriate

2007-01-15 Thread Patrick McHardy
Jarek Poplawski wrote:
 On 10-01-2007 11:01, Patrick McHardy wrote:
 
[IPROUTE]: Introduce tc_calc_xmitsize and use where appropriate

Add tc_calc_xmitsize() as complement to tc_calc_xmittime(), which calculates
the size that can be transmitted at a given rate during a given time.

Replace all expressions of the form size = 
rate*tc_core_tick2usec(time))/100
by tc_calc_xmitsize() calls.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
 
 ...
 
+unsigned tc_calc_xmitsize(unsigned rate, unsigned ticks)
+{
+ return ((double)rate*tc_core_tick2usec(ticks))/100;
+}
+
 
 
 Actually, besides replacing the expression, this function
 also changes its type to unsigned.


It doesn't change it, all expressions I replaced were directly
assigned to an unsigned int variable.
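
The conversion in question can be seen in a small stand-alone sketch. The
function name is made up, and the sketch assumes a time given in microseconds
and a rate in bytes per second, hence the division by 1,000,000. The product
is computed in double and truncated when it is converted to unsigned, whether
that happens at the return statement or at an assignment in the caller.

```c
/*
 * Bytes that can be sent at rate_bytes_per_sec during usec microseconds.
 * The double result is truncated toward zero on conversion to unsigned.
 */
static unsigned xmit_size(unsigned rate_bytes_per_sec, unsigned usec)
{
    return (unsigned)(((double)rate_bytes_per_sec * usec) / 1000000.0);
}
```

For example, 125000 bytes/s over 8000 us gives 1000 bytes, while 1250000
bytes/s over 1 us gives 1.25, which becomes 1 after truncation.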



3c59x.c patch to 2.6.18 fixing Wake on Lan (WOL)

2007-01-15 Thread Harry Coin

The 3c59x.c in kernel 2.6.18 (and as I see later ones too) attempts
to enable PME from the D0 state. The PME config space on Dell Optiplexs
for this chip has a zero in the capabilities as it doesn't 'wake from d0'.

So the pci_enable_wake call fails; its result is not tested, so no error is reported.

The patch changes the wake request state from 0 (D0) to PCI_D3hot.  This fix
causes wake on lan (WOL) to work properly on older Dell Optiplex models.

Harry Coin
Bettendorf, Iowa


--- drivers-orig/3c59x.c2007-01-15 00:03:52.0 -0600
+++ drivers-fixed/3c59x.c   2007-01-15 00:46:37.0 -0600
@@ -3090,8 +3090,8 @@
 /* Set Wake-On-LAN mode and put the board into D3 (power-down) state. */
 static void acpi_set_WOL(struct net_device *dev)
 {
-   struct vortex_private *vp = netdev_priv(dev);
-   void __iomem *ioaddr = vp->ioaddr;
+  struct vortex_private *vp = netdev_priv(dev);
+  void __iomem *ioaddr = vp->ioaddr;

if (vp->enable_wol) {
/* Power up on: 1==Downloaded Filter, 2==Magic Packets, 
4==Link Status. */

@@ -3101,7 +3101,7 @@
iowrite16(SetRxFilter|RxStation|RxMulticast|RxBroadcast, 
ioaddr + EL3_CMD);

iowrite16(RxEnable, ioaddr + EL3_CMD);

-   pci_enable_wake(VORTEX_PCI(vp), 0, 1);
+   pci_enable_wake(VORTEX_PCI(vp),PCI_D3hot,1);

/* Change the power state to D3; RxEnable doesn't take 
effect. */

pci_set_power_state(VORTEX_PCI(vp), PCI_D3hot);




3c59x.c patch to 2.6.18 fixing Wake on Lan (WOL)

2007-01-15 Thread Harry Coin

Hello all.

The 3c59x.c in kernel 2.6.18 (and as I see later ones too) attempts
to enable PME from the already awake D0 state. The PME config space on Dell
Optiplexs for this chip has a zero in the capabilities for this bit: no
'wake from d0'.

The pci_enable_wake in 2.6.18 tests the capabilities before enabling PME,
so the pci_enable_wake call fails; its result is not tested, so no error is reported.

The patch changes the wake request state from 0 (D0) to PCI_D3hot.  This fix
causes wake on lan (WOL) to work properly on older Dell Optiplex models.
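
The capability test can be pictured with the sketch below. It is not the
kernel's pci_enable_wake(); the names are made up for illustration. It relies
only on the layout of the PME_Support field in the PCI Power Management
Capabilities register, where bits 11 to 15 indicate whether PME# can be
asserted from D0, D1, D2, D3hot and D3cold respectively.

```c
#include <stdint.h>

/* Lowest bit of the PME_Support field in the PM Capabilities register. */
#define PM_CAP_PME_SHIFT 11

/* Power states in the order used by the PME_Support bits. */
enum pm_state { PM_D0 = 0, PM_D1, PM_D2, PM_D3HOT, PM_D3COLD };

/* Nonzero if the device advertises PME# generation from the given state. */
static int pme_supported_from(uint16_t pmc, enum pm_state state)
{
    return (pmc >> (PM_CAP_PME_SHIFT + state)) & 1;
}
```

A device whose PME_Support field advertises only D3hot and D3cold (0xC000)
fails the check for D0 but passes it for D3hot, which matches the behaviour
described above.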

Kindly overlook newbie mistakes.  Thank you.

Harry Coin
Bettendorf, Iowa


--- drivers-orig/3c59x.c2007-01-15 00:03:52.0 -0600
+++ drivers-fixed/3c59x.c   2007-01-15 00:46:37.0 -0600
@@ -3090,8 +3090,8 @@
 /* Set Wake-On-LAN mode and put the board into D3 (power-down) state. */
 static void acpi_set_WOL(struct net_device *dev)
 {
-   struct vortex_private *vp = netdev_priv(dev);
-   void __iomem *ioaddr = vp->ioaddr;
+  struct vortex_private *vp = netdev_priv(dev);
+  void __iomem *ioaddr = vp->ioaddr;

if (vp->enable_wol) {
/* Power up on: 1==Downloaded Filter, 2==Magic Packets, 
4==Link Status. */

@@ -3101,7 +3101,7 @@
iowrite16(SetRxFilter|RxStation|RxMulticast|RxBroadcast, 
ioaddr + EL3_CMD);

iowrite16(RxEnable, ioaddr + EL3_CMD);

-   pci_enable_wake(VORTEX_PCI(vp), 0, 1);
+   pci_enable_wake(VORTEX_PCI(vp),PCI_D3hot,1);

/* Change the power state to D3; RxEnable doesn't take 
effect. */

pci_set_power_state(VORTEX_PCI(vp), PCI_D3hot);




Re: network failures w. r8169 (RTL8111/RTL8168B)

2007-01-15 Thread Jens Stroebel
On Mon, Jan 15, 2007 at 03:38:57PM +0100, Jens Stroebel wrote:

 During the use of network connections, we experience network transfer
 stops during which a transfer seems to stall completely for many
 seconds, after which the transfer runs as if nothing happened.
 

Addition:

Trying to debug the scenario a little, I used tcpdump to maybe find out
what/where things get lost.

This didn't work, as running tcpdump on !either server or client! made
the symptom go away (..?)

Greets,
  Jens
-- 
[EMAIL PROTECTED]
23.56...drifting 


Re: watchdog timeout panic in e1000 driver

2007-01-15 Thread Auke Kok

Kenzo Iwami wrote:

With this patch applied, I confirmed that the system doesn't panic.
I think this patch can fix this problem.
Does this patch have problems.


Kenzo,

thanks for staying patient while most of us were out or busy. Apart from acknowledging 
that you might have fixed a problem with your patch, we're very reluctant to merge such 
a huge change in our driver that touches many more cases than the one that seems to be 
giving you problems.


I've thought up a much more elegant solution that prevents the driver from asserting the 
swfw semaphore during normal operations by checking the MAC's LU (link up) status bit in 
the watchdog. This allows the watchdog task to bypass all PHY checking when the link 
status is OK, and thus removes the big problem that you are seeing.


Attached is a version that should apply against most current trees. Please give it a try 
and let us know if this also fixes the problem for you. I will most likely push this 
patch to the netdev tree in any case.


Cheers,

Auke
---

From: Auke Kok [EMAIL PROTECTED]

e1000: Don't do PHY reads in watchdog unless link status is down

The watchdog runs code that every 2 seconds performs several PHY reads
that are locked with the swfw semaphore, causing the semaphore to be
unavailable for a short time. This is completely unneeded in case the
MAC detects PHY link up (LU).

Signed-off-by: Auke Kok [EMAIL PROTECTED]

---
 drivers/net/e1000/e1000_main.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 34d8e5d..9660925 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2556,6 +2556,10 @@ e1000_watchdog(unsigned long data)
 	uint32_t link, tctl;
 	int32_t ret_val;
 
+	if ((netif_carrier_ok(netdev)) 
+	(E1000_READ_REG(adapter-hw, STATUS)  E1000_STATUS_LU))
+		goto link_up;
+
 	ret_val = e1000_check_for_link(adapter-hw);
 	if ((ret_val == E1000_ERR_PHY) 
 	(adapter-hw.phy_type == e1000_phy_igp_3) 
@@ -2684,6 +2688,7 @@ e1000_watchdog(unsigned long data)
 		e1000_smartspeed(adapter);
 	}
 
+link_up:
 	e1000_update_stats(adapter);
 
 	adapter->hw.tx_packet_delta = adapter->stats.tpt - adapter->tpt_old;


[PATCH]: 8139cp: Don't blindly enable interrupts in cp_start_xmit

2007-01-15 Thread Chris Lalancette
(trying again, this time to the correct maintainer)

All,
 Similar to this commit:

http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d15e9c4d9a75702b30e00cdf95c71c88e3f3f51e

It's not safe in cp_start_xmit to blindly call spin_lock_irq and then 
spin_unlock_irq, since it may very well be the case that cp_start_xmit was 
called with interrupts already disabled (I came across this bug in the context 
of netdump in RedHat kernels, but the same issue holds, for example, in 
netconsole).  Therefore, replace all instances of spin_lock_irq and 
spin_unlock_irq with spin_lock_irqsave and spin_unlock_irqrestore, 
respectively, in cp_start_xmit().  I tested this against a fully-virtualized 
Xen guest, which happens to use the 8139cp driver to talk to the emulated 
hardware.  I don't have a real piece of 8139cp hardware to test on, so someone 
else will have to do that.

Signed-off-by: Chris Lalancette [EMAIL PROTECTED]
diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c
index e2cb19b..6f93a76 100644
--- a/drivers/net/8139cp.c
+++ b/drivers/net/8139cp.c
@@ -765,17 +765,18 @@ static int cp_start_xmit (struct sk_buff *skb, struct net_device *dev)
 	struct cp_private *cp = netdev_priv(dev);
 	unsigned entry;
 	u32 eor, flags;
+	unsigned long intr_flags;
 #if CP_VLAN_TAG_USED
 	u32 vlan_tag = 0;
 #endif
 	int mss = 0;
 
-	spin_lock_irq(&cp->lock);
+	spin_lock_irqsave(&cp->lock, intr_flags);
 
 	/* This is a hard error, log it. */
 	if (TX_BUFFS_AVAIL(cp) <= (skb_shinfo(skb)->nr_frags + 1)) {
 		netif_stop_queue(dev);
-		spin_unlock_irq(&cp->lock);
+		spin_unlock_irqrestore(&cp->lock, intr_flags);
 		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
 		       dev->name);
 		return 1;
@@ -908,7 +909,7 @@ static int cp_start_xmit (struct sk_buff *skb, struct net_device *dev)
 	if (TX_BUFFS_AVAIL(cp) <= (MAX_SKB_FRAGS + 1))
 		netif_stop_queue(dev);
 
-	spin_unlock_irq(&cp->lock);
+	spin_unlock_irqrestore(&cp->lock, intr_flags);
 
 	cpw8(TxPoll, NormalTxPoll);
 	dev->trans_start = jiffies;
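
The locking problem described in this message can be illustrated with a small
userspace model. This is only a sketch: irqs_enabled and the model_* helpers
are hypothetical stand-ins for the CPU interrupt flag and the kernel's
spin_lock_irq / spin_lock_irqsave families, not the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

/* Models spin_lock_irq()/spin_unlock_irq(): unlock always re-enables. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Models spin_lock_irqsave()/spin_unlock_irqrestore(): unlock restores
 * whatever state was recorded when the lock was taken. */
static void model_lock_irqsave(bool *saved)
{
	*saved = irqs_enabled;
	irqs_enabled = false;
}

static void model_unlock_irqrestore(bool saved)
{
	irqs_enabled = saved;
}

/* A transmit routine entered with interrupts already disabled (as a
 * netconsole-style caller might do). Returns the interrupt state the
 * caller sees afterwards. */
static bool xmit_with_irq_pair(void)
{
	irqs_enabled = false;		/* caller disabled interrupts */
	model_lock_irq();
	model_unlock_irq();		/* blindly re-enables them */
	return irqs_enabled;
}

static bool xmit_with_irqsave_pair(void)
{
	bool saved;

	irqs_enabled = false;		/* caller disabled interrupts */
	model_lock_irqsave(&saved);
	model_unlock_irqrestore(saved);	/* puts back "disabled" */
	return irqs_enabled;
}
```

In this model xmit_with_irq_pair() hands control back with interrupts enabled
even though the caller had disabled them, while xmit_with_irqsave_pair()
preserves the caller's state.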




[PATCH] ixgb: Don't stop queue unnecessarily

2007-01-15 Thread Auke-Jan H Kok
From: Auke Kok [EMAIL PROTECTED]

ixgb: Don't stop queue unnecessarily

We don't need to stop twice in ixgb_xmit_frame.

Signed-off-by: Auke Kok [EMAIL PROTECTED]

diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
index 51bd7e8..83f4d67 100644
--- a/drivers/net/ixgb/ixgb_main.c
+++ b/drivers/net/ixgb/ixgb_main.c
@@ -1473,7 +1473,6 @@ ixgb_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 
 	if (unlikely(ixgb_maybe_stop_tx(netdev, &adapter->tx_ring,
 		     DESC_NEEDED))) {
-		netif_stop_queue(netdev);
 		spin_unlock_irqrestore(&adapter->tx_lock, flags);
 		return NETDEV_TX_BUSY;
 	}
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Two Dual Core processors and NICS (not handling interrupts on one CPU / assigning a CPU to a NIC)

2007-01-15 Thread Auke Kok

Mark Ryden wrote:

Hello,


I have a machine with 2 dual core CPUs. This machine runs Fedora Core 6.
I have two Intel e1000 Gigabit network cards on this machine; I use
bonding so that the machine assigns the same IP address to both NICs.
It seems to me that bonding is configured OK, because when running:
cat /proc/net/bonding/bond0
I get:

Ethernet Channel Bonding Driver: v3.0.3 (March 23, 2006)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: .

Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 

(And the Permanent HW addr is different in these two entries).

I send a large amount of packets to this machine (more than 20,000 in
a second).

cat /proc/interrupts shows something like this:
CPU0   CPU1 CPU2 CPU3
50:3359337  0  0  0 PCI-MSI  eth0
58: 493396136  0  0 PCI-MSI  eth1

CPU0 and CPU1 belong to the first CPU, as far as I understand; so this
means that the second CPU (which has CPU2 and CPU3) does not handle
interrupts for the arriving packets. Can I somehow change it so that the second
CPU will also handle network interrupts for packets received on the NIC?

Can I assign one CPU to  eth0 and the second CPU to eth1  ?


you will most likely get better performance from the shared cache on the Core 2 Duo by 
keeping it the way that it is right now: packets that need to traverse the bridge 
keep the CPUs happy because, after receive, the CPU servicing the sending NIC already has 
the data in its cache. Moving one of the NICs over to CPU2/CPU3 would cause a cascade of 
cache misses for every packet that passes across the two NICs in the bridge.
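
For reference, if pinning is still wanted despite the cache argument above,
the usual Linux mechanism is the per-IRQ bitmask file
/proc/irq/<n>/smp_affinity. The following is a minimal sketch only: the
helper names cpu_mask and set_irq_affinity are made up for illustration, and
the IRQ numbers would be the ones shown in /proc/interrupts (50 and 58 in
the listing quoted above).

```c
#include <stdio.h>

/* Build the CPU bitmask that smp_affinity expects: bit N set means
 * CPU N may service the interrupt. */
static unsigned long cpu_mask(const int *cpus, int count)
{
	unsigned long mask = 0;
	int i;

	for (i = 0; i < count; i++)
		mask |= 1UL << cpus[i];
	return mask;
}

/* Write the mask (in hex) for one IRQ. Returns 0 on success, -1 on
 * failure. Needs root; a running irqbalance daemon may later change
 * the setting again. */
static int set_irq_affinity(int irq, unsigned long mask)
{
	char path[64];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%lx\n", mask) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

With these helpers, moving eth1 (IRQ 58 above) to the second package would
be set_irq_affinity(58, cpu_mask((int[]){2, 3}, 2)), which writes the mask c.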


Cheers,

Auke


sky2: transmit timed out...

2007-01-15 Thread Daniel J Blueman

Stephen,

After some days of uptime, I've been seeing 'transmit timed out'
messages [1]. Let me know if there is any useful debugging you'd like.

--- [1]

sky2 v1.10 addr 0xdfb0 irq 16 Yukon-EC (0xb6) rev 1
sky2 eth1: addr 00:03:2d:05:9c:27
sky2 lan0: enabling interface
sky2 lan0: Link is up at 1000 Mbps, full duplex, flow control both
[snip]
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 464 .. 441 report=466 done=466
sky2 status report lost?
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 466 .. 441 report=466 done=466
sky2 hardware hung? flushing
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 441 .. 418 report=466 done=466
sky2 status report lost?
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 466 .. 443 report=466 done=466
sky2 hardware hung? flushing
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 443 .. 420 report=466 done=466
sky2 status report lost?
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 466 .. 443 report=466 done=466
sky2 hardware hung? flushing
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 443 .. 420 report=466 done=466
sky2 status report lost?
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 466 .. 443 report=466 done=466
sky2 hardware hung? flushing
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 443 .. 420 report=466 done=466
sky2 status report lost?
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 466 .. 443 report=466 done=466
sky2 hardware hung? flushing
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 443 .. 420 report=466 done=466
sky2 status report lost?
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 466 .. 443 report=466 done=466
sky2 hardware hung? flushing
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 443 .. 420 report=466 done=466
sky2 status report lost?
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 466 .. 443 report=466 done=466
sky2 hardware hung? flushing
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 443 .. 420 report=466 done=466
sky2 status report lost?
NETDEV WATCHDOG: lan0: transmit timed out
sky2 lan0: tx timeout
sky2 lan0: transmit ring 466 .. 443 report=466 done=466
sky2 hardware hung? flushing

--
Daniel J Blueman


Re: 3c59x.c patch to 2.6.18 fixing Wake on Lan (WOL)

2007-01-15 Thread Harry Coin

At 11:00 AM 1/15/2007 -0500, Dan Williams wrote:

On Mon, 2007-01-15 at 09:12 -0600, Harry Coin wrote:
 Hello all.

 The 3c59x.c in kernel 2.6.18 (and, as I see, later ones too) attempts
 to enable PME from the already awake D0 state. The PME config space on
 Dell Optiplexes for this chip has a zero in the capabilities for this
 bit: no 'wake from D0'.

 The pci_enable_wake in 2.6.18 tests the capabilities before enabling PME,
 so the pci_enable_wake call fails; its result is not tested, so no error
 is reported.

 The patch changes the wake request from 0 (D0) to D3_hot.  This fix
 causes wake on LAN (WOL) to work properly on older Dell Optiplex models.

 Kindly overlook newbie mistakes.  Thank you.

You'll want to include a line like:

Signed-off-by: Harry Coin your email here

which signifies that you are legally able to contribute the attached
patch under the GPL license.  Do this right before the start of the
patch (where you put your signature in the previous mail).

Dan


 Thank you.  I've added it to a repeat of the original posting copied below.




 --- drivers-orig/3c59x.c2007-01-15 00:03:52.0 -0600
 +++ drivers-fixed/3c59x.c   2007-01-15 00:46:37.0 -0600
 @@ -3090,8 +3090,8 @@
   /* Set Wake-On-LAN mode and put the board into D3 (power-down) state. */
   static void acpi_set_WOL(struct net_device *dev)
   {
 -	struct vortex_private *vp = netdev_priv(dev);
 -	void __iomem *ioaddr = vp->ioaddr;
 +	struct vortex_private *vp = netdev_priv(dev);
 +	void __iomem *ioaddr = vp->ioaddr;

  	if (vp->enable_wol) {
  		/* Power up on: 1==Downloaded Filter, 2==Magic Packets, 4==Link Status. */
 @@ -3101,7 +3101,7 @@
  		iowrite16(SetRxFilter|RxStation|RxMulticast|RxBroadcast, ioaddr + EL3_CMD);
  		iowrite16(RxEnable, ioaddr + EL3_CMD);

 -		pci_enable_wake(VORTEX_PCI(vp), 0, 1);
 +		pci_enable_wake(VORTEX_PCI(vp),PCI_D3hot,1);

  		/* Change the power state to D3; RxEnable doesn't take effect. */
  		pci_set_power_state(VORTEX_PCI(vp), PCI_D3hot);

 Harry Coin
 Bettendorf, Iowa

Signed-off-by: Harry Coin [EMAIL PROTECTED]
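
The capability test described above can be modelled in a few lines. This is
a simplified userspace sketch, not the kernel's pci_enable_wake: the
PME_Support bit values follow the PCI Power Management capability layout as
defined in the kernel's pci_regs.h, while model_enable_wake, pme_capable and
the enum are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* PME_Support bits in the Power Management Capabilities (PMC) register. */
#define PM_CAP_PME_D0     0x0800
#define PM_CAP_PME_D1     0x1000
#define PM_CAP_PME_D2     0x2000
#define PM_CAP_PME_D3HOT  0x4000
#define PM_CAP_PME_D3COLD 0x8000

enum model_pm_state { MODEL_D0, MODEL_D1, MODEL_D2, MODEL_D3HOT, MODEL_D3COLD };

/* True if the device advertises that it can assert PME from 'state'. */
static bool pme_capable(uint16_t pmc, enum model_pm_state state)
{
	return (pmc & (PM_CAP_PME_D0 << state)) != 0;
}

/* Simplified model of the check: fail when the requested state is not
 * advertised. A caller that ignores this return value (as the driver
 * did) never learns that wake-up was not armed. */
static int model_enable_wake(uint16_t pmc, enum model_pm_state state)
{
	return pme_capable(pmc, state) ? 0 : -1;
}
```

For a device that advertises PME only from D3hot and D3cold, as described for
the Optiplex chips, a request for state 0 (D0) fails while a request for
D3hot succeeds.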




e100.c patch to 2.6.18 fixing Wake on Lan (WOL)

2007-01-15 Thread Harry Coin

Hello from Iowa.

Below please find a fix to the Wake On Lan function in the e100.c (intel 
10/100) driver.   With the original driver distributed in kernel 2.6.18 in 
debian etch, wake on lan did not work.   This was tested on 14 dell 
optiplexes with built-in ethernet chips in a totally diskless environment 
(initramfs / pxelinux).  All operations were normal save wake on lan.


When WOL has been enabled with ethtool, the old driver wrongly assumes 
that e100_configure will be called at least once with !netif_running.  Only 
in that instance will it set the chip to notice 'magic' WOL packets if 
ethtool -s <dev> wol g has been called beforehand.


The old e100_down routine never calls e100_configure, so the driver 
never turns off the 'disable WOL magic packet' bit.
Neither does the .shutdown routine.   This fix tries to enable WOL 
recognition only when e100_down is called for the last time
before module unload or system shutdown, while leaving ifconfig down 
untouched (by testing whether it is being run in the context of dev->stop).


Notice that the hw_reset routine is called in the old e100_down, and that 
silently causes WOL to be reset.  In an attempt to avoid this, Debian (and 
I don't know which other sysvinit tools) added a NETDOWN define to the 
/etc/init.d/halt script, which when changed from the default 'yes' to 'no' 
omits the -i option to 'halt', leaving the e100 configured.


With the fix below, the default in /etc/init.d/halt is required and the define 
change is not necessary; in fact, it is important that halt brings the interface 
down for WOL to work.  (In the case of the old e100 driver it didn't matter either 
way, as e100_configure was never called once the driver was stopped.)


Notice that the binary /sbin/halt in Debian etch has a bug and in fact 
never calls ifdown, whether or not -i is specified.   Compiling from 
the source by hand does work.  I have submitted a bug report for this.


A further e100 fix I didn't add was for .shutdown to check whether the 
driver was down and to call e100_down if it was still up.   That added fix 
would make sure WOL works whether or not the halt script 
downs the driver before system shutdown.  I'm not sure what the 
implications of my fix are in the context of sleep/resume.


I have also submitted the above to the e1000 group at Intel privately as 
they are the 'maintainers', but this appears to be the only apropos open 
group, so I thought to note it here as well.


Thanks


Harry Coin
N4 Communications
Bettendorf, Iowa
Signed-off-by: Harry Coin [EMAIL PROTECTED]



--- drivers-orig/e100.c 2007-01-15 00:01:48.0 -0600
+++ drivers-fixed/e100.c2007-01-14 23:32:08.0 -0600
@@ -2088,10 +2088,26 @@
 static void e100_down(struct nic *nic)
 {
-	/* wait here for poll to complete */
-	netif_poll_disable(nic->netdev);
-	netif_stop_queue(nic->netdev);
-	e100_hw_reset(nic);
+	if ((!netif_running(nic->netdev)) && (nic->flags & wol_magic)) {
+		/* if this is a device close, and not an ifdown, and wol is enabled, */
+		/* then turn off the bit disabling wol magic packet recognition on */
+		/* the chip.  Previously, WOL magic packet recognition was never */
+		/* enabled as e100_down never called e100_configure when */
+		/* netif_running was false.   So: */
+		/* This makes the e100 not only work with WOL, but */
+		/* also avoids having to edit the default NETDOWN variable */
+		/* in /etc/init.d/halt from the default 'yes' to 'no'. */
+		e100_exec_cb(nic, NULL, e100_configure);
+		/* wait here for poll to complete */
+		netif_poll_disable(nic->netdev);
+		netif_stop_queue(nic->netdev);
+		e100_disable_irq(nic);
+	} else {
+		/* wait here for poll to complete */
+		netif_poll_disable(nic->netdev);
+		netif_stop_queue(nic->netdev);
+		e100_hw_reset(nic);
+	}
 	free_irq(nic->pdev->irq, nic->netdev);
 	del_timer_sync(&nic->watchdog);
 	netif_carrier_off(nic->netdev);
@@ -2099,6 +2115,7 @@
 	e100_rx_clean_list(nic);
 }
+
 static void e100_tx_timeout(struct net_device *netdev)
 {
 	struct nic *nic = netdev_priv(netdev);




Re: e100.c patch to 2.6.18 fixing Wake on Lan (WOL)

2007-01-15 Thread Auke Kok

Harry Coin wrote:

Hello from Iowa.

Below please find a fix to the Wake On Lan function in the e100.c (intel 
10/100) driver.   With the original driver distributed in kernel 2.6.18 
in debian etch, wake on lan did not work.   This was tested on 14 dell 
optiplexes with built-in ethernet chips in a totally diskless 
environment (initramfs / pxelinux).  All operations were normal save 
wake on lan.


Oi,

I've done quite a bit of work especially on that since 2.6.18 and as far as I could see 
those changes fixed WoL, suspend/resume and netconsole, as was confirmed by Andrew 
Morton even.


Have you tried the version in 2.6.19?

When WOL has been enabled with ethtool, the old driver wrongly assumes 
that e100_configure will be called at least once with !netif_running.  
Only in that instance will it set the chip to notice 'magic' WOL packets 
if ethtool -s <dev> wol g has been called beforehand.


The old e100_down routine never calls e100_configure, so the 
driver never turns off the 'disable WOL magic packet' bit.
Neither does the .shutdown routine.   This fix tries to enable 
WOL recognition only when e100_down is called for the last time
before module unload or system shutdown, while leaving ifconfig down 
untouched (by testing whether it is being run in the context of dev->stop).


that's exactly what my patches should fix as far as I can remember

I have also submitted the above to the e1000 group at Intel privately as 
they are the 'maintainers', but this appears to be the only apropos open 
group, so I thought to note it here as well.


I have not seen this patch before, care to Cc me on that? We also publicly discuss 
e1000/e100 and ixgb issues on [EMAIL PROTECTED]. Feel free to Cc that list.


Cheers,

Auke


Re: sky2: transmit timed out...

2007-01-15 Thread Stephen Hemminger
Please reproduce the problem with this patch, then do:
cat /proc/net/sky2/lan0

This patch (which shouldn't go into the mainline driver) adds a debug
interface to the sky2 driver to dump the receive and transmit rings.
The file /proc/net/sky2/ethX will show the status of transmits in progress,
status responses not handled, and receives pending.

---
 drivers/net/sky2.c |  158 +++--
 drivers/net/sky2.h |4 +
 2 files changed, 157 insertions(+), 5 deletions(-)

--- sky2-2.6.orig/drivers/net/sky2.c2007-01-11 10:05:09.0 -0800
+++ sky2-2.6/drivers/net/sky2.c 2007-01-11 10:23:01.0 -0800
@@ -38,6 +38,7 @@
 #include <linux/workqueue.h>
 #include <linux/if_vlan.h>
 #include <linux/prefetch.h>
+#include <linux/proc_fs.h>
 #include <linux/mii.h>
 
 #include <asm/irq.h>
@@ -866,10 +867,11 @@
 
 /* Build description to hardware for one possibly fragmented skb */
 static void sky2_rx_submit(struct sky2_port *sky2,
-			   const struct rx_ring_info *re)
+			   struct rx_ring_info *re)
 {
 	int i;
 
+	re->idx = sky2->rx_put;
 	sky2_rx_add(sky2, OP_PACKET, re->data_addr, sky2->rx_data_size);
 
 	for (i = 0; i < skb_shinfo(re->skb)->nr_frags; i++)
@@ -1462,6 +1464,7 @@
 	}
 
 	le->ctrl |= EOP;
+	re->idx = le - sky2->tx_le;	/* debug */
 
 	if (tx_avail(sky2) <= MAX_SKB_TX_LE)
 		netif_stop_queue(dev);
@@ -3296,6 +3299,139 @@
.get_perm_addr  = ethtool_op_get_perm_addr,
 };
 
+
+static struct proc_dir_entry *sky2_proc;
+
+static int sky2_seq_show(struct seq_file *seq, void *v)
+{
+	struct net_device *dev = seq->private;
+	const struct sky2_port *sky2 = netdev_priv(dev);
+	const struct sky2_hw *hw = sky2->hw;
+	unsigned port = sky2->port;
+	unsigned idx, ridx, rend, last;
+
+	last = sky2_read16(hw, STAT_PUT_IDX);
+
+	if (hw->st_idx == last)
+		seq_puts(seq, "Status ring (empty)\n");
+	else {
+		seq_puts(seq, "Status ring\n");
+		for (idx = hw->st_idx; idx != last;
+		     idx = RING_NEXT(idx, STATUS_RING_SIZE)) {
+			const struct sky2_status_le *le = hw->st_le + idx;
+			seq_printf(seq, "[%d] %#x %d %#x\n",
+				   idx, le->opcode, le->length, le->status);
+		}
+	}
+
+	if (sky2->tx_cons == sky2->tx_prod)
+		seq_puts(seq, "\nTx ring (empty)\n");
+	else {
+		seq_puts(seq, "\nTx ring\n");
+		idx = sky2->tx_cons;
+		while (idx != sky2->tx_prod) {
+			const struct tx_ring_info *re = sky2->tx_ring + idx;
+
+			seq_printf(seq, "[%d] %p\n", idx, re->skb);
+			do {
+				const struct sky2_tx_le *le = sky2->tx_le + idx;
+				seq_printf(seq, "\t%#x %d", le->opcode, le->addr);
+				idx = RING_NEXT(idx, TX_RING_SIZE);
+			} while (idx != re->idx || idx != sky2->tx_prod);
+			seq_putc(seq, '\n');
+		}
+	}
+
+	seq_printf(seq, "\nRx pending hw get=%d put=%d last=%d\n",
+		   sky2_read16(hw, Y2_QADDR(rxqaddr[port], PREF_UNIT_GET_IDX)),
+		   last = sky2_read16(hw, Y2_QADDR(rxqaddr[port], PREF_UNIT_PUT_IDX)),
+		   sky2_read16(hw, Y2_QADDR(rxqaddr[port], PREF_UNIT_LAST_IDX)));
+
+	ridx = sky2->rx_next;
+	do {
+		const struct rx_ring_info *re = sky2->rx_ring + ridx;
+		seq_printf(seq, "[%d] %p |", ridx, re->skb);
+
+		idx = re->idx;
+		ridx = (ridx + 1) % sky2->rx_pending;
+
+		if (ridx == sky2->rx_next)
+			rend = last;
+		else
+			rend = sky2->rx_ring[ridx].idx;
+
+		do {
+			const struct sky2_rx_le *le = sky2->rx_le + idx;
+
+			switch (le->opcode & ~HW_OWNER) {
+			case OP_PACKET:
+			case OP_BUFFER:
+				seq_printf(seq, " %#x(%d)", le->addr, le->length);
+				break;
+			case OP_ADDR64:
+				seq_printf(seq, " %#x:", le->addr);
+				break;
+			default:
+				seq_printf(seq, " {%x} %#x(%d)",
+					   le->opcode, le->addr, le->length);
+			}
+
+		} while ((idx = RING_NEXT(idx, RX_LE_SIZE)) != rend);
+
+		seq_puts(seq, "\n");
+	} while (ridx != sky2->rx_next);
+
+	return 0;
+}
+
+static int sky2_proc_open(struct inode *inode, struct file  *file)
+{
+	return single_open(file, sky2_seq_show, PDE(inode)->data);
+}
+
+static const struct file_operations sky2_proc_fops = {
+ 

Re: [patch 0/6] sky2 driver update (v1.11)

2007-01-15 Thread Stephen Hemminger
On Sat, 13 Jan 2007 14:03:29 +0100
Tino Keitel [EMAIL PROTECTED] wrote:

 On Tue, Jan 02, 2007 at 20:10:15 +0100, Tino Keitel wrote:
 
 [...]
 
  Btw., I just built 2.6.20-rc3 with patches 4 and 5 and wake on LAN now
  works. Thanks for your work.
 
 Hi,
 
 I had some failures during resume from suspend with 2.6.20-rc3 and
 -rc4. I enabled pm_trace and it looks like the sky2 driver is the
 culprit:
 
 hash matches drivers/base/power/resume.c:56
 hash matches device :01:00.0
 
 $ lspci | grep 01:00.0
 01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053
 PCI-E Gigabit Ethernet Controller (rev 22)
 
 I removed the patches and had no resume failure so far.
 
 Regards,
 Tino
 

What kind of failures? Did the system just not come up?
Did you have WOL enabled or not?

The new code checks for pci_ errors on resume and it could
be that the errors were always there.

-- 
Stephen Hemminger [EMAIL PROTECTED]


Re: rare bad TCP checksum with 2.6.19?

2007-01-15 Thread Eric Dumazet

Michael Tokarev wrote:


Any idea how to force sending FIN-with-data?


int flag_on = 1;
setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(flag_on));
send(fd, data, datalen, 0);
close(fd);
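
A self-contained version of the fragment above might look like the sketch
below. The function name send_last_and_close is made up for illustration,
and IPPROTO_TCP is used in place of the Linux-specific SOL_TCP alias.
Whether the FIN actually rides on the last data segment depends on the
kernel and on queue state, so this should be read as the intended usage, not
as a guarantee.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>	/* TCP_CORK (Linux) */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Cork the socket so the final data is held back, queue it, then close:
 * close() flushes the corked data together with the connection teardown.
 * The descriptor is always closed. Returns 0 on success, -1 on error. */
static int send_last_and_close(int fd, const void *data, size_t len)
{
	int on = 1;
	int rc = 0;

	if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) < 0)
		rc = -1;
	else if (send(fd, data, len, 0) != (ssize_t)len)
		rc = -1;
	if (close(fd) < 0)
		rc = -1;
	return rc;
}
```

The receiver simply sees the data followed by end-of-file; confirming that
the FIN shared a segment with the data requires a packet capture.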


Eric Dumazet


Re: [patch 0/6] sky2 driver update (v1.11)

2007-01-15 Thread Tino Keitel
On Mon, Jan 15, 2007 at 10:21:49 -0800, Stephen Hemminger wrote:
 On Sat, 13 Jan 2007 14:03:29 +0100
 Tino Keitel [EMAIL PROTECTED] wrote:
 
  On Tue, Jan 02, 2007 at 20:10:15 +0100, Tino Keitel wrote:
  
  [...]
  
   Btw., I just built 2.6.20-rc3 with patches 4 and 5 and wake on LAN now
   works. Thanks for your work.
  
  Hi,
  
  I had some failures during resume from suspend with 2.6.20-rc3 and
  -rc4. I enabled pm_trace and it looks like the sky2 driver is the
  culprit:
  
  hash matches drivers/base/power/resume.c:56
  hash matches device :01:00.0
  
  $ lspci | grep 01:00.0
  01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053
  PCI-E Gigabit Ethernet Controller (rev 22)
  
  I removed the patches and had no resume failure so far.
  
  Regards,
  Tino
  
 
 What kind of failures? Did the system just not come up?

Yes, screen stayed dark and machine was dead. However, it was hardly
reproducible. I set up a suspend/resume loop for an hour without
failures. Then, when I just wanted to suspend for a while, resume
failed.

 Did you have WOL enabled or not?

I had WOL enabled.

Regards,
Tino


Re: e100.c patch to 2.6.18 fixing Wake on Lan (WOL)

2007-01-15 Thread Harry Coin

At 10:19 AM 1/15/2007 -0800, Auke Kok wrote:

Have you tried the version in 2.6.19?


I even tried copying and pasting the e100_down and the latest PM stuff from 
the newest e100.c version on sourceforge.   I admit to being defeated as to 
how to join a sourceforge group.   Too many hours writing Microsoft 
drivers  maybe?


It comes down to this:

1)  The e100_configure command is the only place that turns off the WOL 
disable bit.


2)  That bit is only turned off if e100_configure is called after 
netif_running is false and wol is set.


3)  e100_configure is not called at any point after dev->stop (the first 
moment netif_running is false) through the end of .shutdown.  Therefore WOL 
disable is always turned on, no matter the request by ethtool.


I sense there is a belief that if pci_enable_wake has been called properly, 
then all's well.   But on this board, there is a configuration bit that 
also has to be turned off, a bit that is silently reset during a hw_reset, 
and hw_reset   __is__ called in e100_down.


Hence, the fix I submitted.   I know it isn't perfect because I'm not 
intimately familiar with the dynamics of this chip.   But I do know this:


14 Dell Optiplex systems failed to WOL with the stock 2.6.18 distributed 
with debian etch.   After my patch is applied to e100.c, and no other 
changes from anything default in 2.6.18 and debian etch, it works perfectly 
every time.   I should have added that ACPI and lapic are in use, but 
that's the usual case.
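
The three-step argument above can be restated as a tiny state model. This
is a sketch under stated assumptions, not the driver: struct nic_model and
the model_* functions are hypothetical, and they encode only what this
message claims (hw_reset leaves magic-packet recognition disabled;
e100_configure clears the disable bit only when the interface is not
running and WOL was requested).

```c
#include <stdbool.h>

struct nic_model {
	bool running;			/* stands in for netif_running() */
	bool wol_magic;			/* WOL requested via ethtool */
	bool magic_packet_disable;	/* chip bit: true = ignore magic packets */
};

/* A hardware reset silently puts the chip back to "WOL disabled". */
static void model_hw_reset(struct nic_model *n)
{
	n->magic_packet_disable = true;
}

/* Configure leaves the disable bit set unless the interface is down
 * and WOL was requested. */
static void model_configure(struct nic_model *n)
{
	n->magic_packet_disable = n->running || !n->wol_magic;
}

/* Old e100_down as described: reset only, configure is never called. */
static void model_down_old(struct nic_model *n)
{
	model_hw_reset(n);
}

/* Patched e100_down as described: on a device close with WOL requested,
 * run configure instead of resetting the hardware. */
static void model_down_patched(struct nic_model *n)
{
	if (!n->running && n->wol_magic)
		model_configure(n);
	else
		model_hw_reset(n);
}
```

In this model the old path always ends with magic_packet_disable set, while
the patched path clears it when the device is closed with WOL requested.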


Cheers,

Harry Coin
N4 Communications
Bettendorf, Iowa



Re: [PATCH] Remove CONFIG_NET_WIRELESS

2007-01-15 Thread Maarten Lankhorst
Jiri Benc wrote:
 On Mon, 15 Jan 2007 13:31:06 +, Johannes Berg wrote:
   
 On Mon, 2007-01-15 at 13:55 +0100, Maarten Lankhorst wrote:
 
 Enabling this doesn't cause anything to fail, but my wireless router
 doesn't have a pci bus, but instead a native SSB, so CONFIG_NET_WIRELESS
 isn't selected. This in turn causes wext-common.o to not be built, so I
 get missing symbols and a build breakage. That's why I made
 wext-common.o depend on CONFIG_WIRELESS_EXT instead of
 CONFIG_NET_WIRELESS. Since nothing else uses CONFIG_NET_WIRELESS I
 decided to kill that symbol.
   
 Ok, that makes sense to me. Let's put this in but with this better
 description rather than the original one.
 

 The original mail with patch apparently didn't get to netdev (I haven't
 received it and it's not in netdev archive). Maarten, could you resend
 it please?

 Thanks,

  Jiri

   
Sorry, must have missed sending it to netdev, original message follows.

Remove CONFIG_NET_WIRELESS
Nothing uses this, and it breaks the kernel build if a wireless device is used 
with an unsupported type of bus.
Verified this with a grep.

Signed-off-by: Maarten Lankhorst [EMAIL PROTECTED]

diff --git a/drivers/net/wireless/Kconfig b/drivers/net/wireless/Kconfig
index 03dbe60..b9620c6 100644
--- a/drivers/net/wireless/Kconfig
+++ b/drivers/net/wireless/Kconfig
@@ -544,11 +544,5 @@ source drivers/net/wireless/zd1211rw/Kc
 
 source "drivers/net/wireless/d80211/Kconfig"
 
-# yes, this works even when no drivers are selected
-config NET_WIRELESS
-	bool
-	depends on NET_RADIO && (ISA || PCI || PPC_PMAC || PCMCIA)
-	default y
-
 endmenu
 
diff --git a/net/wireless/Makefile b/net/wireless/Makefile
index f285440..44ae23a 100644
--- a/net/wireless/Makefile
+++ b/net/wireless/Makefile
@@ -12,5 +12,5 @@ obj-ny :=
 
 # this needs to be compiled in...
 obj-$(CONFIG_CFG80211_WEXT_COMPAT) += wext-compat.o
-obj-$(CONFIG_CFG80211_WEXT_COMPAT)$(CONFIG_NET_WIRELESS) += wext-common.o
+obj-$(CONFIG_CFG80211_WEXT_COMPAT)$(CONFIG_WIRELESS_EXT) += wext-common.o
 obj-y += $(obj-yy) $(obj-yn) $(obj-ny)



Re: e100.c patch to 2.6.18 fixing Wake on Lan (WOL)

2007-01-15 Thread Auke Kok

Harry Coin wrote:

At 10:19 AM 1/15/2007 -0800, Auke Kok wrote:

Have you tried the version in 2.6.19?


I even tried copying and pasting the e100_down and the latest PM stuff 
from the newest e100.c version on sourceforge.   I admit to being 
defeated as to how to join a sourceforge group.   Too many hours writing 
Microsoft drivers  maybe?


the list is open to posting, so that's fairly easy.


It comes down to this:

1)  The e100_configure command is the only place that turns off the WOL 
disable bit.


2)  That bit is only turned off if e100_configure is called after 
netif_running is false and wol is set.


3)  e100_configure is not called at any point after dev->stop (the first 
moment netif_running is false) through the end of .shutdown.  Therefore 
WOL disable is always turned on, no matter the request by ethtool.


I sense there is a belief that if pci_enable_wake has been called 
properly, then all's well.   But on this board, there is a configuration 
bit that also has to be turned off, a bit that is silently reset during a 
hw_reset, and hw_reset   __is__ called in e100_down.


Hence, the fix I submitted.   I know it isn't perfect because I'm not 
intimately familiar with the dynamics of this chip.   But I do know this:


14 Dell Optiplex systems failed to WOL with the stock 2.6.18 distributed 
with debian etch.   After my patch is applied to e100.c, and no other 
changes from anything default in 2.6.18 and debian etch, it works 
perfectly every time.   I should have added that ACPI and lapic are in 
use, but that's the usual case.


okay, I didn't necessarily mean that your patch is incorrect; however, we need to make 
sure that your patch doesn't break 2.6.19, because that code is already upstream.


on top of that, both patches might be needed, and I suspect that is the case in order to 
keep suspend and netconsole working, so I would still like to ask you to test 2.6.19, 
with and without your patch.


I'll do the same here and push your patch to Garzik if (after testing) we're both OK 
with it.


Thanks,

Auke


Re: [PATCH]: 8139cp: Don't blindly enable interrupts in cp_start_xmit

2007-01-15 Thread Francois Romieu
Chris Lalancette [EMAIL PROTECTED] :
[...]
  Similar to this commit:
 
 http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d15e9c4d9a75702b30e00cdf95c71c88e3f3f51e
 
 It's not safe in cp_start_xmit to blindly call spin_lock_irq and then
 spin_unlock_irq, since it may very well be the case that cp_start_xmit
 was called with interrupts already disabled (I came across this bug in
 the context of netdump in RedHat kernels, but the same issue holds, for
 example, in netconsole).  Therefore, replace all instances of spin_lock_irq
 and spin_unlock_irq with spin_lock_irqsave and spin_unlock_irqrestore,
 respectively, in cp_start_xmit().  I tested this against a fully-virtualized
 Xen guest, which happens to use the 8139cp driver to talk to the emulated
 hardware.  I don't have a real piece of 8139cp hardware to test on, so
 someone else will have to do that.

(message reformatted to fit in 80 columns, please fix your mailer)

As I understand http://lkml.org/lkml/2006/12/12/239, something like the
patch below should have been sent instead. Herbert, ack/nak ?

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 823215d..ff95641 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -55,7 +55,6 @@ static void queue_process(struct work_struct *work)
 	struct netpoll_info *npinfo =
 		container_of(work, struct netpoll_info, tx_work.work);
 	struct sk_buff *skb;
-	unsigned long flags;
 
 	while ((skb = skb_dequeue(&npinfo->txq))) {
 		struct net_device *dev = skb->dev;
@@ -65,19 +64,16 @@ static void queue_process(struct work_struct *work)
 			continue;
 		}
 
-		local_irq_save(flags);
 		netif_tx_lock(dev);
 		if (netif_queue_stopped(dev) ||
 		    dev->hard_start_xmit(skb, dev) != NETDEV_TX_OK) {
 			skb_queue_head(&npinfo->txq, skb);
 			netif_tx_unlock(dev);
-			local_irq_restore(flags);
 
 			schedule_delayed_work(&npinfo->tx_work, HZ/10);
 			return;
 		}
 		netif_tx_unlock(dev);
-		local_irq_restore(flags);
 	}
 }
 


Re: rare bad TCP checksum with 2.6.19?

2007-01-15 Thread Herbert Xu
On Mon, Jan 15, 2007 at 04:34:41PM +0300, Michael Tokarev wrote:
 
 # ethtool -k eth0
 Offload parameters for eth0:
 Cannot get device rx csum settings: Operation not supported
 Cannot get device tx csum settings: Operation not supported
 Cannot get device scatter-gather settings: Operation not supported
 Cannot get device tcp segmentation offload settings: Operation not supported
 no offload info available
 
 # ethtool -K eth0 rx off tx off tso off
 Cannot set device rx csum settings: Operation not supported
 
 So I guess the problem is not related to hw checksumming offloading.

Nope, it just means that 8139too doesn't provide ethtool handlers to
disable checksum offloading.

So I suggest that you try doing the tcpdump on the receive side as
that should show the real checksum.

BTW, the reason tcpdump only shows some packets with bogus checksums
is because it cuts packets off at 100 bytes by default so for most
packets it can't verify the checksum at all.  If you run it with
-s 1600 you should see bogus checksums on every packet with payload.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: rare bad TCP checksum with 2.6.19?

2007-01-15 Thread Michael Tokarev
Herbert Xu wrote:
 On Mon, Jan 15, 2007 at 04:34:41PM +0300, Michael Tokarev wrote:
[]
 So I guess the problem is not related to hw checksumming offloading.
 
 Nope, it just means that 8139too doesn't provide ethtool handlers to
 disable checksum offloading.
 
 So I suggest that you try doing the tcpdump on the receive side as
 that should show the real checksum.

I'm doing the capture on an intermediate host - the whole day today ;)

 BTW, the reason tcpdump only shows some packets with bogus checksums
 is because it cuts packets off at 100 bytes by default so for most
 packets it can't verify the checksum at all.  If you run it with
 -s 1600 you should see bogus checksums on every packet with payload.

And I'm capturing with -s 2000.  By the way, tcpdump just does not
verify the checksum of truncated (due to capture size) packets.  At
least not the version I'm using (which is 3.9.5).

Herbert, the problem IS real, it's not due to some bad behavior due
to improper capturing or something like that.  Yes it's difficult to
come to it, but it is real.

I've saved quite a lot of packets today, but it's all quite.. useless
as the thing is difficult to hit.  Here's some traces made with the
following filter:

 proto TCP and tcp[tcpflags] & (tcp-fin|tcp-push) == (tcp-fin|tcp-push)

(I've chosen FIN+PUSH because this combination is where the problem
is seen most - to be fair, it looks like I haven't seen it with other
flags).

In there, some packets are ok, but some are not.  So - again, it seems
like - I was wrong about 100% hit ratio -- ie, that the bad checksum
is ALWAYS the case with packets where some data goes in FIN packets --
this is incorrect, because the trace shows quite a few examples of right
behavior.

The trace is here: http://www.corpit.ru/mjt/bad-tcp-cksum-dmp.bin

(it contains some data which it shouldn't - but I hope there's nothing
confidential in there ;)

So, after the whole day digging around, I still don't have any more-or-less
clean way to reproduce it.  But I've noticed another thing as well: many
different machines here, with different kernels, behave the same way.
So it can't be a hardware problem for example.

And only in VERY rare cases, the thing causes noticeable transfer slowdowns
or stalls.  But some networks trigger those rare cases more often than others
(so the only more or less sane conclusion I can come up with is that it's
somehow timing-related).

Thanks!

/mjt


Re: [PATCH]: 8139cp: Don't blindly enable interrupts in cp_start_xmit

2007-01-15 Thread Herbert Xu
On Mon, Jan 15, 2007 at 08:56:35PM +0100, Francois Romieu wrote:
 
 As I understand http://lkml.org/lkml/2006/12/12/239, something like the
 patch below should have been sent instead. Herbert, ack/nak ?

Sorry, what I said in that thread is in error.  Netpoll may
unfortunately call the transmit routine with IRQs off.  So
the drivers can't currently use spin_lock_irq and must save
the current flags instead.
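
The difference can be shown with a small user-space model. This is only a sketch, not kernel code: `irqs_enabled` and all `model_*` / `xmit_*` names are hypothetical stand-ins for the CPU interrupt flag, the spinlock helpers and a driver transmit routine.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU interrupt-enable flag. */
bool irqs_enabled = true;

/* Models spin_lock_irq(): disables interrupts unconditionally. */
void model_lock_irq(void) { irqs_enabled = false; }

/* Models spin_unlock_irq(): enables interrupts unconditionally. */
void model_unlock_irq(void) { irqs_enabled = true; }

/* Models spin_lock_irqsave(): records the previous state, then disables. */
void model_lock_irqsave(bool *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
}

/* Models spin_unlock_irqrestore(): puts back whatever state was saved. */
void model_unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* A transmit routine written with the _irq variants. */
void xmit_with_irq(void)
{
    model_lock_irq();
    /* ... queue the packet ... */
    model_unlock_irq();
}

/* The same routine written with the _irqsave variants. */
void xmit_with_irqsave(void)
{
    bool flags;

    model_lock_irqsave(&flags);
    /* ... queue the packet ... */
    model_unlock_irqrestore(flags);
}

/*
 * Models a caller such as netpoll that enters the transmit routine with
 * interrupts already off. Returns the interrupt state seen afterwards.
 */
bool state_after_call_with_irqs_off(void (*xmit)(void))
{
    irqs_enabled = false;
    xmit();
    return irqs_enabled;
}
```

In this model the `_irq` version returns with interrupts enabled even though the caller had them disabled, while the `_irqsave` version leaves the caller's state untouched.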

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: rare bad TCP checksum with 2.6.19?

2007-01-15 Thread Herbert Xu
On Tue, Jan 16, 2007 at 12:46:08AM +0300, Michael Tokarev wrote:
 
 I'm doing the capture on an intermediate host - the whole day today ;)

Cool, I was just trying to make sure :)
 
 The trace is here: http://www.corpit.ru/mjt/bad-tcp-cksum-dmp.bin

I'll take a look.

Are you using anything extra like netfilter?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: rare bad TCP checksum with 2.6.19?

2007-01-15 Thread Eric Dumazet

Michael Tokarev wrote:

Eric Dumazet wrote:

Michael Tokarev wrote:

Any idea how to force sending FIN-with-data?

int flag_on = 1;
setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int));
send(fd, data, datalen, 0);
close(fd);


That produces two packets - one (or more - depending on the
size) data packet and one FIN packet w/o any data.

This is the first thing I've tried.


This may be because I forgot the shutdown() ?

int flag_on = 1;
setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int));
send(fd, data, datalen, 0);
shutdown(fd, 1);
close(fd);

At least this is working on my machines (with and without shutdown())
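
For anyone who wants to try this, the sequence can be wrapped with error checking roughly as follows. This is a sketch only: `send_then_fin` is a hypothetical helper name, it assumes a connected TCP socket on Linux (TCP_CORK is Linux-specific), and whether the kernel really puts the FIN on the last data segment is exactly what is being discussed in this thread.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: cork the socket, queue the data, then shut down
 * the write side so the FIN can go out together with the queued data.
 * Returns 0 on success and -1 on the first failing step. The descriptor
 * is closed only when every earlier step succeeded.
 */
int send_then_fin(int fd, const void *data, size_t datalen)
{
    int flag_on = 1;

    /* Note the address-of: setsockopt() takes a pointer to the value. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &flag_on, sizeof(flag_on)) < 0)
        return -1;

    if (send(fd, data, datalen, 0) != (ssize_t)datalen)
        return -1;

    /* SHUT_WR is the symbolic name for the literal 1 used above. */
    if (shutdown(fd, SHUT_WR) < 0)
        return -1;

    return close(fd);
}
```

IPPROTO_TCP is used here instead of SOL_TCP; on Linux both have the same value.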

Eric


Re: [PATCH 2/4] atl1: Header files for Attansic L1 driver

2007-01-15 Thread Jay Cliburn

Christoph Hellwig wrote:

On Wed, Jan 10, 2007 at 06:41:37PM -0600, Jay Cliburn wrote:



+struct csum_param {
+   unsigned buf_len:14;
+   unsigned dma_int:1;
+   unsigned pkt_int:1;
+   u16 valan_tag;
+   unsigned eop:1;
+   /* command */
+   unsigned coalese:1;
+   unsigned ins_vlag:1;
+   unsigned custom_chksum:1;
+   unsigned segment:1;
+   unsigned ip_chksum:1;
+   unsigned tcp_chksum:1;
+   unsigned udp_chksum:1;
+   /* packet state */
+   unsigned vlan_tagged:1;
+   unsigned eth_type:1;
+   unsigned iphl:4;
+   unsigned:2;
+   unsigned payload_offset:8;
+   unsigned xsum_offset:8;
+} _ATL1_ATTRIB_PACK_;


Bitfields should not be used for hardware datastructures ever.
Please convert this to explicit masking and shifting.
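
A minimal sketch of what explicit masking and shifting could look like for the first 32-bit word of this descriptor. The `TPD_*` names, the helper functions and the bit positions (buf_len in bits 0-13, dma_int in bit 14, pkt_int in bit 15, VLAN tag in bits 16-31) are assumptions made for illustration; the real positions must come from the hardware documentation, not from how a particular compiler happens to lay out the bitfields.

```c
#include <stdint.h>

/* Assumed layout of descriptor word 0 (hypothetical bit positions). */
#define TPD_BUFLEN_MASK    0x3fffu
#define TPD_BUFLEN_SHIFT   0
#define TPD_DMAINT_MASK    0x1u
#define TPD_DMAINT_SHIFT   14
#define TPD_PKTINT_MASK    0x1u
#define TPD_PKTINT_SHIFT   15
#define TPD_VLANTAG_MASK   0xffffu
#define TPD_VLANTAG_SHIFT  16

/* Clear one field in the word, then store val (truncated to the field). */
uint32_t tpd_set(uint32_t word, uint32_t mask, unsigned int shift,
                 uint32_t val)
{
    word &= ~(mask << shift);
    word |= (val & mask) << shift;
    return word;
}

/* Extract one field from the word. */
uint32_t tpd_get(uint32_t word, uint32_t mask, unsigned int shift)
{
    return (word >> shift) & mask;
}
```

In a driver the assembled word would then be converted once with cpu_to_le32() (or the big-endian equivalent, depending on the device) before being written to the descriptor ring, which keeps the layout independent of compiler and host byte order.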




+/* formerly ATL1_WRITE_REG */
+static inline void atl1_write32(const struct atl1_hw *hw, int reg, u32 val)
+{
+	writel(val, hw->hw_addr + reg);
+}
+
+/* formerly ATL1_READ_REG */
+static inline u32 atl1_read32(const struct atl1_hw *hw, int reg)
+{
+	return readl(hw->hw_addr + reg);
+}


Just kill all these wrappers.  Also you probably want to convert to
pci_iomap + ioread*/iowrite*.


Christoph et al.,

I've incorporated all your comments except the two shown above.  I 
killed the indicated atl1_write*/atl1_read* wrappers, but I'm not yet 
familiar enough with pci_iomap/iowrite*/ioread* to make that particular 
conversion, and I'm having trouble getting the bitfield struct converted 
to shift/mask semantics (No matter how hard I try, I keep breaking the 
transmit side of the adapter).


I'd like to plead for relief on these two items and submit a new version 
of the driver containing all your other comments.  I need help from a 
more experienced netdev hacker, and in my mind, the best way to do that 
is to get the driver in the kernel so more people can use it and 
contribute changes and make improvements.


I welcome any comments on the rationality of this approach.

Jay


Re: [PATCH 2/4] atl1: Header files for Attansic L1 driver

2007-01-15 Thread Francois Romieu
Jay Cliburn [EMAIL PROTECTED] :
[...]
 I welcome any comments on the rationality of this approach.

A URL for the current version of the patch would be welcome too :o)

-- 
Ueimor


Re: [PATCH 2/4] atl1: Header files for Attansic L1 driver

2007-01-15 Thread Jay Cliburn

Francois Romieu wrote:

Jay Cliburn [EMAIL PROTECTED] :
[...]

I welcome any comments on the rationality of this approach.


A URL for the current version of the patch would be welcome too :o)



Sorry.  Forgot to do that.  The current version may be found here:

ftp://hogchain.net/pub/linux/m2v/attansic/kernel_driver/atl1-2.0.4/atl1-2.0.4-linux-2.6.20.rc5.patch.bz2

Jay


Re: [PATCH 1/4] [SCTP]: Set correct error cause value for missing parameters

2007-01-15 Thread David Miller
From: Sridhar Samudrala [EMAIL PROTECTED]
Date: Thu, 11 Jan 2007 11:41:25 -0800

 [SCTP]: Set correct error cause value for missing parameters
 
 sctp_process_missing_param() needs to use the SCTP_ERROR_MISS_PARAM
 error cause value.
 
 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]
 Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED]

Applied, thank you.


Re: [PATCH 2/4] [SCTP]: Verify some mandatory parameters.

2007-01-15 Thread David Miller
From: Sridhar Samudrala [EMAIL PROTECTED]
Date: Thu, 11 Jan 2007 11:41:27 -0800

 [SCTP]: Verify some mandatory parameters.
 
 Verify init_tag and a_rwnd mandatory parameters in INIT and
 INIT-ACK chunks.
 
 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]
 Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED]

Applied.


Re: [PATCH 3/4] [SCTP]: Correctly handle unexpected INIT-ACK chunk.

2007-01-15 Thread David Miller
From: Sridhar Samudrala [EMAIL PROTECTED]
Date: Thu, 11 Jan 2007 11:41:29 -0800

 [SCTP]: Correctly handle unexpected INIT-ACK chunk.
 
 Consider the chunk as Out-of-the-Blue if we don't have
 an endpoint.  Otherwise discard it as before.
 
 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]
 Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED]

Applied, thanks.


Re: [PATCH 4/4] [SCTP]: Fix SACK sequence during shutdown

2007-01-15 Thread David Miller
From: Sridhar Samudrala [EMAIL PROTECTED]
Date: Thu, 11 Jan 2007 11:41:32 -0800

 [SCTP]: Fix SACK sequence during shutdown
 
 Currently, when association enters SHUTDOWN state, the
 implementation will SACK any DATA first and then transmit
 the SHUTDOWN chunk.  This is against the order required by
 2960bis spec.  SHUTDOWN must always be first, followed by
 SACK. This change forces this order and also enables bundling.
 
 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]
 Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED]

Also applied, thanks a lot.


Re: rare bad TCP checksum with 2.6.19?

2007-01-15 Thread Herbert Xu
On Tue, Jan 16, 2007 at 12:46:08AM +0300, Michael Tokarev wrote:
 
 The trace is here: http://www.corpit.ru/mjt/bad-tcp-cksum-dmp.bin

I'm sorry but this dump does NOT look like it was taken from an
intermediate box.  I verified two bad checksums (chosen randomly)
and they were both correct but partial checksums.  This means that
this dump was most likely taken from the sending host.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 1/2] [IrDA] irda-usb TX path optimization

2007-01-15 Thread David Miller
From: Samuel Ortiz [EMAIL PROTECTED]
Date: Mon, 15 Jan 2007 11:15:11 +0200

 Since we stop using dev_alloc_skb on the IrDA TX frame, we constantly run
 into the case of the skb headroom being 0, and thus we call skb_cow for
 every IrDA TX frame.
 This patch uses a local buffer and memcpys the skb to it, saving us a
 kmalloc for each of those IrDA TX frames.
 
 Signed-off-by: Samuel Ortiz [EMAIL PROTECTED]

Applied, thanks.

Technically this is a bug fix too because once an SKB hits the
transmit function it should essentially be immutable, ie. you
shouldn't be writing to it.  tcpdump sniffers could be looking
at the SKB, as one example.


Re: rare bad TCP checksum with 2.6.19?

2007-01-15 Thread Herbert Xu
On Tue, Jan 16, 2007 at 02:27:39PM +1100, Herbert Xu wrote:
 
 I'm sorry but this dump does NOT look like it was taken from an
 intermediate box.  I verified two bad checksums (chosen randomly)
 and they were both correct but partial checksums.  This means that
 this dump was most likely taken from the sending host.

I did see one strange bit:

02:39:51.758803 IP (tos 0x0, ttl  63, id 41084, offset 0, flags [DF], length: 102)
    192.168.1.1.25 > 81.13.94.6.21350: FP [bad tcp cksum 81b0 (->9ee8)!]
    4271854025:4271854075(50) ack 3772789166 win 272
    <nop,nop,timestamp 145420525 6279830>
    0x0000:  4500 0066 a07c 4000 3f06 2a59 c0a8 0101  E..f.|@.?.*Y....
    0x0010:  510d 5e06 0019 5366 fe9f 51c9 e0e0 31ae  Q.^...Sf..Q...1.
    0x0020:  8019 0110 81b0 0000 0101 080a 08aa f0ed  ................
    0x0030:  005f d296 3235 3020 322e 302e 3020 4f6b  ._..250.2.0.0.Ok
    0x0040:  3a20 7175 6575 6564 2061 7320 3631 3345  :.queued.as.613E
    0x0050:  4137 4637 440d 0a32 3231 2032 2e30 2e30  A7F7D..221.2.0.0
    0x0060:  2042 7965 0d0a                           .Bye..

Most of the bad checksums are from 81.13.94.6, which I presume is
the host you were dumping on.  However, this packet is destined
for it instead and yet it too has a partial (but correct) checksum.

So the question is where in your network is 192.168.1.1 and how is
your network setup in terms of NAT?
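
For reference, the arithmetic behind "bad" versus "partial" can be reproduced with a short sketch. The `sketch_*` and `dump_*` names are hypothetical; the bytes are the TCP segment from the hex dump in this message, with the urgent-pointer word taken to be 0000. A full checksum covers the pseudo-header plus the whole segment and is complemented; a partial checksum, as left by a sender that relies on checksum offload, covers only the pseudo-header and is not complemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Addresses and TCP segment (header + payload, 82 bytes) from the dump. */
const uint8_t dump_src[4] = { 192, 168, 1, 1 };
const uint8_t dump_dst[4] = { 81, 13, 94, 6 };
const uint8_t dump_segment[82] = {
    0x00, 0x19, 0x53, 0x66, 0xfe, 0x9f, 0x51, 0xc9,
    0xe0, 0xe0, 0x31, 0xae, 0x80, 0x19, 0x01, 0x10,
    0x81, 0xb0, 0x00, 0x00, 0x01, 0x01, 0x08, 0x0a,
    0x08, 0xaa, 0xf0, 0xed, 0x00, 0x5f, 0xd2, 0x96,
    0x32, 0x35, 0x30, 0x20, 0x32, 0x2e, 0x30, 0x2e,
    0x30, 0x20, 0x4f, 0x6b, 0x3a, 0x20, 0x71, 0x75,
    0x65, 0x75, 0x65, 0x64, 0x20, 0x61, 0x73, 0x20,
    0x36, 0x31, 0x33, 0x45, 0x41, 0x37, 0x46, 0x37,
    0x44, 0x0d, 0x0a, 0x32, 0x32, 0x31, 0x20, 0x32,
    0x2e, 0x30, 0x2e, 0x30, 0x20, 0x42, 0x79, 0x65,
    0x0d, 0x0a
};

/* Add big-endian 16-bit words to a running sum; an odd tail is zero-padded. */
uint32_t sketch_sum16(uint32_t sum, const uint8_t *p, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    return sum;
}

/* Fold the carries back in (one's-complement addition). */
uint16_t sketch_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Pseudo-header only, not complemented: the "partial" checksum. */
uint16_t sketch_pseudo_sum(const uint8_t src[4], const uint8_t dst[4],
                           uint16_t tcp_len)
{
    uint32_t sum = 0;

    sum = sketch_sum16(sum, src, 4);
    sum = sketch_sum16(sum, dst, 4);
    sum += 6;           /* IP protocol number for TCP */
    sum += tcp_len;
    return sketch_fold(sum);
}

/* Full checksum; bytes 16-17 (the checksum field) are skipped, i.e. zero. */
uint16_t sketch_tcp_checksum(const uint8_t src[4], const uint8_t dst[4],
                             const uint8_t *seg, size_t len)
{
    uint32_t sum = sketch_pseudo_sum(src, dst, (uint16_t)len);

    sum = sketch_sum16(sum, seg, 16);
    sum = sketch_sum16(sum, seg + 18, len - 18);
    return (uint16_t)~sketch_fold(sum);
}
```

With these inputs the full checksum comes out as 0x9ee8, the value tcpdump says it expected. The pseudo-header-only sum for the addresses as printed is 0x7115 rather than 0x81b0, so the sketch does not by itself explain the value found in the packet.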

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 2/2] [IrDA] Removed incorrect IRDA_ASSERT()

2007-01-15 Thread David Miller
From: Samuel Ortiz [EMAIL PROTECTED]
Date: Mon, 15 Jan 2007 11:15:42 +0200

 With USB2.0 bulk out MTU can be 512 bytes, so checking it only for 64 bytes is
 incorrect.
 
 Signed-off-by: Samuel Ortiz [EMAIL PROTECTED]

Applied, thanks a lot.


Re: [PATCH 1/2] [IrDA] irda-usb TX path optimization

2007-01-15 Thread Herbert Xu
David Miller [EMAIL PROTECTED] wrote:
 
 Technically this is a bug fix too because once an SKB hits the
 transmit function it should essentially be immutable, ie. you
 shouldn't be writing to it.  tcpdump sniffers could be looking
 at the SKB, as one example.

We do have a way around that with skb_header_cloned.  In fact
it looks like VLAN should use it as otherwise TCP packets will
get copied unnecessarily.

This is still not optimal for AF_PACKET users since they will
still cause things like VLANs to do the copy even when it isn't
necessary because it doesn't touch any part of the packet that
AF_PACKET actually looks at.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH] [IPV6] fixed the size of the netlink message notified by inet6_rt_notify().

2007-01-15 Thread Noriaki TAKAMIYA
Hi,

  I think the return value of rt6_nlmsg_size() should include the
  amount of RTA_METRICS.

  Regards,

---
 net/ipv6/route.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8c3d568..5f0043c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2017,6 +2017,7 @@ static inline size_t rt6_nlmsg_size(void
   + nla_total_size(4) /* RTA_IIF */
   + nla_total_size(4) /* RTA_OIF */
   + nla_total_size(4) /* RTA_PRIORITY */
+  + RTAX_MAX * nla_total_size(4) /* RTA_METRICS */
   + nla_total_size(sizeof(struct rta_cacheinfo));
 }
 
-- 
1.4.4

--
Noriaki TAKAMIYA
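
As a rough illustration of the size arithmetic in the added line, here is a user-space sketch of how the space for one netlink attribute is computed. The `sketch_*` names are hypothetical re-implementations written for this example, not the kernel helpers themselves; they assume a 4-byte attribute header (16-bit length plus 16-bit type) and 4-byte alignment.

```c
#include <stddef.h>

#define SKETCH_NLA_ALIGNTO 4u   /* attributes are padded to 4 bytes */
#define SKETCH_NLA_HDRLEN  4u   /* u16 length + u16 type */

/* Round len up to the next multiple of the attribute alignment. */
size_t sketch_nla_align(size_t len)
{
    return (len + SKETCH_NLA_ALIGNTO - 1) & ~(size_t)(SKETCH_NLA_ALIGNTO - 1);
}

/* Header plus payload, padded: what nla_total_size(payload) stands for. */
size_t sketch_nla_total_size(size_t payload)
{
    return sketch_nla_align(SKETCH_NLA_HDRLEN + payload);
}

/* Extra room the patch reserves: one 4-byte attribute per metric slot. */
size_t sketch_metrics_size(size_t nr_metrics)
{
    return nr_metrics * sketch_nla_total_size(4);
}
```

Each 4-byte metric therefore costs 8 bytes in the message, and the patch reserves that amount RTAX_MAX times so the buffer cannot come up short when every metric is present.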


Re: [PATCH] [IPV6] fixed the size of the netlink message notified by inet6_rt_notify().

2007-01-15 Thread Noriaki TAKAMIYA
Hi,

  I'm sorry to re-send...

  I think the return value of rt6_nlmsg_size() should include the
  amount of RTA_METRICS.

  Regards,

Signed-off-by: Noriaki TAKAMIYA [EMAIL PROTECTED]
---
 net/ipv6/route.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8c3d568..5f0043c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2017,6 +2017,7 @@ static inline size_t rt6_nlmsg_size(void
   + nla_total_size(4) /* RTA_IIF */
   + nla_total_size(4) /* RTA_OIF */
   + nla_total_size(4) /* RTA_PRIORITY */
+  + RTAX_MAX * nla_total_size(4) /* RTA_METRICS */
   + nla_total_size(sizeof(struct rta_cacheinfo));
 }
 
-- 
1.4.4

--
Noriaki TAKAMIYA


[PATCH -mm 0/10][RFC] aio: make struct kiocb private

2007-01-15 Thread Nate Diller
This series is an attempt to generalize the async I/O paths to be
implementation agnostic.  It completely eliminates knowledge of
the kiocb structure in the generic code and makes it private within the
current aio code.  Things get noticeably cleaner without that layering
violation.

The new interface takes a file_endio_t function pointer, and a private data
pointer, which would normally be aio_complete and a kiocb pointer,
respectively.  If the aio submission function gets back EIOCBQUEUED, that is
a guarantee that the endio function will be called, or *already has been
called*.  If the file_endio_t pointer provided to aio_[read|write] is NULL,
the FS must block on I/O completion, then return either the number of bytes
read, or an error.
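
The contract described above can be modelled in user space roughly as follows. This is only a sketch: the `file_endio_t` typedef is the one proposed in this series, while `sketch_aio_read`, `sketch_endio`, `struct sketch_result`, `sketch_last` and the `SKETCH_EIOCBQUEUED` constant are hypothetical names for this example (529 is assumed to match the kernel-internal EIOCBQUEUED value).

```c
#include <stddef.h>
#include <sys/types.h>

/* Completion callback type as proposed in this series. */
typedef void (file_endio_t)(void *endio_data, ssize_t count, int err);

/* Assumption: mirrors the kernel-internal EIOCBQUEUED value. */
#define SKETCH_EIOCBQUEUED 529

/* Records what the completion callback was told. */
struct sketch_result {
    int     called;
    ssize_t count;
    int     err;
};

struct sketch_result sketch_last;

/* Example endio function: stores the result in its private data. */
void sketch_endio(void *endio_data, ssize_t count, int err)
{
    struct sketch_result *r = endio_data;

    r->called++;
    r->count = count;
    r->err = err;
}

/*
 * Hypothetical submission function that pretends to transfer len bytes.
 * With a NULL endio it behaves synchronously and returns the byte count.
 * Otherwise it returns -SKETCH_EIOCBQUEUED, which promises that endio
 * will be called or, as here, has already been called.
 */
ssize_t sketch_aio_read(size_t len, file_endio_t *endio, void *endio_data)
{
    ssize_t done = (ssize_t)len;

    if (!endio)
        return done;

    endio(endio_data, done, 0);
    return -SKETCH_EIOCBQUEUED;
}
```

A caller that passes a callback must therefore be prepared for the callback to have run before the submission function returns.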

I had to touch more areas than I had originally expected, so there are
changes in a corner of the socket code, and a slight behavior change in the
direct-io completion path which affects XFS and OCFS2.  I would appreciate
further review there, so I copied some extra people I hope can help.

This patch is against 2.6.20-rc4-mm1.  It has been compile-tested at each
stage.  It needs some runtime testing yet, but I prefer to get it out for
commentary and test later.

These patches are for RFC only and have not yet been signed off.

NATE

---

 Documentation/filesystems/Locking |   11 +
 Documentation/filesystems/vfs.txt |   11 +
 arch/s390/hypfs/inode.c   |   16 +-
 drivers/net/pppoe.c   |8 -
 drivers/net/tun.c |   13 +-
 drivers/usb/gadget/inode.c|  239 +-
 fs/aio.c  |   74 ++-
 fs/bad_inode.c|   10 -
 fs/block_dev.c|  109 +++--
 fs/cifs/cifsfs.c  |   10 -
 fs/compat.c   |   56 
 fs/direct-io.c|   92 --
 fs/ecryptfs/file.c|   16 +-
 fs/ext2/inode.c   |   12 -
 fs/ext3/file.c|9 -
 fs/ext3/inode.c   |   11 -
 fs/ext4/file.c|9 -
 fs/ext4/inode.c   |   11 -
 fs/fat/inode.c|   12 -
 fs/fuse/dev.c |   13 +-
 fs/gfs2/ops_address.c |   14 +-
 fs/hfs/inode.c|   13 --
 fs/hfsplus/inode.c|   13 --
 fs/jfs/inode.c|   12 -
 fs/nfs/direct.c   |   92 +++---
 fs/nfs/file.c |   62 +
 fs/ntfs/file.c|   71 ++-
 fs/ocfs2/aops.c   |   24 +--
 fs/ocfs2/aops.h   |8 -
 fs/ocfs2/file.c   |   44 +++---
 fs/ocfs2/inode.h  |2 
 fs/pipe.c |   12 -
 fs/read_write.c   |  225 ---
 fs/read_write.h   |8 -
 fs/reiserfs/inode.c   |   13 --
 fs/smbfs/file.c   |   28 ++--
 fs/udf/file.c |   13 +-
 fs/xfs/linux-2.6/xfs_aops.c   |   44 +++---
 fs/xfs/linux-2.6/xfs_file.c   |   58 +
 fs/xfs/linux-2.6/xfs_lrw.c|   29 ++--
 fs/xfs/linux-2.6/xfs_lrw.h|   10 -
 fs/xfs/linux-2.6/xfs_vnode.h  |   20 +--
 include/linux/aio.h   |   11 -
 include/linux/fs.h|  114 +-
 include/linux/net.h   |   18 +-
 include/linux/nfs_fs.h|   12 -
 include/net/bluetooth/bluetooth.h |2 
 include/net/inet_common.h |3 
 include/net/scm.h |2 
 include/net/sock.h|   45 +--
 include/net/tcp.h |6 
 include/net/udp.h |3 
 mm/filemap.c  |  109 -
 net/appletalk/ddp.c   |5 
 net/atm/common.c  |6 
 net/atm/common.h  |7 -
 net/ax25/af_ax25.c|7 -
 net/bluetooth/af_bluetooth.c  |4 
 net/bluetooth/hci_sock.c  |7 -
 net/bluetooth/l2cap.c |2 
 net/bluetooth/rfcomm/sock.c   |8 -
 net/bluetooth/sco.c   |3 
 net/core/sock.c   |   12 -
 net/dccp/dccp.h   |8 -
 net/dccp/probe.c  |3 
 net/dccp/proto.c  |7 -
 net/decnet/af_decnet.c|7 -
 net/econet/af_econet.c|7 -
 net/ipv4/af_inet.c|5 
 net/ipv4/raw.c|8 -
 net/ipv4/tcp.c|7 -
 net/ipv4/tcp_probe.c  |3 
 net/ipv4/udp.c|9 -
 net/ipv4/udp_impl.h   |2 
 net/ipv6/raw.c|6 
 net/ipv6/udp.c|   10 -
 net/ipv6/udp_impl.h   |6 
 net/ipx/af_ipx.c  |7 -
 net/irda/af_irda.c|   29 ++--
 net/key/af_key.c 

[PATCH -mm 4/10][RFC] aio: convert aio_complete to file_endio_t

2007-01-15 Thread Nate Diller
Define a new function typedef for I/O completion at the file/iovec level --

typedef void (file_endio_t)(void *endio_data, ssize_t count, int err);

and convert aio_complete and all its callers to this new prototype.

---

 drivers/usb/gadget/inode.c |   24 +++---
 fs/aio.c   |   59 -
 fs/block_dev.c |8 +-
 fs/direct-io.c |   18 +
 fs/nfs/direct.c|9 ++
 include/linux/aio.h|   11 +++-
 include/linux/fs.h |2 +
 7 files changed, 61 insertions(+), 70 deletions(-)

---

diff -urpN -X dontdiff a/drivers/usb/gadget/inode.c b/drivers/usb/gadget/inode.c
--- a/drivers/usb/gadget/inode.c	2007-01-12 14:42:29.000000000 -0800
+++ b/drivers/usb/gadget/inode.c	2007-01-12 14:25:34.000000000 -0800
@@ -559,35 +559,32 @@ static int ep_aio_cancel(struct kiocb *i
 	return value;
 }
 
-static ssize_t ep_aio_read_retry(struct kiocb *iocb)
+static int ep_aio_read_retry(struct kiocb *iocb)
 {
 	struct kiocb_priv	*priv = iocb->private;
-	ssize_t			len, total;
-	int			i;
+	ssize_t			total;
+	int			i, err = 0;
 
 	/* we retry to get the right mm context for this: */
 
 	/* copy stuff into user buffers */
 	total = priv->actual;
-	len = 0;
 	for (i=0; i < priv->nr_segs; i++) {
 		ssize_t this = min((ssize_t)(priv->iv[i].iov_len), total);
 
 		if (copy_to_user(priv->iv[i].iov_base, priv->buf, this)) {
-			if (len == 0)
-				len = -EFAULT;
+			err = -EFAULT;
 			break;
 		}
 
 		total -= this;
-		len += this;
 		if (total == 0)
 			break;
 	}
 	kfree(priv->buf);
 	kfree(priv);
 	aio_put_req(iocb);
-	return len;
+	return err;
 }
 
 static void ep_aio_complete(struct usb_ep *ep, struct usb_request *req)
@@ -610,9 +607,7 @@ static void ep_aio_complete(struct usb_e
 		if (unlikely(kiocbIsCancelled(iocb)))
 			aio_put_req(iocb);
 		else
-			aio_complete(iocb,
-				req->actual ? req->actual : req->status,
-				req->status);
+			aio_complete(iocb, req->actual, req->status);
 	} else {
 		/* retry() won't report both; so we hide some faults */
 		if (unlikely(0 != req->status))
@@ -702,16 +697,17 @@ ep_aio_read(struct kiocb *iocb, const st
 {
 	struct ep_data		*epdata = iocb->ki_filp->private_data;
 	char			*buf;
+	size_t			len = iov_length(iov, nr_segs);
 
 	if (unlikely(epdata->desc.bEndpointAddress & USB_DIR_IN))
 		return -EINVAL;
 
-	buf = kmalloc(iocb->ki_left, GFP_KERNEL);
+	buf = kmalloc(len, GFP_KERNEL);
 	if (unlikely(!buf))
 		return -ENOMEM;
 
 	iocb->ki_retry = ep_aio_read_retry;
-	return ep_aio_rwtail(iocb, buf, iocb->ki_left, epdata, iov, nr_segs);
+	return ep_aio_rwtail(iocb, buf, len, epdata, iov, nr_segs);
 }
 
 static ssize_t
@@ -726,7 +722,7 @@ ep_aio_write(struct kiocb *iocb, const s
 	if (unlikely(!(epdata->desc.bEndpointAddress & USB_DIR_IN)))
 		return -EINVAL;
 
-	buf = kmalloc(iocb->ki_left, GFP_KERNEL);
+	buf = kmalloc(iov_length(iov, nr_segs), GFP_KERNEL);
 	if (unlikely(!buf))
 		return -ENOMEM;
 
diff -urpN -X dontdiff a/fs/aio.c b/fs/aio.c
--- a/fs/aio.c	2007-01-12 14:42:29.000000000 -0800
+++ b/fs/aio.c	2007-01-12 14:29:20.000000000 -0800
@@ -658,16 +658,16 @@ static inline int __queue_kicked_iocb(st
  * simplifies the coding of individual aio operations as
  * it avoids various potential races.
  */
-static ssize_t aio_run_iocb(struct kiocb *iocb)
+static void aio_run_iocb(struct kiocb *iocb)
 {
 	struct kioctx	*ctx = iocb->ki_ctx;
-	ssize_t (*retry)(struct kiocb *);
+	int (*retry)(struct kiocb *);
 	wait_queue_t *io_wait = current->io_wait;
-	ssize_t ret;
+	int err;
 
 	if (!(retry = iocb->ki_retry)) {
 		printk("aio_run_iocb: iocb->ki_retry = NULL\n");
-		return 0;
+		return;
 	}
 
 	/*
@@ -702,8 +702,8 @@ static ssize_t aio_run_iocb(struct kiocb
 
 	/* Quit retrying if the i/o has been cancelled */
 	if (kiocbIsCancelled(iocb)) {
-		ret = -EINTR;
-		aio_complete(iocb, ret, 0);
+		err = -EINTR;
+		aio_complete(iocb, iocb->ki_nbytes - iocb->ki_left, err);
 		/* must not access the iocb after this */
 		goto out;
 	}
@@ -720,17 +720,17 @@ static ssize_t aio_run_iocb(struct kiocb
 	 */
   

[PATCH -mm 5/10][RFC] aio: make blk_directIO use file_endio_t

2007-01-15 Thread Nate Diller
Convert the internals of blkdev_direct_IO to use a generic endio function,
instead of directly calling aio_complete.  This may also fix some bugs/races
in this code, for instance it checks bio->bi_size instead of assuming it's
zero, and it atomically accumulates the bytes_done counter (assuming that
the bio completion handler can't race with itself *might* be valid here, but
the direct-io code makes no such assumption).  I'm also pretty sure that
the address_space->directIO functions aren't supposed to mess with the
iocb->ki_pos or ->ki_left.
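
The completion scheme described here (a reference count that starts at one for the submitter, one extra reference per in-flight bio, and an atomically accumulated byte counter) can be sketched in user space with C11 atomics. All `sketch_*` names are hypothetical, and this only models the counting, not the block layer or the memory management of the real patch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <sys/types.h>

/* Completion callback type as proposed in this series. */
typedef void (file_endio_t)(void *endio_data, ssize_t count, int err);

struct sketch_io {
    atomic_int   iocount;     /* 1 for the submitter + 1 per chunk in flight */
    atomic_long  bytes_done;  /* accumulated by every completion */
    int          err;         /* last error seen, plain int as in the patch */
    file_endio_t *endio;
    void         *endio_data;
};

/* State recorded by the example callback below. */
int     sketch_seen_calls;
ssize_t sketch_seen_count;
struct sketch_io sketch_demo_io;

/* Example endio function: counts calls and remembers the byte total. */
void sketch_note(void *endio_data, ssize_t count, int err)
{
    (void)endio_data;
    (void)err;
    sketch_seen_calls++;
    sketch_seen_count = count;
}

void sketch_io_init(struct sketch_io *io, file_endio_t *endio, void *data)
{
    atomic_init(&io->iocount, 1);
    atomic_init(&io->bytes_done, 0);
    io->err = 0;
    io->endio = endio;
    io->endio_data = data;
}

/* Taken once per chunk before it is submitted. */
void sketch_io_get(struct sketch_io *io)
{
    atomic_fetch_add(&io->iocount, 1);
}

/* Whoever drops the last reference reports completion, exactly once. */
void sketch_io_put(struct sketch_io *io)
{
    if (atomic_fetch_sub(&io->iocount, 1) != 1)
        return;
    io->endio(io->endio_data, (ssize_t)atomic_load(&io->bytes_done), io->err);
}

/* Per-chunk completion handler: account the bytes, then drop its reference. */
void sketch_chunk_done(struct sketch_io *io, long bytes, int error)
{
    if (error)
        io->err = error;
    atomic_fetch_add(&io->bytes_done, bytes);
    sketch_io_put(io);
}
```

Because the submitter holds its own reference until it has queued everything, the callback cannot fire early even if every chunk completes before submission has finished.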

---

diff -urpN -X dontdiff a/fs/block_dev.c b/fs/block_dev.c
--- a/fs/block_dev.c	2007-01-12 20:26:25.000000000 -0800
+++ b/fs/block_dev.c	2007-01-12 20:23:55.000000000 -0800
@@ -131,10 +131,32 @@ blkdev_get_block(struct inode *inode, se
 	return 0;
 }
 
-static int blk_end_aio(struct bio *bio, unsigned int bytes_done, int error)
+struct bdev_aio {
+	atomic_t	iocount;	/* refcount */
+	atomic_t	bytes_done;	/* byte counter */
+	int		err;		/* error handling */
+	file_endio_t	*endio;		/* end I/O notify fn */
+	void		*endio_data;	/* notify fn private data */
+};
+
+static void blk_io_put(struct bdev_aio *io)
+{
+	if (!atomic_dec_and_test(&io->iocount))
+		return;
+
+	if (!io->endio)
+		return complete((struct completion*)io->endio_data);
+
+	io->endio(io->endio_data, atomic_read(&io->bytes_done), io->err);
+	kfree(io);
+}
+
+static int blk_bio_endio(struct bio *bio, unsigned int bytes_done, int error)
 {
-	struct kiocb *iocb = bio->bi_private;
-	atomic_t *bio_count = &iocb->ki_bio_count;
+	struct bdev_aio *io = bio->bi_private;
+
+	if (bio->bi_size)
+		return 1;
 
 	if (bio_data_dir(bio) == READ)
 		bio_check_pages_dirty(bio);
@@ -143,16 +165,21 @@ static int blk_end_aio(struct bio *bio, 
 		bio_put(bio);
 	}
 
-	/* iocb->ki_nbytes stores error code from LLDD */
-	if (error)
-		iocb->ki_nbytes = -EIO;
-
-	if (atomic_dec_and_test(bio_count))
-		aio_complete(iocb, iocb->ki_left, iocb->ki_nbytes);
+	if (error)
+		io->err = error;
+	atomic_add(bytes_done, &io->bytes_done);
 
+	blk_io_put(io);
 	return 0;
 }
 
+static void blk_io_init(struct bdev_aio *io)
+{
+	atomic_set(&io->iocount, 1);
+	atomic_set(&io->bytes_done, 0);
+	io->err = 0;
+}
+
 #define VEC_SIZE	16
 struct pvec {
 	unsigned short nr;
@@ -208,24 +235,33 @@ blkdev_direct_IO(int rw, struct kiocb *i
 
 	unsigned long addr;	/* user iovec address */
 	size_t count;		/* user iovec len */
-	size_t nbytes = iocb->ki_nbytes = iocb->ki_left; /* total xfer size */
+	size_t nbytes;		/* total xfer size */
 	loff_t size;		/* size of block device */
 	struct bio *bio;
-	atomic_t *bio_count = &iocb->ki_bio_count;
+	struct bdev_aio stack_io, *io;
+	file_endio_t *endio = aio_complete;
+	void *endio_data = iocb;
 	struct page *page;
 	struct pvec pvec;
 
 	pvec.nr = 0;
 	pvec.idx = 0;
 
+	io = &stack_io;
+	if (endio) {
+		io = kmalloc(sizeof(struct bdev_aio), GFP_KERNEL);
+		if (!io)
+			return -ENOMEM;
+	}
+	blk_io_init(io);
+
 	if (pos & blocksize_mask)
 		return -EINVAL;
 
+	nbytes = iov_length(iov, nr_segs);
 	size = i_size_read(inode);
-	if (pos + nbytes > size) {
+	if (pos + nbytes > size)
 		nbytes = size - pos;
-		iocb->ki_left = nbytes;
-	}
 
 	/*
 	 * check first non-zero iov alignment, the remaining
@@ -237,7 +273,6 @@ blkdev_direct_IO(int rw, struct kiocb *i
 		if (addr & blocksize_mask || count & blocksize_mask)
 			return -EINVAL;
 	} while (!count && ++seg < nr_segs);
-	atomic_set(bio_count, 1);
 
 	while (nbytes) {
 		/* roughly estimate number of bio vec needed */
@@ -248,8 +283,8 @@ blkdev_direct_IO(int rw, struct kiocb *i
 		/* bio_alloc should not fail with GFP_KERNEL flag */
 		bio = bio_alloc(GFP_KERNEL, nvec);
 		bio->bi_bdev = I_BDEV(inode);
-		bio->bi_end_io = blk_end_aio;
-		bio->bi_private = iocb;
+		bio->bi_end_io = blk_bio_endio;
+		bio->bi_private = io;
 		bio->bi_sector = pos >> blkbits;
 same_bio:
 		cur_off = addr & ~PAGE_MASK;
@@ -289,18 +324,27 @@ same_bio:
 		/* bio is ready, submit it */
 		if (rw == READ)
 			bio_set_pages_dirty(bio);
-		atomic_inc(bio_count);
+		atomic_inc(&io->iocount);
 		submit_bio(rw, bio);
 	}
 
 completion:
-  

[PATCH -mm 6/10][RFC] aio: make nfs_directIO use file_endio_t

2007-01-15 Thread Nate Diller
This converts the internals of nfs's directIO support to use a generic endio
function, instead of directly calling aio_complete.  It's pretty easy
because it already has a pretty abstracted completion path.
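
To illustrate the pattern, here is a minimal user-space sketch of a request
that completes through a caller-supplied callback rather than a hard-wired
aio_complete() call.  The names endio_fn, direct_req and record_result are
hypothetical, and the callback signature is only an assumption modelled on the
dreq->endio(dreq->endio_data, dreq->count, dreq->error) call in the diff below;
the real file_endio_t typedef is defined elsewhere in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Assumed callback shape: opaque cookie, byte count, error code. */
typedef void (endio_fn)(void *endio_data, ssize_t count, long error);

/* Hypothetical stand-in for the completion fields of a direct I/O request. */
struct direct_req {
	ssize_t count;      /* bytes actually processed */
	long error;         /* any reported error */
	endio_fn *endio;    /* NULL when nobody wants an async notification */
	void *endio_data;   /* private data handed back to the callback */
	int completed;      /* stands in for complete_all() in this sketch */
};

/* Notify through the generic callback if one was supplied, then mark done. */
static void direct_req_complete(struct direct_req *req)
{
	assert(req != NULL);
	if (req->endio)
		req->endio(req->endio_data, req->count, req->error);
	req->completed = 1;
}

/* Example consumer: records what the completion path reported. */
struct result {
	ssize_t count;
	long error;
	int calls;
};

static void record_result(void *data, ssize_t count, long error)
{
	struct result *r = data;

	r->count = count;
	r->error = error;
	r->calls++;
}
```

The point of the indirection is that the request no longer needs to know
whether its consumer is the aio core or something else; it only carries a
function pointer and an opaque cookie.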

---

diff -urpN -X dontdiff a/fs/nfs/direct.c b/fs/nfs/direct.c
--- a/fs/nfs/direct.c   2007-01-12 14:53:48.0 -0800
+++ b/fs/nfs/direct.c   2007-01-12 15:02:30.0 -0800
@@ -68,7 +68,6 @@ struct nfs_direct_req {
 
/* I/O parameters */
struct nfs_open_context *ctx;   /* file open context info */
-   struct kiocb *  iocb;   /* controlling i/o request */
struct inode *  inode;  /* target file of i/o */
 
/* completion state */
@@ -77,6 +76,8 @@ struct nfs_direct_req {
ssize_t count,  /* bytes actually processed */
error;  /* any reported error */
struct completion   completion; /* wait for i/o completion */
+   file_endio_t *endio;    /* async completion function */
+   void *endio_data;       /* private completion data */
 
/* commit state */
struct list_head rewrite_list;   /* saved nfs_write_data structs */
@@ -151,7 +152,7 @@ static inline struct nfs_direct_req *nfs
kref_get(&dreq->kref);
init_completion(&dreq->completion);
INIT_LIST_HEAD(&dreq->rewrite_list);
-   dreq->iocb = NULL;
+   dreq->endio = NULL;
dreq->ctx = NULL;
spin_lock_init(&dreq->lock);
atomic_set(&dreq->io_count, 0);
@@ -179,7 +180,7 @@ static ssize_t nfs_direct_wait(struct nf
ssize_t result = -EIOCBQUEUED;
 
/* Async requests don't wait here */
-   if (dreq->iocb)
+   if (!dreq->endio)
goto out;
 
result = wait_for_completion_interruptible(&dreq->completion);
@@ -194,14 +195,10 @@ out:
return (ssize_t) result;
 }
 
-/*
- * Synchronous I/O uses a stack-allocated iocb.  Thus we can't trust
- * the iocb is still valid here if this is a synchronous request.
- */
 static void nfs_direct_complete(struct nfs_direct_req *dreq)
 {
-   if (dreq->iocb)
-   aio_complete(dreq->iocb, dreq->count, dreq->error);
+   if (dreq->endio)
+   dreq->endio(dreq->endio_data, dreq->count, dreq->error);
 
complete_all(&dreq->completion);
 
@@ -332,11 +329,13 @@ static ssize_t nfs_direct_read_schedule(
return result < 0 ? (ssize_t) result : -EFAULT;
 }
 
-static ssize_t nfs_direct_read(struct kiocb *iocb, unsigned long user_addr, size_t count, loff_t pos)
+static ssize_t nfs_direct_read(struct file *file, unsigned long user_addr,
+  size_t count, loff_t pos,
+  file_endio_t *endio, void *endio_data)
 {
ssize_t result = 0;
sigset_t oldset;
-   struct inode *inode = iocb->ki_filp->f_mapping->host;
+   struct inode *inode = file->f_mapping->host;
struct rpc_clnt *clnt = NFS_CLIENT(inode);
struct nfs_direct_req *dreq;
 
@@ -345,9 +344,9 @@ static ssize_t nfs_direct_read(struct ki
return -ENOMEM;
 
dreq-inode = inode;
-   dreq->ctx = get_nfs_open_context((struct nfs_open_context *)iocb->ki_filp->private_data);
-   if (!is_sync_kiocb(iocb))
-   dreq->iocb = iocb;
+   dreq->ctx = get_nfs_open_context((struct nfs_open_context *)file->private_data);
+   dreq->endio = endio;
+   dreq->endio_data = endio_data;
 
nfs_add_stats(inode, NFSIOS_DIRECTREADBYTES, count);
rpc_clnt_sigmask(clnt, &oldset);
@@ -663,11 +662,13 @@ static ssize_t nfs_direct_write_schedule
return result < 0 ? (ssize_t) result : -EFAULT;
 }
 
-static ssize_t nfs_direct_write(struct kiocb *iocb, unsigned long user_addr, size_t count, loff_t pos)
+static ssize_t nfs_direct_write(struct file *file, unsigned long user_addr,
+   size_t count, loff_t pos,
+   file_endio_t *endio, void *endio_data)
 {
ssize_t result = 0;
sigset_t oldset;
-   struct inode *inode = iocb->ki_filp->f_mapping->host;
+   struct inode *inode = file->f_mapping->host;
struct rpc_clnt *clnt = NFS_CLIENT(inode);
struct nfs_direct_req *dreq;
size_t wsize = NFS_SERVER(inode)->wsize;
@@ -682,9 +683,9 @@ static ssize_t nfs_direct_write(struct k
sync = FLUSH_STABLE;
 
dreq-inode = inode;
-   dreq->ctx = get_nfs_open_context((struct nfs_open_context *)iocb->ki_filp->private_data);
-   if (!is_sync_kiocb(iocb))
-   dreq->iocb = iocb;
+   dreq->ctx = get_nfs_open_context((struct nfs_open_context *)file->private_data);
+   dreq->endio = endio;
+   dreq->endio_data = endio_data;
 
nfs_add_stats(inode, NFSIOS_DIRECTWRITTENBYTES, count);
 
@@ -701,10 +702,12 @@ static ssize_t nfs_direct_write(struct k
 
 /**
  * nfs_file_direct_read - file direct read operation for NFS 

[PATCH -mm 3/10][RFC] aio: use iov_length instead of ki_left

2007-01-15 Thread Nate Diller
Convert code using iocb->ki_left to use the more generic iov_length() call.
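
For readers unfamiliar with the helper, the following is a user-space sketch of
what an iov_length()-style function computes: the sum of iov_len over all
segments.  The name iov_total_len is hypothetical and this is not the kernel's
implementation, only an illustration that each call walks the whole vector.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Sum the lengths of all segments in an iovec array.
 * Cost is linear in nr_segs: every call visits every segment.
 */
static size_t iov_total_len(const struct iovec *iov, unsigned long nr_segs)
{
	size_t total = 0;
	unsigned long seg;

	/* A NULL vector is only acceptable when there are no segments. */
	assert(iov != NULL || nr_segs == 0);

	for (seg = 0; seg < nr_segs; seg++)
		total += iov[seg].iov_len;

	return total;
}
```

A field such as ki_left, by contrast, is a single load, which is the trade-off
this conversion makes.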

---

diff -urpN -X dontdiff a/fs/ocfs2/file.c b/fs/ocfs2/file.c
--- a/fs/ocfs2/file.c   2007-01-10 11:50:26.0 -0800
+++ b/fs/ocfs2/file.c   2007-01-10 12:42:09.0 -0800
@@ -1157,7 +1157,7 @@ static ssize_t ocfs2_file_aio_write(stru
   filp->f_path.dentry->d_name.name);
 
/* happy write of zero bytes */
-   if (iocb->ki_left == 0)
+   if (iov_length(iov, nr_segs) == 0)
return 0;
 
mutex_lock(&inode->i_mutex);
@@ -1177,7 +1177,7 @@ static ssize_t ocfs2_file_aio_write(stru
}
 
ret = ocfs2_prepare_inode_for_write(filp->f_path.dentry, &iocb->ki_pos,
-   iocb->ki_left, appending);
+   iov_length(iov, nr_segs), appending);
if (ret < 0) {
mlog_errno(ret);
goto out;
diff -urpN -X dontdiff a/fs/smbfs/file.c b/fs/smbfs/file.c
--- a/fs/smbfs/file.c   2007-01-10 11:50:28.0 -0800
+++ b/fs/smbfs/file.c   2007-01-10 12:42:09.0 -0800
@@ -222,7 +222,7 @@ smb_file_aio_read(struct kiocb *iocb, co
ssize_t status;
 
VERBOSE(file %s/%s, [EMAIL PROTECTED], DENTRY_PATH(dentry),
-   (unsigned long) iocb->ki_left, (unsigned long) pos);
+   (unsigned long) iov_length(iov, nr_segs), (unsigned long) pos);
 
status = smb_revalidate_inode(dentry);
if (status) {
@@ -328,7 +328,7 @@ smb_file_aio_write(struct kiocb *iocb, c
 
VERBOSE(file %s/%s, [EMAIL PROTECTED],
DENTRY_PATH(dentry),
-   (unsigned long) iocb->ki_left, (unsigned long) pos);
+   (unsigned long) iov_length(iov, nr_segs), (unsigned long) pos);
 
result = smb_revalidate_inode(dentry);
if (result) {
@@ -341,7 +341,7 @@ smb_file_aio_write(struct kiocb *iocb, c
if (result)
goto out;
 
-   if (iocb->ki_left > 0) {
+   if (iov_length(iov, nr_segs) > 0) {
result = generic_file_aio_write(iocb, iov, nr_segs, pos);
VERBOSE("pos=%ld, size=%ld, mtime=%ld, atime=%ld\n",
(long) file->f_pos, (long) dentry->d_inode->i_size,
diff -urpN -X dontdiff a/fs/udf/file.c b/fs/udf/file.c
--- a/fs/udf/file.c 2007-01-10 11:53:02.0 -0800
+++ b/fs/udf/file.c 2007-01-10 12:42:09.0 -0800
@@ -109,7 +109,7 @@ static ssize_t udf_file_aio_write(struct
struct file *file = iocb->ki_filp;
struct inode *inode = file->f_path.dentry->d_inode;
int err, pos;
-   size_t count = iocb->ki_left;
+   size_t count = iov_length(iov, nr_segs);
 
if (UDF_I_ALLOCTYPE(inode) == ICBTAG_FLAG_AD_IN_ICB)
{
diff -urpN -X dontdiff a/net/socket.c b/net/socket.c
--- a/net/socket.c  2007-01-10 12:40:54.0 -0800
+++ b/net/socket.c  2007-01-10 12:42:09.0 -0800
@@ -632,7 +632,7 @@ static ssize_t sock_aio_read(struct kioc
if (pos != 0)
return -ESPIPE;
 
-   if (iocb->ki_left == 0) /* Match SYS5 behaviour */
+   if (iov_length(iov, nr_segs) == 0)  /* Match SYS5 behaviour */
return 0;
 
for (i = 0; i < nr_segs; i++)
@@ -660,7 +660,7 @@ static ssize_t sock_aio_write(struct kio
if (pos != 0)
return -ESPIPE;
 
-   if (iocb->ki_left == 0) /* Match SYS5 behaviour */
+   if (iov_length(iov, nr_segs) == 0)  /* Match SYS5 behaviour */
return 0;
 
for (i = 0; i < nr_segs; i++)


[PATCH -mm 9/10][RFC] aio: usb gadget remove aio file ops

2007-01-15 Thread Nate Diller
This removes the aio implementation from the usb gadget file system.  Aside
from making very creative (!) use of the aio retry path, it can't be of any
use performance-wise because it always kmalloc()s a bounce buffer for the
*whole* I/O size.  Perhaps the only reason to keep it around is the ability
to cancel I/O requests, which only applies when using the user space async
I/O interface.  I highly doubt that is enough incentive to justify the extra
complexity here or in user-space, so I think it's a safe bet to remove this. 
If that feature is still desired, it would be possible to implement a sync
interface that does an interruptible sleep.

I can be convinced otherwise, but the alternatives are difficult.  See, for
example, the recent LKML thread "fuse, get_user_pages, flush_anon_page,
aliasing caches and all that again" for why it's waaay easier to kmalloc a
bounce buffer here, and (ab)use the retry interface.
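
As a rough picture of the bounce-buffer approach being removed, here is a
user-space sketch that scatters one contiguous buffer across the caller's iovec
segments.  The function name scatter_bounce is hypothetical, and memcpy()
stands in for copy_to_user(), so none of the mm-context concerns that motivate
the retry path apply to it.  Note that this sketch advances the source offset
after each segment, which the removed loop as quoted below does not appear to
do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy up to len bytes from a contiguous bounce buffer into the
 * segments of an iovec array, in order.  Returns the bytes copied,
 * which is less than len if the segments are too small to hold it.
 */
static size_t scatter_bounce(const struct iovec *iov, unsigned long nr_segs,
			     const char *buf, size_t len)
{
	size_t copied = 0;
	unsigned long i;

	assert(buf != NULL || len == 0);

	for (i = 0; i < nr_segs && copied < len; i++) {
		size_t chunk = iov[i].iov_len;

		if (chunk > len - copied)
			chunk = len - copied;
		if (chunk == 0)
			continue;	/* skip empty segments */

		/* memcpy() is the user-space stand-in for copy_to_user() */
		memcpy(iov[i].iov_base, buf + copied, chunk);
		copied += chunk;
	}
	return copied;
}
```

The whole transfer has to sit in the bounce buffer before any of it reaches the
caller, which is why the scheme cannot help performance for large requests.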

---

diff -urpN -X dontdiff a/drivers/usb/gadget/inode.c b/drivers/usb/gadget/inode.c
--- a/drivers/usb/gadget/inode.c2007-01-10 13:23:46.0 -0800
+++ b/drivers/usb/gadget/inode.c2007-01-10 16:56:09.0 -0800
@@ -527,218 +527,6 @@ static int ep_ioctl (struct inode *inode
 
 /*--*/
 
-/* ASYNCHRONOUS ENDPOINT I/O OPERATIONS (bulk/intr/iso) */
-
-struct kiocb_priv {
-   struct usb_request  *req;
-   struct ep_data  *epdata;
-   void    *buf;
-   const struct iovec  *iv;
-   unsigned long   nr_segs;
-   unsigned    actual;
-};
-
-static int ep_aio_cancel(struct kiocb *iocb, struct io_event *e)
-{
-   struct kiocb_priv   *priv = iocb->private;
-   struct ep_data  *epdata;
-   int value;
-
-   local_irq_disable();
-   epdata = priv->epdata;
-   // spin_lock(&epdata->dev->lock);
-   kiocbSetCancelled(iocb);
-   if (likely(epdata && epdata->ep && priv->req))
-   value = usb_ep_dequeue (epdata->ep, priv->req);
-   else
-   value = -EINVAL;
-   // spin_unlock(&epdata->dev->lock);
-   local_irq_enable();
-
-   aio_put_req(iocb);
-   return value;
-}
-
-static int ep_aio_read_retry(struct kiocb *iocb)
-{
-   struct kiocb_priv   *priv = iocb->private;
-   ssize_t total;
-   int i, err = 0;
-
-   /* we retry to get the right mm context for this: */
-
-   /* copy stuff into user buffers */
-   total = priv->actual;
-   for (i=0; i < priv->nr_segs; i++) {
-   ssize_t this = min((ssize_t)(priv->iv[i].iov_len), total);
-
-   if (copy_to_user(priv->iv[i].iov_base, priv->buf, this)) {
-   err = -EFAULT;
-   break;
-   }
-
-   total -= this;
-   if (total == 0)
-   break;
-   }
-   kfree(priv->buf);
-   kfree(priv);
-   aio_put_req(iocb);
-   return err;
-}
-
-static void ep_aio_complete(struct usb_ep *ep, struct usb_request *req)
-{
-   struct kiocb    *iocb = req->context;
-   struct kiocb_priv   *priv = iocb->private;
-   struct ep_data  *epdata = priv->epdata;
-
-   /* lock against disconnect (and ideally, cancel) */
-   spin_lock(&epdata->dev->lock);
-   priv->req = NULL;
-   priv->epdata = NULL;
-   if (priv->iv == NULL
-   || unlikely(req->actual == 0)
-   || unlikely(kiocbIsCancelled(iocb))) {
-   kfree(req->buf);
-   kfree(priv);
-   iocb->private = NULL;
-   /* aio_complete() reports bytes-transferred _and_ faults */
-   if (unlikely(kiocbIsCancelled(iocb)))
-   aio_put_req(iocb);
-   else
-   aio_complete(iocb, req->actual, req->status);
-   } else {
-   /* retry() won't report both; so we hide some faults */
-   if (unlikely(0 != req->status))
-   DBG(epdata->dev, "%s fault %d len %d\n",
-   ep->name, req->status, req->actual);
-
-   priv->buf = req->buf;
-   priv->actual = req->actual;
-   kick_iocb(iocb);
-   }
-   spin_unlock(&epdata->dev->lock);
-
-   usb_ep_free_request(ep, req);
-   put_ep(epdata);
-}
-
-static ssize_t
-ep_aio_rwtail(
-   struct kiocb    *iocb,
-   char    *buf,
-   size_t  len,
-   struct ep_data  *epdata,
-   const struct iovec *iv,
-   unsigned long   nr_segs
-)
-{
-   struct kiocb_priv   *priv;
-   struct usb_request  *req;
-   ssize_t value;
-
-   priv = kmalloc(sizeof *priv, GFP_KERNEL);
-   if (!priv) {
-   value = -ENOMEM;
-fail:
-   kfree(buf);
-   return value;
-   }
-   iocb->private = priv;
-   

[PATCH -mm 7/10][RFC] aio: make __blockdev_direct_IO use file_endio_t

2007-01-15 Thread Nate Diller
This converts the internals of __blockdev_direct_IO in fs/direct-io.c to use
a generic endio function, instead of directly calling aio_complete.  It also
changes the semantics of dio_iodone to be more friendly to its only users,
xfs and ocfs2.  This allows the caller to know how to release locks and tear
down data structures on error.

It also converts the _own_locking and _no_locking variants of
blockdev_direct_IO to use a generic endio function.

---

 fs/direct-io.c  |   74 ++--
 fs/gfs2/ops_address.c   |6 +--
 fs/ocfs2/aops.c |   15 ++--
 fs/ocfs2/aops.h |8 
 fs/ocfs2/file.c |   18 --
 fs/ocfs2/inode.h|2 -
 fs/xfs/linux-2.6/xfs_aops.c |   33 +++
 include/linux/fs.h  |   57 ++---
 8 files changed, 104 insertions(+), 109 deletions(-)

---

diff -urpN -X dontdiff a/fs/direct-io.c b/fs/direct-io.c
--- a/fs/direct-io.c2007-01-12 14:53:48.0 -0800
+++ b/fs/direct-io.c2007-01-12 15:06:44.0 -0800
@@ -67,7 +67,7 @@ struct dio {
struct bio *bio;/* bio under assembly */
struct inode *inode;
int rw;
-   loff_t i_size;  /* i_size when submitted */
+   unsigned max_to_read;   /* (i_size when submitted) - offset */
int lock_type;  /* doesn't change */
unsigned blkbits;   /* doesn't change */
unsigned blkfactor; /* When we're using an alignment which
@@ -89,6 +89,7 @@ struct dio {
int reap_counter;   /* rate limit reaping */
get_block_t *get_block; /* block mapping function */
dio_iodone_t *end_io;   /* IO completion function */
+   void *destructor_data;  /* private data for completion fn */
sector_t final_block_in_bio;/* current final block in bio + 1 */
sector_t next_block_for_io; /* next block to be put under IO,
   in dio_blocks units */
@@ -127,7 +128,8 @@ struct dio {
struct task_struct *waiter; /* waiting task (NULL if none) */
 
/* AIO related stuff */
-   struct kiocb *iocb; /* kiocb */
+   file_endio_t *file_endio;   /* aio completion function */
+   void *endio_data;   /* private data for aio completion */
int is_async;   /* is IO async ? */
int io_error;   /* IO error in completion path */
ssize_t result; /* IO result */
@@ -222,7 +224,7 @@ static struct page *dio_get_page(struct 
  * filesystems can use it to hold additional state between get_block calls and
  * dio_complete.
  */
-static int dio_complete(struct dio *dio, loff_t offset, int ret)
+static int dio_complete(struct dio *dio, int ret)
 {
/*
 * AIO submission can race with bio completion to get here while
@@ -232,25 +234,21 @@ static int dio_complete(struct dio *dio,
 */
if (ret == -EIOCBQUEUED)
ret = 0;
+   if (ret == 0)
+   ret = dio->page_errors;
+   if (ret == 0)
+   ret = dio->io_error;
 
if (dio->result) {
/* Check for short read case */
-   if ((dio->rw == READ) && ((offset + dio->result) > dio->i_size))
-   dio->result = dio->i_size - offset;
+   if ((dio->rw == READ) && (dio->result > dio->max_to_read))
+   dio->result = dio->max_to_read;
}
 
-   if (dio->end_io && dio->result)
-   dio->end_io(dio->iocb, offset, dio->result,
-   dio->map_bh.b_private);
if (dio->lock_type == DIO_LOCKING)
/* lockdep: non-owner release */
up_read_non_owner(&dio->inode->i_alloc_sem);
 
-   if (ret == 0)
-   ret = dio->page_errors;
-   if (ret == 0)
-   ret = dio->io_error;
-
return ret;
 }
 
@@ -277,8 +275,11 @@ static int dio_bio_end_aio(struct bio *b
spin_unlock_irqrestore(&dio->bio_lock, flags);
 
if (remaining == 0) {
-   int err = dio_complete(dio, dio->iocb->ki_pos, 0);
-   aio_complete(dio->iocb, dio->result, err);
+   int err = dio_complete(dio, 0);
+   if (dio->end_io)
+   dio->end_io(dio->destructor_data, dio->result,
+   dio->map_bh.b_private);
+   dio->file_endio(dio->endio_data, dio->result, err);
kfree(dio);
}
 
@@ -944,10 +945,11 @@ out:
  * Releases both i_mutex and i_alloc_sem
  */
 static ssize_t
-direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode, 
+direct_io_worker(int rw, struct file *file, struct inode *inode, 
const struct iovec *iov, loff_t offset, unsigned long nr_segs, 
unsigned blkbits, get_block_t get_block, 

Re: [PATCH -mm 3/10][RFC] aio: use iov_length instead of ki_left

2007-01-15 Thread Christoph Hellwig
On Mon, Jan 15, 2007 at 05:54:50PM -0800, Nate Diller wrote:
> Convert code using iocb->ki_left to use the more generic iov_length() call.

No way.  We need to reduce the number of iovec traversals, not add
more of them.
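
To make the cost concrete, here is a small user-space sketch with an
instrumented length helper.  All names are hypothetical and the two callers are
only loosely modelled on a write path that needs the total length both for a
zero-length check and for a later call; it is not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Instrumentation only: counts how many segments have been visited. */
static unsigned long segs_visited;

/* Same summation an iov_length()-style helper performs, plus counting. */
static size_t counted_iov_length(const struct iovec *iov, unsigned long nr_segs)
{
	size_t total = 0;
	unsigned long seg;

	assert(iov != NULL || nr_segs == 0);
	for (seg = 0; seg < nr_segs; seg++) {
		total += iov[seg].iov_len;
		segs_visited++;
	}
	return total;
}

/* Recomputes the length at each use: two full traversals per call. */
static size_t write_recomputing(const struct iovec *iov, unsigned long nr_segs)
{
	if (counted_iov_length(iov, nr_segs) == 0)
		return 0;
	return counted_iov_length(iov, nr_segs);
}

/* Computes the length once and reuses it: one traversal per call. */
static size_t write_cached(const struct iovec *iov, unsigned long nr_segs)
{
	size_t count = counted_iov_length(iov, nr_segs);

	if (count == 0)
		return 0;
	return count;
}
```

With a four-segment vector the first variant visits eight segments and the
second visits four; a precomputed field avoids the walk entirely.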



[PATCH -mm 8/10][RFC] aio: make direct_IO aops use file_endio_t

2007-01-15 Thread Nate Diller
This converts the _locking variant of blockdev_direct_IO to use a generic
endio function, and updates all the FS callsites.

---

 Documentation/filesystems/Locking |5 +++--
 Documentation/filesystems/vfs.txt |5 +++--
 fs/block_dev.c|9 -
 fs/ext2/inode.c   |   12 +---
 fs/ext3/inode.c   |   11 +--
 fs/ext4/inode.c   |   11 +--
 fs/fat/inode.c|   12 ++--
 fs/gfs2/ops_address.c |8 
 fs/hfs/inode.c|   13 ++---
 fs/hfsplus/inode.c|   13 ++---
 fs/jfs/inode.c|   12 +---
 fs/nfs/direct.c   |8 +---
 fs/ocfs2/aops.c   |9 +
 fs/reiserfs/inode.c   |   13 +
 fs/xfs/linux-2.6/xfs_aops.c   |   11 ++-
 fs/xfs/linux-2.6/xfs_lrw.c|4 ++--
 include/linux/fs.h|   28 +---
 include/linux/nfs_fs.h|4 ++--
 mm/filemap.c  |   34 ++
 19 files changed, 108 insertions(+), 114 deletions(-)

---

diff -urpN -X dontdiff a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
--- a/Documentation/filesystems/Locking 2007-01-12 20:26:06.0 -0800
+++ b/Documentation/filesystems/Locking 2007-01-12 20:42:37.0 -0800
@@ -169,8 +169,9 @@ prototypes:
sector_t (*bmap)(struct address_space *, sector_t);
int (*invalidatepage) (struct page *, unsigned long);
int (*releasepage) (struct page *, int);
-   int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
-   loff_t offset, unsigned long nr_segs);
+   int (*direct_IO)(int, struct file *, const struct iovec *iov,
+   loff_t offset, unsigned long nr_segs,
+   file_endio_t *endio, void *endio_data);
int (*launder_page) (struct page *);
 
 locking rules:
diff -urpN -X dontdiff a/Documentation/filesystems/vfs.txt 
b/Documentation/filesystems/vfs.txt
--- a/Documentation/filesystems/vfs.txt 2007-01-12 20:26:06.0 -0800
+++ b/Documentation/filesystems/vfs.txt 2007-01-12 20:42:37.0 -0800
@@ -537,8 +537,9 @@ struct address_space_operations {
sector_t (*bmap)(struct address_space *, sector_t);
int (*invalidatepage) (struct page *, unsigned long);
int (*releasepage) (struct page *, int);
-   ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
-   loff_t offset, unsigned long nr_segs);
+   ssize_t (*direct_IO)(int, struct file *, const struct iovec *iov,
+   loff_t offset, unsigned long nr_segs,
+   file_endio_t *endio, void *endio_data);
struct page* (*get_xip_page)(struct address_space *, sector_t,
int);
/* migrate the contents of a page to the specified target */
diff -urpN -X dontdiff a/fs/block_dev.c b/fs/block_dev.c
--- a/fs/block_dev.c2007-01-12 20:29:02.0 -0800
+++ b/fs/block_dev.c2007-01-12 20:42:37.0 -0800
@@ -222,10 +222,11 @@ static void blk_unget_page(struct page *
 }
 
 static ssize_t
-blkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
-loff_t pos, unsigned long nr_segs)
+blkdev_direct_IO(int rw, struct file *file, const struct iovec *iov,
+loff_t pos, unsigned long nr_segs, file_endio_t *endio,
+void *endio_data)
 {
-   struct inode *inode = iocb->ki_filp->f_mapping->host;
+   struct inode *inode = file->f_mapping->host;
unsigned blkbits = blksize_bits(bdev_hardsect_size(I_BDEV(inode)));
unsigned blocksize_mask = (1 << blkbits) - 1;
unsigned long seg = 0;  /* iov segment iterator */
@@ -239,8 +240,6 @@ blkdev_direct_IO(int rw, struct kiocb *i
loff_t size;/* size of block device */
struct bio *bio;
struct bdev_aio stack_io, *io;
-   file_endio_t *endio = aio_complete;
-   void *endio_data = iocb;
struct page *page;
struct pvec pvec;
 
diff -urpN -X dontdiff a/fs/ext2/inode.c b/fs/ext2/inode.c
--- a/fs/ext2/inode.c   2007-01-12 20:26:06.0 -0800
+++ b/fs/ext2/inode.c   2007-01-12 20:42:37.0 -0800
@@ -752,14 +752,12 @@ static sector_t ext2_bmap(struct address
 }
 
 static ssize_t
-ext2_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
-   loff_t offset, unsigned long nr_segs)
+ext2_direct_IO(int rw, struct file *file, const struct iovec *iov,
+  loff_t offset, unsigned long nr_segs, file_endio_t *endio,
+  void *endio_data)
 {
-   struct file *file = iocb->ki_filp;
-   struct inode *inode = file->f_mapping->host;
-
-   return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
-   

[PATCH -mm 2/10][RFC] aio: net use struct socket for io

2007-01-15 Thread Nate Diller
Remove unused arg from socket operations

The sendmsg and recvmsg socket operations take a kiocb pointer, but none of
the functions actually use it.  There's really no need for it, even
theoretically, and it's quite ugly having it there at all.  Also, removing it
will pave the way for a more generic completion path in the file_operations.

---

 drivers/net/pppoe.c   |8 +++
 include/linux/net.h   |   18 +++--
 include/net/bluetooth/bluetooth.h |2 -
 include/net/inet_common.h |3 --
 include/net/sock.h|   19 --
 include/net/tcp.h |6 ++---
 include/net/udp.h |3 --
 net/appletalk/ddp.c   |5 +---
 net/atm/common.c  |6 +
 net/atm/common.h  |7 ++
 net/ax25/af_ax25.c|7 ++
 net/bluetooth/af_bluetooth.c  |4 +--
 net/bluetooth/hci_sock.c  |7 ++
 net/bluetooth/l2cap.c |2 -
 net/bluetooth/rfcomm/sock.c   |8 +++
 net/bluetooth/sco.c   |3 --
 net/core/sock.c   |   12 ---
 net/dccp/dccp.h   |8 +++
 net/dccp/probe.c  |3 --
 net/dccp/proto.c  |7 ++
 net/decnet/af_decnet.c|7 ++
 net/econet/af_econet.c|7 ++
 net/ipv4/af_inet.c|5 +---
 net/ipv4/raw.c|8 ++-
 net/ipv4/tcp.c|7 ++
 net/ipv4/tcp_probe.c  |3 --
 net/ipv4/udp.c|9 +++-
 net/ipv4/udp_impl.h   |2 -
 net/ipv6/raw.c|6 +
 net/ipv6/udp.c|   10 +++--
 net/ipv6/udp_impl.h   |6 +
 net/ipx/af_ipx.c  |7 ++
 net/irda/af_irda.c|   29 +---
 net/key/af_key.c  |6 +
 net/llc/af_llc.c  |7 ++
 net/netlink/af_netlink.c  |6 +
 net/netrom/af_netrom.c|7 ++
 net/packet/af_packet.c|   11 --
 net/rose/af_rose.c|7 ++
 net/sctp/socket.c |9 +++-
 net/socket.c  |   32 ++-
 net/tipc/socket.c |   28 +--
 net/unix/af_unix.c|   39 +++---
 net/wanrouter/af_wanpipe.c|7 ++
 net/x25/af_x25.c  |6 +
 45 files changed, 166 insertions(+), 243 deletions(-)

---

diff -urpN -X dontdiff a/drivers/net/pppoe.c b/drivers/net/pppoe.c
--- a/drivers/net/pppoe.c   2007-01-12 11:18:47.244855016 -0800
+++ b/drivers/net/pppoe.c   2007-01-12 11:29:21.179177108 -0800
@@ -746,8 +746,8 @@ static int pppoe_ioctl(struct socket *so
 }
 
 
-static int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *m, size_t total_len)
+static int pppoe_sendmsg(struct socket *sock, struct msghdr *m,
+size_t total_len)
 {
struct sk_buff *skb = NULL;
struct sock *sk = sock->sk;
@@ -912,8 +912,8 @@ static struct ppp_channel_ops pppoe_chan
.start_xmit = pppoe_xmit,
 };
 
-static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,
- struct msghdr *m, size_t total_len, int flags)
+static int pppoe_recvmsg(struct socket *sock, struct msghdr *m,
+size_t total_len, int flags)
 {
struct sock *sk = sock->sk;
struct sk_buff *skb = NULL;
diff -urpN -X dontdiff a/include/linux/net.h b/include/linux/net.h
--- a/include/linux/net.h   2007-01-12 11:18:56.683629587 -0800
+++ b/include/linux/net.h   2007-01-12 11:29:21.185175058 -0800
@@ -118,7 +118,6 @@ struct socket {
 
 struct vm_area_struct;
 struct page;
-struct kiocb;
 struct sockaddr;
 struct msghdr;
 struct module;
@@ -156,11 +155,10 @@ struct proto_ops {
  int optname, char __user *optval, int optlen);
int (*compat_getsockopt)(struct socket *sock, int level,
  int optname, char __user *optval, int __user *optlen);
-   int (*sendmsg)   (struct kiocb *iocb, struct socket *sock,
- struct msghdr *m, size_t total_len);
-   int (*recvmsg)   (struct kiocb *iocb, struct socket *sock,
- struct msghdr *m, size_t total_len,
- int flags);
+   int (*sendmsg)   (struct socket *sock, struct msghdr *m,
+ size_t total_len);
+   int (*recvmsg)   (struct socket *sock, struct msghdr *m,
+ size_t total_len, int flags);
int