date:20061109

Re: Generic Netlink HOW-TO based on Jamal's original doc

2006-11-09 Thread Paul Moore

James Morris wrote:
>>An Introduction To Using Generic Netlink
>>===
> 
> 
> Wow, this is great!

Thanks.  I consider it an act of penance for all of the evil things I did
with Netlink on my first few iterations of NetLabel ;)

-- 
paul moore
linux security @ hp
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Generic Netlink HOW-TO based on Jamal's original doc

2006-11-09 Thread James Morris


> An Introduction To Using Generic Netlink
> ===

Wow, this is great!


-- 
James Morris
<[EMAIL PROTECTED]>

Generic Netlink HOW-TO based on Jamal's original doc

2006-11-09 Thread Paul Moore

A couple of months ago I promised Jamal and Thomas I would post some comments to
Jamal's original genetlink how-to.  However, as I started to work on the
document the diff from the original started to get a little ridiculous, so
instead of posting a patch against Jamal's original how-to I'm just posting the
revised document in its entirety.

In the document below I tried to summarize all of the things I learned while
developing NetLabel.  Some of it came from Jamal's document, some from the
kernel code, and some from discussions with Thomas.  Hopefully this document
will make it much easier for others to use genetlink in the future.

If this text below is acceptable to everyone, should this be added to the
Documentation directory?


An Introduction To Using Generic Netlink
========================================

Last Updated: November 10, 2006

Table of Contents

 1. Introduction
    1.1. Document Overview
    1.2. Netlink And Generic Netlink
 2. Architectural Overview
 3. Generic Netlink Families
    3.1. Family Overview
         3.1.1. The genl_family Structure
         3.1.2. The genl_ops Structure
    3.2. Registering A Family
 4. Generic Netlink Communications
    4.1. Generic Netlink Message Format
    4.2. Kernel Communication
         4.2.1. Sending Messages
         4.2.2. Receiving Messages
    4.3. Userspace Communication
 5. Recommendations
    5.1. Attributes And Message Payloads
    5.2. Operation Granularity
    5.3. Acknowledgment And Error Reporting


1. Introduction
---------------

1.1. Document Overview
----------------------

This document gives a brief introduction to Generic Netlink, some simple
examples of how to use it, and some recommendations on how to make the most of
the Generic Netlink communications interface.  While this document does not
require that the reader have a detailed understanding of what Netlink is
and how it works, some basic Netlink knowledge is assumed.  As usual, the
kernel source code is your best friend here.

While this document talks briefly about Generic Netlink from a userspace point
of view, its primary focus is on the kernel's Generic Netlink API.  It is
recommended that application developers who are interested in using Generic
Netlink make use of the libnl library[1].

[1] http://people.suug.ch/~tgr/libnl

1.2. Netlink And Generic Netlink
--------------------------------

Netlink is a flexible, robust wire-format communications channel typically
used for kernel to user communication although it can also be used for
user to user and kernel to kernel communications.  Netlink communication
channels are associated with families or "busses", where each bus deals with a
specific service; for example, different Netlink busses exist for routing,
XFRM, netfilter, and several other kernel subsystems.  More information about
Netlink can be found in RFC 3549[1].

Over the years, Netlink has become very popular, which has brought about a very
real concern that the number of Netlink family numbers may be exhausted in the
near future.  In response to this, the Generic Netlink family was created; it
acts as a Netlink multiplexer, allowing multiple services to use a single
Netlink bus.

[1] ftp://ftp.rfc-editor.org/in-notes/rfc3549.txt

2. Architectural Overview
-------------------------

Figure #1 illustrates the basic Generic Netlink architecture, which is
composed of five different types of components.

 1) The Netlink subsystem which serves as the underlying transport layer for
    all of the Generic Netlink communications.

 2) The Generic Netlink bus which is implemented inside the kernel, but which
    is available to userspace through the socket API and inside the kernel via
    the normal Netlink and Generic Netlink APIs.

 3) The Generic Netlink users who communicate with each other over the Generic
    Netlink bus; users can exist both in kernel and user space.

 4) The Generic Netlink controller which is part of the kernel and is
    responsible for dynamically allocating Generic Netlink communication
    channels and other management tasks.  The Generic Netlink controller is
    implemented as a standard Generic Netlink user; however, it listens on a
    special, pre-allocated Generic Netlink channel.

 5) The kernel socket API.  Generic Netlink sockets are created with the
    PF_NETLINK domain and the NETLINK_GENERIC protocol values.
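
Item 5 can be sketched from userspace as follows.  This is a minimal sketch,
assuming a Linux system with the standard Netlink headers; the helper names
genl_local_addr() and genl_open() are hypothetical and exist only for
illustration.  Real applications would normally go through libnl rather than
raw sockets.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical helper: build the local Netlink address to bind to.
 * nl_pid = 0 lets the kernel assign a unique port id at bind() time. */
static struct sockaddr_nl genl_local_addr(void)
{
    struct sockaddr_nl addr;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = 0;
    addr.nl_groups = 0;    /* no multicast groups */
    return addr;
}

/* Hypothetical helper: open and bind a Generic Netlink socket.
 * Returns the file descriptor, or -1 on failure. */
static int genl_open(void)
{
    struct sockaddr_nl addr = genl_local_addr();
    int fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_GENERIC);

    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The socket returned by genl_open() only reaches the Generic Netlink bus; a
user still has to ask the controller for the channel of the family it wants
to talk to before exchanging messages.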

  +---------------------+      +---------------------+
  | (3) application "A" |      | (3) application "B" |
  +------+--------------+      +--------------+------+
         |                                    |
         \                                    /
          \                                  /
           |                                |

bcm43xx-d80211 broadcast reception with WPA

2006-11-09 Thread Paul Hampson


Hi,

Long time lurker, first time poster. ^_^

I've been backporting the bcm43xx-d80211 driver to whatever the released
2.6 kernel was using the rt2x00 project's d80211 stack (equivalent to
current wireless-dev but with a workaround for not having a ieee80211_dev
pointer and still using the _tfm interface instead of the _cypher interface.)

As of last night's wireless-dev tree bcm43xx, everything seems to be
operating fine except incoming broadcast traffic is coming in 14 bytes too
long and scrambled. I presume this means it's not decrypting properly...

Anyway, I just thought I'd mention it. It might have gone unnoticed by the
bcm43xx-d80211 developers, since it doesn't interfere with normal operation
(A DHCP client's only broadcasts are outgoing) and only showed up for me
because radvd's RAs were not arriving and my IPv6 address was not being set.

I couldn't find any mention of such a thing on the list, and I'm happy to
provide whatever debugging output is useful, but the laptop with the device
isn't with me at the moment.

Relevant facts:

Platform: Debian/unstable (PPC) w/linux-image-2.6.18-1-powerpc (2.6.18-3)
Drivers:  bcm43xx-d80211 from wireless-dev
          774f233b7915a2c36480eb4d98e6f57938f04b7b
Firmware: 4.80.46.0 (BE, from AppleAirPortBrcm4311)
Stack:    ieee80211 from http://rt2x00.serialmonkey.com/rt2x00-cvs-daily.tar.gz
          2006110303 is the date on the output, I believe. Hasn't been updated
          since 20061028
Plus a backport of the following commits:
  [PATCH] d80211: extend extra_hdr_room to be a bytecount
    522e078b9f1f8309770dd161d90ddac1573a7877
  [PATCH] d80211: remove unused variable in ieee80211_rx_irqsafe
    10bfc9cdf9621385a3b69aa35f9fa86cc6a46bc6
  [PATCH] d80211: Add wireless statistics
    448bf25bc9e3d70a211fdf235426472089371c43
(as well as anything else that showed up in a diff of the d80211 dir against
the rt2x00 ieee80211 dir and wasn't a 2.6.19ism or wireless-devism)

I'm basically using the instructions I posted at [1] except also patching
rt2x00's ieee80211 stack.

I acknowledge that any of the firmware version, the backporting, the forward
porting or the current lunar cycle may be causing this problem. If no one pipes
up with an insight, I'll try tonight with a v3 firmware, although the reason I
moved to a v3 firmware was my previous build of bcm43xx-d80211 also wasn't
getting an IPv6 address, although I don't believe the RAs were scrambled in
that case.

[1] http://openfacts.berlios.de/index-en.phtml?title=Broadcom_43xx_Linux_Driver/Debian_Unstable_with_Devicescape_802.11_stack

--
Paul "TBBle" Hampson
Opinions expressed here do not reflect the views of my employer
Hell, we don't even agree on my pay cheque


Questions regarding network drivers

2006-11-09 Thread Jonathan Day

Hi,

I've got an interesting problem to contend with and
need some advice from the great wise ones here.

First of all, is it possible (and/or "reasonable
practice") when developing a network driver to do
zero-copy transfers between main memory and the
network device?

Secondly, the network device is only designed to work
with short packets and I really want to keep the
throughput up. My thought was that if I fired off an
interrupt and then transferred a page of data into an
area I know is safe, the kernel would have enough time
to find a new safe area and post the address before the
next page is ready to send.

Can anyone suggest why this wouldn't work or, assuming
it can work, why this would be a Bad Idea?

Lastly, assuming my sanity lasts that long, would I be
correct in assuming that the first step in the process
of getting the driver peer-reviewed and accepted would
be to post the patches here?

Thanks for any help,

Jonathan Day



 


Re: [patch 3/9] iPhase: 64bit cleanup

2006-11-09 Thread Alan Cox

On Thu, 2006-11-09 at 16:12 -0800, David Miller wrote:
> Really, this driver has a ton of unresolved portability problems.

Agreed - but at least it's now 64bit clean. No objection to leaving it
!64BIT at all even with the patch merged.


Re: [PATCH 4/4] skge: version 1.9

2006-11-09 Thread Jay Vosburgh

Michael Stone <[EMAIL PROTECTED]> wrote:

>On Tue, Nov 07, 2006 at 12:28:26PM -0800, Jay Vosburgh wrote:
>>  Can you provide some bonding configuration details?  Which mode,
>>options, etc, as well as the relevant bits from dmesg (you can send it
>> to me privately if it's huge)?
>
>I think I sent another message that I'm just doing ifenslave bond0 eth2
>ifenslave bond0 eth3 with no particular options set.

I was thinking of the options to the bonding driver (probably in
the network configuration or /etc/modprobe.conf); a reasonable set of
information is in /proc/net/bonding/bond0, so that's a good place to
start, along with whatever "uname -a" prints (kernel version,
architecture, mostly).

>[...] The application is a network sniffer, and
>the cards are forced to 1000Mbps/full duplex because the other end doesn't
>negotiate. What's the relevant part of dmesg?

Well, pretty much anything from the bonding driver or the
ethernet driver.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

Re: [PATCH 5/5] ip-sysctl.txt alphabetize

2006-11-09 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 31 Oct 2006 15:01:45 -0800

> Rearrange TCP entries in alpha order.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Also applied, thanks a lot.

Re: [PATCH 3/5] tcp: restrict congestion control choices

2006-11-09 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 31 Oct 2006 15:01:43 -0800

> Allow normal users to only choose among a restricted set of congestion
> control choices.  The default is reno and whatever has been configured
> as default. But the policy can be changed by the administrator at any time.
> 
> For example, to allow any choice:
> cp /proc/sys/net/ipv4/tcp_available_congestion_control \
>/proc/sys/net/ipv4/tcp_allowed_congestion_control
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied, thanks a lot Stephen.

Re: [PATCH 4/5] tcp: allow autoloading of congestion control via setsockopt

2006-11-09 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 31 Oct 2006 15:01:44 -0800

> If user has permission to load modules, then attempt
> autoload of TCP congestion module.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied, thanks.
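
For context, an application picks a congestion control algorithm per socket
with the TCP_CONGESTION socket option, and that is the call path the autoload
above hooks into.  A minimal sketch, assuming Linux and a libc whose
<netinet/tcp.h> defines TCP_CONGESTION; the helper name set_congestion() is
hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: ask the kernel to use the named congestion
 * control algorithm on a TCP socket.  Returns 0 on success or -1 on
 * error, with errno set by setsockopt(). */
static int set_congestion(int fd, const char *name)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                      name, (socklen_t)strlen(name));
}
```

Calling set_congestion(fd, "reno") should work for any user since reno is
always allowed; whether other names succeed for an unprivileged caller
depends on the tcp_allowed_congestion_control policy from patch 3/5.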

Re: [PATCH 4/4] skge: version 1.9

2006-11-09 Thread Michael Stone


On Tue, Nov 07, 2006 at 12:28:26PM -0800, Jay Vosburgh wrote:
> Can you provide some bonding configuration details?  Which mode,
> options, etc, as well as the relevant bits from dmesg (you can send it
> to me privately if it's huge)?

I think I sent another message that I'm just doing
    ifenslave bond0 eth2
    ifenslave bond0 eth3
with no particular options set. The application is a network sniffer,
and the cards are forced to 1000Mbps/full duplex because the other end
doesn't negotiate. What's the relevant part of dmesg?


Mike Stone

Re: [PATCH 2/5] tcp: add tcp_available_congestion_control sysctl

2006-11-09 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 31 Oct 2006 15:01:42 -0800

> Create /proc/sys/net/ipv4/tcp_available_congestion_control
> that reflects currently available TCP choices.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied, thanks Stephen.

Re: [PATCH] warning in SCTP

2006-11-09 Thread David Miller

From: Sridhar Samudrala <[EMAIL PROTECTED]>
Date: Thu, 02 Nov 2006 10:29:49 -0800

> On Thu, 2006-11-02 at 11:09 -0500, Vlad Yasevich wrote:
> > Meelis Roos wrote:
> > >> Actually, I'm backing this one out, it creates new warnings because
> > >> callers of this function pass in a "const" pointer.
> > > 
> > > Yes, it now seems it's not so simple. Marking it non-const there would
> > > mark the it non-const in the whole family of sctp_state_fn_t and I'm not
> > > sure that's the best thing to do. I guess the maintainer has better
> > > bases for deciding what to do about it.
> > > 
> > 
> > An alternate solution would be to make the digest a pointer, allocate
> > it in sctp_endpoint_init() and free it in sctp_endpoint_destroy().
> 
> I agree that this is a better solution.
> 
> Acked-by: Sridhar Samudrala <[EMAIL PROTECTED]>

Applied to net-2.6.20, thanks everyone.

Re: Intel 82559 NIC corrupted EEPROM

2006-11-09 Thread Jesse Brandeburg

On 11/9/06, John <[EMAIL PROTECTED]> wrote:

>> The second thought is that the adapter is in D3, and something about
>> your kernel or the driver doesn't successfully wake it up to D0.
>
> On my NICs, the EEPROM ID (Word 0Ah) is set to 0x40a2.
> Thus DDPD (bit 6) is set to 0.
>
> DDPD is the "Disable Deep Power Down while PME is disabled" bit.
> 0 - Deep Power Down is enabled in D3 state while PME-disabled.
> 1 - Deep Power Down disabled in D3 state while PME-disabled.
> This bit should be set to 1b if a TCO controller is being used via the
> SMB because it requires receive functionality at all power states.
>
> Are you suggesting I try and set DDPD to 1?
> Or is this completely unrelated?
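
The bit arithmetic behind that reading can be checked with a few lines of C.
This is only a sketch of the decoding described above: the macro and function
names are hypothetical, and nothing here touches the EEPROM itself.

```c
#include <stdint.h>

#define EEPROM_ID_DDPD_BIT 6    /* bit position as described above */

/* Return 1 if the DDPD bit is set in the EEPROM ID word, else 0. */
static int eeprom_ddpd_set(uint16_t id_word)
{
    return (id_word >> EEPROM_ID_DDPD_BIT) & 1;
}

/* Return the ID word with DDPD forced to 1 (deep power down disabled). */
static uint16_t eeprom_set_ddpd(uint16_t id_word)
{
    return (uint16_t)(id_word | (1u << EEPROM_ID_DDPD_BIT));
}
```

For the reported value, eeprom_ddpd_set(0x40a2) yields 0, and setting the
bit would turn the word into 0x40e2.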

This may be related but I doubt it.  Something is strange about how
memory is being mapped in your system.  whatever is creating the
problem moved when you changed the kernel version.  I'm wondering if
there is a device collision at e5302000.  I'm not convinced at this
point it is e100's fault.

can you send output of cat /proc/iomem

>> An indication of this would be looking at lspci -vv before/after
>> loading the driver.
>
> $ diff -u lspci_vv_before_e100.txt lspci_vv_after_e100.txt
> --- lspci_vv_before_e100.txt 2006-11-09 14:51:30.0 +0100
> +++ lspci_vv_after_e100.txt 2006-11-09 14:51:30.0 +0100
> @@ -74,21 +74,20 @@
>  Expansion ROM at 2000 [disabled] [size=1M]
>  Capabilities: [dc] Power Management version 2
>  Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
> -   Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
> +   Status: D0 PME-Enable- DSel=0 DScale=2 PME-

okay when the driver loads it is clearing PME enable, but not
re-enabling it when it unloads.  That is pretty much expected.

>  00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
>  Subsystem: Intel Corporation EtherExpress PRO/100B (TX)
> -   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> +   Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>  Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-

pci_enable_device should be enabling io,mem,busmaster, they are
probably being disabled when the driver errors out of init.  maybe you
should add a call to  pci_set_power_state(dev, PCI_D0); before the
call to e100_reset

>> Also, after loading/unloading eepro100 does the e100 driver work?
>
> No.

now that is really odd.

>> A third idea is look for a master abort in lspci after e100 fails to
>> load.
>
> I don't understand that one.

There isn't one, MAbort+ would be showing in the above lspci output.

The all 0x returns when you read registers is a sure sign the
hardware either isn't at the address specified or is in a power down
state.  The only other option i can think of is that something else is
intercepting memory reads and writes.

try something like the attached patch, compile tested only:

e100_debug.patch
Description: Binary data

Re: [patch] make sch_fifo.o available when CONFIG_NET_SCHED is not set

2006-11-09 Thread David Miller

From: David Kimdon <[EMAIL PROTECTED]>
Date: Wed, 8 Nov 2006 06:06:18 -0800

> Based on patch by Patrick McHardy.
> 
> Add a new option, NET_SCH_FIFO, which provides a simple fifo qdisc
> without requiring CONFIG_NET_SCHED.
> 
> The d80211 stack needs a generic fifo qdisc for WME.  At present it
> uses net/d80211/fifo_qdisc.c which is functionally equivalent to
> sch_fifo.c.  This patch will allow the d80211 stack to remove
> net/d80211/fifo_qdisc.c and use sch_fifo.c instead. 
> 
> Signed-off-by: David Kimdon <[EMAIL PROTECTED]>

Applied to net-2.6.20, thanks David.

Re: [patch 3/9] iPhase: 64bit cleanup

2006-11-09 Thread David Miller

From: [EMAIL PROTECTED]
Date: Wed, 08 Nov 2006 19:51:04 -0800

> From: Alan Cox <[EMAIL PROTECTED]>
> 
> Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>

This fixes the most obvious 64-bit problems, but it is still very very
broken in other aspects.  It is bad enough that I feel irresponsible
turning it on in the 64-bit build.

For example, it takes an ioremap()'d value, and accesses it as a
regular cpu pointer which will explode on many architectures since
all such accesses should go through asm/io.h accessors.

Specifically I'm talking about dev->seg_ram, it is initialized like
this:

    base = ioremap(real_base,iadev->pci_map_size);  /* ioremap is not resolved ??? */
    ...
    iadev->seg_ram = base + ACTUAL_SEG_RAM_BASE;
    ...

Then used like this:

 desc1 = *(u_short *)(dev->seg_ram + dev->host_tcq_wr);

and this:

*(u_short *) (dev->seg_ram + dev->host_tcq_wr) = 0;

and this:

   *(u_short *)(dev->seg_ram + dev->ffL.tcq_rd) = i+1;

and this:

  desc_num = *(u_short *)(dev->seg_ram + dev->ffL.tcq_rd);

and this:

 desc_num = *(u_short *)(dev->seg_ram + dev->ffL.tcq_rd);

and this:

  SchedTbl = (u16*)(dev->seg_ram+CBR_SCHED_TABLE*dev->memSize); 
 ...
  TstSchedTbl = (u16*)(SchedTbl+testSlot);  //set index and read in value
 ...
  memcpy((caddr_t)&cbrVC,(caddr_t)TstSchedTbl,sizeof(cbrVC));
 ...
   memcpy((caddr_t)TstSchedTbl, (caddr_t)&vcIndex,sizeof(TstSchedTbl));

Really, this driver has a ton of unresolved portability problems.
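
What the accesses above would need instead is to go through the asm/io.h
accessors such as readw() and writew().  The sketch below models that pattern
in plain C so it can be compiled outside the kernel: mmio_readw() and
mmio_writew() are hypothetical stand-ins for the kernel accessors, and
struct fake_dev is an invented miniature of the driver's private structure.
In the driver itself the calls would be readw()/writew() on a
void __iomem * base.

```c
#include <stdint.h>

/* Stand-ins for the kernel's readw()/writew(); hypothetical, for
 * illustration only.  The real accessors also take care of byte order
 * and bus access rules, which a plain dereference cannot. */
static uint16_t mmio_readw(const volatile void *addr)
{
    return *(const volatile uint16_t *)addr;
}

static void mmio_writew(uint16_t val, volatile void *addr)
{
    *(volatile uint16_t *)addr = val;
}

/* Invented miniature of the driver state used in the examples above. */
struct fake_dev {
    volatile uint8_t *seg_ram;  /* would be void __iomem * in the kernel */
    uint32_t host_tcq_wr;       /* byte offset into seg_ram */
};

/* Instead of: desc1 = *(u_short *)(dev->seg_ram + dev->host_tcq_wr); */
static uint16_t tcq_read_desc(struct fake_dev *dev)
{
    return mmio_readw(dev->seg_ram + dev->host_tcq_wr);
}

/* Instead of: *(u_short *)(dev->seg_ram + dev->host_tcq_wr) = 0; */
static void tcq_clear_desc(struct fake_dev *dev)
{
    mmio_writew(0, dev->seg_ram + dev->host_tcq_wr);
}
```

The memcpy() calls on the schedule table would need the same treatment,
presumably with memcpy_fromio()/memcpy_toio().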

Re: [patch 1/9] bonding: lockdep annotation

2006-11-09 Thread David Miller

From: [EMAIL PROTECTED]
Date: Wed, 08 Nov 2006 19:51:01 -0800

> The bonding driver nests other drivers, give the bonding driver its own
> lock class.
> 
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
> Cc: Stephen Hemminger <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>

Applied, thanks Peter.

Re: Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic

2006-11-09 Thread Francois Romieu

Basheer, Mansoor Ahamed <[EMAIL PROTECTED]> :
[...]
> My understanding here is, on receive interrupt the ISR should disable
> the receive interrupt irrespective of the polling task's state (active
> or inactive).

Apparently it could happen even with 2.6.19-rc5, yes.

> I changed the code (as shown below) and it works perfectly. 

Afaics your change may disable the Rx irq right after the poll
routine enabled it again. It will not always work either.

The (slow) timeout watchdog could grab the poll handler and hack
the irq mask depending on whether poll was scheduled or not.

> Is this a known issue? If so, is there a fix already available?
> 
> Also, we get frequent TX timeouts during high rate of traffic. What
> could be the reason for this frequent TX timeouts?

No idea. Have you considered upgrading to a kernel which is not almost
2 years old ?

-- 
Ueimor

Re: [PATCH 6/6] [NET] rules: Add support to invert selectors

2006-11-09 Thread David Miller

From: Thomas Graf <[EMAIL PROTECTED]>
Date: Thu, 09 Nov 2006 12:27:41 +0100

> Introduces a new flag FIB_RULE_INVERT causing rules to apply
> if the specified selector doesn't match.
> 
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Also applied, thanks.

Re: [PATCH 3/6] [IPv4] nl_fib_lookup: Rename fl_fwmark to fl_mark

2006-11-09 Thread David Miller

From: Thomas Graf <[EMAIL PROTECTED]>
Date: Thu, 09 Nov 2006 12:27:38 +0100

> For the sake of consistency.
> 
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Applied, thanks Thomas.

Re: [PATCH 2/6] [NET]: Rethink mark field in struct flowi

2006-11-09 Thread David Miller

From: Thomas Graf <[EMAIL PROTECTED]>
Date: Thu, 09 Nov 2006 12:27:37 +0100

> Now that all protocols have been made aware of the mark
> field it can be moved out of the union thus simplifying
> its usage.
> 
> The config options in the IPv4/IPv6/DECnet subsystems
> to enable respectively disable mark based routing only
> obfuscate the code with ifdefs, the cost for the
> additional comparison in the flow key is insignificant,
> and most distributions have all these options enabled
> by default anyway. Therefore it makes sense to remove
> the config options and enable mark based routing by
> default.
> 
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Applied, and I moved the mark in the flowi up to the top right after
oif/iif in order to make sure it's in the same 32-byte cache line with
the ipv4 addressing.

Re: [PATCH 5/6] [NET] rules: Share common attribute validation policy

2006-11-09 Thread David Miller

From: Thomas Graf <[EMAIL PROTECTED]>
Date: Thu, 09 Nov 2006 12:27:40 +0100

> Move the attribute policy for the non-specific attributes into
> net/fib_rules.h and include it in the respective protocols.
> 
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Looks nice, applied, thanks.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/6] [NET] rules: Protocol independant mark selector

2006-11-09 Thread David Miller

From: Thomas Graf <[EMAIL PROTECTED]>
Date: Thu, 09 Nov 2006 12:27:39 +0100

> Move mark selector currently implemented per protocol into
> the protocol independant part.
> 
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Applied, thanks.

Re: [PATCH 1/6] [NET]: Turn nfmark into generic mark

2006-11-09 Thread David Miller

From: Thomas Graf <[EMAIL PROTECTED]>
Date: Thu, 09 Nov 2006 12:27:36 +0100

> nfmark is being used in various subsystems and has become
> the defacto mark field for all kinds of packets. Therefore
> it makes sense to rename it to `mark' and remove the
> dependency on CONFIG_NETFILTER.
> 
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Applied, thanks Thomas.

Re: [PATCH wireless-2.6-git] prism54: WPA/RSN support for fullmac cards

2006-11-09 Thread Luis R. Rodriguez


On 11/9/06, Luis R. Rodriguez <[EMAIL PROTECTED]> wrote:

> On 11/9/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > On Wednesday, 8 November 2006 01:39 you wrote:
> > > On Fri, Nov 03, 2006 at 01:41:46PM -0500, Luis R. Rodriguez wrote:
> > > > On 11/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > > >yes, especially mgt_commit_list caused alot headaches, until I removed
> > > > >DOT11_OID_PSM from the cache list.
> > > > >Now, I can "hammer" it with ping -f for hours.
> > > >
> > > > nice, perhaps that's been the culprit all along... going to dig to see
> > > > if I find a fullmac prism card. Will like to get this merged in.
> > >
> > > Any resolution on this?
> >
> > no replies.
> > Seems like it works just fine for everybody. ;)
>
> I found a card, I just need time to test it. Dan didn't you say you
> ran into issues with the patch on your card?
>
>   Luis


CC'ing Dan to make sure he gets it ;)

Re: [PATCH wireless-2.6-git] prism54: WPA/RSN support for fullmac cards

2006-11-09 Thread Luis R. Rodriguez


On 11/9/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> On Wednesday, 8 November 2006 01:39 you wrote:
> > On Fri, Nov 03, 2006 at 01:41:46PM -0500, Luis R. Rodriguez wrote:
> > > On 11/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > >yes, especially mgt_commit_list caused alot headaches, until I removed
> > > >DOT11_OID_PSM from the cache list.
> > > >Now, I can "hammer" it with ping -f for hours.
> > >
> > > nice, perhaps that's been the culprit all along... going to dig to see
> > > if I find a fullmac prism card. Will like to get this merged in.
> >
> > Any resolution on this?
>
> no replies.
> Seems like it works just fine for everybody. ;)


I found a card, I just need time to test it. Dan didn't you say you
ran into issues with the patch on your card?

 Luis

Re: [PATCH] mark non-compiling ISA network drivers i386 only

2006-11-09 Thread Olaf Hering

On Thu, Nov 09, Stephen Hemminger wrote:

> On Thu, 9 Nov 2006 19:40:21 +0100 (MET)
> Olaf Hering <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Provide drivers for the old toys only on i386
> > isa_bus_to_virt is defined only on i386, mips and arm
> > isa_virt_to_bus is used for floppy.ko
> 
> Why not mark all of ISA as i386, mips, arm only?

This patch is mainly for users of these 3 macros:

'(isa_virt_to_bus|isa_page_to_bus|isa_bus_to_virt)'

3c505.c 3c515.c 3c523.c 3c527.c aha1542.c cs89x0.c esp.c ibmmca.c
lance.c mca_53c9x.c ni52.c ni65.c ps2esdi.c ultrastor.c wd7000.c

I did not enable all of them in a ppc32 pmac config.
ppc32 PReP does have ISA slots on the motorola boards, no idea if anyone
really cares about ISA cards today.

Re: why do we mangle checksums for v6 ICMP?

2006-11-09 Thread David Miller

From: Brian Haley <[EMAIL PROTECTED]>
Date: Thu, 09 Nov 2006 12:32:18 -0500

> Al Viro wrote:
> > AFAICS, the rules are:
> > 
> > (1) checksum is 16-bit one's complement of the one's complement sum of
> > relevant 16bit words.
> > 
> > (2) for v4 UDP all-zeroes has special meaning - no checksum; if you get
> > it from (1), send all-ones instead.
> > 
> > (3) for v6 UDP we have the same remapping as in (2), but all-zeroes has
> > different meaning - not "ignore checksum" as in v4, but "reject the
> > packet".
> > 
> > (4) there is no (4).
> > 
> > IOW, nobody except UDP has any business doing that 0->0x
> > replacement.  However, we have
> >if (icmp6h->icmp6_cksum == 0)
> >icmp6h->icmp6_cksum = -1;
> 
> This doesn't look necessary, RFCs 4443/2463 don't mention it being 
> necessary, and BSD doesn't do it either.  I'll cook-up a patch to remove 
> that since I was doing some other mods in that codepath.

This is how things look to me too.

> > and similar in net/ipv6/raw.c
> 
> Maybe here it only needs to be done if (fl->proto == IPPROTO_UDP)?

Yes, I believe that is what is needed.
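
Rules (1) through (3) above can be written out in a few lines of C.  This is
a minimal sketch, not the kernel's csum code: the function names are
hypothetical, and the caller is assumed to pass in every byte the checksum
covers (pseudo-header included) with the checksum field itself zeroed.

```c
#include <stddef.h>
#include <stdint.h>

/* Rule (1): 16-bit one's complement of the one's complement sum of the
 * 16-bit words.  Words are taken in network (big-endian) order; an odd
 * trailing byte is padded with a zero byte. */
static uint16_t csum16(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)    /* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Rules (2) and (3): only UDP remaps a computed all-zeroes checksum to
 * all-ones, because all-zeroes on the wire has a special meaning there. */
static uint16_t csum16_udp(const uint8_t *data, size_t len)
{
    uint16_t c = csum16(data, len);

    return c == 0 ? 0xffff : c;
}
```

Under that reading ICMPv6 would simply use csum16() and send a computed zero
as zero, which is the change discussed in this thread.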

Re: e1000 driver 2.6.18 - how to waste processor cycles

2006-11-09 Thread Jeff V. Merkey


Jeff V. Merkey wrote:

> Jesse Brandeburg wrote:
>
> > On 11/9/06, Jeffrey V. Merkey <[EMAIL PROTECTED]> wrote:
> >
> > > In the case I am referring to, the memory is already mapped with a
> > > previous call, which means it may be getting
> > > mapped twice.
> >
> > I guess maybe I'm not keeping up with you.  This is what I see looking
> > in 2.6.18, i see e1000_clean_rx_irq:
>
> Check e1000_alloc_rx_buffers:
>
> if (skb already exists in ring buffer)
>     goto map_skb
> else
>     dev_alloc_skb
>     (drop through to map_skb)
>
> map_skb:
>     pci_map_single
>
> Jeff
>
> > check done bit
> > pci_unmap_single
> > copybreak and recycle
> > OR
> > hand buffer up stack
> >
> > the only branch before the unmap is the napi break out, and in that
> > case we don't change any memory state, so alloc will not do anything.
> >
> > As for alloc rx, we always map, because we always unmapped.

Unmapping every single buffer in rx_irq and then remapping them in
alloc_rx_buffers is wasteful of cycles.

Jeff

> > Did I miss something?  I would appreciate a more detailed explanation
> > of what you see going wrong.


Re: [patch sungem] improved locking

2006-11-09 Thread David Miller


Please use GREG_STAT_* instead of magic constants for the
interrupt mask and ACK register writes.  In fact, there are
some questionable values you use, in particular this one:

+static inline void gem_ack_int(struct gem *gp)
+{
+   writel(0x3f, gp->regs + GREG_IACK);
+}

There is no bit defined in GREG_STAT_* for 0x08, but you
set it in this magic bitmask.  It is another reason not
to use magic constants like this :-)
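
To make the point concrete, here is a rough sketch of building the ACK mask
from named bits.  The GREG_STAT_* values below are written from memory of
sungem.h and should be treated as assumptions to be checked against the
header; GEM_IACK_BITS and gem_iack_mask() are hypothetical names used only
for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values; verify against drivers/net/sungem.h before use. */
#define GREG_STAT_TXINTME 0x00000001u
#define GREG_STAT_TXALL   0x00000002u
#define GREG_STAT_TXDONE  0x00000004u
#define GREG_STAT_RXDONE  0x00000010u
#define GREG_STAT_RXNOBUF 0x00000020u

/* Hypothetical name: the interrupt bits the driver intends to ACK. */
#define GEM_IACK_BITS (GREG_STAT_TXINTME | GREG_STAT_TXALL | \
		       GREG_STAT_TXDONE | GREG_STAT_RXDONE |  \
		       GREG_STAT_RXNOBUF)

/* Returns the mask built purely from named bits. */
static uint32_t gem_iack_mask(void)
{
	return GEM_IACK_BITS;
}
```

Under these assumed values the named bits add up to 0x37, so the only
difference from the literal 0x3f is exactly the undefined 0x08 bit.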

Also, if you need to use an attachment to get the tabbing
right, that's fine, but please also provide a copy inline
so that it is easy to quote the patch for review purposes.
It's truly a pain in the rear to quote things when you use
a binary attachment.

I'd like these very simple and straightforward issues to
be worked out before I even begin to review the actual
locking change itself.

Re: e1000 driver 2.6.18 - how to waste processor cycles

2006-11-09 Thread Jeff V. Merkey


Jesse Brandeburg wrote:

> On 11/9/06, Jeffrey V. Merkey <[EMAIL PROTECTED]> wrote:
>
> > In the case I am referring to, the memory is already mapped with a
> > previous call, which means it may be getting
> > mapped twice.
>
> I guess maybe I'm not keeping up with you.  This is what I see looking
> in 2.6.18, i see e1000_clean_rx_irq:

Check e1000_alloc_rx_buffers:

if (skb already exists in ring buffer)
    goto map_skb
else
    dev_alloc_skb
    (drop through to map_skb)

map_skb:
    pci_map_single

Jeff

> check done bit
> pci_unmap_single
> copybreak and recycle
> OR
> hand buffer up stack
>
> the only branch before the unmap is the napi break out, and in that
> case we don't change any memory state, so alloc will not do anything.
>
> As for alloc rx, we always map, because we always unmapped.
>
> Did I miss something?  I would appreciate a more detailed explanation
> of what you see going wrong.





Re: e1000 driver 2.6.18 - how to waste processor cycles

2006-11-09 Thread Jesse Brandeburg


On 11/9/06, Jeffrey V. Merkey <[EMAIL PROTECTED]> wrote:

In the case I am referring to, the memory is already mapped with a
previous call, which means it may be getting
mapped twice.


I guess maybe I'm not keeping up with you.  This is what I see looking
in 2.6.18, i see e1000_clean_rx_irq:

check done bit
pci_unmap_single
copybreak and recycle
OR
hand buffer up stack

the only branch before the unmap is the napi break out, and in that
case we don't change any memory state, so alloc will not do anything.

As for alloc rx, we always map, because we always unmapped.

Did I miss something?  I would appreciate a more detailed explanation
of what you see going wrong.

[patch sungem] improved locking

2006-11-09 Thread Eric Lemoine


The attached patch improves locking in the sungem driver:

- a single lock is used in the driver
- gem_start_xmit, gem_poll, and gem_interrupt are lockless

The new locking design is based on what's in tg3.c.

The patch runs smoothly on my ibook (with CONFIG_SMP set), but it will
need extensive testing on a multi-cpu box.

The patch includes two implementations for gem_interrupt(). One is
lockless while the other makes use of a spinlock. The spinlock version
is there because I was not sure the lockless version would work with
net_poll_controller. One of the two versions must be removed in the
final patch.

Patch applies to current git net-2.6.

Please review, and test if possible.

Thanks,

Signed-off-by: Eric Lemoine <[EMAIL PROTECTED]>

--
Eric


sungem-locking.patch
Description: Binary data

Re: [take24 3/6] kevent: poll/select() notifications.

2006-11-09 Thread Davide Libenzi

On Thu, 9 Nov 2006, Davide Libenzi wrote:

> On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:
> 
> > On Thu, Nov 09, 2006 at 10:51:56AM -0800, Davide Libenzi 
> > (davidel@xmailserver.org) wrote:
> > > On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:
> > > 
> > > > +static int kevent_poll_callback(struct kevent *k)
> > > > +{
> > > > +   if (k->event.req_flags & KEVENT_REQ_LAST_CHECK) {
> > > > +   return 1;
> > > > +   } else {
> > > > +   struct file *file = k->st->origin;
> > > > +   unsigned int revents = file->f_op->poll(file, NULL);
> > > > +
> > > > +   k->event.ret_data[0] = revents & k->event.event;
> > > > +   
> > > > +   return (revents & k->event.event);
> > > > +   }
> > > > +}
> > > 
> > > You need to be careful that file->f_op->poll is not called inside the 
> > > spin_lock_irqsave/spin_lock_irqrestore pair, since (even this came up 
> > > during epoll developemtn days) file->f_op->poll might do a simple 
> > > spin_lock_irq/spin_unlock_irq. This unfortunate constrain forced epoll to 
> > > have a suboptimal double O(R) loop to handle LT events.
> >  
> > It is tricky - users call wake_up() from any context, which in turn ends
> > up calling kevent_storage_ready(), which calls kevent_poll_callback() with
> > KEVENT_REQ_LAST_CHECK bit set, which becomes almost empty call in fast
> > path. Since callback returns 1, kevent will be queued into ready queue,
> > which is processed on behalf of syscalls - in that case kevent will
> > check the flag and since KEVENT_REQ_LAST_CHECK is set, will call
> > callback again to check if kevent is correctly marked, but already
> > without that flag (it happens in syscall context, i.e. process context
> > without any locks held), so callback calls ->poll(), which can sleep,
> > but it is safe. If ->poll() returns 'ready' value, kevent is transfers
> > data into userspace, otherwise it is 'requeued' (just removed from
> > ready queue).
> 
> Oh, mine was only a general warn. I hadn't looked at the generic code 
> before. But now that I poke on it, I see:
> 
> void kevent_requeue(struct kevent *k)
> {
>unsigned long flags;
> 
>spin_lock_irqsave(&k->st->lock, flags);
>__kevent_requeue(k, 0);
>spin_unlock_irqrestore(&k->st->lock, flags);
> }
> 
> and then:
> 
> static int __kevent_requeue(struct kevent *k, u32 event)
> {
>int ret, rem;
>unsigned long flags;
> 
>ret = k->callbacks.callback(k);
> 
> Isn't the k->callbacks.callback() possibly end up calling f_op->poll?

Ack, there is the check for KEVENT_REQ_LAST_CHECK inside the callback.
The problem with f_op->poll was not that it can sleep (not excluded 
though) but that some f_op->poll can do a simple spin_lock_irq/spin_unlock_irq.
But from a quick peek your new code seems fine with that.



- Davide



Re: [PATCH] mark non-compiling ISA network drivers i386 only

2006-11-09 Thread Stephen Hemminger

On Thu, 9 Nov 2006 19:40:21 +0100 (MET)
Olaf Hering <[EMAIL PROTECTED]> wrote:

> 
> Provide drivers for the old toys only on i386
> isa_bus_to_virt is defined only on i386, mips and arm
> isa_virt_to_bus is used for floppy.ko

Why not mark all of ISA as i386, mips, arm only?

Re: [take24 3/6] kevent: poll/select() notifications.

2006-11-09 Thread Davide Libenzi

On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:

> On Thu, Nov 09, 2006 at 10:51:56AM -0800, Davide Libenzi 
> (davidel@xmailserver.org) wrote:
> > On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:
> > 
> > > +static int kevent_poll_callback(struct kevent *k)
> > > +{
> > > + if (k->event.req_flags & KEVENT_REQ_LAST_CHECK) {
> > > + return 1;
> > > + } else {
> > > + struct file *file = k->st->origin;
> > > + unsigned int revents = file->f_op->poll(file, NULL);
> > > +
> > > + k->event.ret_data[0] = revents & k->event.event;
> > > + 
> > > + return (revents & k->event.event);
> > > + }
> > > +}
> > 
> > You need to be careful that file->f_op->poll is not called inside the 
> > spin_lock_irqsave/spin_lock_irqrestore pair, since (even this came up 
> > during epoll developemtn days) file->f_op->poll might do a simple 
> > spin_lock_irq/spin_unlock_irq. This unfortunate constrain forced epoll to 
> > have a suboptimal double O(R) loop to handle LT events.
>  
> It is tricky - users call wake_up() from any context, which in turn ends
> up calling kevent_storage_ready(), which calls kevent_poll_callback() with
> KEVENT_REQ_LAST_CHECK bit set, which becomes almost empty call in fast
> path. Since callback returns 1, kevent will be queued into ready queue,
> which is processed on behalf of syscalls - in that case kevent will
> check the flag and since KEVENT_REQ_LAST_CHECK is set, will call
> callback again to check if kevent is correctly marked, but already
> without that flag (it happens in syscall context, i.e. process context
> without any locks held), so callback calls ->poll(), which can sleep,
> but it is safe. If ->poll() returns 'ready' value, kevent is transfers
> data into userspace, otherwise it is 'requeued' (just removed from
> ready queue).

Oh, mine was only a general warning. I hadn't looked at the generic code 
before. But now that I poke at it, I see:

void kevent_requeue(struct kevent *k)
{
   unsigned long flags;

   spin_lock_irqsave(&k->st->lock, flags);
   __kevent_requeue(k, 0);
   spin_unlock_irqrestore(&k->st->lock, flags);
}

and then:

static int __kevent_requeue(struct kevent *k, u32 event)
{
   int ret, rem;
   unsigned long flags;

   ret = k->callbacks.callback(k);

Doesn't k->callbacks.callback() possibly end up calling f_op->poll?



- Davide



Re: [take24 3/6] kevent: poll/select() notifications.

2006-11-09 Thread Evgeniy Polyakov

On Thu, Nov 09, 2006 at 10:51:56AM -0800, Davide Libenzi 
(davidel@xmailserver.org) wrote:
> On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:
> 
> > +static int kevent_poll_callback(struct kevent *k)
> > +{
> > +   if (k->event.req_flags & KEVENT_REQ_LAST_CHECK) {
> > +   return 1;
> > +   } else {
> > +   struct file *file = k->st->origin;
> > +   unsigned int revents = file->f_op->poll(file, NULL);
> > +
> > +   k->event.ret_data[0] = revents & k->event.event;
> > +   
> > +   return (revents & k->event.event);
> > +   }
> > +}
> 
> You need to be careful that file->f_op->poll is not called inside the 
> spin_lock_irqsave/spin_lock_irqrestore pair, since (even this came up 
> during epoll developemtn days) file->f_op->poll might do a simple 
> spin_lock_irq/spin_unlock_irq. This unfortunate constrain forced epoll to 
> have a suboptimal double O(R) loop to handle LT events.

It is tricky - users call wake_up() from any context, which in turn ends
up calling kevent_storage_ready(), which calls kevent_poll_callback() with
the KEVENT_REQ_LAST_CHECK bit set, which makes it an almost empty call in
the fast path. Since the callback returns 1, the kevent will be queued into
the ready queue, which is processed on behalf of syscalls - in that case
kevent will check the flag and, since KEVENT_REQ_LAST_CHECK is set, will
call the callback again to check whether the kevent is correctly marked,
but this time without that flag (it happens in syscall context, i.e.
process context without any locks held), so the callback calls ->poll(),
which can sleep, but that is safe. If ->poll() returns a 'ready' value, the
kevent transfers data into userspace, otherwise it is 'requeued' (just
removed from the ready queue).

> - Davide
> 

-- 
Evgeniy Polyakov

Re: [take24 3/6] kevent: poll/select() notifications.

2006-11-09 Thread Davide Libenzi

On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:

> +static int kevent_poll_callback(struct kevent *k)
> +{
> + if (k->event.req_flags & KEVENT_REQ_LAST_CHECK) {
> + return 1;
> + } else {
> + struct file *file = k->st->origin;
> + unsigned int revents = file->f_op->poll(file, NULL);
> +
> + k->event.ret_data[0] = revents & k->event.event;
> + 
> + return (revents & k->event.event);
> + }
> +}

You need to be careful that file->f_op->poll is not called inside the 
spin_lock_irqsave/spin_unlock_irqrestore pair, since (this even came up 
during epoll development days) file->f_op->poll might do a simple 
spin_lock_irq/spin_unlock_irq. This unfortunate constraint forced epoll to 
have a suboptimal double O(R) loop to handle LT events.
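
The hazard can be shown with a toy user-space model.  Nothing below is
kernel code: the sim_* helpers and the single irqs_enabled flag are
hypothetical stand-ins that only mimic how the irqsave/irqrestore and plain
irq variants treat the interrupt-enable state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the local CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

/* Like spin_lock_irqsave(): remember the old state, then disable. */
static unsigned long sim_irqsave(void)
{
	unsigned long flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

/* Like spin_unlock_irqrestore(): put back whatever was saved. */
static void sim_irqrestore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

/* Like spin_lock_irq()/spin_unlock_irq(): unconditional off and on. */
static void sim_lock_irq(void)   { irqs_enabled = false; }
static void sim_unlock_irq(void) { irqs_enabled = true; }

/* Hypothetical ->poll() implementation that uses the plain variants. */
static void sim_poll(void)
{
	sim_lock_irq();
	sim_unlock_irq();
}

/*
 * Calls sim_poll() inside an irqsave section and reports whether
 * interrupts were switched back on before the outer section ended.
 */
static bool irqs_on_inside_outer_section(void)
{
	unsigned long flags = sim_irqsave();
	bool leaked;

	sim_poll();
	leaked = irqs_enabled;	/* should still be off here */
	sim_irqrestore(flags);
	return leaked;
}
```

In this model the inner unlock re-enables interrupts while the outer section
still believes they are off, which is the kind of breakage being warned
about.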

- Davide


[PATCH] mark non-compiling ISA network drivers i386 only

2006-11-09 Thread Olaf Hering


Provide drivers for the old toys only on i386
isa_bus_to_virt is defined only on i386, mips and arm
isa_virt_to_bus is used for floppy.ko

Add missing '&& ISA_DMA_API' to NI52

WARNING: "isa_bus_to_virt" [drivers/net/ni65.ko] undefined!
WARNING: "isa_virt_to_bus" [drivers/net/ni65.ko] undefined!
WARNING: "isa_bus_to_virt" [drivers/net/ni52.ko] undefined!
WARNING: "isa_bus_to_virt" [drivers/net/lance.ko] undefined!
WARNING: "isa_virt_to_bus" [drivers/net/lance.ko] undefined!
WARNING: "isa_bus_to_virt" [drivers/net/3c515.ko] undefined!
WARNING: "isa_virt_to_bus" [drivers/net/3c515.ko] undefined!
WARNING: "isa_virt_to_bus" [drivers/net/3c505.ko] undefined!

I'm sure no one will miss the drivers.

Signed-off-by: Olaf Hering <[EMAIL PROTECTED]>

---
 drivers/net/Kconfig |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6/drivers/net/Kconfig
===
--- linux-2.6.orig/drivers/net/Kconfig
+++ linux-2.6/drivers/net/Kconfig
@@ -616,7 +616,7 @@ config EL2
 
 config ELPLUS
tristate "3c505 \"EtherLink Plus\" support"
-   depends on NET_VENDOR_3COM && ISA && ISA_DMA_API
+   depends on NET_VENDOR_3COM && ISA && ISA_DMA_API && X86
---help---
  Information about this network (Ethernet) card can be found in
  .  If you have a card of
@@ -657,7 +657,7 @@ config EL3
 
 config 3C515
tristate "3c515 ISA \"Fast EtherLink\""
-   depends on NET_VENDOR_3COM && (ISA || EISA) && ISA_DMA_API
+   depends on NET_VENDOR_3COM && (ISA || EISA) && ISA_DMA_API && X86
help
  If you have a 3Com ISA EtherLink XL "Corkscrew" 3c515 Fast Ethernet
  network card, say Y and read the Ethernet-HOWTO, available from
@@ -735,7 +735,7 @@ config TYPHOON
 
 config LANCE
tristate "AMD LANCE and PCnet (AT1500 and NE2100) support"
-   depends on NET_ETHERNET && ISA && ISA_DMA_API
+   depends on NET_ETHERNET && ISA && ISA_DMA_API && X86
help
  If you have a network (Ethernet) card of this type, say Y and read
  the Ethernet-HOWTO, available from
@@ -918,7 +918,7 @@ config NI5010
 
 config NI52
tristate "NI5210 support"
-   depends on NET_VENDOR_RACAL && ISA
+   depends on NET_VENDOR_RACAL && ISA && ISA_DMA_API && X86
help
  If you have a network (Ethernet) card of this type, say Y and read
  the Ethernet-HOWTO, available from
@@ -930,7 +930,7 @@ config NI52
 
 config NI65
tristate "NI6510 support"
-   depends on NET_VENDOR_RACAL && ISA && ISA_DMA_API
+   depends on NET_VENDOR_RACAL && ISA && ISA_DMA_API && X86
help
  If you have a network (Ethernet) card of this type, say Y and read
  the Ethernet-HOWTO, available from

Re: tg3_close question

2006-11-09 Thread Eric Lemoine

On 11/9/06, Michael Chan <[EMAIL PROTECTED]> wrote:

> Eric Lemoine wrote:
>
> > > > On 11/9/06, Michael Chan <[EMAIL PROTECTED]> wrote:
> > > > > So it is not possible for tg3_poll() -> tg3_tx() to run any more
> > > > > after tg3_close() is called.
> > > >
> > > > But, while tg3_close() starts executing, an interrupt may
> > > > come in and
> > > > schedule polling (set  __LINK_STATE_RX_SCHED). So tg3_poll() ->
> > > > tg3_tx() may well occur.
> > >
> > > Actually I don't understand the purpose of having dev_close() wait for
> > > __LINK_STATE_RX_SCHED to be cleared. An interrupt may arrive at any
> > > time after it's cleared, and reset __LINK_STATE_RX_SCHED. Can someone
> > > explain please?
> > >
>
> If netif_running() is cleared, netif_rx_schedule() will not schedule
> the ->poll().  So even if tg3 gets an interrupt after close,
> tg3_poll() will not be scheduled.

Oh! I had missed the netif_running() check in netif_rx_schedule_prep().

Thanks Michael and Maxime for this clarification.

--
Eric

Re: [PATCH wireless-2.6-git] prism54: WPA/RSN support for fullmac cards

2006-11-09 Thread chunkeey

On Wednesday, 8 November 2006 01:39, you wrote:
> On Fri, Nov 03, 2006 at 01:41:46PM -0500, Luis R. Rodriguez wrote:
> > On 11/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > >yes, especially mgt_commit_list caused alot headaches, until I removed
> > >DOT11_OID_PSM from the cache list.
> > >Now, I can "hammer" it with ping -f for hours.
> >
> > nice, perhaps that's been the culprit all along... going to dig to see
> > if I find a fullmac prism card. Will like to get this merged in.
>
> Any resolution on this?

no replies.
Seems like it works just fine for everybody. ;)

Christian

Re: tg3_close question

2006-11-09 Thread Maxime Bizon


On Thu, 2006-11-09 at 18:22 +0100, Eric Lemoine wrote:

> Actually I don't understand the purpose of having dev_close() wait for
> __LINK_STATE_RX_SCHED to be cleared. An interrupt may arrive at any
> time after it's cleared, and reset __LINK_STATE_RX_SCHED. Can someone
> explain please?

Further interrupts won't schedule poll because netif_rx_schedule_prep()
checks for netif_running().
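
The ordering argument can be modelled in a few lines of user-space C.  This
is a sketch under assumptions, not the kernel implementation: struct sim_dev
and the sim_* functions are hypothetical, and plain booleans stand in for
the state bits, which (as far as I can tell) the kernel manipulates
atomically.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two pieces of device state under discussion. */
struct sim_dev {
	bool running;	/* models netif_running() */
	bool rx_sched;	/* models __LINK_STATE_RX_SCHED */
};

/*
 * Models the schedule-prep check: polling is only scheduled when the
 * device is running and a poll is not already pending.
 */
static bool sim_rx_schedule_prep(struct sim_dev *dev)
{
	if (!dev->running || dev->rx_sched)
		return false;
	dev->rx_sched = true;
	return true;
}

/* Models the last ->poll() finishing and clearing the pending bit. */
static void sim_poll_done(struct sim_dev *dev)
{
	dev->rx_sched = false;
}

/* Models the first step of close: the running state is cleared. */
static void sim_close_begin(struct sim_dev *dev)
{
	dev->running = false;
}
```

Once the running state is cleared and the pending poll has drained, a later
interrupt cannot set the scheduled bit again, so the wait in dev_close()
only has to outlast the poll that was already in flight.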

-- 
Maxime

Re: why do we mangle checksums for v6 ICMP?

2006-11-09 Thread Brian Haley


Hi Al,

Al Viro wrote:
> AFAICS, the rules are:
> 
> (1) checksum is 16-bit one's complement of the one's complement sum of
> relevant 16bit words.
> 
> (2) for v4 UDP all-zeroes has special meaning - no checksum; if you get
> it from (1), send all-ones instead.
> 
> (3) for v6 UDP we have the same remapping as in (2), but all-zeroes has
> different meaning - not "ignore checksum" as in v4, but "reject the
> packet".
> 
> (4) there is no (4).
> 
> IOW, nobody except UDP has any business doing that 0->0x
> replacement.  However, we have
>    if (icmp6h->icmp6_cksum == 0)
>        icmp6h->icmp6_cksum = -1;

This doesn't look necessary, RFCs 4443/2463 don't mention it being 
necessary, and BSD doesn't do it either.  I'll cook-up a patch to remove 
that since I was doing some other mods in that codepath.

> and similar in net/ipv6/raw.c

Maybe here it only needs to be done if (fl->proto == IPPROTO_UDP)?

-Brian

Re: tg3_close question

2006-11-09 Thread Michael Chan

Eric Lemoine wrote:

> > On 11/9/06, Michael Chan <[EMAIL PROTECTED]> wrote:
> > > So it is not possible for tg3_poll() -> tg3_tx() to run any more
> > > after tg3_close() is called.
> >
> > But, while tg3_close() starts executing, an interrupt may 
> > come in and
> > schedule polling (set  __LINK_STATE_RX_SCHED). So tg3_poll() ->
> > tg3_tx() may well occur.
> 
> Actually I don't understand the purpose of having dev_close() wait for
> __LINK_STATE_RX_SCHED to be cleared. An interrupt may arrive at any
> time after it's cleared, and reset __LINK_STATE_RX_SCHED. Can someone
> explain please?
> 

If netif_running() is cleared, netif_rx_schedule() will not schedule
the ->poll().  So even if tg3 gets an interrupt after close,
tg3_poll() will not be scheduled.


Re: [PATCH 2/3] mlsxfrm: Various fixes

2006-11-09 Thread Paul Moore

James Morris wrote:
> On Thu, 9 Nov 2006, Paul Moore wrote:
> 
>>It sounds like you have an idea of how you would like to see this implemented,
>>can you give me a rough outline?  Is this the partitioned SECMARK field you
>>talked about earlier?
> 
> No, just the fact that you are in the same kernel address space and can 
> readily access the security context of the peer.

For a minute I got all excited thinking that you had found a solution to this :)

The problem I keep running into is that it is not obvious to me how we can
determine the security context of the sending socket on the receive side by
looking at the skb.  I'm really hoping that it is just because I haven't looked
at the code long enough, or thought about it hard enough.  It is just so
frustrating because you are right - all the information is there, I just don't
know how to get to it when we need it without using external labeling.

-- 
paul moore
linux security @ hp

Re: tg3_close question

2006-11-09 Thread Eric Lemoine

On 11/9/06, Eric Lemoine <[EMAIL PROTECTED]> wrote:

> On 11/9/06, Michael Chan <[EMAIL PROTECTED]> wrote:
> > Michael Chan wrote:
> >
> > > Eric Lemoine wrote:
> > >
> > > > Instead of tg3_netif_stop() tg3_close() uses netif_stop_queue()
> > > > to stop xmit. This doesn't seem right to me. E.g. another CPU
> > > > in tg3_tx()
> > > > could do netif_wake_queue() just after tg3_close() did
> > > > netif_stop_queue(). Isn't a bug?
> > > >
> > >
> > > I think you're right.  It is more correct to call tg3_netif_stop().
> > >
> >
> > I take it back.  Before ->stop() is called, netif_running is cleared,
> > and it waits for __LINK_STATE_RX_SCHED to be cleared.  This means
> > that it will wait for the last ->poll() to finish.
>
> Correct.
>
> > So it is not possible for tg3_poll() -> tg3_tx() to run any more
> > after tg3_close() is called.
>
> But, while tg3_close() starts executing, an interrupt may come in and
> schedule polling (set  __LINK_STATE_RX_SCHED). So tg3_poll() ->
> tg3_tx() may well occur.

Actually I don't understand the purpose of having dev_close() wait for
__LINK_STATE_RX_SCHED to be cleared. An interrupt may arrive at any
time after it's cleared, and reset __LINK_STATE_RX_SCHED. Can someone
explain please?

Thanks,
--
Eric

Re: tg3_close question

2006-11-09 Thread Eric Lemoine

On 11/9/06, Michael Chan <[EMAIL PROTECTED]> wrote:

> Michael Chan wrote:
>
> > Eric Lemoine wrote:
> >
> > > Instead of tg3_netif_stop() tg3_close() uses netif_stop_queue()
> > > to stop xmit. This doesn't seem right to me. E.g. another CPU
> > > in tg3_tx()
> > > could do netif_wake_queue() just after tg3_close() did
> > > netif_stop_queue(). Isn't a bug?
> > >
> >
> > I think you're right.  It is more correct to call tg3_netif_stop().
> >
>
> I take it back.  Before ->stop() is called, netif_running is cleared,
> and it waits for __LINK_STATE_RX_SCHED to be cleared.  This means
> that it will wait for the last ->poll() to finish.

Correct.

> So it is not possible for tg3_poll() -> tg3_tx() to run any more
> after tg3_close() is called.

But, while tg3_close() starts executing, an interrupt may come in and
schedule polling (set  __LINK_STATE_RX_SCHED). So tg3_poll() ->
tg3_tx() may well occur.

--
Eric

Re: [take23 0/5] kevent: Generic event handling mechanism.

2006-11-09 Thread Davide Libenzi

On Thu, 9 Nov 2006, Eric Dumazet wrote:

> > > Lost forever means? If there are more processes watching some fd (external
> > > events), they all get their own copy of the events in their own private
> > > epoll fd. It's not that we "steal" things out of the kernel, is not a 1:1
> > > producer/consumer thing (one producer, 1 queue). It's one producer,
> > > broadcast to all listeners (consumers) thing. The only case where it'd
> > > matter is in the case of multiple threads sharing the same epoll fd.
> > 
> > In my particular epoll application, the producer is tcp stack, and I have
> > one consumer. If an network event is lost in the EFAULT handling, its lost
> > forever. In any case, my application do provide a correct user area, so this
> > problem is only theorical.
> 
> I realize I was not explicit, and dit not answer your question (Lost forever
> means ?)
> 
> if (epi->revents) {
> if (__put_user(epi->revents,
>&events[eventcnt].events) ||
> __put_user(epi->event.data,
>&events[eventcnt].data))
> return -EFAULT;
> >>if (epi->event.events & EPOLLONESHOT)
> >>epi->event.events &= EP_PRIVATE_BITS;
> eventcnt++;
> }
> 
> If one EPOLLONESHOT event is correctly copied to user space, its status is
> updated.
> 
> If other ready events in the same epoll_wait() call cannot be transferred
> because of an EFAULT (we reach the real end of user provided area), this
> EPOLLONESHOT event is lost forever, because it wont be requeued in ready list.

Your application is feeding crap to the kernel, because of programming 
bugs. If that happens, I want an EFAULT and not a partially filled buffer. 
And which buffer then? The pointer could have been scribbled over in 
userspace memory, and the kernel's attempt to mask bugs might create 
even more subtle problems. Such a bug will *never* show up in case 
the wrong buffer is partially valid (first part, that is the *only* case 
where your fix would make a difference compared to the status quo), since 
in case of no ready events we'll never hit it, and in case of some events 
we'll always return a few of them and never EFAULT. No, the more I think 
about it, the more I personally disagree with the change.

> Please dont slow the hot path for a basically "User Error". It's already 
> tested in the transfert function, with two conditional 
> branches for each transfered event.

Ohh, if you think you can measure them from userspace, those can be turned 
into 'err |= __put_user();' with err tested only outside the loop.
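
As a rough user-space illustration of that pattern: sim_put_user() and
copy_events() below are hypothetical stand-ins (the real __put_user() is a
kernel macro with a different shape), and -1 is used in place of -EFAULT.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for __put_user(): returns 0 on success and
 * nonzero when slot idx lies outside the first `valid` slots of dst.
 */
static int sim_put_user(int value, int *dst, size_t idx, size_t valid)
{
	if (idx >= valid)
		return 1;
	dst[idx] = value;
	return 0;
}

/*
 * Copies n events, OR-ing the per-store results together and testing
 * the accumulated error once after the loop instead of branching on
 * every store.  Returns n on success, -1 (standing in for -EFAULT)
 * if any store failed.
 */
static int copy_events(const int *src, int *dst, size_t n, size_t valid)
{
	int err = 0;
	size_t i;

	for (i = 0; i < n; i++)
		err |= sim_put_user(src[i], dst, i, valid);

	return err ? -1 : (int)n;
}
```

The trade-off is that the loop no longer stops at the first failing store;
it runs to the end and reports the failure afterwards, which keeps the
common path free of per-event error branches.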

- Davide


Re: Intel 82559 NIC corrupted EEPROM

2006-11-09 Thread Auke Kok


John wrote:

> Auke Kok wrote:
>
> > This is what I was afraid of: even though the code allows you to 
> > bypass the EEPROM checksum, the probe fails on a further check to see 
> > if the MAC address is valid.
> >
> > Since something with this NIC specifically made the EEPROM return all 
> > 0xff's, the MAC address is automatically invalid, and thus probe fails.
>
> I don't understand why you think there is something wrong with a
> specific NIC?

that was completely not my point - I was merely trying to point out that the original 
problem causes a cascade of error events later on, and bypassing the eeprom check in 
this case didn't help you at all. Something is wrong in the driver, but I don't 
understand yet why it only affects one of the 3 nics in your system.

> In 2.6.14.7, e100.ko fails to read the EEPROM on :00:08.0 (eth0)
> In 2.6.18.1, e100.ko fails to read the EEPROM on :00:09.0 (eth1)

almost sounds like a bug got fixed and it introduced a regression. this wouldn't be the 
right time to pull out git-bisect would it? even loading 2.6.15, 2.6.16, 2.6.17 on it 
would give us some good information.



Cheers,

Auke

Re: [IPROUTE2] Add support for inverted selectors

2006-11-09 Thread Stephen Hemminger


added

Re: tg3_close question

2006-11-09 Thread Michael Chan

Michael Chan wrote:

> Eric Lemoine wrote:
> 
> > Instead of tg3_netif_stop() tg3_close() uses netif_stop_queue()
> > to stop xmit. This doesn't seem right to me. E.g. another CPU 
> > in tg3_tx()
> > could do netif_wake_queue() just after tg3_close() did
> > netif_stop_queue(). Isn't a bug?
> > 
> 
> I think you're right.  It is more correct to call tg3_netif_stop().
> 

I take it back.  Before ->stop() is called, netif_running is cleared,
and it waits for __LINK_STATE_RX_SCHED to be cleared.  This means
that it will wait for the last ->poll() to finish.

So it is not possible for tg3_poll() -> tg3_tx() to run any more
after tg3_close() is called.


Re: tg3_close question

2006-11-09 Thread Michael Chan

Eric Lemoine wrote:

> Instead of tg3_netif_stop() tg3_close() uses netif_stop_queue() to
> stop xmit. This doesn't seem right to me. E.g. another CPU in tg3_tx()
> could do netif_wake_queue() just after tg3_close() did
> netif_stop_queue(). Isn't a bug?
> 

I think you're right.  It is more correct to call tg3_netif_stop().


Re: [PATCHSET] packet mark & fib rules work

2006-11-09 Thread Steven Whitehouse

Hi,

On Thu, Nov 09, 2006 at 01:49:11PM +0100, Thomas Graf wrote:
> * Steven Whitehouse <[EMAIL PROTECTED]> 2006-11-09 11:46
> > 
> > so far as all the DECnet bits go. One question though... will you
> > be adding later (as your slide #5 and #11 from your netconf presentation
> > appear to imply) a way to set the mark from the routing table (presumably
> > included in the nexthop info) ?
> 
> So far I haven't planned this, slide #11 describes that if I add an
> address with a given mark the corresponding route will only apply
> to packets with a matching mark. Slide #5 shows the idea of an ingress
> classifier/action setting the mark field based on iif. I focus on
> selecting routes based on marks, not the other way around, but it's
> certainly an interesting idea if you can elaborate on it further.

So here is roughly what I was thinking... this comes from having
spent a little while thinking about the best way to integrate
MPLS into the network stack. An MPLS label stack entry is 32 bits in size,
which conveniently matches the size of the packet mark.

So one thought was this (for MPLS edge routers). Add the ability to
set a mark to the IP routing table. Something along the lines of:

/sbin/ip route add 10.1.0.0/16 via 10.2.1.1 dev eth0 setmark 6

and then use the mark as the FEC (forwarding equivalence class)
for MPLS (which is just an index, but in simple cases could
contain a whole MPLS label). I was hoping that it might be possible
to use the xfrm infrastructure to deal with the actual application
of MPLS labels, but I'm not yet 100% certain that it's a good fit.

Either way, MPLS will require some kind of way to indicate the FEC
for each route, so using the generic mark like this seems to me
a reasonable solution on the basis that other uses might then be found for
it as well.

Since MPLS labels are only a subset of the full 32 bits, being able
to use a mask in conjunction with setting the mark might also be
a useful feature, so that the logic (pseudo code) after route lookup
might look something like:

skb->mark &= ~nh->nh_setmask;
skb->mark |= nh->nh_setmark; /* Assume mark only sets bits allowed by mask */
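
A self-contained sketch of the masked update in the pseudo code above, with
a worked value. The function name is hypothetical, and the extra "& setmask"
is a defensive assumption added here; the pseudo code instead assumes the
new mark only sets bits allowed by the mask.

```c
#include <stdint.h>

/*
 * Replace only the bits selected by setmask with the corresponding bits of
 * setmark, leaving all other bits of the existing mark untouched.
 */
static inline uint32_t apply_setmark(uint32_t mark, uint32_t setmask,
				     uint32_t setmark)
{
	mark &= ~setmask;		/* clear the bits the route may set */
	mark |= setmark & setmask;	/* set them, never touching other bits */
	return mark;
}
```

For example, with a mask of 0xFFFFF000 (the upper 20 bits, chosen here only
for illustration), the low 12 bits of an existing mark survive the route
lookup while the upper bits are overwritten.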

The big question being, is this going to be a problem bearing in mind
it would appear in the routing fast path?

On the MPLS input side, packet marks would be set according to the
incoming MPLS label and then work in just the same way that you propose
using the marks to create separate routing for different VLANs for
example.

If people are generally happy with the idea, and since it's not already
part of your plans, then I'll try and put a patch together for it
in the not too distant future,

Steve.


Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic

2006-11-09 Thread Basheer, Mansoor Ahamed

Hi All

I found an issue with the Realtek 8139 driver (2.6.10 branch) under high
bidirectional traffic. On a transmit timeout, the driver's timeout callback
re-enables the receive interrupt. On the next receive interrupt, the ISR
disables the receive interrupt only when the receive poll task is not
active. But the poll task is actually active, and hence the ISR doesn't
disable the receive interrupt. So the ISR returns without clearing the
receive interrupt. This unserviced receive interrupt brings the system
into a hung state.

My understanding here is, on receive interrupt the ISR should disable
the receive interrupt irrespective of the polling task's state (active
or inactive). I changed the code (as shown below) and it works
perfectly. 

Is this a known issue? If so, is there a fix already available?

Also, we get frequent TX timeouts at high traffic rates. What
could be the reason for these frequent TX timeouts?
 

--- old/8139too.c   2006-11-09 11:49:25.0 +0530
+++ new/8139too.c   2006-11-09 11:50:02.0 +0530
@@ -2200,8 +2200,8 @@
 	/* Receive packets are processed by poll routine.
 	   If not running start it now. */
 	if (status & RxAckBits){
-		if (netif_rx_schedule_prep(dev)) {
 			RTL_W16_F (IntrMask, rtl8139_norx_intr_mask);
+		if (netif_rx_schedule_prep(dev)) {
 			__netif_rx_schedule (dev);
 		}
 	}


Thanks
Mansoor

Re: Intel 82559 NIC corrupted EEPROM

2006-11-09 Thread John


Jesse Brandeburg wrote:


> I suspect that one reason Becker's code works is that it uses IO
> based access (slower, and different method) to the adapter rather
> than memory mapped access.


I've noticed this difference.


> The second thought is that the adapter is in D3, and something about
> your kernel or the driver doesn't successfully wake it up to D0.


On my NICs, the EEPROM ID (Word 0Ah) is set to 0x40a2.
Thus DDPD (bit 6) is set to 0.

DDPD is the "Disable Deep Power Down while PME is disabled" bit.
0 - Deep Power Down is enabled in D3 state while PME-disabled.
1 - Deep Power Down disabled in D3 state while PME-disabled.
This bit should be set to 1b if a TCO controller is being used via the 
SMB because it requires receive functionality at all power states.


Are you suggesting I try and set DDPD to 1?
Or is this completely unrelated?
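
As a quick check of the bit arithmetic above, here is a minimal sketch. Only
the bit position (6) and the ID word value 0x40a2 come from the message; the
macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position of DDPD in the EEPROM ID word, as quoted above. */
#define MODEL_EEPROM_ID_DDPD_BIT 6

/* Returns true when the DDPD bit is set in the given EEPROM ID word. */
static inline bool eeprom_id_ddpd(uint16_t id_word)
{
	return ((id_word >> MODEL_EEPROM_ID_DDPD_BIT) & 1u) != 0;
}

/* Returns the ID word with the DDPD bit set and all other bits unchanged. */
static inline uint16_t eeprom_id_set_ddpd(uint16_t id_word)
{
	return (uint16_t)(id_word | (1u << MODEL_EEPROM_ID_DDPD_BIT));
}
```

The low byte of 0x40a2 is 0xa2 (binary 1010 0010), so bit 6 is clear, which
agrees with the reading that Deep Power Down is enabled on these NICs.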


> An indication of this would be looking at lspci -vv before/after
> loading the driver.


$ diff -u lspci_vv_before_e100.txt lspci_vv_after_e100.txt
--- lspci_vv_before_e100.txt	2006-11-09 14:51:30.0 +0100
+++ lspci_vv_after_e100.txt	2006-11-09 14:51:30.0 +0100
@@ -74,21 +74,20 @@
 	Expansion ROM at 2000 [disabled] [size=1M]
 	Capabilities: [dc] Power Management version 2
 		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-		Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
+		Status: D0 PME-Enable- DSel=0 DScale=2 PME-

 00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
 	Subsystem: Intel Corporation EtherExpress PRO/100B (TX)
-	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
+	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
 	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-
-	Latency: 32 (2000ns min, 14000ns max), cache line size 08
 	Interrupt: pin A routed to IRQ 10
-	Region 0: Memory at e5302000 (32-bit, non-prefetchable) [size=4K]
-	Region 1: I/O ports at dc00 [size=64]
-	Region 2: Memory at e510 (32-bit, non-prefetchable) [size=1M]
+	Region 0: Memory at e5302000 (32-bit, non-prefetchable) [disabled] [size=4K]
+	Region 1: I/O ports at dc00 [disabled] [size=64]
+	Region 2: Memory at e510 (32-bit, non-prefetchable) [disabled] [size=1M]
 	Expansion ROM at 2010 [disabled] [size=1M]
 	Capabilities: [dc] Power Management version 2
 		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-		Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
+		Status: D0 PME-Enable- DSel=0 DScale=2 PME-

 00:0a.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
 	Subsystem: Intel Corporation EtherExpress PRO/100B (TX)


> Also, after loading/unloading eepro100 does the e100 driver work?


No.


> A third idea is look for a master abort in lspci after e100 fails to
> load.


I don't understand that one.


RE: [PATCH 0/3] labeled-ipsec: Repost patchset with updates [Originally: mlsxfrm: Various Fixes]

2006-11-09 Thread Venkat Yekkirala

> I think this should be aimed at 2.6.20, because we are at the last or 
> second-last -rc currently, and I don't think these fixes are 
> urgent enough 
> to justify the risk at this stage.

That makes sense. Thanks.

[PATCH take 2] Atmel MACB ethernet driver

2006-11-09 Thread Haavard Skinnemoen

Driver for the Atmel MACB on-chip ethernet module.

Tested on AVR32/AT32AP7000/ATSTK1000. I've heard rumours that it works
with AT91SAM9260 as well, and it may be possible to share some code with
the at91_ether driver for AT91RM9200.

Hardware documentation can be found in the AT32AP7000 data sheet,
which can be downloaded from

http://www.atmel.com/dyn/products/datasheets.asp?family_id=682

Changes since previous version:
  * Probe for PHY ID instead of depending on it being provided through
platform_data.
  * Grab initial ethernet address from the MACB registers instead
of depending on platform_data.
  * Set MII/RMII mode correctly.

These changes are mostly about making the driver more compatible with
the at91 infrastructure.

Signed-off-by: Haavard Skinnemoen <[EMAIL PROTECTED]>
---
 MAINTAINERS  |7 +
 drivers/net/Kconfig  |   11 +
 drivers/net/Makefile |2 +
 drivers/net/macb.c   | 1210 ++
 drivers/net/macb.h   |  387 
 5 files changed, 1617 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d708702..b8c28b5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -426,6 +426,13 @@ L: [EMAIL PROTECTED]
 W: http://linux-atm.sourceforge.net
 S: Maintained
 
+ATMEL MACB ETHERNET DRIVER
+P: Atmel AVR32 Support Team
+M: [EMAIL PROTECTED]
+P: Haavard Skinnemoen
+M: [EMAIL PROTECTED]
+S: Supported
+
 ATMEL WIRELESS DRIVER
 P: Simon Kelley
 M: [EMAIL PROTECTED]
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 9cb3ca5..4e033b1 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -188,6 +188,17 @@ config MII
  or internal device.  It is safe to say Y or M here even if your
  ethernet card lack MII.
 
+config MACB
+   tristate "Atmel MACB support"
+   depends on NET_ETHERNET && AVR32
+   select MII
+   help
+ The Atmel MACB ethernet interface is found on many AT32 and AT91
+ parts. Say Y to include support for the MACB chip.
+
+ To compile this driver as a module, choose M here: the module
+ will be called macb.
+
 source "drivers/net/arm/Kconfig"
 
 config MACE
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index f270bc4..8e67697 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -197,6 +197,8 @@ obj-$(CONFIG_SMC911X) += smc911x.o
 obj-$(CONFIG_DM9000) += dm9000.o
 obj-$(CONFIG_FEC_8XX) += fec_8xx/
 
+obj-$(CONFIG_MACB) += macb.o
+
 obj-$(CONFIG_ARM) += arm/
 obj-$(CONFIG_DEV_APPLETALK) += appletalk/
 obj-$(CONFIG_TR) += tokenring/
diff --git a/drivers/net/macb.c b/drivers/net/macb.c
new file mode 100644
index 000..bd0ce98
--- /dev/null
+++ b/drivers/net/macb.c
@@ -0,0 +1,1210 @@
+/*
+ * Atmel MACB Ethernet Controller driver
+ *
+ * Copyright (C) 2004-2006 Atmel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "macb.h"
+
+#define to_net_dev(class) container_of(class, struct net_device, class_dev)
+
+#define RX_BUFFER_SIZE 128
+#define RX_RING_SIZE   512
+#define RX_RING_BYTES  (sizeof(struct dma_desc) * RX_RING_SIZE)
+
+/* Make the IP header word-aligned (the ethernet header is 14 bytes) */
+#define RX_OFFSET  2
+
+#define TX_RING_SIZE   128
+#define DEF_TX_RING_PENDING	(TX_RING_SIZE - 1)
+#define TX_RING_BYTES  (sizeof(struct dma_desc) * TX_RING_SIZE)
+
+#define TX_RING_GAP(bp)\
+   (TX_RING_SIZE - (bp)->tx_pending)
+#define TX_BUFFS_AVAIL(bp) \
+   (((bp)->tx_tail <= (bp)->tx_head) ? \
+(bp)->tx_tail + (bp)->tx_pending - (bp)->tx_head : \
+(bp)->tx_tail - (bp)->tx_head - TX_RING_GAP(bp))
+#define NEXT_TX(n) (((n) + 1) & (TX_RING_SIZE - 1))
+
+#define NEXT_RX(n) (((n) + 1) & (RX_RING_SIZE - 1))
+
+/* minimum number of free TX descriptors before waking up TX process */
+#define MACB_TX_WAKEUP_THRESH  (TX_RING_SIZE / 4)
+
+#define MACB_RX_INT_FLAGS  (MACB_BIT(RCOMP) | MACB_BIT(RXUBR)  \
+| MACB_BIT(ISR_ROVR))
+
+static void __macb_set_hwaddr(struct macb *bp)
+{
+   u32 bottom;
+   u16 top;
+
+   bottom = cpu_to_le32(*((u32 *)bp->dev->dev_addr));
+   macb_writel(bp, SA1B, bottom);
+   top = cpu_to_le16(*((u16 *)(bp->dev->dev_addr + 4)));
+   macb_writel(bp, SA1T, top);
+}
+
+static void __init macb_get_hwaddr(struct macb *bp)
+{
+   u32 bottom;
+   u16 top;
+   u8 addr[6];
+
+   bottom = macb_readl(bp, SA1B);
+   top = macb_readl(bp, SA1T);
+

tg3_close question

2006-11-09 Thread Eric Lemoine


Hi

Instead of tg3_netif_stop(), tg3_close() uses netif_stop_queue() to
stop xmit. This doesn't seem right to me. E.g. another CPU in tg3_tx()
could do netif_wake_queue() just after tg3_close() did
netif_stop_queue(). Isn't this a bug?

Thanks,
--
Eric

Re: [PATCH 2/6] [NET]: Rethink mark field in struct flowi

2006-11-09 Thread Thomas Graf

* Eric Dumazet <[EMAIL PROTECTED]> 2006-11-09 14:23
> I give a big NACK to this patch.
> 
> By moving fwmark outside of the union, you basically touch more cache lines
> in lookups. I have many machines doing XX.XXX lookups per second, with long
> chains, already using 10% of CPU. I am sure a lot of other machines would
> suffer with this patch, especially machines with 32-byte cache lines.
> 
> For IPV4 lookups, compare offset of fwmark before your patch and after.
> The size of ip6_u is so large that moving fwmark after nl_u union is not an 
> option. Many packets in flight on the Internet are still IPV4.

Would you be happy if mark is moved in front of the union after iif?

Re: [PATCH 2/6] [NET]: Rethink mark field in struct flowi

2006-11-09 Thread Eric Dumazet

On Thursday 09 November 2006 12:27, Thomas Graf wrote:
> Now that all protocols have been made aware of the mark
> field it can be moved out of the union thus simplifying
> its usage.
>
> The config options in the IPv4/IPv6/DECnet subsystems
> to enable or disable mark based routing only
> obfuscate the code with ifdefs, the cost for the
> additional comparison in the flow key is insignificant,
> and most distributions have all these options enabled
> by default anyway. Therefore it makes sense to remove
> the config options and enable mark based routing by
> default.

I give a big NACK to this patch.

By moving fwmark outside of the union, you basically touch more cache lines
in lookups. I have many machines doing XX.XXX lookups per second, with long
chains, already using 10% of CPU. I am sure a lot of other machines would
suffer with this patch, especially machines with 32-byte cache lines.

For IPV4 lookups, compare offset of fwmark before your patch and after.
The size of ip6_u is so large that moving fwmark after nl_u union is not an 
option. Many packets in flight on the Internet are still IPV4.
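
The offset argument above can be made concrete with offsetof(). The structs
below are hypothetical, simplified layouts written only for this
illustration; they are not the kernel's struct flowi, and the result also
assumes the key itself starts on a cache line boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Mark kept inside each protocol's part of the union. */
struct flow_mark_in_union {
	int32_t oif;
	int32_t iif;
	union {
		struct { uint32_t daddr, saddr, mark; uint8_t tos, scope; } ip4_u;
		struct { uint8_t daddr[16], saddr[16]; uint32_t mark, flowlabel; } ip6_u;
	} nl_u;
};

/* Mark moved behind the union: its offset follows the largest member. */
struct flow_mark_after_union {
	int32_t oif;
	int32_t iif;
	union {
		struct { uint32_t daddr, saddr; uint8_t tos, scope; } ip4_u;
		struct { uint8_t daddr[16], saddr[16]; uint32_t flowlabel; } ip6_u;
	} nl_u;
	uint32_t mark;
};

/* Mark placed in front of the union, right after iif. */
struct flow_mark_before_union {
	int32_t oif;
	int32_t iif;
	uint32_t mark;
	union {
		struct { uint32_t daddr, saddr; uint8_t tos, scope; } ip4_u;
		struct { uint8_t daddr[16], saddr[16]; uint32_t flowlabel; } ip6_u;
	} nl_u;
};

/* Index of the cache line that holds a byte at the given offset. */
static inline size_t cache_line_index(size_t offset, size_t line_size)
{
	return offset / line_size;
}
```

With 32-byte lines, an IPv4 lookup in this model finds the mark in the first
line when it sits inside ip4_u or in front of the union, but in the second
line when it sits behind the union, because the IPv6 member pushes it out.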

If you think code is obfuscated, you can make it more readable using macros 
defined in include files, and used in C file without ifdefs.

Thank you
Eric

Re: Turn nfmark into generic mark

2006-11-09 Thread Meelis Roos

> The mark is already a bitfield, you may divide it into separate
> marks with the exception of routes which do not yet support a
> mask.

Just checked, now that we have --and-mask and --or-mask, this is much 
better than before.

The bitmask is OK when up to 32 marks are needed (like, for 
classification). But a common setup is NAT+QoS that first hides the src 
IP and then has to do QoS and mark is the only usable carrier of this 
information. So the mark value needs to carry both classification info 
and IP address info and here things become very limited. Though using 
say 8 bits for host should be usually enough...
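
A minimal sketch of the split described above, with 8 bits reserved for a
host identifier and the remaining 24 bits for classification. The field
layout, macro names and helpers are assumptions made for illustration; they
mirror the and-mask/or-mask style of update but are not an existing API.

```c
#include <stdint.h>

/* Hypothetical layout: low 8 bits carry a host id, upper 24 bits a class. */
#define MARK_HOST_MASK   0x000000FFu
#define MARK_CLASS_MASK  0xFFFFFF00u
#define MARK_CLASS_SHIFT 8

/* Store a host id without disturbing the classification bits. */
static inline uint32_t mark_set_host(uint32_t mark, uint8_t host)
{
	return (mark & ~MARK_HOST_MASK) | host;
}

/* Store a class value without disturbing the host id bits. */
static inline uint32_t mark_set_class(uint32_t mark, uint32_t cls)
{
	return (mark & ~MARK_CLASS_MASK) |
	       ((cls << MARK_CLASS_SHIFT) & MARK_CLASS_MASK);
}

/* Read the host id back out of a mark. */
static inline uint8_t mark_get_host(uint32_t mark)
{
	return (uint8_t)(mark & MARK_HOST_MASK);
}

/* Read the class value back out of a mark. */
static inline uint32_t mark_get_class(uint32_t mark)
{
	return (mark & MARK_CLASS_MASK) >> MARK_CLASS_SHIFT;
}
```

The limitation raised above is visible in the constants: every bit given to
the host field is a bit taken away from classification.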

Maybe just add the original src and/or dst for carrying this information
through SNAT/DNAT? Or is that too much bloat to carry around?

-- 
Meelis Roos ([EMAIL PROTECTED])

Re: [PATCHSET] packet mark & fib rules work

2006-11-09 Thread Thomas Graf

* Steven Whitehouse <[EMAIL PROTECTED]> 2006-11-09 11:46
> On Thu, Nov 09, 2006 at 12:27:35PM +0100, Thomas Graf wrote:
> > Renames nfmark to mark and removes the dependency on netfilter
> > to ease usage by all subsystems. Also removes all the unneeded
> > config options to enable routing by fwmark; it can be safely
> > enabled by default.
> > 
> > Moves mark selector code from per protocol part into the generic
> > part and adds support for inverting selectors.
> > 
> 
> Acked-by: Steven Whitehouse <[EMAIL PROTECTED]>
> 
> so far as all the DECnet bits go. One question though... will you
> be adding later (as your slide #5 and #11 from your netconf presentation
> appear to imply) a way to set the mark from the routing table (presumably
> included in the nexthop info) ?

So far I haven't planned this, slide #11 describes that if I add an
address with a given mark the corresponding route will only apply
to packets with a matching mark. Slide #5 shows the idea of an ingress
classifier/action setting the mark field based on iif. I focus on
selecting routes based on marks, not the other way around, but it's
certainly an interesting idea if you can elaborate on it further.

Re: Turn nfmark into generic mark

2006-11-09 Thread Thomas Graf

* Meelis Roos <[EMAIL PROTECTED]> 2006-11-09 14:32
> Another thought: sometimes a single mark makes rulesets inconvenient.
> What about several independent marks on a packet?

The mark is already a bitfield, you may divide it into separate
marks with the exception of routes which do not yet support a
mask.

Re: Turn nfmark into generic mark

2006-11-09 Thread Meelis Roos

Another thought: sometimes a single mark makes rulesets inconvenient.
What about several independent marks on a packet?

-- 
Meelis Roos <[EMAIL PROTECTED]>

Re: Intel 82559 NIC corrupted EEPROM

2006-11-09 Thread John


Auke Kok wrote:

> This is what I was afraid of: even though the code allows you to bypass
> the EEPROM checksum, the probe fails on a further check to see if the
> MAC address is valid.


> Since something with this NIC specifically made the EEPROM return all
> 0xff's, the MAC address is automatically invalid, and thus probe fails.


I don't understand why you think there is something wrong with a
specific NIC?

In 2.6.14.7, e100.ko fails to read the EEPROM on :00:08.0 (eth0)
In 2.6.18.1, e100.ko fails to read the EEPROM on :00:09.0 (eth1)
In both kernels, eepro100.ko successfully reads all the EEPROMs.

> It seems that the driver has more problems with this NIC than just the
> eeprom checksum being bad. Needless to say this might need fixing.


> Can you load the eepro driver and send me the full eeprom dump?
> Perhaps I can duplicate things over here.


00:08.0 EEPROM contents, size 64x16

  3000 0464 e4e6 0e03  0201 4701 
  7213 8310 40a2 0001 8086   
         
         
         
         
  0128       
         92f7

00:09.0 EEPROM contents, size 64x16

  3000 0464 e5e6 0e03  0201 4701 
  7213 8310 40a2 0001 8086   
         
         
         
         
  0128       
         91f7

00:0a.0 EEPROM contents, size 64x16

  3000 0464 e6e6 0e03  0201 4701 
  7213 8310 40a2 0001 8086   
         
         
         
         
  0128       
         90f7

Re: [PATCHSET] packet mark & fib rules work

2006-11-09 Thread Steven Whitehouse

Hi,

On Thu, Nov 09, 2006 at 12:27:35PM +0100, Thomas Graf wrote:
> Renames nfmark to mark and removes the dependency on netfilter
> to ease usage by all subsystems. Also removes all the unneeded
> config options to enable routing by fwmark; it can be safely
> enabled by default.
> 
> Moves mark selector code from per protocol part into the generic
> part and adds support for inverting selectors.
> 

Acked-by: Steven Whitehouse <[EMAIL PROTECTED]>

so far as all the DECnet bits go. One question though... will you
be adding later (as your slide #5 and #11 from your netconf presentation
appear to imply) a way to set the mark from the routing table (presumably
included in the nexthop info) ?

Steve.


[IPROUTE2] Add support for inverted selectors

2006-11-09 Thread Thomas Graf

Index: iproute2.git/include/linux/fib_rules.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ iproute2.git/include/linux/fib_rules.h	2006-11-09 11:48:07.0 +0100
@@ -0,0 +1,66 @@
+#ifndef __LINUX_FIB_RULES_H
+#define __LINUX_FIB_RULES_H
+
+#include 
+#include 
+
+/* rule is permanent, and cannot be deleted */
+#define FIB_RULE_PERMANENT	1
+#define FIB_RULE_INVERT		2
+
+struct fib_rule_hdr
+{
+	__u8		family;
+	__u8		dst_len;
+	__u8		src_len;
+	__u8		tos;
+
+	__u8		table;
+	__u8		res1;	/* reserved */
+	__u8		res2;	/* reserved */
+	__u8		action;
+
+	__u32		flags;
+};
+
+enum
+{
+	FRA_UNSPEC,
+	FRA_DST,	/* destination address */
+	FRA_SRC,	/* source address */
+	FRA_IFNAME,	/* interface name */
+	FRA_UNUSED1,
+	FRA_UNUSED2,
+	FRA_PRIORITY,	/* priority/preference */
+	FRA_UNUSED3,
+	FRA_UNUSED4,
+	FRA_UNUSED5,
+	FRA_FWMARK,	/* mark */
+	FRA_FLOW,	/* flow/class id */
+	FRA_UNUSED6,
+	FRA_UNUSED7,
+	FRA_UNUSED8,
+	FRA_TABLE,	/* Extended table id */
+	FRA_FWMASK,	/* mask for netfilter mark */
+	__FRA_MAX
+};
+
+#define FRA_MAX (__FRA_MAX - 1)
+
+enum
+{
+	FR_ACT_UNSPEC,
+	FR_ACT_TO_TBL,		/* Pass to fixed table */
+	FR_ACT_RES1,
+	FR_ACT_RES2,
+	FR_ACT_RES3,
+	FR_ACT_RES4,
+	FR_ACT_BLACKHOLE,	/* Drop without notification */
+	FR_ACT_UNREACHABLE,	/* Drop with ENETUNREACH */
+	FR_ACT_PROHIBIT,	/* Drop with EACCES */
+	__FR_ACT_MAX,
+};
+
+#define FR_ACT_MAX (__FR_ACT_MAX - 1)
+
+#endif
Index: iproute2.git/ip/iprule.c
===
--- iproute2.git.orig/ip/iprule.c   2006-11-09 11:46:20.0 +0100
+++ iproute2.git/ip/iprule.c2006-11-09 11:51:35.0 +0100
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "rt_names.h"
 #include "utils.h"
@@ -36,7 +37,7 @@
 static void usage(void)
 {
 	fprintf(stderr, "Usage: ip rule [ list | add | del | flush ] SELECTOR ACTION\n");
-	fprintf(stderr, "SELECTOR := [ from PREFIX ] [ to PREFIX ] [ tos TOS ] [ fwmark FWMARK ]\n");
+	fprintf(stderr, "SELECTOR := [ not ] [ from PREFIX ] [ to PREFIX ] [ tos TOS ] [ fwmark FWMARK ]\n");
 	fprintf(stderr, "[ dev STRING ] [ pref NUMBER ]\n");
 	fprintf(stderr, "ACTION := [ table TABLE_ID ]\n");
 	fprintf(stderr, "  [ prohibit | reject | unreachable ]\n");
@@ -80,6 +81,9 @@
else
fprintf(fp, "0:\t");
 
+   if (r->rtm_flags & FIB_RULE_INVERT)
+   fprintf(fp, "not ");
+
if (tb[RTA_SRC]) {
if (r->rtm_src_len != host_len) {
fprintf(fp, "from %s/%u ", rt_addr_n2a(r->rtm_family,
@@ -209,6 +213,7 @@
req.r.rtm_scope = RT_SCOPE_UNIVERSE;
req.r.rtm_table = 0;
req.r.rtm_type = RTN_UNSPEC;
+   req.r.rtm_flags = 0;
 
if (cmd == RTM_NEWRULE) {
req.n.nlmsg_flags |= NLM_F_CREATE|NLM_F_EXCL;
@@ -216,7 +221,9 @@
}
 
while (argc > 0) {
-   if (strcmp(*argv, "from") == 0) {
+   if (strcmp(*argv, "not") == 0) {
+   req.r.rtm_flags |= FIB_RULE_INVERT;
+   } else if (strcmp(*argv, "from") == 0) {
inet_prefix dst;
NEXT_ARG();
get_prefix(&dst, *argv, req.r.rtm_family);

[PATCH 6/6] [NET] rules: Add support to invert selectors

2006-11-09 Thread Thomas Graf

Introduces a new flag FIB_RULE_INVERT causing rules to apply
if the specified selector doesn't match.

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Index: net-2.6.20/include/linux/fib_rules.h
===
--- net-2.6.20.orig/include/linux/fib_rules.h	2006-11-08 23:32:35.0 +0100
+++ net-2.6.20/include/linux/fib_rules.h	2006-11-08 23:34:13.0 +0100
@@ -6,6 +6,7 @@
 
 /* rule is permanent, and cannot be deleted */
 #define FIB_RULE_PERMANENT 1
+#define FIB_RULE_INVERT		2
 
 struct fib_rule_hdr
 {
Index: net-2.6.20/net/core/fib_rules.c
===
--- net-2.6.20.orig/net/core/fib_rules.c	2006-11-08 23:32:35.0 +0100
+++ net-2.6.20/net/core/fib_rules.c 2006-11-08 23:34:51.0 +0100
@@ -107,6 +107,22 @@
 
 EXPORT_SYMBOL_GPL(fib_rules_unregister);
 
+static int fib_rule_match(struct fib_rule *rule, struct fib_rules_ops *ops,
+ struct flowi *fl, int flags)
+{
+   int ret = 0;
+
+   if (rule->ifindex && (rule->ifindex != fl->iif))
+   goto out;
+
+   if ((rule->mark ^ fl->mark) & rule->mark_mask)
+   goto out;
+
+   ret = ops->match(rule, fl, flags);
+out:
+   return (rule->flags & FIB_RULE_INVERT) ? !ret : ret;
+}
+
 int fib_rules_lookup(struct fib_rules_ops *ops, struct flowi *fl,
 int flags, struct fib_lookup_arg *arg)
 {
@@ -116,13 +132,7 @@
rcu_read_lock();
 
list_for_each_entry_rcu(rule, ops->rules_list, list) {
-   if (rule->ifindex && (rule->ifindex != fl->iif))
-   continue;
-
-   if ((rule->mark ^ fl->mark) & rule->mark_mask)
-   continue;
-
-   if (!ops->match(rule, fl, flags))
+   if (!fib_rule_match(rule, ops, fl, flags))
continue;
 
err = ops->action(rule, fl, flags, arg);
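
The inversion logic in fib_rule_match() above can be modeled in userspace as
follows. This is a simplified sketch: the model_* types are hypothetical,
and the protocol-specific ops->match() call is assumed to succeed so that
only the generic checks and the final inversion are exercised.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RULE_INVERT 2u	/* same value as FIB_RULE_INVERT in the patch */

struct model_rule {
	int ifindex;
	uint32_t mark;
	uint32_t mark_mask;
	uint32_t flags;
};

struct model_flow {
	int iif;
	uint32_t mark;
};

/* Generic selector checks first, inversion applied to the overall result. */
static inline bool model_rule_match(const struct model_rule *rule,
				    const struct model_flow *fl)
{
	bool ret = false;

	if (rule->ifindex && rule->ifindex != fl->iif)
		goto out;

	if ((rule->mark ^ fl->mark) & rule->mark_mask)
		goto out;

	ret = true;	/* stand-in for a successful ops->match() */
out:
	return (rule->flags & MODEL_RULE_INVERT) ? !ret : ret;
}
```

Note that the inversion covers the whole conjunction: an inverted rule
applies as soon as any one of its selectors fails to match.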

--


[PATCH 5/6] [NET] rules: Share common attribute validation policy

2006-11-09 Thread Thomas Graf

Move the attribute policy for the non-specific attributes into
net/fib_rules.h and include it in the respective protocols.

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Index: net-2.6.20/include/net/fib_rules.h
===
--- net-2.6.20.orig/include/net/fib_rules.h	2006-11-08 23:32:35.0 +0100
+++ net-2.6.20/include/net/fib_rules.h  2006-11-08 23:33:21.0 +0100
@@ -59,6 +59,13 @@
struct module   *owner;
 };
 
+#define FRA_GENERIC_POLICY \
+   [FRA_IFNAME]= { .type = NLA_STRING, .len = IFNAMSIZ - 1 }, \
+   [FRA_PRIORITY]  = { .type = NLA_U32 }, \
+   [FRA_FWMARK]= { .type = NLA_U32 }, \
+   [FRA_FWMASK]= { .type = NLA_U32 }, \
+   [FRA_TABLE] = { .type = NLA_U32 }
+
 static inline void fib_rule_get(struct fib_rule *rule)
 {
atomic_inc(&rule->refcnt);
Index: net-2.6.20/net/decnet/dn_rules.c
===
--- net-2.6.20.orig/net/decnet/dn_rules.c	2006-11-08 23:32:35.0 +0100
+++ net-2.6.20/net/decnet/dn_rules.c2006-11-08 23:33:21.0 +0100
@@ -108,13 +108,9 @@
 }
 
 static struct nla_policy dn_fib_rule_policy[FRA_MAX+1] __read_mostly = {
-   [FRA_IFNAME]= { .type = NLA_STRING, .len = IFNAMSIZ - 1 },
-   [FRA_PRIORITY]  = { .type = NLA_U32 },
+   FRA_GENERIC_POLICY,
[FRA_SRC]   = { .type = NLA_U16 },
[FRA_DST]   = { .type = NLA_U16 },
-   [FRA_FWMARK]= { .type = NLA_U32 },
-   [FRA_FWMASK]= { .type = NLA_U32 },
-   [FRA_TABLE] = { .type = NLA_U32 },
 };
 
 static int dn_fib_rule_match(struct fib_rule *rule, struct flowi *fl, int flags)
Index: net-2.6.20/net/ipv4/fib_rules.c
===
--- net-2.6.20.orig/net/ipv4/fib_rules.c	2006-11-08 23:32:35.0 +0100
+++ net-2.6.20/net/ipv4/fib_rules.c 2006-11-08 23:33:21.0 +0100
@@ -170,14 +170,10 @@
 }
 
 static struct nla_policy fib4_rule_policy[FRA_MAX+1] __read_mostly = {
-   [FRA_IFNAME]= { .type = NLA_STRING, .len = IFNAMSIZ - 1 },
-   [FRA_PRIORITY]  = { .type = NLA_U32 },
+   FRA_GENERIC_POLICY,
[FRA_SRC]   = { .type = NLA_U32 },
[FRA_DST]   = { .type = NLA_U32 },
-   [FRA_FWMARK]= { .type = NLA_U32 },
-   [FRA_FWMASK]= { .type = NLA_U32 },
[FRA_FLOW]  = { .type = NLA_U32 },
-   [FRA_TABLE] = { .type = NLA_U32 },
 };
 
 static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
Index: net-2.6.20/net/ipv6/fib6_rules.c
===
--- net-2.6.20.orig/net/ipv6/fib6_rules.c	2006-11-08 23:32:35.0 +0100
+++ net-2.6.20/net/ipv6/fib6_rules.c2006-11-08 23:33:21.0 +0100
@@ -130,13 +130,9 @@
 }
 
 static struct nla_policy fib6_rule_policy[FRA_MAX+1] __read_mostly = {
-   [FRA_IFNAME]= { .type = NLA_STRING, .len = IFNAMSIZ - 1 },
-   [FRA_PRIORITY]  = { .type = NLA_U32 },
+   FRA_GENERIC_POLICY,
[FRA_SRC]   = { .len = sizeof(struct in6_addr) },
[FRA_DST]   = { .len = sizeof(struct in6_addr) },
-   [FRA_FWMARK]= { .type = NLA_U32 },
-   [FRA_FWMASK]= { .type = NLA_U32 },
-   [FRA_TABLE] = { .type = NLA_U32 },
 };
 
 static int fib6_rule_configure(struct fib_rule *rule, struct sk_buff *skb,

--


[PATCHSET] packet mark & fib rules work

2006-11-09 Thread Thomas Graf

Renames nfmark to mark and removes the dependency on netfilter
to ease usage by all subsystems. Also removes all the unneeded
config options to enable routing by fwmark; it can be safely
enabled by default.

Moves mark selector code from per protocol part into the generic
part and adds support for inverting selectors.


[PATCH 4/6] [NET] rules: Protocol independant mark selector

2006-11-09 Thread Thomas Graf

Move the mark selector, currently implemented per protocol, into
the protocol independent part.

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>
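
For readers of the patch below, the mark selector semantics it moves into
the generic code can be sketched in userspace C. The function names are
hypothetical. The all-ones default mask is an assumption inferred from the
patch comment "all bits are compared", since the literal itself is truncated
in this archive.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mask a rule ends up with after configuration: a non-zero mark without an
 * explicit mask compares all bits; an explicit mask always wins.
 */
static inline uint32_t model_effective_mask(bool have_mark, uint32_t mark,
					    bool have_mask, uint32_t mask)
{
	uint32_t effective = 0;

	if (have_mark && mark)
		effective = UINT32_MAX;	/* assumed "all bits" default */

	if (have_mask)
		effective = mask;

	return effective;
}

/* True when the packet mark agrees with the rule mark on all masked bits. */
static inline bool model_mark_selects(uint32_t rule_mark, uint32_t rule_mask,
				      uint32_t pkt_mark)
{
	return ((rule_mark ^ pkt_mark) & rule_mask) == 0;
}
```

A rule with a zero mark and no mask therefore gets a zero mask and selects
every packet, which keeps rules without a mark selector unaffected.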

Index: net-2.6.20/include/net/fib_rules.h
===
--- net-2.6.20.orig/include/net/fib_rules.h	2006-11-08 15:29:26.0 +0100
+++ net-2.6.20/include/net/fib_rules.h  2006-11-08 23:32:35.0 +0100
@@ -13,6 +13,8 @@
 	atomic_t		refcnt;
 	int			ifindex;
 	char			ifname[IFNAMSIZ];
+	u32			mark;
+	u32			mark_mask;
 	u32			pref;
 	u32			flags;
 	u32			table;
Index: net-2.6.20/net/core/fib_rules.c
===
--- net-2.6.20.orig/net/core/fib_rules.c	2006-11-08 15:29:26.0 +0100
+++ net-2.6.20/net/core/fib_rules.c 2006-11-08 23:32:35.0 +0100
@@ -119,6 +119,9 @@
if (rule->ifindex && (rule->ifindex != fl->iif))
continue;
 
+   if ((rule->mark ^ fl->mark) & rule->mark_mask)
+   continue;
+
if (!ops->match(rule, fl, flags))
continue;
 
@@ -179,6 +182,18 @@
rule->ifindex = dev->ifindex;
}
 
+   if (tb[FRA_FWMARK]) {
+   rule->mark = nla_get_u32(tb[FRA_FWMARK]);
+   if (rule->mark)
+   /* compatibility: if the mark value is non-zero all bits
+* are compared unless a mask is explicitly specified.
+*/
+   rule->mark_mask = 0x;
+   }
+
+   if (tb[FRA_FWMASK])
+   rule->mark_mask = nla_get_u32(tb[FRA_FWMASK]);
+
rule->action = frh->action;
rule->flags = frh->flags;
rule->table = frh_get_table(frh, tb);
@@ -250,6 +265,14 @@
nla_strcmp(tb[FRA_IFNAME], rule->ifname))
continue;
 
+   if (tb[FRA_FWMARK] &&
+   (rule->mark != nla_get_u32(tb[FRA_FWMARK])))
+   continue;
+
+   if (tb[FRA_FWMASK] &&
+   (rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])))
+   continue;
+
if (!ops->compare(rule, frh, tb))
continue;
 
@@ -298,6 +321,12 @@
if (rule->pref)
NLA_PUT_U32(skb, FRA_PRIORITY, rule->pref);
 
+   if (rule->mark)
+   NLA_PUT_U32(skb, FRA_FWMARK, rule->mark);
+
+   if (rule->mark_mask || rule->mark)
+   NLA_PUT_U32(skb, FRA_FWMASK, rule->mark_mask);
+
if (ops->fill(rule, skb, nlh, frh) < 0)
goto nla_put_failure;
 
Index: net-2.6.20/net/decnet/dn_rules.c
===
--- net-2.6.20.orig/net/decnet/dn_rules.c   2006-11-08 16:12:32.0 +0100
+++ net-2.6.20/net/decnet/dn_rules.c 2006-11-08 23:32:35.0 +0100
@@ -45,8 +45,6 @@
__le16  dstmask;
__le16  srcmap;
u8  flags;
-   u32 fwmark;
-   u32 fwmask;
 };
 
 static struct dn_fib_rule default_rule = {
@@ -129,9 +127,6 @@
((daddr ^ r->dst) & r->dstmask))
return 0;
 
-   if ((r->fwmark ^ fl->mark) & r->fwmask)
-   return 0;
-
return 1;
 }
 
@@ -165,18 +160,6 @@
if (tb[FRA_DST])
r->dst = nla_get_u16(tb[FRA_DST]);
 
-   if (tb[FRA_FWMARK]) {
-   r->fwmark = nla_get_u32(tb[FRA_FWMARK]);
-   if (r->fwmark)
-   /* compatibility: if the mark value is non-zero all bits
-* are compared unless a mask is explicitly specified.
-*/
-   r->fwmask = 0xFFFFFFFF;
-   }
-
-   if (tb[FRA_FWMASK])
-   r->fwmask = nla_get_u32(tb[FRA_FWMASK]);
-
r->src_len = frh->src_len;
r->srcmask = dnet_make_mask(r->src_len);
r->dst_len = frh->dst_len;
@@ -197,12 +180,6 @@
if (frh->dst_len && (r->dst_len != frh->dst_len))
return 0;
 
-   if (tb[FRA_FWMARK] && (r->fwmark != nla_get_u32(tb[FRA_FWMARK])))
-   return 0;
-
-   if (tb[FRA_FWMASK] && (r->fwmask != nla_get_u32(tb[FRA_FWMASK])))
-   return 0;
-
if (tb[FRA_SRC] && (r->src != nla_get_u16(tb[FRA_SRC])))
return 0;
 
@@ -240,10 +217,6 @@
frh->src_len = r->src_len;
frh->tos = 0;
 
-   if (r->fwmark)
-   NLA_PUT_U32(skb, FRA_FWMARK, r->fwmark);
-   if (r->fwmask || r->fwmark)
-   NLA_PUT_U32(skb, FRA_FWMASK, r->fwmask);
if (r->dst_len)
NLA_PUT_U16(skb, FRA_DST, r

[PATCH 1/6] [NET]: Turn nfmark into generic mark

2006-11-09 Thread Thomas Graf

nfmark is being used in various subsystems and has become
the de facto mark field for all kinds of packets. Therefore
it makes sense to rename it to `mark' and remove the
dependency on CONFIG_NETFILTER.

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Index: net-2.6.20/include/linux/skbuff.h
===
--- net-2.6.20.orig/include/linux/skbuff.h  2006-11-08 15:34:13.0 +0100
+++ net-2.6.20/include/linux/skbuff.h   2006-11-08 16:12:30.0 +0100
@@ -216,7 +216,7 @@
  * @tail: Tail pointer
  * @end: End pointer
  * @destructor: Destruct function
- * @nfmark: Can be used for communication between hooks
+ * @mark: Generic packet mark
  * @nfct: Associated connection, if any
  * @ipvs_property: skbuff is owned by ipvs
  * @nfctinfo: Relationship of this skb to the connection
@@ -295,7 +295,6 @@
 #ifdef CONFIG_BRIDGE_NETFILTER
struct nf_bridge_info   *nf_bridge;
 #endif
-   __u32   nfmark;
 #endif /* CONFIG_NETFILTER */
 #ifdef CONFIG_NET_SCHED
__u16   tc_index;   /* traffic control index */
@@ -310,6 +309,7 @@
__u32   secmark;
 #endif
 
+   __u32   mark;
 
/* These elements must be at the end, see alloc_skb() for details.  */
unsigned int truesize;
Index: net-2.6.20/net/core/skbuff.c
===
--- net-2.6.20.orig/net/core/skbuff.c   2006-11-08 15:34:13.0 +0100
+++ net-2.6.20/net/core/skbuff.c 2006-11-08 16:12:30.0 +0100
@@ -473,8 +473,8 @@
 #endif
C(protocol);
n->destructor = NULL;
+   C(mark);
 #ifdef CONFIG_NETFILTER
-   C(nfmark);
C(nfct);
nf_conntrack_get(skb->nfct);
C(nfctinfo);
@@ -534,8 +534,8 @@
new->pkt_type   = old->pkt_type;
new->tstamp = old->tstamp;
new->destructor = NULL;
+   new->mark   = old->mark;
 #ifdef CONFIG_NETFILTER
-   new->nfmark = old->nfmark;
new->nfct   = old->nfct;
nf_conntrack_get(old->nfct);
new->nfctinfo   = old->nfctinfo;
Index: net-2.6.20/net/ipv4/netfilter/iptable_mangle.c
===
--- net-2.6.20.orig/net/ipv4/netfilter/iptable_mangle.c 2006-11-08 15:34:13.0 +0100
+++ net-2.6.20/net/ipv4/netfilter/iptable_mangle.c  2006-11-08 16:12:30.0 +0100
@@ -132,7 +132,7 @@
unsigned int ret;
u_int8_t tos;
__be32 saddr, daddr;
-   unsigned long nfmark;
+   u_int32_t mark;
 
/* root is playing with raw sockets. */
if ((*pskb)->len < sizeof(struct iphdr)
@@ -143,7 +143,7 @@
}
 
/* Save things which could affect route */
-   nfmark = (*pskb)->nfmark;
+   mark = (*pskb)->mark;
saddr = (*pskb)->nh.iph->saddr;
daddr = (*pskb)->nh.iph->daddr;
tos = (*pskb)->nh.iph->tos;
@@ -154,7 +154,7 @@
&& ((*pskb)->nh.iph->saddr != saddr
|| (*pskb)->nh.iph->daddr != daddr
 #ifdef CONFIG_IP_ROUTE_FWMARK
-   || (*pskb)->nfmark != nfmark
+   || (*pskb)->mark != mark
 #endif
|| (*pskb)->nh.iph->tos != tos))
if (ip_route_me_harder(pskb, RTN_UNSPEC))
Index: net-2.6.20/net/bridge/netfilter/ebt_mark.c
===
--- net-2.6.20.orig/net/bridge/netfilter/ebt_mark.c 2006-11-08 15:34:13.0 +0100
+++ net-2.6.20/net/bridge/netfilter/ebt_mark.c  2006-11-08 16:12:30.0 +0100
@@ -25,13 +25,13 @@
int action = info->target & -16;
 
if (action == MARK_SET_VALUE)
-   (*pskb)->nfmark = info->mark;
+   (*pskb)->mark = info->mark;
else if (action == MARK_OR_VALUE)
-   (*pskb)->nfmark |= info->mark;
+   (*pskb)->mark |= info->mark;
else if (action == MARK_AND_VALUE)
-   (*pskb)->nfmark &= info->mark;
+   (*pskb)->mark &= info->mark;
else
-   (*pskb)->nfmark ^= info->mark;
+   (*pskb)->mark ^= info->mark;
 
return info->target | -16;
 }
Index: net-2.6.20/net/bridge/netfilter/ebt_mark_m.c
===
--- net-2.6.20.orig/net/bridge/netfilter/ebt_mark_m.c   2006-11-08 15:34:13.0 +0100
+++ net-2.6.20/net/bridge/netfilter/ebt_mark_m.c 2006-11-08 16:12:30.0 +0100
@@ -19,8 +19,8 @@
struct ebt_mark_m_info *info = (struct ebt_mark_m_info *) data;
 
if (info->bitmask & EBT_MARK_OR)
-   return !(!!(skb->nfmark & info->mask) ^ info->invert);
-   return !(((skb->nfmark & info->mask) == info->mark) ^ info->invert);
+   return !(!!(skb->mark & info->mask) ^ info->invert);
+   return !(((skb->mark & info->mask) == info->mark) ^

[PATCH 3/6] [IPv4] nl_fib_lookup: Rename fl_fwmark to fl_mark

2006-11-09 Thread Thomas Graf

For the sake of consistency.

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Index: net-2.6.20/include/net/ip_fib.h
===
--- net-2.6.20.orig/include/net/ip_fib.h 2006-11-08 15:34:12.0 +0100
+++ net-2.6.20/include/net/ip_fib.h 2006-11-08 16:12:34.0 +0100
@@ -115,7 +115,7 @@
 
 struct fib_result_nl {
__be32  fl_addr;   /* To be looked up*/
-   u32 fl_fwmark; 
+   u32 fl_mark;
unsigned char   fl_tos;
unsigned char   fl_scope;
unsigned char   tb_id_in;
Index: net-2.6.20/net/ipv4/fib_frontend.c
===
--- net-2.6.20.orig/net/ipv4/fib_frontend.c 2006-11-08 16:12:32.0 +0100
+++ net-2.6.20/net/ipv4/fib_frontend.c  2006-11-08 16:12:34.0 +0100
@@ -768,7 +768,7 @@
 {

struct fib_result   res;
-   struct flowi fl = { .mark = frn->fl_fwmark,
+   struct flowi fl = { .mark = frn->fl_mark,
    .nl_u = { .ip4_u = { .daddr = frn->fl_addr,
    .tos = frn->fl_tos,
    .scope = frn->fl_scope } } };

--


[PATCH 2/6] [NET]: Rethink mark field in struct flowi

2006-11-09 Thread Thomas Graf

Now that all protocols have been made aware of the mark
field, it can be moved out of the union, thus simplifying
its usage.

The config options in the IPv4/IPv6/DECnet subsystems
that enable or disable mark based routing only
obfuscate the code with ifdefs. The cost of the
additional comparison in the flow key is insignificant,
and most distributions have all these options enabled
by default anyway. Therefore it makes sense to remove
the config options and enable mark based routing by
default.

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>
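
To illustrate the description above, here is a minimal sketch of a flow
key with the mark sitting outside the per-protocol union, compared with
the same XOR/OR idiom the DECnet hunk uses. The struct flow_key layout
and the name dn_flow_equal are simplified assumptions for this example
and are not the kernel's struct flowi.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down flow key; only the shape matters here. */
struct flow_key {
    int      oif;
    int      iif;
    uint32_t mark;              /* protocol independent, outside the union */
    union {
        struct {
            uint32_t daddr;
            uint32_t saddr;
            uint8_t  tos;
            uint8_t  scope;
        } ip4;
        struct {
            uint16_t daddr;
            uint16_t saddr;
            uint8_t  scope;
        } dn;
    } u;
};

/*
 * Branch-free comparison: XOR each field pair, OR the results together,
 * and the keys are equal only if no bit survived. The mark costs one
 * extra XOR/OR and needs no ifdef.
 */
static bool dn_flow_equal(const struct flow_key *a, const struct flow_key *b)
{
    assert(a != NULL && b != NULL);
    return ((uint32_t)(a->u.dn.daddr ^ b->u.dn.daddr) |
            (uint32_t)(a->u.dn.saddr ^ b->u.dn.saddr) |
            (a->mark ^ b->mark) |
            (uint32_t)(a->u.dn.scope ^ b->u.dn.scope) |
            (uint32_t)(a->oif ^ b->oif) |
            (uint32_t)(a->iif ^ b->iif)) == 0;
}
```

Because the mark no longer lives inside ip4_u/ip6_u/dn_u, every protocol
reads it the same way and the per-protocol accessor macros become
unnecessary.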

Index: net-2.6.20/include/net/flow.h
===
--- net-2.6.20.orig/include/net/flow.h  2006-11-08 15:34:12.0 +0100
+++ net-2.6.20/include/net/flow.h   2006-11-08 16:12:32.0 +0100
@@ -18,7 +18,6 @@
struct {
__be32  daddr;
__be32  saddr;
-   __u32   fwmark;
__u8 tos;
__u8 scope;
} ip4_u;
@@ -26,28 +25,23 @@
struct {
struct in6_addr daddr;
struct in6_addr saddr;
-   __u32   fwmark;
__be32  flowlabel;
} ip6_u;
 
struct {
__le16  daddr;
__le16  saddr;
-   __u32   fwmark;
__u8 scope;
} dn_u;
} nl_u;
 #define fld_dst nl_u.dn_u.daddr
 #define fld_src nl_u.dn_u.saddr
-#define fld_fwmark nl_u.dn_u.fwmark
 #define fld_scope  nl_u.dn_u.scope
 #define fl6_dst nl_u.ip6_u.daddr
 #define fl6_src nl_u.ip6_u.saddr
-#define fl6_fwmark nl_u.ip6_u.fwmark
 #define fl6_flowlabel  nl_u.ip6_u.flowlabel
 #define fl4_dst nl_u.ip4_u.daddr
 #define fl4_src nl_u.ip4_u.saddr
-#define fl4_fwmark nl_u.ip4_u.fwmark
 #define fl4_tos nl_u.ip4_u.tos
 #define fl4_scope  nl_u.ip4_u.scope
 
@@ -86,6 +80,7 @@
 #ifdef CONFIG_IPV6_MIP6
 #define fl_mh_type uli_u.mht.type
 #endif
+   __u32   mark;
__u32   secid;  /* used by xfrm; see secid.txt */
 } __attribute__((__aligned__(BITS_PER_LONG/8)));
 
Index: net-2.6.20/net/decnet/dn_route.c
===
--- net-2.6.20.orig/net/decnet/dn_route.c   2006-11-08 16:12:30.0 +0100
+++ net-2.6.20/net/decnet/dn_route.c 2006-11-08 16:12:32.0 +0100
@@ -269,9 +269,7 @@
 {
return ((fl1->nl_u.dn_u.daddr ^ fl2->nl_u.dn_u.daddr) |
(fl1->nl_u.dn_u.saddr ^ fl2->nl_u.dn_u.saddr) |
-#ifdef CONFIG_DECNET_ROUTE_FWMARK
-   (fl1->nl_u.dn_u.fwmark ^ fl2->nl_u.dn_u.fwmark) |
-#endif
+   (fl1->mark ^ fl2->mark) |
(fl1->nl_u.dn_u.scope ^ fl2->nl_u.dn_u.scope) |
(fl1->oif ^ fl2->oif) |
(fl1->iif ^ fl2->iif)) == 0;
@@ -882,10 +880,8 @@
  { .daddr = oldflp->fld_dst,
.saddr = oldflp->fld_src,
.scope = RT_SCOPE_UNIVERSE,
-#ifdef CONFIG_DECNET_ROUTE_FWMARK
-   .fwmark = oldflp->fld_fwmark
-#endif
 } },
+   .mark = oldflp->mark,
.iif = loopback_dev.ifindex,
.oif = oldflp->oif };
struct dn_route *rt = NULL;
@@ -903,7 +899,7 @@
   "dn_route_output_slow: dst=%04x src=%04x mark=%d"
   " iif=%d oif=%d\n", dn_ntohs(oldflp->fld_dst),
   dn_ntohs(oldflp->fld_src),
-   oldflp->fld_fwmark, loopback_dev.ifindex, oldflp->oif);
+   oldflp->mark, loopback_dev.ifindex, oldflp->oif);
 
/* If we have an output interface, verify its a DECnet device */
if (oldflp->oif) {
@@ -1108,9 +1104,7 @@
rt->fl.fld_dst= oldflp->fld_dst;
rt->fl.oif= oldflp->oif;
rt->fl.iif= 0;
-#ifdef CONFIG_DECNET_ROUTE_FWMARK
-   rt->fl.fld_fwmark = oldflp->fld_fwmark;
-#endif
+   rt->fl.mark   = oldflp->mark;
 
rt->rt_saddr  = fl.fld_src;
rt->rt_daddr  = fl.fld_dst;
@@ -1178,9 +1172,7 @@
rt = rcu_dereference(rt->u.rt_next)) {
if ((flp->fld_dst == rt->fl.fld_dst) &&
(flp->fld_src == rt->fl.fld_src) &&
-#ifdef CONFIG_DECNET_ROUTE_FWMARK
-   (flp->fld_fwmark == rt->fl.fld_fwmark) &&
-#e

Re: [take24 3/6] kevent: poll/select() notifications.

2006-11-09 Thread Evgeniy Polyakov

On Thu, Nov 09, 2006 at 10:08:44AM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote:
> Here you test both KEVENT_SOCKET and KEVENT_PIPE
> 
> > +#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE
> > +   kevent_storage_init(inode, &inode->st);
> > +#endif
> > }
> > return inode;
> >  }
> >
> >  void destroy_inode(struct inode *inode)
> >  {
> 
> but here you test only KEVENT_SOCKET
> 
> > +#if defined CONFIG_KEVENT_SOCKET
> > +   kevent_storage_fini(&inode->st);
> > +#endif

Indeed, it must be 
#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE

> > BUG_ON(inode_has_buffers(inode));
> > security_inode_free(inode);
> > if (inode->i_sb->s_op->destroy_inode)
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 5baf3a1..c529723 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -276,6 +276,7 @@ #include 
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include 
> >  #include 
> > @@ -586,6 +587,10 @@ #ifdef CONFIG_INOTIFY
> > struct mutex inotify_mutex;  /* protects the watches list */
> >  #endif
> >
> 
> Here you include a kevent_storage only if KEVENT_SOCKET
> 
> > +#ifdef CONFIG_KEVENT_SOCKET
> > +   struct kevent_storage   st;
> > +#endif
> > +

It must be 
#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE
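
One way to keep the three places (struct member, init, fini) from
drifting apart again would be to derive a single guard from the config
options. The sketch below only illustrates that idea; the
KEVENT_INODE_STORAGE macro and the *_stub types are hypothetical and do
not appear in the patchset, and CONFIG_KEVENT_PIPE is defined by hand so
the example builds on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: only the pipe backend is enabled. */
#define CONFIG_KEVENT_PIPE 1

/* Hypothetical single guard derived from both options. */
#if defined(CONFIG_KEVENT_SOCKET) || defined(CONFIG_KEVENT_PIPE)
#define KEVENT_INODE_STORAGE 1
#endif

struct kevent_storage_stub {
    int initialized;
};

struct inode_stub {
#ifdef KEVENT_INODE_STORAGE
    struct kevent_storage_stub st;   /* present whenever any user exists */
#endif
    int dummy;
};

/* Counts storages that were initialized but not yet torn down. */
static int live_storages;

static void inode_stub_init(struct inode_stub *inode)
{
    assert(inode != NULL);
    inode->dummy = 0;
#ifdef KEVENT_INODE_STORAGE
    inode->st.initialized = 1;
    live_storages++;
#endif
}

static void inode_stub_destroy(struct inode_stub *inode)
{
    assert(inode != NULL);
#ifdef KEVENT_INODE_STORAGE
    inode->st.initialized = 0;
    live_storages--;
#endif
}
```

With only the pipe option set, the member, the init and the teardown all
compile in together; with the mismatched guards from the posted patch the
member and the teardown would both have been missing in that
configuration.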

-- 
Evgeniy Polyakov

Re: 2.6.19-rc1: Volanomark slowdown

2006-11-09 Thread Olaf Kirch

On Wed, Nov 08, 2006 at 02:07:32PM -0800, Tim Chen wrote:
> In my testing, the CPU utilization is at 100%.  So
> increase in ACKs will cost CPU to devote more
> time to process those ACKs and reduce throughput.

Oh, I see. I would test on a real network with real clients. I doubt
you would observe a noticeable effect there.

Olaf
-- 
Walks like a duck. Quacks like a duck. Must be a chicken.

Re: [take24 3/6] kevent: poll/select() notifications.

2006-11-09 Thread Eric Dumazet

On Thursday 09 November 2006 09:23, Evgeniy Polyakov wrote:
> poll/select() notifications.
>
> This patch includes generic poll/select notifications.
> kevent_poll works similar to epoll and has the same issues (callback
> is invoked not from internal state machine of the caller, but through
> process awake, a lot of allocations and so on).
>
> Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index bc35a40..0805547 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -20,6 +20,7 @@ #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  #include 
> @@ -119,6 +120,7 @@ struct file *get_empty_filp(void)
>   f->f_uid = tsk->fsuid;
>   f->f_gid = tsk->fsgid;
>   eventpoll_init_file(f);
> + kevent_init_file(f);
>   /* f->f_version: 0 */
>   return f;
>
> @@ -164,6 +166,7 @@ void fastcall __fput(struct file *file)
>* in the file cleanup chain.
>*/
>   eventpoll_release(file);
> + kevent_cleanup_file(file);
>   locks_remove_flock(file);
>
>   if (file->f_op && file->f_op->release)
> diff --git a/fs/inode.c b/fs/inode.c
> index ada7643..6745c00 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -21,6 +21,7 @@ #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  /*
> @@ -164,12 +165,18 @@ #endif
>   }
>   inode->i_private = 0;
>   inode->i_mapping = mapping;

Here you test both KEVENT_SOCKET and KEVENT_PIPE

> +#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE
> + kevent_storage_init(inode, &inode->st);
> +#endif
>   }
>   return inode;
>  }
>
>  void destroy_inode(struct inode *inode)
>  {

but here you test only KEVENT_SOCKET

> +#if defined CONFIG_KEVENT_SOCKET
> + kevent_storage_fini(&inode->st);
> +#endif
>   BUG_ON(inode_has_buffers(inode));
>   security_inode_free(inode);
>   if (inode->i_sb->s_op->destroy_inode)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 5baf3a1..c529723 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -276,6 +276,7 @@ #include 
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -586,6 +587,10 @@ #ifdef CONFIG_INOTIFY
>   struct mutex inotify_mutex;  /* protects the watches list */
>  #endif
>

Here you include a kevent_storage only if KEVENT_SOCKET

> +#ifdef CONFIG_KEVENT_SOCKET
> + struct kevent_storage   st;
> +#endif
> +
>   unsigned long   i_state;
>   unsigned long   dirtied_when;   /* jiffies of first dirtying */
>
> @@ -739,6 +744,9 @@ #ifdef CONFIG_EPOLL
>   struct list_head f_ep_links;
>   spinlock_t  f_ep_lock;
>  #endif /* #ifdef CONFIG_EPOLL */
> +#ifdef CONFIG_KEVENT_POLL
> + struct kevent_storage   st;
> +#endif
>   struct address_space *f_mapping;
>  };
>  extern spinlock_t files_lock;

Linux-2.6.10 - Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic

2006-11-09 Thread Basheer, Mansoor Ahamed

Hi All

I found an issue with the Realtek 8139 driver (2.6.10 branch) under high
bi-directional traffic. On a transmit timeout, the driver's timeout callback
re-enables the receive interrupt. On the next receive interrupt, the ISR
disables the receive interrupt "only" when the receive poll task is not
active. But the poll task is actually active, and hence the ISR doesn't
disable the receive interrupt. So the ISR returns without clearing the
receive interrupt. This un-serviced receive interrupt leaves the system
in a hung state.

My understanding here is, on receive interrupt the ISR should disable
the receive interrupt irrespective of the polling task's state (active
or inactive). I changed the code (as shown below) and it works
perfectly. 
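
The ordering problem described above can be modelled outside the driver.
The following is a toy simulation written only for illustration: the
nic_sim struct and all sim_*/isr_* names are hypothetical, and
sim_schedule_prep merely imitates the "returns false when polling is
already scheduled" behaviour the report relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the state involved; not the driver's real data. */
struct nic_sim {
    bool rx_irq_enabled;   /* receive bits set in the interrupt mask */
    bool poll_scheduled;   /* poll routine is already active */
    int  polls_started;
};

/* Imitates the schedule-prep check: fails if polling is already active. */
static bool sim_schedule_prep(struct nic_sim *n)
{
    assert(n != NULL);
    if (n->poll_scheduled)
        return false;
    n->poll_scheduled = true;
    return true;
}

/* TX timeout path as reported: unmasks rx interrupts unconditionally. */
static void sim_tx_timeout(struct nic_sim *n)
{
    assert(n != NULL);
    n->rx_irq_enabled = true;
}

/* Original ordering: mask rx interrupts only when the poll gets started. */
static void isr_original(struct nic_sim *n)
{
    if (sim_schedule_prep(n)) {
        n->rx_irq_enabled = false;
        n->polls_started++;
    }
}

/* Proposed ordering: always mask first, then try to start the poll. */
static void isr_proposed(struct nic_sim *n)
{
    assert(n != NULL);
    n->rx_irq_enabled = false;
    if (sim_schedule_prep(n))
        n->polls_started++;
}
```

Running the sequence "interrupt, TX timeout, interrupt" against both
variants shows the difference: with the original ordering the receive
interrupt stays unmasked after the second interrupt, while the proposed
ordering masks it again without starting a second poll.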

Is this a known issue? If so, is there a fix already available?

Also, we get frequent TX timeouts under a high rate of traffic. What
could be the reason for these frequent TX timeouts?
 

--- old/8139too.c   2006-11-09 11:49:25.0 +0530
+++ new/8139too.c   2006-11-09 11:50:02.0 +0530
@@ -2200,8 +2200,8 @@
/* Receive packets are processed by poll routine.
   If not running start it now. */
if (status & RxAckBits){
-   if (netif_rx_schedule_prep(dev)) {
RTL_W16_F (IntrMask, rtl8139_norx_intr_mask);
+   if (netif_rx_schedule_prep(dev)) {
__netif_rx_schedule (dev);
}
}


Thanks
Mansoor

[take24 0/6] kevent: Generic event handling mechanism.

2006-11-09 Thread Evgeniy Polyakov


Generic event handling mechanism.

Kevent is a generic subsystem which allows handling of event notifications.
It supports both level and edge triggered events. It is similar to
poll/epoll in some cases, but it is more scalable, it is faster and
allows working with essentially any kind of events.

Events are provided into kernel through control syscall and can be read
back through ring buffer or using usual syscalls.
Kevent update (i.e. readiness switching) happens directly from internals
of the appropriate state machine of the underlying subsystem (like
network, filesystem, timer or any other).
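
For readers unfamiliar with the two triggering modes mentioned above,
here is a generic sketch of the distinction. It is not kevent code: the
ready_source struct and its helpers are made up for this example, and
"edge" is simplified to mean the transition from not ready to ready.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event source, e.g. a queue with pending data. */
struct ready_source {
    int  pending;   /* units of data waiting to be consumed */
    bool edge;      /* set on the not-ready -> ready transition */
};

/* Producer side: new data arrives. */
static void source_add(struct ready_source *s, int n)
{
    assert(s != NULL && n > 0);
    if (s->pending == 0)
        s->edge = true;
    s->pending += n;
}

/* Consumer side: data is read out. */
static void source_consume(struct ready_source *s, int n)
{
    assert(s != NULL && n >= 0 && n <= s->pending);
    s->pending -= n;
}

/* Level triggered: reported for as long as the condition holds. */
static bool level_ready(const struct ready_source *s)
{
    assert(s != NULL);
    return s->pending > 0;
}

/* Edge triggered: reported once per transition, then cleared. */
static bool edge_ready(struct ready_source *s)
{
    bool fired;

    assert(s != NULL);
    fired = s->edge;
    s->edge = false;
    return fired;
}
```

A level-triggered consumer may read part of the data and will be told
again; an edge-triggered consumer has to drain the source before it can
expect another notification.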

Homepage:
http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent

Documentation page:
http://linux-net.osdl.org/index.php/Kevent

Consider for inclusion.

Changes from 'take23' patchset:
 * kevent PIPE notifications
 * KEVENT_REQ_LAST_CHECK flag, which allows performing a last check at
   dequeuing time
 * fixed poll/select notifications (were broken due to tree manipulations)
 * made Documentation/kevent.txt look nice in 80-col terminal
 * fix for copy_to_user() failure report for the first kevent (Andrew Morton)
 * minor function renames
Here is the pipe result with the kevent_pipe kernel part and 2000 pipes
(Eric Dumazet's application):
epoll (edge-triggered):   248408 events/sec
kevent (edge-triggered):  269282 events/sec
Busy reading loop:        269519 events/sec

Changes from 'take22' patchset:
 * new ring buffer implementation in process' memory
 * wakeup-one-thread flag
 * edge-triggered behaviour
With this release, an additional independent benchmark shows kevent speed
compared to epoll:
Eric Dumazet created a special benchmark which creates a set of AF_INET
sockets, and two threads start to simultaneously read and write data
from/into them.
Here are the results:
epoll (no EPOLLET): 57428 events/sec
kevent (no ET): 59794 events/sec
epoll (with EPOLLET): 71000 events/sec
kevent (with ET): 78265 events/sec
Maximum (busy loop reading events): 88482 events/sec

Changes from 'take21' patchset:
 * minor cleanups (different return values, removed unneeded variables,
   whitespace and so on)
 * fixed bug in kevent removal in case when kevent being removed
   is the same as overflow_kevent (spotted by Eric Dumazet)

Changes from 'take20' patchset:
 * new ring buffer implementation
 * removed artificial limit on possible number of kevents
With this release and a fixed userspace web server it was possible to
achieve 3960+ req/s with a client connection rate of 4000 con/s
over 100 Mbit lan, data IO over network was about 10582.7 KB/s, which
is very close to wire speed if we take headers and the like into account.

Changes from 'take19' patchset:
 * use __init instead of __devinit
 * removed 'default N' from config for user statistic
 * removed kevent_user_fini() since kevent can not be unloaded
 * use KERN_INFO for statistic output

Changes from 'take18' patchset:
 * use __init instead of __devinit
 * removed 'default N' from config for user statistic
 * removed kevent_user_fini() since kevent can not be unloaded
 * use KERN_INFO for statistic output

Changes from 'take17' patchset:
 * Use RB tree instead of hash table.
   At least for a web server, the frequency of addition/deletion of new
   kevents is comparable with the number of search accesses, i.e. most of
   the time events are added, accessed only a couple of times and then
   removed, so it justifies RB tree usage over AVL tree, since the latter
   has much slower deletion time (max O(log(N)) compared to 3 ops),
   although faster search time (1.44*O(log(N)) vs. 2*O(log(N))).
   So for kevents I use RB tree for now and later, when my AVL tree
   implementation is ready, it will be possible to compare them.
 * Changed readiness check for socket notifications.

With both above changes it is possible to achieve more than 3380 req/second,
compared to 2200 (sometimes 2500) req/second for epoll(), for a trivial
web server and httperf client on the same hardware.
It is possible that the above kevent limit is due to the maximum number of
kevents allowed at a time, which is 4096 events.

Changes from 'take16' patchset:
 * misc cleanups (__read_mostly, const ...)
 * created special macro which is used for mmap size (number of pages)
   calculation
 * export kevent_socket_notify(), since it is used in network protocols
   which can be built as modules (IPv6 for example)

Changes from 'take15' patchset:
 * converted kevent_timer to high-resolution timers, this forces timer API
   update at http://linux-net.osdl.org/index.php/Kevent
 * use struct ukevent* instead of void * in syscalls (documentation has been
   updated)
 * added warning in kevent_add_ukevent() if ring has broken index (for testing)

Changes from 'take14' patchset:
 * added kevent_wait()
This syscall waits until either timeout expires or at least one event
becomes ready. It also commits that @num events from @start are processed
by userspace and thus can be removed

[take24 6/6] kevent: Pipe notifications.

2006-11-09 Thread Evgeniy Polyakov


Pipe notifications.


diff --git a/fs/pipe.c b/fs/pipe.c
index f3b6f71..aeaee9c 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -16,6 +16,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -312,6 +313,7 @@ redo:
break;
}
if (do_wakeup) {
+   kevent_pipe_notify(inode, KEVENT_SOCKET_SEND);
wake_up_interruptible_sync(&pipe->wait);
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
}
@@ -321,6 +323,7 @@ redo:
 
/* Signal writers asynchronously that there is more room. */
if (do_wakeup) {
+   kevent_pipe_notify(inode, KEVENT_SOCKET_SEND);
wake_up_interruptible(&pipe->wait);
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
}
@@ -490,6 +493,7 @@ redo2:
break;
}
if (do_wakeup) {
+   kevent_pipe_notify(inode, KEVENT_SOCKET_RECV);
wake_up_interruptible_sync(&pipe->wait);
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
do_wakeup = 0;
@@ -501,6 +505,7 @@ redo2:
 out:
mutex_unlock(&inode->i_mutex);
if (do_wakeup) {
+   kevent_pipe_notify(inode, KEVENT_SOCKET_RECV);
wake_up_interruptible(&pipe->wait);
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
}
@@ -605,6 +610,7 @@ pipe_release(struct inode *inode, int de
free_pipe_info(inode);
} else {
wake_up_interruptible(&pipe->wait);
+   kevent_pipe_notify(inode, KEVENT_SOCKET_SEND|KEVENT_SOCKET_RECV);
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
}
diff --git a/kernel/kevent/kevent_pipe.c b/kernel/kevent/kevent_pipe.c
new file mode 100644
index 000..32c6f19
--- /dev/null
+++ b/kernel/kevent/kevent_pipe.c
@@ -0,0 +1,112 @@
+/*
+ * kevent_pipe.c
+ * 
+ * 2006 Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]>
+ * All rights reserved.
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int kevent_pipe_callback(struct kevent *k)
+{
+   struct inode *inode = k->st->origin;
+   struct pipe_inode_info *pipe = inode->i_pipe;
+   int nrbufs = pipe->nrbufs;
+
+   if (k->event.event & KEVENT_SOCKET_RECV && nrbufs > 0) {
+   if (!pipe->writers)
+   return -1;
+   return 1;
+   }
+   
+   if (k->event.event & KEVENT_SOCKET_SEND && nrbufs < PIPE_BUFFERS) {
+   if (!pipe->readers)
+   return -1;
+   return 1;
+   }
+
+   return 0;
+}
+
+int kevent_pipe_enqueue(struct kevent *k)
+{
+   struct file *pipe;
+   int err = -EBADF;
+   struct inode *inode;
+
+   pipe = fget(k->event.id.raw[0]);
+   if (!pipe)
+   goto err_out_exit;
+
+   inode = igrab(pipe->f_dentry->d_inode);
+   if (!inode)
+   goto err_out_fput;
+
+   err = kevent_storage_enqueue(&inode->st, k);
+   if (err)
+   goto err_out_iput;
+
+   err = k->callbacks.callback(k);
+   if (err)
+   goto err_out_dequeue;
+
+   fput(pipe);
+
+   return err;
+
+err_out_dequeue:
+   kevent_storage_dequeue(k->st, k);
+err_out_iput:
+   iput(inode);
+err_out_fput:
+   fput(pipe);
+err_out_exit:
+   return err;
+}
+
+int kevent_pipe_dequeue(struct kevent *k)
+{
+   struct inode *inode = k->st->origin;
+
+   kevent_storage_dequeue(k->st, k);
+   iput(inode);
+
+   return 0;
+}
+
+void kevent_pipe_notify(struct inode *inode, u32 event)
+{
+   kevent_storage_ready(&inode->st, NULL, event);
+}
+
+static int __init kevent_init_pipe(void)
+{
+   struct kevent_callbacks sc = {
+   .callback = &kevent_pipe_callback,
+   .enqueue = &kevent_pipe_enqueue,
+   .dequeue = &kevent_pipe_dequeue};
+
+   return kevent_add_callbacks(&sc, KEVENT_PIPE);
+}
+module_init(kevent_init_pipe

[take24 4/6] kevent: Socket notifications.

2006-11-09 Thread Evgeniy Polyakov


Socket notifications.

This patch includes socket send/recv/accept notifications.
With a trivial web server based on kevent and these features
instead of epoll, its performance increased more than noticeably.
More details about various benchmarks and the server itself
(evserver_kevent.c) can be found on the project's homepage.

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/fs/inode.c b/fs/inode.c
index ada7643..6745c00 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -21,6 +21,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -164,12 +165,18 @@ #endif
}
inode->i_private = 0;
inode->i_mapping = mapping;
+#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE
+   kevent_storage_init(inode, &inode->st);
+#endif
}
return inode;
 }
 
 void destroy_inode(struct inode *inode) 
 {
+#if defined CONFIG_KEVENT_SOCKET
+   kevent_storage_fini(&inode->st);
+#endif
BUG_ON(inode_has_buffers(inode));
security_inode_free(inode);
if (inode->i_sb->s_op->destroy_inode)
diff --git a/include/net/sock.h b/include/net/sock.h
index edd4d73..d48ded8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -48,6 +48,7 @@ #include 
 #include 
 #include   /* struct sk_buff */
 #include 
+#include 
 
 #include 
 
@@ -450,6 +451,21 @@ static inline int sk_stream_memory_free(
 
 extern void sk_stream_rfree(struct sk_buff *skb);
 
+struct socket_alloc {
+   struct socket socket;
+   struct inode vfs_inode;
+};
+
+static inline struct socket *SOCKET_I(struct inode *inode)
+{
+   return &container_of(inode, struct socket_alloc, vfs_inode)->socket;
+}
+
+static inline struct inode *SOCK_INODE(struct socket *socket)
+{
+   return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
+}
+
 static inline void sk_stream_set_owner_r(struct sk_buff *skb, struct sock *sk)
 {
skb->sk = sk;
@@ -477,6 +493,7 @@ static inline void sk_add_backlog(struct
sk->sk_backlog.tail = skb;
}
skb->next = NULL;
+   kevent_socket_notify(sk, KEVENT_SOCKET_RECV);
 }
 
 #define sk_wait_event(__sk, __timeo, __condition)  \
@@ -679,21 +696,6 @@ static inline struct kiocb *siocb_to_kio
return si->kiocb;
 }
 
-struct socket_alloc {
-   struct socket socket;
-   struct inode vfs_inode;
-};
-
-static inline struct socket *SOCKET_I(struct inode *inode)
-{
-   return &container_of(inode, struct socket_alloc, vfs_inode)->socket;
-}
-
-static inline struct inode *SOCK_INODE(struct socket *socket)
-{
-   return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
-}
-
 extern void __sk_stream_mem_reclaim(struct sock *sk);
 extern int sk_stream_mem_schedule(struct sock *sk, int size, int kind);
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7a093d0..69f4ad2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -857,6 +857,7 @@ static inline int tcp_prequeue(struct so
tp->ucopy.memory = 0;
} else if (skb_queue_len(&tp->ucopy.prequeue) == 1) {
wake_up_interruptible(sk->sk_sleep);
+   kevent_socket_notify(sk, KEVENT_SOCKET_RECV|KEVENT_SOCKET_SEND);
if (!inet_csk_ack_scheduled(sk))
inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
  (3 * TCP_RTO_MIN) / 4,
diff --git a/kernel/kevent/kevent_socket.c b/kernel/kevent/kevent_socket.c
new file mode 100644
index 000..7f74110
--- /dev/null
+++ b/kernel/kevent/kevent_socket.c
@@ -0,0 +1,135 @@
+/*
+ * kevent_socket.c
+ * 
+ * 2006 Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]>
+ * All rights reserved.
+ * 
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+static int kevent_socket_callback(struct kevent *k)
+{
+   struct inode *inode = k->st->origin;
+   unsigned int events = SOCKET_I(inode)->ops->poll(SOCKET_I(inode)->file, SOCKET_I(inode), NULL);
+
+   if ((events & (POLLIN | POLLRDNORM)) && (k->event.event & 
(KEVENT_SOCKET_RECV | KEVENT_SOCKET_

Re. Please pull 'upstream' branch of wireless-2.6

2006-11-09 Thread Roger While


John wrote :

Yeah, looks like I was a bit overzealous on the warning squelch...

I'll cook-up a new patch that doesn't error-out.


I hope you do not bloat the kernel with meaningless warning messages.
Something simple like the following will do.

/* enable MWI */
/* Shut up the must_check tests - We don't care if this does not succeed */

if (pci_set_mwi(pdev))
rvalue = 0;
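
The same pattern can be shown as a small standalone sketch. Everything
here is a stand-in chosen for illustration: MUST_CHECK wraps the GCC
warn_unused_result attribute, and try_enable_feature and probe_device are
hypothetical names rather than the real PCI or driver functions.

```c
#include <assert.h>

/* Stand-in for a must_check style annotation (GCC/Clang attribute). */
#if defined(__GNUC__)
#define MUST_CHECK __attribute__((warn_unused_result))
#else
#define MUST_CHECK
#endif

/* Hypothetical optional feature: returns 0 on success, non-zero on failure. */
static MUST_CHECK int try_enable_feature(int supported)
{
    return supported ? 0 : -1;
}

/*
 * The result is consumed by the if (), so the compiler has nothing to
 * warn about, and a failure is treated as non-fatal: the probe carries
 * on without the feature and emits no message.
 */
static int probe_device(int feature_supported)
{
    int feature_on = 1;

    assert(feature_supported == 0 || feature_supported == 1);
    if (try_enable_feature(feature_supported))
        feature_on = 0;

    return feature_on;
}
```

The point of the design is that the return value is genuinely used, so
the check is satisfied without adding a log line for a condition the
driver does not care about.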


Roger While



[take24 2/6] kevent: Core files.

2006-11-09 Thread Evgeniy Polyakov


Core files.

This patch includes core kevent files:
 * userspace controlling
 * kernelspace interfaces
 * initialization
 * notification state machines

Some bits of documentation can be found on the project's homepage (and links
from there):
http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/syscall_table.S b/arch/i386/kernel/syscall_table.S
index 7e639f7..fa8075b 100644
--- a/arch/i386/kernel/syscall_table.S
+++ b/arch/i386/kernel/syscall_table.S
@@ -318,3 +318,7 @@ ENTRY(sys_call_table)
.long sys_vmsplice
.long sys_move_pages
.long sys_getcpu
+   .long sys_kevent_get_events
+   .long sys_kevent_ctl        /* 320 */
+   .long sys_kevent_wait
+   .long sys_kevent_ring_init
diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S
index b4aa875..95fb252 100644
--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -714,8 +714,12 @@ #endif
.quad compat_sys_get_robust_list
.quad sys_splice
.quad sys_sync_file_range
-   .quad sys_tee
+   .quad sys_tee   /* 315 */
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
.quad sys_getcpu
+   .quad sys_kevent_get_events
+   .quad sys_kevent_ctl        /* 320 */
+   .quad sys_kevent_wait
+   .quad sys_kevent_ring_init
 ia32_syscall_end:  
diff --git a/include/asm-i386/unistd.h b/include/asm-i386/unistd.h
index bd99870..2161ef2 100644
--- a/include/asm-i386/unistd.h
+++ b/include/asm-i386/unistd.h
@@ -324,10 +324,14 @@ #define __NR_tee  315
 #define __NR_vmsplice  316
 #define __NR_move_pages        317
 #define __NR_getcpu            318
+#define __NR_kevent_get_events 319
+#define __NR_kevent_ctl        320
+#define __NR_kevent_wait   321
+#define __NR_kevent_ring_init  322
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 319
+#define NR_syscalls 323
 #include 
 
 /*
diff --git a/include/asm-x86_64/unistd.h b/include/asm-x86_64/unistd.h
index 6137146..3669c0f 100644
--- a/include/asm-x86_64/unistd.h
+++ b/include/asm-x86_64/unistd.h
@@ -619,10 +619,18 @@ #define __NR_vmsplice 278
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages        279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_kevent_get_events 280
+__SYSCALL(__NR_kevent_get_events, sys_kevent_get_events)
+#define __NR_kevent_ctl        281
+__SYSCALL(__NR_kevent_ctl, sys_kevent_ctl)
+#define __NR_kevent_wait   282
+__SYSCALL(__NR_kevent_wait, sys_kevent_wait)
+#define __NR_kevent_ring_init  283
+__SYSCALL(__NR_kevent_ring_init, sys_kevent_ring_init)
 
 #ifdef __KERNEL__
 
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_kevent_ring_init
 #include 
 
 #ifndef __NO_STUBS
diff --git a/include/linux/kevent.h b/include/linux/kevent.h
new file mode 100644
index 000..f7cbf6b
--- /dev/null
+++ b/include/linux/kevent.h
@@ -0,0 +1,223 @@
+/*
+ * 2006 Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef __KEVENT_H
+#define __KEVENT_H
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define KEVENT_MIN_BUFFS_ALLOC 3
+
+struct kevent;
+struct kevent_storage;
+typedef int (* kevent_callback_t)(struct kevent *);
+
+/* @callback is called each time new event has been caught. */
+/* @enqueue is called each time new event is queued. */
+/* @dequeue is called each time event is dequeued. */
+
+struct kevent_callbacks {
+   kevent_callback_t   callback, enqueue, dequeue;
+};
+
+#define KEVENT_READY   0x1
+#define KEVENT_STORAGE 0x2
+#define KEVENT_USER    0x4
+
+struct kevent
+{
+   /* Used for kevent freeing.*/
+   struct rcu_head rcu_head;
+   struct ukevent  event;
+   /* This lock protects ukevent manipulations, e.g. ret_flags changes. */
+   spinlock_t  ulock;
+
+   /* Entry of user's tree. */
+   struct rb_node  kevent_node;
+   /* Entry of origin's queue. */
+   struct list_head    storage_entry;
+

[take24 1/6] kevent: Description.

2006-11-09 Thread Evgeniy Polyakov


Description.


diff --git a/Documentation/kevent.txt b/Documentation/kevent.txt
new file mode 100644
index 000..ca49e4b
--- /dev/null
+++ b/Documentation/kevent.txt
@@ -0,0 +1,186 @@
+Description.
+
+int kevent_ctl(int fd, unsigned int cmd, unsigned int num, struct ukevent *arg);
+
+fd - the file descriptor referring to the kevent queue to manipulate.
+It is created by opening the "/dev/kevent" char device, which is created with
+a dynamic minor number and the major number assigned to misc devices.
+
+cmd - is the requested operation. It can be one of the following:
+KEVENT_CTL_ADD - add event notification 
+KEVENT_CTL_REMOVE - remove event notification 
+KEVENT_CTL_MODIFY - modify existing notification 
+
+num - number of struct ukevent in the array pointed to by arg 
+arg - array of struct ukevent
+
+When called, kevent_ctl will carry out the operation specified in the 
+cmd parameter.
+---
+
+ int kevent_get_events(int ctl_fd, unsigned int min_nr, unsigned int max_nr, 
+   __u64 timeout, struct ukevent *buf, unsigned flags)
+
+ctl_fd - file descriptor referring to the kevent queue 
+min_nr - minimum number of completed events that kevent_get_events will block 
+waiting for 
+max_nr - number of struct ukevent in buf 
+timeout - number of nanoseconds to wait before returning fewer than min_nr
+ events. If this is -1, then wait forever.
+buf - pointer to an array of struct ukevent.
+flags - unused
+
+kevent_get_events will wait up to timeout nanoseconds for at least min_nr
+completed events, copying completed struct ukevents to buf and deleting any
+KEVENT_REQ_ONESHOT event requests. In nonblocking mode it returns as many
+events as possible, but not more than max_nr. In blocking mode it waits until
+the timeout expires or at least min_nr events are ready.
+---
+
+ int kevent_wait(int ctl_fd, unsigned int num, __u64 timeout)
+
+ctl_fd - file descriptor referring to the kevent queue 
+num - number of processed kevents 
+timeout - this timeout specifies number of nanoseconds to wait until there is 
+   free space in kevent queue 
+
+This syscall waits until either the timeout expires or at least one event
+becomes ready. It also copies those num events into a special ring buffer and
+requeues them (or removes them, depending on flags).
+---
+
+ int kevent_ring_init(int ctl_fd, struct kevent_ring *ring, unsigned int num)
+
+ctl_fd - file descriptor referring to the kevent queue 
+num - size of the ring buffer in events 
+
+ struct kevent_ring
+ {
+   unsigned int ring_kidx;
+   struct ukevent event[0];
+ }
+
+ring_kidx - is an index in the ring buffer where kernel will put new events 
+   when kevent_wait() or kevent_get_events() is called 
+
+Example userspace code (ring_buffer.c) can be found on project's homepage.
+
+Each kevent syscall can be a so-called cancellation point in glibc, i.e. when a
+thread is cancelled inside a kevent syscall, the thread can be safely removed
+and no events will be lost, since each syscall (kevent_wait() or
+kevent_get_events()) copies events into a special ring buffer that is
+accessible from other threads or even processes (if shared memory is used).
+
+When a kevent is removed (not dequeued when it is ready, but just removed),
+it is not copied into the ring buffer even if it was ready: if it is being
+removed, no one cares about it (otherwise the user would wait until it becomes
+ready and get it in the usual way through kevent_get_events() or kevent_wait()),
+so there is no need to copy it to the ring buffer.
+
+With a userspace ring buffer, events in the ring buffer can be replaced
+without the knowledge of the thread currently reading them (when another
+thread calls kevent_get_events() or kevent_wait()), so appropriate locking is
+required between threads or processes that can access the same ring buffer
+simultaneously.
+---
+
+The bulk of the interface is entirely done through the ukevent struct. 
+It is used to add event requests, modify existing event requests, 
+specify which event requests to remove, and return completed events.
+
+struct ukevent contains the following members:
+
+struct kevent_id id
+Id of this request, e.g. socket number, file descriptor and so on 
+__u32 type
+Event type, e.g. KEVENT_SOCK, KEVENT_INODE, KEVENT_TIMER and so on 
+__u32 event
+Event itself, e.g. SOCK_ACCEPT, INODE_CREATED, TIMER_FIRED 
+__u32 req_flags
+Per-event request flags,
+
+KEVENT_REQ_ONESHOT
+event will be removed when it is ready 
+
+KEVENT_REQ_WAKEUP_ONE
+When several threads wait on the same kevent queue and requested the 
+   same event, for example 'wake me up when new

[take24 3/6] kevent: poll/select() notifications.

2006-11-09 Thread Evgeniy Polyakov


poll/select() notifications.

This patch includes generic poll/select notifications.
kevent_poll works similarly to epoll and has the same issues (the callback
is invoked not from the internal state machine of the caller but through
process wakeup, there are a lot of allocations, and so on).

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/fs/file_table.c b/fs/file_table.c
index bc35a40..0805547 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -20,6 +20,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -119,6 +120,7 @@ struct file *get_empty_filp(void)
f->f_uid = tsk->fsuid;
f->f_gid = tsk->fsgid;
eventpoll_init_file(f);
+   kevent_init_file(f);
/* f->f_version: 0 */
return f;
 
@@ -164,6 +166,7 @@ void fastcall __fput(struct file *file)
 * in the file cleanup chain.
 */
eventpoll_release(file);
+   kevent_cleanup_file(file);
locks_remove_flock(file);
 
if (file->f_op && file->f_op->release)
diff --git a/fs/inode.c b/fs/inode.c
index ada7643..6745c00 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -21,6 +21,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -164,12 +165,18 @@ #endif
}
inode->i_private = 0;
inode->i_mapping = mapping;
+#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE
+   kevent_storage_init(inode, &inode->st);
+#endif
}
return inode;
 }
 
 void destroy_inode(struct inode *inode) 
 {
+#if defined CONFIG_KEVENT_SOCKET
+   kevent_storage_fini(&inode->st);
+#endif
BUG_ON(inode_has_buffers(inode));
security_inode_free(inode);
if (inode->i_sb->s_op->destroy_inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5baf3a1..c529723 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -276,6 +276,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -586,6 +587,10 @@ #ifdef CONFIG_INOTIFY
struct mutex    inotify_mutex;  /* protects the watches list */
 #endif
 
+#ifdef CONFIG_KEVENT_SOCKET
+   struct kevent_storage   st;
+#endif
+
unsigned long   i_state;
unsigned long   dirtied_when;   /* jiffies of first dirtying */
 
@@ -739,6 +744,9 @@ #ifdef CONFIG_EPOLL
struct list_head    f_ep_links;
spinlock_t  f_ep_lock;
 #endif /* #ifdef CONFIG_EPOLL */
+#ifdef CONFIG_KEVENT_POLL
+   struct kevent_storage   st;
+#endif
struct address_space    *f_mapping;
 };
 extern spinlock_t files_lock;
diff --git a/kernel/kevent/kevent_poll.c b/kernel/kevent/kevent_poll.c
new file mode 100644
index 000..7030d21
--- /dev/null
+++ b/kernel/kevent/kevent_poll.c
@@ -0,0 +1,228 @@
+/*
+ * 2006 Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static kmem_cache_t *kevent_poll_container_cache;
+static kmem_cache_t *kevent_poll_priv_cache;
+
+struct kevent_poll_ctl
+{
+   struct poll_table_struct    pt;
+   struct kevent   *k;
+};
+
+struct kevent_poll_wait_container
+{
+   struct list_head    container_entry;
+   wait_queue_head_t   *whead;
+   wait_queue_t        wait;
+   struct kevent   *k;
+};
+
+struct kevent_poll_private
+{
+   struct list_head    container_list;
+   spinlock_t  container_lock;
+};
+
+static int kevent_poll_enqueue(struct kevent *k);
+static int kevent_poll_dequeue(struct kevent *k);
+static int kevent_poll_callback(struct kevent *k);
+
+static int kevent_poll_wait_callback(wait_queue_t *wait,
+   unsigned mode, int sync, void *key)
+{
+   struct kevent_poll_wait_container *cont =
+   container_of(wait, struct kevent_poll_wait_container, wait);
+   struct kevent *k = cont->k;
+
+   kevent_storage_ready(k->st, NULL, KEVENT_MASK_ALL);
+   return 0;
+}
+
+static void kevent_poll_qproc(struct file *file, wait_queue_head_t *whead,
+   struct poll_table_struct *poll_table)
+{
+   struct kevent *k =
+   container_of(poll_table, struct kevent_poll_ctl, pt)->k;
+   struct kevent_poll_private *priv = k->priv;
+   struct kevent_poll_wait_container *co

[take24 5/6] kevent: Timer notifications.

2006-11-09 Thread Evgeniy Polyakov


Timer notifications.

Timer notifications can be used for fine-grained per-process time
management, since interval timers are inconvenient to use and limited.

This subsystem uses high-resolution timers.
id.raw[0] is used as number of seconds
id.raw[1] is used as number of nanoseconds

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/kernel/kevent/kevent_timer.c b/kernel/kevent/kevent_timer.c
new file mode 100644
index 000..df93049
--- /dev/null
+++ b/kernel/kevent/kevent_timer.c
@@ -0,0 +1,112 @@
+/*
+ * 2006 Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct kevent_timer
+{
+   struct hrtimer  ktimer;
+   struct kevent_storage   ktimer_storage;
+   struct kevent   *ktimer_event;
+};
+
+static int kevent_timer_func(struct hrtimer *timer)
+{
+   struct kevent_timer *t = container_of(timer, struct kevent_timer, ktimer);
+   struct kevent *k = t->ktimer_event;
+
+   kevent_storage_ready(&t->ktimer_storage, NULL, KEVENT_MASK_ALL);
+   hrtimer_forward(timer, timer->base->softirq_time,
+   ktime_set(k->event.id.raw[0], k->event.id.raw[1]));
+   return HRTIMER_RESTART;
+}
+
+static struct lock_class_key kevent_timer_key;
+
+static int kevent_timer_enqueue(struct kevent *k)
+{
+   int err;
+   struct kevent_timer *t;
+
+   t = kmalloc(sizeof(struct kevent_timer), GFP_KERNEL);
+   if (!t)
+   return -ENOMEM;
+
+   hrtimer_init(&t->ktimer, CLOCK_MONOTONIC, HRTIMER_REL);
+   t->ktimer.expires = ktime_set(k->event.id.raw[0], k->event.id.raw[1]);
+   t->ktimer.function = kevent_timer_func;
+   t->ktimer_event = k;
+
+   err = kevent_storage_init(&t->ktimer, &t->ktimer_storage);
+   if (err)
+   goto err_out_free;
+   lockdep_set_class(&t->ktimer_storage.lock, &kevent_timer_key);
+
+   err = kevent_storage_enqueue(&t->ktimer_storage, k);
+   if (err)
+   goto err_out_st_fini;
+
+   hrtimer_start(&t->ktimer, t->ktimer.expires, HRTIMER_REL);
+
+   return 0;
+
+err_out_st_fini:
+   kevent_storage_fini(&t->ktimer_storage);
+err_out_free:
+   kfree(t);
+
+   return err;
+}
+
+static int kevent_timer_dequeue(struct kevent *k)
+{
+   struct kevent_storage *st = k->st;
+   struct kevent_timer *t = container_of(st, struct kevent_timer, ktimer_storage);
+
+   hrtimer_cancel(&t->ktimer);
+   kevent_storage_dequeue(st, k);
+   kfree(t);
+
+   return 0;
+}
+
+static int kevent_timer_callback(struct kevent *k)
+{
+   k->event.ret_data[0] = jiffies_to_msecs(jiffies);
+   return 1;
+}
+
+static int __init kevent_init_timer(void)
+{
+   struct kevent_callbacks tc = {
+   .callback = &kevent_timer_callback,
+   .enqueue = &kevent_timer_enqueue,
+   .dequeue = &kevent_timer_dequeue};
+
+   return kevent_add_callbacks(&tc, KEVENT_TIMER);
+}
+module_init(kevent_init_timer);
+
