Re: Generic Netlink HOW-TO based on Jamal's original doc
James Morris wrote: >>An Introduction To Using Generic Netlink >>=== > > > Wow, this is great! Thanks. I consider it an act of penance for all of the evil things I did with Netlink on my first few iterations of NetLabel ;) -- paul moore linux security @ hp - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Generic Netlink HOW-TO based on Jamal's original doc
> An Introduction To Using Generic Netlink > === Wow, this is great! -- James Morris <[EMAIL PROTECTED]>
Generic Netlink HOW-TO based on Jamal's original doc
A couple of months ago I promised Jamal and Thomas I would post some comments on Jamal's original genetlink how-to. However, as I started to work on the document, the diff from the original started to get a little ridiculous, so instead of posting a patch against Jamal's original how-to I'm just posting the revised document in its entirety. In the document below I tried to summarize all of the things I learned while developing NetLabel. Some of it came from Jamal's document, some from the kernel code, and some from discussions with Thomas. Hopefully this document will make it much easier for others to use genetlink in the future. If the text below is acceptable to everyone, should it be added to the Documentation directory?

An Introduction To Using Generic Netlink
========================================

Last Updated: November 10, 2006

Table of Contents

  1. Introduction
     1.1. Document Overview
     1.2. Netlink And Generic Netlink
  2. Architectural Overview
  3. Generic Netlink Families
     3.1. Family Overview
          3.1.1. The genl_family Structure
          3.1.2. The genl_ops Structure
     3.2. Registering A Family
  4. Generic Netlink Communications
     4.1. Generic Netlink Message Format
     4.2. Kernel Communication
          4.2.1. Sending Messages
          4.2.2. Receiving Messages
     4.3. Userspace Communication
  5. Recommendations
     5.1. Attributes And Message Payloads
     5.2. Operation Granularity
     5.3. Acknowledgment And Error Reporting

1. Introduction
---------------

1.1. Document Overview
----------------------

This document gives a brief introduction to Generic Netlink, some simple examples on how to use it, and some recommendations on how to make the most of the Generic Netlink communications interface. While this document does not require that the reader have a detailed understanding of what Netlink is and how it works, some basic Netlink knowledge is assumed. As usual, the kernel source code is your best friend here.

While this document talks briefly about Generic Netlink from a userspace point of view, its primary focus is on the kernel's Generic Netlink API. It is recommended that application developers who are interested in using Generic Netlink make use of the libnl library[1].

  [1] http://people.suug.ch/~tgr/libnl

1.2. Netlink And Generic Netlink
--------------------------------

Netlink is a flexible, robust wire-format communications channel typically used for kernel-to-user communication, although it can also be used for user-to-user and kernel-to-kernel communications. Netlink communication channels are associated with families or "busses", where each bus deals with a specific service; for example, different Netlink busses exist for routing, XFRM, netfilter, and several other kernel subsystems. More information about Netlink can be found in RFC 3549[1].

Over the years, Netlink has become very popular, which has brought about a very real concern that the number of Netlink family numbers may be exhausted in the near future. In response to this, the Generic Netlink family was created, which acts as a Netlink multiplexer, allowing multiple services to use a single Netlink bus.

  [1] ftp://ftp.rfc-editor.org/in-notes/rfc3549.txt

2. Architectural Overview
-------------------------

Figure #1 illustrates the basic Generic Netlink architecture, which is composed of five different types of components.

  1) The Netlink subsystem, which serves as the underlying transport layer for all of the Generic Netlink communications.

  2) The Generic Netlink bus, which is implemented inside the kernel, but which is available to userspace through the socket API and inside the kernel via the normal Netlink and Generic Netlink APIs.

  3) The Generic Netlink users, who communicate with each other over the Generic Netlink bus; users can exist both in kernel and user space.

  4) The Generic Netlink controller, which is part of the kernel and is responsible for dynamically allocating Generic Netlink communication channels and other management tasks. The Generic Netlink controller is implemented as a standard Generic Netlink user; however, it listens on a special, pre-allocated Generic Netlink channel.
  5) The kernel socket API. Generic Netlink sockets are created with the PF_NETLINK domain and the NETLINK_GENERIC protocol values.

  [Figure #1: Generic Netlink architecture diagram, truncated in this archive; the surviving fragment shows two Generic Netlink users, (3) application "A" and (3) application "B".]
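To make components (4) and (5) a little more concrete, here is a minimal userspace sketch (not part of the original how-to) that builds, but does not send, a request asking the Generic Netlink controller to resolve a family name to its dynamically allocated channel ID. It assumes the Linux UAPI headers <linux/netlink.h> and <linux/genetlink.h>, which provide GENL_ID_CTRL, CTRL_CMD_GETFAMILY and CTRL_ATTR_FAMILY_NAME; the helper build_getfamily() is hypothetical.

```c
#include <linux/netlink.h>
#include <linux/genetlink.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fill buf (assumed 4-byte aligned) with a
 * CTRL_CMD_GETFAMILY request for the named family. Returns the message
 * length, or 0 if buf is too small.
 */
static size_t build_getfamily(void *buf, size_t buflen, const char *family,
                              uint32_t seq)
{
    size_t name_len = strlen(family) + 1;          /* include the NUL */
    size_t attr_len = NLA_HDRLEN + name_len;       /* unpadded nla_len */
    size_t total = NLMSG_HDRLEN + GENL_HDRLEN + NLA_ALIGN(attr_len);

    if (buflen < total)
        return 0;
    memset(buf, 0, total);                         /* zero header and padding */

    /* Netlink header: the controller listens on the pre-allocated GENL_ID_CTRL. */
    struct nlmsghdr *nlh = buf;
    nlh->nlmsg_len = (uint32_t)total;
    nlh->nlmsg_type = GENL_ID_CTRL;
    nlh->nlmsg_flags = NLM_F_REQUEST;
    nlh->nlmsg_seq = seq;

    /* Generic Netlink header: which controller command to run. */
    struct genlmsghdr *genl = (struct genlmsghdr *)((char *)buf + NLMSG_HDRLEN);
    genl->cmd = CTRL_CMD_GETFAMILY;
    genl->version = 1;                             /* request version assumed here */

    /* A single attribute carrying the NUL-terminated family name. */
    struct nlattr *nla = (struct nlattr *)((char *)genl + GENL_HDRLEN);
    nla->nla_type = CTRL_ATTR_FAMILY_NAME;
    nla->nla_len = (uint16_t)attr_len;
    memcpy((char *)nla + NLA_HDRLEN, family, name_len);

    return total;
}
```

To actually use it, the buffer would be sent on a socket opened with socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); the controller's reply carries the family's ID in a CTRL_ATTR_FAMILY_ID attribute. libnl wraps all of this and is the better choice for real applications.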
bcm43xx-d80211 broadcast reception with WPA
Hi, Long time lurker, first time poster. ^_^ I've been backporting the bcm43xx-d80211 driver to whatever the released 2.6 kernel was, using the rt2x00 project's d80211 stack (equivalent to current wireless-dev, but with a workaround for not having an ieee80211_dev pointer and still using the _tfm interface instead of the _cipher interface). As of last night's wireless-dev tree bcm43xx, everything seems to be operating fine except that incoming broadcast traffic is arriving 14 bytes too long and scrambled. I presume this means it's not being decrypted properly. Anyway, I just thought I'd mention it. It might have gone unnoticed by the bcm43xx-d80211 developers, since it doesn't interfere with normal operation (a DHCP client's only broadcasts are outgoing) and only showed up for me because radvd's RAs were not arriving and my IPv6 address was not being set. I couldn't find any mention of such a thing on the list, and I'm happy to provide whatever debugging output is useful, but the laptop with the device isn't with me at the moment.

Relevant facts:
  Platform: Debian/unstable (PPC) w/linux-image-2.6.18-1-powerpc (2.6.18-3)
  Drivers: bcm43xx-d80211 from wireless-dev 774f233b7915a2c36480eb4d98e6f57938f04b7b
  Firmware: 4.80.46.0 (BE, from AppleAirPortBrcm4311)
  Stack: ieee80211 from http://rt2x00.serialmonkey.com/rt2x00-cvs-daily.tar.gz
    2006110303 is the date on the output, I believe. Hasn't been updated since 20061028.
    Plus a backport of the following commits:
      [PATCH] d80211: extend extra_hdr_room to be a bytecount
        522e078b9f1f8309770dd161d90ddac1573a7877
      [PATCH] d80211: remove unused variable in ieee80211_rx_irqsafe
        10bfc9cdf9621385a3b69aa35f9fa86cc6a46bc6
      [PATCH] d80211: Add wireless statistics
        448bf25bc9e3d70a211fdf235426472089371c43
    (as well as anything else that showed up in a diff of the d80211 dir against the rt2x00 ieee80211 dir and wasn't a 2.6.19ism or wireless-devism)

I'm basically using the instructions I posted at [1], except also patching rt2x00's ieee80211 stack.
I acknowledge that any of the firmware version, the backporting, the forward porting or the current lunar cycle may be causing this problem. If no one pipes up with an insight, I'll try tonight with a v3 firmware. However, the reason I moved away from a v3 firmware was that my previous build of bcm43xx-d80211 also wasn't getting an IPv6 address, although I don't believe the RAs were scrambled in that case. [1] http://openfacts.berlios.de/index-en.phtml?title=Broadcom_43xx_Linux_Driver/Debian_Unstable_with_Devicescape_802.11_stack -- Paul "TBBle" Hampson Opinions expressed here do not reflect the views of my employer Hell, we don't even agree on my pay cheque
Questions regarding network drivers
Hi, I've got an interesting problem to contend with and need some advice from the great wise ones here. First of all, is it possible (and/or "reasonable practice") when developing a network driver to do zero-copy transfers between main memory and the network device? Secondly, the network device is only designed to work with short packets and I really want to keep the throughput up. My thought was that if I fire off an interrupt and then transfer a page of data into an area I know is safe, the kernel will have enough time to find a new safe area and post the address before the next page is ready to send. Can anyone suggest why this wouldn't work or, assuming it can work, why this would be a Bad Idea? Lastly, assuming my sanity lasts that long, would I be correct in assuming that the first step in the process of getting the driver peer-reviewed and accepted would be to post the patches here? Thanks for any help, Jonathan Day
Re: [patch 3/9] iPhase: 64bit cleanup
On Thursday, 2006-11-09 at 16:12 -0800, David Miller wrote: > Really, this driver has a ton of unresolved portability problems. Agreed, but at least it's now 64-bit clean. No objection to leaving it !64BIT at all, even with the patch merged.
Re: [PATCH 4/4] skge: version 1.9
Michael Stone <[EMAIL PROTECTED]> wrote: >On Tue, Nov 07, 2006 at 12:28:26PM -0800, Jay Vosburgh wrote: >> Can you provide some bonding configuration details? Which mode, >>options, etc, as well as the relevant bits from dmesg (you can send it >> to me privately if it's huge)? > >I think I sent another message that I'm just doing ifenslave bond0 eth2 >ifenslave bond0 eth3 with no particular options set. I was thinking of the options to the bonding driver (probably in the network configuration or /etc/modprobe.conf); a reasonable set of information is in /proc/net/bonding/bond0, so that's a good place to start, along with whatever "uname -a" prints (kernel version, architecture, mostly). [...] The application is a network sniffer, and >the cards are forced to 1000Mbps/full duplex because the other end doesn't >negotiate. What's the relevant part of dmesg? Well, pretty much anything from the bonding driver or the ethernet driver. -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
Re: [PATCH 5/5] ip-sysctl.txt alphabetize
From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Tue, 31 Oct 2006 15:01:45 -0800 > Rearrange TCP entries in alpha order. > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> Also applied, thanks a lot.
Re: [PATCH 3/5] tcp: restrict congestion control choices
From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Tue, 31 Oct 2006 15:01:43 -0800 > Allow normal users to only choose among a restricted set of congestion > control choices. The default is reno and whatever has been configured > as default. But the policy can be changed by administrator at any time. > > For example, to allow any choice: > cp /proc/sys/net/ipv4/tcp_available_congestion_control \ >/proc/sys/net/ipv4/tcp_allowed_congestion_control > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> Applied, thanks a lot Stephen.
Re: [PATCH 4/5] tcp: allow autoloading of congestion control via setsockopt
From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Tue, 31 Oct 2006 15:01:44 -0800 > If user has permission to load modules, then attempt > autoload of TCP congestion module. > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> Applied, thanks.
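From userspace, the per-socket choice that patches 3/5 and 4/5 govern is made with the TCP_CONGESTION socket option. A minimal sketch, assuming a Linux libc that exposes TCP_CONGESTION in <netinet/tcp.h>; set_and_get_cc() is a hypothetical helper. It requests "reno" because, per patch 3/5, reno is among the choices permitted by default.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Request a congestion control algorithm on a fresh TCP socket and read
 * back the one the kernel actually selected into 'got'.
 * Returns 0 on success, -1 if socket/setsockopt/getsockopt fails.
 */
static int set_and_get_cc(const char *want, char *got, socklen_t gotlen)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Unprivileged callers may only pick from tcp_allowed_congestion_control;
     * autoloading a missing module additionally needs module-load permission. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, want, strlen(want)) < 0 ||
        getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, got, &gotlen) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```

Requesting an algorithm outside the allowed list as an ordinary user should fail at the setsockopt() step, which is the behavior the restriction patch is meant to enforce.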
Re: [PATCH 4/4] skge: version 1.9
On Tue, Nov 07, 2006 at 12:28:26PM -0800, Jay Vosburgh wrote: Can you provide some bonding configuration details? Which mode, options, etc, as well as the relevant bits from dmesg (you can send it to me privately if it's huge)? I think I sent another message that I'm just doing ifenslave bond0 eth2 ifenslave bond0 eth3 with no particular options set. The application is a network sniffer, and the cards are forced to 1000Mbps/full duplex because the other end doesn't negotiate. What's the relevant part of dmesg? Mike Stone
Re: [PATCH 2/5] tcp: add tcp_available_congestion_control sysctl
From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Tue, 31 Oct 2006 15:01:42 -0800 > Create /proc/sys/net/ipv4/tcp_available_congestion_control > that reflects currently available TCP choices. > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> Applied, thanks Stephen.
Re: [PATCH] warning in SCTP
From: Sridhar Samudrala <[EMAIL PROTECTED]> Date: Thu, 02 Nov 2006 10:29:49 -0800 > On Thu, 2006-11-02 at 11:09 -0500, Vlad Yasevich wrote: > > Meelis Roos wrote: > > >> Actually, I'm backing this one out, it creates new warnings because > > >> callers of this function pass in a "const" pointer. > > > > > > Yes, it now seems it's not so simple. Marking it non-const there would > > > mark it non-const in the whole family of sctp_state_fn_t and I'm not > > > sure that's the best thing to do. I guess the maintainer has better > > > bases for deciding what to do about it. > > > > > > > An alternate solution would be to make the digest a pointer, allocate > > it in sctp_endpoint_init() and free it in sctp_endpoint_destroy(). > > I agree that this is a better solution. > > Acked-by: Sridhar Samudrala <[EMAIL PROTECTED]> Applied to net-2.6.20, thanks everyone.
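Vlad's alternative works because const in C is shallow: through a const struct pointer, a pointer member becomes a const pointer, but the buffer it points at stays writable. A minimal sketch of the pattern; struct endpoint_sketch, DIGEST_LEN and the three functions are hypothetical stand-ins, not the actual SCTP code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIGEST_LEN 32   /* hypothetical size, for illustration only */

/* Hypothetical stand-in for an endpoint that owns a scratch digest. */
struct endpoint_sketch {
    uint8_t *digest;    /* heap buffer instead of an embedded array */
};

/* Allocate the digest at init time; returns 0 on success, -1 on failure. */
static int endpoint_init(struct endpoint_sketch *ep)
{
    ep->digest = calloc(DIGEST_LEN, 1);
    return ep->digest ? 0 : -1;
}

/* Free the digest at destroy time. */
static void endpoint_destroy(struct endpoint_sketch *ep)
{
    free(ep->digest);
    ep->digest = NULL;
}

/*
 * Takes a const endpoint, like the state-function callers. Only the
 * pointer member is const here, not the bytes it points to, so this write
 * is legal. With an embedded "uint8_t digest[DIGEST_LEN]" the same memset
 * would discard a const qualifier and draw a warning.
 */
static void fill_digest(const struct endpoint_sketch *ep, uint8_t seed)
{
    memset(ep->digest, seed, DIGEST_LEN);
}
```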
Re: Intel 82559 NIC corrupted EEPROM
On 11/9/06, John <[EMAIL PROTECTED]> wrote:
> The second thought is that the adapter is in D3, and something about
> your kernel or the driver doesn't successfully wake it up to D0.

On my NICs, the EEPROM ID (Word 0Ah) is set to 0x40a2. Thus DDPD (bit 6) is set to 0. DDPD is the "Disable Deep Power Down while PME is disabled" bit.

  0 - Deep Power Down is enabled in D3 state while PME-disabled.
  1 - Deep Power Down disabled in D3 state while PME-disabled.

This bit should be set to 1b if a TCO controller is being used via the SMB because it requires receive functionality at all power states.

Are you suggesting I try and set DDPD to 1? Or is this completely unrelated?

This may be related but I doubt it. Something is strange about how memory is being mapped in your system. Whatever is creating the problem moved when you changed the kernel version. I'm wondering if there is a device collision at e5302000. I'm not convinced at this point that it is e100's fault. Can you send the output of cat /proc/iomem?

> An indication of this would be looking at lspci -vv before/after
> loading the driver.

$ diff -u lspci_vv_before_e100.txt lspci_vv_after_e100.txt
--- lspci_vv_before_e100.txt 2006-11-09 14:51:30.0 +0100
+++ lspci_vv_after_e100.txt 2006-11-09 14:51:30.0 +0100
@@ -74,21 +74,20 @@
         Expansion ROM at 2000 [disabled] [size=1M]
         Capabilities: [dc] Power Management version 2
                 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-                Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
+                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

Okay, when the driver loads it is clearing PME enable, but not re-enabling it when it unloads. That is pretty much expected.

 00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
         Subsystem: Intel Corporation EtherExpress PRO/100B (TX)
-        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
+        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-

pci_enable_device should be enabling I/O, memory and bus mastering; they are probably being disabled when the driver errors out of init. Maybe you should add a call to pci_set_power_state(dev, PCI_D0); before the call to e100_reset.

> Also, after loading/unloading eepro100 does the e100 driver work?

No.

Now that is really odd.

> A third idea is look for a master abort in lspci after e100 fails to
> load.

I don't understand that one.

There isn't one; MAbort+ would be showing in the above lspci output. The all 0x returns when you read registers are a sure sign the hardware either isn't at the address specified or is in a power-down state. The only other option I can think of is that something else is intercepting memory reads and writes.

Try something like the attached patch, compile tested only: e100_debug.patch
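The bit arithmetic in John's description can be checked with a small userspace helper. This is a hypothetical sketch (ddpd_set(), with_ddpd() and the macro names are made up, not e100 driver code) that only decodes a value already read from EEPROM word 0Ah; writing a modified word back to the EEPROM is a separate step not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

#define EEPROM_ID_WORD  0x0A        /* EEPROM word holding the ID bits */
#define EEPROM_ID_DDPD  (1u << 6)   /* DDPD: Disable Deep Power Down while PME disabled */

/* True when DDPD is 1, i.e. Deep Power Down is disabled in D3 while PME is disabled. */
static bool ddpd_set(uint16_t id_word)
{
    return (id_word & EEPROM_ID_DDPD) != 0;
}

/* Returns the ID word with DDPD forced to 1, as needed for a TCO controller on SMB. */
static uint16_t with_ddpd(uint16_t id_word)
{
    return (uint16_t)(id_word | EEPROM_ID_DDPD);
}
```

For John's value, 0x40a2 has bit 6 clear, and setting it yields 0x40e2.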
Re: [patch] make sch_fifo.o available when CONFIG_NET_SCHED is not set
From: David Kimdon <[EMAIL PROTECTED]> Date: Wed, 8 Nov 2006 06:06:18 -0800 > Based on patch by Patrick McHardy. > > Add a new option, NET_SCH_FIFO, which provides a simple fifo qdisc > without requiring CONFIG_NET_SCHED. > > The d80211 stack needs a generic fifo qdisc for WME. At present it > uses net/d80211/fifo_qdisc.c which is functionally equivalent to > sch_fifo.c. This patch will allow the d80211 stack to remove > net/d80211/fifo_qdisc.c and use sch_fifo.c instead. > > Signed-off-by: David Kimdon <[EMAIL PROTECTED]> Applied to net-2.6.20, thanks David.
Re: [patch 3/9] iPhase: 64bit cleanup
From: [EMAIL PROTECTED] Date: Wed, 08 Nov 2006 19:51:04 -0800 > From: Alan Cox <[EMAIL PROTECTED]> > > Signed-off-by: Alan Cox <[EMAIL PROTECTED]> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> This fixes the most obvious 64-bit problems, but it is still very very broken in other aspects. It is bad enough that I feel irresponsible turning it on in the 64-bit build. For example, it takes an ioremap()'d value, and accesses it as a regular cpu pointer which will explode on many architectures since all such accesses should go through asm/io.h accessors. Specifically I'm talking about dev->seg_ram, it is initialized like this:

    base = ioremap(real_base,iadev->pci_map_size);  /* ioremap is not resolved ??? */
    ...
    iadev->seg_ram = base + ACTUAL_SEG_RAM_BASE;
    ...

Then used like this:

    desc1 = *(u_short *)(dev->seg_ram + dev->host_tcq_wr);

and this:

    *(u_short *) (dev->seg_ram + dev->host_tcq_wr) = 0;

and this:

    *(u_short *)(dev->seg_ram + dev->ffL.tcq_rd) = i+1;

and this:

    desc_num = *(u_short *)(dev->seg_ram + dev->ffL.tcq_rd);

and this:

    desc_num = *(u_short *)(dev->seg_ram + dev->ffL.tcq_rd);

and this:

    SchedTbl = (u16*)(dev->seg_ram+CBR_SCHED_TABLE*dev->memSize);
    ...
    TstSchedTbl = (u16*)(SchedTbl+testSlot);  //set index and read in value
    ...
    memcpy((caddr_t)&cbrVC,(caddr_t)TstSchedTbl,sizeof(cbrVC));
    ...
    memcpy((caddr_t)TstSchedTbl, (caddr_t)&vcIndex,sizeof(TstSchedTbl));

Really, this driver has a ton of unresolved portability problems.
Re: [patch 1/9] bonding: lockdep annotation
From: [EMAIL PROTECTED] Date: Wed, 08 Nov 2006 19:51:01 -0800 > The bonding driver nests other drivers, give the bonding driver its own > lock class. > > Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]> > Acked-by: Ingo Molnar <[EMAIL PROTECTED]> > Cc: Stephen Hemminger <[EMAIL PROTECTED]> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> Applied, thanks Peter.
Re: Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic
Basheer, Mansoor Ahamed <[EMAIL PROTECTED]> : [...] > My understanding here is, on receive interrupt the ISR should disable > the receive interrupt irrespective of the polling task's state (active > or inactive). Apparently it could happen even with 2.6.19-rc5, yes. > I changed the code (as shown below) and it works perfectly. AFAICS your change may disable the Rx irq right after the poll routine enabled it again. It will not always work either. The (slow) timeout watchdog could grab the poll handler and hack the irq mask depending on whether poll was scheduled or not. > Is this a known issue? If so, is there a fix already available? > > Also, we get frequent TX timeouts during high rate of traffic. What > could be the reason for these frequent TX timeouts? No idea. Have you considered upgrading to a kernel which is not almost 2 years old? -- Ueimor
Re: [PATCH 6/6] [NET] rules: Add support to invert selectors
From: Thomas Graf <[EMAIL PROTECTED]> Date: Thu, 09 Nov 2006 12:27:41 +0100 > Introduces a new flag FIB_RULE_INVERT causing rules to apply > if the specified selector doesn't match. > > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Also applied, thanks.
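The semantics described in the patch (a rule applies when its selector does not match) amount to negating the selector result when the flag is set. A minimal sketch, using an input-interface selector for illustration; SKETCH_RULE_INVERT, struct rule_sketch and both functions are hypothetical names, not the kernel's fib_rules code.

```c
#include <stdbool.h>

/* Hypothetical flag value; the patch's real flag is FIB_RULE_INVERT. */
#define SKETCH_RULE_INVERT 0x1u

struct rule_sketch {
    unsigned int flags;
    int iif;            /* selector: input interface index, 0 = any */
};

/* Plain selector test. */
static bool selector_match(const struct rule_sketch *r, int pkt_iif)
{
    return r->iif == 0 || r->iif == pkt_iif;
}

/* With the invert flag set, the rule applies exactly when the selector fails. */
static bool rule_applies(const struct rule_sketch *r, int pkt_iif)
{
    bool match = selector_match(r, pkt_iif);
    return (r->flags & SKETCH_RULE_INVERT) ? !match : match;
}
```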
Re: [PATCH 3/6] [IPv4] nl_fib_lookup: Rename fl_fwmark to fl_mark
From: Thomas Graf <[EMAIL PROTECTED]> Date: Thu, 09 Nov 2006 12:27:38 +0100 > For the sake of consistency. > > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Applied, thanks Thomas.
Re: [PATCH 2/6] [NET]: Rethink mark field in struct flowi
From: Thomas Graf <[EMAIL PROTECTED]> Date: Thu, 09 Nov 2006 12:27:37 +0100 > Now that all protocols have been made aware of the mark > field it can be moved out of the union thus simplifying > its usage. > > The config options in the IPv4/IPv6/DECnet subsystems > to enable or disable mark based routing only > obfuscate the code with ifdefs, the cost for the > additional comparison in the flow key is insignificant, > and most distributions have all these options enabled > by default anyway. Therefore it makes sense to remove > the config options and enable mark based routing by > default. > > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Applied, and I moved the mark in the flowi up to the top right after oif/iif in order to make sure it's in the same 32-byte cache line with the ipv4 addressing.
Re: [PATCH 5/6] [NET] rules: Share common attribute validation policy
From: Thomas Graf <[EMAIL PROTECTED]> Date: Thu, 09 Nov 2006 12:27:40 +0100 > Move the attribute policy for the non-specific attributes into > net/fib_rules.h and include it in the respective protocols. > > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Looks nice, applied, thanks.
Re: [PATCH 4/6] [NET] rules: Protocol independant mark selector
From: Thomas Graf <[EMAIL PROTECTED]> Date: Thu, 09 Nov 2006 12:27:39 +0100 > Move mark selector currently implemented per protocol into > the protocol independent part. > > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Applied, thanks.
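Once the mark selector lives in the protocol-independent code, every protocol can share a single comparison. A minimal sketch of a mark/mask test, assuming the usual convention that only the bits set in the mask are compared; mark_matches() is a hypothetical name, not the kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* A packet matches when its mark agrees with the rule's mark on every bit
 * selected by the mask; bits outside the mask are ignored, so a zero mask
 * matches any packet. */
static bool mark_matches(uint32_t rule_mark, uint32_t rule_mask, uint32_t pkt_mark)
{
    return ((rule_mark ^ pkt_mark) & rule_mask) == 0;
}
```

For example, a rule with mark 0x10 and mask 0xF0 matches a packet marked 0x1F (the low nibble is ignored) but not one marked 0x20.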
Re: [PATCH 1/6] [NET]: Turn nfmark into generic mark
From: Thomas Graf <[EMAIL PROTECTED]> Date: Thu, 09 Nov 2006 12:27:36 +0100 > nfmark is being used in various subsystems and has become > the de facto mark field for all kinds of packets. Therefore > it makes sense to rename it to `mark' and remove the > dependency on CONFIG_NETFILTER. > > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Applied, thanks Thomas.
Re: [PATCH wireless-2.6-git] prism54: WPA/RSN support for fullmac cards
On 11/9/06, Luis R. Rodriguez <[EMAIL PROTECTED]> wrote: On 11/9/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > On Wednesday, 8 November 2006 01:39, you wrote: > > On Fri, Nov 03, 2006 at 01:41:46PM -0500, Luis R. Rodriguez wrote: > > > On 11/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > > >yes, especially mgt_commit_list caused a lot of headaches, until I removed > > > >DOT11_OID_PSM from the cache list. > > > >Now, I can "hammer" it with ping -f for hours. > > > > > > nice, perhaps that's been the culprit all along... going to dig to see > > > if I find a fullmac prism card. Would like to get this merged in. > > > > Any resolution on this? > > no replies. > Seems like it works just fine for everybody. ;) I found a card, I just need time to test it. Dan, didn't you say you ran into issues with the patch on your card? Luis CC'ing Dan to make sure he gets it ;)
Re: [PATCH wireless-2.6-git] prism54: WPA/RSN support for fullmac cards
On 11/9/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: On Wednesday, 8 November 2006 01:39, you wrote: > On Fri, Nov 03, 2006 at 01:41:46PM -0500, Luis R. Rodriguez wrote: > > On 11/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > >yes, especially mgt_commit_list caused a lot of headaches, until I removed > > >DOT11_OID_PSM from the cache list. > > >Now, I can "hammer" it with ping -f for hours. > > > > nice, perhaps that's been the culprit all along... going to dig to see > > if I find a fullmac prism card. Would like to get this merged in. > > Any resolution on this? no replies. Seems like it works just fine for everybody. ;) I found a card, I just need time to test it. Dan, didn't you say you ran into issues with the patch on your card? Luis
Re: [PATCH] mark non-compiling ISA network drivers i386 only
On Thu, Nov 09, Stephen Hemminger wrote: > On Thu, 9 Nov 2006 19:40:21 +0100 (MET) > Olaf Hering <[EMAIL PROTECTED]> wrote: > > > > > Provide drivers for the old toys only on i386 > > isa_bus_to_virt is defined only on i386, mips and arm > > isa_virt_to_bus is used for floppy.ko > > Why not mark all of ISA as i386, mips, arm only? This patch is mainly for users of these 3 macros: '(isa_virt_to_bus|isa_page_to_bus|isa_bus_to_virt)' 3c505.c 3c515.c 3c523.c 3c527.c aha1542.c cs89x0.c esp.c ibmmca.c lance.c mca_53c9x.c ni52.c ni65.c ps2esdi.c ultrastor.c wd7000.c I did not enable all of them in a ppc32 pmac config. ppc32 PReP does have ISA slots on the motorola boards, no idea if anyone really cares about ISA cards today.
Re: why do we mangle checksums for v6 ICMP?
From: Brian Haley <[EMAIL PROTECTED]> Date: Thu, 09 Nov 2006 12:32:18 -0500 > Al Viro wrote: > > AFAICS, the rules are: > > > > (1) checksum is 16-bit one's complement of the one's complement sum of > > relevant 16bit words. > > > > (2) for v4 UDP all-zeroes has special meaning - no checksum; if you get > > it from (1), send all-ones instead. > > > > (3) for v6 UDP we have the same remapping as in (2), but all-zeroes has > > different meaning - not "ignore checksum" as in v4, but "reject the > > packet". > > > > (4) there is no (4). > > > > IOW, nobody except UDP has any business doing that 0->0x > > replacement. However, we have > >if (icmp6h->icmp6_cksum == 0) > >icmp6h->icmp6_cksum = -1; > > This doesn't look necessary, RFCs 4443/2463 don't mention it being > necessary, and BSD doesn't do it either. I'll cook-up a patch to remove > that since I was doing some other mods in that codepath. This is how things look to me too. > > and similar in net/ipv6/raw.c > > Maybe here it only needs to be done if (fl->proto == IPPROTO_UDP)? Yes, I believe that is what is needed.
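Al's rules can be sketched in a few lines of userspace C. This is an illustration only: inet_csum() and udp_wire_csum() are hypothetical names, not the kernel's csum helpers, and the pseudo-header that both UDP and ICMPv6 include in the sum is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Rule (1): 16-bit one's complement of the one's complement sum of the data
 * taken as big-endian 16-bit words; an odd trailing byte is zero-padded. */
static uint16_t inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(data[i] << 8 | data[i + 1]);
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                      /* fold carries back into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Rules (2)/(3): only UDP maps a computed 0 to all-ones, because on the
 * wire an all-zero UDP checksum means "none" (v4) or "reject" (v6).
 * ICMPv6 would send inet_csum()'s result unchanged. */
static uint16_t udp_wire_csum(uint16_t computed)
{
    return computed == 0 ? 0xFFFF : computed;
}
```

With the RFC 1071 sample bytes 00 01 f2 03 f4 f5 f6 f7, the folded sum is 0xddf2 and inet_csum() returns 0x220d.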
Re: e1000 driver 2.6.18 - how to waste processor cycles
Jeff V. Merkey wrote: Jesse Brandeburg wrote: On 11/9/06, Jeffrey V. Merkey <[EMAIL PROTECTED]> wrote: In the case I am referring to, the memory is already mapped with a previous call, which means it may be getting mapped twice. I guess maybe I'm not keeping up with you. This is what I see looking in 2.6.18, i see e1000_clean_rx_irq: Check e1000_alloc_rx_buffers: if (skb already exists in ring buffer) goto map_skb: else dev_alloc_skb ( drop through to map_skb) map_skb: pci_map_single Jeff check done bit pci_unmap_single copybreak and recycle OR hand buffer up stack the only branch before the unmap is the napi break out, and in that case we don't change any memory state, so alloc will not do anything. As for alloc rx, we always map, because we always unmapped. Unmapping every single buffer in rx_irq and then remapping them in alloc_rx_buffers is wasteful of cycles. Jeff Did I miss something? I would appreciate a more detailed explanation of what you see going wrong.
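To make the cost Jeff is pointing at concrete, here is a toy userspace model that only counts DMA unmap/map pairs; it is not e1000 code, and COPYBREAK, struct dma_stats and both functions are hypothetical. The second function models one possible alternative, which is an assumption rather than something proposed in the thread: a recycled copybreak buffer keeps its mapping and would instead be bracketed by pci_dma_sync_single_for_cpu()/pci_dma_sync_single_for_device() around the copy.

```c
#include <stddef.h>

#define COPYBREAK 256   /* hypothetical copybreak threshold, in bytes */

struct dma_stats {
    unsigned int unmaps;   /* buffers unmapped in the clean routine */
    unsigned int maps;     /* buffers (re)mapped in the refill routine */
};

/* Flow as described for 2.6.18: every completed buffer is unmapped and
 * later remapped, whether it was recycled (copybreak) or handed up. */
static void rx_unmap_always(size_t len, struct dma_stats *st)
{
    (void)len;
    st->unmaps++;
    st->maps++;
}

/* Assumed alternative: small frames are copied out of a buffer that stays
 * mapped, so only frames handed up the stack cost an unmap plus a new map. */
static void rx_keep_mapping_on_recycle(size_t len, struct dma_stats *st)
{
    if (len < COPYBREAK)
        return;
    st->unmaps++;
    st->maps++;
}
```

For a burst of mostly small frames the second model touches the DMA mapping layer far less often, which is the saving at stake; whether it is measurable on real hardware is exactly what the thread is debating.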
Re: [patch sungem] improved locking
Please use GREG_STAT_* instead of magic constants for the interrupt mask and ACK register writes. In fact, there are some questionable values you use, in particular this one:

+static inline void gem_ack_int(struct gem *gp)
+{
+    writel(0x3f, gp->regs + GREG_IACK);
+}

There is no bit defined in GREG_STAT_* for 0x08, but you set it in this magic bitmask. It is another reason not to use magic constants like this :-) Also, if you need to use an attachment to get the tabbing right, that's fine, but please also provide a copy inline so that it is easy to quote the patch for review purposes. It's truly a pain in the rear to quote things when you use a binary attachment. I'd like these very simple and straightforward issues to be worked out before I even begin to review the actual locking change itself.
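Spelling the ACK mask with named bits also lets the compiler catch an unnamed bit like the 0x08 in 0x3f. A minimal sketch; the STAT_* names and values below are hypothetical placeholders chosen to mirror the point in the review (0x08 has no name), and a real driver should use its own header's GREG_STAT_* definitions instead. unnamed_bits() is likewise a made-up helper.

```c
#include <stdint.h>

/* Hypothetical named status bits, for illustration only. */
#define STAT_TX_INTME   0x01u
#define STAT_TX_ALL     0x02u
#define STAT_TX_DONE    0x04u
#define STAT_RX_DONE    0x10u
#define STAT_RX_NOBUF   0x20u

/* Every bit that has a name. */
#define STAT_DEFINED  (STAT_TX_INTME | STAT_TX_ALL | STAT_TX_DONE | \
                       STAT_RX_DONE | STAT_RX_NOBUF)

/* The ACK mask written with names instead of a bare constant. */
#define STAT_ACK_MASK (STAT_TX_INTME | STAT_TX_ALL | STAT_TX_DONE | \
                       STAT_RX_DONE | STAT_RX_NOBUF)

/* Guard against future edits sneaking an unnamed bit into the mask. */
_Static_assert((STAT_ACK_MASK & ~STAT_DEFINED) == 0,
               "ACK mask uses a bit with no name");

/* Reports which bits of a magic constant have no name. */
static uint32_t unnamed_bits(uint32_t magic)
{
    return magic & ~(uint32_t)STAT_DEFINED;
}
```

With these placeholder values, unnamed_bits(0x3f) reports 0x08, while the named mask evaluates to 0x37.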
Re: e1000 driver 2.6.18 - how to waste processor cycles
Jesse Brandeburg wrote:
On 11/9/06, Jeffrey V. Merkey <[EMAIL PROTECTED]> wrote:
In the case I am referring to, the memory is already mapped with a previous call, which means it may be getting mapped twice.
I guess maybe I'm not keeping up with you. This is what I see looking in 2.6.18, I see
e1000_clean_rx_irq:
  Check e1000_alloc_rx_buffers:
    if (skb already exists in ring buffer) goto map_skb:
    else dev_alloc_skb (drop through to map_skb)
    map_skb: pci_map_single
  Jeff
  check done bit
  pci_unmap_single
  copybreak and recycle OR hand buffer up stack
the only branch before the unmap is the napi break out, and in that case we don't change any memory state, so alloc will not do anything. As for alloc rx, we always map, because we always unmapped.
Did I miss something? I would appreciate a more detailed explanation of what you see going wrong.
Re: e1000 driver 2.6.18 - how to waste processor cycles
On 11/9/06, Jeffrey V. Merkey <[EMAIL PROTECTED]> wrote:
In the case I am referring to, the memory is already mapped with a previous call, which means it may be getting mapped twice.
I guess maybe I'm not keeping up with you. This is what I see looking in 2.6.18, I see
e1000_clean_rx_irq:
  check done bit
  pci_unmap_single
  copybreak and recycle OR hand buffer up stack
the only branch before the unmap is the napi break out, and in that case we don't change any memory state, so alloc will not do anything. As for alloc rx, we always map, because we always unmapped.
Did I miss something? I would appreciate a more detailed explanation of what you see going wrong.
[patch sungem] improved locking
The attached patch improves locking in the sungem driver:
- a single lock is used in the driver
- gem_start_xmit, gem_poll, and gem_interrupt are lockless
The new locking design is based on what's in tg3.c. The patch runs smoothly on my ibook (with CONFIG_SMP set), but it will need extensive testing on a multi-cpu box.
The patch includes two implementations for gem_interrupt(). One is lockless while the other makes use of a spinlock. The spinlock version is there because I was not sure the lockless version would work with net_poll_controller. One of the two versions must be removed in the final patch.
Patch applies to current git net-2.6. Please review, and test if possible.
Thanks,
Signed-off-by: Eric Lemoine <[EMAIL PROTECTED]>
-- Eric
sungem-locking.patch Description: Binary data
Re: [take24 3/6] kevent: poll/select() notifications.
On Thu, 9 Nov 2006, Davide Libenzi wrote:
> On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:
> > On Thu, Nov 09, 2006 at 10:51:56AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> > > On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:
> > > > +static int kevent_poll_callback(struct kevent *k)
> > > > +{
> > > > +	if (k->event.req_flags & KEVENT_REQ_LAST_CHECK) {
> > > > +		return 1;
> > > > +	} else {
> > > > +		struct file *file = k->st->origin;
> > > > +		unsigned int revents = file->f_op->poll(file, NULL);
> > > > +
> > > > +		k->event.ret_data[0] = revents & k->event.event;
> > > > +
> > > > +		return (revents & k->event.event);
> > > > +	}
> > > > +}
> > >
> > > You need to be careful that file->f_op->poll is not called inside the spin_lock_irqsave/spin_unlock_irqrestore pair, since (this even came up during epoll development days) file->f_op->poll might do a simple spin_lock_irq/spin_unlock_irq. This unfortunate constraint forced epoll to have a suboptimal double O(R) loop to handle LT events.
> >
> > It is tricky - users call wake_up() from any context, which in turn ends up calling kevent_storage_ready(), which calls kevent_poll_callback() with KEVENT_REQ_LAST_CHECK bit set, which becomes an almost empty call in the fast path. Since callback returns 1, kevent will be queued into ready queue, which is processed on behalf of syscalls - in that case kevent will check the flag and since KEVENT_REQ_LAST_CHECK is set, will call callback again to check if kevent is correctly marked, but already without that flag (it happens in syscall context, i.e. process context without any locks held), so callback calls ->poll(), which can sleep, but it is safe. If ->poll() returns 'ready' value, kevent transfers data into userspace, otherwise it is 'requeued' (just removed from ready queue).
>
> Oh, mine was only a general warning. I hadn't looked at the generic code before. But now that I poke on it, I see:
>
> void kevent_requeue(struct kevent *k)
> {
> 	unsigned long flags;
>
> 	spin_lock_irqsave(&k->st->lock, flags);
> 	__kevent_requeue(k, 0);
> 	spin_unlock_irqrestore(&k->st->lock, flags);
> }
>
> and then:
>
> static int __kevent_requeue(struct kevent *k, u32 event)
> {
> 	int ret, rem;
> 	unsigned long flags;
>
> 	ret = k->callbacks.callback(k);
>
> Can't k->callbacks.callback() possibly end up calling f_op->poll?

Ack, there's the check for KEVENT_REQ_LAST_CHECK inside the callback. The problem with f_op->poll was not that it can sleep (not excluded though) but that some f_op->poll can do a simple spin_lock_irq/spin_unlock_irq. But for a quick peek your new code seems fine with that.

- Davide
Re: [PATCH] mark non-compiling ISA network drivers i386 only
On Thu, 9 Nov 2006 19:40:21 +0100 (MET) Olaf Hering <[EMAIL PROTECTED]> wrote:
> Provide drivers for the old toys only on i386
> isa_bus_to_virt is defined only on i386, mips and arm
> isa_virt_to_bus is used for floppy.ko

Why not mark all of ISA as i386, mips, arm only?
Re: [take24 3/6] kevent: poll/select() notifications.
On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:
> On Thu, Nov 09, 2006 at 10:51:56AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> > On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:
> > > +static int kevent_poll_callback(struct kevent *k)
> > > +{
> > > +	if (k->event.req_flags & KEVENT_REQ_LAST_CHECK) {
> > > +		return 1;
> > > +	} else {
> > > +		struct file *file = k->st->origin;
> > > +		unsigned int revents = file->f_op->poll(file, NULL);
> > > +
> > > +		k->event.ret_data[0] = revents & k->event.event;
> > > +
> > > +		return (revents & k->event.event);
> > > +	}
> > > +}
> >
> > You need to be careful that file->f_op->poll is not called inside the spin_lock_irqsave/spin_unlock_irqrestore pair, since (this even came up during epoll development days) file->f_op->poll might do a simple spin_lock_irq/spin_unlock_irq. This unfortunate constraint forced epoll to have a suboptimal double O(R) loop to handle LT events.
>
> It is tricky - users call wake_up() from any context, which in turn ends up calling kevent_storage_ready(), which calls kevent_poll_callback() with KEVENT_REQ_LAST_CHECK bit set, which becomes an almost empty call in the fast path. Since callback returns 1, kevent will be queued into ready queue, which is processed on behalf of syscalls - in that case kevent will check the flag and since KEVENT_REQ_LAST_CHECK is set, will call callback again to check if kevent is correctly marked, but already without that flag (it happens in syscall context, i.e. process context without any locks held), so callback calls ->poll(), which can sleep, but it is safe. If ->poll() returns 'ready' value, kevent transfers data into userspace, otherwise it is 'requeued' (just removed from ready queue).

Oh, mine was only a general warning. I hadn't looked at the generic code before. But now that I poke on it, I see:

void kevent_requeue(struct kevent *k)
{
	unsigned long flags;

	spin_lock_irqsave(&k->st->lock, flags);
	__kevent_requeue(k, 0);
	spin_unlock_irqrestore(&k->st->lock, flags);
}

and then:

static int __kevent_requeue(struct kevent *k, u32 event)
{
	int ret, rem;
	unsigned long flags;

	ret = k->callbacks.callback(k);

Can't k->callbacks.callback() possibly end up calling f_op->poll?

- Davide
Re: [take24 3/6] kevent: poll/select() notifications.
On Thu, Nov 09, 2006 at 10:51:56AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:
> > +static int kevent_poll_callback(struct kevent *k)
> > +{
> > +	if (k->event.req_flags & KEVENT_REQ_LAST_CHECK) {
> > +		return 1;
> > +	} else {
> > +		struct file *file = k->st->origin;
> > +		unsigned int revents = file->f_op->poll(file, NULL);
> > +
> > +		k->event.ret_data[0] = revents & k->event.event;
> > +
> > +		return (revents & k->event.event);
> > +	}
> > +}
>
> You need to be careful that file->f_op->poll is not called inside the spin_lock_irqsave/spin_unlock_irqrestore pair, since (this even came up during epoll development days) file->f_op->poll might do a simple spin_lock_irq/spin_unlock_irq. This unfortunate constraint forced epoll to have a suboptimal double O(R) loop to handle LT events.

It is tricky - users call wake_up() from any context, which in turn ends up calling kevent_storage_ready(), which calls kevent_poll_callback() with KEVENT_REQ_LAST_CHECK bit set, which becomes an almost empty call in the fast path. Since callback returns 1, kevent will be queued into ready queue, which is processed on behalf of syscalls - in that case kevent will check the flag and since KEVENT_REQ_LAST_CHECK is set, will call callback again to check if kevent is correctly marked, but already without that flag (it happens in syscall context, i.e. process context without any locks held), so callback calls ->poll(), which can sleep, but it is safe. If ->poll() returns 'ready' value, kevent transfers data into userspace, otherwise it is 'requeued' (just removed from ready queue).

> - Davide

-- Evgeniy Polyakov
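The two-phase check Evgeniy describes can be modelled in userspace roughly as follows. This is a simplified sketch, not the kevent code: struct ex_kevent, EX_REQ_LAST_CHECK, and the poll function pointer are hypothetical stand-ins for struct kevent, KEVENT_REQ_LAST_CHECK, and file->f_op->poll, and no locking or queueing is modelled. It only shows that the wakeup path never reaches ->poll(), while the syscall path does.

```c
#include <stdbool.h>

/* Hypothetical flag value standing in for KEVENT_REQ_LAST_CHECK. */
#define EX_REQ_LAST_CHECK 0x1u

struct ex_kevent {
    unsigned int req_flags;
    unsigned int wanted;          /* events requested by the user */
    unsigned int ret_data;        /* events reported back */
    unsigned int (*poll)(void);   /* stands in for file->f_op->poll */
};

/* Stand-in poll functions for the model. */
unsigned int ex_poll_ready(void) { return 0x1u; }
unsigned int ex_poll_idle(void)  { return 0x0u; }

/* Same shape as kevent_poll_callback(): a cheap "yes" while the flag is
 * set, a real ->poll() only once the flag has been cleared. */
int ex_poll_callback(struct ex_kevent *k)
{
    if (k->req_flags & EX_REQ_LAST_CHECK)
        return 1;

    unsigned int revents = k->poll();
    k->ret_data = revents & k->wanted;
    return (revents & k->wanted) != 0;
}

/* Wakeup path (may run in atomic context under a spinlock): ->poll()
 * is never reached. True means "put on the ready queue". */
bool ex_wakeup(struct ex_kevent *k)
{
    k->req_flags |= EX_REQ_LAST_CHECK;
    return ex_poll_callback(k) != 0;
}

/* Syscall path (process context, no locks held): clear the flag and do
 * the real check. False means "requeue", i.e. drop from the ready queue. */
bool ex_syscall_check(struct ex_kevent *k)
{
    k->req_flags &= ~EX_REQ_LAST_CHECK;
    return ex_poll_callback(k) != 0;
}
```

A wakeup on an idle file still queues the event, and only the later syscall-side check discovers there is nothing to report, which is the trade-off the flag buys.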
Re: [take24 3/6] kevent: poll/select() notifications.
On Thu, 9 Nov 2006, Evgeniy Polyakov wrote:
> +static int kevent_poll_callback(struct kevent *k)
> +{
> +	if (k->event.req_flags & KEVENT_REQ_LAST_CHECK) {
> +		return 1;
> +	} else {
> +		struct file *file = k->st->origin;
> +		unsigned int revents = file->f_op->poll(file, NULL);
> +
> +		k->event.ret_data[0] = revents & k->event.event;
> +
> +		return (revents & k->event.event);
> +	}
> +}

You need to be careful that file->f_op->poll is not called inside the spin_lock_irqsave/spin_unlock_irqrestore pair, since (this even came up during epoll development days) file->f_op->poll might do a simple spin_lock_irq/spin_unlock_irq. This unfortunate constraint forced epoll to have a suboptimal double O(R) loop to handle LT events.

- Davide
[PATCH] mark non-compiling ISA network drivers i386 only
Provide drivers for the old toys only on i386
isa_bus_to_virt is defined only on i386, mips and arm
isa_virt_to_bus is used for floppy.ko
Add missing '&& ISA_DMA_API' to NI52

WARNING: "isa_bus_to_virt" [drivers/net/ni65.ko] undefined!
WARNING: "isa_virt_to_bus" [drivers/net/ni65.ko] undefined!
WARNING: "isa_bus_to_virt" [drivers/net/ni52.ko] undefined!
WARNING: "isa_bus_to_virt" [drivers/net/lance.ko] undefined!
WARNING: "isa_virt_to_bus" [drivers/net/lance.ko] undefined!
WARNING: "isa_bus_to_virt" [drivers/net/3c515.ko] undefined!
WARNING: "isa_virt_to_bus" [drivers/net/3c515.ko] undefined!
WARNING: "isa_virt_to_bus" [drivers/net/3c505.ko] undefined!

I'm sure no one will miss the drivers.

Signed-off-by: Olaf Hering <[EMAIL PROTECTED]>

---
 drivers/net/Kconfig |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6/drivers/net/Kconfig
===
--- linux-2.6.orig/drivers/net/Kconfig
+++ linux-2.6/drivers/net/Kconfig
@@ -616,7 +616,7 @@ config EL2
 
 config ELPLUS
 	tristate "3c505 \"EtherLink Plus\" support"
-	depends on NET_VENDOR_3COM && ISA && ISA_DMA_API
+	depends on NET_VENDOR_3COM && ISA && ISA_DMA_API && X86
 	---help---
 	  Information about this network (Ethernet) card can be found in
 	  . If you have a card of
@@ -657,7 +657,7 @@ config EL3
 
 config 3C515
 	tristate "3c515 ISA \"Fast EtherLink\""
-	depends on NET_VENDOR_3COM && (ISA || EISA) && ISA_DMA_API
+	depends on NET_VENDOR_3COM && (ISA || EISA) && ISA_DMA_API && X86
 	help
 	  If you have a 3Com ISA EtherLink XL "Corkscrew" 3c515 Fast Ethernet network card, say Y and read the Ethernet-HOWTO, available from
@@ -735,7 +735,7 @@ config TYPHOON
 
 config LANCE
 	tristate "AMD LANCE and PCnet (AT1500 and NE2100) support"
-	depends on NET_ETHERNET && ISA && ISA_DMA_API
+	depends on NET_ETHERNET && ISA && ISA_DMA_API && X86
 	help
 	  If you have a network (Ethernet) card of this type, say Y and read the Ethernet-HOWTO, available from
@@ -918,7 +918,7 @@ config NI5010
 
 config NI52
 	tristate "NI5210 support"
-	depends on NET_VENDOR_RACAL && ISA
+	depends on NET_VENDOR_RACAL && ISA && ISA_DMA_API && X86
 	help
 	  If you have a network (Ethernet) card of this type, say Y and read the Ethernet-HOWTO, available from
@@ -930,7 +930,7 @@ config NI52
 
 config NI65
 	tristate "NI6510 support"
-	depends on NET_VENDOR_RACAL && ISA && ISA_DMA_API
+	depends on NET_VENDOR_RACAL && ISA && ISA_DMA_API && X86
 	help
 	  If you have a network (Ethernet) card of this type, say Y and read the Ethernet-HOWTO, available from
Re: tg3_close question
On 11/9/06, Michael Chan <[EMAIL PROTECTED]> wrote:
Eric Lemoine wrote:
> > On 11/9/06, Michael Chan <[EMAIL PROTECTED]> wrote:
> > > So it is not possible for tg3_poll() -> tg3_tx() to run any more after tg3_close() is called.
> >
> > But, while tg3_close() starts executing, an interrupt may come in and schedule polling (set __LINK_STATE_RX_SCHED). So tg3_poll() -> tg3_tx() may well occur.
>
> Actually I don't understand the purpose of having dev_close() wait for __LINK_STATE_RX_SCHED to be cleared. An interrupt may arrive at any time after it's cleared, and reset __LINK_STATE_RX_SCHED. Can someone explain please?
>
If netif_running() is cleared, netif_rx_schedule() will not schedule the ->poll(). So even if tg3 gets an interrupt after close, tg3_poll() will not be scheduled.

Oh! I had missed the netif_running() check in netif_rx_schedule_prep(). Thanks Michael and Maxime for this clarification.
-- Eric
Re: [PATCH wireless-2.6-git] prism54: WPA/RSN support for fullmac cards
On Wednesday, 8 November 2006 01:39, you wrote:
> On Fri, Nov 03, 2006 at 01:41:46PM -0500, Luis R. Rodriguez wrote:
> > On 11/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > yes, especially mgt_commit_list caused a lot of headaches, until I removed DOT11_OID_PSM from the cache list. Now, I can "hammer" it with ping -f for hours.
> >
> > nice, perhaps that's been the culprit all along... going to dig to see if I find a fullmac prism card. Would like to get this merged in.
>
> Any resolution on this?

no replies. Seems like it works just fine for everybody. ;)

Christian
Re: tg3_close question
On Thu, 2006-11-09 at 18:22 +0100, Eric Lemoine wrote:
> Actually I don't understand the purpose of having dev_close() wait for __LINK_STATE_RX_SCHED to be cleared. An interrupt may arrive at any time after it's cleared, and reset __LINK_STATE_RX_SCHED. Can someone explain please?

Further interrupts won't schedule poll because netif_rx_schedule_prep() checks for netif_running().
-- Maxime
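A userspace model of the check Maxime refers to may help. In 2.6-era kernels netif_rx_schedule_prep() is, to our understanding, netif_running(dev) && !test_and_set_bit(__LINK_STATE_RX_SCHED, &dev->state); the sketch below mirrors that with plain booleans. struct ex_dev and its fields are hypothetical, and unlike test_and_set_bit() the helper here is not atomic.

```c
#include <stdbool.h>

/* Hypothetical device state for the model. */
struct ex_dev {
    bool running;    /* models __LINK_STATE_START, cleared by dev_close() */
    bool rx_sched;   /* models __LINK_STATE_RX_SCHED */
};

/* Non-atomic stand-in for test_and_set_bit(): set the bit and
 * report whether it was already set. */
static bool ex_test_and_set(bool *bit)
{
    bool old = *bit;
    *bit = true;
    return old;
}

/* True if the caller won the right to schedule ->poll(). Because of
 * the short-circuit &&, a stopped device never touches rx_sched, so a
 * late interrupt cannot re-arm polling after close. */
bool ex_rx_schedule_prep(struct ex_dev *dev)
{
    return dev->running && !ex_test_and_set(&dev->rx_sched);
}
```

This is why dev_close() only has to wait for the last in-flight ->poll(): once the running bit is clear, no new one can be scheduled.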
Re: why do we mangle checksums for v6 ICMP?
Hi Al,

Al Viro wrote:
AFAICS, the rules are:
(1) checksum is 16-bit one's complement of the one's complement sum of relevant 16bit words.
(2) for v4 UDP all-zeroes has special meaning - no checksum; if you get it from (1), send all-ones instead.
(3) for v6 UDP we have the same remapping as in (2), but all-zeroes has different meaning - not "ignore checksum" as in v4, but "reject the packet".
(4) there is no (4).
IOW, nobody except UDP has any business doing that 0->0x replacement. However, we have

	if (icmp6h->icmp6_cksum == 0)
		icmp6h->icmp6_cksum = -1;

This doesn't look necessary, RFCs 4443/2463 don't mention it being necessary, and BSD doesn't do it either. I'll cook up a patch to remove that since I was doing some other mods in that codepath.

and similar in net/ipv6/raw.c

Maybe here it only needs to be done if (fl->proto == IPPROTO_UDP)?

-Brian
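Rules (1) and (2) above can be made concrete with a short sketch. ex_csum() is a plain RFC 1071 one's complement checksum over big-endian 16-bit words, without any pseudo-header and without the kernel's optimised csum helpers; ex_udp_csum_field() applies the UDP-only 0 -> 0xffff remap. Both names are hypothetical. In one's complement arithmetic 0x0000 and 0xffff both represent zero, so the remap costs UDP nothing; ICMPv6 gives 0 no special meaning, so it has nothing to remap.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's complement of the one's complement sum (RFC 1071),
 * over big-endian 16-bit words; an odd trailing byte is padded with 0. */
uint16_t ex_csum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;

    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}

/* Rule (2)/(3): only UDP sends all-ones when the computed value is 0,
 * because an all-zeroes field means "no checksum" for UDP. */
uint16_t ex_udp_csum_field(uint16_t computed)
{
    return computed == 0 ? 0xffff : computed;
}
```

For the RFC 1071 sample bytes 00 01 f2 03 f4 f5 f6 f7 the sum folds to 0xddf2 and ex_csum() returns 0x220d; appending those two checksum bytes and summing again gives 0, which is how a receiver verifies the packet.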
Re: tg3_close question
Eric Lemoine wrote:
> > On 11/9/06, Michael Chan <[EMAIL PROTECTED]> wrote:
> > > So it is not possible for tg3_poll() -> tg3_tx() to run any more after tg3_close() is called.
> >
> > But, while tg3_close() starts executing, an interrupt may come in and schedule polling (set __LINK_STATE_RX_SCHED). So tg3_poll() -> tg3_tx() may well occur.
>
> Actually I don't understand the purpose of having dev_close() wait for __LINK_STATE_RX_SCHED to be cleared. An interrupt may arrive at any time after it's cleared, and reset __LINK_STATE_RX_SCHED. Can someone explain please?
>

If netif_running() is cleared, netif_rx_schedule() will not schedule the ->poll(). So even if tg3 gets an interrupt after close, tg3_poll() will not be scheduled.
Re: [PATCH 2/3] mlsxfrm: Various fixes
James Morris wrote:
> On Thu, 9 Nov 2006, Paul Moore wrote:
>> It sounds like you have an idea of how you would like to see this implemented, can you give me a rough outline? Is this the partitioned SECMARK field you talked about earlier?
>
> No, just the fact that you are in the same kernel address space and can readily access the security context of the peer.

For a minute I got all excited thinking that you had found a solution to this :)

The problem I keep running into is that it is not obvious to me how we can determine the security context of the sending socket on the receive side by looking at the skb. I'm really hoping that it is just because I haven't looked at the code long enough, or thought about it hard enough. It is just so frustrating because you are right - all the information is there, I just don't know how to get to it when we need it without using external labeling.

-- paul moore linux security @ hp
Re: tg3_close question
On 11/9/06, Eric Lemoine <[EMAIL PROTECTED]> wrote:
On 11/9/06, Michael Chan <[EMAIL PROTECTED]> wrote:
> Michael Chan wrote:
> > Eric Lemoine wrote:
> > > Instead of tg3_netif_stop() tg3_close() uses netif_stop_queue() to stop xmit. This doesn't seem right to me. E.g. another CPU in tg3_tx() could do netif_wake_queue() just after tg3_close() did netif_stop_queue(). Isn't it a bug?
> >
> > I think you're right. It is more correct to call tg3_netif_stop().
>
> I take it back. Before ->stop() is called, netif_running is cleared, and it waits for __LINK_STATE_RX_SCHED to be cleared. This means that it will wait for the last ->poll() to finish.
Correct.
> So it is not possible for tg3_poll() -> tg3_tx() to run any more after tg3_close() is called.
But, while tg3_close() starts executing, an interrupt may come in and schedule polling (set __LINK_STATE_RX_SCHED). So tg3_poll() -> tg3_tx() may well occur.

Actually I don't understand the purpose of having dev_close() wait for __LINK_STATE_RX_SCHED to be cleared. An interrupt may arrive at any time after it's cleared, and reset __LINK_STATE_RX_SCHED. Can someone explain please?

Thanks,
-- Eric
Re: tg3_close question
On 11/9/06, Michael Chan <[EMAIL PROTECTED]> wrote:
Michael Chan wrote:
> Eric Lemoine wrote:
> > Instead of tg3_netif_stop() tg3_close() uses netif_stop_queue() to stop xmit. This doesn't seem right to me. E.g. another CPU in tg3_tx() could do netif_wake_queue() just after tg3_close() did netif_stop_queue(). Isn't it a bug?
>
> I think you're right. It is more correct to call tg3_netif_stop().
>
I take it back. Before ->stop() is called, netif_running is cleared, and it waits for __LINK_STATE_RX_SCHED to be cleared. This means that it will wait for the last ->poll() to finish.
Correct.
So it is not possible for tg3_poll() -> tg3_tx() to run any more after tg3_close() is called.
But, while tg3_close() starts executing, an interrupt may come in and schedule polling (set __LINK_STATE_RX_SCHED). So tg3_poll() -> tg3_tx() may well occur.
-- Eric
Re: [take23 0/5] kevent: Generic event handling mechanism.
On Thu, 9 Nov 2006, Eric Dumazet wrote:
> > > Lost forever means? If there are more processes watching some fd (external events), they all get their own copy of the events in their own private epoll fd. It's not that we "steal" things out of the kernel, is not a 1:1 producer/consumer thing (one producer, 1 queue). It's one producer, broadcast to all listeners (consumers) thing. The only case where it'd matter is in the case of multiple threads sharing the same epoll fd.
> >
> > In my particular epoll application, the producer is tcp stack, and I have one consumer. If a network event is lost in the EFAULT handling, it's lost forever. In any case, my application does provide a correct user area, so this problem is only theoretical.
>
> I realize I was not explicit, and did not answer your question (Lost forever means ?)
>
> if (epi->revents) {
> 	if (__put_user(epi->revents,
> 		       &events[eventcnt].events) ||
> 	    __put_user(epi->event.data,
> 		       &events[eventcnt].data))
> 		return -EFAULT;
> >>	if (epi->event.events & EPOLLONESHOT)
> >>		epi->event.events &= EP_PRIVATE_BITS;
> 	eventcnt++;
> }
>
> If one EPOLLONESHOT event is correctly copied to user space, its status is updated.
>
> If other ready events in the same epoll_wait() call cannot be transferred because of an EFAULT (we reach the real end of user provided area), this EPOLLONESHOT event is lost forever, because it won't be requeued in ready list.

Your application is feeding crap to the kernel, because of programming bugs. If that happens, I want an EFAULT and not a partially filled buffer. And which buffer then? This could have been scribbled in userspace memory (the pointer), and the kernel's attempt to mask out bugs might create even more subtle problems. Such a bug will *never* show up in case the wrong buffer is partially valid (first part, that is the *only* case where your fix would make a difference compared to the status quo), since in case of no ready events we'll never hit it, and in case of some events we'll always return a few of them and never EFAULT. No, the more I think about it, the more I personally disagree with the change.

> Please don't slow the hot path for a basically "User Error". It's already tested in the transfer function, with two conditional branches for each transferred event.

Ohh, if you think you can measure them from userspace, those can be turned into 'err |= __put_user();' with err tested only out of the loop.

- Davide
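Davide's closing suggestion, accumulating the __put_user() results instead of branching on each one, looks roughly like this in a userspace model. ex_put() is a hypothetical stand-in for __put_user() in which a NULL slot plays the role of an unmapped user address; in the kernel a faulting __put_user() returns -EFAULT through the exception fixup rather than crashing, which is what makes it safe to keep looping. Note the behavioural difference from the quoted epoll loop: the copy continues past a fault, and the error is reported once, after the loop.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for __put_user(): 0 on success, -EFAULT when
 * the destination slot is NULL (modelling an unmapped user address). */
static int ex_put(unsigned int *dst, unsigned int val)
{
    if (dst == NULL)
        return -EFAULT;
    *dst = val;
    return 0;
}

/* Copy n values, OR-ing the per-item results into one error word that
 * is tested only once, outside the loop, instead of a branch per item. */
int ex_copy_events(unsigned int *const slots[], const unsigned int vals[],
                   size_t n)
{
    int err = 0;
    size_t i;

    for (i = 0; i < n; i++)
        err |= ex_put(slots[i], vals[i]);

    return err ? -EFAULT : (int)n;
}
```

Because every error code is non-zero, OR-ing them preserves "something failed", which is all the single post-loop test needs to know.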
Re: Intel 82559 NIC corrupted EEPROM
John wrote:
Auke Kok wrote:
This is what I was afraid of: even though the code allows you to bypass the EEPROM checksum, the probe fails on a further check to see if the MAC address is valid. Since something with this NIC specifically made the EEPROM return all 0xff's, the MAC address is automatically invalid, and thus probe fails.
I don't understand why you think there is something wrong with a specific NIC?
that was completely not my point - I was merely trying to point out that the original problem causes a cascade of error events later on, and bypassing the eeprom check in this case didn't help you at all.
Something is wrong in the driver, but I don't understand yet why it only affects one of the 3 NICs in your system.
In 2.6.14.7, e100.ko fails to read the EEPROM on :00:08.0 (eth0)
In 2.6.18.1, e100.ko fails to read the EEPROM on :00:09.0 (eth1)
almost sounds like a bug got fixed and it introduced a regression. this wouldn't be the right time to pull out git-bisect would it? even loading 2.6.15, 2.6.16, 2.6.17 on it would give us some good information.
Cheers,
Auke
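Why an all-0xff EEPROM guarantees an invalid MAC address can be seen from the logic of the kernel's is_valid_ether_addr() helper, which, to our understanding, is what e100's probe checks: an address is rejected if it is multicast (low bit of the first octet set) or all zeros. The sketch below restates that logic in userspace; the ex_* function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Multicast/broadcast addresses have the I/G bit (bit 0 of the first
 * octet) set; ff:ff:ff:ff:ff:ff is the broadcast address. */
bool ex_is_multicast(const uint8_t addr[6])
{
    return addr[0] & 0x01;
}

bool ex_is_zero(const uint8_t addr[6])
{
    return !(addr[0] | addr[1] | addr[2] | addr[3] | addr[4] | addr[5]);
}

/* Same rule as is_valid_ether_addr(): usable only if neither
 * multicast nor all zeros. */
bool ex_is_valid_ether_addr(const uint8_t addr[6])
{
    return !ex_is_multicast(addr) && !ex_is_zero(addr);
}
```

An EEPROM that reads back as all 0xff yields ff:ff:ff:ff:ff:ff, whose first octet has the multicast bit set, so the check fails no matter what the checksum bypass allowed.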
Re: [IPROUTE2] Add support for inverted selectors
added
Re: tg3_close question
Michael Chan wrote:
> Eric Lemoine wrote:
> > Instead of tg3_netif_stop() tg3_close() uses netif_stop_queue() to stop xmit. This doesn't seem right to me. E.g. another CPU in tg3_tx() could do netif_wake_queue() just after tg3_close() did netif_stop_queue(). Isn't it a bug?
>
> I think you're right. It is more correct to call tg3_netif_stop().
>
I take it back. Before ->stop() is called, netif_running is cleared, and it waits for __LINK_STATE_RX_SCHED to be cleared. This means that it will wait for the last ->poll() to finish. So it is not possible for tg3_poll() -> tg3_tx() to run any more after tg3_close() is called.
Re: tg3_close question
Eric Lemoine wrote:
> Instead of tg3_netif_stop() tg3_close() uses netif_stop_queue() to stop xmit. This doesn't seem right to me. E.g. another CPU in tg3_tx() could do netif_wake_queue() just after tg3_close() did netif_stop_queue(). Isn't it a bug?
>
I think you're right. It is more correct to call tg3_netif_stop().
Re: [PATCHSET] packet mark & fib rules work
Hi,

On Thu, Nov 09, 2006 at 01:49:11PM +0100, Thomas Graf wrote:
> * Steven Whitehouse <[EMAIL PROTECTED]> 2006-11-09 11:46
> > so far as all the DECnet bits go. One question though... will you be adding later (as your slide #5 and #11 from your netconf presentation appear to imply) a way to set the mark from the routing table (presumably included in the nexthop info)?
>
> So far I haven't planned this, slide #11 describes that if I add an address with a given mark the corresponding route will only apply to packets with a matching mark. Slide #5 shows the idea of an ingress classifier/action setting the mark field based on iif. I focus on selecting routes based on marks, not the other way around but it's certainly an interesting idea if you can elaborate it further.

So here is roughly what I was thinking... this comes from having spent a little while thinking about the best way to integrate MPLS into the network stack. An MPLS label is 32 bits in size which conveniently matches the size of the packet mark. So one thought was this (for MPLS edge routers). Add the ability to set a mark to the IP routing table. Something along the lines of:

/sbin/ip route add 10.1.0.0/16 via 10.2.1.1 dev eth0 setmark 6

and then use the mark as the FEC (forwarding equivalence class) for MPLS (which is just an index, but in simple cases could contain a whole MPLS label). I was hoping that it might be possible to use the xfrm infrastructure to deal with the actual application of MPLS labels, but I'm not yet 100% certain that it's a good fit. Either way, MPLS will require some kind of way to indicate the FEC for each route, so using the generic mark like this seems to me a reasonable solution on the basis that other uses might then be found for it as well.

Since MPLS labels are only a subset of the full 32 bits, being able to use a mask in conjunction with setting the mark might also be a useful feature, so that the logic (pseudo code) after route lookup might look something like:

skb->mark &= ~nh->nh_setmask;
skb->mark |= nh->nh_setmark; /* Assume mark only sets bits allowed by mask */

The big question being, is this going to be a problem bearing in mind it would appear in the routing fast path?

On the MPLS input side, packet marks would be set according to the incoming MPLS label and then work in just the same way that you propose using the marks to create separate routing for different VLANs for example.

If people are generally happy with the idea, and since it's not already part of your plans, then I'll try and put a patch together for it in the not too distant future,

Steve.
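Steve's pseudo code, written out as a small C function for clarity. struct ex_nexthop and its nh_setmask/nh_setmark fields are hypothetical: they are his proposal, not existing fib_nh members. One deliberate deviation from the pseudo code: instead of assuming the mark only sets bits allowed by the mask, the sketch masks nh_setmark with nh_setmask, at the cost of one extra AND on the fast path. The example mask 0x000fffff assumes the owned bits hold a 20-bit MPLS label value.

```c
#include <stdint.h>

/* Hypothetical next-hop fields from the proposal above. */
struct ex_nexthop {
    uint32_t nh_setmask;   /* bits of the mark this route owns */
    uint32_t nh_setmark;   /* value to place in those bits */
};

/* Clear the owned bits, then set them. Masking setmark with setmask
 * enforces the "mark only sets bits allowed by mask" assumption
 * instead of trusting it, so bits outside the mask are preserved. */
uint32_t ex_apply_mark(uint32_t mark, const struct ex_nexthop *nh)
{
    mark &= ~nh->nh_setmask;
    mark |= nh->nh_setmark & nh->nh_setmask;
    return mark;
}
```

For example, with a mask of 0x000fffff and a setmark of 6, a packet arriving with mark 0xabc00123 leaves with 0xabc00006: the upper bits owned by other users of the mark survive untouched.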
Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic
Hi All

I found an issue with the Realtek 8139 driver (2.6.10 branch) at high bi-directional traffic. On transmit timeout, the driver's timeout callback re-enables the receive interrupt. On the next receive interrupt, the ISR disables the receive interrupt "only" when the receive poll task is not active. But the poll task is actually active and hence it doesn't disable the receive interrupt. So the ISR returns without clearing the receive interrupt. This un-serviced receive interrupt brings the system into a hung state.

My understanding here is, on receive interrupt the ISR should disable the receive interrupt irrespective of the polling task's state (active or inactive). I changed the code (as shown below) and it works perfectly. Is this a known issue? If so, is there a fix already available?

Also, we get frequent TX timeouts during high rate of traffic. What could be the reason for these frequent TX timeouts?

--- old/8139too.c 2006-11-09 11:49:25.0 +0530
+++ new/8139too.c 2006-11-09 11:50:02.0 +0530
@@ -2200,8 +2200,8 @@
 	/* Receive packets are processed by poll routine.
 	   If not running start it now. */
 	if (status & RxAckBits){
-		if (netif_rx_schedule_prep(dev)) {
 			RTL_W16_F (IntrMask, rtl8139_norx_intr_mask);
+		if (netif_rx_schedule_prep(dev)) {
 			__netif_rx_schedule (dev);
 		}
 	}

Thanks
Mansoor
Re: Intel 82559 NIC corrupted EEPROM
Jesse Brandeburg wrote:
> I suspect that one reason Becker's code works is that it uses IO based
> access (slower, and different method) to the adapter rather than memory
> mapped access.

I've noticed this difference.

> The second thought is that the adapter is in D3, and something about your
> kernel or the driver doesn't successfully wake it up to D0.

On my NICs, the EEPROM ID (Word 0Ah) is set to 0x40a2. Thus DDPD (bit 6) is set to 0.

DDPD is the "Disable Deep Power Down while PME is disabled" bit.
0 - Deep Power Down is enabled in D3 state while PME-disabled.
1 - Deep Power Down disabled in D3 state while PME-disabled.
This bit should be set to 1b if a TCO controller is being used via the SMB because it requires receive functionality at all power states.

Are you suggesting I try and set DDPD to 1? Or is this completely unrelated?

> An indication of this would be looking at lspci -vv before/after loading
> the driver.

$ diff -u lspci_vv_before_e100.txt lspci_vv_after_e100.txt
--- lspci_vv_before_e100.txt	2006-11-09 14:51:30.0 +0100
+++ lspci_vv_after_e100.txt	2006-11-09 14:51:30.0 +0100
@@ -74,21 +74,20 @@
         Expansion ROM at 2000 [disabled] [size=1M]
         Capabilities: [dc] Power Management version 2
                 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-                Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
+                Status: D0 PME-Enable- DSel=0 DScale=2 PME-
 
 00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
         Subsystem: Intel Corporation EtherExpress PRO/100B (TX)
-        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
+        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-
-        Latency: 32 (2000ns min, 14000ns max), cache line size 08
         Interrupt: pin A routed to IRQ 10
-        Region 0: Memory at e5302000 (32-bit, non-prefetchable) [size=4K]
-        Region 1: I/O ports at dc00 [size=64]
-        Region 2: Memory at e510 (32-bit, non-prefetchable) [size=1M]
+        Region 0: Memory at e5302000 (32-bit, non-prefetchable) [disabled] [size=4K]
+        Region 1: I/O ports at dc00 [disabled] [size=64]
+        Region 2: Memory at e510 (32-bit, non-prefetchable) [disabled] [size=1M]
         Expansion ROM at 2010 [disabled] [size=1M]
         Capabilities: [dc] Power Management version 2
                 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
-                Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
+                Status: D0 PME-Enable- DSel=0 DScale=2 PME-
 
 00:0a.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
         Subsystem: Intel Corporation EtherExpress PRO/100B (TX)

> Also, after loading/unloading eepro100 does the e100 driver work?

No.

> A third idea is look for a master abort in lspci after e100 fails to load.

I don't understand that one.
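The DDPD arithmetic above can be checked with a small sketch: 0x40a2 has bit 6 (0x40) clear, so DDPD is 0, and setting it would give 0x40e2. The macro and function names are illustrative, not taken from e100.c or eepro100.c. Note that writing a modified word back to a real NIC would also require updating the EEPROM checksum word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the EEPROM ID word (word 0Ah) and its
 * DDPD bit: "Disable Deep Power Down while PME is disabled", bit 6. */
#define EEPROM_ID_WORD	0x0A
#define EEPROM_ID_DDPD	(1u << 6)

/* True if deep power down is disabled in D3 while PME is disabled. */
static bool ddpd_set(uint16_t eeprom_id)
{
	return (eeprom_id & EEPROM_ID_DDPD) != 0;
}

/* Return the ID word with DDPD forced to 1. Writing it back and
 * fixing the checksum word are not shown. */
static uint16_t with_ddpd_set(uint16_t eeprom_id)
{
	return (uint16_t)(eeprom_id | EEPROM_ID_DDPD);
}
```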
RE: [PATCH 0/3] labeled-ipsec: Repost patchset with updates [Originally: mlsxfrm: Various Fixes]
> I think this should be aimed at 2.6.20, because we are at the last or
> second-last -rc currently, and I don't think these fixes are
> urgent enough
> to justify the risk at this stage.

That makes sense. Thanks.
[PATCH take 2] Atmel MACB ethernet driver
Driver for the Atmel MACB on-chip ethernet module. Tested on AVR32/AT32AP7000/ATSTK1000. I've heard rumours that it works with AT91SAM9260 as well, and it may be possible to share some code with the at91_ether driver for AT91RM9200. Hardware documentation can be found in the AT32AP7000 data sheet, which can be downloaded from http://www.atmel.com/dyn/products/datasheets.asp?family_id=682 Changes since previous version: * Probe for PHY ID instead of depending on it being provided through platform_data. * Grab initial ethernet address from the MACB registers instead of depending on platform_data. * Set MII/RMII mode correctly. These changes are mostly about making the driver more compatible with the at91 infrastructure. Signed-off-by: Haavard Skinnemoen <[EMAIL PROTECTED]> --- MAINTAINERS |7 + drivers/net/Kconfig | 11 + drivers/net/Makefile |2 + drivers/net/macb.c | 1210 ++ drivers/net/macb.h | 387 5 files changed, 1617 insertions(+), 0 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index d708702..b8c28b5 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -426,6 +426,13 @@ L: [EMAIL PROTECTED] W: http://linux-atm.sourceforge.net S: Maintained +ATMEL MACB ETHERNET DRIVER +P: Atmel AVR32 Support Team +M: [EMAIL PROTECTED] +P: Haavard Skinnemoen +M: [EMAIL PROTECTED] +S: Supported + ATMEL WIRELESS DRIVER P: Simon Kelley M: [EMAIL PROTECTED] diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 9cb3ca5..4e033b1 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -188,6 +188,17 @@ config MII or internal device. It is safe to say Y or M here even if your ethernet card lack MII. +config MACB + tristate "Atmel MACB support" + depends on NET_ETHERNET && AVR32 + select MII + help + The Atmel MACB ethernet interface is found on many AT32 and AT91 + parts. Say Y to include support for the MACB chip. + + To compile this driver as a module, choose M here: the module + will be called macb. 
+ source "drivers/net/arm/Kconfig" config MACE diff --git a/drivers/net/Makefile b/drivers/net/Makefile index f270bc4..8e67697 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -197,6 +197,8 @@ obj-$(CONFIG_SMC911X) += smc911x.o obj-$(CONFIG_DM9000) += dm9000.o obj-$(CONFIG_FEC_8XX) += fec_8xx/ +obj-$(CONFIG_MACB) += macb.o + obj-$(CONFIG_ARM) += arm/ obj-$(CONFIG_DEV_APPLETALK) += appletalk/ obj-$(CONFIG_TR) += tokenring/ diff --git a/drivers/net/macb.c b/drivers/net/macb.c new file mode 100644 index 000..bd0ce98 --- /dev/null +++ b/drivers/net/macb.c @@ -0,0 +1,1210 @@ +/* + * Atmel MACB Ethernet Controller driver + * + * Copyright (C) 2004-2006 Atmel Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "macb.h" + +#define to_net_dev(class) container_of(class, struct net_device, class_dev) + +#define RX_BUFFER_SIZE 128 +#define RX_RING_SIZE 512 +#define RX_RING_BYTES (sizeof(struct dma_desc) * RX_RING_SIZE) + +/* Make the IP header word-aligned (the ethernet header is 14 bytes) */ +#define RX_OFFSET 2 + +#define TX_RING_SIZE 128 +#define DEF_TX_RING_PENDING(TX_RING_SIZE - 1) +#define TX_RING_BYTES (sizeof(struct dma_desc) * TX_RING_SIZE) + +#define TX_RING_GAP(bp)\ + (TX_RING_SIZE - (bp)->tx_pending) +#define TX_BUFFS_AVAIL(bp) \ + (((bp)->tx_tail <= (bp)->tx_head) ? 
\ +(bp)->tx_tail + (bp)->tx_pending - (bp)->tx_head : \ +(bp)->tx_tail - (bp)->tx_head - TX_RING_GAP(bp)) +#define NEXT_TX(n) (((n) + 1) & (TX_RING_SIZE - 1)) + +#define NEXT_RX(n) (((n) + 1) & (RX_RING_SIZE - 1)) + +/* minimum number of free TX descriptors before waking up TX process */ +#define MACB_TX_WAKEUP_THRESH (TX_RING_SIZE / 4) + +#define MACB_RX_INT_FLAGS (MACB_BIT(RCOMP) | MACB_BIT(RXUBR) \ +| MACB_BIT(ISR_ROVR)) + +static void __macb_set_hwaddr(struct macb *bp) +{ + u32 bottom; + u16 top; + + bottom = cpu_to_le32(*((u32 *)bp->dev->dev_addr)); + macb_writel(bp, SA1B, bottom); + top = cpu_to_le16(*((u16 *)(bp->dev->dev_addr + 4))); + macb_writel(bp, SA1T, top); +} + +static void __init macb_get_hwaddr(struct macb *bp) +{ + u32 bottom; + u16 top; + u8 addr[6]; + + bottom = macb_readl(bp, SA1B); + top = macb_readl(bp, SA1T); +
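The TX ring accounting in the excerpt above can be exercised outside the kernel. The sketch copies TX_RING_SIZE, DEF_TX_RING_PENDING, TX_RING_GAP, TX_BUFFS_AVAIL and NEXT_TX from the patch; struct ring_state, ring_push() and ring_pop() are hypothetical helpers standing in for the driver's transmit and completion paths.

```c
#include <assert.h>

/* Copied from the macb.c excerpt. */
#define TX_RING_SIZE		128
#define DEF_TX_RING_PENDING	(TX_RING_SIZE - 1)
#define TX_RING_GAP(bp) \
	(TX_RING_SIZE - (bp)->tx_pending)
#define TX_BUFFS_AVAIL(bp) \
	(((bp)->tx_tail <= (bp)->tx_head) ? \
	 (bp)->tx_tail + (bp)->tx_pending - (bp)->tx_head : \
	 (bp)->tx_tail - (bp)->tx_head - TX_RING_GAP(bp))
#define NEXT_TX(n)		(((n) + 1) & (TX_RING_SIZE - 1))

/* The & (TX_RING_SIZE - 1) wrap only works for power-of-two sizes. */
_Static_assert((TX_RING_SIZE & (TX_RING_SIZE - 1)) == 0,
	       "TX_RING_SIZE must be a power of two");

/* Hypothetical cut-down stand-in for struct macb. */
struct ring_state {
	unsigned int tx_head;		/* next descriptor to fill */
	unsigned int tx_tail;		/* oldest descriptor not yet reclaimed */
	unsigned int tx_pending;	/* usable ring entries */
};

/* Queue one frame: advance head, as a transmit path would. */
static void ring_push(struct ring_state *bp)
{
	assert(TX_BUFFS_AVAIL(bp) > 0);
	bp->tx_head = NEXT_TX(bp->tx_head);
}

/* Reclaim one completed frame: advance tail. */
static void ring_pop(struct ring_state *bp)
{
	assert(bp->tx_tail != bp->tx_head);
	bp->tx_tail = NEXT_TX(bp->tx_tail);
}
```

With tx_pending at its default of TX_RING_SIZE - 1, one descriptor always stays unused, so tx_head == tx_tail unambiguously means an empty ring. After 127 pushes TX_BUFFS_AVAIL() reaches 0; popping two and pushing one wraps tx_head back to 0 and leaves one free slot.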
tg3_close question
Hi,

Instead of tg3_netif_stop(), tg3_close() uses netif_stop_queue() to stop xmit. This doesn't seem right to me. For example, another CPU in tg3_tx() could call netif_wake_queue() just after tg3_close() did netif_stop_queue(). Isn't this a bug?

Thanks,
-- 
Eric
Re: [PATCH 2/6] [NET]: Rethink mark field in struct flowi
* Eric Dumazet <[EMAIL PROTECTED]> 2006-11-09 14:23
> I give a big NACK to this patch.
>
> By moving fwmark outside of union, you basically touch more cache lines in
> lookups. I have many machines doing XX.XXX of lookups per second, with long
> chains, already using 10% of CPU. I am sure a lot of other machines would
> suffer with this patch, especially machines with 32 bytes cache lines.
>
> For IPV4 lookups, compare offset of fwmark before your patch and after.
> The size of ip6_u is so large that moving fwmark after nl_u union is not an
> option. Many packets in flight on the Internet are still IPV4.

Would you be happy if mark were moved in front of the union, after iif?
Re: [PATCH 2/6] [NET]: Rethink mark field in struct flowi
On Thursday 09 November 2006 12:27, Thomas Graf wrote:
> Now that all protocols have been made aware of the mark
> field it can be moved out of the union thus simplyfing
> its usage.
>
> The config options in the IPv4/IPv6/DECnet subsystems
> to enable respectively disable mark based routing only
> obfuscate the code with ifdefs, the cost for the
> additional comparison in the flow key is insignificant,
> and most distributions have all these options enabled
> by default anyway. Therefore it makes sense to remove
> the config options and enable mark based routing by
> default.

I give a big NACK to this patch.

By moving fwmark outside of union, you basically touch more cache lines in lookups. I have many machines doing XX.XXX of lookups per second, with long chains, already using 10% of CPU. I am sure a lot of other machines would suffer with this patch, especially machines with 32 bytes cache lines.

For IPV4 lookups, compare offset of fwmark before your patch and after. The size of ip6_u is so large that moving fwmark after nl_u union is not an option. Many packets in flight on the Internet are still IPV4.

If you think the code is obfuscated, you can make it more readable using macros defined in include files and used in C files without ifdefs.

Thank you
Eric
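Eric's point about offsets can be illustrated with offsetof(). The structs below are heavily simplified, hypothetical stand-ins for struct flowi: only oif, iif and the nl_u union are kept, and struct in6_addr is replaced by four 32-bit words. Real offsets depend on the full definition in include/net/flow.h, where the posted patch places mark after further fields, so the real post-patch offset is larger than in this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified keys before the patch: each protocol carries its own fwmark. */
struct ip4_key_old { uint32_t daddr, saddr, fwmark; uint8_t tos, scope; };
struct ip6_key_old { uint32_t daddr[4], saddr[4], fwmark, flowlabel; };
union nl_key_old { struct ip4_key_old ip4_u; struct ip6_key_old ip6_u; };

/* The same keys after the patch: fwmark removed from the union. */
struct ip4_key_new { uint32_t daddr, saddr; uint8_t tos, scope; };
struct ip6_key_new { uint32_t daddr[4], saddr[4], flowlabel; };
union nl_key_new { struct ip4_key_new ip4_u; struct ip6_key_new ip6_u; };

/* Before: mark lives inside the union. */
struct flowi_before { int oif, iif; union nl_key_old nl_u; };
/* Posted patch (simplified): mark follows the whole union. */
struct flowi_after { int oif, iif; union nl_key_new nl_u; uint32_t mark; };
/* Suggested alternative: mark right after iif. */
struct flowi_front { int oif, iif; uint32_t mark; union nl_key_new nl_u; };

/* Offset of the IPv4 fwmark inside struct flowi_before. */
static size_t ipv4_fwmark_offset_before(void)
{
	return offsetof(struct flowi_before, nl_u) +
	       offsetof(union nl_key_old, ip4_u) +
	       offsetof(struct ip4_key_old, fwmark);
}

/* Which cache line (0-based) a byte offset falls into. */
static size_t cache_line_of(size_t offset, size_t line_size)
{
	return offset / line_size;
}
```

In this model the IPv4 fwmark sits at offset 16 before the patch, the mark sits at offset 44 after it, and at offset 8 when placed right after iif (assuming a 4-byte int). With 32-byte cache lines, offset 44 falls in the second line, so an IPv4 lookup that compares the mark touches one more line, which is the cost Eric describes; the iif placement keeps it in the first line.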
Re: Turn nfmark into generic mark
> The mark is already a bitfield, you may dividide it into separate
> marks with the exception of routes which do not yet support a
> mask.

Just checked: now that we have --and-mask and --or-mask, this is much better than before.

The bitmask is OK when up to 32 marks are needed (like, for classification). But a common setup is NAT+QoS, which first hides the source IP and then has to do QoS, and the mark is the only usable carrier of this information. So the mark value needs to carry both classification info and IP address info, and here things become very limited. Though using, say, 8 bits for the host should usually be enough...

Maybe just add the original src and/or dst for carrying this information through SNAT/DNAT? Or is that too much bloat to carry around?

-- 
Meelis Roos ([EMAIL PROTECTED])
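A sketch of dividing the 32-bit mark into independent fields, as suggested in the thread, under a hypothetical layout for the NAT+QoS case: bits 0-7 carry the QoS class and bits 8-15 carry the last octet of the original source address. The macro and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout within the 32-bit packet mark. */
#define MARK_CLASS_SHIFT	0
#define MARK_CLASS_MASK		0x000000FFu
#define MARK_HOST_SHIFT		8
#define MARK_HOST_MASK		0x0000FF00u

/* Replace one field without touching the others: clear the field's
 * bits with the inverted mask, then OR in the shifted value. */
static uint32_t mark_set_field(uint32_t mark, uint32_t mask,
			       unsigned int shift, uint32_t value)
{
	return (mark & ~mask) | ((value << shift) & mask);
}

/* Extract one field from the mark. */
static uint32_t mark_get_field(uint32_t mark, uint32_t mask,
			       unsigned int shift)
{
	return (mark & mask) >> shift;
}
```

Eight bits for the host limits this layout to 256 distinct source addresses, which is the limitation described above; a wider host field leaves fewer bits for classification.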
Re: [PATCHSET] packet mark & fib rules work
* Steven Whitehouse <[EMAIL PROTECTED]> 2006-11-09 11:46
> On Thu, Nov 09, 2006 at 12:27:35PM +0100, Thomas Graf wrote:
> > Renames nfmark to mark and remove the dependency on netfilter
> > to ease usage by all subsystems. Also removes all the unneeded
> > config options to enable routing by fwmark, it can be safely
> > enabled by default.
> >
> > Moves mark selector code from per protocol part into the generic
> > part and adds support for inverting selectors.
> >
>
> Acked-by: Steven Whitehouse <[EMAIL PROTECTED]>
> so far as all the DECnet bits go. One question though... will you
> be adding later (as your slide #5 and #11 from your netconf presentation
> appear to imply) a way to set the mark from the routing table (presumably
> included in the nexthop info) ?

So far I haven't planned this. Slide #11 describes that if I add an address with a given mark, the corresponding route will only apply to packets with a matching mark. Slide #5 shows the idea of an ingress classifier/action setting the mark field based on iif.

I focus on selecting routes based on marks, not the other way around, but it's certainly an interesting idea if you can elaborate on it further.
Re: Turn nfmark into generic mark
* Meelis Roos <[EMAIL PROTECTED]> 2006-11-09 14:32
> Another thought: sometimes a single mark makes rulesets inconvenient.
> What about several independent marks on a packet?

The mark is already a bitfield; you may divide it into separate marks, with the exception of routes, which do not yet support a mask.
Re: Turn nfmark into generic mark
Another thought: sometimes a single mark makes rulesets inconvenient. What about several independent marks on a packet?

-- 
Meelis Roos <[EMAIL PROTECTED]>
Re: Intel 82559 NIC corrupted EEPROM
Auke Kok wrote:
> This is what I was afraid of: even though the code allows you to bypass
> the EEPROM checksum, the probe fails on a further check to see if the MAC
> address is valid. Since something with this NIC specifically made the
> EEPROM return all 0xff's, the MAC address is automatically invalid, and
> thus probe fails.

I don't understand why you think there is something wrong with a specific NIC?

In 2.6.14.7, e100.ko fails to read the EEPROM on :00:08.0 (eth0)
In 2.6.18.1, e100.ko fails to read the EEPROM on :00:09.0 (eth1)

In both kernels, eepro100.ko successfully reads all the EEPROMs.

> It seems that the driver has more problems with this NIC than just the
> eeprom checksum being bad. Needless to say this might need fixing.
>
> Can you load the eepro driver and send me the full eeprom dump? Perhaps I
> can duplicate things over here.

00:08.0 EEPROM contents, size 64x16
3000 0464 e4e6 0e03 0201 4701 7213 8310 40a2 0001 8086 0128 92f7

00:09.0 EEPROM contents, size 64x16
3000 0464 e5e6 0e03 0201 4701 7213 8310 40a2 0001 8086 0128 91f7

00:0a.0 EEPROM contents, size 64x16
3000 0464 e6e6 0e03 0201 4701 7213 8310 40a2 0001 8086 0128 90f7
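The dumps above can be checked against the rule e100 applies: the 16-bit sum of all EEPROM words, including the checksum stored in the last word, must equal 0xBABA. The archived dump lists only 13 of the 64 words; this sketch assumes the missing words are zero, which leaves a plain sum unchanged. eeprom_sum() and dump_0008 are hypothetical names, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expected 16-bit sum of all EEPROM words on 8255x NICs, as checked by e100. */
#define E100_EEPROM_SUM	0xBABAu

/* Sum 16-bit words, wrapping modulo 2^16. */
static uint16_t eeprom_sum(const uint16_t *words, size_t count)
{
	uint16_t sum = 0;

	for (size_t i = 0; i < count; i++)
		sum = (uint16_t)(sum + words[i]);
	return sum;
}

/* Non-blank words of the 00:08.0 dump; the remaining words of the
 * 64-word EEPROM are assumed to be zero. */
static const uint16_t dump_0008[] = {
	0x3000, 0x0464, 0xe4e6, 0x0e03, 0x0201, 0x4701, 0x7213,
	0x8310, 0x40a2, 0x0001, 0x8086, 0x0128, 0x92f7,
};
```

Under that assumption each of the three dumps sums to exactly 0xBABA (between NICs, the third word and the checksum word change by the same amount in opposite directions). That is consistent with the stored EEPROM contents being intact and the failure lying in how e100 reads them.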
Re: [PATCHSET] packet mark & fib rules work
Hi,

On Thu, Nov 09, 2006 at 12:27:35PM +0100, Thomas Graf wrote:
> Renames nfmark to mark and remove the dependency on netfilter
> to ease usage by all subsystems. Also removes all the unneeded
> config options to enable routing by fwmark, it can be safely
> enabled by default.
>
> Moves mark selector code from per protocol part into the generic
> part and adds support for inverting selectors.
>

Acked-by: Steven Whitehouse <[EMAIL PROTECTED]>
so far as all the DECnet bits go. One question though... will you be adding later (as your slide #5 and #11 from your netconf presentation appear to imply) a way to set the mark from the routing table (presumably included in the nexthop info)?

Steve.
[IPROUTE2] Add support for inverted selectors
Index: iproute2.git/include/linux/fib_rules.h === --- /dev/null 1970-01-01 00:00:00.0 + +++ iproute2.git/include/linux/fib_rules.h 2006-11-09 11:48:07.0 +0100 @@ -0,0 +1,66 @@ +#ifndef __LINUX_FIB_RULES_H +#define __LINUX_FIB_RULES_H + +#include +#include + +/* rule is permanent, and cannot be deleted */ +#define FIB_RULE_PERMANENT 1 +#define FIB_RULE_INVERT2 + +struct fib_rule_hdr +{ + __u8family; + __u8dst_len; + __u8src_len; + __u8tos; + + __u8table; + __u8res1; /* reserved */ + __u8res2; /* reserved */ + __u8action; + + __u32 flags; +}; + +enum +{ + FRA_UNSPEC, + FRA_DST,/* destination address */ + FRA_SRC,/* source address */ + FRA_IFNAME, /* interface name */ + FRA_UNUSED1, + FRA_UNUSED2, + FRA_PRIORITY, /* priority/preference */ + FRA_UNUSED3, + FRA_UNUSED4, + FRA_UNUSED5, + FRA_FWMARK, /* mark */ + FRA_FLOW, /* flow/class id */ + FRA_UNUSED6, + FRA_UNUSED7, + FRA_UNUSED8, + FRA_TABLE, /* Extended table id */ + FRA_FWMASK, /* mask for netfilter mark */ + __FRA_MAX +}; + +#define FRA_MAX (__FRA_MAX - 1) + +enum +{ + FR_ACT_UNSPEC, + FR_ACT_TO_TBL, /* Pass to fixed table */ + FR_ACT_RES1, + FR_ACT_RES2, + FR_ACT_RES3, + FR_ACT_RES4, + FR_ACT_BLACKHOLE, /* Drop without notification */ + FR_ACT_UNREACHABLE, /* Drop with ENETUNREACH */ + FR_ACT_PROHIBIT,/* Drop with EACCES */ + __FR_ACT_MAX, +}; + +#define FR_ACT_MAX (__FR_ACT_MAX - 1) + +#endif Index: iproute2.git/ip/iprule.c === --- iproute2.git.orig/ip/iprule.c 2006-11-09 11:46:20.0 +0100 +++ iproute2.git/ip/iprule.c2006-11-09 11:51:35.0 +0100 @@ -24,6 +24,7 @@ #include #include #include +#include #include "rt_names.h" #include "utils.h" @@ -36,7 +37,7 @@ static void usage(void) { fprintf(stderr, "Usage: ip rule [ list | add | del | flush ] SELECTOR ACTION\n"); - fprintf(stderr, "SELECTOR := [ from PREFIX ] [ to PREFIX ] [ tos TOS ] [ fwmark FWMARK ]\n"); + fprintf(stderr, "SELECTOR := [ not ] [ from PREFIX ] [ to PREFIX ] [ tos TOS ] [ fwmark FWMARK ]\n"); fprintf(stderr, "[ dev STRING ] [ pref NUMBER ]\n"); 
fprintf(stderr, "ACTION := [ table TABLE_ID ]\n"); fprintf(stderr, " [ prohibit | reject | unreachable ]\n"); @@ -80,6 +81,9 @@ else fprintf(fp, "0:\t"); + if (r->rtm_flags & FIB_RULE_INVERT) + fprintf(fp, "not "); + if (tb[RTA_SRC]) { if (r->rtm_src_len != host_len) { fprintf(fp, "from %s/%u ", rt_addr_n2a(r->rtm_family, @@ -209,6 +213,7 @@ req.r.rtm_scope = RT_SCOPE_UNIVERSE; req.r.rtm_table = 0; req.r.rtm_type = RTN_UNSPEC; + req.r.rtm_flags = 0; if (cmd == RTM_NEWRULE) { req.n.nlmsg_flags |= NLM_F_CREATE|NLM_F_EXCL; @@ -216,7 +221,9 @@ } while (argc > 0) { - if (strcmp(*argv, "from") == 0) { + if (strcmp(*argv, "not") == 0) { + req.r.rtm_flags |= FIB_RULE_INVERT; + } else if (strcmp(*argv, "from") == 0) { inet_prefix dst; NEXT_ARG(); get_prefix(&dst, *argv, req.r.rtm_family); - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/6] [NET] rules: Add support to invert selectors
Introduces a new flag FIB_RULE_INVERT causing rules to apply if the specified selector doesn't match.

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Index: net-2.6.20/include/linux/fib_rules.h
===================================================================
--- net-2.6.20.orig/include/linux/fib_rules.h	2006-11-08 23:32:35.0 +0100
+++ net-2.6.20/include/linux/fib_rules.h	2006-11-08 23:34:13.0 +0100
@@ -6,6 +6,7 @@
 
 /* rule is permanent, and cannot be deleted */
 #define FIB_RULE_PERMANENT	1
+#define FIB_RULE_INVERT		2
 
 struct fib_rule_hdr
 {
Index: net-2.6.20/net/core/fib_rules.c
===================================================================
--- net-2.6.20.orig/net/core/fib_rules.c	2006-11-08 23:32:35.0 +0100
+++ net-2.6.20/net/core/fib_rules.c	2006-11-08 23:34:51.0 +0100
@@ -107,6 +107,22 @@
 
 EXPORT_SYMBOL_GPL(fib_rules_unregister);
 
+static int fib_rule_match(struct fib_rule *rule, struct fib_rules_ops *ops,
+			  struct flowi *fl, int flags)
+{
+	int ret = 0;
+
+	if (rule->ifindex && (rule->ifindex != fl->iif))
+		goto out;
+
+	if ((rule->mark ^ fl->mark) & rule->mark_mask)
+		goto out;
+
+	ret = ops->match(rule, fl, flags);
+out:
+	return (rule->flags & FIB_RULE_INVERT) ? !ret : ret;
+}
+
 int fib_rules_lookup(struct fib_rules_ops *ops, struct flowi *fl,
 		     int flags, struct fib_lookup_arg *arg)
 {
@@ -116,13 +132,7 @@
 	rcu_read_lock();
 
 	list_for_each_entry_rcu(rule, ops->rules_list, list) {
-		if (rule->ifindex && (rule->ifindex != fl->iif))
-			continue;
-
-		if ((rule->mark ^ fl->mark) & rule->mark_mask)
-			continue;
-
-		if (!ops->match(rule, fl, flags))
+		if (!fib_rule_match(rule, ops, fl, flags))
 			continue;
 
 		err = ops->action(rule, fl, flags, arg);
--
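A userspace model of the new fib_rule_match() helper makes it easy to see exactly what FIB_RULE_INVERT negates. struct rule_model and struct flow_model are hypothetical, cut-down stand-ins for struct fib_rule and struct flowi, and the proto_match field replaces the protocol-specific ops->match() call.

```c
#include <assert.h>
#include <stdint.h>

#define FIB_RULE_INVERT	2	/* value from the patch */

/* Hypothetical cut-down rule; proto_match stands in for ops->match(). */
struct rule_model {
	int ifindex;
	uint32_t mark, mark_mask;
	uint32_t flags;
	int proto_match;
};

/* Hypothetical cut-down flow key. */
struct flow_model {
	int iif;
	uint32_t mark;
};

static int rule_match(const struct rule_model *rule,
		      const struct flow_model *fl)
{
	int ret = 0;

	if (rule->ifindex && rule->ifindex != fl->iif)
		goto out;
	if ((rule->mark ^ fl->mark) & rule->mark_mask)
		goto out;
	ret = rule->proto_match;
out:
	/* Inversion applies to the combined result of all selectors. */
	return (rule->flags & FIB_RULE_INVERT) ? !ret : ret;
}
```

Because the inversion happens at the out: label, it negates the combined result of the iif, mark and protocol selectors, not each selector individually. With the iproute2 patch in this thread, such a rule would presumably be requested with the "not" keyword, for example: ip rule add not fwmark 0x10 table 100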
[PATCH 5/6] [NET] rules: Share common attribute validation policy
Move the attribute policy for the non-specific attributes into net/fib_rules.h and include it in the respective protocols. Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Index: net-2.6.20/include/net/fib_rules.h === --- net-2.6.20.orig/include/net/fib_rules.h 2006-11-08 23:32:35.0 +0100 +++ net-2.6.20/include/net/fib_rules.h 2006-11-08 23:33:21.0 +0100 @@ -59,6 +59,13 @@ struct module *owner; }; +#define FRA_GENERIC_POLICY \ + [FRA_IFNAME]= { .type = NLA_STRING, .len = IFNAMSIZ - 1 }, \ + [FRA_PRIORITY] = { .type = NLA_U32 }, \ + [FRA_FWMARK]= { .type = NLA_U32 }, \ + [FRA_FWMASK]= { .type = NLA_U32 }, \ + [FRA_TABLE] = { .type = NLA_U32 } + static inline void fib_rule_get(struct fib_rule *rule) { atomic_inc(&rule->refcnt); Index: net-2.6.20/net/decnet/dn_rules.c === --- net-2.6.20.orig/net/decnet/dn_rules.c 2006-11-08 23:32:35.0 +0100 +++ net-2.6.20/net/decnet/dn_rules.c2006-11-08 23:33:21.0 +0100 @@ -108,13 +108,9 @@ } static struct nla_policy dn_fib_rule_policy[FRA_MAX+1] __read_mostly = { - [FRA_IFNAME]= { .type = NLA_STRING, .len = IFNAMSIZ - 1 }, - [FRA_PRIORITY] = { .type = NLA_U32 }, + FRA_GENERIC_POLICY, [FRA_SRC] = { .type = NLA_U16 }, [FRA_DST] = { .type = NLA_U16 }, - [FRA_FWMARK]= { .type = NLA_U32 }, - [FRA_FWMASK]= { .type = NLA_U32 }, - [FRA_TABLE] = { .type = NLA_U32 }, }; static int dn_fib_rule_match(struct fib_rule *rule, struct flowi *fl, int flags) Index: net-2.6.20/net/ipv4/fib_rules.c === --- net-2.6.20.orig/net/ipv4/fib_rules.c2006-11-08 23:32:35.0 +0100 +++ net-2.6.20/net/ipv4/fib_rules.c 2006-11-08 23:33:21.0 +0100 @@ -170,14 +170,10 @@ } static struct nla_policy fib4_rule_policy[FRA_MAX+1] __read_mostly = { - [FRA_IFNAME]= { .type = NLA_STRING, .len = IFNAMSIZ - 1 }, - [FRA_PRIORITY] = { .type = NLA_U32 }, + FRA_GENERIC_POLICY, [FRA_SRC] = { .type = NLA_U32 }, [FRA_DST] = { .type = NLA_U32 }, - [FRA_FWMARK]= { .type = NLA_U32 }, - [FRA_FWMASK]= { .type = NLA_U32 }, [FRA_FLOW] = { .type = NLA_U32 }, - [FRA_TABLE] = { .type = NLA_U32 }, }; 
static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb, Index: net-2.6.20/net/ipv6/fib6_rules.c === --- net-2.6.20.orig/net/ipv6/fib6_rules.c 2006-11-08 23:32:35.0 +0100 +++ net-2.6.20/net/ipv6/fib6_rules.c2006-11-08 23:33:21.0 +0100 @@ -130,13 +130,9 @@ } static struct nla_policy fib6_rule_policy[FRA_MAX+1] __read_mostly = { - [FRA_IFNAME]= { .type = NLA_STRING, .len = IFNAMSIZ - 1 }, - [FRA_PRIORITY] = { .type = NLA_U32 }, + FRA_GENERIC_POLICY, [FRA_SRC] = { .len = sizeof(struct in6_addr) }, [FRA_DST] = { .len = sizeof(struct in6_addr) }, - [FRA_FWMARK]= { .type = NLA_U32 }, - [FRA_FWMASK]= { .type = NLA_U32 }, - [FRA_TABLE] = { .type = NLA_U32 }, }; static int fib6_rule_configure(struct fib_rule *rule, struct sk_buff *skb, -- - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
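The FRA_GENERIC_POLICY change relies on a macro that expands to C99 designated initializers shared by several tables. The sketch imitates it in userspace: struct policy and the T_* constants are hypothetical stand-ins for struct nla_policy and the NLA_* types, while the FRA_* values follow the fib_rules.h header quoted in the iproute2 patch.

```c
#include <assert.h>

#define IFNAMSIZ 16

/* Attribute numbers as in the fib_rules.h header (unused slots skipped). */
enum {
	FRA_UNSPEC, FRA_DST, FRA_SRC, FRA_IFNAME,
	FRA_PRIORITY = 6, FRA_FWMARK = 10, FRA_FLOW,
	FRA_TABLE = 15, FRA_FWMASK, __FRA_MAX
};
#define FRA_MAX (__FRA_MAX - 1)

/* Hypothetical stand-ins for the NLA_* types and struct nla_policy. */
enum { T_UNSPEC, T_U16, T_U32, T_STRING };
struct policy { int type; int len; };

/* Shared entries, expanded inside each protocol's table. */
#define GENERIC_POLICY \
	[FRA_IFNAME]   = { .type = T_STRING, .len = IFNAMSIZ - 1 }, \
	[FRA_PRIORITY] = { .type = T_U32 }, \
	[FRA_FWMARK]   = { .type = T_U32 }, \
	[FRA_FWMASK]   = { .type = T_U32 }, \
	[FRA_TABLE]    = { .type = T_U32 }

static const struct policy ipv4_policy[FRA_MAX + 1] = {
	GENERIC_POLICY,
	[FRA_SRC]  = { .type = T_U32 },
	[FRA_DST]  = { .type = T_U32 },
	[FRA_FLOW] = { .type = T_U32 },
};

static const struct policy decnet_policy[FRA_MAX + 1] = {
	GENERIC_POLICY,
	[FRA_SRC] = { .type = T_U16 },
	[FRA_DST] = { .type = T_U16 },
};
```

Entries not named in a table are zero-initialized, so DECnet gets no policy for FRA_FLOW. If a protocol table named one of the generic attributes again after the macro, the later designator would win, which C allows but GCC can warn about with -Woverride-init.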
[PATCHSET] packet mark & fib rules work
Renames nfmark to mark and removes the dependency on netfilter to ease usage by all subsystems. Also removes all the unneeded config options to enable routing by fwmark; it can be safely enabled by default.

Moves mark selector code from the per-protocol part into the generic part and adds support for inverting selectors.
[PATCH 4/6] [NET] rules: Protocol independent mark selector
Move mark selector currently implemented per protocol into the protocol independant part. Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Index: net-2.6.20/include/net/fib_rules.h === --- net-2.6.20.orig/include/net/fib_rules.h 2006-11-08 15:29:26.0 +0100 +++ net-2.6.20/include/net/fib_rules.h 2006-11-08 23:32:35.0 +0100 @@ -13,6 +13,8 @@ atomic_trefcnt; int ifindex; charifname[IFNAMSIZ]; + u32 mark; + u32 mark_mask; u32 pref; u32 flags; u32 table; Index: net-2.6.20/net/core/fib_rules.c === --- net-2.6.20.orig/net/core/fib_rules.c2006-11-08 15:29:26.0 +0100 +++ net-2.6.20/net/core/fib_rules.c 2006-11-08 23:32:35.0 +0100 @@ -119,6 +119,9 @@ if (rule->ifindex && (rule->ifindex != fl->iif)) continue; + if ((rule->mark ^ fl->mark) & rule->mark_mask) + continue; + if (!ops->match(rule, fl, flags)) continue; @@ -179,6 +182,18 @@ rule->ifindex = dev->ifindex; } + if (tb[FRA_FWMARK]) { + rule->mark = nla_get_u32(tb[FRA_FWMARK]); + if (rule->mark) + /* compatibility: if the mark value is non-zero all bits +* are compared unless a mask is explicitly specified. 
+*/ + rule->mark_mask = 0x; + } + + if (tb[FRA_FWMASK]) + rule->mark_mask = nla_get_u32(tb[FRA_FWMASK]); + rule->action = frh->action; rule->flags = frh->flags; rule->table = frh_get_table(frh, tb); @@ -250,6 +265,14 @@ nla_strcmp(tb[FRA_IFNAME], rule->ifname)) continue; + if (tb[FRA_FWMARK] && + (rule->mark != nla_get_u32(tb[FRA_FWMARK]))) + continue; + + if (tb[FRA_FWMASK] && + (rule->mark_mask != nla_get_u32(tb[FRA_FWMASK]))) + continue; + if (!ops->compare(rule, frh, tb)) continue; @@ -298,6 +321,12 @@ if (rule->pref) NLA_PUT_U32(skb, FRA_PRIORITY, rule->pref); + if (rule->mark) + NLA_PUT_U32(skb, FRA_FWMARK, rule->mark); + + if (rule->mark_mask || rule->mark) + NLA_PUT_U32(skb, FRA_FWMASK, rule->mark_mask); + if (ops->fill(rule, skb, nlh, frh) < 0) goto nla_put_failure; Index: net-2.6.20/net/decnet/dn_rules.c === --- net-2.6.20.orig/net/decnet/dn_rules.c 2006-11-08 16:12:32.0 +0100 +++ net-2.6.20/net/decnet/dn_rules.c2006-11-08 23:32:35.0 +0100 @@ -45,8 +45,6 @@ __le16 dstmask; __le16 srcmap; u8 flags; - u32 fwmark; - u32 fwmask; }; static struct dn_fib_rule default_rule = { @@ -129,9 +127,6 @@ ((daddr ^ r->dst) & r->dstmask)) return 0; - if ((r->fwmark ^ fl->mark) & r->fwmask) - return 0; - return 1; } @@ -165,18 +160,6 @@ if (tb[FRA_DST]) r->dst = nla_get_u16(tb[FRA_DST]); - if (tb[FRA_FWMARK]) { - r->fwmark = nla_get_u32(tb[FRA_FWMARK]); - if (r->fwmark) - /* compatibility: if the mark value is non-zero all bits -* are compared unless a mask is explicitly specified. 
-*/ - r->fwmask = 0x; - } - - if (tb[FRA_FWMASK]) - r->fwmask = nla_get_u32(tb[FRA_FWMASK]); - r->src_len = frh->src_len; r->srcmask = dnet_make_mask(r->src_len); r->dst_len = frh->dst_len; @@ -197,12 +180,6 @@ if (frh->dst_len && (r->dst_len != frh->dst_len)) return 0; - if (tb[FRA_FWMARK] && (r->fwmark != nla_get_u32(tb[FRA_FWMARK]))) - return 0; - - if (tb[FRA_FWMASK] && (r->fwmask != nla_get_u32(tb[FRA_FWMASK]))) - return 0; - if (tb[FRA_SRC] && (r->src != nla_get_u16(tb[FRA_SRC]))) return 0; @@ -240,10 +217,6 @@ frh->src_len = r->src_len; frh->tos = 0; - if (r->fwmark) - NLA_PUT_U32(skb, FRA_FWMARK, r->fwmark); - if (r->fwmask || r->fwmark) - NLA_PUT_U32(skb, FRA_FWMASK, r->fwmask); if (r->dst_len) NLA_PUT_U16(skb, FRA_DST, r
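A sketch of the compatibility rule this patch moves into net/core/fib_rules.c: a non-zero FRA_FWMARK without FRA_FWMASK compares all 32 bits, and an explicit FRA_FWMASK always wins. The all-ones constant appears truncated to "0x" in this archive; 0xFFFFFFFF is assumed from the comment "all bits are compared". configure_mark(), mark_matches() and struct mark_sel are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical selector; has_* flags below stand in for
 * tb[FRA_FWMARK] / tb[FRA_FWMASK] being present. */
struct mark_sel { uint32_t mark, mark_mask; };

static struct mark_sel configure_mark(bool has_mark, uint32_t mark,
				      bool has_mask, uint32_t mask)
{
	struct mark_sel sel = { 0, 0 };

	if (has_mark) {
		sel.mark = mark;
		/* compatibility: a non-zero mark with no explicit mask
		 * compares all 32 bits (constant assumed, see above) */
		if (sel.mark)
			sel.mark_mask = 0xFFFFFFFFu;
	}
	if (has_mask)
		sel.mark_mask = mask;
	return sel;
}

/* Same test as the lookup loop: match when the masked bits agree. */
static bool mark_matches(struct mark_sel sel, uint32_t pkt_mark)
{
	return ((sel.mark ^ pkt_mark) & sel.mark_mask) == 0;
}
```

So fwmark 0x10 alone matches only a mark of exactly 0x10, fwmark 0x10 with mask 0xFF also matches 0x110, and a rule with neither attribute matches every mark.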
[PATCH 1/6] [NET]: Turn nfmark into generic mark
nfmark is being used in various subsystems and has become the defacto mark field for all kinds of packets. Therefore it makes sense to rename it to `mark' and remove the dependency on CONFIG_NETFILTER. Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Index: net-2.6.20/include/linux/skbuff.h === --- net-2.6.20.orig/include/linux/skbuff.h 2006-11-08 15:34:13.0 +0100 +++ net-2.6.20/include/linux/skbuff.h 2006-11-08 16:12:30.0 +0100 @@ -216,7 +216,7 @@ * @tail: Tail pointer * @end: End pointer * @destructor: Destruct function - * @nfmark: Can be used for communication between hooks + * @mark: Generic packet mark * @nfct: Associated connection, if any * @ipvs_property: skbuff is owned by ipvs * @nfctinfo: Relationship of this skb to the connection @@ -295,7 +295,6 @@ #ifdef CONFIG_BRIDGE_NETFILTER struct nf_bridge_info *nf_bridge; #endif - __u32 nfmark; #endif /* CONFIG_NETFILTER */ #ifdef CONFIG_NET_SCHED __u16 tc_index; /* traffic control index */ @@ -310,6 +309,7 @@ __u32 secmark; #endif + __u32 mark; /* These elements must be at the end, see alloc_skb() for details. 
*/ unsigned inttruesize; Index: net-2.6.20/net/core/skbuff.c === --- net-2.6.20.orig/net/core/skbuff.c 2006-11-08 15:34:13.0 +0100 +++ net-2.6.20/net/core/skbuff.c2006-11-08 16:12:30.0 +0100 @@ -473,8 +473,8 @@ #endif C(protocol); n->destructor = NULL; + C(mark); #ifdef CONFIG_NETFILTER - C(nfmark); C(nfct); nf_conntrack_get(skb->nfct); C(nfctinfo); @@ -534,8 +534,8 @@ new->pkt_type = old->pkt_type; new->tstamp = old->tstamp; new->destructor = NULL; + new->mark = old->mark; #ifdef CONFIG_NETFILTER - new->nfmark = old->nfmark; new->nfct = old->nfct; nf_conntrack_get(old->nfct); new->nfctinfo = old->nfctinfo; Index: net-2.6.20/net/ipv4/netfilter/iptable_mangle.c === --- net-2.6.20.orig/net/ipv4/netfilter/iptable_mangle.c 2006-11-08 15:34:13.0 +0100 +++ net-2.6.20/net/ipv4/netfilter/iptable_mangle.c 2006-11-08 16:12:30.0 +0100 @@ -132,7 +132,7 @@ unsigned int ret; u_int8_t tos; __be32 saddr, daddr; - unsigned long nfmark; + u_int32_t mark; /* root is playing with raw sockets. */ if ((*pskb)->len < sizeof(struct iphdr) @@ -143,7 +143,7 @@ } /* Save things which could affect route */ - nfmark = (*pskb)->nfmark; + mark = (*pskb)->mark; saddr = (*pskb)->nh.iph->saddr; daddr = (*pskb)->nh.iph->daddr; tos = (*pskb)->nh.iph->tos; @@ -154,7 +154,7 @@ && ((*pskb)->nh.iph->saddr != saddr || (*pskb)->nh.iph->daddr != daddr #ifdef CONFIG_IP_ROUTE_FWMARK - || (*pskb)->nfmark != nfmark + || (*pskb)->mark != mark #endif || (*pskb)->nh.iph->tos != tos)) if (ip_route_me_harder(pskb, RTN_UNSPEC)) Index: net-2.6.20/net/bridge/netfilter/ebt_mark.c === --- net-2.6.20.orig/net/bridge/netfilter/ebt_mark.c 2006-11-08 15:34:13.0 +0100 +++ net-2.6.20/net/bridge/netfilter/ebt_mark.c 2006-11-08 16:12:30.0 +0100 @@ -25,13 +25,13 @@ int action = info->target & -16; if (action == MARK_SET_VALUE) - (*pskb)->nfmark = info->mark; + (*pskb)->mark = info->mark; else if (action == MARK_OR_VALUE) - (*pskb)->nfmark |= info->mark; + (*pskb)->mark |= info->mark; else if (action == MARK_AND_VALUE) - 
(*pskb)->nfmark &= info->mark; + (*pskb)->mark &= info->mark; else - (*pskb)->nfmark ^= info->mark; + (*pskb)->mark ^= info->mark; return info->target | -16; } Index: net-2.6.20/net/bridge/netfilter/ebt_mark_m.c === --- net-2.6.20.orig/net/bridge/netfilter/ebt_mark_m.c 2006-11-08 15:34:13.0 +0100 +++ net-2.6.20/net/bridge/netfilter/ebt_mark_m.c2006-11-08 16:12:30.0 +0100 @@ -19,8 +19,8 @@ struct ebt_mark_m_info *info = (struct ebt_mark_m_info *) data; if (info->bitmask & EBT_MARK_OR) - return !(!!(skb->nfmark & info->mask) ^ info->invert); - return !(((skb->nfmark & info->mask) == info->mark) ^ info->invert); + return !(!!(skb->mark & info->mask) ^ info->invert); + return !(((skb->mark & info->mask) == info->mark) ^
[PATCH 3/6] [IPv4] nl_fib_lookup: Rename fl_fwmark to fl_mark
For the sake of consistency.

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Index: net-2.6.20/include/net/ip_fib.h
===================================================================
--- net-2.6.20.orig/include/net/ip_fib.h	2006-11-08 15:34:12.0 +0100
+++ net-2.6.20/include/net/ip_fib.h	2006-11-08 16:12:34.0 +0100
@@ -115,7 +115,7 @@
 struct fib_result_nl {
 	__be32		fl_addr;   /* To be looked up*/
-	u32		fl_fwmark;
+	u32		fl_mark;
 	unsigned char	fl_tos;
 	unsigned char	fl_scope;
 	unsigned char	tb_id_in;
Index: net-2.6.20/net/ipv4/fib_frontend.c
===================================================================
--- net-2.6.20.orig/net/ipv4/fib_frontend.c	2006-11-08 16:12:32.0 +0100
+++ net-2.6.20/net/ipv4/fib_frontend.c	2006-11-08 16:12:34.0 +0100
@@ -768,7 +768,7 @@
 {
 	struct fib_result res;
-	struct flowi fl = { .mark = frn->fl_fwmark,
+	struct flowi fl = { .mark = frn->fl_mark,
 			    .nl_u = { .ip4_u = { .daddr = frn->fl_addr,
 						 .tos = frn->fl_tos,
 						 .scope = frn->fl_scope } } };
[PATCH 2/6] [NET]: Rethink mark field in struct flowi
Now that all protocols have been made aware of the mark field, it can be moved out of the union, thus simplifying its usage.

The config options in the IPv4/IPv6/DECnet subsystems to enable or disable mark-based routing only obfuscate the code with ifdefs; the cost of the additional comparison in the flow key is insignificant, and most distributions enable all of these options by default anyway. Therefore it makes sense to remove the config options and enable mark-based routing by default.

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Index: net-2.6.20/include/net/flow.h
===================================================================
--- net-2.6.20.orig/include/net/flow.h	2006-11-08 15:34:12.0 +0100
+++ net-2.6.20/include/net/flow.h	2006-11-08 16:12:32.0 +0100
@@ -18,7 +18,6 @@
 		struct {
 			__be32	daddr;
 			__be32	saddr;
-			__u32	fwmark;
 			__u8	tos;
 			__u8	scope;
 		} ip4_u;
@@ -26,28 +25,23 @@
 		struct {
 			struct in6_addr	daddr;
 			struct in6_addr	saddr;
-			__u32		fwmark;
 			__be32		flowlabel;
 		} ip6_u;
 		struct {
 			__le16	daddr;
 			__le16	saddr;
-			__u32	fwmark;
 			__u8	scope;
 		} dn_u;
 	} nl_u;
 #define fld_dst		nl_u.dn_u.daddr
 #define fld_src		nl_u.dn_u.saddr
-#define fld_fwmark	nl_u.dn_u.fwmark
 #define fld_scope	nl_u.dn_u.scope
 #define fl6_dst		nl_u.ip6_u.daddr
 #define fl6_src		nl_u.ip6_u.saddr
-#define fl6_fwmark	nl_u.ip6_u.fwmark
 #define fl6_flowlabel	nl_u.ip6_u.flowlabel
 #define fl4_dst		nl_u.ip4_u.daddr
 #define fl4_src		nl_u.ip4_u.saddr
-#define fl4_fwmark	nl_u.ip4_u.fwmark
 #define fl4_tos		nl_u.ip4_u.tos
 #define fl4_scope	nl_u.ip4_u.scope
@@ -86,6 +80,7 @@
 #ifdef CONFIG_IPV6_MIP6
 #define fl_mh_type	uli_u.mht.type
 #endif
+	__u32		mark;
 	__u32		secid;	/* used by xfrm; see secid.txt */
 } __attribute__((__aligned__(BITS_PER_LONG/8)));
Index: net-2.6.20/net/decnet/dn_route.c
===================================================================
--- net-2.6.20.orig/net/decnet/dn_route.c	2006-11-08 16:12:30.0 +0100
+++ net-2.6.20/net/decnet/dn_route.c	2006-11-08 16:12:32.0 +0100
@@ -269,9 +269,7 @@
 {
 	return ((fl1->nl_u.dn_u.daddr ^ fl2->nl_u.dn_u.daddr) |
 		(fl1->nl_u.dn_u.saddr ^ fl2->nl_u.dn_u.saddr) |
-#ifdef CONFIG_DECNET_ROUTE_FWMARK
-		(fl1->nl_u.dn_u.fwmark ^ fl2->nl_u.dn_u.fwmark) |
-#endif
+		(fl1->mark ^ fl2->mark) |
 		(fl1->nl_u.dn_u.scope ^ fl2->nl_u.dn_u.scope) |
 		(fl1->oif ^ fl2->oif) |
 		(fl1->iif ^ fl2->iif)) == 0;
@@ -882,10 +880,8 @@
 			    { .daddr = oldflp->fld_dst,
 			      .saddr = oldflp->fld_src,
 			      .scope = RT_SCOPE_UNIVERSE,
-#ifdef CONFIG_DECNET_ROUTE_FWMARK
-			      .fwmark = oldflp->fld_fwmark
-#endif
 			    } },
+			    .mark = oldflp->mark,
 			    .iif = loopback_dev.ifindex,
 			    .oif = oldflp->oif };
 	struct dn_route *rt = NULL;
@@ -903,7 +899,7 @@
 		       "dn_route_output_slow: dst=%04x src=%04x mark=%d"
 		       " iif=%d oif=%d\n", dn_ntohs(oldflp->fld_dst),
 		       dn_ntohs(oldflp->fld_src),
-		       oldflp->fld_fwmark, loopback_dev.ifindex, oldflp->oif);
+		       oldflp->mark, loopback_dev.ifindex, oldflp->oif);
 	/* If we have an output interface, verify its a DECnet device */
 	if (oldflp->oif) {
@@ -1108,9 +1104,7 @@
 	rt->fl.fld_dst    = oldflp->fld_dst;
 	rt->fl.oif        = oldflp->oif;
 	rt->fl.iif        = 0;
-#ifdef CONFIG_DECNET_ROUTE_FWMARK
-	rt->fl.fld_fwmark = oldflp->fld_fwmark;
-#endif
+	rt->fl.mark       = oldflp->mark;
 	rt->rt_saddr      = fl.fld_src;
 	rt->rt_daddr      = fl.fld_dst;
@@ -1178,9 +1172,7 @@
 	     rt = rcu_dereference(rt->u.rt_next)) {
 		if ((flp->fld_dst == rt->fl.fld_dst) &&
 		    (flp->fld_src == rt->fl.fld_src) &&
-#ifdef CONFIG_DECNET_ROUTE_FWMARK
-		    (flp->fld_fwmark == rt->fl.fld_fwmark) &&
-#e
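With `mark` outside the per-family union, a key comparison reads it the same way for every family and needs no `#ifdef`. A minimal userspace sketch of the XOR/OR comparison style used in the DECnet hunk, assuming a heavily simplified flow key: `struct flow_key` and `dn_keys_equal()` are hypothetical, and the real `struct flowi` has many more members.

```c
#include <stdint.h>

/* Hypothetical, simplified flow key; not the real struct flowi. */
struct flow_key {
	int oif;
	int iif;
	union {
		struct { uint32_t daddr, saddr; uint8_t tos, scope; } ip4_u;
		struct { uint16_t daddr, saddr; uint8_t scope; } dn_u;
	} nl_u;
	uint32_t mark;	/* shared by all families, outside the union */
};

/*
 * Branch-free equality test in the style of the DECnet hunk: XOR each
 * pair of fields, OR the differences together, and compare with zero.
 */
static int dn_keys_equal(const struct flow_key *a, const struct flow_key *b)
{
	return ((a->nl_u.dn_u.daddr ^ b->nl_u.dn_u.daddr) |
		(a->nl_u.dn_u.saddr ^ b->nl_u.dn_u.saddr) |
		(a->mark ^ b->mark) |
		(a->nl_u.dn_u.scope ^ b->nl_u.dn_u.scope) |
		(a->oif ^ b->oif) |
		(a->iif ^ b->iif)) == 0;
}
```

Any single differing field, including the mark, leaves a non-zero bit in the OR and makes the keys compare unequal.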
Re: [take24 3/6] kevent: poll/select() notifications.
On Thu, Nov 09, 2006 at 10:08:44AM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote:
> Here you test both KEVENT_SOCKET and KEVENT_PIPE
>
> > +#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE
> > +	kevent_storage_init(inode, &inode->st);
> > +#endif
> >  	}
> >  	return inode;
> >  }
> >
> >  void destroy_inode(struct inode *inode)
> >  {
>
> but here you test only KEVENT_SOCKET
>
> > +#if defined CONFIG_KEVENT_SOCKET
> > +	kevent_storage_fini(&inode->st);
> > +#endif

Indeed, it must be
#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE

> >  	BUG_ON(inode_has_buffers(inode));
> >  	security_inode_free(inode);
> >  	if (inode->i_sb->s_op->destroy_inode)
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 5baf3a1..c529723 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -276,6 +276,7 @@ #include
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  #include
> >  #include
> > @@ -586,6 +587,10 @@ #ifdef CONFIG_INOTIFY
> >  	struct mutex		inotify_mutex;	/* protects the watches list */
> >  #endif
> >
>
> Here you include a kevent_storage only if KEVENT_SOCKET
>
> > +#ifdef CONFIG_KEVENT_SOCKET
> > +	struct kevent_storage	st;
> > +#endif
> > +

It must be
#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE

--
Evgeniy Polyakov
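One way to keep the field, initialization and teardown guards from drifting apart is to derive a single symbol from both options and test only that symbol. The sketch below is a userspace illustration, not part of the patch: `HAVE_INODE_KEVENT_STORAGE`, `struct demo_inode` and the `demo_*` functions are hypothetical, and `CONFIG_KEVENT_PIPE` is defined by hand to simulate a configuration with pipe notifications but without socket notifications, which is the case the mismatched guards get wrong.

```c
/* Simulated configuration: pipe notifications on, socket notifications off. */
#define CONFIG_KEVENT_PIPE 1

/* Derive one symbol from both options so every guard tests the same thing. */
#if defined(CONFIG_KEVENT_SOCKET) || defined(CONFIG_KEVENT_PIPE)
#define HAVE_INODE_KEVENT_STORAGE 1
#else
#define HAVE_INODE_KEVENT_STORAGE 0
#endif

struct demo_storage { int live; };

struct demo_inode {
#if HAVE_INODE_KEVENT_STORAGE
	struct demo_storage st;	/* present whenever either user is configured */
#endif
	unsigned long i_state;
};

static int storages_live;	/* counts initialized-but-not-finalized storages */

static void demo_inode_init(struct demo_inode *inode)
{
#if HAVE_INODE_KEVENT_STORAGE
	inode->st.live = 1;
	storages_live++;
#endif
	inode->i_state = 0;
}

static void demo_destroy_inode(struct demo_inode *inode)
{
#if HAVE_INODE_KEVENT_STORAGE
	/* same guard as init, so teardown can never be compiled out alone */
	inode->st.live = 0;
	storages_live--;
#endif
	(void)inode;
}
```

With the original guards, this configuration would initialize the storage but never finalize it; here the counter returns to zero.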
Re: 2.6.19-rc1: Volanomark slowdown
On Wed, Nov 08, 2006 at 02:07:32PM -0800, Tim Chen wrote:
> In my testing, the CPU utilization is at 100%. So
> increase in ACKs will cost CPU to devote more
> time to process those ACKs and reduce throughput.

Oh, I see. I would test on a real network with real clients. I doubt
you would observe a noticeable effect there.

Olaf
--
Walks like a duck. Quacks like a duck. Must be a chicken.
Re: [take24 3/6] kevent: poll/select() notifications.
On Thursday 09 November 2006 09:23, Evgeniy Polyakov wrote:
> poll/select() notifications.
>
> This patch includes generic poll/select notifications.
> kevent_poll works similarly to epoll and has the same issues (callback
> is invoked not from internal state machine of the caller, but through
> process awake, a lot of allocations and so on).
>
> Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index bc35a40..0805547 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -20,6 +20,7 @@ #include
>  #include
>  #include
>  #include
> +#include
>  #include
>
>  #include
> @@ -119,6 +120,7 @@ struct file *get_empty_filp(void)
>  	f->f_uid = tsk->fsuid;
>  	f->f_gid = tsk->fsgid;
>  	eventpoll_init_file(f);
> +	kevent_init_file(f);
>  	/* f->f_version: 0 */
>  	return f;
>
> @@ -164,6 +166,7 @@ void fastcall __fput(struct file *file)
>  	 * in the file cleanup chain.
>  	 */
>  	eventpoll_release(file);
> +	kevent_cleanup_file(file);
>  	locks_remove_flock(file);
>
>  	if (file->f_op && file->f_op->release)
> diff --git a/fs/inode.c b/fs/inode.c
> index ada7643..6745c00 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -21,6 +21,7 @@ #include
>  #include
>  #include
>  #include
> +#include
>  #include
>
>  /*
> @@ -164,12 +165,18 @@ #endif
>  	}
>  	inode->i_private = 0;
>  	inode->i_mapping = mapping;

Here you test both KEVENT_SOCKET and KEVENT_PIPE

> +#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE
> +	kevent_storage_init(inode, &inode->st);
> +#endif
>  	}
>  	return inode;
>  }
>
>  void destroy_inode(struct inode *inode)
>  {

but here you test only KEVENT_SOCKET

> +#if defined CONFIG_KEVENT_SOCKET
> +	kevent_storage_fini(&inode->st);
> +#endif
>  	BUG_ON(inode_has_buffers(inode));
>  	security_inode_free(inode);
>  	if (inode->i_sb->s_op->destroy_inode)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 5baf3a1..c529723 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -276,6 +276,7 @@ #include
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -586,6 +587,10 @@ #ifdef CONFIG_INOTIFY
>  	struct mutex		inotify_mutex;	/* protects the watches list */
>  #endif
>

Here you include a kevent_storage only if KEVENT_SOCKET

> +#ifdef CONFIG_KEVENT_SOCKET
> +	struct kevent_storage	st;
> +#endif
> +
>  	unsigned long		i_state;
>  	unsigned long		dirtied_when;	/* jiffies of first dirtying */
>
> @@ -739,6 +744,9 @@ #ifdef CONFIG_EPOLL
>  	struct list_head	f_ep_links;
>  	spinlock_t		f_ep_lock;
>  #endif /* #ifdef CONFIG_EPOLL */
> +#ifdef CONFIG_KEVENT_POLL
> +	struct kevent_storage	st;
> +#endif
>  	struct address_space	*f_mapping;
>  };
>  extern spinlock_t files_lock;
Linux-2.6.10 - Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic
Hi All,

I found an issue with the Realtek 8139 driver (2.6.10 branch) under high bi-directional traffic.

On a transmit timeout, the driver's timeout callback re-enables the receive interrupt. On the next receive interrupt, the ISR disables the receive interrupt "only" when the receive poll task is not active. But the poll task is actually active, so the ISR does not disable the receive interrupt and returns without clearing it. This un-serviced receive interrupt brings the system into a hung state.

My understanding is that, on a receive interrupt, the ISR should disable the receive interrupt irrespective of the polling task's state (active or inactive). I changed the code as shown below and it works perfectly.

Is this a known issue? If so, is there a fix already available?

Also, we get frequent TX timeouts under high traffic rates. What could be the reason for these frequent TX timeouts?

--- old/8139too.c	2006-11-09 11:49:25.0 +0530
+++ new/8139too.c	2006-11-09 11:50:02.0 +0530
@@ -2200,8 +2200,8 @@
 	/* Receive packets are processed by poll routine.
 	   If not running start it now. */
 	if (status & RxAckBits){
-		if (netif_rx_schedule_prep(dev)) {
 			RTL_W16_F (IntrMask, rtl8139_norx_intr_mask);
+		if (netif_rx_schedule_prep(dev)) {
 			__netif_rx_schedule (dev);
 		}
 	}

Thanks
Mansoor
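The difference between the two orderings can be modelled outside the driver. In this hypothetical userspace sketch, `struct nic_model`, `schedule_prep()` and the bit values stand in for the IntrMask register, `netif_rx_schedule_prep()` and `RxAckBits`. It only reproduces the scenario described above (polling already scheduled when an RX interrupt arrives after the TX timeout handler re-enabled it) and makes no claim about whether the proposed change is the right upstream fix.

```c
#include <stdbool.h>

#define RX_ACK_BITS	0x1u	/* hypothetical "RX event pending" status bit */
#define RX_INTR_ENABLED	0x1u	/* hypothetical RX bit in the interrupt mask */

struct nic_model {
	unsigned int intr_mask;	/* interrupt sources currently enabled */
	bool poll_scheduled;	/* what netif_rx_schedule_prep() tests and sets */
};

/* Returns true only if the caller won the right to schedule polling. */
static bool schedule_prep(struct nic_model *nic)
{
	if (nic->poll_scheduled)
		return false;
	nic->poll_scheduled = true;
	return true;
}

/* Original ordering: RX interrupts are masked only if polling was idle. */
static void isr_original(struct nic_model *nic, unsigned int status)
{
	if (status & RX_ACK_BITS) {
		if (schedule_prep(nic))
			nic->intr_mask &= ~RX_INTR_ENABLED;
	}
}

/* Proposed ordering: always mask RX interrupts, then try to schedule. */
static void isr_proposed(struct nic_model *nic, unsigned int status)
{
	if (status & RX_ACK_BITS) {
		nic->intr_mask &= ~RX_INTR_ENABLED;
		(void)schedule_prep(nic);
	}
}
```

Starting from an enabled RX interrupt with polling already scheduled, `isr_original()` leaves the interrupt enabled (so it keeps firing), while `isr_proposed()` masks it.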
[take24 0/6] kevent: Generic event handling mechanism.
Generic event handling mechanism.

Kevent is a generic subsystem which allows handling of event notifications. It supports both level- and edge-triggered events. It is similar to poll/epoll in some cases, but it is more scalable and faster, and it allows working with essentially any kind of event.

Events are provided to the kernel through a control syscall and can be read back through a ring buffer or using the usual syscalls. Kevent updates (i.e. readiness switching) happen directly from the internals of the appropriate state machine of the underlying subsystem (like network, filesystem, timer or any other).

Homepage:
http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent

Documentation page:
http://linux-net.osdl.org/index.php/Kevent

Consider for inclusion.

Changes from 'take23' patchset:
 * kevent PIPE notifications
 * KEVENT_REQ_LAST_CHECK flag, which allows performing a last check at dequeuing time
 * fixed poll/select notifications (were broken due to tree manipulations)
 * made Documentation/kevent.txt look nice in an 80-column terminal
 * fix for copy_to_user() failure report for the first kevent (Andrew Morton)
 * minor function renames

Here are the pipe results with the kevent_pipe kernel part and 2000 pipes (Eric Dumazet's application):
epoll (edge-triggered):   248408 events/sec
kevent (edge-triggered):  269282 events/sec
Busy reading loop:        269519 events/sec

Changes from 'take22' patchset:
 * new ring buffer implementation in process' memory
 * wakeup-one-thread flag
 * edge-triggered behaviour

With this release, an additional independent benchmark shows kevent speed compared to epoll: Eric Dumazet created a special benchmark which creates a set of AF_INET sockets, and two threads simultaneously read and write data from/into them.
Here are the results:
epoll (no EPOLLET):    57428 events/sec
kevent (no ET):        59794 events/sec
epoll (with EPOLLET):  71000 events/sec
kevent (with ET):      78265 events/sec
Maximum (busy loop reading events): 88482 events/sec

Changes from 'take21' patchset:
 * minor cleanups (different return values, removed unneeded variables, whitespace and so on)
 * fixed bug in kevent removal in the case when the kevent being removed is the same as overflow_kevent (spotted by Eric Dumazet)

Changes from 'take20' patchset:
 * new ring buffer implementation
 * removed artificial limit on possible number of kevents

With this release and a fixed userspace web server it was possible to achieve 3960+ req/s with a client connection rate of 4000 con/s over a 100 Mbit LAN; data IO over the network was about 10582.7 KB/s, which is close to wire speed once headers and the like are taken into account.

Changes from 'take19' patchset:
 * use __init instead of __devinit
 * removed 'default N' from config for user statistic
 * removed kevent_user_fini() since kevent can not be unloaded
 * use KERN_INFO for statistic output

Changes from 'take18' patchset:
 * use __init instead of __devinit
 * removed 'default N' from config for user statistic
 * removed kevent_user_fini() since kevent can not be unloaded
 * use KERN_INFO for statistic output

Changes from 'take17' patchset:
 * Use RB tree instead of hash table.
   At least for a web server, the frequency of addition/deletion of new kevents is comparable with the number of search accesses, i.e. most of the time events are added, accessed only a couple of times and then removed. This justifies using an RB tree over an AVL tree, since the latter has much slower deletion (up to O(log(N)) rotations compared to at most 3), although faster search (height bounded by 1.44*log(N) vs. 2*log(N)). So for kevents I use an RB tree for now, and later, when my AVL tree implementation is ready, it will be possible to compare them.
 * Changed readiness check for socket notifications.
With both of the above changes it is possible to achieve more than 3380 req/second, compared to 2200 (sometimes 2500) req/second for epoll(), with a trivial web server and an httperf client on the same hardware. It is possible that the above kevent limit is due to the maximum number of kevents allowed in a time limit, which is 4096 events.

Changes from 'take16' patchset:
 * misc cleanups (__read_mostly, const ...)
 * created special macro which is used for mmap size (number of pages) calculation
 * export kevent_socket_notify(), since it is used in network protocols which can be built as modules (IPv6 for example)

Changes from 'take15' patchset:
 * converted kevent_timer to high-resolution timers; this forces a timer API update at http://linux-net.osdl.org/index.php/Kevent
 * use struct ukevent* instead of void * in syscalls (documentation has been updated)
 * added warning in kevent_add_ukevent() if ring has broken index (for testing)

Changes from 'take14' patchset:
 * added kevent_wait()
   This syscall waits until either timeout expires or at least one event becomes ready. It also commits that @num events from @start are processed by userspace and thus can be removed
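The search-time figures in the take17 notes come from the standard worst-case height bounds for AVL and red-black trees holding N keys (the AVL bound is Knuth's). As a worked example for the 4096-event limit mentioned above:

```latex
h_{\mathrm{AVL}} < 1.4405\,\log_2(N+2) - 0.3277,
\qquad
h_{\mathrm{RB}} \le 2\,\log_2(N+1)

N = 4096:\quad
h_{\mathrm{AVL}} < 1.4405 \cdot 12.0007 - 0.3277 \approx 16.96,
\qquad
h_{\mathrm{RB}} \le 2 \cdot 12.0004 \approx 24.0
```

So a worst-case lookup descends roughly 17 levels in an AVL tree versus 24 in an RB tree, while an RB-tree deletion needs at most three rotations, which is the trade-off the take17 notes rely on for an add/remove-heavy workload.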
[take24 6/6] kevent: Pipe notifications.
Pipe notifications.

diff --git a/fs/pipe.c b/fs/pipe.c
index f3b6f71..aeaee9c 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -16,6 +16,7 @@ #include
 #include
 #include
 #include
+#include
 #include
 #include
@@ -312,6 +313,7 @@ redo:
 			break;
 		}
 	if (do_wakeup) {
+		kevent_pipe_notify(inode, KEVENT_SOCKET_SEND);
 		wake_up_interruptible_sync(&pipe->wait);
 		kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
 	}
@@ -321,6 +323,7 @@ redo:
 	/* Signal writers asynchronously that there is more room. */
 	if (do_wakeup) {
+		kevent_pipe_notify(inode, KEVENT_SOCKET_SEND);
 		wake_up_interruptible(&pipe->wait);
 		kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
 	}
@@ -490,6 +493,7 @@ redo2:
 			break;
 		}
 		if (do_wakeup) {
+			kevent_pipe_notify(inode, KEVENT_SOCKET_RECV);
 			wake_up_interruptible_sync(&pipe->wait);
 			kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
 			do_wakeup = 0;
@@ -501,6 +505,7 @@ redo2:
 out:
 	mutex_unlock(&inode->i_mutex);
 	if (do_wakeup) {
+		kevent_pipe_notify(inode, KEVENT_SOCKET_RECV);
 		wake_up_interruptible(&pipe->wait);
 		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
 	}
@@ -605,6 +610,7 @@ pipe_release(struct inode *inode, int de
 		free_pipe_info(inode);
 	} else {
 		wake_up_interruptible(&pipe->wait);
+		kevent_pipe_notify(inode, KEVENT_SOCKET_SEND|KEVENT_SOCKET_RECV);
 		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
 		kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
 	}
diff --git a/kernel/kevent/kevent_pipe.c b/kernel/kevent/kevent_pipe.c
new file mode 100644
index 000..32c6f19
--- /dev/null
+++ b/kernel/kevent/kevent_pipe.c
@@ -0,0 +1,112 @@
+/*
+ * 	kevent_pipe.c
+ *
+ * 2006 Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static int kevent_pipe_callback(struct kevent *k)
+{
+	struct inode *inode = k->st->origin;
+	struct pipe_inode_info *pipe = inode->i_pipe;
+	int nrbufs = pipe->nrbufs;
+
+	if (k->event.event & KEVENT_SOCKET_RECV && nrbufs > 0) {
+		if (!pipe->writers)
+			return -1;
+		return 1;
+	}
+
+	if (k->event.event & KEVENT_SOCKET_SEND && nrbufs < PIPE_BUFFERS) {
+		if (!pipe->readers)
+			return -1;
+		return 1;
+	}
+
+	return 0;
+}
+
+int kevent_pipe_enqueue(struct kevent *k)
+{
+	struct file *pipe;
+	int err = -EBADF;
+	struct inode *inode;
+
+	pipe = fget(k->event.id.raw[0]);
+	if (!pipe)
+		goto err_out_exit;
+
+	inode = igrab(pipe->f_dentry->d_inode);
+	if (!inode)
+		goto err_out_fput;
+
+	err = kevent_storage_enqueue(&inode->st, k);
+	if (err)
+		goto err_out_iput;
+
+	err = k->callbacks.callback(k);
+	if (err)
+		goto err_out_dequeue;
+
+	fput(pipe);
+
+	return err;
+
+err_out_dequeue:
+	kevent_storage_dequeue(k->st, k);
+err_out_iput:
+	iput(inode);
+err_out_fput:
+	fput(pipe);
+err_out_exit:
+	return err;
+}
+
+int kevent_pipe_dequeue(struct kevent *k)
+{
+	struct inode *inode = k->st->origin;
+
+	kevent_storage_dequeue(k->st, k);
+	iput(inode);
+
+	return 0;
+}
+
+void kevent_pipe_notify(struct inode *inode, u32 event)
+{
+	kevent_storage_ready(&inode->st, NULL, event);
+}
+
+static int __init kevent_init_pipe(void)
+{
+	struct kevent_callbacks sc = {
+		.callback = &kevent_pipe_callback,
+		.enqueue = &kevent_pipe_enqueue,
+		.dequeue = &kevent_pipe_dequeue};
+
+	return kevent_add_callbacks(&sc, KEVENT_PIPE);
+}
+module_init(kevent_init_pipe
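`kevent_pipe_callback()` returns 1 when the requested direction is ready, 0 when it is not, and -1 when the other end of the pipe has gone away. A hypothetical userspace model of that decision table follows. The flag values and `DEMO_PIPE_BUFFERS` are assumptions, not the kernel's constants; only the order of the checks mirrors the patch.

```c
/* Hypothetical flag values; the real KEVENT_SOCKET_* constants may differ. */
#define DEMO_RECV		0x1u
#define DEMO_SEND		0x2u
#define DEMO_PIPE_BUFFERS	16	/* assumption: number of pipe buffers */

struct demo_pipe {
	int nrbufs;	/* buffers currently holding data */
	int readers;	/* open read ends */
	int writers;	/* open write ends */
};

/* Same order of checks as kevent_pipe_callback() in the patch. */
static int demo_pipe_ready(const struct demo_pipe *p, unsigned int requested)
{
	if ((requested & DEMO_RECV) && p->nrbufs > 0)
		return p->writers ? 1 : -1;	/* data to read; -1 if no writer left */
	if ((requested & DEMO_SEND) && p->nrbufs < DEMO_PIPE_BUFFERS)
		return p->readers ? 1 : -1;	/* room to write; -1 if no reader left */
	return 0;				/* requested direction not ready */
}
```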
[take24 4/6] kevent: Socket notifications.
Socket notifications.

This patch includes socket send/recv/accept notifications. Using a trivial web server based on kevent and these features instead of epoll, its performance increased more than noticeably. More details about various benchmarks and the server itself (evserver_kevent.c) can be found on the project's homepage.

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/fs/inode.c b/fs/inode.c
index ada7643..6745c00 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -21,6 +21,7 @@ #include
 #include
 #include
 #include
+#include
 #include
 
 /*
@@ -164,12 +165,18 @@ #endif
 	}
 	inode->i_private = 0;
 	inode->i_mapping = mapping;
+#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE
+	kevent_storage_init(inode, &inode->st);
+#endif
 	}
 	return inode;
 }
 
 void destroy_inode(struct inode *inode)
 {
+#if defined CONFIG_KEVENT_SOCKET
+	kevent_storage_fini(&inode->st);
+#endif
 	BUG_ON(inode_has_buffers(inode));
 	security_inode_free(inode);
 	if (inode->i_sb->s_op->destroy_inode)
diff --git a/include/net/sock.h b/include/net/sock.h
index edd4d73..d48ded8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -48,6 +48,7 @@ #include
 #include
 #include
 #include	/* struct sk_buff */
 #include
+#include
 #include
@@ -450,6 +451,21 @@ static inline int sk_stream_memory_free(
 extern void sk_stream_rfree(struct sk_buff *skb);
+struct socket_alloc {
+	struct socket socket;
+	struct inode vfs_inode;
+};
+
+static inline struct socket *SOCKET_I(struct inode *inode)
+{
+	return &container_of(inode, struct socket_alloc, vfs_inode)->socket;
+}
+
+static inline struct inode *SOCK_INODE(struct socket *socket)
+{
+	return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
+}
+
 static inline void sk_stream_set_owner_r(struct sk_buff *skb, struct sock *sk)
 {
 	skb->sk = sk;
@@ -477,6 +493,7 @@ static inline void sk_add_backlog(struct
 		sk->sk_backlog.tail = skb;
 	}
 	skb->next = NULL;
+	kevent_socket_notify(sk, KEVENT_SOCKET_RECV);
 }
 #define sk_wait_event(__sk, __timeo, __condition)	\
@@ -679,21 +696,6 @@ static inline struct kiocb *siocb_to_kio
 	return si->kiocb;
 }
-struct socket_alloc {
-	struct socket socket;
-	struct inode vfs_inode;
-};
-
-static inline struct socket *SOCKET_I(struct inode *inode)
-{
-	return &container_of(inode, struct socket_alloc, vfs_inode)->socket;
-}
-
-static inline struct inode *SOCK_INODE(struct socket *socket)
-{
-	return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
-}
-
 extern void __sk_stream_mem_reclaim(struct sock *sk);
 extern int sk_stream_mem_schedule(struct sock *sk, int size, int kind);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7a093d0..69f4ad2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -857,6 +857,7 @@ static inline int tcp_prequeue(struct so
 			tp->ucopy.memory = 0;
 		} else if (skb_queue_len(&tp->ucopy.prequeue) == 1) {
 			wake_up_interruptible(sk->sk_sleep);
+			kevent_socket_notify(sk, KEVENT_SOCKET_RECV|KEVENT_SOCKET_SEND);
 			if (!inet_csk_ack_scheduled(sk))
 				inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
 							  (3 * TCP_RTO_MIN) / 4,
diff --git a/kernel/kevent/kevent_socket.c b/kernel/kevent/kevent_socket.c
new file mode 100644
index 000..7f74110
--- /dev/null
+++ b/kernel/kevent/kevent_socket.c
@@ -0,0 +1,135 @@
+/*
+ * 	kevent_socket.c
+ *
+ * 2006 Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+static int kevent_socket_callback(struct kevent *k)
+{
+	struct inode *inode = k->st->origin;
+	unsigned int events = SOCKET_I(inode)->ops->poll(SOCKET_I(inode)->file, SOCKET_I(inode), NULL);
+
+	if ((events & (POLLIN | POLLRDNORM)) && (k->event.event & (KEVENT_SOCKET_RECV | KEVENT_SOCKET_
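`SOCKET_I()` and `SOCK_INODE()`, which this patch moves earlier in sock.h, work because the socket and its VFS inode are allocated together in one `struct socket_alloc`, so either pointer can be turned into the other with `container_of()`. A self-contained userspace sketch of the same trick: `struct socket` and `struct inode` here are hypothetical one-field stand-ins, and this `container_of()` omits the kernel version's type checking.

```c
#include <stddef.h>

/* Hypothetical stand-ins; only the memory layout trick matters here. */
struct socket { int state; };
struct inode { unsigned long i_ino; };

struct socket_alloc {
	struct socket socket;
	struct inode vfs_inode;
};

/* Userspace container_of(): step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static inline struct socket *SOCKET_I(struct inode *inode)
{
	return &container_of(inode, struct socket_alloc, vfs_inode)->socket;
}

static inline struct inode *SOCK_INODE(struct socket *socket)
{
	return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
}
```

This is why `kevent_socket_callback()` can start from the inode stored as the storage origin and still reach the socket's `ops->poll`.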
Re: Please pull 'upstream' branch of wireless-2.6
John wrote:
> Yeah, looks like I was a bit overzealous on the warning squelch...
> I'll cook up a new patch that doesn't error out.

I hope you do not bloat the kernel with meaningless warning messages. Something simple like the following will do:

	/* enable MWI */
	/* Shut up the must_check tests - We don't care if this does not succeed */
	if (pci_set_mwi(pdev))
		rvalue = 0;

Roger While
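In the kernel, `__must_check` expands to GCC's `warn_unused_result` attribute, so a bare call whose result is dropped draws a warning, while testing the result as in the snippet above silences it without printing anything. A userspace sketch of the same pattern, in which `MUST_CHECK`, `fake_set_mwi()` and `probe_mwi()` are hypothetical names:

```c
#include <errno.h>

/* Same GCC attribute the kernel's __must_check expands to. */
#define MUST_CHECK __attribute__((warn_unused_result))

/* Hypothetical stand-in for pci_set_mwi(): 0 on success, -EINVAL on failure. */
static MUST_CHECK int fake_set_mwi(int supported)
{
	return supported ? 0 : -EINVAL;
}

/*
 * Consumes the result explicitly, so the compiler does not warn, and
 * records the failure quietly instead of logging or aborting.
 */
static int probe_mwi(int supported)
{
	int rvalue = 1;

	if (fake_set_mwi(supported))
		rvalue = 0;	/* MWI unavailable: carry on without it */
	return rvalue;
}
```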
[take24 2/6] kevent: Core files.
Core files.

This patch includes core kevent files:
 * userspace controlling
 * kernelspace interfaces
 * initialization
 * notification state machines

Some bits of documentation can be found on the project's homepage (and links from there):
http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/syscall_table.S b/arch/i386/kernel/syscall_table.S
index 7e639f7..fa8075b 100644
--- a/arch/i386/kernel/syscall_table.S
+++ b/arch/i386/kernel/syscall_table.S
@@ -318,3 +318,7 @@ ENTRY(sys_call_table)
 	.long sys_vmsplice
 	.long sys_move_pages
 	.long sys_getcpu
+	.long sys_kevent_get_events
+	.long sys_kevent_ctl		/* 320 */
+	.long sys_kevent_wait
+	.long sys_kevent_ring_init
diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S
index b4aa875..95fb252 100644
--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -714,8 +714,12 @@ #endif
 	.quad compat_sys_get_robust_list
 	.quad sys_splice
 	.quad sys_sync_file_range
-	.quad sys_tee
+	.quad sys_tee			/* 315 */
 	.quad compat_sys_vmsplice
 	.quad compat_sys_move_pages
 	.quad sys_getcpu
+	.quad sys_kevent_get_events
+	.quad sys_kevent_ctl		/* 320 */
+	.quad sys_kevent_wait
+	.quad sys_kevent_ring_init
 ia32_syscall_end:
diff --git a/include/asm-i386/unistd.h b/include/asm-i386/unistd.h
index bd99870..2161ef2 100644
--- a/include/asm-i386/unistd.h
+++ b/include/asm-i386/unistd.h
@@ -324,10 +324,14 @@ #define __NR_tee 315
 #define __NR_vmsplice		316
 #define __NR_move_pages		317
 #define __NR_getcpu		318
+#define __NR_kevent_get_events	319
+#define __NR_kevent_ctl		320
+#define __NR_kevent_wait	321
+#define __NR_kevent_ring_init	322
 #ifdef __KERNEL__
-#define NR_syscalls 319
+#define NR_syscalls 323
 #include
 /*
diff --git a/include/asm-x86_64/unistd.h b/include/asm-x86_64/unistd.h
index 6137146..3669c0f 100644
--- a/include/asm-x86_64/unistd.h
+++ b/include/asm-x86_64/unistd.h
@@ -619,10 +619,18 @@ #define __NR_vmsplice 278
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages		279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_kevent_get_events	280
+__SYSCALL(__NR_kevent_get_events, sys_kevent_get_events)
+#define __NR_kevent_ctl		281
+__SYSCALL(__NR_kevent_ctl, sys_kevent_ctl)
+#define __NR_kevent_wait	282
+__SYSCALL(__NR_kevent_wait, sys_kevent_wait)
+#define __NR_kevent_ring_init	283
+__SYSCALL(__NR_kevent_ring_init, sys_kevent_ring_init)
 #ifdef __KERNEL__
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_kevent_ring_init
 #include
 #ifndef __NO_STUBS
diff --git a/include/linux/kevent.h b/include/linux/kevent.h
new file mode 100644
index 000..f7cbf6b
--- /dev/null
+++ b/include/linux/kevent.h
@@ -0,0 +1,223 @@
+/*
+ * 2006 Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __KEVENT_H
+#define __KEVENT_H
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define KEVENT_MIN_BUFFS_ALLOC	3
+
+struct kevent;
+struct kevent_storage;
+typedef int (* kevent_callback_t)(struct kevent *);
+
+/* @callback is called each time new event has been caught. */
+/* @enqueue is called each time new event is queued. */
+/* @dequeue is called each time event is dequeued. */
+
+struct kevent_callbacks {
+	kevent_callback_t	callback, enqueue, dequeue;
+};
+
+#define KEVENT_READY		0x1
+#define KEVENT_STORAGE		0x2
+#define KEVENT_USER		0x4
+
+struct kevent
+{
+	/* Used for kevent freeing.*/
+	struct rcu_head		rcu_head;
+	struct ukevent		event;
+	/* This lock protects ukevent manipulations, e.g. ret_flags changes. */
+	spinlock_t		ulock;
+
+	/* Entry of user's tree. */
+	struct rb_node		kevent_node;
+	/* Entry of origin's queue. */
+	struct list_head	storage_entry;
+
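Each notification type registers a `struct kevent_callbacks` triple, as `kevent_init_pipe()` does with `kevent_add_callbacks(&sc, KEVENT_PIPE)` in the pipe patch. The core's registration code is not shown here, so the following is only a hypothetical userspace sketch of a per-type callback table. `demo_add_callbacks()`, `demo_dispatch()`, `DEMO_TYPE_MAX` and `struct demo_kevent` are invented names, not the kevent API.

```c
#include <stddef.h>
#include <errno.h>

struct demo_kevent { unsigned int type; int state; };
typedef int (*demo_callback_t)(struct demo_kevent *);

struct demo_callbacks {
	demo_callback_t callback, enqueue, dequeue;
};

#define DEMO_TYPE_MAX 8	/* hypothetical number of event types */

static struct demo_callbacks demo_table[DEMO_TYPE_MAX];

/* Register the callbacks for one event type; refuse bad or duplicate slots. */
static int demo_add_callbacks(const struct demo_callbacks *cb, unsigned int type)
{
	if (type >= DEMO_TYPE_MAX || !cb->callback)
		return -EINVAL;
	if (demo_table[type].callback)
		return -EEXIST;
	demo_table[type] = *cb;
	return 0;
}

/* Route a readiness check to the callback registered for the event's type. */
static int demo_dispatch(struct demo_kevent *k)
{
	if (k->type >= DEMO_TYPE_MAX || !demo_table[k->type].callback)
		return -EINVAL;
	return demo_table[k->type].callback(k);
}

/* Example callback: reports the stored readiness state (1 ready, 0 not). */
static int demo_ready_cb(struct demo_kevent *k)
{
	return k->state;
}
```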
[take24 1/6] kevent: Description.
Description. diff --git a/Documentation/kevent.txt b/Documentation/kevent.txt new file mode 100644 index 000..ca49e4b --- /dev/null +++ b/Documentation/kevent.txt @@ -0,0 +1,186 @@ +Description. + +int kevent_ctl(int fd, unsigned int cmd, unsigned int num, struct ukevent *arg); + +fd - is the file descriptor referring to the kevent queue to manipulate. +It is created by opening "/dev/kevent" char device, which is created with +dynamic minor number and major number assigned for misc devices. + +cmd - is the requested operation. It can be one of the following: +KEVENT_CTL_ADD - add event notification +KEVENT_CTL_REMOVE - remove event notification +KEVENT_CTL_MODIFY - modify existing notification + +num - number of struct ukevent in the array pointed to by arg +arg - array of struct ukevent + +When called, kevent_ctl will carry out the operation specified in the +cmd parameter. +--- + + int kevent_get_events(int ctl_fd, unsigned int min_nr, unsigned int max_nr, + __u64 timeout, struct ukevent *buf, unsigned flags) + +ctl_fd - file descriptor referring to the kevent queue +min_nr - minimum number of completed events that kevent_get_events will block +waiting for +max_nr - number of struct ukevent in buf +timeout - number of nanoseconds to wait before returning less than min_nr + events. If this is -1, then wait forever. +buf - pointer to an array of struct ukevent. +flags - unused + +kevent_get_events will wait timeout milliseconds for at least min_nr completed +events, copying completed struct ukevents to buf and deleting any +KEVENT_REQ_ONESHOT event requests. In nonblocking mode it returns as many +events as possible, but not more than max_nr. In blocking mode it waits until +timeout or if at least min_nr events are ready. 
+--- + + int kevent_wait(int ctl_fd, unsigned int num, __u64 timeout) + +ctl_fd - file descriptor referring to the kevent queue +num - number of processed kevents +timeout - this timeout specifies number of nanoseconds to wait until there is + free space in kevent queue + +This syscall waits until either timeout expires or at least one event becomes +ready. It also copies that num events into special ring buffer and requeues +them (or removes depending on flags). +--- + + int kevent_ring_init(int ctl_fd, struct kevent_ring *ring, unsigned int num) + +ctl_fd - file descriptor referring to the kevent queue +num - size of the ring buffer in events + + struct kevent_ring + { + unsigned int ring_kidx; + struct ukevent event[0]; + } + +ring_kidx - is an index in the ring buffer where kernel will put new events + when kevent_wait() or kevent_get_events() is called + +Example userspace code (ring_buffer.c) can be found on project's homepage. + +Each kevent syscall can be so called cancellation point in glibc, i.e. when +thread has been cancelled in kevent syscall, thread can be safely removed +and no events will be lost, since each syscall (kevent_wait() or +kevent_get_events()) will copy event into special ring buffer, accessible +from other threads or even processes (if shared memory is used). + +When kevent is removed (not dequeued when it is ready, but just removed), +even if it was ready, it is not copied into ring buffer, since if it is +removed, no one cares about it (otherwise user would wait until it becomes +ready and got it through usual way using kevent_get_events() or kevent_wait()) +and thus no need to copy it to the ring buffer. 
+
+With the userspace ring buffer, events in the ring buffer can be replaced
+without the knowledge of the thread currently reading them
+(when another thread calls kevent_get_events() or kevent_wait()), so appropriate
+locking is required between threads or processes that can simultaneously access
+the same ring buffer.
+---
+
+The bulk of the interface is done entirely through the ukevent struct.
+It is used to add event requests, modify existing event requests,
+specify which event requests to remove, and return completed events.
+
+struct ukevent contains the following members:
+
+struct kevent_id id
+Id of this request, e.g. socket number, file descriptor and so on
+__u32 type
+Event type, e.g. KEVENT_SOCK, KEVENT_INODE, KEVENT_TIMER and so on
+__u32 event
+Event itself, e.g. SOCK_ACCEPT, INODE_CREATED, TIMER_FIRED
+__u32 req_flags
+Per-event request flags:
+
+KEVENT_REQ_ONESHOT
+event will be removed when it is ready
+
+KEVENT_REQ_WAKEUP_ONE
+When several threads wait on the same kevent queue and requested the
+ same event, for example 'wake me up when new
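The ring buffer description above leaves the reader side to the (unshown) ring_buffer.c example. As a rough sketch only, the C below shows how a userspace reader might drain events between its own index and ring_kidx while holding a mutex, since the text says concurrent readers need locking. It makes several assumptions. The struct names ukevent_sketch, kevent_ring_sketch and ring_reader, and the function ring_consume(), are hypothetical mirrors and do not come from the kevent headers. The sketch also assumes ring_kidx is already reduced modulo num.

```c
#include <pthread.h>

/* Hypothetical userspace mirrors of the structures described in
 * kevent.txt. Only the fields needed here are shown; the real layout of
 * struct ukevent comes from the kevent headers, not from this sketch. */
struct ukevent_sketch {
	unsigned int type;	/* e.g. KEVENT_TIMER */
	unsigned int event;	/* e.g. TIMER_FIRED */
};

struct kevent_ring_sketch {
	unsigned int ring_kidx;		/* next slot the kernel will fill */
	struct ukevent_sketch event[];	/* C99 form of event[0] */
};

/* Reader state shared by all threads consuming the same ring. */
struct ring_reader {
	pthread_mutex_t lock;
	unsigned int uidx;	/* next slot userspace has not read yet */
	unsigned int num;	/* ring size given to kevent_ring_init() */
};

/*
 * Copy at most max events lying between uidx and ring_kidx into out.
 * Assumes ring_kidx is already reduced modulo num. The mutex only
 * serializes readers; it cannot stop the kernel from overwriting slots
 * that have not been read yet, which is the hazard the text describes.
 */
static unsigned int ring_consume(struct ring_reader *r,
				 struct kevent_ring_sketch *ring,
				 struct ukevent_sketch *out, unsigned int max)
{
	unsigned int n = 0;
	unsigned int kidx;

	pthread_mutex_lock(&r->lock);
	/* Snapshot the kernel-written index once (GCC/Clang builtin). */
	kidx = __atomic_load_n(&ring->ring_kidx, __ATOMIC_ACQUIRE);
	while (n < max && r->uidx != kidx) {
		out[n++] = ring->event[r->uidx];
		r->uidx = (r->uidx + 1) % r->num;	/* wrap around */
	}
	pthread_mutex_unlock(&r->lock);
	return n;
}
```

In this design, uidx == ring_kidx means "nothing new". As a result, a reader that falls num events behind cannot tell a full ring from an empty one. That is one reason the documentation asks for external coordination between readers.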
[take24 3/6] kevent: poll/select() notifications.
poll/select() notifications.

This patch includes generic poll/select notifications.
kevent_poll works similarly to epoll and has the same issues (the callback
is invoked not from the caller's internal state machine but through a
process wakeup, a lot of allocations, and so on).

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/fs/file_table.c b/fs/file_table.c
index bc35a40..0805547 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -20,6 +20,7 @@ #include
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -119,6 +120,7 @@ struct file *get_empty_filp(void)
 	f->f_uid = tsk->fsuid;
 	f->f_gid = tsk->fsgid;
 	eventpoll_init_file(f);
+	kevent_init_file(f);
 	/* f->f_version: 0 */
 	return f;
 
@@ -164,6 +166,7 @@ void fastcall __fput(struct file *file)
 	 * in the file cleanup chain.
 	 */
 	eventpoll_release(file);
+	kevent_cleanup_file(file);
 	locks_remove_flock(file);
 
 	if (file->f_op && file->f_op->release)
diff --git a/fs/inode.c b/fs/inode.c
index ada7643..6745c00 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -21,6 +21,7 @@ #include
 #include
 #include
 #include
+#include
 #include
 
 /*
@@ -164,12 +165,18 @@ #endif
 		}
 		inode->i_private = 0;
 		inode->i_mapping = mapping;
+#if defined CONFIG_KEVENT_SOCKET || defined CONFIG_KEVENT_PIPE
+		kevent_storage_init(inode, &inode->st);
+#endif
 	}
 	return inode;
 }
 
 void destroy_inode(struct inode *inode)
 {
+#if defined CONFIG_KEVENT_SOCKET
+	kevent_storage_fini(&inode->st);
+#endif
 	BUG_ON(inode_has_buffers(inode));
 	security_inode_free(inode);
 	if (inode->i_sb->s_op->destroy_inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5baf3a1..c529723 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -276,6 +276,7 @@ #include
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -586,6 +587,10 @@ #ifdef CONFIG_INOTIFY
 	struct mutex		inotify_mutex;	/* protects the watches list */
 #endif
 
+#ifdef CONFIG_KEVENT_SOCKET
+	struct kevent_storage	st;
+#endif
+
 	unsigned long		i_state;
 	unsigned long		dirtied_when;	/* jiffies of first dirtying */
@@ -739,6 +744,9 @@ #ifdef CONFIG_EPOLL
 	struct list_head	f_ep_links;
 	spinlock_t		f_ep_lock;
 #endif /* #ifdef CONFIG_EPOLL */
+#ifdef CONFIG_KEVENT_POLL
+	struct kevent_storage	st;
+#endif
 	struct address_space	*f_mapping;
 };
 extern spinlock_t files_lock;
diff --git a/kernel/kevent/kevent_poll.c b/kernel/kevent/kevent_poll.c
new file mode 100644
index 000..7030d21
--- /dev/null
+++ b/kernel/kevent/kevent_poll.c
@@ -0,0 +1,228 @@
+/*
+ * 2006 Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static kmem_cache_t *kevent_poll_container_cache;
+static kmem_cache_t *kevent_poll_priv_cache;
+
+struct kevent_poll_ctl
+{
+	struct poll_table_struct	pt;
+	struct kevent			*k;
+};
+
+struct kevent_poll_wait_container
+{
+	struct list_head	container_entry;
+	wait_queue_head_t	*whead;
+	wait_queue_t		wait;
+	struct kevent		*k;
+};
+
+struct kevent_poll_private
+{
+	struct list_head	container_list;
+	spinlock_t		container_lock;
+};
+
+static int kevent_poll_enqueue(struct kevent *k);
+static int kevent_poll_dequeue(struct kevent *k);
+static int kevent_poll_callback(struct kevent *k);
+
+static int kevent_poll_wait_callback(wait_queue_t *wait,
+		unsigned mode, int sync, void *key)
+{
+	struct kevent_poll_wait_container *cont =
+		container_of(wait, struct kevent_poll_wait_container, wait);
+	struct kevent *k = cont->k;
+
+	kevent_storage_ready(k->st, NULL, KEVENT_MASK_ALL);
+	return 0;
+}
+
+static void kevent_poll_qproc(struct file *file, wait_queue_head_t *whead,
+		struct poll_table_struct *poll_table)
+{
+	struct kevent *k =
+		container_of(poll_table, struct kevent_poll_ctl, pt)->k;
+	struct kevent_poll_private *priv = k->priv;
+	struct kevent_poll_wait_container *co
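The kevent_poll.c excerpt above is cut off inside kevent_poll_qproc(), but the visible parts already show the mechanism. The file's poll method hands each wait queue head to the qproc hook in the poll table. kevent_poll_qproc() then links a container onto that head. When the driver later wakes the queue, kevent_poll_wait_callback() marks the kevent ready.

The C below is a hypothetical, single-threaded userspace model of that flow, meant only to make the control path concrete. wait_head, wait_entry, model_qproc(), model_wait_callback() and model_wake_up() are illustrative names, not kernel APIs. The real wait callback also receives mode, sync and key arguments, and the real code uses kmem caches and locking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for wait_queue_t and
 * wait_queue_head_t; these are NOT the kernel definitions. */
struct wait_entry;
typedef int (*wake_fn)(struct wait_entry *entry);

struct wait_entry {
	wake_fn func;	/* like wait_queue_t.func */
	bool *ready;	/* stands in for the kevent storage */
};

#define MAX_WAITERS 8

struct wait_head {
	struct wait_entry *entries[MAX_WAITERS];
	size_t count;
};

/* Counterpart of kevent_poll_wait_callback(): mark the event ready. */
static int model_wait_callback(struct wait_entry *entry)
{
	*entry->ready = true;
	return 0;
}

/* Counterpart of kevent_poll_qproc(): when the poll method hands over a
 * wait queue head, hook a callback entry onto it instead of sleeping. */
static int model_qproc(struct wait_head *head, struct wait_entry *entry,
		       bool *ready)
{
	if (head->count == MAX_WAITERS)
		return -1;	/* no room; the kernel would allocate instead */
	entry->func = model_wait_callback;
	entry->ready = ready;
	head->entries[head->count++] = entry;
	return 0;
}

/* Counterpart of wake_up(): every registered callback runs in the
 * waker's context, which is the "process wakeup" path the commit
 * message mentions. */
static void model_wake_up(struct wait_head *head)
{
	for (size_t i = 0; i < head->count; i++)
		head->entries[i]->func(head->entries[i]);
}
```

The model shows why the commit message compares kevent_poll to epoll. Readiness is learned only when some other context calls wake_up() on the queue. The consumer's own event loop does not drive that notification.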
[take24 5/6] kevent: Timer notifications.
Timer notifications.

Timer notifications can be used for fine-grained per-process time management,
since interval timers are very inconvenient to use and they are limited.

This subsystem uses high-resolution timers.
id.raw[0] is used as the number of seconds
id.raw[1] is used as the number of nanoseconds

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/kernel/kevent/kevent_timer.c b/kernel/kevent/kevent_timer.c
new file mode 100644
index 000..df93049
--- /dev/null
+++ b/kernel/kevent/kevent_timer.c
@@ -0,0 +1,112 @@
+/*
+ * 2006 Copyright (c) Evgeniy Polyakov <[EMAIL PROTECTED]>
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+struct kevent_timer
+{
+	struct hrtimer		ktimer;
+	struct kevent_storage	ktimer_storage;
+	struct kevent		*ktimer_event;
+};
+
+static int kevent_timer_func(struct hrtimer *timer)
+{
+	struct kevent_timer *t = container_of(timer, struct kevent_timer, ktimer);
+	struct kevent *k = t->ktimer_event;
+
+	kevent_storage_ready(&t->ktimer_storage, NULL, KEVENT_MASK_ALL);
+	hrtimer_forward(timer, timer->base->softirq_time,
+			ktime_set(k->event.id.raw[0], k->event.id.raw[1]));
+	return HRTIMER_RESTART;
+}
+
+static struct lock_class_key kevent_timer_key;
+
+static int kevent_timer_enqueue(struct kevent *k)
+{
+	int err;
+	struct kevent_timer *t;
+
+	t = kmalloc(sizeof(struct kevent_timer), GFP_KERNEL);
+	if (!t)
+		return -ENOMEM;
+
+	hrtimer_init(&t->ktimer, CLOCK_MONOTONIC, HRTIMER_REL);
+	t->ktimer.expires = ktime_set(k->event.id.raw[0], k->event.id.raw[1]);
+	t->ktimer.function = kevent_timer_func;
+	t->ktimer_event = k;
+
+	err = kevent_storage_init(&t->ktimer, &t->ktimer_storage);
+	if (err)
+		goto err_out_free;
+	lockdep_set_class(&t->ktimer_storage.lock, &kevent_timer_key);
+
+	err = kevent_storage_enqueue(&t->ktimer_storage, k);
+	if (err)
+		goto err_out_st_fini;
+
+	hrtimer_start(&t->ktimer, t->ktimer.expires, HRTIMER_REL);
+
+	return 0;
+
+err_out_st_fini:
+	kevent_storage_fini(&t->ktimer_storage);
+err_out_free:
+	kfree(t);
+
+	return err;
+}
+
+static int kevent_timer_dequeue(struct kevent *k)
+{
+	struct kevent_storage *st = k->st;
+	struct kevent_timer *t = container_of(st, struct kevent_timer, ktimer_storage);
+
+	hrtimer_cancel(&t->ktimer);
+	kevent_storage_dequeue(st, k);
+	kfree(t);
+
+	return 0;
+}
+
+static int kevent_timer_callback(struct kevent *k)
+{
+	k->event.ret_data[0] = jiffies_to_msecs(jiffies);
+	return 1;
+}
+
+static int __init kevent_init_timer(void)
+{
+	struct kevent_callbacks tc = {
+		.callback = &kevent_timer_callback,
+		.enqueue = &kevent_timer_enqueue,
+		.dequeue = &kevent_timer_dequeue};
+
+	return kevent_add_callbacks(&tc, KEVENT_TIMER);
+}
+module_init(kevent_init_timer);
+
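The timer patch encodes the period in id.raw[0] (seconds) and id.raw[1] (nanoseconds). kevent_timer_func() re-arms the timer by calling hrtimer_forward() and returning HRTIMER_RESTART.

The C below is a rough userspace illustration, not the kernel's hrtimer code. It assumes 32-bit raw[] words and a non-negative interval in nanoseconds. The helpers split and rejoin the seconds/nanoseconds pair. They also re-implement the forwarding idea: move the expiry past "now" by whole intervals and count how many were skipped. timer_id_sketch, timer_id_from_ns(), timer_id_to_ns() and forward_expiry() are hypothetical names.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical mirror of the two id.raw[] words a timer request uses:
 * raw[0] = seconds, raw[1] = nanoseconds (per the patch description). */
struct timer_id_sketch {
	uint32_t raw[2];
};

/* Split a non-negative interval in nanoseconds into seconds + nanoseconds. */
static struct timer_id_sketch timer_id_from_ns(int64_t interval_ns)
{
	struct timer_id_sketch id;

	id.raw[0] = (uint32_t)(interval_ns / NSEC_PER_SEC);
	id.raw[1] = (uint32_t)(interval_ns % NSEC_PER_SEC);
	return id;
}

/* Recombine the pair, the way ktime_set(raw[0], raw[1]) is used above. */
static int64_t timer_id_to_ns(struct timer_id_sketch id)
{
	return (int64_t)id.raw[0] * NSEC_PER_SEC + id.raw[1];
}

/*
 * Userspace re-implementation of the idea behind hrtimer_forward():
 * if the timer has expired (now >= *expires), advance *expires by whole
 * intervals until it lies strictly after now, and return the number of
 * intervals skipped (the overrun count). interval must be > 0.
 */
static uint64_t forward_expiry(int64_t *expires, int64_t now, int64_t interval)
{
	uint64_t overruns;

	if (now < *expires)
		return 0;	/* not expired yet: nothing to do */
	overruns = (uint64_t)((now - *expires) / interval) + 1;
	*expires += (int64_t)overruns * interval;
	return overruns;
}
```

For example, a 1.5 second period becomes raw[0] = 1 and raw[1] = 500000000. Returning a restart after forwarding is what turns the one-shot hrtimer into the periodic notification this patch provides.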