Re: [Devel] Re: [RFC] network namespaces
On Sunday 10 September 2006 06:47, Herbert Poetzl wrote:
> well, I think it would be best to have both, as they are complementary to some degree, and IMHO both, the full virtualization _and_ the isolation will require a separate namespace to work,
> [snip]
> I do not think that folks would want to recompile their kernel just to get a light-weight guest or a fully virtualized one

In this case the light-weight guest will have unnecessary overhead. For example, instead of using a static pointer, we have to look up the required common namespace first. And such a guest will have no advantages over a full-featured one.

> best,
> Herbert

--
Thanks,
Dmitry.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Devel] Re: [RFC] network namespaces
On Sunday 10 September 2006 07:41, Eric W. Biederman wrote:
> I certainly agree that we are not at a point where a final decision can be made. A major piece of that is that a layer 2 approach has not shown to be without a performance penalty.

But it is required. Why limit the possible usages?

> A practical question. Do the IPs assigned to guests ever get used by anything besides the guest?

In the case of layer 2 virtualization, no.

--
Thanks,
Dmitry.
Re: r8168, 2.6.17 et r8169.c
On Saturday 9 September 2006 13:24, you wrote:
> Corentin CHARY [EMAIL PROTECTED] :
> [...]
>> I later saw that there are more up-to-date patches here http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.18-rc5/r8169/ , but for 2.6.18. So I wonder whether there is a way to get the r8169.c driver working for my r8168 on the 2.6.17 kernel. Otherwise, I will wait for 2.6.18 :).
>
> Apply the patches for 2.6.8-rc5 to the latest available 2.6.18-rc version, change the line containing a 'request_irq' directive so that it is identical to the one in the 2.6.17 kernel's driver, and the resulting driver should compile with your 2.6.17 kernel.
> If the link does not negotiate properly, force an autonegotiation with the mii-tool utility (this only applies to the driver after the tinkering).
> In any case, an 'lspci -vvx' of your machine would be welcome.
> If there is a problem, it would be preferable to continue the discussion in English, with the 'netdev@vger.kernel.org' mailing list in copy.

Small summary (for the mailing list):

-- I have an Asus A6JC laptop with an r8168 network card. The r1000 driver from Realtek works, but I want to use r8169.c. I first tried the patches from http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.17-rc6/r8169/ , but they don't work (eth0 is there, I can use ifconfig and route, but ping and dhcpcd don't work).

-- Now I tried with http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.18-rc5/r8169/
I changed:
line 1751
- request_irq(dev->irq, rtl8169_interrupt, IRQF_SHARED, dev->name, dev);
+ request_irq(dev->irq, rtl8169_interrupt, SA_SHIRQ, dev->name, dev);
line 2224
- u32 mss = skb_shinfo(skb)->gso_size;
+ u32 mss = skb_shinfo(skb)->tso_size;
to get it to compile on 2.6.17.

I can load the module, eth0 is there, and I can play with ifconfig. But I can't ping other machines on my network. dhcpd doesn't work either.
dmesg output : r8169 Gigabit Ethernet driver 2.2LK loaded ACPI: PCI Interrupt :02:00.0[A] - GSI 16 (level, low) - IRQ 16 PCI: Setting latency timer of device :02:00.0 to 64 eth0: RTL8168b/8111b at 0xf0fe6000, 00:17:31:c1:8d:ee, IRQ 16 r8169: eth0: link down r8169: eth0: link down r8169: eth0: link up lspci : 02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01) Subsystem: ASUSTeK Computer Inc. Unknown device 11f5 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR- Interrupt: pin A routed to IRQ 16 Region 0: I/O ports at c800 [size=256] Region 2: Memory at fe0ff000 (64-bit, non-prefetchable) [size=4K] Expansion ROM at fe0e [disabled] [size=64K] Capabilities: [40] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [48] Vital Product Data Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable- Address: Data: Capabilities: [60] Express Endpoint IRQ 0 Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, ExtTag- Device: Latency L0s unlimited, L1 unlimited Device: AtnBtn+ AtnInd+ PwrInd+ Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported- Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ Device: MaxPayload 128 bytes, MaxReadReq 512 bytes Link: Supported Speed 2.5Gb/s, Width x4, ASPM L0s L1, Port 0 Link: Latency L0s unlimited, L1 unlimited Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch- Link: Speed 2.5Gb/s, Width x4 Capabilities: [84] Vendor Specific Information 00: ec 10 68 81 03 00 10 00 01 00 00 02 08 00 00 00 10: 01 c8 00 00 00 00 00 00 04 f0 0f fe 00 00 00 00 20: 00 00 00 00 00 00 00 00 02 01 00 00 43 10 f5 11 30: 00 00 0e fe 40 00 00 00 00 00 00 00 0b 01 00 00 I tried mii-tool (and mii-diag) but I get that : SCIOCGMIIPHY on 
eth0 failed: Operation not supported

Thanks
--
CHARY 'Iksaif' Corentin
[EMAIL PROTECTED] - [EMAIL PROTECTED]
http://xf.iksaif.net
tiacx module fail
Hi,

I have a little problem with a D-Link DWL-650+ PCMCIA card. I downloaded wireless-2.6.git with cogito, compiled it, rebooted my laptop and loaded the acx_pci module, but the card doesn't work. This is what I get from the command line:

# iwlist scanning eth1 Interface doesn't support scanning : Resource temporarily unavailable

I tried 2.6.18-rc5 with tiacx and 2.6.17.13 with the acx module, with the same result. The dmesg output with the tiacx module is:
--
Sep 9 19:19:37 vinTux acx: compiled to use 32bit I/O access. I/O timing issues might occur, such as non-working firmware upload. Report them
Sep 9 19:19:37 vinTux running on a little-endian CPU
Sep 9 19:19:37 vinTux PCI module v0.4.7 initialized, waiting for cards to probe...
Sep 9 19:19:37 vinTux PCI: Enabling device :02:00.0 ( - 0003)
Sep 9 19:19:37 vinTux ACPI: PCI Interrupt :02:00.0[A] - Link [LNKA] - GSI 11 (level, low) - IRQ 11
Sep 9 19:19:37 vinTux PCI: Setting latency timer of device :02:00.0 to 64
Sep 9 19:19:37 vinTux acx: found ACX100-based wireless network card at :02:00.0, irq:11, phymem1:0x4201, phymem2:0x4200, mem1:0xfc994000, mem1_size:4096, mem2:0xfc9e, mem2_size:65536
Sep 9 19:19:37 vinTux initial debug setting is 0x000A
Sep 9 19:19:37 vinTux using IRQ 11
Sep 9 19:19:37 vinTux requesting firmware image 'tiacx100c0D'
Sep 9 19:19:37 vinTux acx: firmware image 'tiacx100c0D' was not provided. Check your hotplug scripts
Sep 9 19:19:37 vinTux requesting firmware image 'tiacx100'
Sep 9 19:19:38 vinTux acx_write_fw (main/combined):0
Sep 9 19:19:38 vinTux acx_validate_fw (main/combined):0
Sep 9 19:19:38 vinTux acx100_s_init_wep: writing WEP options
Sep 9 19:19:38 vinTux initializing max packet templates
Sep 9 19:19:38 vinTux unknown chip and EEPROM version combination (ACX100, v0), don't know how to parse config options yet.
Please report
Sep 9 19:19:38 vinTux get_mask 0x4D82, set_mask 0x
Sep 9 19:19:38 vinTux got sensitivity value 64
Sep 9 19:19:38 vinTux got antenna value 0x08
Sep 9 19:19:38 vinTux got Energy Detect (ED) threshold 112
Sep 9 19:19:38 vinTux got Channel Clear Assessment (CCA) value 13
Sep 9 19:19:38 vinTux got regulatory domain 0x30
Sep 9 19:19:38 vinTux get_mask 0x, set_mask 0x - after update
Sep 9 19:19:38 vinTux new ratevector: 82 84 0B 16 2C
Sep 9 19:19:38 vinTux setting RXconfig to 2010:0FDD
Sep 9 19:19:38 vinTux acx: form factor 0x01 ((mini-)PCI / CardBus), radio type 0x0D (Maxim), EEPROM version 0x00, uploaded firmware 'Rev 1.9.10.0_A4' (0x01030505)
Sep 9 19:19:38 vinTux creating /proc entry driver/acx_eth1
Sep 9 19:19:38 vinTux creating /proc entry driver/acx_eth1_diag
Sep 9 19:19:38 vinTux creating /proc entry driver/acx_eth1_eeprom
Sep 9 19:19:38 vinTux creating /proc entry driver/acx_eth1_phy
Sep 9 19:19:38 vinTux acx v0.4.7: net device eth1, driver compiled against wireless extensions 20
Sep 9 19:19:39 vinTux acx_set_status(1):SCANNING
Sep 9 19:19:39 vinTux updating initial settings on iface activation
Sep 9 19:19:39 vinTux get_mask 0x, set_mask 0x0036EEFC
Sep 9 19:19:39 vinTux important setting has been changed.
Need to update packet templates, too
Sep 9 19:19:39 vinTux updating packet templates
Sep 9 19:19:39 vinTux updating Tx fallback to 1 retries
Sep 9 19:19:39 vinTux updating transmit power: 18 dBm
Sep 9 19:19:39 vinTux updating antenna value: 0x08
Sep 9 19:19:39 vinTux updating Energy Detect (ED) threshold: 112
Sep 9 19:19:39 vinTux updating Channel Clear Assessment (CCA) value: 0x0D
Sep 9 19:19:39 vinTux updating channel to: 1
Sep 9 19:19:39 vinTux updating: enable Tx
Sep 9 19:19:39 vinTux updating: enable Rx on channel: 1
Sep 9 19:19:39 vinTux updating short retry limit: 7, long retry limit: 4
Sep 9 19:19:39 vinTux updating tx MSDU lifetime: 4096
Sep 9 19:19:39 vinTux updating regulatory domain: 0x30
Sep 9 19:19:39 vinTux setting RXconfig to 2010:0FDD
Sep 9 19:19:39 vinTux updating WEP key settings
Sep 9 19:19:39 vinTux setting WEP key 0 as default
Sep 9 19:19:39 vinTux acx_set_status(1):SCANNING
Sep 9 19:19:39 vinTux starting radio scan
Sep 9 19:19:39 vinTux get_mask 0x, set_mask 0x - after update
Sep 9 19:19:39 vinTux get_mask 0x, set_mask 0x0040
Sep 9 19:19:39 vinTux setting RXconfig to 2010:0FDD
Sep 9 19:19:39 vinTux get_mask 0x, set_mask 0x - after update
Sep 9 19:19:39 vinTux get_mask 0x, set_mask 0x0040
Sep 9 19:19:39 vinTux setting RXconfig to 2010:0FDD
Sep 9 19:19:39 vinTux get_mask 0x, set_mask 0x - after update
Sep 9 19:19:40 vinTux acx_i_timer: adev->status=1 (SCANNING)
Sep 9 19:19:40 vinTux continuing scan (1 sec)
Sep 9 19:19:40 vinTux no matching station found in range yet
Sep 9 19:19:40 vinTux acx_set_status(1):SCANNING
Sep 9 19:19:40 vinTux starting radio scan
Sep 9 19:19:41 vinTux rc-scripts: no access points found
Sep 9 19:19:41 vinTux rc-scripts: Couldn't find any access points on eth1
Sep 9 19:19:41 vinTux
Re: [RFC] network namespaces
Dmitry Mishin [EMAIL PROTECTED] writes:

> On Sunday 10 September 2006 07:41, Eric W. Biederman wrote:
>> I certainly agree that we are not at a point where a final decision can be made. A major piece of that is that a layer 2 approach has not shown to be without a performance penalty.
> But it is required. Why to limit possible usages?

Wrong perspective. The point is that we need to dig in and show that there is no measurable penalty for the current cases. Showing that there is little penalty for the advanced configurations is a plus.

The practical question is, do we need to implement the grand unified lookup before we can do this cheaply, or can we implement this without needing that optimization?

To get a perspective, to get a good implementation of the pid namespace I am having to refactor significant parts of the kernel so it uses abstractions that can cleanly express what we are doing. The networking stack is in better shape but there is a lot of it.

>> A practical question. Do the IPs assigned to guests ever get used by anything besides the guest?
> In case of level2 virtualization - no.

Actually that is one of the benefits of a layer 2 implementation you can set up weird things like shared IPs, that various types of fail over scenarios want. My question was really about the layer 3 bind filtering techniques, and how people are using them.

The basic attraction with layer 3 is that you can do a simple implementation, and it will run very fast, and it doesn't need to conflict with the layer 2 work at all. If you can make that layer 3 implementation clean and generally mergeable as well it is worth pursuing.

Eric
problem with DMA when writing driver for rtl8139?
Hi everybody,

my name is Mariusz and I am a newbie to the Linux kernel. For several weeks I have been writing a kernel driver for a network card based on the RTL8139C chip. I am writing this driver for a microcontroller technology course at my university.

I suppose I have a problem with DMA. There is a bit in the Transmit Status Descriptor of the RTL8139C which must be cleared to start a transmit operation and which should afterwards go to the 1 state; according to the RTL8139 specification, this means that the DMA copy from memory to the internal RTL FIFO has finished. The problem is: the RTL doesn't clear this bit.

I use pci_map_single to map the address of the packet buffer to DMA-capable memory, then cpu_to_le32 to get the physical address of this buffer. Do you have any idea what may be going wrong?

Here is my code:
1) rtlmodule contains functions related to initialization
2) rtlopen contains functions related to opening the device and to interrupt handling
3) rtltransmit contains functions related to transmission
and here is the output from the kernel

best regards,
Mariusz Witosz
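For readers following the question, the status bit being described can be sketched in plain C. This is only an illustration under assumptions: the bit positions below are quoted from memory of the RTL8139 datasheet (they match the TxHostOwns, TxUnderrun and TxStatOK constants in the in-tree 8139too driver) and should be checked against your own copy, and the helper names tsd_start_value and tsd_dma_done are made up for this sketch. Other fields of the register, such as the early transmit threshold, are left at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout of an RTL8139 Transmit Status Descriptor (TSD0-TSD3). */
#define TSD_SIZE_MASK 0x00001fffu /* bits 12:0, frame length in bytes */
#define TSD_OWN       0x00002000u /* bit 13, set by the chip once DMA into the FIFO is done */
#define TSD_TUN       0x00004000u /* bit 14, transmit FIFO underrun */
#define TSD_TOK       0x00008000u /* bit 15, transmit OK */

/* Value to write to TSDn to start a transmit: the length, with OWN left at 0. */
static uint32_t tsd_start_value(uint32_t len)
{
	assert(len <= TSD_SIZE_MASK);
	return len & TSD_SIZE_MASK;
}

/* Returns 1 once the chip reports the DMA copy into its FIFO as finished. */
static int tsd_dma_done(uint32_t tsd)
{
	return (tsd & TSD_OWN) != 0;
}
```

In a driver the value from tsd_start_value() would be written to the descriptor register and the register read back later, typically from the interrupt handler. Note also that pci_map_single() itself returns the bus address to hand to the chip; cpu_to_le32() only converts byte order and does not translate addresses.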
RE: [PATCH 2/7] secid reconciliation-v02: Add LSM hooks
> Is there any way you can send patches without format=flowed in the content-type? On two mailers I've tried, the patches get mangled.

Yes. I will send them to you in a few minutes with format=flowed disabled. As soon as you let me know you see them fine, I will resend them to the lists.

Thanks.
[PATCH]:[XFRM] BEET mode
Hi, as part of this email you can find a patch which introduces the BEET mode (Bound End-to-End Tunnel) as specified by the ietf draft at the following link: http://www.ietf.org/internet-drafts/draft-nikander-esp-beet-mode-06.txt A BEET mode Security Associations records two pairs of IP addresses, called inner addresses and outer addresses. The inner addresses are what the applications see. The outer addresses are what appear on the wire. The presented BEET mode allows for transformation having inner family equal to outer family. Signed-off-by: Diego Beltrami [EMAIL PROTECTED] Miika Komu [EMAIL PROTECTED] Herbert Xu [EMAIL PROTECTED] Abhinav Pathak [EMAIL PROTECTED] Jeff Ahrenholz [EMAIL PROTECTED] -- Diego Beltrami diff --git a/include/linux/in.h b/include/linux/in.h index 94f557f..9290d99 100644 --- a/include/linux/in.h +++ b/include/linux/in.h @@ -40,6 +40,7 @@ enum { IPPROTO_ESP = 50,/* Encapsulation Security Payload protocol */ IPPROTO_AH = 51, /* Authentication Header protocol */ + IPPROTO_BEETPH = 94,/* IP option pseudo header for BEET */ IPPROTO_PIM= 103,/* Protocol Independent Multicast */ IPPROTO_COMP = 108,/* Compression Header protocol */ diff --git a/include/linux/ip.h b/include/linux/ip.h index 4b55cf1..e4d8a39 100644 --- a/include/linux/ip.h +++ b/include/linux/ip.h @@ -79,6 +79,8 @@ #defineIPOPT_TS_TSANDADDR 1 /* timestamps and addresses */ #defineIPOPT_TS_PRESPEC3 /* specified modules only */ +#define IPV4_BEET_PHMAXLEN 8 + struct iphdr { #if defined(__LITTLE_ENDIAN_BITFIELD) __u8ihl:4, @@ -122,4 +124,11 @@ struct ip_comp_hdr { __u16 cpi; }; +struct ip_beet_phdr { + __u8 nexthdr; + __u8 hdrlen; + __u8 padlen; + __u8 reserved; +}; + #endif /* _LINUX_IP_H */ diff --git a/include/linux/ipsec.h b/include/linux/ipsec.h index d3c5276..d17a630 100644 --- a/include/linux/ipsec.h +++ b/include/linux/ipsec.h @@ -12,7 +12,8 @@ enum { IPSEC_MODE_ANY = 0,/* We do not support this for SA */ IPSEC_MODE_TRANSPORT= 1, - IPSEC_MODE_TUNNEL = 2 + IPSEC_MODE_TUNNEL = 
2, + IPSEC_MODE_BEET = 3 }; enum { diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h index 46a15c7..6a616de 100644 --- a/include/linux/xfrm.h +++ b/include/linux/xfrm.h @@ -120,7 +120,8 @@ enum #define XFRM_MODE_TRANSPORT 0 #define XFRM_MODE_TUNNEL 1 -#define XFRM_MODE_MAX 2 +#define XFRM_MODE_BEET 2 +#define XFRM_MODE_MAX 3 /* Netlink configuration messages. */ enum { diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig index 8514106..02c5ff7 100644 --- a/net/ipv4/Kconfig +++ b/net/ipv4/Kconfig @@ -432,6 +432,15 @@ config INET_XFRM_MODE_TUNNEL If unsure, say Y. +config INET_XFRM_MODE_BEET + tristate IP: IPsec BEET mode + default y + select XFRM + ---help--- + Support for IPsec BEET mode. + + If unsure, say Y. + config INET_DIAG tristate INET: socket monitoring interface default y diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile index 4878fc5..ad22492 100644 --- a/net/ipv4/Makefile +++ b/net/ipv4/Makefile @@ -26,6 +26,7 @@ obj-$(CONFIG_INET_XFRM_TUNNEL) += xfrm4_ obj-$(CONFIG_INET_TUNNEL) += tunnel4.o obj-$(CONFIG_INET_XFRM_MODE_TRANSPORT) += xfrm4_mode_transport.o obj-$(CONFIG_INET_XFRM_MODE_TUNNEL) += xfrm4_mode_tunnel.o +obj-$(CONFIG_INET_XFRM_MODE_BEET) += xfrm4_mode_beet.o obj-$(CONFIG_IP_PNP) += ipconfig.o obj-$(CONFIG_IP_ROUTE_MULTIPATH_RR) += multipath_rr.o obj-$(CONFIG_IP_ROUTE_MULTIPATH_RANDOM) += multipath_random.o diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c index 1366bc6..9d6f0e7 100644 --- a/net/ipv4/ah4.c +++ b/net/ipv4/ah4.c @@ -253,7 +253,7 @@ static int ah_init_state(struct xfrm_sta goto error; x-props.header_len = XFRM_ALIGN8(sizeof(struct ip_auth_hdr) + ahp-icv_trunc_len); - if (x-props.mode) + if (x-props.mode == XFRM_MODE_TUNNEL) x-props.header_len += sizeof(struct iphdr); x-data = ahp; diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c index fc2f8ce..76722e1 100644 --- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -237,7 +237,8 @@ static int esp_input(struct xfrm_state * *as per draft-ietf-ipsec-udp-encaps-06, *section 3.1.2 */ - if 
(!x-props.mode) + if (x-props.mode == XFRM_MODE_TUNNEL || + x-props.mode == XFRM_MODE_BEET ) skb-ip_summed = CHECKSUM_UNNECESSARY; } @@ -255,17 +256,28 @@ static u32 esp4_get_max_size(struct xfrm { struct esp_data *esp = x-data; u32 blksize = ALIGN(crypto_tfm_alg_blocksize(esp-conf.tfm), 4); + int enclen = 0; - if (x-props.mode) { -
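The inner/outer address idea from the patch description can be illustrated with a toy userspace sketch. It is not taken from the patch: struct beet_sa, struct pkt_addrs, beet_output and beet_input are hypothetical names, the sketch is IPv4 only, and it ignores ESP processing and the BEET pseudo header entirely. It only shows one Security Association recording two address pairs and swapping between them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of one BEET Security Association. */
struct beet_sa {
	uint32_t inner_src, inner_dst; /* what the applications see */
	uint32_t outer_src, outer_dst; /* what appears on the wire */
};

/* Addresses carried by a packet (hypothetical helper type). */
struct pkt_addrs {
	uint32_t src, dst;
};

/* Output: a packet carrying the inner pair leaves with the outer pair.
 * Returns 0 on success, -1 if the packet does not match this SA. */
static int beet_output(const struct beet_sa *sa, struct pkt_addrs *p)
{
	assert(sa && p);
	if (p->src != sa->inner_src || p->dst != sa->inner_dst)
		return -1;
	p->src = sa->outer_src;
	p->dst = sa->outer_dst;
	return 0;
}

/* Input: the reverse mapping, restoring the addresses the application expects. */
static int beet_input(const struct beet_sa *sa, struct pkt_addrs *p)
{
	assert(sa && p);
	if (p->src != sa->outer_src || p->dst != sa->outer_dst)
		return -1;
	p->src = sa->inner_src;
	p->dst = sa->inner_dst;
	return 0;
}
```

A real SA is unidirectional and the kernel keeps this state in struct xfrm_state; the single struct used for both directions here is a simplification for the sake of the example.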
[patch] d80211: fix WEP on big endian cpus
The ICV is transmitted on the network as a 4 byte little endian quantity. WEP encryption needs to swap the bytes before transmission and decryption needs to swap bytes before ICV verification.

Index: wireless-dev/net/d80211/wep.c
===
--- wireless-dev.orig/net/d80211/wep.c	2006-09-10 14:50:52.073583400 +
+++ wireless-dev/net/d80211/wep.c	2006-09-10 14:51:10.146835848 +
@@ -121,9 +121,11 @@
 {
 	struct scatterlist sg;
 	u32 *icv;
+	u32 crc;
 
 	icv = (u32 *)(data + data_len);
-	*icv = ~crc32_le(~0, data, data_len);
+	crc = ~crc32_le(~0, data, data_len);
+	*icv = cpu_to_le32(crc);
 
 	crypto_cipher_setkey(tfm, rc4key, klen);
 	sg.page = virt_to_page(data);
@@ -196,6 +198,7 @@
 	crypto_cipher_decrypt(tfm, &sg, &sg, sg.length);
 
 	crc = ~crc32_le(~0, data, data_len);
+	crc = cpu_to_le32(crc);
 	if (memcmp(&crc, data + data_len, WEP_ICV_LEN) != 0)
 		/* ICV mismatch */
 		return -1;

--
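The byte-order point can be sketched in userspace C. This is a minimal illustration, not the d80211 code: crc32_le_sketch and wep_append_icv are made-up names, and the bitwise CRC routine is an assumed stand-in for the kernel's crc32_le(). The sketch writes the ICV byte by byte, least significant first, which is the effect cpu_to_le32() has in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320), taking a seed like
 * the kernel helper does. Stand-in for crc32_le(), not the kernel code. */
static uint32_t crc32_le_sketch(uint32_t crc, const uint8_t *buf, size_t len)
{
	while (len--) {
		crc ^= *buf++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Append the 4 byte ICV after the payload, least significant byte first,
 * so the bytes on the wire do not depend on the host byte order.
 * The buffer must have room for data_len + 4 bytes. */
static void wep_append_icv(uint8_t *data, size_t data_len)
{
	uint32_t icv;

	assert(data != NULL);
	icv = ~crc32_le_sketch(~0u, data, data_len);

	data[data_len + 0] = (uint8_t)(icv & 0xff);
	data[data_len + 1] = (uint8_t)((icv >> 8) & 0xff);
	data[data_len + 2] = (uint8_t)((icv >> 16) & 0xff);
	data[data_len + 3] = (uint8_t)((icv >> 24) & 0xff);
}
```

On a little-endian host a plain 32-bit store of the CRC already produces these bytes, which is presumably why the bug only showed up on big-endian machines.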
Re: [patch] d80211: fix WEP on big endian cpus
On Sunday 10 September 2006 19:36, David Kimdon wrote:
> The ICV is transmitted on the network as a 4 byte little endian quantity. WEP encryption needs to swap the bytes before transmission and decryption needs to swap bytes before ICV verification.

Holy shit, this fixes the bug I have been hunting for hours! I tested this patch and it works. Thanks very much, you saved me hours. ;)

John (Jiri), please apply this one.

Acked-by: Michael Buesch [EMAIL PROTECTED]

> Index: wireless-dev/net/d80211/wep.c
> ===
> --- wireless-dev.orig/net/d80211/wep.c	2006-09-10 14:50:52.073583400 +
> +++ wireless-dev/net/d80211/wep.c	2006-09-10 14:51:10.146835848 +
> @@ -121,9 +121,11 @@
>  {
>  	struct scatterlist sg;
>  	u32 *icv;
> +	u32 crc;
>  
>  	icv = (u32 *)(data + data_len);
> -	*icv = ~crc32_le(~0, data, data_len);
> +	crc = ~crc32_le(~0, data, data_len);
> +	*icv = cpu_to_le32(crc);
>  
>  	crypto_cipher_setkey(tfm, rc4key, klen);
>  	sg.page = virt_to_page(data);
> @@ -196,6 +198,7 @@
>  	crypto_cipher_decrypt(tfm, &sg, &sg, sg.length);
>  
>  	crc = ~crc32_le(~0, data, data_len);
> +	crc = cpu_to_le32(crc);
>  	if (memcmp(&crc, data + data_len, WEP_ICV_LEN) != 0)
>  		/* ICV mismatch */
>  		return -1;

--
Greetings Michael.
Re: [Devel] Re: [RFC] network namespaces
On Sat, Sep 09, 2006 at 09:41:35PM -0600, Eric W. Biederman wrote:
> Herbert Poetzl [EMAIL PROTECTED] writes:
>> On Sat, Sep 09, 2006 at 11:57:24AM +0400, Dmitry Mishin wrote:
>>> On Friday 08 September 2006 22:11, Herbert Poetzl wrote:
>>>> actually the light-weight ip isolation runs perfectly fine _without_ CAP_NET_ADMIN, as you do not want the guest to be able to mess with the 'configured' ips at all (not to speak of interfaces here)
>>> It was only an example. I'm thinking about how to implement flexible solution, which permits light-weight ip isolation as well as full-fledged network virtualization. Another solution is to split CONFIG_NET_NAMESPACE. Is it good for you?
>>
>> well, I think it would be best to have both, as they are complementary to some degree, and IMHO both, the full virtualization _and_ the isolation will require a separate namespace to work, I also think that limiting the isolation to something very simple (like one IP + network or so) would be acceptable for a start, because especially multi IP or network range checks require a little more effort to get them right ...
>>
>> I do not think that folks would want to recompile their kernel just to get a light-weight guest or a fully virtualized one
>
> I certainly agree that we are not at a point where a final decision can be made. A major piece of that is that a layer 2 approach has not shown to be without a performance penalty.
>
> A practical question. Do the IPs assigned to guests ever get used by anything besides the guest?

only in special setups and for testing routing and general operation of course, i.e. one typical failure scenario is this:

 - 'provider' has a bunch of ips assigned
 - 'host' ip works perfectly
 - 'guest' ip is not routed (by the external router)

in this case, for example, I always suggest to test on the host with a guest ip, simplest example:

  ping -I guest-ip google.com

but for 'normal' operation, the guest ip is reserved for the guests, unless some service like named is shared between guests ...

HTH,
Herbert

> Eric
Re: [Devel] Re: [RFC] network namespaces
On Sun, Sep 10, 2006 at 11:45:35AM +0400, Dmitry Mishin wrote:
> On Sunday 10 September 2006 06:47, Herbert Poetzl wrote:
>> well, I think it would be best to have both, as they are complementary to some degree, and IMHO both, the full virtualization _and_ the isolation will require a separate namespace to work,
>> [snip]
>> I do not think that folks would want to recompile their kernel just to get a light-weight guest or a fully virtualized one
>
> In this case light-weight guest will have unnecessary overhead. For example, instead of using static pointer, we have to find the required common namespace before.

this is only required at 'bind' time, which makes a non measurable fraction of the actual connection usage (unless you keep binding ports over and over without ever using them)

> And there will be no advantages for such guest over full-featured.

the advantage is in the flexibility, simplicity of setup and the basically non-existent overhead on the hot (connection/transfer) part ...

best,
Herbert

> --
> Thanks,
> Dmitry.
RE: [PATCH 2/7] secid reconciliation-v02: Add LSM hooks
On Sun, 10 Sep 2006, Venkat Yekkirala wrote:
>> Is there any way you can send patches without format=flowed in the content-type? On two mailers I've tried, the patches get mangled.
>
> Yes. I will send them to you in a few minutes with format=flowed disabled. As soon as you let me know you see them fine, I will resend them to the lists.

Thanks, they look fine now. Don't worry about re-sending yet.

--
James Morris [EMAIL PROTECTED]
Re: [PATCH 3/7] 8139cp: ring_info removal for the receive path
Jeff Garzik [EMAIL PROTECTED] : Francois Romieu wrote: The ring_info.len field is not used at all. cp_private.rx_skb is turned into an array of sk_buff *. Signed-off-by: Francois Romieu [EMAIL PROTECTED] Need to remove the now-dead struct ring_info. I should have written is not used at all in the receive path. If one wants to remove ring_info completely, something like the patch below is required: 8139cp: ring_info removal for the transmit path handwaving As long as the descriptor fits on a single cacheline, the change should be really free. /handwaving Now ring_info is not used at all. Signed-off-by: Francois Romieu [EMAIL PROTECTED] diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c index bbdaa18..c3b8400 100644 --- a/drivers/net/8139cp.c +++ b/drivers/net/8139cp.c @@ -314,11 +314,6 @@ struct cp_desc { u64 addr; }; -struct ring_info { - struct sk_buff *skb; - u32 len; -}; - struct cp_dma_stats { u64 tx_ok; u64 rx_ok; @@ -360,7 +355,7 @@ struct cp_private { unsignedtx_head cacheline_aligned; unsignedtx_tail; struct cp_desc *tx_ring; - struct ring_infotx_skb[CP_TX_RING_SIZE]; + struct sk_buff *tx_skb[CP_TX_RING_SIZE]; unsignedrx_buf_sz; unsignedwol_enabled : 1; /* Is Wake-on-LAN enabled? 
*/ @@ -721,11 +716,12 @@ static void cp_tx (struct cp_private *cp if (status DescOwn) break; - skb = cp-tx_skb[tx_tail].skb; + skb = cp-tx_skb[tx_tail]; BUG_ON(!skb); pci_unmap_single(cp-pdev, le64_to_cpu(txd-addr), -cp-tx_skb[tx_tail].len, PCI_DMA_TODEVICE); +le32_to_cpu(txd-opts1) 0x, +PCI_DMA_TODEVICE); if (status LastFrag) { if (status (TxError | TxFIFOUnder)) { @@ -752,7 +748,7 @@ static void cp_tx (struct cp_private *cp dev_kfree_skb_irq(skb); } - cp-tx_skb[tx_tail].skb = NULL; + cp-tx_skb[tx_tail] = NULL; tx_tail = NEXT_TX(tx_tail); } @@ -822,8 +818,7 @@ #endif txd-opts1 = cpu_to_le32(flags); wmb(); - cp-tx_skb[entry].skb = skb; - cp-tx_skb[entry].len = len; + cp-tx_skb[entry] = skb; entry = NEXT_TX(entry); } else { struct cp_desc *txd; @@ -839,8 +834,7 @@ #endif first_len = skb_headlen(skb); first_mapping = pci_map_single(cp-pdev, skb-data, first_len, PCI_DMA_TODEVICE); - cp-tx_skb[entry].skb = skb; - cp-tx_skb[entry].len = first_len; + cp-tx_skb[entry] = skb; entry = NEXT_TX(entry); for (frag = 0; frag skb_shinfo(skb)-nr_frags; frag++) { @@ -881,8 +875,7 @@ #endif txd-opts1 = cpu_to_le32(ctrl); wmb(); - cp-tx_skb[entry].skb = skb; - cp-tx_skb[entry].len = len; + cp-tx_skb[entry] = skb; entry = NEXT_TX(entry); } @@ -1159,12 +1152,13 @@ static void cp_clean_rings (struct cp_pr } for (i = 0; i CP_TX_RING_SIZE; i++) { - if (cp-tx_skb[i].skb) { - struct sk_buff *skb = cp-tx_skb[i].skb; + if (cp-tx_skb[i]) { + struct sk_buff *skb = cp-tx_skb[i]; desc = cp-tx_ring + i; pci_unmap_single(cp-pdev, le64_to_cpu(desc-addr), -cp-tx_skb[i].len, PCI_DMA_TODEVICE); +le32_to_cpu(desc-opts1) 0x, +PCI_DMA_TODEVICE); if (le32_to_cpu(desc-opts1) LastFrag) dev_kfree_skb(skb); cp-net_stats.tx_dropped++; @@ -1175,7 +1169,7 @@ static void cp_clean_rings (struct cp_pr memset(cp-tx_ring, 0, sizeof(struct cp_desc) * CP_TX_RING_SIZE); memset(cp-rx_skb, 0, sizeof(struct sk_buff *) * CP_RX_RING_SIZE); - memset(cp-tx_skb, 0, sizeof(struct ring_info) * CP_TX_RING_SIZE); + memset(cp-tx_skb, 
0, sizeof(struct sk_buff *) * CP_TX_RING_SIZE); } static void cp_free_rings (struct cp_private *cp)
Re: [PATCH] VIOC: New Network Device Driver
On Friday 15 September 2006 02:15, Misha Tomushev wrote:
> VIOC Device Driver provides a standard device interface to the internal fabric interconnected network used on servers designed and built by Fabric 7 Systems.
>
> The patch can be found at ftp.fabric7.com/VIOC.

We recently had a discussion about tx descriptor cleanup in general. It would probably be more efficient to call vnic_clean_txq from the vioc_rx_poll() function. To do that, your tx interrupt handler should disable the tx interrupt line and call netif_rx_schedule, like you do for the receive interrupts.

A few comments on coding style:

- Lots of macros like your GET_VNIC_TX_BUFADDR_LO: they seem overly complicated. Maybe replace the users with something simpler, e.g. instead of 'if (GET_VNIC_RXC_FLAGGED(rxcd) != VNIC_RXC_FLAGGED_HW_W)', do 'if (vnic_rxc_word3(rxcd) & VNIC_RXC_FLAGGED_HW_W)'.

- whitespace: please follow the style in Documentation/CodingStyle, use tabs for indentation instead of spaces, run everything through 'lindent' or 'indent -kr -i8' once to get spaces in the right places.

- unnecessary typecasts: try to avoid casts in the C source, in particular from or to 'void *', that is done by C automatically. When you do a macro like GETRELADDR(), make it return the right type so you don't need a cast.

- macros: wherever possible, use an inline function instead

- printk: use dev_info/dev_dbg/... instead of plain printk, when you have a pointer to a device.

- extern declarations: belong in header files, not C files. This will guarantee that the definition matches the declaration.

- static forward declarations: get rid of them by moving the static functions into the right order. This also makes reading easier, since you know static functions are only called from below.

- vmalloc: try to avoid. use it only when allocating more than a few pages.

	Arnd
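Two of the style points above (inline functions instead of accessor macros, and no casts from 'void *') can be illustrated with a small userspace sketch. Everything in it is hypothetical: struct rxc_desc, RXC_FLAGGED_HW_W and the helper names are invented for the example and do not reflect the real VIOC descriptor layout, which is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical receive completion descriptor, four 32-bit words. */
struct rxc_desc {
	uint32_t word[4];
};

#define RXC_FLAGGED_HW_W 0x00000001u /* made-up flag bit for this sketch */

/* Typed accessor instead of a GET_...() macro: the compiler checks
 * the argument type and the return type is explicit. */
static inline uint32_t rxc_word3(const struct rxc_desc *d)
{
	return d->word[3];
}

/* Small predicate built on the accessor; returns 1 if the flag is set. */
static inline int rxc_hw_owned(const struct rxc_desc *d)
{
	return (rxc_word3(d) & RXC_FLAGGED_HW_W) != 0;
}

/* calloc() returns 'void *', which converts implicitly in C, so the
 * assignment needs no cast. The caller releases the ring with free(). */
static struct rxc_desc *rxc_ring_alloc(size_t entries)
{
	struct rxc_desc *ring;

	assert(entries > 0);
	ring = calloc(entries, sizeof(*ring));
	return ring;
}
```

Because the static helpers are defined before their first use, no forward declarations are needed either, which is the ordering the last-but-one comment asks for.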
Re: r8168, 2.6.17 et r8169.c
Corentin CHARY [EMAIL PROTECTED] :
[no need to quote the original part in french]
> I have a Asus Laptop A6JC, and a r8168 network card. r1000 driver from realtek works, but I want to use r8169.c. I first tried with http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.17-rc6/r8169/ patchs, but it doesn't Work (eth0 is here, I can use ifconfig and route, but ping or dhcpcd doesn't work)

If you still have it at hand, turn NET_IP_ALIGN into NET_IP_ALIGN + 6 and it should work (at least it has been reported to by Boris Zhmurov).

[...]
> Now I tried with http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.18-rc5/r8169/
> I changed :
> line 1751
> - request_irq(dev->irq, rtl8169_interrupt, IRQF_SHARED, dev->name, dev);
> + request_irq(dev->irq, rtl8169_interrupt, SA_SHIRQ, dev->name, dev);
> line 2224
> - u32 mss = skb_shinfo(skb)->gso_size;
> + u32 mss = skb_shinfo(skb)->tso_size;
> to get it compile on 2.6.17
> I can load the module, eth0 is here, I can play with ifconfig. But i can't ping other machines on my network. dhcpd doesn't work too.
> dmesg output :
[...]

Full story please, no cut.

[...]
> I tried mii-tool (and mii-diag) but I get that :
> SCIOCGMIIPHY on eth0 failed: Operation not supported

Huh ? Are you really sure that you have applied the 11 patches and built the right file ?

--
Ueimor
Re: [patch] d80211: fix WEP on big endian cpus
Huh. I assumed crc32_le gave us the result in little endian, but I guess that's wrong. I've attached another patch which basically does the same thing but adds some sparse bitwise annotations to make things clear. Also, it has a signed-off-by line. :)

d80211: fix WEP on big endian cpus

This patch fixes the endian issues with the ICV in WEP, as pointed out by David Kimdon [EMAIL PROTECTED], and uses __le32 where appropriate to make things clear.

Signed-off-by: Michael Wu [EMAIL PROTECTED]
---
 net/d80211/wep.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/d80211/wep.c b/net/d80211/wep.c
index c3e4728..22c2e53 100644
--- a/net/d80211/wep.c
+++ b/net/d80211/wep.c
@@ -120,10 +120,10 @@ void ieee80211_wep_encrypt_data(struct c
 				size_t klen, u8 *data, size_t data_len)
 {
 	struct scatterlist sg;
-	u32 *icv;
+	__le32 *icv;
 
-	icv = (u32 *)(data + data_len);
-	*icv = ~crc32_le(~0, data, data_len);
+	icv = (__le32 *)(data + data_len);
+	*icv = cpu_to_le32(~crc32_le(~0, data, data_len));
 
 	crypto_cipher_setkey(tfm, rc4key, klen);
 	sg.page = virt_to_page(data);
@@ -187,7 +187,7 @@ int ieee80211_wep_decrypt_data(struct cr
 				size_t klen, u8 *data, size_t data_len)
 {
 	struct scatterlist sg;
-	u32 crc;
+	__le32 crc;
 
 	crypto_cipher_setkey(tfm, rc4key, klen);
 	sg.page = virt_to_page(data);
@@ -195,7 +195,7 @@ int ieee80211_wep_decrypt_data(struct cr
 	sg.length = data_len + WEP_ICV_LEN;
 	crypto_cipher_decrypt(tfm, &sg, &sg, sg.length);
 
-	crc = ~crc32_le(~0, data, data_len);
+	crc = cpu_to_le32(~crc32_le(~0, data, data_len));
 	if (memcmp(&crc, data + data_len, WEP_ICV_LEN) != 0)
 		/* ICV mismatch */
 		return -1;
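Why the byte order matters for the ICV can be shown with a small userspace sketch. It assumes a plain software CRC-32 (reflected polynomial 0xEDB88320) in place of the kernel's crc32_le(), and the helper names crc32_le_sw(), put_le32() and append_icv() are hypothetical. The point is only that the ICV bytes on the wire must be least-significant byte first whatever the host CPU is, which is what cpu_to_le32() guarantees in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320 (stand-in for crc32_le()). */
static uint32_t crc32_le_sw(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0u);
	}
	return crc;
}

/* Store a 32-bit value least-significant byte first, on any host. */
static void put_le32(uint8_t *dst, uint32_t v)
{
	dst[0] = (uint8_t)(v & 0xff);
	dst[1] = (uint8_t)((v >> 8) & 0xff);
	dst[2] = (uint8_t)((v >> 16) & 0xff);
	dst[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Append the 4-byte ICV after len bytes of data; buf needs len + 4 bytes. */
static void append_icv(uint8_t *buf, size_t len)
{
	assert(buf != NULL);
	put_le32(buf + len, ~crc32_le_sw(~0u, buf, len));
}
```

Writing the CPU-order value straight through a `u32 *`, as the old code did, gives the same bytes only on little-endian hosts.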
Re: Realtek r1000 driver
Paolo [EMAIL PROTECTED] :
[...]
> Realtek offers the GPL'd driver r1000, v1.04 at present, but seems it's not compatible with current 2.6.x kernel at the module param interface.

It is probably not compatible with the kernel developers at the code review interface either. :o)

> I've pushed Realtek src in-tree after applying the attached patches. The r1000 driver now compiles and WFM fine for .16.28, .17.13, .18-rc6-git3. Hope this helps make them into .18.

Nope: upcoming 2.6.18 has been in feature freeze for ages. The current r8169 maintainer has a patch which merges almost everything relevant for the 8136 (almost = the device needs to be massaged through mii-tool to correctly negotiate the link).

-- 
Ueimor
[PATCH 1/3] [IrDA] af_irda.c cleanups
Hi Dave,

We lock the socket when both releasing and getting a disconnected notification. In the latter case, we also set the socket as orphan. This fixes a potential kernel bug that can be triggered when we get the disconnection notification before closing the socket.

Signed-off-by: Samuel Ortiz [EMAIL PROTECTED]
---
 net/irda/af_irda.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index 17699ee..7b7cd5b 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -132,13 +132,14 @@ static void irda_disconnect_indication(v
 
 	/* Prevent race conditions with irda_release() and irda_shutdown() */
 	if (!sock_flag(sk, SOCK_DEAD) && sk->sk_state != TCP_CLOSE) {
+		lock_sock(sk);
 		sk->sk_state = TCP_CLOSE;
 		sk->sk_err = ECONNRESET;
 		sk->sk_shutdown |= SEND_SHUTDOWN;
 
 		sk->sk_state_change(sk);
-		/* Uh-oh... Should use sock_orphan ? */
-		sock_set_flag(sk, SOCK_DEAD);
+		sock_orphan(sk);
+		release_sock(sk);
 
 		/* Close our TSAP.
 		 * If we leave it open, IrLMP put it back into the list of
@@ -1212,6 +1213,7 @@ static int irda_release(struct socket *s
 	if (sk == NULL)
 		return 0;
 
+	lock_sock(sk);
 	sk->sk_state = TCP_CLOSE;
 	sk->sk_shutdown |= SEND_SHUTDOWN;
 	sk->sk_state_change(sk);
@@ -1221,6 +1223,7 @@ static int irda_release(struct socket *s
 
 	sock_orphan(sk);
 	sock->sk = NULL;
+	release_sock(sk);
 
 	/* Purge queues (see sock_init_data()) */
 	skb_queue_purge(&sk->sk_receive_queue);
@@ -1353,6 +1356,7 @@ static int irda_recvmsg_dgram(struct kio
 	IRDA_DEBUG(4, "%s()\n", __FUNCTION__);
 
 	IRDA_ASSERT(self != NULL, return -1;);
+	IRDA_ASSERT(!sock_error(sk), return -1;);
 
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
 				flags & MSG_DONTWAIT, &err);
@@ -1405,6 +1409,7 @@ static int irda_recvmsg_stream(struct ki
 	IRDA_DEBUG(3, "%s()\n", __FUNCTION__);
 
 	IRDA_ASSERT(self != NULL, return -1;);
+	IRDA_ASSERT(!sock_error(sk), return -1;);
 
 	if (sock->flags & __SO_ACCEPTCON)
 		return(-EINVAL);
-- 
1.4.1.1
[PATCH 3/3] [IrDA] Memory allocations cleanups
This patch replaces the bunch of arbitrary 64 and 128 bytes alloc_skb() calls with more accurate allocation sizes. Signed-off-by: Samuel Ortiz [EMAIL PROTECTED] --- include/net/irda/irlan_common.h | 10 ++- include/net/irda/irlap_frame.h | 31 +++- include/net/irda/irlmp.h|2 + net/irda/af_irda.c |3 +- net/irda/ircomm/ircomm_lmp.c|4 +-- net/irda/iriap.c|9 +++--- net/irda/iriap_event.c |2 + net/irda/irlan/irlan_common.c | 46 -- net/irda/irlan/irlan_provider.c | 12 ++-- net/irda/irlap_frame.c | 59 +-- net/irda/irlmp.c|2 + net/irda/irttp.c| 14 + 12 files changed, 136 insertions(+), 58 deletions(-) diff --git a/include/net/irda/irlan_common.h b/include/net/irda/irlan_common.h index 1c73bdb..9592c37 100644 --- a/include/net/irda/irlan_common.h +++ b/include/net/irda/irlan_common.h @@ -98,7 +98,15 @@ #define IRLAN_BYTE 0 #define IRLAN_SHORT 1 #define IRLAN_ARRAY 2 -#define IRLAN_MAX_HEADER (TTP_HEADER+LMP_HEADER+LAP_MAX_HEADER) +/* IrLAN sits on top if IrTTP */ +#define IRLAN_MAX_HEADER (TTP_HEADER+LMP_HEADER) +/* 1 byte for the command code and 1 byte for the parameter count */ +#define IRLAN_CMD_HEADER 2 + +#define IRLAN_STRING_PARAMETER_LEN(name, value) (1 + strlen((name)) + 2 \ + + strlen ((value))) +#define IRLAN_BYTE_PARAMETER_LEN(name) (1 + strlen((name)) + 2 + 1) +#define IRLAN_SHORT_PARAMETER_LEN(name) (1 + strlen((name)) + 2 + 2) /* * IrLAN client diff --git a/include/net/irda/irlap_frame.h b/include/net/irda/irlap_frame.h index 3452ae2..9dd54a5 100644 --- a/include/net/irda/irlap_frame.h +++ b/include/net/irda/irlap_frame.h @@ -74,6 +74,19 @@ #define RSP_FRAME 0x00 #define PF_BIT0x10 /* Poll/final bit */ +/* Some IrLAP field lengths */ +/* + * Only baud rate triplet is 4 bytes (PV can be 2 bytes). + * All others params (7) are 3 bytes, so that's 7*3 + 1*4 bytes. 
+ */ +#define IRLAP_NEGOCIATION_PARAMS_LEN 25 +#define IRLAP_DISCOVERY_INFO_LEN 32 + +struct disc_frame { + __u8 caddr; /* Connection address */ + __u8 control; +} IRDA_PACK; + struct xid_frame { __u8 caddr; /* Connection address */ __u8 control; @@ -95,11 +108,25 @@ struct test_frame { struct ua_frame { __u8 caddr; __u8 control; - __u32 saddr; /* Source device address */ __u32 daddr; /* Dest device address */ } IRDA_PACK; - + +struct dm_frame { + __u8 caddr; /* Connection address */ + __u8 control; +} IRDA_PACK; + +struct rd_frame { + __u8 caddr; /* Connection address */ + __u8 control; +} IRDA_PACK; + +struct rr_frame { + __u8 caddr; /* Connection address */ + __u8 control; +} IRDA_PACK; + struct i_frame { __u8 caddr; __u8 control; diff --git a/include/net/irda/irlmp.h b/include/net/irda/irlmp.h index 11ecfa5..e212b9b 100644 --- a/include/net/irda/irlmp.h +++ b/include/net/irda/irlmp.h @@ -48,7 +48,7 @@ #define LSAP_CONNLESS 0x70 /* Connection #define DEV_ADDR_ANY 0x #define LMP_HEADER 2/* Dest LSAP + Source LSAP */ -#define LMP_CONTROL_HEADER 4 +#define LMP_CONTROL_HEADER 4/* LMP_HEADER + opcode + parameter */ #define LMP_PID_HEADER 1/* Used by Ultra */ #define LMP_MAX_HEADER (LMP_CONTROL_HEADER+LAP_MAX_HEADER) diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c index 7b7cd5b..7e1aea8 100644 --- a/net/irda/af_irda.c +++ b/net/irda/af_irda.c @@ -309,7 +309,8 @@ static void irda_connect_response(struct IRDA_ASSERT(self != NULL, return;); - skb = alloc_skb(64, GFP_ATOMIC); + skb = alloc_skb(TTP_MAX_HEADER + TTP_SAR_HEADER, + GFP_ATOMIC); if (skb == NULL) { IRDA_DEBUG(0, %s() Unable to allocate sk_buff!\n, __FUNCTION__); diff --git a/net/irda/ircomm/ircomm_lmp.c b/net/irda/ircomm/ircomm_lmp.c index 959874b..c8e0d89 100644 --- a/net/irda/ircomm/ircomm_lmp.c +++ b/net/irda/ircomm/ircomm_lmp.c @@ -81,7 +81,7 @@ static int ircomm_lmp_connect_response(s /* Any userdata supplied? 
*/ if (userdata == NULL) { - tx_skb = alloc_skb(64, GFP_ATOMIC); + tx_skb = alloc_skb(LMP_MAX_HEADER, GFP_ATOMIC); if (!tx_skb) return -ENOMEM; @@ -115,7 +115,7 @@ static int ircomm_lmp_disconnect_request IRDA_DEBUG(0, %s()\n, __FUNCTION__ ); if (!userdata) { - tx_skb = alloc_skb(64, GFP_ATOMIC); + tx_skb = alloc_skb(LMP_MAX_HEADER, GFP_ATOMIC); if (!tx_skb) return -ENOMEM; diff --git a/net/irda/iriap.c b/net/irda/iriap.c index 61128aa..415cf4e 100644
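The sizing idea behind the new IRLAN_*_PARAMETER_LEN macros in this patch can be sketched in userspace. The macro bodies are copied from the irlan_common.h hunk above; the wrapper function, the example parameter names and values, and irlan_example_payload_len() are assumptions added for illustration, and the protocol headers (IRLAN_MAX_HEADER) are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 1 byte for the command code and 1 byte for the parameter count */
#define IRLAN_CMD_HEADER 2

/* 1 byte name length + name + 2 bytes value length + value */
#define IRLAN_STRING_PARAMETER_LEN(name, value) (1 + strlen((name)) + 2 \
						+ strlen((value)))
#define IRLAN_BYTE_PARAMETER_LEN(name) (1 + strlen((name)) + 2 + 1)
#define IRLAN_SHORT_PARAMETER_LEN(name) (1 + strlen((name)) + 2 + 2)

/* Hypothetical typed wrapper so NULL arguments are caught early. */
static size_t irlan_string_param_len(const char *name, const char *value)
{
	assert(name != NULL && value != NULL);
	return IRLAN_STRING_PARAMETER_LEN(name, value);
}

/* Payload size of an example command carrying two string parameters. */
static size_t irlan_example_payload_len(void)
{
	return IRLAN_CMD_HEADER +
	       irlan_string_param_len("MEDIA", "802.3") +
	       irlan_string_param_len("ACCESS_TYPE", "DIRECT");
}
```

With sizes computed this way, an alloc_skb() call can ask for the headers plus exactly this payload instead of an arbitrary 64 or 128 bytes.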
[PATCH 2/3] [IrDA] irda-usb needs firmware loader
With the inclusion of the stir421x code, we now need to select FW_LOADER whenever we try to build the irda-usb code.

Signed-off-by: Samuel Ortiz [EMAIL PROTECTED]
---
 drivers/net/irda/Kconfig |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/irda/Kconfig b/drivers/net/irda/Kconfig
index e9e6d99..7c8ccc0 100644
--- a/drivers/net/irda/Kconfig
+++ b/drivers/net/irda/Kconfig
@@ -287,6 +287,7 @@ comment "FIR device drivers"
 config USB_IRDA
 	tristate "IrDA USB dongles"
 	depends on IRDA && USB
+	select FW_LOADER
 	---help---
 	  Say Y here if you want to build support for the USB IrDA FIR Dongle
 	  device driver. To compile it as a module, choose M here: the module
-- 
1.4.1.1
Re: [PATCH]:[XFRM] BEET mode
Hello.

In article [EMAIL PROTECTED] (at Sun, 10 Sep 2006 20:10:06 +0300), Diego Beltrami [EMAIL PROTECTED] says:

> as part of this email you can find a patch which introduces the BEET mode (Bound End-to-End Tunnel) as specified by the ietf draft at the following link:
:
> Signed-off-by: Diego Beltrami [EMAIL PROTECTED]
> Miika Komu [EMAIL PROTECTED]
> Herbert Xu [EMAIL PROTECTED]
> Abhinav Pathak [EMAIL PROTECTED]
> Jeff Ahrenholz [EMAIL PROTECTED]

Please put one Signed-off-by: per person.

> diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h
> index 46a15c7..6a616de 100644
> --- a/include/linux/xfrm.h
> +++ b/include/linux/xfrm.h
> @@ -120,7 +120,8 @@ enum
>  #define XFRM_MODE_TRANSPORT 0
>  #define XFRM_MODE_TUNNEL 1
> -#define XFRM_MODE_MAX 2
> +#define XFRM_MODE_BEET 2
> +#define XFRM_MODE_MAX 3
>
>  /* Netlink configuration messages. */
>  enum {

This clearly indicates that this patch conflicts with current net-2.6.19. Please rebase to the current tree.

--yoshfuji
Re: [PATCH 1/3] sky2: tx pause bug fix
Stephen, After some serious testing, this patch seems to fix the lockup issue completely. I manually applied these changes against the 2.6.17.13 release. - Original Message - From: Stephen Hemminger [EMAIL PROTECTED] To: Jeff Garzik [EMAIL PROTECTED] Cc: netdev@vger.kernel.org Sent: Thursday, September 07, 2006 5:44 AM Subject: [PATCH 1/3] sky2: tx pause bug fix Fix problems with transmit pause frames. The driver was telling the GMAC to flush (not process) pause frames. Manually disabling pause wasn't working because of problems in the setup. This maybe the cause of the lockup under load. http://bugzilla.kernel.org/show_bug.cgi?id=6839 Patch against netdev-2.6 git tree Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- drivers/net/sky2.c | 123 ++-- drivers/net/sky2.h |2 - 2 files changed, 43 insertions(+), 82 deletions(-) 1419c8ab49f8fd56bad1ac0d3dcbaf830cd5a5d6 diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c index 7ce0663..a3a4ab2 100644 --- a/drivers/net/sky2.c +++ b/drivers/net/sky2.c @@ -289,7 +289,7 @@ static void sky2_gmac_reset(struct sky2_ static void sky2_phy_init(struct sky2_hw *hw, unsigned port) { struct sky2_port *sky2 = netdev_priv(hw-dev[port]); - u16 ctrl, ct1000, adv, pg, ledctrl, ledover; + u16 ctrl, ct1000, adv, pg, ledctrl, ledover, reg; if (sky2-autoneg == AUTONEG_ENABLE !(hw-chip_id == CHIP_ID_YUKON_XL || hw-chip_id == CHIP_ID_YUKON_EC_U)) { @@ -358,6 +358,7 @@ static void sky2_phy_init(struct sky2_hw ctrl = 0; ct1000 = 0; adv = PHY_AN_CSMA; + reg = 0; if (sky2-autoneg == AUTONEG_ENABLE) { if (hw-copper) { @@ -390,21 +391,46 @@ static void sky2_phy_init(struct sky2_hw /* forced speed/duplex settings */ ct1000 = PHY_M_1000C_MSE; - if (sky2-duplex == DUPLEX_FULL) - ctrl |= PHY_CT_DUP_MD; + /* Disable auto update for duplex flow control and speed */ + reg |= GM_GPCR_AU_ALL_DIS; switch (sky2-speed) { case SPEED_1000: ctrl |= PHY_CT_SP1000; + reg |= GM_GPCR_SPEED_1000; break; case SPEED_100: ctrl |= PHY_CT_SP100; + reg |= 
GM_GPCR_SPEED_100; break; } + if (sky2-duplex == DUPLEX_FULL) { + reg |= GM_GPCR_DUP_FULL; + ctrl |= PHY_CT_DUP_MD; + } else if (sky2-speed != SPEED_1000 hw-chip_id != CHIP_ID_YUKON_EC_U) { + /* Turn off flow control for 10/100mbps */ + sky2-rx_pause = 0; + sky2-tx_pause = 0; + } + + if (!sky2-rx_pause) + reg |= GM_GPCR_FC_RX_DIS; + + if (!sky2-tx_pause) + reg |= GM_GPCR_FC_TX_DIS; + + /* Forward pause packets to GMAC? */ + if (sky2-tx_pause || sky2-rx_pause) + sky2_write8(hw, SK_REG(port, GMAC_CTRL), GMC_PAUSE_ON); + else + sky2_write8(hw, SK_REG(port, GMAC_CTRL), GMC_PAUSE_OFF); + ctrl |= PHY_CT_RESET; } + gma_write16(hw, port, GM_GP_CTRL, reg); + if (hw-chip_id != CHIP_ID_YUKON_FE) gm_phy_write(hw, port, PHY_MARV_1000T_CTRL, ct1000); @@ -508,6 +534,7 @@ static void sky2_phy_init(struct sky2_hw gm_phy_write(hw, port, PHY_MARV_LED_OVER, ledover); } + /* Enable phy interrupt on auto-negotiation complete (or link up) */ if (sky2-autoneg == AUTONEG_ENABLE) gm_phy_write(hw, port, PHY_MARV_INT_MASK, PHY_M_IS_AN_COMPL); @@ -570,49 +597,11 @@ static void sky2_mac_init(struct sky2_hw gm_phy_read(hw, 1, PHY_MARV_INT_MASK) != 0); } - if (sky2-autoneg == AUTONEG_DISABLE) { - reg = gma_read16(hw, port, GM_GP_CTRL); - reg |= GM_GPCR_AU_ALL_DIS; - gma_write16(hw, port, GM_GP_CTRL, reg); - gma_read16(hw, port, GM_GP_CTRL); - - switch (sky2-speed) { - case SPEED_1000: - reg = ~GM_GPCR_SPEED_100; - reg |= GM_GPCR_SPEED_1000; - break; - case SPEED_100: - reg = ~GM_GPCR_SPEED_1000; - reg |= GM_GPCR_SPEED_100; - break; - case SPEED_10: - reg = ~(GM_GPCR_SPEED_1000 | GM_GPCR_SPEED_100); - break; - } - - if (sky2-duplex == DUPLEX_FULL) - reg |= GM_GPCR_DUP_FULL; - - /* turn off pause in 10/100mbps half duplex */ - else if (sky2-speed != SPEED_1000 - hw-chip_id != CHIP_ID_YUKON_EC_U) - sky2-tx_pause = sky2-rx_pause = 0; - } else - reg = GM_GPCR_SPEED_1000 | GM_GPCR_SPEED_100 | GM_GPCR_DUP_FULL; - - if (!sky2-tx_pause !sky2-rx_pause) { - sky2_write32(hw, SK_REG(port, GMAC_CTRL), 
GMC_PAUSE_OFF); - reg |= - GM_GPCR_FC_TX_DIS | GM_GPCR_FC_RX_DIS | GM_GPCR_AU_FCT_DIS; - } else if (sky2-tx_pause !sky2-rx_pause) { - /* disable Rx flow-control */ - reg |= GM_GPCR_FC_RX_DIS | GM_GPCR_AU_FCT_DIS; - } - - gma_write16(hw, port, GM_GP_CTRL, reg); - sky2_read16(hw, SK_REG(port, GMAC_IRQ_SRC)); + /* Enable Transmit FIFO Underrun */ + sky2_write8(hw, SK_REG(port, GMAC_IRQ_MSK), GMAC_DEF_MSK); + spin_lock_bh(sky2-phy_lock); sky2_phy_init(hw, port); spin_unlock_bh(sky2-phy_lock); @@ -1529,40 +1518,10 @@ static void sky2_link_up(struct sky2_por unsigned port = sky2-port; u16 reg; - /* Enable Transmit FIFO Underrun */ - sky2_write8(hw, SK_REG(port, GMAC_IRQ_MSK), GMAC_DEF_MSK); - - reg = gma_read16(hw, port, GM_GP_CTRL); - if (sky2-autoneg == AUTONEG_DISABLE) { - reg |= GM_GPCR_AU_ALL_DIS; - - /* Is write/read necessary? Copied from sky2_mac_init */ -
Re: e1000 Detected Tx Unit Hang
Jesse, testing without NAPI, will see how it behaves.

Paul Aviles

----- Original Message -----
From: Jesse Brandeburg [EMAIL PROTECTED]
To: Paul Aviles [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Sent: Tuesday, September 05, 2006 12:09 PM
Subject: Re: e1000 Detected Tx Unit Hang

On 9/3/06, Paul Aviles [EMAIL PROTECTED] wrote:
> Hey Jesse, thanks for your reply.

no problem,

> Here is the stuff on /procs. The weird part is that I have several other identical systems and only one is affected. Today I moved the hard drive to another similar system and I am not seeing the problem so I am wondering if is something maybe wrong with the card eeprom? Is there a way to check that?

I doubt it is an eeprom problem. You can dump the eeproms with ethtool -e eth0 from both machines and compare them. Odd that only one system is having the problem. Could it be that the hardware on that box is having issues? Are you sure the machines are running the same bios version with the same settings? Any overclocking?

> cat /proc/interrupts
>        CPU0   CPU1
>  16:  70540      0  IO-APIC-level  uhci_hcd:usb4, eth0

this could contribute to your problem, were you able to test without NAPI?

Jesse
Re: [PATCH] VIOC: New Network Device Driver
Arnd Bergmann [EMAIL PROTECTED] :
[...]
> A few comments on coding style:

Add:
- use netdev_priv()
- use DMA_{32/64}_BIT_MASK in place of private #define
- turn some define into enum ?

-- 
Ueimor
Re: TG3 data corruption (TSO ?)
On Sun, 2006-09-10 at 22:33 -0700, Michael Chan wrote:
> Benjamin Herrenschmidt wrote:
> > I've done:
> >
> > #define tw32_rx_mbox(reg, val) do { wmb(); tp->write32_rx_mbox(tp, reg, val); } while(0)
> > #define tw32_tx_mbox(reg, val) do { wmb(); tp->write32_tx_mbox(tp, reg, val); } while(0)
>
> That should do it.
>
> I think we need those tcpdump after all. Can you send it to me?

Looks like adding a sync to writel does fix it though... I'm trying to figure out which specific writel in the driver makes a difference. I'll then look into slicing those tcpdumps.

Ben.
Re: netdevice name corruption is still present in 2.6.18-rc6-mm1
Nick Orlov wrote:
> On Fri, Sep 08, 2006 at 11:29:39PM -0400, Nick Orlov wrote:
> > I would like to confirm that issue with netdevice name corruption is still present in 2.6.18-rc6-mm1 and extremely easy to reproduce (at least on my system) with 100% hit rate.
> >
> > All I have to do is 'sudo /etc/init.d/networking stop'. And here we go:
> >
> > Sep 8 22:50:11 nickolas kernel: [events/1:7]: Changing netdevice name from [ath0] to [\200^C^Bб\206]
>
> Confirmed: Patrick's patch fixes the issue for me.
> (http://marc.theaimsgroup.com/?l=linux-kernel&m=115777959918268&w=2)

Thanks Nick. Dave, please apply the attached patch to net-2.6.19, it fixes the netdevice name corruption reported by multiple people.

[RTNETLINK]: Fix netdevice name corruption

When changing a device by ifindex without including a IFLA_IFNAME attribute, the ifname variable contains random garbage and is used to change the device name.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit bc3417f679c035e4296cd34f6a55d6b9215764fc
tree e43f52402d79560cbed73a769f4def3e761e7a03
parent 6ddbd02eb61532f9af4f28912a09717ab8c71d8a
author Patrick McHardy [EMAIL PROTECTED] Sat, 09 Sep 2006 16:18:12 +0200
committer Patrick McHardy [EMAIL PROTECTED] Sat, 09 Sep 2006 16:18:12 +0200

 net/core/rtnetlink.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 63b882a..d8e25e0 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -394,6 +394,8 @@ static int rtnl_setlink(struct sk_buff *
 
 	if (tb[IFLA_IFNAME])
 		nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
+	else
+		ifname[0] = '\0';
 
 	err = -EINVAL;
 	ifm = nlmsg_data(nlh);
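The bug class fixed here, an on-stack name buffer that is only filled when the attribute is present, can be reproduced in a few lines of userspace C. Everything below is a hypothetical stand-in: get_requested_name() plays the role of the nla_strlcpy() branch, a NULL `attr` models a missing IFLA_IFNAME, and rename_requested() assumes the caller only renames when the buffer is non-empty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IFNAMSIZ 16	/* same size as the kernel constant */

/* Fill ifname from the attribute, or mark it empty when there is none. */
static void get_requested_name(char ifname[IFNAMSIZ], const char *attr)
{
	assert(ifname != NULL);
	if (attr) {
		strncpy(ifname, attr, IFNAMSIZ - 1);
		ifname[IFNAMSIZ - 1] = '\0';
	} else {
		/* The fix: never leave whatever garbage was on the stack. */
		ifname[0] = '\0';
	}
}

/* A rename is wanted only for a non-empty name that differs from the current one. */
static int rename_requested(const char *ifname, const char *current)
{
	return ifname[0] != '\0' && strcmp(ifname, current) != 0;
}
```

Without the `else` branch, rename_requested() would be looking at uninitialized bytes, which matches the "[ath0] to [\200^C^B...]" log line quoted above.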
[PATCH] VIOC: New Network Device Driver
VIOC Device Driver provides a standard device interface to the internal fabric interconnected network used on servers designed and built by Fabric 7 Systems. The patch can be found at ftp.fabric7.com/VIOC.

Misha Tomushev
Re: [PATCH] VIOC: New Network Device Driver
On Thu, 2006-09-14 17:15:21 -0700, Misha Tomushev [EMAIL PROTECTED] wrote:
> VIOC Device Driver provides a standard device interface to the internal fabric interconnected network used on servers designed and built by Fabric 7 Systems. The patch can be found at ftp.fabric7.com/VIOC.

To get the driver into upstream kernel sources, you'd post it in reviewable pieces to the list.

Thanks, JBG

-- 
Jan-Benedict Glaw [EMAIL PROTECTED] +49-172-7608481
Re: [PATCH 2.6.18-rc6 1/2] dllink driver: porting v1.19 to linux 2.6.18-rc6
I'm not really an expert, and I didn't understand all your remarks but I can tell you this: The driver supplied with 2.6.15 looks like dlink's driver version 1.17. I had a dlink NIC that got stuck once in a while running that driver. dlink's version 1.19 is written for 2.4 kernels, so all I did was convert it to 2.6 kernels. The new version still got stuck once in a while. Maybe because of the bugs you pointed out. I don't have the dlink NIC anymore so I don't see how I can help here. Maybe Edward can answer to that. Sorry.

On Thu, 2006-09-07 at 11:19 +0200, Arjan van de Ven wrote:
> > @@ -335,8 +374,9 @@ #endif
> > /* Read eeprom */
> > for (i = 0; i < 128; i++) {
> > - ((u16 *) sromdata)[i] = le16_to_cpu (read_eeprom (ioaddr, i));
> > + ((u16 *) sromdata)[i] = cpu_to_le16 (read_eeprom (ioaddr, i));
> > }
> > + psrom->crc = le32_to_cpu(psrom->crc);
>
> this looks wrong, the data comes from the hw as le, so le*_to_cpu() sounds the right direction
>
> > @@ -401,7 +441,7 @@
> > int i;
> > u16 macctrl;
> > - i = request_irq (dev->irq, rio_interrupt, IRQF_SHARED, dev->name, dev);
> > + i = request_irq (dev->irq, rio_interrupt, SA_SHIRQ, dev->name, dev);
> > if (i) return i;
>
> this is backing out a fix/conversion to the new API. Bad.
>
> > @@ -434,9 +474,12 @@
> > writeb (0x30, ioaddr + RxDMABurstThresh);
> > writeb (0x30, ioaddr + RxDMAUrgentThresh);
> > writel (0x0007, ioaddr + RmonStatMask);
> > + /* clear statistics */
> > clear_stats (dev);
> > + atomic_set(&np->tx_desc_lock, 0);
>
> I'm quite scared by this naming; it suggests home-brew locking
>
> > dev->trans_start = jiffies;
> > + tasklet_enable(&np->tx_tasklet);
> > + writew(DEFAULT_INTR, ioaddr + IntEnable);
> > + return;
>
> this looks like a PCI posting bug
>
> > -rio_free_tx (struct net_device *dev, int irq)
> > +rio_free_tx (struct net_device *dev)
> > {
> > struct netdev_private *np = netdev_priv(dev);
> > int entry = np->old_tx % TX_RING_SIZE;
> > - int tx_use = 0;
> > unsigned long flag = 0;
> > + int irq = in_interrupt();
>
> ep
>
> > +
> > + if (atomic_read(&np->tx_desc_lock))
> > + return;
> > + atomic_inc(&np->tx_desc_lock);
>
> and yes.. it is broken self made locking. there is a nice race between the _read and the _inc here.
>
> > if (irq)
> > spin_lock(&np->tx_lock);
> > else
> > spin_lock_irqsave(&np->tx_lock, flag);
>
> double p
> this is wrong to do with in_interrupt() as gating factor! Always doing the irqsave() is fine btw
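The race between the _read and the _inc pointed out in the review can be illustrated with C11 atomics. This is a userspace analogy, not kernel code: try_enter_racy(), try_enter() and leave() are hypothetical names, and the review's actual recommendation is to rely on the existing spinlock (always with irqsave) rather than to repair the home-made flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Pattern from the reviewed patch: check, then increment. Two callers
 * can both load 0 before either one increments, so both "enter".
 */
static bool try_enter_racy(atomic_int *flag)
{
	if (atomic_load(flag))
		return false;
	atomic_fetch_add(flag, 1);
	return true;
}

/* Test and set in a single atomic step: only one caller can win. */
static bool try_enter(atomic_int *flag)
{
	int expected = 0;

	return atomic_compare_exchange_strong(flag, &expected, 1);
}

/* Release the flag; it must currently be held. */
static void leave(atomic_int *flag)
{
	assert(atomic_load(flag) == 1);
	atomic_store(flag, 0);
}
```

Single-threaded, both variants behave the same; the difference only shows when two contexts run the check concurrently, which is exactly the situation the flag was meant to guard.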
Re: [PATCH] FRV: do_gettimeofday() should no longer use tickadj
> But funky cascading using chained flow handlers doesn't work if the cascade must share an IRQ with some other device, right?

Indeed. Best way there is then to have a normal action handler like you do and have it call generic_handle_irq() on the cascaded interrupts.

Ben.
Re: TG3 data corruption (TSO ?)
> Oh, we know about this. The powerpc writel() used to have memory barriers in 2.4 kernels but not any more in 2.6 kernels. Red Hat's version of tg3 has extra wmb()'s to fix this problem. David doesn't think that the upstream version of tg3 should have these wmb()'s, and the problem should instead be fixed in powerpc's writel().

I've added a wmb() in tw32_rx_mbox() and tw32_tx_mbox() and can still reproduce the problem. I've also done a 2 days run without TSO enabled without a failure (my test program normally fails after a couple of minutes).

Thus, do you see any other code path in the driver where a synchronisation might be missing ? Is there any case where the chip might use data in memory before it has been told to do so with a mailbox write ? (There are no OWN bits that I can see in the descriptors, thus I doubt it will use a transmit descriptor that is half-built before the store to the mailbox allows using it) but who knows.

That leads to the question that there might be an unrelated bug in the driver. Segher thinks we might be overriding live descriptors, though I haven't seen how yet. It seems to be TSO specific though... maybe some missing smp synchronisation in the driver itself or a problem when the TX ring is full ?

I don't have the chip docs and I'm not familiar with the driver, so I'll keep looking, but advice is welcome. I'll also see if I can reproduce with some other TSO capable card, in case the problem is in the kernel TSO code and not in the driver.

Cheers, Ben.
Re: TG3 data corruption (TSO ?)
Benjamin Herrenschmidt wrote:
> I've added a wmb() in tw32_rx_mbox() and tw32_tx_mbox() and can still reproduce the problem. I've also done a 2 days run without TSO enabled without a failure (my test program normally fails after a couple of minutes).

Hi Ben,

The code is a bit tricky. It uses function pointers for the various register read/write methods. For the 5780, I believe it will be assigned a simple writel() and not tg3_write32_tx_mbox(). Can you double check to make sure you have actually added the wmb()? It's probably easiest to just add the wmb() in tg3_xmit_dma_bug() before the tw32_tx_mbox().
Re: TG3 data corruption (TSO ?)
On Sun, 2006-09-10 at 22:18 -0700, Michael Chan wrote:
> Benjamin Herrenschmidt wrote:
> > I've added a wmb() in tw32_rx_mbox() and tw32_tx_mbox() and can still reproduce the problem. I've also done a 2 days run without TSO enabled without a failure (my test program normally fails after a couple of minutes).
>
> Hi Ben,
>
> The code is a bit tricky. It uses function pointers for the various register read/write methods. For the 5780, I believe it will be assigned a simple writel() and not tg3_write32_tx_mbox(). Can you double check to make sure you have actually added the wmb()? It's probably easiest to just add the wmb() in tg3_xmit_dma_bug() before the tw32_tx_mbox().

I've done:

#define tw32_rx_mbox(reg, val) do { wmb(); tp->write32_rx_mbox(tp, reg, val); } while(0)
#define tw32_tx_mbox(reg, val) do { wmb(); tp->write32_tx_mbox(tp, reg, val); } while(0)

Cheers, Ben.
Re: TG3 data corruption (TSO ?)
Benjamin Herrenschmidt wrote:
> I've done:
>
> #define tw32_rx_mbox(reg, val) do { wmb(); tp->write32_rx_mbox(tp, reg, val); } while(0)
> #define tw32_tx_mbox(reg, val) do { wmb(); tp->write32_tx_mbox(tp, reg, val); } while(0)

That should do it.

I think we need those tcpdump after all. Can you send it to me?
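The ordering that the wmb() before the mailbox write is meant to enforce (descriptor contents visible first, doorbell second) can be sketched in portable C11. This is an analogy under loud assumptions: struct tx_ring, tx_post() and tx_consume() are hypothetical, the "mailbox" is an ordinary atomic variable rather than an MMIO register, and a C11 release fence is not the same primitive as the kernel's wmb() in front of writel().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8

struct tx_ring {
	uint64_t desc[RING_SIZE];	/* descriptors in ordinary memory */
	atomic_uint prod;		/* stand-in for the mailbox register */
};

/* Producer: fill the descriptor, order it, then publish the new index. */
static void tx_post(struct tx_ring *r, uint64_t d)
{
	unsigned int idx;

	assert(r != NULL);
	idx = atomic_load_explicit(&r->prod, memory_order_relaxed);
	r->desc[idx % RING_SIZE] = d;			/* 1. build descriptor */
	atomic_thread_fence(memory_order_release);	/* 2. plays the role of wmb() */
	atomic_store_explicit(&r->prod, idx + 1,	/* 3. ring the doorbell */
			      memory_order_relaxed);
}

/* Consumer: only reads descriptors below the published index. */
static int tx_consume(struct tx_ring *r, unsigned int *cons, uint64_t *out)
{
	unsigned int prod = atomic_load_explicit(&r->prod, memory_order_acquire);

	if (*cons == prod)
		return 0;
	*out = r->desc[*cons % RING_SIZE];
	(*cons)++;
	return 1;
}
```

If step 2 is dropped, nothing stops the index update from becoming visible before the descriptor it covers, which is the kind of corruption being chased in this thread.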
Re: [take14 0/3] kevent: Generic event handling mechanism.
On Sat, Sep 09, 2006 at 09:10:35AM -0700, Ulrich Drepper ([EMAIL PROTECTED]) wrote:
> > > - one point of critique which applied to many proposals over the years: multiplexer syscalls are bad, really bad. [...]
> >
> > Can you convince Christoph? I do not care about interfaces, but until several people agree on it, I will not change anything.
>
> I hope that Linus and/or Andrew simply decree that multiplexers are bad. glibc and probably strace are the two most affected programs so their maintainers should have a say. My opinion is clear.
>
> Also for analysis tools the multiplexers are bad since different numbers of parameters are used and maybe even with different types.

Types are exactly the same, actually the whole set of operations multiplexed in kevents is add/remove/modify. They really look and work very similar, so it is not that bad to multiplex them in one syscall. But yes, we can extend it to 3 differently named ones, which will end up just in waste of space in syscall tables.

> > I use there only id provided by user, it is not his cookie, but it was done to make structure as small as possible. Think about size of the mapped buffer when there are several kevent queues - it is all mapped and thus pinned memory. It of course can be extended.
>
> It being what? The problem is that the structure of the ring buffer elements cannot easily be changed later. So we have to get it right now which means being a bit pessimistic about future requirements. Add padding, there will certainly be future uses which need more space.

It was/is a whole situation about mmaped buffer - we can extend it, no problem, what fields you think needs to be added?

> > > Next, the current interfaces once again fail to learn from a mistake we made and which got corrected for the other interfaces. We need to be able to change the signal mask around the delay atomically. Just like we have ppoll for poll, pselect for select (and hopefully soon also epoll_pwait for epoll_wait) we need to have this feature in the new interfaces.
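The ppoll-versus-poll difference referred to above can be sketched as follows. wait_with_mask() is a hypothetical wrapper written for this note; ppoll() itself is the Linux/glibc call and needs _GNU_SOURCE. The comment shows the userspace emulation and the window that makes it racy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

/*
 * Racy emulation in userspace:
 *
 *	sigprocmask(SIG_SETMASK, waitmask, &old);
 *	   <- a signal arriving here is handled before poll() starts,
 *	      so poll() can block although the event already happened
 *	poll(fds, nfds, timeout_ms);
 *	sigprocmask(SIG_SETMASK, &old, NULL);
 *
 * ppoll() installs the mask, waits, and restores the mask as one
 * operation, so there is no such window.
 */
static int wait_with_mask(struct pollfd *fds, nfds_t nfds,
			  const sigset_t *waitmask, long timeout_ms)
{
	struct timespec ts = {
		.tv_sec = timeout_ms / 1000,
		.tv_nsec = (timeout_ms % 1000) * 1000000L,
	};

	assert(fds != NULL || nfds == 0);
	return ppoll(fds, nfds, &ts, waitmask);
}
```

A typical caller keeps the signal blocked during normal execution and passes a mask with it unblocked to the wait call, so the handler can only run while the process is actually waiting.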
> > We able to change kevents atomically.
>
> I don't understand. Or you don't understand. I was talking about changing the signal mask atomically around the wait call. I.e., the call needs an additional optional parameter specifying the signal mask to use (for the kernel: two parameters, pointer and length). This parameter is not available in the version of the patch I looked at and should be added if it's still missing in the latest version of the patch. Again, look at the difference between poll() and ppoll() and do the same.

You meant atomically with respect to signals, I meant atomically compared to simultaneous access. Looking into ppoll() I wonder what is the difference between doing the same in userspace? There are no special locks, nothing special except TIF_RESTORE_SIGMASK bit set, so what's the point of it not being done in userspace?

> > Well, I rarely talk about what other people want, but if you strongly feel, that all posix crap is better than epoll interface, then I can not agree with you.
>
> You miss the point entirely like DaveM before you. What I ask for is simply a uniform and well established form to tell an interface to use the kevent notification mechanism and not use signals etc. Look at the mail I sent in reply to DaveM's mail.

There is special function in kevents which is used for kevents addition, which can be called from everywhere (except modules since it is not exported right now), so one can create _any_ interface he likes. POSIX timer-look API is not what a lot of people want, since epoll/poll/select is completely different thing and exactly _that_ is what majority of people use. So I create similar interface. But there is no problem to implement any additional, it is simple.

> > It is possible to create additional one using any POSIX API you like, but I strongly insist on having possibility to use lightweight syscall interface too.
>
> Again, missing the point.
> We can without any significant change enable POSIX interfaces and GNU extensions like the timer, AIO, the async DNS code, etc use kevents. For the latter, which is entirely implemented at userlevel, we need interfaces to queue kevents from userlevel. I think this is already supported. The other two definitely benefit from using kevent notification and since they are/will be handled in the kernel the completion events should be queued in a kevent queue as specified in the sigevent structure passed to the system call.

I do not object against additional interfaces, no problem, implementation is really simple. But I strongly object against removing existing interface, it is there not for the furniture, but since it is the most convenient way (in my opinion) to use existing (supported by kevent) event notifications. If we need additional interfaces, it is really simple to add them, just use kevent_user_add_ukevent(), which requires struct ukevent, which describe requested