Re: Netchannles: first stage has been completed. Further ideas.
Hello!

Hello, Alexey.

[ Sorry for the long delay; there are some problems with the mail servers, so I cannot access them remotely and am composing this mail by hand. Hopefully the thread will not be broken. ]

There is no socket spinlock anymore. Above lock is skb_queue lock which is held inside skb_dequeue/skb_queue_tail calls.

Lock is named differently, but it is still here. BTW for UDP even the name is the same.

There is no bh processing, that lock is needed for 4 operations when skb is enqueued/dequeued. And if I changed skbs to different structures there would be no locks at all - it is extremely lightweight, it can not be compared with socket lock at all. No bh/irq processing at all, natural speed management - that is the main idea behind netchannels.

Equivalent of socket user lock.

No, it is an equivalent for hash lock in socket table.

OK. But you have to introduce socket mutex somewhere in any case. Even in ATCP.

Actually not - VJ's idea is to have only one consumer and one provider, so no locks are needed, but I agree, in the general case it is needed, but _only_ to protect against several netchannel userspace consumers. There is no BH protocol processing at all, so there is no need to protect against someone who will add data while you are processing own chunk.

Just an example - tcp_established() can be called with bh disabled under the socket lock.

When we have a process context in hands, it is not. Did you ask yourself, why do not we put all the packets to backlog/prequeue and just wait when user will read the data? It would be 100% equivalent to netchannels.

How many hacks just to be a bit closer to userspace processing, implemented in netchannels!

The answer is simple: because we cannot wait. If user delays for 200msec, wait for connection collapse due to retransmissions. If the segment is out of order, immediate attention is required. Any scheme, which tries to wait for user unconditionally, at least has to run a watchdog timer, which fires before sender senses the gap.
If userspace is scheduled away for too much time, it is bloody wrong to ack the data, that is impossible to read due to the fact that system is being busy. It is just postponing the work from one end to another - ack now and stop when queue is full, or postpone the ack generation when segment is really being read.

And this is what we do for ages. Grep for VJ in sources. :-) netchannels have nothing to do with it, it is much elder idea.

And it was Van, who decided to move away from BH/irq processing. It was slow and a bit pain way (how many hacks with prequeue, with direct processing, it is enough just to look how TCP socket lock is locked in different contexts :)

In that case one copies the whole data into userspace, so access for 20 bytes of headers completely does not matter.

For short packets it matters. But I said not this. I said it looks _worse_. A bit, but worse.

At least for 80 bytes it does not matter at all. And it is very likely that data is misaligned, so half of the header will be in a cache line. And socket code has the same problem - skb->cb can be flushed away, and tcp_recvmsg() needs to get it again. And actually I never understood nanooptimisation behind more serious problems (i.e. one cache line vs. 50MB/sec speed).

Hmm, for 80 bytes sized packets win was about 2.5 times.

Could you please show me lines inside existing code, which should be commented, so I got 50Mbyte/sec for that?

If I knew it would be done. :-)

Actually, it is the action, which I would expect. This, but not dropping all the TCP stack.

I tried to use existing one, and I had speed and CPU usage win, but its magnitude was not what I expected, so I started userspace network stack implementation. It succeeded, and there are _very_ major optimisations over existing code, when processing is fully moved into userspace, but also there are big problems, like one syscall per ack, so I decided to use that stack as a base for in-kernel process protocol processing, and I succeeded.
Probably I will return to the userspace network stack idea when I complete zero-copy networking support.

I showed there, that using existing stack it is impossible

Please, understand, it is such statements that compromise your work. If it is impossible then it is not interesting.

Do not mix soft and warm - I just post the facts, that netchannel TCP implementation works (sometimes much) faster. It is socket code that probably has some misoptimisations, and if it is impossible to fix them (well, at least it is very hard), then it is not interesting. I definitely do not say, that it must be removed/replaced/anything - it works perfectly ok, but it is possible to have better performance by changing architecture, and it was done.

Alexey

-- 
Evgeniy Polyakov

- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
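The "only one consumer and one provider, so no locks are needed" point above can be illustrated with a single-producer/single-consumer ring. This is only a minimal userspace sketch using C11 atomics, not the netchannel code from the patches under discussion; the names spsc_ring, ring_put and ring_get and the RING_SIZE value are made up for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u	/* hypothetical capacity, must be a power of two */

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
	      "RING_SIZE must be a power of two");

struct spsc_ring {
	void *slot[RING_SIZE];
	_Atomic size_t head;	/* written only by the producer */
	_Atomic size_t tail;	/* written only by the consumer */
};

/* Producer side: returns false when the ring is full. */
static bool ring_put(struct spsc_ring *r, void *pkt)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false;
	r->slot[head & (RING_SIZE - 1)] = pkt;
	/* Publish the slot only after it has been filled. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: returns NULL when the ring is empty. */
static void *ring_get(struct spsc_ring *r)
{
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
	void *pkt;

	if (head == tail)
		return NULL;
	pkt = r->slot[tail & (RING_SIZE - 1)];
	/* Hand the slot back to the producer after reading it. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return pkt;
}
```

Each index has exactly one writer, so no lock is taken on either side. As soon as a second consumer is added, ring_get() races on tail, which is the case where a mutex becomes necessary again.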
Re: Netchannles: first stage has been completed. Further ideas.
Hello.

[ Sorry for the long delay; there are some problems with the mail servers, so I cannot access them remotely and am composing this mail by hand. Hopefully the thread will not be broken. ]

Your description makes it sound as if you would take a huge leap, changing all in-kernel code _and_ the userspace interface in a single patch. Am I wrong? Or am I right and would it make sense to extract small incremental steps from your patch similar to those Van did in his non-published work?

My first implementation used existing kernel code and showed small performance win - there was binding of the socket to netchannel and all protocol processing was moved into process context.

Iirc, Van didn't show performance numbers but rather cpu utilization numbers. And those went down significantly without changing the userspace interface.

At least the lca presentation graphs show exactly different numbers - performance without CPU utilization (but not as his tables).

Did you look at cpu utilization as well? If you did and your numbers are worse than Van's, he either did something smarter than you or forged his numbers (quite unlikely).

Interesting sentence from political correctness point of view :)

I did both CPU and speed measurements when I used socket code [1], and both of them showed small gain, but I only tested a 1gbit setup, so they can not be compared with Van's. But even with 1gb I was not satisfied with them, so I started a different implementation, which I described in my e-mail to Alexey.

1. speed/cpu measurements of one of the netchannels implementation which used socket code. http://thread.gmane.org/gmane.linux.network/36609/focus=36614

-- 
Evgeniy Polyakov
Re: [PATCH] drivers/net/wireless/d80211: Check configuration type in hw->config_interface.
On Wed, 19 Jul 2006 22:26:52 +0200, Jean-Mickael Guerin wrote:

This patch prevents a NULL pointer dereferencing in AP mode: ieee80211_if_config will set conf->bssid only if device is of type STA or IBSS. I see it using following commands right after module loading (with rt61)

# iwconfig wlan0 mode Master
# ifconfig wlan0 up

The patch seems to fix the problem at a wrong place. rt2x00 has broken add_interface handler - it allows adding of AP interface even though the driver doesn't support AP mode. It is add_interface callback that should be fixed in rt2x00.

The check in the patch most likely won't be needed even after AP mode support is added to rt2x00 - the driver needs to handle AP mode differently so config_interface callback will be rewritten anyway.

adm8211 driver doesn't have this problem and doesn't need to be modified.

Thanks,
Jiri

-- 
Jiri Benc
SUSE Labs
Re: [PATCH] drivers/net/wireless/d80211: Check configuration type in hw->config_interface.
On Wed, 19 Jul 2006 18:07:05 -0700, Michael Wu wrote:

Why is that? Isn't there a BSSID in AP mode too? Perhaps it is calling config_interface before setting the BSSID?

The bssid field in ieee80211_if_conf structure is not set in AP mode. There is no need for that - you already have a MAC address of the AP interface (from add_interface callback). That's your BSSID.

adm8211 doesn't support AP mode yet, but it's good to know this crash won't occur when it does. :)

The crash won't occur even without the patch - you will need to do completely different things in adm8211_config_interface for AP mode than for STA or IBSS mode and you will put some switch there anyway. No reason for doing it now and bloating the code with a check for condition that cannot happen.

Thanks,
Jiri

-- 
Jiri Benc
SUSE Labs
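The "switch on interface type" described above could look roughly like the sketch below. It is standalone illustration code, not the d80211 API: enum if_type, struct if_conf and pick_bssid() are hypothetical stand-ins, and the only behaviour taken from the thread is that the bssid field is set for STA/IBSS only, while in AP mode the interface's own MAC address serves as the BSSID.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-ins for the stack's interface types. */
enum if_type { IF_TYPE_STA, IF_TYPE_IBSS, IF_TYPE_AP };

struct if_conf {
	enum if_type type;
	const unsigned char *bssid;	/* set by the stack for STA/IBSS only */
};

/* Returns 0 and fills out[] on success, -1 if no BSSID is available. */
static int pick_bssid(const struct if_conf *conf,
		      const unsigned char own_mac[ETH_ALEN],
		      unsigned char out[ETH_ALEN])
{
	assert(conf && own_mac && out);

	switch (conf->type) {
	case IF_TYPE_AP:
		/* In AP mode the interface's own MAC address is the BSSID. */
		memcpy(out, own_mac, ETH_ALEN);
		return 0;
	case IF_TYPE_STA:
	case IF_TYPE_IBSS:
		if (conf->bssid == NULL)
			return -1;	/* nothing configured yet */
		memcpy(out, conf->bssid, ETH_ALEN);
		return 0;
	}
	return -1;
}
```

With the AP case handled in its own branch, conf->bssid is never dereferenced in AP mode, so no extra NULL check is needed on that path.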
Re: [PATCH] Qlogic qla3xxx driver v2.02.00-k36 for upstream inclusion.
By the way, should it work with ISP4010 controllers? Those expose network interface card subdevice too, but aren't listed in pci_device_table of the driver, and after adding the device ID to the driver, it still does not quite work (I tried, just out of curiosity) - the NIC on ISP4010 is - it seems - close but not exactly the same as the driver expects.

Thanks.

/mjt
[PATCH][NET] ULi526x - driver cleanups
From: Henrik Kretzschmar [EMAIL PROTECTED]

Some little cleanups for ULI-TULIP-driver:
- pci_module_init() conversion to pci_register_driver()
- remove rc, an unneeded variable from uli526x_module_init()
- let the debug macros use correct loglevels
- add a loglevel to a printk
- let some code more look like CodingStyle

Signed-off-by: Henrik Kretzschmar [EMAIL PROTECTED]
---

--- linux-2.6.18-rc2/drivers/net/tulip/uli526x.c	2006-07-18 13:37:09.0 +0200
+++ linux/drivers/net/tulip/uli526x.c	2006-07-20 11:43:07.0 +0200
@@ -82,9 +82,9 @@
 #define ULI526X_TX_TIMEOUT ((16*HZ)/2)	/* tx packet time-out time 8 s */
 #define ULI526X_TX_KICK	(4*HZ/2)	/* tx packet Kick-out time 2 s */
 
-#define ULI526X_DBUG(dbug_now, msg, value) if (uli526x_debug || (dbug_now)) printk(KERN_ERR DRV_NAME ": %s %lx\n", (msg), (long) (value))
+#define ULI526X_DBUG(dbug_now, msg, value) if (uli526x_debug || (dbug_now)) printk(KERN_DEBUG DRV_NAME ": %s %lx\n", (msg), (long) (value))
 
-#define SHOW_MEDIA_TYPE(mode) printk(KERN_ERR DRV_NAME ": Change Speed to %sMhz %s duplex\n",mode & 1 ?"100":"10", mode & 4 ? "full":"half");
+#define SHOW_MEDIA_TYPE(mode) printk(KERN_NOTICE DRV_NAME ": Change Speed to %sMhz %s duplex\n",mode & 1 ?"100":"10", mode & 4 ? "full":"half");
 
 /* CR9 definition: SROM/MII */
@@ -373,7 +373,8 @@
 	if (err)
 		goto err_out_res;
 
-	printk(KERN_INFO "%s: ULi M%04lx at pci%s,",dev->name,ent->driver_data >> 16,pci_name(pdev));
+	printk(KERN_INFO "%s: ULi M%04lx at pci%s,",dev->name,
+	       ent->driver_data >> 16,pci_name(pdev));
 
 	for (i = 0; i < 6; i++)
 		printk("%c%02x", i ? ':' : ' ', dev->dev_addr[i]);
@@ -1027,7 +1028,7 @@
 		if ( time_after(jiffies, dev->trans_start + ULI526X_TX_TIMEOUT) ) {
 			db->reset_TXtimeout++;
 			db->wait_reset = 1;
-			printk( "%s: Tx timeout - resetting\n",
+			printk(KERN_ERR "%s: Tx timeout - resetting\n",
 			       dev->name);
 		}
 	}
@@ -1671,18 +1672,17 @@
 
 static struct pci_device_id uli526x_pci_tbl[] = {
-	{ 0x10B9, 0x5261, PCI_ANY_ID, PCI_ANY_ID, 0, 0, PCI_ULI5261_ID },
-	{ 0x10B9, 0x5263, PCI_ANY_ID, PCI_ANY_ID, 0, 0, PCI_ULI5263_ID },
-	{ 0, }
+	{0x10B9, 0x5261, PCI_ANY_ID, PCI_ANY_ID, 0, 0, PCI_ULI5261_ID},
+	{0x10B9, 0x5263, PCI_ANY_ID, PCI_ANY_ID, 0, 0, PCI_ULI5263_ID},
+	{}
 };
 MODULE_DEVICE_TABLE(pci, uli526x_pci_tbl);
 
-
 static struct pci_driver uli526x_driver = {
-	.name		= "uli526x",
-	.id_table	= uli526x_pci_tbl,
-	.probe		= uli526x_init_one,
-	.remove		= __devexit_p(uli526x_remove_one),
+	.name = "uli526x",
+	.id_table = uli526x_pci_tbl,
+	.probe = uli526x_init_one,
+	.remove = __devexit_p(uli526x_remove_one),
 };
 
 MODULE_AUTHOR("Peer Chen, [EMAIL PROTECTED]");
@@ -1702,7 +1702,6 @@
 
 static int __init uli526x_init_module(void)
 {
-	int rc;
 
 	printk(version);
 	printed_version = 1;
@@ -1714,25 +1713,21 @@
 	if (cr6set)
 		uli526x_cr6_user_set = cr6set;
 
-	switch(mode) {
+	switch (mode) {
 	case ULI526X_10MHF:
 	case ULI526X_100MHF:
 	case ULI526X_10MFD:
 	case ULI526X_100MFD:
 		uli526x_media_mode = mode;
 		break;
-	default:uli526x_media_mode = ULI526X_AUTO;
+	default:
+		uli526x_media_mode = ULI526X_AUTO;
 		break;
 	}
 
-	rc = pci_module_init(&uli526x_driver);
-	if (rc < 0)
-		return rc;
-
-	return 0;
+	return pci_register_driver(&uli526x_driver);
 }
 
-
 /*
  * Description:
  * when user used rmmod to delete module, system invoked clean_module()
[RFC/PATCH][Bonding]: keep slave state when admin down
When a bonding netdevice is admin-ed down it loses the slaves' attributes (set via ifenslave). This is not consistent with other behavior of netdevices (for example, a qdisc attached to a netdevice doesn't disappear, nor does an attached IP address, etc). The included patch fixes this.

I've tested by ifenslaving, downing the bond, checking /proc and making sure it still has the slaves, up-ing the bond and making sure things continue to work.

Jay/Bonding folks, if you are ok with it, just ACK it or include it in your tree etc. Otherwise we can discuss. This is against Linus' tree.

cheers,
jamal

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8b95123..df319be 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3420,7 +3420,6 @@ static int bond_close(struct net_device
 
 	write_lock_bh(&bond->lock);
 
-	bond_mc_list_destroy(bond);
 
 	/* signal timers not to re-arm */
 	bond->kill_timers = 1;
@@ -3451,8 +3450,6 @@ static int bond_close(struct net_device
 		break;
 	}
 
-	/* Release the bonded slaves */
-	bond_release_all(bond_dev);
 
 	if ((bond->params.mode == BOND_MODE_TLB) ||
 	    (bond->params.mode == BOND_MODE_ALB)) {
@@ -4237,6 +4234,9 @@ static void bond_free_all(void)
 
 	list_for_each_entry_safe(bond, nxt, &bond_dev_list, bond_list) {
 		struct net_device *bond_dev = bond->dev;
 
+		bond_mc_list_destroy(bond);
+		/* Release the bonded slaves */
+		bond_release_all(bond_dev);
 		unregister_netdevice(bond_dev);
 		bond_deinit(bond_dev);
 	}
Oops in IFB
Hi,

When there is no memory left for creating all the IFB devices requested by the user, an oops happens on the system. Please find enclosed a patch to solve this.

Regards,
Nicolas

[IFB] After ifb_init_one() failed, i is increased. Decrease it before entering in the loop for freeing the other ifb devices.

Signed-off-by: Nicolas Dichtel [EMAIL PROTECTED]

--- a/drivers/net/ifb.c	2006-07-20 15:16:31.923529050 +0200
+++ b/drivers/net/ifb.c	2006-07-20 15:17:36.370188249 +0200
@@ -271,6 +271,7 @@
 	for (i = 0; i < numifbs && !err; i++)
 		err = ifb_init_one(i);
 	if (err) {
+		i--;
 		while (--i >= 0)
 			ifb_free_one(i);
 	}
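The off-by-one is easy to reproduce outside the kernel. The following is a standalone sketch, not driver code: init_one(), free_one(), init_all() and the bad_frees counter are hypothetical stand-ins for ifb_init_one()/ifb_free_one(), written only to show why the extra i-- is needed. When init_one(k) fails, the for loop still increments i to k + 1 before it exits, so the unpatched unwind starts by freeing slot k, which was never set up.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_DEVS 4

static bool dev_allocated[NUM_DEVS];
static int bad_frees;	/* counts frees of never-initialised slots */

/* Hypothetical init: fails at index fail_at, otherwise marks the slot. */
static int init_one(int i, int fail_at)
{
	if (i == fail_at)
		return -1;	/* stands in for -ENOMEM */
	dev_allocated[i] = true;
	return 0;
}

static void free_one(int i)
{
	assert(i >= 0 && i < NUM_DEVS);
	if (!dev_allocated[i])
		bad_frees++;	/* in the kernel this would be the oops */
	dev_allocated[i] = false;
}

/* Same loop shape as the module init; 'fixed' toggles the patch. */
static int init_all(int fail_at, bool fixed)
{
	int i, err = 0;

	for (i = 0; i < NUM_DEVS && !err; i++)
		err = init_one(i, fail_at);
	if (err) {
		if (fixed)
			i--;	/* step back to the index that failed */
		while (--i >= 0)
			free_one(i);
	}
	return err;
}
```

Calling init_all(2, false) frees slot 2 although it was never initialised, while init_all(2, true) only frees slots 1 and 0.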
Re: Oops in IFB
On Thu, 2006-20-07 at 15:33 +0200, Nicolas DICHTEL wrote:

[IFB] After ifb_init_one() failed, i is increased. Decrease it before entering in the loop for freeing the other ifb devices.

Signed-off-by: Nicolas Dichtel [EMAIL PROTECTED]

Thanks Nicolas.

Acked-by: Jamal Hadi Salim [EMAIL PROTECTED]

cheers,
jamal
Re: [IPROUTE2]: update documentation on mirred and IFB
On Thu, 2006-20-07 at 01:59 +0100, Andy Furniss wrote:

jamal wrote:

About two more or so to complete these.. cheers, jamal

+tc qdisc add dev lo

eth0 ?

Thanks for catching that Andy. It was attempt at adding ingress to qdisc. I will wait for Stephen to swallow the other patches and then fix this - I have at least two more patches to send in that area. Or you can get a little gitty and send a patch ;-

BTW, if there are areas in the docs, help etc that need clarification let me know or fix them and send patches. Or if there are better examples to give send patches.

cheers,
jamal
Re: Oops in IFB
On Thu, 2006-20-07 at 09:40 -0400, jamal wrote:

On Thu, 2006-20-07 at 15:33 +0200, Nicolas DICHTEL wrote:

[IFB] After ifb_init_one() failed, i is increased. Decrease it before entering in the loop for freeing the other ifb devices.

Signed-off-by: Nicolas Dichtel [EMAIL PROTECTED]

Thanks Nicolas.

BTW, in the name of the LinuxWay(tm) - can you also submit a similar patch for dummy? It suffers from the same bug.

cheers,
jamal
Re: Oops in IFB
jamal wrote:

BTW, in the name of the LinuxWay(tm) - can you also submit a similar patch for dummy? It suffers from the same bug.

No problem, patch is enclosed.

Cheers,
Nicolas

[DUMMY] Avoid an oops when dummy_init_one() failed

Signed-off-by: Nicolas Dichtel [EMAIL PROTECTED]
Re: Oops in IFB
Sorry, I forgot the patch ;-)

Nicolas

Nicolas DICHTEL wrote:

jamal wrote:

BTW, in the name of the LinuxWay(tm) - can you also submit a similar patch for dummy? It suffers from the same bug.

No problem, patch is enclosed.

Cheers,
Nicolas

[DUMMY] Avoid an oops when dummy_init_one() failed

Signed-off-by: Nicolas Dichtel [EMAIL PROTECTED]

--- a/drivers/net/dummy.c	2006-07-20 16:19:09.395351558 +0200
+++ b/drivers/net/dummy.c	2006-07-20 16:19:58.802327279 +0200
@@ -132,6 +132,7 @@
 	for (i = 0; i < numdummies && !err; i++)
 		err = dummy_init_one(i);
 	if (err) {
+		i--;
 		while (--i >= 0)
 			dummy_free_one(i);
 	}
[IPV4]: Fix nexthop realm dumping for multipath routes
[IPV4]: Fix nexthop realm dumping for multipath routes

Routing realms exist per nexthop, but are only returned to userspace for the first nexthop. This is due to the fact that iproute2 only allows to set the realm for the first nexthop and the kernel refuses multipath routes where only a single realm is present.

Dump all realms for multipath routes to enable iproute to correctly display them.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit c76610a1027809f58840fe65b7abc8704f80dcc8
tree 9651193c156548539845ed0a2bd8af8e51182a00
parent 8e0ae6dc963ce12c8d9264d27509ff551dcb57fa
author Patrick McHardy [EMAIL PROTECTED] Wed, 19 Jul 2006 19:22:24 +0200
committer Patrick McHardy [EMAIL PROTECTED] Wed, 19 Jul 2006 19:22:24 +0200

 net/ipv4/fib_semantics.c | 12
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 3c45256..1f19cdf 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -963,10 +963,6 @@ fib_dump_info(struct sk_buff *skb, u32 p
 	rtm->rtm_protocol = fi->fib_protocol;
 	if (fi->fib_priority)
 		RTA_PUT(skb, RTA_PRIORITY, 4, &fi->fib_priority);
-#ifdef CONFIG_NET_CLS_ROUTE
-	if (fi->fib_nh[0].nh_tclassid)
-		RTA_PUT(skb, RTA_FLOW, 4, &fi->fib_nh[0].nh_tclassid);
-#endif
 	if (rtnetlink_put_metrics(skb, fi->fib_metrics) < 0)
 		goto rtattr_failure;
 	if (fi->fib_prefsrc)
@@ -976,6 +972,10 @@ #endif
 			RTA_PUT(skb, RTA_GATEWAY, 4, &fi->fib_nh->nh_gw);
 		if (fi->fib_nh->nh_oif)
 			RTA_PUT(skb, RTA_OIF, sizeof(int), &fi->fib_nh->nh_oif);
+#ifdef CONFIG_NET_CLS_ROUTE
+		if (fi->fib_nh[0].nh_tclassid)
+			RTA_PUT(skb, RTA_FLOW, 4, &fi->fib_nh[0].nh_tclassid);
+#endif
 	}
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	if (fi->fib_nhs > 1) {
@@ -994,6 +994,10 @@ #ifdef CONFIG_IP_ROUTE_MULTIPATH
 			nhp->rtnh_ifindex = nh->nh_oif;
 			if (nh->nh_gw)
 				RTA_PUT(skb, RTA_GATEWAY, 4, &nh->nh_gw);
+#ifdef CONFIG_NET_CLS_ROUTE
+			if (nh->nh_tclassid)
+				RTA_PUT(skb, RTA_FLOW, 4, &nh->nh_tclassid);
+#endif
 			nhp->rtnh_len = skb->tail - (unsigned char*)nhp;
 		} endfor_nexthops(fi);
 		mp_head->rta_type = RTA_MULTIPATH;
RE: [PATCH] Qlogic qla3xxx driver v2.02.00-k36 for upstream inclusion.
qla3xxx driver does not support ISP4010.

-----Original Message-----
From: Michael Tokarev [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 20, 2006 2:13 AM
To: Ron Mercer
Cc: netdev@vger.kernel.org
Subject: Re: [PATCH] Qlogic qla3xxx driver v2.02.00-k36 for upstream inclusion.

By the way, should it work with ISP4010 controllers? Those expose network interface card subdevice too, but aren't listed in pci_device_table of the driver, and after adding the device ID to the driver, it still does not quite work (I tried, just out of curiosity) - the NIC on ISP4010 is - it seems - close but not exactly the same as the driver expects.

Thanks.

/mjt
Re: [PATCH] Qlogic qla3xxx driver v2.02.00-k36 for upstream inclusion.
On Thu, 20 Jul 2006, Ron Mercer wrote:

qla3xxx driver does not support ISP4010.

Exactly... The qla3xxx driver supports the NIC function only.

-----Original Message-----
From: Michael Tokarev [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 20, 2006 2:13 AM
To: Ron Mercer
Cc: netdev@vger.kernel.org
Subject: Re: [PATCH] Qlogic qla3xxx driver v2.02.00-k36 for upstream inclusion.

By the way, should it work with ISP4010 controllers? Those expose network interface card subdevice too, but aren't listed in pci_device_table of the driver, and after adding the device ID to the driver, it still does not quite work (I tried, just out of curiosity) - the NIC on ISP4010 is - it seems - close but not exactly the same as the driver expects.

You'll need to use the qla4xxx driver to drive the iSCSI function.

Regards,
Andrew Vasquez
Re: [PATCH] Qlogic qla3xxx driver v2.02.00-k36 for upstream inclusion.
Andrew Vasquez wrote:

On Thu, 20 Jul 2006, Ron Mercer wrote:

qla3xxx driver does not support ISP4010.

Exactly... The qla3xxx driver supports the NIC function only.

...which is provided by ISP4010 card, as appears on PCI bus:

04:04.0 Ethernet controller: QLogic Corp. QLA3010 Network Adapter (rev 05)
04:04.1 Network controller: QLogic Corp. QLA4010 iSCSI TOE Adapter (rev 05)

(the first (sub)device). So it *looks* like the card has *both* a NIC and iSCSI TOE adapter, and the NIC part is pretty much similar to what qla3xxx driver expects... That's why my curiosity. ;)

(not that it matters much, just.. curious, really. Well. Not exactly. It'd be nice to compare a NIC w/o Jumbo frames support (which we have on all machines connected to the iSCSI segment), with something more.. advanced. So I wondered if I can utilize the NIC part of the ISP4010 for the test. iSCSI part of the card works significantly slower than open-iscsi stack on non-jumbo-frames-aware Tigon GigE NIC).

[]

You'll need to use the qla4xxx driver to drive the iSCSI function.

Yeah, I know. I posted some results to open-iscsi@ list about a week ago. It basically works (the new one, with open-iscsi infrastructure), but is slooow... ;)

Thanks.

/mjt
[PATCH][NET]: fix dummy initialization
Same problem and same fix as for IFB.

Regards,
Nicolas

[NET][DUMMY] Avoid an oops when dummy_init_one() failed

Signed-off-by: Nicolas Dichtel [EMAIL PROTECTED]

--- a/drivers/net/dummy.c	2006-07-20 16:19:09.395351558 +0200
+++ b/drivers/net/dummy.c	2006-07-20 16:19:58.802327279 +0200
@@ -132,6 +132,7 @@
 	for (i = 0; i < numdummies && !err; i++)
 		err = dummy_init_one(i);
 	if (err) {
+		i--;
 		while (--i >= 0)
 			dummy_free_one(i);
 	}
Re: Weird TCP SACK problem. in Linux...
Hi Alexey,

Is there anything Linux-specific about the DSACK implementation that might lead to an increase in the number of retransmissions, but to an improvement in download time when timestamps are not used (and the reverse effect when timestamps are used: fewer retransmissions but bigger download times)? I couldn't figure it out. Also, is the reordering response of Linux TCP described anywhere? (It seems dupthreshold is dynamically adjusted based on the reordering history... but I was not able to find out how...)

Oumer Teyeb wrote:

Oumer Teyeb wrote:

Hi,

Alexey Kuznetsov wrote:

Condition triggering start of fast retransmit is the same. The behaviour while retransmit is different. FACKless code behaves more like NewReno.

Ok, that is a good point!! Now at least I can convince myself that the CDFs for the first retransmissions, showing that SACK leads to earlier retransmissions than no SACK, are not wrong... and I can even convince myself that this is the real reason behind sack/fack's performance degradation for the case of no timestamps :-)

Actually, then the increase in the number of retransmissions and the increase in the download time from no SACK -> SACK for the timestamp case seems to make sense also. My reasoning is like this: if there are timestamps, that means there is reordering detection, hence the number of retransmissions is reduced because we avoid the time spent in fast recovery. When we introduce SACK on top of timestamps, we enter fast retransmits earlier than in the no SACK case, as we seem to agree, and since the timestamp reduces the number of retransmissions once we are in fast recovery, the retransmissions we see are basically the first few retransmissions that made us enter the false fast retransmits, so we have a little increase in the retransmissions and a little increase in the download times...

But when no timestamps are used, there is no reordering detection and so SACK leads to fewer retransmissions because it retransmits selectively, but it doesn't improve the download time because it enters fast retransmit earlier than the no SACK case, and in this case the fast retransmits are very costly because they are not detected and lead to window reduction.

Am I making sense? :-) Still, the DSACK case is puzzling me.

Regards,
Oumer
Re: [PATCH] drivers/net/wireless/d80211: Check configuration type in hw->config_interface.
Hi,

This patch prevents a NULL pointer dereferencing in AP mode: ieee80211_if_config will set conf->bssid only if device is of type STA or IBSS. I see it using following commands right after module loading (with rt61)

# iwconfig wlan0 mode Master
# ifconfig wlan0 up

The patch seems to fix the problem at a wrong place. rt2x00 has broken add_interface handler - it allows adding of AP interface even though the driver doesn't support AP mode. It is add_interface callback that should be fixed in rt2x00.

Well rt2x00 does support AP mode, our latest CVS tree (patches for wireless-dev are in progress) has even shown a working configuration for some users. So add_interface is correct at allowing the AP interface, perhaps some more steps are required to make it completely work, but it is work in progress.

The check in the patch most likely won't be needed even after AP mode support is added to rt2x00 - the driver needs to handle AP mode differently so config_interface callback will be rewritten anyway.

I'll make a check to see if the bssid is NULL or invalid in the config_bssid() function, and make sure that in AP mode the MAC is written as BSSID.

Ivo
Re: Netchannles: first stage has been completed. Further ideas.
Hello!

Small question first:

userspace, but also there are big problems, like one syscall per ack,

I do not see redundant syscalls. Is not it expected to send ACKs only after receiving data as you said? What is the problem?

Now boring things:

There is no BH protocol processing at all, so there is no need to protect against someone who will add data while you are processing own chunk.

Essential part of socket user lock is the same mutex. Backlog is actually not a protection, but a thing equivalent to netchannel. The difference is only that it tries to process something immediately, when it is safe. You can omit this and push everything to backlog(=netchannel), which is processed only by syscalls, if you do not care about latency.

How many hacks just to be a bit closer to userspace processing, implemented in netchannels!

Moving processing closer to userspace is not a goal, it is a tool. Which is sometimes useful, but generally quite useless. F.e. in your tests it should not affect performance at all, end user is just a sink. What's about prequeueing, it is a bright example. Guess why is it useful? What does it save? Nothing, like netchannel. Answer is: it is just a tool to generate coarsed ACKs in a controlled manner without essential violation of protocol. (Well, and to combine checksumming and copy if you do not like how your card does this)

If userspace is scheduled away for too much time, it is bloody wrong to ack the data, that is impossible to read due to the fact that system is being busy. It is just postponing the work from one end to another - ack now and stop when queue is full, or postpone the ack generation when segment is really being read.

... when you get all the segments nicely aligned, blah-blah-blah. If you do not care about losses-congestion-delays-delacks-whatever, you have a totally different protocol. Sending window feedback is only a minor part of tcp.
But even these boring tcp intrinsics are not so important, look at ideal lossless network: Think what happens f.e. while plain file transfer to your notebook. You get 110MB/sec for a few seconds, then writeback is fired and disk io subsystems discovers that the disk holds only 50MB/sec. If you are unlucky and some another application starts, disk is so congested that it will take lots of seconds to make a progress with io. For this time another side will retransmit, because poor thing thought rtt is 100 usecs and you will never return to 50MB/sec. You have to _CLOSE_ window in the case of long delay, rather than to forget to ack. See the difference? It is just because actual end user is still far far away. And this happens all the time, when you relay the results to another application via pipe, when... Well, the only case where real end user is user of netchannel is when you receive to a sink. But I said not this. I said it looks _worse_. A bit, but worse. At least for 80 bytes it does not matter at all. Hello-o, do you hear me? :-) I am asking: it looks not much better, but a bit worse, then what is real reason for better performance, unless it is due to castration of protocol? Simplify protocol, move all the processing (even memory copies) to softirq, leave to user space only feeding pages to copy and you will have unbeatable performance. Been there, done that, not with TCP of course, but if you do not care about losses and ACK clocking and send an ACK once per window, I do not see how it can spoil the situation. And actually I never understood nanooptimisation behind more serious problems (i.e. one cache line vs. 50MB/sec speed). You deal with 80 byte packets, to all that I understand. If you lose one cacheline per packet, it is a big problem. All that we can change is protocol overhead. Handling data part is invariant anyway. You are scared of complexity of tcp, but you obviously forget one thing: cpu is fast. 
The code can look very complicated: some crazy hash functions, damn hairy protocol processing, but if you take care about caches etc., all this is dominated by the first look into packet in eth_type_trans() or ip_rcv(). BTW, when you deal with normal data flow, cache can be not dirtied by data at all, it can be bypassed. works perfectly ok, but it is possible to have better performance by changing architecture, and it was done. It is exactly the point of trouble. From all that I see and you said, better performance is got not due to change of architecture, but despite of this. A proof that we can perform better by changing protocol is not required, it is kinda obvious. The question is how to make existing protocol to perform better. I have no idea, why your tcp performs better. It can be everything: absence of slow start, more coarse ACKs, whatever. I believe you were careful to check those reasons and to do a fair comparison, but then the only guess remains that you saved lots of i-cache getting rid of long code path. And none of those guesses can be attributed to
Re: [PATCH] drivers/net/wireless/d80211: Check configuration type in hw-config_interface.
Hi, This patch prevents a NULL pointer dereferencing in AP mode: ieee80211_if_config will set conf-bssid only if device is of type STA or IBSS. I see it using following commands right after module loading (with rt61) # iwconfig wlan0 mode Master # ifconfig wlan0 up The patch seems to fix the problem at a wrong place. rt2x00 has broken add_interface handler - it allows adding of AP interface even though the driver doesn't support AP mode. It is add_interface callback that should be fixed in rt2x00. Well rt2x00 does support AP mode, our latest CVS tree (patches for wireless-dev are in progress) has even shown a working configuration for some users. So add_interface is correct at allowing the AP interface, perhaps some more steps are required to make it completely work, but it is work in progress. The check in the patch most likely won't be needed even after AP mode support is added to rt2x00 - the driver needs to handle AP mode differently so config_interface callback will be rewritten anyway. I'll make a check to see if the bssid is NULL or invalid in the config_bssid() function, and make sure that in AP mode the MAC is written as BSSID. I won't be able to send this patch seperately, so it will become part of the larger series that I am currently working on, that patch series throws a lot of code around and has some major changes to the rt2x00 code so expect very large patches. The bssid is NULL fix was already in my cvs tree, so only the update to use the MAC as BSSID in master mode has been added. Ivo - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] smc911x: Re-release spinlock on spurious interrupt
>>>>> "Peter" == Peter Korsgaard [EMAIL PROTECTED] writes:

 Peter> Hi,
 Peter> The smc911x driver forgets to release the spinlock on spurious
 Peter> interrupts. This little patch fixes it.

Crap - forgot to sign off :/

Signed-off-by: Peter Korsgaard [EMAIL PROTECTED]

diff -Naur linux-2.6.18-rc2.orig/drivers/net/smc911x.c linux-2.6.18-rc2/drivers/net/smc911x.c
--- linux-2.6.18-rc2.orig/drivers/net/smc911x.c 2006-07-20 10:26:20.0 +0200
+++ linux-2.6.18-rc2/drivers/net/smc911x.c 2006-07-20 17:44:26.0 +0200
@@ -1092,6 +1092,7 @@
 	/* Spurious interrupt check */
 	if ((SMC_GET_IRQ_CFG() & (INT_CFG_IRQ_INT_ | INT_CFG_IRQ_EN_)) !=
 		(INT_CFG_IRQ_INT_ | INT_CFG_IRQ_EN_)) {
+		spin_unlock_irqrestore(&lp->lock, flags);
 		return IRQ_NONE;
 	}

-- 
Bye, Peter Korsgaard
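The bug class fixed above (an early return that skips the unlock) can also be avoided structurally. The following is only a userspace sketch of a single-exit variant, not the approach the patch takes: the patch unlocks directly before the early return. The `toy_lock` type, `toy_interrupt()` and the `IRQ_NOT_MINE`/`IRQ_DONE` values are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a spinlock; it only tracks whether it is held. */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* catches a double acquire, i.e. a leaked lock */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

enum irq_result { IRQ_NOT_MINE, IRQ_DONE };	/* hypothetical names */

/* Every path, including the spurious one, leaves through "out". */
static enum irq_result toy_interrupt(struct toy_lock *l, bool spurious)
{
	enum irq_result ret = IRQ_NOT_MINE;

	toy_lock_acquire(l);
	if (spurious)
		goto out;
	/* ... real interrupt work would go here ... */
	ret = IRQ_DONE;
out:
	toy_lock_release(l);
	return ret;
}
```

With a leaked lock, the second call to `toy_interrupt()` would trip the assertion in `toy_lock_acquire()`; on real hardware the equivalent symptom is a hang on the next interrupt.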
Re: e1000: fix it on thinkpad x60 / eeprom checksum read fails
Pavel Machek wrote:
> Hi!
>
>>> e1000 in thinkpad x60 fails without this dirty hack. What to do with it?
>>>
>>> Signed-off-by: Pavel Machek [EMAIL PROTECTED]
>>
>> NAK, certainly this should never be merged in any tree...
>>
>> this is a known issue that we're tracking here:
>> http://sourceforge.net/tracker/index.php?func=detail&aid=1474679&group_id=42302&atid=447449
>>
>> Summary of the issue:
>>
>> Lenovo has used certain BIOS versions where ASPD/DSPD was turned on which
>> turns the PHY off when no cable is inserted to save power. The e1000 driver
>> already turns off this feature but can't do this until the driver is loaded.
>> It seems that turning this feature on causes the MAC to give read errors.
>>
>> Lenovo seems to have the feature turned off in their latest BIOS versions,
>> we encourage all people to upgrade their BIOS with the latest version from
>> Lenovo (available from their website). It seems that for at least 2 people,
>> this has fixed the problem.
>>
>> Inserting a cable obviously might also work :)
>
> Hehe.
>
>> We did reproduce the problem initially with the old BIOS (1.01-1.03) on a
>> T60 system, but unfortunately the bug disappeared into nothingness.
>> Bypassing the checksum leaves the NIC in an uncertain state and is not
>> recommended.
>
> Okay, perhaps this should be inserted as a comment into the driver,
> and printk should be fixed to point at this explanation?
>
> Can't we enable the driver with the bad checksum, then read the _real_
> data?

no.

We're working on a solution where we make sure that the PHY is physically
turned on properly before we read the EEPROM, which would be the proper fix.

It's completely not acceptable to run when the EEPROM checksum fails - you
might even be running with the wrong MAC address, or worse. Let's fix this
the right way instead.

Auke

PS: adding netdev to the CC...
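To make the "refuse to run on a bad checksum" point concrete, here is a minimal sketch of such a check. It assumes, from memory of the e1000 sources, that the first 64 EEPROM words must sum to 0xBABA; treat both constants as assumptions rather than a quote from the driver. `nvm_checksum_ok()` and `nic_probe()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NVM_WORDS 64u		/* words 0x00..0x3F, assumed layout */
#define NVM_SUM   0xBABAu	/* expected 16-bit sum, assumed constant */

/* Sum the image with 16-bit wraparound and compare against NVM_SUM. */
static bool nvm_checksum_ok(const uint16_t *words, size_t n)
{
	uint16_t sum = 0;

	assert(words != NULL);
	if (n < NVM_WORDS)
		return false;
	for (size_t i = 0; i < NVM_WORDS; i++)
		sum = (uint16_t)(sum + words[i]);
	return sum == NVM_SUM;
}

/* Fail the probe on a bad image instead of bypassing the check:
 * anything read from it, including the MAC address, may be garbage. */
static int nic_probe(const uint16_t *words, size_t n)
{
	if (!nvm_checksum_ok(words, n))
		return -1;
	return 0;
}
```

If the PHY being powered down is what corrupts the reads, as described above, the check is reporting a real problem, which is why powering the PHY up before reading is the fix being pursued rather than skipping the check.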
Re: [RFC/PATCH][Bonding]: keep slave state when admin down
jamal [EMAIL PROTECTED] wrote:

> When a bonding netdevice is admin-ed down it loses the slaves'
> attributes (set via ifenslave). This is not consistent with other
> behavior of netdevices (for example, a qdisc attached to a netdevice
> doesn't disappear, nor does an attached IP address, etc).
> The included patch fixes this. I've tested by ifenslaving, downing the bond,
> checking /proc and making sure it still has the slaves, up-ing the bond
> and making sure things continue to work.

Do the initscript and sysconfig packages (/sbin/ifup, ifdown, that
stuff in /etc/sysconfig/network-scripts, etc) do the right thing with
this change?

If memory serves, the initscripts will down the bond during setup;
I'm not sure if there is any dependency on that action releasing all
(possibly preexisting) slaves.

I don't have a big problem with this, but I'm a little concerned
that there may be dependencies on the existing behavior.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
Alternate to Ixia's ANVL test harness for tcp compliance.
Hey Gang:

Both at UNM and Bluelane we have used Ixia's ANVL test harness for
verifying TCP protocol compliance with the RFCs. Recent additions to
Ixia's ANVL GUI provide an ethereal-like GUI. It looks really slick,
even providing ladder diagrams for quickly viewing the big picture.

Unfortunately Ixia told me they don't have any plans to port the new
GUI to Linux. Instead they are trying to migrate Linux developers, us,
to using Windows. Yeck!

With Ixia migrating away from Linux I was wondering if we should
consider using an alternate test bed for TCP protocol compliance.

Do any of you use tools other than ANVL for RFC compliance while
hacking on the tcp code?

In the unlikely event that there isn't an alternative, is there any
interest in a netdev group effort to motivate Ixia to port their
C sharp code to Linux? I get the feeling that some of their developers
would like to port the code to Linux.

-piet

-- 
Piet Delaney
BlueLane Teck
W: (408) 200-5256; [EMAIL PROTECTED]
H: (408) 243-8872; [EMAIL PROTECTED]
Re: Alternate to Ixia's ANVL test harness for tcp compliance.
Piet Delaney wrote:
> Do any of you use tools other than ANVL for RFC compliance while
> hacking on the tcp code?
>
> In the unlikely event that there isn't an alternative, is there any
> interest in a netdev group effort to motivate Ixia to port their
> C sharp code to Linux? I get the feeling that some of their developers
> would like to port the code to Linux.

Linux is the most RFC-compliant net stack in the world... if they don't
want to support Linux, it's their loss. :)

Jeff
Re: Alternate to Ixia's ANVL test harness for tcp compliance.
On Thu, 2006-20-07 at 12:49 -0700, Piet Delaney wrote:
> Hey Gang:
>
> Both at UNM and Bluelane we have used Ixia's ANVL test harness for
> verifying TCP protocol compliance with the RFCs. Recent additions to
> Ixia's ANVL GUI provide an ethereal-like GUI. It looks really slick,
> even providing ladder diagrams for quickly viewing the big picture.
>
> Unfortunately Ixia told me they don't have any plans to port the new
> GUI to Linux. Instead they are trying to migrate Linux developers, us,
> to using Windows. Yeck!
>
> With Ixia migrating away from Linux I was wondering if we should
> consider using an alternate test bed for TCP protocol compliance.
>
> Do any of you use tools other than ANVL for RFC compliance while
> hacking on the tcp code?

Talk to the USAGI folks. They have something similar to ANVL called
TAHI that they use to check compliance in IPv4, IPv6 and IPsec.
It should be extendable with some effort to do TCP.

> In the unlikely event that there isn't an alternative, is there any
> interest in a netdev group effort to motivate Ixia to port their
> C sharp code to Linux? I get the feeling that some of their developers
> would like to port the code to Linux.

Create competition for them - it is the easiest way to get them
motivated.

cheers,
jamal
Bug in e1000 + semantics of flow control WAS(Re: [e1000]: flow control on by default - good idea really?
I went back to this today. I am typing this from a scribbled
sticky note in a big hurry - but i still believe I took the correct
notes.
It does seem there is no distinction between what ethernet advertises
for flow control capability vs what it ends up negotiating with its
partner, i.e. there is some ambiguity.
I haven't checked tg3, this is on e1000 only.

On Fri, 2006-07-07 at 08:28 -0400, jamal wrote:
> On Thu, 2006-06-07 at 23:59 -0700, David Miller wrote:
>
>> It's autonegotiated, check your kernel message logs when the link
>> came up, you'll see this:
>>
>> tg3: eth0: Flow control is on for TX and on for RX.
>
> yikes - yes, this would be it.
>
> I could be wrong and i will double check:
> I think when the e1000 says via ethtool rx is on - it means that it is
> _advertising_ flow control as opposed to detecting partner has flow
> control capability. Auke, can you also check this as well?

Semantic #1

For example, configure:

  ethtool -A eth0 rx off
  ethtool -A eth0 tx on

  debopolis:~# ethtool -a eth0
  Pause parameters for eth0:
  Autonegotiate:  on
  RX:             off
  TX:             on

The other side was set to do symmetric TX flow control only.

Now enforce autonegotiation:

  ethtool -r eth0
  ethtool -a eth0
  Pause parameters for eth0:
  Autonegotiate:  on
  RX:             off
  TX:             off

Ok, this is what i expected if this thing (output of ethtool) was
supposed to store state as opposed to configuration.
But if it is state that is stored, then what about the values
before autonegotiation - surely that state is invalid, no?
It would be nice (for debug/usability reasons) to be able to see what
i configured vs what i end up negotiating with the link partner.
I think this may be an ethtool issue, but it could also be a driver
issue.

I send 1 Mpps to eth0 and see no flow control packets back. good.
So it does store state

#2: The other semantic

  debopolis:~# ethtool -A eth0 rx on
  debopolis:~# ethtool -A eth0 tx on
  debopolis:~# ethtool -a eth0
  Pause parameters for eth0:
  Autonegotiate:  on
  RX:             on
  TX:             on

Other side was set to do symmetric TX flow control only.
Now enforce autonegotiation:

  debopolis:~# ethtool -r eth0

let's see what we came up with:

  debopolis:~# ethtool -a eth0
  Pause parameters for eth0:
  Autonegotiate:  on
  RX:             on
  TX:             on

Now that is contradictory to semantic #1 - I would have expected
TX flow control on the e1000 to be off. Unless it is meant to store
configuration info and not what you have negotiated.

Trying sending traffic to the e1000 at about 1Mpps. I observe that the
e1000 is sending out about 800Kpps of flow control packets back ;-

So which semantics are correct? I would claim the #2 flow control
behavior is a bug. I just don't have time to chase a fix - hopefully
whoever reads this from the e1000 crowd can fix it. More importantly,
can we have two variables storing the two pieces of information
separately instead of the ambiguity of just one?

cheers,
jamal
Re: [RFC/PATCH][Bonding]: keep slave state when admin down
On Thu, 2006-20-07 at 11:50 -0700, Jay Vosburgh wrote:

> Do the initscript and sysconfig packages (/sbin/ifup, ifdown, that
> stuff in /etc/sysconfig/network-scripts, etc) do the right thing with
> this change?

I haven't seen issues so far.

> If memory serves, the initscripts will down the bond during setup;
> I'm not sure if there is any dependency on that action releasing all
> (possibly preexisting) slaves.

The one i have experimented with has no issues - but you may be right,
some people may depend on this behavior at shutdown.

> I don't have a big problem with this, but I'm a little concerned
> that there may be dependencies on the existing behavior.

I could add a module parameter that restores the old behavior when asked
to, and we keep that for a while and have it print a warning message.
The other alternative is to just release it and see if someone complains.

cheers,
jamal
Re: Alternate to Ixia's ANVL test harness for tcp compliance.
On Thu, 2006-07-20 at 16:04 -0400, Jeff Garzik wrote: Piet Delaney wrote: Do any of you use tools other than ANVL for RFC compliance while hacking to the tcp code? In the unlikely event that there isn't an alternate; is there any interest in a netdev group effort to motivate Ixia to porting their C sharp code to linux. I get the feeling that come of their developers would like to port the code to linux. Linux is the most RFC-compliant net stack in the world... if they don't want to support Linux, it's their loss. :) They aren't exactly dropping support for Linux, they 'just' are not plaining to port the new ethereal like GUI to Lunux: -- Hi Piet, Unfortunately there is no plan to redesign the GUI for Linux. We added support for Windows a couple of releases back. The latest release 7.10 has been benefit by a new Windows GUI framework we have designed for all windows based Ixia test application. The new GUI is based on C#, which includes ethereal like packet decode, ladder diagram, Outlook like GUI design. Currently there is big challenge to implement the same GUI for Linux. The needed resource is also an issue. Ixia will continue to maintain and support Linux platform. Please rest assure. Both windows and linux platforms share the same under layer test engine.. So there is no difference in test cases. Ixia also offers an upgrade path from Linux to Windows. Please contact your local Ixia sales if you are interested. Dean --- I wonder if Microsoft is providing the big challenge to porting the same GUI to linux. The world really doesn't need yet another Java language. Gosling is a Genius, I studied his X11 News Server enough to know first hand. Microsoft lost in court with their violating the Java standards and C sharp seems to be just another stratagy to their bizarre attempt to world domination (Like the SCO mess). I suggest that Linux networking companies like UNM and us be Beta customers for a port. So far it hasn't been entertained TMBK. 
-piet

> Jeff

-- 
Piet Delaney
BlueLane Teck
W: (408) 200-5256; [EMAIL PROTECTED]
H: (408) 243-8872; [EMAIL PROTECTED]
Re: Alternate to Ixia's ANVL test harness for tcp compliance.
On Thursday 20 July 2006 21:49, Piet Delaney wrote:

> Unfortunately Ixia told me they don't have any plans to port the new
> GUI to Linux. Instead they are trying to migrate Linux developers, us,
> to using Windows. Yeck!

With some luck it will just work in wine.

-Andi
Re: Netchannles: first stage has been completed. Further ideas.
On Thu, Jul 20, 2006 at 08:41:00PM +0400, Alexey Kuznetsov ([EMAIL PROTECTED]) wrote: Hello! Hello, Alexey. Small question first: userspace, but also there are big problems, like one syscall per ack, I do not see redundant syscalls. Is not it expected to send ACKs only after receiving data as you said? What is the problem? I mean that each ack is a pure syscall without any data, so overhead is quite huge compared to the situatin when acks are created in kernelspace. At least slow start will eat a lot of CPU with them. Now boring things: There is no BH protocol processing at all, so there is no need to pprotect against someone who will add data while you are processing own chunk. Essential part of socket user lock is the same mutex. Backlog is actually not a protection, but a thing equivalent to netchannel. The difference is only that it tries to process something immediately, when it is safe. You can omit this and push everything to backlog(=netchannel), which is processed only by syscalls, if you do not care about latency. If we consider netchannels as how Van Jackobson discribed them, then mutext is not needed, since it is impossible to have several readers or writers. But in socket case even if there is only one userspace consumer, that lock must be held to protect against bh (or introduce several queues and complicate a lot their's management (ucopy for example)). How many hacks just to be a bit closer to userspace processing, implemented in netchannels! Moving processing closer to userspace is not a goal, it is a tool. Which sometimes useful, but generally quite useless. F.e. in your tests it should not affect performance at all, end user is just a sink. What's about prequeueing, it is a bright example. Guess why is it useful? What does it save? Nothing, like netchannel. Answer is: it is just a tool to generate coarsed ACKs in a controlled manner without essential violation of protocol. 
(Well, and to combine checksumming and copy if you do not like how your card does this) I can not agree here. The main goal of the protocol is data delivery to the user, but not it's blind accepting and data transmit from user, but not some other ring. As you see, sending is already implemented in process' context, but receiving is not directly connected to the user. THe more elemnts between user and it's data we have, the more probability of some problems there. And we already have two queues just to eliminate one of them. Moving protocol (no matter if it is TCP or not) closer to user allows naturally control the dataflow - when user can read that data(and _this_ is the main goal), user acks, when it can not - it does not generate ack. In theory that can lead to the full absence of the congestions, especially if receiving window can be controlled in both directions. At least with current state of routers it does not lead to the broken connections. If userspace is scheduled away for too much time, it is bloody wrong to ack the data, that is impossible to read due to the fact that system is being busy. It is just postponing the work from one end to another - ack now and stop when queue is full, or postpone the ack generation when segment is realy being read. ... when you get all the segments nicely aligned, blah-blah-blah. If you do not care about losses-congestion-delays-delacks-whatever, you have a totally different protocol. Sending window feedback is only a minor part of tcp. But even these boring tcp intrinsics are not so important, look at ideal lossless network: Think what happens f.e. while plain file transfer to your notebook. You get 110MB/sec for a few seconds, then writeback is fired and disk io subsystems discovers that the disk holds only 50MB/sec. If you are unlucky and some another application starts, disk is so congested that it will take lots of seconds to make a progress with io. 
For this time another side will retransmit, because poor thing thought rtt is 100 usecs and you will never return to 50MB/sec. You have to _CLOSE_ window in the case of long delay, rather than to forget to ack. See the difference? It is just because actual end user is still far far away. And this happens all the time, when you relay the results to another application via pipe, when... Well, the only case where real end user is user of netchannel is when you receive to a sink. There is one problem in your logic. RTT will not be so small, since acks are not sent when user does not read data. But I said not this. I said it looks _worse_. A bit, but worse. At least for 80 bytes it does not matter at all. Hello-o, do you hear me? :-) I am asking: it looks not much better, but a bit worse, then what is real reason for better performance, unless it is due to castration of protocol? Well, if speed would be measured in lines of code, that atcp gets far less than existing tcp, but performance win is only 2.5 times.
Re: Alternate to Ixia's ANVL test harness for tcp compliance.
From: Piet Delaney [EMAIL PROTECTED]
Date: Thu, 20 Jul 2006 13:24:34 -0700

> I wonder if Microsoft is providing the big challenge to porting the same
> GUI to linux. The world really doesn't need yet another Java language.
> Gosling is a Genius, I studied his X11 News Server enough to know first
> hand. Microsoft lost in court with their violating the Java standards
> and C sharp seems to be just another strategy in their bizarre attempt
> at world domination (Like the SCO mess).

Under Linux we have Mono as a C-sharp implementation. For the
kind of GUI they most likely have, porting shouldn't be much of
an issue at all.
Re: Netchannles: first stage has been completed. Further ideas.
Evgeniy Polyakov wrote:

>> Backlog is actually not a protection, but a thing equivalent to netchannel.
>> The difference is only that it tries to process something immediately,
>> when it is safe. You can omit this and push everything to backlog(=netchannel),
>> which is processed only by syscalls, if you do not care about latency.
>
> If we consider netchannels as Van Jacobson described them, then a
> mutex is not needed, since it is impossible to have several readers or
> writers. But in the socket case, even if there is only one userspace
> consumer, that lock must be held to protect against bh (or introduce
> several queues and complicate their management a lot (ucopy for
> example)).

Out of curiosity, is it possible to have the single producer logic
if you have two+ ethernet interfaces handling frames for a single
TCP connection? (I am assuming some sort of multi-path routing
logic...)

Thanks,
Ben

-- 
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc http://www.candelatech.com
Re: Alternate to Ixia's ANVL test harness for tcp compliance.
Piet Delaney wrote:
> I wonder if Microsoft is providing the big challenge to porting the same
> GUI to linux. The world really doesn't need yet another Java language.
> Gosling is a Genius, I studied his X11 News Server enough to know first
> hand. Microsoft lost in court with their violating the Java standards
> and C sharp seems to be just another strategy in their bizarre attempt
> at world domination (Like the SCO mess).

Runtime dynamic bytecode languages -- Java, Perl, Python, Ruby, ... --
do seem to be all the rage. As DaveM noted, though, C# is fully
supported under Linux.

Or maybe they could go for Gtk+, which has successfully been used to
maintain complex GUI apps on both Windows and Linux. GIMP is the most
notable example, but use of Gtk+, GLib, and mingw has meant that you can
build Linux-ish apps on Windows without nasty porting layers like Cygwin.

Jeff
Re: Netchannles: first stage has been completed. Further ideas.
> If we consider netchannels as Van Jacobson described them, then a
> mutex is not needed, since it is impossible to have several readers or
> writers. But in the socket case, even if there is only one userspace
> consumer, that lock must be held to protect against bh (or introduce
> several queues and complicate their management a lot (ucopy for
> example)).

As I recall Van's talk, you don't need a lock with a ring buffer if you
have a start and an end variable pointing to locations within the ring
buffer.

He didn't explain this in great depth as it is computer science 101, but
here is how I would explain it:

Once the socket is initialised, the consumer is the only one that sets
the start variable, and the network driver only reads it. It is the
other way around for the end variable. As long as the writes are atomic
then you are fine. You only need one ring buffer in this scenario and
two atomic variables.

Having atomic writes does have overhead, but far less than locking
semantics.

-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
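The start/end scheme described above can be sketched in userspace C11 roughly as follows. This is a minimal single-producer/single-consumer ring and not code from Van's talk or from any netchannel patch; `spsc_ring`, `ring_push()`, `ring_pop()` and `RING_SIZE` are hypothetical names. One thing the mail does not spell out: on SMP, atomic writes alone are not sufficient, so the sketch also uses acquire/release ordering to make sure a slot's contents are visible before the index that publishes it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u	/* must be a power of two; illustrative size */

struct spsc_ring {
	void *slot[RING_SIZE];
	atomic_uint start;	/* written only by the consumer */
	atomic_uint end;	/* written only by the producer */
};

static void ring_init(struct spsc_ring *r)
{
	atomic_init(&r->start, 0);
	atomic_init(&r->end, 0);
}

/* Producer side (the driver in the description above); false if full. */
static bool ring_push(struct spsc_ring *r, void *item)
{
	assert(item != NULL);	/* NULL is reserved to mean "empty" in ring_pop() */
	unsigned end = atomic_load_explicit(&r->end, memory_order_relaxed);
	unsigned start = atomic_load_explicit(&r->start, memory_order_acquire);

	if (end - start == RING_SIZE)
		return false;
	r->slot[end & (RING_SIZE - 1)] = item;
	/* Publish the slot before the consumer can observe the new end. */
	atomic_store_explicit(&r->end, end + 1, memory_order_release);
	return true;
}

/* Consumer side; returns NULL if the ring is empty. */
static void *ring_pop(struct spsc_ring *r)
{
	unsigned start = atomic_load_explicit(&r->start, memory_order_relaxed);
	unsigned end = atomic_load_explicit(&r->end, memory_order_acquire);
	void *item;

	if (start == end)
		return NULL;
	item = r->slot[start & (RING_SIZE - 1)];
	/* Hand the slot back only after the item has been read out. */
	atomic_store_explicit(&r->start, start + 1, memory_order_release);
	return item;
}
```

The indices are free-running counters masked on access, so "full" and "empty" are distinguishable without sacrificing a slot. The scheme stops being lock-free-safe as soon as a second producer or a second consumer appears, which is the multi-reader case raised earlier in the thread.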
Re: Alternate to Ixia's ANVL test harness for tcp compliance.
On Thu, 2006-07-20 at 17:31 -0400, Jeff Garzik wrote:
> Piet Delaney wrote:
>> I wonder if Microsoft is providing the big challenge to porting the same
>> GUI to linux. The world really doesn't need yet another Java language.
>> Gosling is a Genius, I studied his X11 News Server enough to know first
>> hand. Microsoft lost in court with their violating the Java standards
>> and C sharp seems to be just another strategy in their bizarre attempt
>> at world domination (Like the SCO mess).
>
> Runtime dynamic bytecode languages -- Java, Perl, Python, Ruby, ... --
> do seem to be all the rage. As DaveM noted, though, C# is fully
> supported under Linux.
>
> Or maybe they could go for Gtk+, which has successfully been used to
> maintain complex GUI apps on both Windows and Linux. GIMP is the most
> notable example, but use of Gtk+, GLib, and mingw has meant that you can
> build Linux-ish apps on Windows without nasty porting layers like Cygwin.

Perhaps, but my experience with GTK has been that it's difficult
to get installed right if you put it on /usr/local. I tried compiling
ethereal for our platform and it needed GTK and a series of other
libraries. I suspect it's likely a major effort to migrate
from a Microsoft C sharp environment to GTK.

-piet

> Jeff

-- 
Piet Delaney
BlueLane Teck
W: (408) 200-5256; [EMAIL PROTECTED]
H: (408) 243-8872; [EMAIL PROTECTED]
Re: Alternate to Ixia's ANVL test harness for tcp compliance.
On Thursday 20 July 2006 16:31, Jeff Garzik wrote:
> Piet Delaney wrote:
>> I wonder if Microsoft is providing the big challenge to porting the same
>> GUI to linux. The world really doesn't need yet another Java language.
>> Gosling is a Genius, I studied his X11 News Server enough to know first
>> hand. Microsoft lost in court with their violating the Java standards
>> and C sharp seems to be just another strategy in their bizarre attempt
>> at world domination (Like the SCO mess).
>
> Runtime dynamic bytecode languages -- Java, Perl, Python, Ruby, ... --
> do seem to be all the rage. As DaveM noted, though, C# is fully
> supported under Linux.
>
> Or maybe they could go for Gtk+, which has successfully been used to
> maintain complex GUI apps on both Windows and Linux. GIMP is the most
> notable example, but use of Gtk+, GLib, and mingw has meant that you can
> build Linux-ish apps on Windows without nasty porting layers like Cygwin.
>
> Jeff

Base C# support is pretty good in Mono, but you still have to be quite
careful when creating a cross-platform application with it. Microsoft's
version implements a number of libraries that are still not as well
implemented in Mono (if at all). The toolkit libraries (from Windows
Forms to the latest stuff with Vista) are a bit of a moving target.
Plus, the .Net platform still lets developers interact with COM objects
and other Windows-only code.

Just because the GUI is C# does not mean that it does not have a number
of Windows-only dependencies, unless it was implemented with portability
in mind.
Re: Alternate to Ixia's ANVL test harness for tcp compliance.
Brent Cook wrote:
> Just because the GUI is C# does not mean that it does not have a number of
> Windows-only dependencies, unless it was implemented with portability in mind.

Well, sure... The same can be said of any source code base, for any set
of platforms, for any given language.

Jeff
Re: [PATCH] via-velocity: fix reported speed and link detected status
Jay Cliburn [EMAIL PROTECTED] :
> The via-velocity driver reports incorrect speed and link detected status
> as viewed by ethtool (and probably other tools). This patch fixes
> those incorrect reports and prettifies a long line.

Looks fine.

Fixed the whitespace/tabs damage and the 190-column comment, and tagged as
'upstream-20060720-00' in branch 'upstream' at
git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6.git

-- 
Ueimor
Re: Netchannles: first stage has been completed. Further ideas.
Hello! Moving protocol (no matter if it is TCP or not) closer to user allows naturally control the dataflow - when user can read that data(and _this_ is the main goal), user acks, when it can not - it does not generate ack. In theory To all that I rememeber, in theory absence of feedback leads to loss of control yet. The same is in practice, unfortunately. You must say that window is closed, otherwise sender is totally confused. There is one problem in your logic. RTT will not be so small, since acks are not sent when user does not read data. It is arithmetics: rtt = window/rate. And rto stays rounded up to 200 msec, unless you messed the connection so hard that it is not alive. Check. Simplify protocol, move all the processing (even memory copies) to softirq, leave to user space only feeding pages to copy and you will have unbeatable performance. Been there, done that, not with TCP of course, but if you do not care about losses and ACK clocking and send an ACK once per window, I do not see how it can spoil the situation. Do you live in a perfect world, where user does not want what was requested? All the time I am trying to bring you attention that you read to sink. :-) At least, read to disk to move it a little closer to reality. Or at least do it from terminal and press ^Z sometimes. You deal with 80 byte packets, to all that I understand. If you lose one cacheline per packet, it is a big problem. So actual netchannels speed is even better? :) atcp. If you get rid of netchannels, leave only atcp, the speed will be at least not worse. No doubts. tell me, why we should keep (enabled) that redundant functionality? Because it can work better in some other places, and that is correct, but why it should be enabled then in majority of the cases? Did not I tell you something like that? :-) Optimize real thing, even trying to detect the situations when retransmissions are redundant and eliminate the code. Let's draw the line. ... That was my opinion on the topic. 
It looks like neither you nor I will change our point of view about that right now :)

I agree. :)

Alexey
-
To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Oops in IFB
From: Nicolas DICHTEL [EMAIL PROTECTED]
Date: Thu, 20 Jul 2006 16:31:16 +0200

Sorry, I forgot the patch ;-)

Also applied, thanks Nicolas.
Re: [IPV4]: Fix nexthop realm dumping for multipath routes
Good catch, applied, thanks Patrick.
Re: Weird TCP SACK problem. in Linux...
Hello!

Hmmm... I don't understand this... so if reordering can be detected (i.e. we use timestamps, DSACK), the dupthreshold is increased

Yes.

implementation that might lead to an increase in the number of retransmissions, but leads to improvement in download time

Hmm... I thought and still do not know.

couldn't figure it out, also is there anywhere where the reordering response of tcp linux is described? (it seems dupthreshold is dynamically adjusted based on the reordering history... but I was not able to find out how...)...

That's the comment from tcp_input:

 * Reordering detection.
 *
 * Reordering metric is maximal distance, which a packet can be displaced
 * in packet stream. With SACKs we can estimate it:
 *
 * 1. SACK fills old hole and the corresponding segment was not
 *    ever retransmitted - reordering. Alas, we cannot use it
 *    when segment was retransmitted.
 * 2. The last flaw is solved with D-SACK. D-SACK arrives
 *    for retransmitted and already SACKed segment - reordering..

Alexey
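The two rules in the quoted comment can be sketched as a small decision function. This is a simplified illustration, not the tcp_input code: it works on segment indices instead of sequence numbers, and the struct, field and function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* One SACK or D-SACK report about a single segment (hypothetical model). */
struct sack_event {
    unsigned seg;            /* index of the segment being reported */
    unsigned highest_sacked; /* highest segment index SACKed so far */
    bool was_retransmitted;  /* segment was retransmitted at some point */
    bool is_dsack;           /* report is a D-SACK */
    bool already_sacked;     /* segment had been SACKed before this report */
};

/* Apply rules 1 and 2 from the comment above. */
static bool is_reordering(const struct sack_event *e)
{
    assert(e);
    /* Rule 1: SACK fills an old hole, segment never retransmitted. */
    if (!e->is_dsack && e->seg < e->highest_sacked && !e->was_retransmitted)
        return true;
    /* Rule 2: D-SACK for a retransmitted and already SACKed segment. */
    if (e->is_dsack && e->was_retransmitted && e->already_sacked)
        return true;
    return false;
}

/* Metric is the maximal displacement seen so far, in segments. */
static unsigned update_reordering(unsigned metric, const struct sack_event *e)
{
    if (!is_reordering(e))
        return metric;
    unsigned distance =
        e->highest_sacked > e->seg ? e->highest_sacked - e->seg : 0;
    return distance > metric ? distance : metric;
}
```

A sender that tracks such a metric can raise its duplicate-ACK threshold to match, which is the dynamic adjustment the question asks about; how the kernel actually does this is not shown here.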
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL (RTAB BUG)
Hello!

It shouldn't be. Any decimal number can be expressed as a fraction, eg:

I remember this. :-) I stalled on selecting correct divisors to fight over/underflows. Not because it was difficult, just because I did not see a reason to do this.

But doing so would get rid of the table implementation and the flexibility it has given us to date. For that reason I feel uncomfortable with it. The engineering decision becomes this - are there any other protocols like ATM out there that could justify such a change?

Is it faster? You say, yes. Is it required? You say, yes. Are there some protocols which need more flexibility? No.

know a good deal more about them than I do. What say you?

Frankly, I seriously believed that rtabs are a good way to handle ATM. :-) I seriously believed that you have to do something like:

    ((packet_size+cell_payload-1)/cell_payload)*cell_size

So, if in reality even this protocol does not justify keeping rtabs, kill them.
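The expression above rounds the packet up to a whole number of cells and charges the full cell size for each. A minimal sketch follows, assuming standard ATM cells (53 bytes on the wire, 48 bytes of payload); the function name is hypothetical, and any AAL5 or encapsulation overhead is assumed to be already included in packet_size by the caller.

```c
#include <assert.h>

enum {
    ATM_CELL_SIZE    = 53, /* bytes on the wire per cell */
    ATM_CELL_PAYLOAD = 48  /* payload bytes carried per cell */
};

/*
 * Wire size of a packet carried in fixed-size cells:
 * ceil(packet_size / cell_payload) cells, each costing cell_size bytes.
 */
static unsigned atm_wire_size(unsigned packet_size,
                              unsigned cell_payload,
                              unsigned cell_size)
{
    assert(cell_payload > 0);
    unsigned cells = (packet_size + cell_payload - 1) / cell_payload;
    return cells * cell_size;
}
```

With these constants a 1500 byte packet needs 32 cells, so atm_wire_size(1500, ATM_CELL_PAYLOAD, ATM_CELL_SIZE) is 1696 bytes, and a 49 byte packet already costs two cells.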
A question about linux/net/ipv4/ipcomp.c
Hello, everyone.

I'm having fun reading RFCs and looking through Linux source code for implementation examples. What I'm not able to understand is this piece of code:

    union {
            struct iphdr    iph;
            char            buf[60];
    } tmp_iph;

and the corresponding RFC 791 statement:

    The maximal internet header is 60 octets.

Would you please say why it's 60, and not 52? Well, what I came up with is this:

RFC791: The option-length octet counts the option-type octet and the option-length octet as well as the option-data octets.

[i.e. options' total length may be up to 2^8/8 octets (32)]. Then, header length without options is 20 octets. So, a maximum header length is 32+20=52 octets.

RFC791: Internet Header Length is the length of the internet header in 32 bit words...

52 octets is 52*8 bits and it's a multiple of 32.

Thanks in advance
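For reference, a short sketch of where 60 comes from, based only on the RFC 791 header layout: the Internet Header Length field is 4 bits wide and counts 32-bit words, so the largest header it can describe is 15 words. The per-option length octet bounds a single option, not the header as a whole. The constant and function names below are illustrative, not taken from ipcomp.c.

```c
#include <assert.h>
#include <stdint.h>

enum {
    IHL_BITS         = 4,                   /* width of the IHL field */
    IHL_MAX_WORDS    = (1 << IHL_BITS) - 1, /* 15 words of 32 bits */
    IP_FIXED_HDR_LEN = 20                   /* header without options */
};

/* Header length in octets, from the first octet (version | IHL). */
static unsigned ip_hdr_len(uint8_t first_octet)
{
    unsigned ihl = first_octet & 0x0f;
    assert(ihl >= 5); /* anything shorter than 20 octets is invalid */
    return ihl * 4;
}

/* Largest header the IHL field can express: 15 * 4 octets. */
static unsigned ip_max_hdr_len(void)
{
    return IHL_MAX_WORDS * 4;
}

/* Room left for all options together. */
static unsigned ip_max_options_len(void)
{
    return ip_max_hdr_len() - IP_FIXED_HDR_LEN;
}
```

So the options area can be at most 40 octets, and a 60 octet buffer is enough for any valid header, which matches the size of buf in the union.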
[RTLWS8-CFP] Eighth Real-Time Linux Workshop 2nd CFP
We apologize for multiple receipts.

Eighth Real-Time Linux Workshop
October 12-15, 2006
Lanzhou University - SISE
Tianshui South Road 222
Lanzhou, Gansu 73
P.R.China

General

Following the meetings of developers and users at the previous 7 successful real-time Linux workshops held in Vienna, Orlando, Milano, Boston, Valencia, Singapore, and Lille, the Real-Time Linux Workshop for 2006 will come back to Asia again, to be held at the School for Information Science and Engineering, Lanzhou University, in Lanzhou, China.

Embedded and real-time Linux is rapidly gaining traction in the Asia Pacific region. Embedded systems in both automation/control and entertainment are moving to 32/64bit systems, opening the door for the use of a full featured OS like GNU/Linux on COTS based systems. With real-time capabilities being a common demand for embedded systems, the soft and hard real-time variants are an important extension to the versatile GNU/Linux GPOS.

Authors are invited to submit original work dealing with general topics related to real-time Linux research, experiments and case studies, as well as issues of integration of real-time and embedded Linux. A special focus will be on industrial case studies.
Topics of interest include, but are not limited to:

* Modifications and variants of the GNU/Linux operating system extending its real-time capabilities,
* Contributions to real-time Linux variants, drivers and extensions,
* User-mode real-time concepts, implementation and experience,
* Real-time Linux applications, in academia, research and industry,
* Work in progress reports, covering recent developments,
* Educational material on real-time Linux,
* Tools for embedding Linux or real-time Linux and embedded real-time Linux applications,
* RTOS core concepts, RT-safe synchronization mechanisms,
* RT-safe interaction of RT and non RT components,
* IPC mechanisms in RTOS,
* Analysis and Benchmarking methods and results of real-time GNU/Linux variants,
* Debugging techniques and tools, both for code and temporal debugging of core RTOS components, drivers and real-time applications,
* Real-time related extensions to development environments.

Further information:

EN: http://www.realtimelinuxfoundation.org/events/rtlws-2006/ws.html
CN: http://dslab.lzu.edu.cn/rtlws8/index.html

Awarded papers

The Programme Committee will award a best paper in the category Real-Time Systems Theory. This best paper will be invited for publication to the Real-Time Systems Journal, RTSJ.

The Programme Committee will award a best paper in the category Real-Time Systems Application. This best paper will be invited for publication to the Dr Dobbs Journal.

Moreover, the publication of the other papers in a special issue of Dr Dobbs Journal is in discussion.

Abstract submission

In order to register an abstract, please go to:
http://www.realtimelinuxfoundation.org/rtlf/register-abstract.html

Venue

Lanzhou University Information Building, School of Information Science and Engineering, Lanzhou University, http://www.lzu.edu.cn/.
Registration

In order to participate in the workshop, please register on the registration page at:
http://www.realtimelinuxfoundation.org/rtlf/register-participant.html

Accommodation

Please refer to the Lanzhou hotel page for accommodation at
http://dslab.lzu.edu.cn/rtlws8/hotels/hotels.htm

Travel information

For travel information and directions on how to get to Lanzhou from an international airport in China please refer to:
http://www.realtimelinuxfoundation.org/events/rtlws-2006/

Important dates

August 28: Abstract submission
September 15: Notification of acceptance
September 29: Final paper

Panel Participants:

o Roberto Bucher - Scuola Universitaria Professionale della Svizzera Italiana, Switzerland, RTAI/ADEOS/RTAI-Lab.
o Alfons Crespo Lorente - University of Valencia, Spain, Departament d'Informàtica de Sistemes i Computadors, XtratuM.
o Herman Haertig - Technical University Dresden, Germany, Institute for System Architecture, L4/Fiasco/L4Linux.
o Nicholas Mc Guire - Lanzhou University, P.R. China, Distributed and Embedded Systems Lab, RTLinux/GPL.
o Douglas Niehaus - University of Kansas, USA, Information and Telecommunication Technology Center, RT-preempt.

Organization committee:

* Prof. Li LIAN (Co-Chair), (SISE, Lanzhou University, CHINA)
* Xiaoping ZHANG, LZU, CHINA
*
Re: Netchannles: first stage has been completed. Further ideas.
From: Alexey Kuznetsov [EMAIL PROTECTED]
Date: Fri, 21 Jul 2006 02:59:08 +0400

Moving the protocol (no matter if it is TCP or not) closer to the user allows natural control of the dataflow - when the user can read the data (and _this_ is the main goal), the user acks; when it can not, it does not generate an ack. In theory

As far as I remember, in theory the absence of feedback leads to loss of control. The same is true in practice, unfortunately. You must say that the window is closed, otherwise the sender is totally confused.

Correct, and too large a delay even results in retransmits. You can say that RTT will be adjusted by the delay of the ACK, but if the user context switches cleanly at the beginning, resulting in near immediate ACKs, and then blocks later, you will get spurious retransmits. Alexey's example of blocking on a disk write is a good example.

I really don't like it when pure NULL data sinks are used for benchmarking these kinds of things, because real applications 1) touch the data, 2) do something with that data, and 3) have some life outside of TCP!

If you optimize an application that does nothing with the data it receives, you have likewise optimized nothing :-)

All this talk reminds me of one thing, how expensive tcp_ack() is. And this expense has nothing to do with TCP really. The main cost is purging and freeing up the skbs which have been ACK'd in the retransmit queue.

So tcp_ack() sort of inherits the cost of freeing a bunch of SKBs which haven't been touched by the cpu in some time and are thus nearly guaranteed to be cold in the cache.

This is the kind of work we could think about batching to the user sleeping on some socket call.

Also notice that the retransmit queue is potentially a good use of an array similar to the VJ netchannel lockless queue data structure. :)

BTW, notice that TSO makes this work touch less skb state. TSO also decreases cpu utilization. I think these two things are no coincidence. :-)

I have even toyed with the idea of eventually abstracting the retransmit queue into a pure data representation. The skb_shinfo() page vector is very nearly this already.

Or, a less extreme idea where we have fully retained huge TSO skbs, but we do not chop them up to create smaller TSO frames. Instead, we add an offset GSO attribute which is used in the clones.

Calls to tso_fragment() would be replaced with pure clones and adjustment of skb->len and the new skb->gso_offset in the clone. The rest of the logic would remain identical except that non-linear data would start skb->gso_offset bytes into the skb_shinfo() described area.

In this way we could also set tp->xmit_size_goal to its maximum possible value, always. Actually, I was looking at this the other day and this clamping of xmit_size_goal to 1/2 max_window is extremely dubious. In fact it's downright wrong, only MSS needs this limiting for sender side SWS avoidance.
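The remark about using an array like the VJ netchannel lockless queue can be illustrated with a generic single-producer/single-consumer ring. The sketch below is not taken from the netchannel patches or from the kernel; it is a userspace illustration with C11 atomics, and all names (spsc_ring, spsc_push, spsc_pop, RING_SIZE) are hypothetical. It is only safe with exactly one producer thread and one consumer thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8u /* number of slots; must be a power of two */

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
              "RING_SIZE must be a power of two");

struct spsc_ring {
    _Atomic unsigned head; /* next slot to fill; written by producer only */
    _Atomic unsigned tail; /* next slot to drain; written by consumer only */
    void *slot[RING_SIZE];
};

static void spsc_init(struct spsc_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false when the ring is full. */
static bool spsc_push(struct spsc_ring *r, void *item)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;
    r->slot[head & (RING_SIZE - 1)] = item;
    /* Publish the slot contents before advancing head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool spsc_pop(struct spsc_ring *r, void **item)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *item = r->slot[tail & (RING_SIZE - 1)];
    /* Release the slot back to the producer. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Because each index has a single writer, no lock is needed; the consumer can also drain several entries in one pass, which is the kind of batching of cold-cache work discussed above.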