Re: NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 )
On Thu, Aug 02, 2007 at 10:59:23AM -0500, Matt Mackall wrote:
> On Thu, Aug 02, 2007 at 11:00:08AM +0200, Jarek Poplawski wrote:
> > On Wed, Aug 01, 2007 at 09:02:19PM -0500, Matt Mackall wrote:
> > ...
> > > How about cc:ing the netpoll maintainer?
> >
> > Is there a new one or do you suggest possibility of abusing
> > the authority of the netpoll's author with such trifles...?!
>
> I'm just subtly suggesting that if you're going to have a discussion
> about netpoll, you ought to cc: me.

Thanks! I'm very honored. I've suspected there is some subtlety,
but wasn't sure of possible new patches to MAINTAINERS, so tried
to be subtle too...

> > There are some notions about other diagnostic tools in some net
> > drivers, eg. 3c509.c, so there would be a little bit of work if,
> > after changing this, they really exist (and even if not - maybe
> > it's reasonable to save such possibility for the future?).
>
> I created it for netpoll, only netpoll clients have ever cared.

So, probably you're the best person to change this! Alas, it seems,
for some time any changes to netpoll could have a cold reception
here (pity for Ingo's laptop...).

Regards,
Jarek P.
- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] genirq: fix simple and fasteoi irq handlers
* Jarek Poplawski [EMAIL PROTECTED] wrote:

> I can't guarantee this is all needed to fix this bug, but I think
> this patch is necessary here.

hmmm ... very interesting! Now _this_ is something we'd like to see
tested. Could you send a patch to Marcin that also undoes the
workaround we have in place now, so that he could check whether
ne2k-pci works fine with your fix alone?

	Ingo
Re: strange tcp behavior
On Thu, Aug 02, 2007 at 07:21:34PM -0700, David Miller ([EMAIL PROTECTED]) wrote:
> On Thu, Aug 02, 2007 at 10:08:42PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> > So, following patch fixes problem for me.
> >
> > Or this one. Essentially the same though.
> >
> > Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]
>
> So, this bug got introduced partly in 2.3.15, which is when we SMP
> threaded the networking stack. The error check was present in
> inet_sendmsg() previously, it looked like this:
>
> int inet_sendmsg(struct socket *sock, struct msghdr *msg, int size,
> 		 struct scm_cookie *scm)
> {
> 	struct sock *sk = sock->sk;
>
> 	if (sk->shutdown & SEND_SHUTDOWN) {
> 		if (!(msg->msg_flags&MSG_NOSIGNAL))
> 			send_sig(SIGPIPE, current, 1);
> 		return(-EPIPE);
> 	}

This one would have caught our problem.

> 	if (sk->prot->sendmsg == NULL)
> 		return(-EOPNOTSUPP);
> 	if(sk->err)
> 		return sock_error(sk);

And this one too.

> 	/* We may need to bind the socket. */
> 	if (inet_autobind(sk) != 0)
> 		return -EAGAIN;
>
> 	return sk->prot->sendmsg(sk, msg, size);
> }
>
> I believe the idea was to move the sk->err check down into
> tcp_sendmsg().
>
> But this raises a major issue. What in the world are we doing
> allowing stream sockets to autobind? That is totally bogus. Even if
> we autobind, that won't make a connect happen.

For accepted socket it is perfectly valid assumption - we could autobind
it during the first send. Or may bind it during accept. It's a matter of
taste I think. Autobinding during first sending can end up being a
protection against DoS in some obscure rare case...

> There is logic down in TCP to handle all of these details properly as
> long as we don't do this bogus autobind stuff.

Yes, TCP sending function will catch these problems.

> do_tcp_sendpages() and tcp_sendmsg() both invoke
> sk_stream_wait_connect() if TCP is in a state where data sending is
> not possible. Inside of sk_stream_wait_connect() it handles socket
> errors as first priority, then if no socket errors are pending it
> checks if we are trying to connect currently and if not returns
> -EPIPE.
>
> It is exactly what we want under these circumstances.
>
> So the bug is purely that autobind is attempted for TCP sockets at
> all.
>
> TCP's sendpage handles this correctly already, it calls directly down
> into tcp_sendpage(), inet_sendpage() is not used at all. So the fix
> is to make tcp_sendmsg() direct as well, that bypasses all of this
> autobind madness. The error checking and state verification in TCP's
> sendmsg() and sendpage() implementations will do the right thing.
>
> Comments?
>
> Signed-off-by: David S. Miller [EMAIL PROTECTED]
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index c209361..185c7ec 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -281,7 +281,7 @@ extern int tcp_v4_remember_stamp(struct sock *sk);
>
>  extern int tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw);
>
> -extern int tcp_sendmsg(struct kiocb *iocb, struct sock *sk,
> +extern int tcp_sendmsg(struct kiocb *iocb, struct socket *sock,
>  			struct msghdr *msg, size_t size);

Maybe recvmsg should be changed too for symmetry?

-- 
	Evgeniy Polyakov
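The check order described above for sk_stream_wait_connect() can be sketched in plain userspace C. This is only a simplified model under stated assumptions, not the kernel code: every model_* name, the reduced state enum and the MODEL_WOULD_WAIT return value are made up for illustration, and the real function sleeps where this sketch merely reports that it would wait.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced set of TCP states; not the kernel's enum. */
enum model_state {
	MODEL_CLOSE,
	MODEL_SYN_SENT,
	MODEL_SYN_RECV,
	MODEL_ESTABLISHED
};

struct model_sock {
	enum model_state state;
	int err;	/* pending socket error as a positive errno, 0 if none */
};

/* Returned where a real implementation would sleep until the handshake ends. */
#define MODEL_WOULD_WAIT 1

/* Socket errors first, then "are we connecting?", otherwise -EPIPE. */
static int model_wait_connect(struct model_sock *sk)
{
	assert(sk != NULL);

	if (sk->err) {
		int err = sk->err;

		sk->err = 0;	/* reporting a pending error consumes it */
		return -err;
	}
	if (sk->state == MODEL_SYN_SENT || sk->state == MODEL_SYN_RECV)
		return MODEL_WOULD_WAIT;
	return -EPIPE;
}

/* Send path without any autobind step: the state check decides everything. */
static int model_sendmsg(struct model_sock *sk, int len)
{
	if (sk->state != MODEL_ESTABLISHED)
		return model_wait_connect(sk);
	return len;	/* pretend all bytes were queued */
}
```

In this model an unconnected socket with no pending error gets -EPIPE from the send path, and a pending error such as ECONNRESET is reported once before later sends fall back to -EPIPE.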
Re: strange tcp behavior
On Thu, Aug 02, 2007 at 07:58:03PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> 19:24:32.897071 IP 192.168.7.4.5 > 192.168.7.8.2500: S 705362199:705362199(0) win 1500
> 19:24:32.897211 IP 192.168.7.8.2500 > 192.168.7.4.5: S 4159455228:4159455228(0) ack 705362200 win 14360 <mss 7180>
> 19:24:32.920784 IP 192.168.7.4.5 > 192.168.7.8.2500: . ack 1 win 1500
> 19:24:32.921732 IP 192.168.7.4.5 > 192.168.7.8.2500: P 1:17(16) ack 1 win 1500
> 19:24:32.921795 IP 192.168.7.8.2500 > 192.168.7.4.5: . ack 17 win 14360
> 19:24:32.922881 IP 192.168.7.4.5 > 192.168.7.8.2500: R 705362216:705362216(0) win 1500
> 19:24:34.927717 IP 192.168.7.8.2500 > 192.168.7.4.5: R 1:1(0) ack 17 win 14360
>
> According to RFC 793, the RST from .4 means that the connection is
> CLOSED.

RFC 2525 - common tcp problems, says we should send RST in this case,
although it does not specify should we send it if socket is in CLOSED
state or not. Well, we send :)

Even if tcp_send_active_reset() will check if socket is in CLOSED state
and will not send data, but is still there, it will not be easily
triggered though, but it can be possible.

-- 
	Evgeniy Polyakov
Re: [patch 0/5][RFC] Update network drivers to use devres
On Fri, Aug 03, 2007 at 09:58:57AM +0100, Stephen Hemminger wrote:
> On Thu, 2 Aug 2007 15:42:06 -0700 Brandon Philips [EMAIL PROTECTED] wrote:
> > This patch set adds support for devres in the net core and converts
> > the e100 and e1000 drivers to devres. Devres is a simple resource
> > manager for device drivers, see Documentation/driver-model/devres.txt
> > for more information.
> >
> > The use of devres will remain optional for drivers with this patch
> > set. Drivers can be converted when it makes sense.
>
> Just because devres exists is not sufficient motivation to change.
> It seems that devres was a band-aid rather than fixing storage drivers
> to have proper DMA lifetimes.

I don't really get what you mean by having proper DMA lifetimes but
please don't write devres off too fast. devres doesn't solve any
problem that you can't fix without it but it does make the 'solving'
much easier.

IMHO, libata drivers generally have been well maintained and reviewed
but I could still find quite a few bugs (resource leaks or occasionally
double free) in init failure and removal paths. Init failure paths are
especially prone to bugs because they don't get exercised often. It's
just very easy to make a mistake and fail to notice and low level
drivers don't always get sufficient amount of review or testing.

Skimming through drivers...

via-rhine doesn't disable PCI device on init failure path but does so
on removal.

sky2 doesn't free consistent memory if sky2_init() fails.

acenic calls iounmap() with NULL parameter which I'm not sure whether
it's safe or not.

natsemi doesn't disable PCI device on failure or removal.

> > Devres makes low level drivers simpler, easier to get right and
> > maintain. Writing new drivers becomes easier too. So, why not?
>
> Network devices seem to work fine thanks, and the resource
> requirements are different. If ain't broke, don't fix it.

Care to enlighten me on how the resource requirements are different
from ATA drivers?

Thanks.

-- 
tejun
[PATCH] TCP: H-TCP maxRTT estimation at startup
Small patch to H-TCP from Douglas Leith.

Fix estimation of maxRTT. The original code ignores rtt measurements
during slow start (via the check tp->snd_ssthresh < 0xFFFF) yet this is
probably a good time to try to estimate max rtt as delayed acking is
disabled and slow start will only exit on a loss which presumably
corresponds to a maxrtt measurement. Second, the original code (via the
check htcp_ccount(ca) > 3) ignores rtt data during what it estimates to
be the first 3 round-trip times. This seems like an unnecessary check
now that the RCV timestamps are no longer used for rtt estimation.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/net/ipv4/tcp_htcp.c	2007-08-03 10:51:51.0 +0100
+++ b/net/ipv4/tcp_htcp.c	2007-08-03 10:51:53.0 +0100
@@ -79,7 +79,6 @@ static u32 htcp_cwnd_undo(struct sock *s
 static inline void measure_rtt(struct sock *sk, u32 srtt)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
-	const struct tcp_sock *tp = tcp_sk(sk);
 	struct htcp *ca = inet_csk_ca(sk);
 
 	/* keep track of minimum RTT seen so far, minRTT is zero at first */
@@ -87,8 +86,7 @@ static inline void measure_rtt(struct so
 		ca->minRTT = srtt;
 
 	/* max RTT */
-	if (icsk->icsk_ca_state == TCP_CA_Open
-	    && tp->snd_ssthresh < 0xFFFF && htcp_ccount(ca) > 3) {
+	if (icsk->icsk_ca_state == TCP_CA_Open) {
 		if (ca->maxRTT < ca->minRTT)
 			ca->maxRTT = ca->minRTT;
 		if (ca->maxRTT < srtt
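To make the effect of the change easier to follow, here is a minimal userspace sketch of the min/max RTT bookkeeping once the slow-start and htcp_ccount() conditions are gone. It is a model, not the kernel function: the model_* names are hypothetical, and the bound on how far a single sample may raise the maximum (MODEL_MAX_RTT_STEP) is an assumption, since the quoted hunk ends before that part of the condition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the congestion-avoidance state. */
enum model_ca_state { MODEL_CA_OPEN, MODEL_CA_RECOVERY };

struct model_htcp {
	uint32_t min_rtt;	/* zero until the first sample arrives */
	uint32_t max_rtt;
};

/* Assumed cap on how much one sample may raise max_rtt (same unit as srtt). */
#define MODEL_MAX_RTT_STEP 20u

static void model_measure_rtt(struct model_htcp *ca,
			      enum model_ca_state state, uint32_t srtt)
{
	assert(ca != NULL);

	/* keep track of the minimum RTT seen so far */
	if (ca->min_rtt > srtt || ca->min_rtt == 0)
		ca->min_rtt = srtt;

	/* max RTT: only the "open" state gates the update now */
	if (state == MODEL_CA_OPEN) {
		if (ca->max_rtt < ca->min_rtt)
			ca->max_rtt = ca->min_rtt;
		if (ca->max_rtt < srtt &&
		    srtt <= ca->max_rtt + MODEL_MAX_RTT_STEP)
			ca->max_rtt = srtt;
	}
}
```

With this model the very first sample already seeds both values, which is the behaviour the patch description argues for during slow start.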
Re: [REGRESSION] tg3 dead after s2ram
On Thursday 02 August 2007 21:10:29 Michael Chan wrote:
> Alternatively, we can also fix it by calling pci_enable_device() again
> in tg3_open(). But I think it is better to just always save and
> restore in suspend/resume. bnx2.c will also require the same fix.
>
> Thanks Joachim for helping to debug this problem. Please try this
> patch:

Patch works for me.

-Joachim
Re: Distributed storage.
On Fri, Aug 03, 2007 at 02:26:29PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> > Memory deadlock is a concern of course. From a cursory glance
> > through, it looks like this code is pretty vm-friendly and you have
> > thought quite a lot about it, however I respectfully invite peterz
> > (obsessive/compulsive memory deadlock hunter) to help give it a good
> > going over with me.

Another major issue is network allocations.

Your initial work and subsequent releases made by Peter were originally
opposed on my side, but now I think the right way is to use both
positive moments from your approach and specialized allocator -
essentially what I proposed (in the blog only though) is to bind an
independent reserve for any socket - such a reserve can be stolen from
socket buffer itself (each socket has a limited socket buffer where
packets are allocated from, it accounts both data and control (skb)
lengths), so when main allocation via common path fails, it would be
possible to get data from own reserve. This allows sending sockets to
make progress in case of deadlock.

For receiving situation is worse, since system does not know in advance
to which socket given packet will belong to, so it must allocate from
global pool (and thus there must be independent global reserve), and
then exchange part of the socket's reserve to the global one (or just
copy packet to the new one, allocated from socket's reserve if it was
set up, or drop it otherwise). Global independent reserve is what I
proposed when stopped to advertise network allocator, but it seems that
it was not taken into account, and reserve was always allocated only
when system has serious memory pressure in Peter's patches without any
meaning for per-socket reservation.

It allows to separate sockets and effectively make them fair - system
administrator or programmer can limit socket's buffer a bit and request
a reserve for special communication channels, which will have
guaranteed ability to have both sending and receiving progress, no
matter how many of them were set up. And it does not require any
changes behind network side.

-- 
	Evgeniy Polyakov
Re: [patch 0/5][RFC] Update network drivers to use devres
On Fri, 3 Aug 2007 19:26:45 +0900 Tejun Heo [EMAIL PROTECTED] wrote:

> On Fri, Aug 03, 2007 at 09:58:57AM +0100, Stephen Hemminger wrote:
> > On Thu, 2 Aug 2007 15:42:06 -0700 Brandon Philips [EMAIL PROTECTED] wrote:
> > > This patch set adds support for devres in the net core and
> > > converts the e100 and e1000 drivers to devres. Devres is a simple
> > > resource manager for device drivers, see
> > > Documentation/driver-model/devres.txt for more information.
> > >
> > > The use of devres will remain optional for drivers with this patch
> > > set. Drivers can be converted when it makes sense.
> >
> > Just because devres exists is not sufficient motivation to change.
> > It seems that devres was a band-aid rather than fixing storage
> > drivers to have proper DMA lifetimes.
>
> I don't really get what you mean by having proper DMA lifetimes but
> please don't write devres off too fast. devres doesn't solve any
> problem that you can't fix without it but it does make the 'solving'
> much easier.
>
> IMHO, libata drivers generally have been well maintained and reviewed
> but I could still find quite a few bugs (resource leaks or
> occasionally double free) in init failure and removal paths. Init
> failure paths are especially prone to bugs because they don't get
> exercised often. It's just very easy to make a mistake and fail to
> notice and low level drivers don't always get sufficient amount of
> review or testing.
>
> Skimming through drivers...
>
> via-rhine doesn't disable PCI device on init failure path but does so
> on removal.
>
> sky2 doesn't free consistent memory if sky2_init() fails.
>
> acenic calls iounmap() with NULL parameter which I'm not sure whether
> it's safe or not.
>
> natsemi doesn't disable PCI device on failure or removal.

Did you report these to the developers?

> > > Devres makes low level drivers simpler, easier to get right and
> > > maintain. Writing new drivers becomes easier too. So, why not?
> >
> > Network devices seem to work fine thanks, and the resource
> > requirements are different. If ain't broke, don't fix it.
>
> Care to enlighten me on how the resource requirements are different
> from ATA drivers?

I was thinking of the hot remove (no mod ref counts) and lingering /sys
open issues. ATA drivers use ref counts.

My take on devres is that it is similar to talloc() for device drivers.
Not a bad idea in itself, but the real advantage of hierarchical
allocation is that it makes exception handling easier if things are
layered deeply.
Re: strange tcp behavior
On Fri, Aug 03, 2007 at 12:21:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> Since the connection is considered closed, couldn't another socket
> re-use it?
>
> Socket A: Recv data (unread)
> Socket A: Recv RST
> Socket B: Reuses connection (same IPs/ports)
> Socket A: Close
>
> Wouldn't that disrupt socket B's use of the connection?

Then it will drop our data, since there was no appropriate handshake.

> --
> Simon Arlott

-- 
	Evgeniy Polyakov
Re: [patch] genirq: fix simple and fasteoi irq handlers
2007/8/3, Jarek Poplawski [EMAIL PROTECTED]:
> On Fri, Aug 03, 2007 at 10:04:08AM +0200, Ingo Molnar wrote:
> > * Jarek Poplawski [EMAIL PROTECTED] wrote:
> >
> > > I can't guarantee this is all needed to fix this bug, but I think
> > > this patch is necessary here.
> >
> > hmmm ... very interesting! Now _this_ is something we'd like to see
> > tested. Could you send a patch to Marcin that also undoes the
> > workaround we have in place now, so that he could check whether
> > ne2k-pci works fine with your fix alone?
>
> I'm not sure this is needed... Marcin got this patch, I hope, and
> I don't have another possibility to contact with him. Since he
> managed with this bisection and all the previous patches I don't
> think there could be any problems, so:
>
> Marcin!
>
> I'd be very glad if you could test this patch alone; this should
> apply without any problems to 2.6.21 (with some offset) and later
> vanilla versions (or try to revert Ingo's last patch with patch -p1 -R).
> Please, contact me on any problems (alas not during the weekend...).

I'll test this patch tomorrow (and confirm that the last one from Ingo
works fine) and report results on monday (sorry, no internet at home
since I moved out of city :|).

Marcin
Re: [patch] genirq: fix simple and fasteoi irq handlers
On Fri, Aug 03, 2007 at 01:57:00PM +0200, Marcin Ślusarz wrote:
...
> I'll test this patch tomorrow (and confirm that the last one from Ingo
> works fine) and report results on monday (sorry, no internet at home
> since I moved out of city :|).

So, you are a lucky guy! I have only no internet at home.
...and time for dreaming about moving out of a city...

Cheers,
Jarek P.
Re: Distributed storage.
On Fri, 2007-08-03 at 14:57 +0400, Evgeniy Polyakov wrote:

> For receiving situation is worse, since system does not know in
> advance to which socket given packet will belong to, so it must
> allocate from global pool (and thus there must be independent global
> reserve), and then exchange part of the socket's reserve to the global
> one (or just copy packet to the new one, allocated from socket's
> reserve if it was set up, or drop it otherwise). Global independent
> reserve is what I proposed when stopped to advertise network
> allocator, but it seems that it was not taken into account, and
> reserve was always allocated only when system has serious memory
> pressure in Peter's patches without any meaning for per-socket
> reservation.

This is not true. I have a global reserve which is set-up a priori. You
cannot allocate a reserve when under pressure, that does not make sense.

Let me explain my approach once again.

At swapon(8) time we allocate a global reserve. And associate the needed
sockets with it. The size of this global reserve is made up of two
parts:
  - TX
  - RX

The RX pool is the most interesting part. It again is made up of two
parts:
  - skb
  - auxiliary data

The skb part is scaled such that it can overflow the IP fragment
reassembly, the aux pool such that it can overflow the route cache (that
was the largest other allocator in the RX path).

All (reserve) RX skb allocations are accounted, so as to never allocate
more than we reserved.

All packets are received (given the limit) and are processed up to
socket demux. At that point all packets not targeted at an associated
socket are dropped and the skb memory freed - ready for another packet.

All packets targeted for associated sockets get processed. This requires
that this packet processing happens in-kernel. Since we are swapping
user-space might be waiting for this data, and we'd deadlock.

I'm not quite sure why you need per socket reservations.
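The accounting rule described in this message (never hand out more than was reserved, and give the memory straight back when a packet turns out not to belong to an associated socket) can be sketched as a small userspace model. All names here are hypothetical and the byte counts are arbitrary; the real patches work on skbs inside the kernel, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a reserve that is sized once, up front. */
struct model_reserve {
	size_t limit;	/* bytes set aside a priori */
	size_t used;	/* bytes currently charged to received packets */
};

/* Charge an allocation; refuse rather than exceed the reserve. */
static bool model_reserve_charge(struct model_reserve *r, size_t bytes)
{
	assert(r != NULL && r->used <= r->limit);

	if (bytes > r->limit - r->used)
		return false;
	r->used += bytes;
	return true;
}

/* Return memory to the reserve once a packet has been freed. */
static void model_reserve_uncharge(struct model_reserve *r, size_t bytes)
{
	assert(r != NULL && bytes <= r->used);
	r->used -= bytes;
}

/*
 * Receive one packet out of the reserve. Returns true if the packet is
 * kept (it stays charged until its consumer frees it), false if it was
 * dropped, either because the reserve is exhausted or because socket
 * demux found that it is not for an associated socket.
 */
static bool model_rx(struct model_reserve *r, size_t bytes,
		     bool sock_associated)
{
	if (!model_reserve_charge(r, bytes))
		return false;
	if (!sock_associated) {
		model_reserve_uncharge(r, bytes);
		return false;
	}
	return true;
}
```

The point of the model is that packets for unrelated sockets only occupy the reserve until demux, so they cannot starve the sockets the reserve exists for.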
Re: [patch 0/5][RFC] Update network drivers to use devres
On Fri, 03 Aug 2007 20:33:04 +0900 Tejun Heo [EMAIL PROTECTED] wrote:

> Hello,
>
> Stephen Hemminger wrote:
> > > Skimming through drivers...
> > >
> > > via-rhine doesn't disable PCI device on init failure path but does
> > > so on removal.
> > >
> > > sky2 doesn't free consistent memory if sky2_init() fails.
> > >
> > > acenic calls iounmap() with NULL parameter which I'm not sure
> > > whether it's safe or not.
> > >
> > > natsemi doesn't disable PCI device on failure or removal.
> >
> > Did you report these to the developers?
>
> Just skimmed through. I'm pretty sure Brandon will pick those up
> later.
>
> > > > > Devres makes low level drivers simpler, easier to get right and
> > > > > maintain. Writing new drivers becomes easier too. So, why not?
> > > >
> > > > Network devices seem to work fine thanks, and the resource
> > > > requirements are different. If ain't broke, don't fix it.
> > >
> > > Care to enlighten me on how the resource requirements are
> > > different from ATA drivers?
> >
> > I was thinking of the hot remove (no mod ref counts) and lingering
> > /sys open issues. ATA drivers use ref counts.
>
> I guess the hot removing is done by severing netdev from the actual
> device, right? I don't see how that affects usage of devres on
> network drivers. Am I missing something?

The issue is that device may be removed at any time. So you can't rely
on module ref counts to save you. And netdevice structure must still
linger after module is removed, till dev ref count goes to zero.

> On a separate note, can you explain lingering /sys open issue to me a
> bit? With recent sysfs changes, sysfs nodes are disconnected
> immediately on deletion. Would that make any difference to netdevs?

Examples are in Documentation/networking/netdevices.txt

> > My take on devres is that it is similar to talloc() for device
> > drivers. Not a bad idea in itself, but the real advantage of
> > hierarchical allocation is that it makes exception handling easier
> > if things are layered deeply.
>
> Yeah, devres made layering easier in libata, especially SFF stuff.
> Dunno how much of that is applicable to netdev but, with or without
> layering, it'll be a nice cleanup and I don't see much negative side.
> Conversion would take some work and bugs might be introduced in the
> process as with any changes but the good thing about devres is that
> you're very likely to get failure/release paths right if you get the
> init path right, and if you get the init path wrong, it will stand out
> like a sore thumb - easy to spot, easy to fix.
>
> So, I think using devres on net drivers is a good idea, well, for that
> matter, for any driver, but me being the devres writer, that isn't
> really surprising, is it?
>
> Thanks.
>
> --
> tejun
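The argument that getting the init path right gives you the failure and release paths for free can be illustrated with a toy resource list in userspace C. This is a sketch of the general idea only: none of the model_* names are the kernel's devres API, the fixed-size array is an arbitrary simplification, and the two integer flags merely stand in for things like an enabled PCI device or a DMA mapping.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_RES 8

typedef void (*model_release_fn)(void *data);

/* Hypothetical device with a list of release actions registered at init. */
struct model_dev {
	struct {
		model_release_fn release;
		void *data;
	} res[MODEL_MAX_RES];
	size_t nr;
};

/* Remember how to undo one acquisition. */
static int model_devres_add(struct model_dev *dev, model_release_fn release,
			    void *data)
{
	assert(dev != NULL && release != NULL);

	if (dev->nr == MODEL_MAX_RES)
		return -1;
	dev->res[dev->nr].release = release;
	dev->res[dev->nr].data = data;
	dev->nr++;
	return 0;
}

/* Undo everything in reverse order of acquisition. */
static void model_devres_release_all(struct model_dev *dev)
{
	while (dev->nr > 0) {
		dev->nr--;
		dev->res[dev->nr].release(dev->res[dev->nr].data);
	}
}

/* Example release action: mark the resource as no longer held. */
static void model_clear_flag(void *data)
{
	*(int *)data = 0;
}

/*
 * Probe that acquires two resources. fail_second simulates an init
 * failure after the first acquisition; the single error label is the
 * whole failure path.
 */
static int model_probe(struct model_dev *dev, int *pci_enabled,
		       int *dma_mapped, int fail_second)
{
	*pci_enabled = 1;
	if (model_devres_add(dev, model_clear_flag, pci_enabled) != 0) {
		*pci_enabled = 0;
		return -1;
	}
	if (fail_second)
		goto fail;
	*dma_mapped = 1;
	if (model_devres_add(dev, model_clear_flag, dma_mapped) != 0) {
		*dma_mapped = 0;
		goto fail;
	}
	return 0;
fail:
	model_devres_release_all(dev);
	return -1;
}
```

The same model_devres_release_all() call serves both the failed probe and normal removal, which is why a forgotten "disable on failure" of the kind listed earlier in the thread cannot happen in this style.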
[PATCH] lro: eHEA example how to use LRO
This patch shows how the generic LRO interface is used for SKB mode

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/Kconfig             |    1 +
 drivers/net/ehea/ehea.h         |    9 -
 drivers/net/ehea/ehea_ethtool.c |   15 +++
 drivers/net/ehea/ehea_main.c    |   84 +++---
 4 files changed, 101 insertions(+), 8 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f8a602c..fec4004 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2399,6 +2399,7 @@ config CHELSIO_T3
 config EHEA
 	tristate "eHEA Ethernet support"
 	depends on IBMEBUS
+	select INET_LRO
 	---help---
 	  This driver supports the IBM pSeries eHEA ethernet adapter.
 
diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index d67f97b..70e33fe 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -33,13 +33,14 @@
 #include <linux/ethtool.h>
 #include <linux/vmalloc.h>
 #include <linux/if_vlan.h>
+#include <linux/inet_lro.h>
 
 #include <asm/ibmebus.h>
 #include <asm/abs_addr.h>
 #include <asm/io.h>
 
 #define DRV_NAME	"ehea"
-#define DRV_VERSION	"EHEA_0073"
+#define DRV_VERSION	"EHEA_0074"
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
@@ -58,6 +59,7 @@
 
 #define EHEA_SMALL_QUEUES
 #define EHEA_NUM_TX_QP 1
+#define EHEA_LRO_MAX_AGGR 64
 
 #ifdef EHEA_SMALL_QUEUES
 #define EHEA_MAX_CQE_COUNT 1023
@@ -84,6 +86,8 @@
 #define EHEA_RQ2_PKT_SIZE 1522
 #define EHEA_L_PKT_SIZE 256	/* low latency */
 
+#define MAX_LRO_DESCRIPTORS 8
+
 /* Send completion signaling */
 
 /* Protection Domain Identifier */
@@ -376,6 +380,8 @@ struct ehea_port_res {
 	u64 tx_packets;
 	u64 rx_packets;
 	u32 poll_counter;
+	struct net_lro_mgr lro_mgr;
+	struct net_lro_desc lro_desc[MAX_LRO_DESCRIPTORS];
 };
 
 
@@ -427,6 +433,7 @@ struct ehea_port {
 	u32 msg_enable;
 	u32 sig_comp_iv;
 	u32 state;
+	u32 lro_max_aggr;
 	u8 full_duplex;
 	u8 autoneg;
 	u8 num_def_qps;
diff --git a/drivers/net/ehea/ehea_ethtool.c b/drivers/net/ehea/ehea_ethtool.c
index decec8c..29ef7a9 100644
--- a/drivers/net/ehea/ehea_ethtool.c
+++ b/drivers/net/ehea/ehea_ethtool.c
@@ -183,6 +183,9 @@ static char ehea_ethtool_stats_keys[][ETH_GSTRING_LEN] = {
 	{"PR5 free_swqes"},
 	{"PR6 free_swqes"},
 	{"PR7 free_swqes"},
+	{"LRO aggregated"},
+	{"LRO flushed"},
+	{"LRO no_desc"},
 };
 
 static void ehea_get_strings(struct net_device *dev, u32 stringset, u8 *data)
@@ -239,6 +242,18 @@ static void ehea_get_ethtool_stats(struct net_device *dev,
 	for (k = 0; k < 8; k++)
 		data[i++] = atomic_read(&port->port_res[k].swqe_avail);
 
+	for (k = 0, tmp = 0; k < EHEA_MAX_PORT_RES; k++)
+		tmp |= port->port_res[k].lro_mgr.stats.aggregated;
+	data[i++] = tmp;
+
+	for (k = 0, tmp = 0; k < EHEA_MAX_PORT_RES; k++)
+		tmp |= port->port_res[k].lro_mgr.stats.flushed;
+	data[i++] = tmp;
+
+	for (k = 0, tmp = 0; k < EHEA_MAX_PORT_RES; k++)
+		tmp |= port->port_res[k].lro_mgr.stats.no_desc;
+	data[i++] = tmp;
+
 }
 
 const struct ethtool_ops ehea_ethtool_ops = {
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 9756211..fbaa395 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -52,6 +52,8 @@ static int rq2_entries = EHEA_DEF_ENTRIES_RQ2;
 static int rq3_entries = EHEA_DEF_ENTRIES_RQ3;
 static int sq_entries = EHEA_DEF_ENTRIES_SQ;
 static int use_mcs = 0;
+static int use_lro = 0;
+static int lro_max_aggr = EHEA_LRO_MAX_AGGR;
 static int num_tx_qps = EHEA_NUM_TX_QP;
 
 module_param(msg_level, int, 0);
@@ -60,6 +62,8 @@ module_param(rq2_entries, int, 0);
 module_param(rq3_entries, int, 0);
 module_param(sq_entries, int, 0);
 module_param(use_mcs, int, 0);
+module_param(use_lro, int, 0);
+module_param(lro_max_aggr, int, 0);
 module_param(num_tx_qps, int, 0);
 
 MODULE_PARM_DESC(num_tx_qps, "Number of TX-QPS");
@@ -77,6 +81,10 @@ MODULE_PARM_DESC(sq_entries, "Number of entries for the Send Queue
 		 "[2^x - 1], x = [6..14]. Default = "
 		 __MODULE_STRING(EHEA_DEF_ENTRIES_SQ) ")");
 MODULE_PARM_DESC(use_mcs, "0:NAPI, 1:Multiple receive queues, Default = 1 ");
+MODULE_PARM_DESC(lro_max_aggr, "LRO: Max packets to be aggregated. Default = "
+		 __MODULE_STRING(EHEA_LRO_MAX_AGGR));
+MODULE_PARM_DESC(use_lro, "Large Receive Offload, 1: enable, 0: disable, "
+		 "Default = 0");
 
 static int port_name_cnt = 0;
 static LIST_HEAD(adapter_list);
@@ -389,6 +397,60 @@ static int ehea_treat_poll_error(struct ehea_port_res *pr, int rq,
 	return 0;
 }
 
+static int get_skb_hdr(struct sk_buff *skb, void **iphdr,
+		       void **tcph, u64 *hdr_flags, void *priv)
+{
+	struct ehea_cqe *cqe = priv;
+	unsigned int ip_len;
+	struct iphdr *iph;
+
[PATCH 1/1] lro: Generic Large Receive Offload for TCP traffic
This patch provides generic Large Receive Offload (LRO) functionality
for IPv4/TCP traffic.

LRO combines received tcp packets to a single larger tcp packet and
passes them then to the network stack in order to increase performance
(throughput). The interface supports two modes: Drivers can either pass
SKBs or fragment lists to the LRO engine.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 include/linux/inet_lro.h |  177 ++
 net/ipv4/Kconfig         |    8 +
 net/ipv4/Makefile        |    1 +
 net/ipv4/inet_lro.c      |  600 ++
 4 files changed, 786 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/inet_lro.h
 create mode 100644 net/ipv4/inet_lro.c

diff --git a/include/linux/inet_lro.h b/include/linux/inet_lro.h
new file mode 100644
index 0000000..e1fc1d1
--- /dev/null
+++ b/include/linux/inet_lro.h
@@ -0,0 +1,177 @@
+/*
+ *  linux/include/linux/inet_lro.h
+ *
+ *  Large Receive Offload (ipv4 / tcp)
+ *
+ *  (C) Copyright IBM Corp. 2007
+ *
+ *  Authors:
+ *       Jan-Bernd Themann [EMAIL PROTECTED]
+ *       Christoph Raisch [EMAIL PROTECTED]
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef __INET_LRO_H_
+#define __INET_LRO_H_
+
+#include <net/ip.h>
+#include <net/tcp.h>
+
+/*
+ * LRO statistics
+ */
+
+struct net_lro_stats {
+	unsigned long aggregated;
+	unsigned long flushed;
+	unsigned long no_desc;
+};
+
+/*
+ * LRO descriptor for a tcp session
+ */
+struct net_lro_desc {
+	struct sk_buff *parent;
+	struct sk_buff *last_skb;
+	struct skb_frag_struct *next_frag;
+	struct iphdr *iph;
+	struct tcphdr *tcph;
+	struct vlan_group *vgrp;
+	__wsum  data_csum;
+	u32 tcp_rcv_tsecr;
+	u32 tcp_rcv_tsval;
+	u32 tcp_ack;
+	u32 tcp_next_seq;
+	u32 skb_tot_frags_len;
+	u16 ip_tot_len;
+	u16 tcp_saw_tstamp;		/* timestamps enabled */
+	u16 tcp_window;
+	u16 vlan_tag;
+	int pkt_aggr_cnt;		/* counts aggregated packets */
+	int vlan_packet;
+	int mss;
+	int active;
+};
+
+/*
+ * Large Receive Offload (LRO) Manager
+ *
+ * Fields must be set by driver
+ */
+
+struct net_lro_mgr {
+	struct net_device *dev;
+	struct net_lro_stats stats;
+
+	/* LRO features */
+	unsigned long features;
+#define LRO_F_NAPI            1  /* Pass packets to stack via NAPI */
+#define LRO_F_EXTRACT_VLAN_ID 2  /* Set flag if VLAN IDs are extracted
+				    from received packets and eth protocol
+				    is still ETH_P_8021Q */
+
+	u32 ip_summed;      /* Set in non generated SKBs in page mode */
+	u32 ip_summed_aggr; /* Set in aggregated SKBs: CHECKSUM_UNNECESSARY
+			     * or CHECKSUM_NONE */
+
+	int max_desc; /* Max number of LRO descriptors */
+	int max_aggr; /* Max number of LRO packets to be aggregated */
+
+	struct net_lro_desc *lro_arr; /* Array of LRO descriptors */
+
+	/*
+	 * Optimized driver functions
+	 *
+	 * get_skb_header: returns tcp and ip header for packet in SKB
+	 */
+	int (*get_skb_header)(struct sk_buff *skb, void **ip_hdr,
+			      void **tcpudp_hdr, u64 *hdr_flags, void *priv);
+
+	/* hdr_flags: */
+#define LRO_IPV4 1 /* ip_hdr is IPv4 header */
+#define LRO_TCP  2 /* tcpudp_hdr is TCP header */
+
+	/*
+	 * get_frag_header: returns mac, tcp and ip header for packet in SKB
+	 *
+	 * @hdr_flags: Indicate what kind of LRO has to be done
+	 *             (IPv4/IPv6/TCP/UDP)
+	 */
+	int (*get_frag_header)(struct skb_frag_struct *frag, void **mac_hdr,
+			       void **ip_hdr, void **tcpudp_hdr, u64 *hdr_flags,
+			       void *priv);
+};
+
+/*
+ * Processes a SKB
+ *
+ * @lro_mgr: LRO manager to use
+ * @skb: SKB to aggregate
+ * @priv: Private data that may be used by driver functions
+ *        (for example get_tcp_ip_hdr)
+ */
+
+void lro_receive_skb(struct net_lro_mgr *lro_mgr,
+		     struct sk_buff *skb,
+		     void *priv);
+
+/*
+ * Processes a SKB with VLAN HW acceleration support
+ */
+
+void lro_vlan_hwaccel_receive_skb(struct
[PATCH 0/1] lro: Generic Large Receive Offload for TCP traffic
Hi,

I think this patch could be the final version for now. It has been tested on two platforms (power and x86_64) and works very well. Apart from David Miller and Evgeniy Polyakov, we'd like to thank especially Andrew Gallatin for his great reviews and help to make that happen.

After some discussion we decided to post the LRO patch separately from the driver patches. Our final driver patches for LRO will be posted later with some additional fixes for upstream inclusion to the netdev git. However, I'll also post our LRO patch for the driver today as an example of how to use this interface.

Thanks a lot,
Jan-Bernd

[PATCH 1/1] lro: Generic Large Receive Offload for TCP traffic

Changes to http://www.spinics.net/lists/netdev/msg37084.html

1) Fixed the LRO_MAX_PG_HLEN bug
2) skb->ip_summed can now be defined by driver for aggregated packets
3) The problem that the ramp up for tcp connections between machines with different MTU size (1500 vs 9000) is very slow has been fixed by setting skb->gso_size.
4) Checksum problem for little endian machines has been fixed
5) Missing addition of vlan_hdr_len for TCP header determination has been added.
Re: strange tcp behavior
On Fri, Aug 03, 2007 at 01:03:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> On Fri, August 3, 2007 12:56, Evgeniy Polyakov wrote:
> > On Fri, Aug 03, 2007 at 12:21:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> > > Since the connection is considered closed, couldn't another socket re-use it?
> > >
> > > Socket A: Recv data (unread)
> > > Socket A: Recv RST
> > > Socket B: Reuses connection (same IPs/ports)
> > > Socket A: Close
> > >
> > > Wouldn't that disrupt socket B's use of the connection?
> >
> > Then it will drop our data, since there was no appropriate handshake.
>
> Couldn't the sequence numbers be close enough to make the RST valid?

It does not matter - if connection is not in synchronized state all unrelated data is dropped, so remote side is only allowed to receive the SYN flag, anything else must be dropped. If remote side does not do that, it violates RFC.

> --
> Simon Arlott

-- 
Evgeniy Polyakov
[patch]support for USB autosuspend in the asix driver
Hi,

this implements support for USB autosuspend in the asix USB ethernet driver.

	Regards
		Oliver

Signed-off-by: Oliver Neukum [EMAIL PROTECTED]
---

--- a/drivers/net/usb/asix.c	2007-08-03 13:16:31.0 +0200
+++ b/drivers/net/usb/asix.c	2007-08-03 13:17:05.0 +0200
@@ -1474,6 +1474,7 @@ static struct usb_driver asix_driver = {
 	.suspend =	usbnet_suspend,
 	.resume =	usbnet_resume,
 	.disconnect =	usbnet_disconnect,
+	.supports_autosuspend = 1,
 };
 
 static int __init asix_init(void)
--- a/drivers/net/usb/usbnet.c	2007-08-03 13:16:53.0 +0200
+++ b/drivers/net/usb/usbnet.c	2007-08-03 13:19:31.0 +0200
@@ -588,6 +588,7 @@ static int usbnet_stop (struct net_devic
 	dev->flags = 0;
 	del_timer_sync (&dev->delay);
 	tasklet_kill (&dev->bh);
+	usb_autopm_put_interface(dev->intf);
 
 	return 0;
 }
@@ -601,9 +602,19 @@ static int usbnet_stop (struct net_devic
 static int usbnet_open (struct net_device *net)
 {
 	struct usbnet		*dev = netdev_priv(net);
-	int			retval = 0;
+	int			retval;
 	struct driver_info	*info = dev->driver_info;
 
+	if ((retval = usb_autopm_get_interface(dev->intf)) < 0) {
+		if (netif_msg_ifup (dev))
+			devinfo (dev,
+				"resumption fail (%d) usbnet usb-%s-%s, %s",
+				retval,
+				dev->udev->bus->bus_name, dev->udev->devpath,
+			info->description);
+		goto done_nopm;
+	}
+
 	// put into known safe state
 	if (info->reset && (retval = info->reset (dev)) < 0) {
 		if (netif_msg_ifup (dev))
@@ -657,7 +668,10 @@ static int usbnet_open (struct net_devic
 
 	// delay posting reads until we're fully open
 	tasklet_schedule (&dev->bh);
+	return retval;
 done:
+	usb_autopm_put_interface(dev->intf);
+done_nopm:
 	return retval;
 }
 
@@ -1141,6 +1155,7 @@ usbnet_probe (struct usb_interface *udev
 
 	dev = netdev_priv(net);
 	dev->udev = xdev;
+	dev->intf = udev;
 	dev->driver_info = info;
 	dev->driver_name = name;
 	dev->msg_enable = netif_msg_init (msg_level, NETIF_MSG_DRV
@@ -1265,12 +1280,18 @@ int usbnet_suspend (struct usb_interface
 	struct usbnet		*dev = usb_get_intfdata(intf);
 
 	if (!dev->suspend_count++) {
-		/* accelerate emptying of the rx and queues, to avoid
+		/*
+		 * accelerate emptying of the rx and queues, to avoid
 		 * having everything error out.
 		 */
 		netif_device_detach (dev->net);
 		(void) unlink_urbs (dev, &dev->rxq);
 		(void) unlink_urbs (dev, &dev->txq);
+		/*
+		 * reattach so runtime management can use and
+		 * wake the device
+		 */
+		netif_device_attach (dev->net);
 	}
 	return 0;
 }
@@ -1280,10 +1301,9 @@ int usbnet_resume (struct usb_interface
 {
 	struct usbnet		*dev = usb_get_intfdata(intf);
 
-	if (!--dev->suspend_count) {
-		netif_device_attach (dev->net);
+	if (!--dev->suspend_count)
 		tasklet_schedule (&dev->bh);
-	}
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(usbnet_resume);
--- a/drivers/net/usb/usbnet.h	2007-08-03 13:16:44.0 +0200
+++ b/drivers/net/usb/usbnet.h	2007-08-03 13:17:05.0 +0200
@@ -28,6 +28,7 @@ struct usbnet {
 	/* housekeeping */
 	struct usb_device	*udev;
+	struct usb_interface	*intf;
 	struct driver_info	*driver_info;
 	const char		*driver_name;
 	wait_queue_head_t	*wait;
Re: [patch 0/5][RFC] Update network drivers to use devres
Hello, Stephen Hemminger wrote: Skimming through drivers... via-rhine doesn't disable PCI device on init failure path but does so on removal. sky2 doesn't free consistent memory if sky2_init() fails. acenic calls iounmap() with NULL parameter which I'm not sure whether it's safe or not. natsemi doesn't disable PCI device on failure or removal. Did you report these to the developers? Just skimmed through. I'm pretty sure Brandon will pick those up later. Devres makes low level drivers simpler, easier to get right and maintain. Writing new drivers becomes easier too. So, why not? Network devices seem to work fine thanks, and the resource requirements are different. If ain't broke, don't fix it. Care to enlighten me on how the resource requirments are different from ATA drivers? I was thinking of the hot remove (no mod ref counts) and lingering /sys open issues. ATA drivers use ref counts. I guess the hot removing is done by severing netdev from the actual device, right? I don't see how that affects usage of devres on network drivers. Am I missing something? On a separate note, can you explain lingering /sys open issue to me a bit? With recent sysfs changes, sysfs nodes are disconnected immediately on deletion. Would that make any difference to netdevs? My take on devres is that it is similar to talloc() for device drivers. Not a bad idea in itself, but the real advantage of hierarchical allocation is that it makes exception handling easier if things are layered deeply. Yeah, devres made layering easier in libata, especially SFF stuff. Dunno how much of that is applicable to netdev but, with or without layering, it'll be a nice cleanup and I don't see much negative side. 
Conversion would take some work and bugs might be introduced in the process as with any changes but the good thing about devres is that you're very likely to get failure/release paths right if you get the init path right, and if you get the init path wrong, it will stand out like a sore thumb - easy to spot, easy to fix.

So, I think using devres on net drivers is a good idea, well, for that matter, for any driver, but me being the devres writer, that isn't really surprising, is it?

Thanks.

-- 
tejun
Re: Distributed storage.
On Fri, Aug 03, 2007 at 02:27:52PM +0200, Peter Zijlstra ([EMAIL PROTECTED]) wrote: On Fri, 2007-08-03 at 14:57 +0400, Evgeniy Polyakov wrote: For receiving situation is worse, since system does not know in advance to which socket given packet will belong to, so it must allocate from global pool (and thus there must be independent global reserve), and then exchange part of the socket's reserve to the global one (or just copy packet to the new one, allocated from socket's reseve is it was setup, or drop it otherwise). Global independent reserve is what I proposed when stopped to advertise network allocator, but it seems that it was not taken into account, and reserve was always allocated only when system has serious memory pressure in Peter's patches without any meaning for per-socket reservation. This is not true. I have a global reserve which is set-up a priori. You cannot allocate a reserve when under pressure, that does not make sense. I probably did not cut enough details - my main position is to allocate per socket reserve from socket's queue, and copy data there from main reserve, all of which are allocated either in advance (global one) or per sockoption, so that there would be no fairness issues what to mark as special and what to not. Say we have a page per socket, each socket can assign a reserve for itself from own memory, this accounts both tx and rx side. Tx is not interesting, it is simple, rx has global reserve (always allocated on startup or sometime way before reclaim/oom)where data is originally received (including skb, shared info and whatever is needed, page is just an exmaple), then it is copied into per-socket reserve and reused for the next packet. Having per-socket reserve allows to have progress in any situation not only in cases where single action must be received/processed, and allows to be completely fair for all users, but not only special sockets, thus admin for example would be allowed to login, ipsec would work and so on... 
-- 
Evgeniy Polyakov
Re: Distributed storage.
Hi Mike. On Fri, Aug 03, 2007 at 12:09:02AM -0400, Mike Snitzer ([EMAIL PROTECTED]) wrote: * storage can be formed on top of remote nodes and be exported simultaneously (iSCSI is peer-to-peer only, NBD requires device mapper and is synchronous) Having the in-kernel export is a great improvement over NBD's userspace nbd-server (extra copy, etc). But NBD's synchronous nature is actually an asset when coupled with MD raid1 as it provides guarantees that the data has _really_ been mirrored remotely. I believe, that the right answer to this is barrier, but not synchronous sending/receiving, which might slow things down noticebly. Barrier must wait until remote side received data and send back a notice. Until acknowledge is received, no one can say if data mirrored or ever received by remote node or not. TODO list currently includes following main items: * redundancy algorithm (drop me a request of your own, but it is highly unlikley that Reed-Solomon based will ever be used - it is too slow for distributed RAID, I consider WEAVER codes) I'd like to better understand where you see DST heading in the area of redundancy.Based on your blog entries: http://tservice.net.ru/~s0mbre/blog/devel/dst/2007_07_24_1.html http://tservice.net.ru/~s0mbre/blog/devel/dst/2007_07_31_2.html (and your todo above) implementing a mirroring algorithm appears to be a near-term goal for you. Can you comment on how your intended implementation would compare, in terms of correctness and efficiency, to say MD (raid1) + NBD? MD raid1 has a write intent bitmap that is useful to speed resyncs; what if any mechanisms do you see DST embracing to provide similar and/or better reconstruction infrastructure? Do you intend to embrace any exisiting MD or DM infrastructure? Depending on what algorithm will be preferred - I do not want mirroring, it is _too_ wasteful in terms of used storage, but it is the simplest. 
Right now I still consider WEAVER codes as the fastest in distributed environment from what I checked before, but it is quite complex and spec is (at least for me) not clear in all aspects right now. I did not even start userspace implementation of those codes. (Hint: spec sucks, kidding :)

For simple mirroring each node must be split to chunks, each one has a representation bit in the main node mask; when a chunk is dirty, the full chunk is resynced. Depending on node size and amount of memory chunk size varies. Setup is performed during node initialization. Having checksum for each chunk is a good step.

All interfaces are already there, although require cleanup and move from place to place, but I decided to make initial release small.

> BTW, you have definitely published some very compelling work and it's sad that you're predisposed to think DST won't be received well if you pushed for inclusion (for others, as much was said in the 7.31.2007 blog post I referenced above). Clearly others need to embrace DST to help inclusion become a reality. To that end, it's great to see that Daniel Phillips and the other zumastor folks will be putting DST through its paces.

In that blog entry I misspelled Zen with Xen - that's an error, according to prognosis - time will judge :)

> regards,
> Mike

-- 
Evgeniy Polyakov
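
The chunked dirty tracking described above can be sketched in userspace C. This is only an illustration under stated assumptions, not DST code: it assumes the node mask holds one bit per fixed-size chunk, and every name here (struct chunk_map, chunk_map_mark_dirty() and so on) is hypothetical. A write marks each chunk it touches as dirty, and a resync pass would copy whole dirty chunks and then clear their bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-node dirty map: one bit per fixed-size chunk. */
struct chunk_map {
	uint64_t node_size;	/* bytes covered by the node */
	uint64_t chunk_size;	/* bytes per chunk, chosen at node setup */
	size_t nchunks;
	unsigned long *bits;
};

static int chunk_map_init(struct chunk_map *m, uint64_t node_size,
			  uint64_t chunk_size)
{
	size_t nwords;

	assert(chunk_size != 0);
	m->node_size = node_size;
	m->chunk_size = chunk_size;
	m->nchunks = (size_t)((node_size + chunk_size - 1) / chunk_size);
	nwords = (m->nchunks + BITS_PER_WORD - 1) / BITS_PER_WORD;
	m->bits = calloc(nwords ? nwords : 1, sizeof(unsigned long));
	return m->bits ? 0 : -1;
}

/* Mark every chunk touched by the byte range [off, off + len) as dirty. */
static void chunk_map_mark_dirty(struct chunk_map *m, uint64_t off,
				 uint64_t len)
{
	uint64_t first, last, i;

	if (len == 0)
		return;
	assert(off < m->node_size && len <= m->node_size - off);
	first = off / m->chunk_size;
	last = (off + len - 1) / m->chunk_size;
	for (i = first; i <= last; i++)
		m->bits[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

static int chunk_map_is_dirty(const struct chunk_map *m, size_t idx)
{
	assert(idx < m->nchunks);
	return (int)((m->bits[idx / BITS_PER_WORD] >> (idx % BITS_PER_WORD)) & 1UL);
}

/* Bytes to copy when resyncing chunk idx; the last chunk may be short. */
static uint64_t chunk_map_resync_len(const struct chunk_map *m, size_t idx)
{
	uint64_t left;

	assert(idx < m->nchunks);
	left = m->node_size - (uint64_t)idx * m->chunk_size;
	return left < m->chunk_size ? left : m->chunk_size;
}

/* Called after the full chunk has been copied to the mirror. */
static void chunk_map_clear(struct chunk_map *m, size_t idx)
{
	assert(idx < m->nchunks);
	m->bits[idx / BITS_PER_WORD] &= ~(1UL << (idx % BITS_PER_WORD));
}
```

A larger chunk size keeps the map small but makes each resync copy more data, which matches the remark that chunk size depends on node size and available memory.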
Re: [patch 0/5][RFC] Update network drivers to use devres
On Thu, 2 Aug 2007 15:42:06 -0700 Brandon Philips [EMAIL PROTECTED] wrote:

> This patch set adds support for devres in the net core and converts the e100 and e1000 drivers to devres. Devres is a simple resource manager for device drivers, see Documentation/driver-model/devres.txt for more information.
>
> The use of devres will remain optional for drivers with this patch set. Drivers can be converted when it makes sense.

Just because devres exists is not sufficient motivation to change. It seems that devres was a band-aid rather than fixing storage drivers to have proper DMA lifetimes. Network devices seem to work fine thanks, and the resource requirements are different. If ain't broke, don't fix it.
Re: [patch] genirq: fix simple and fasteoi irq handlers
On Fri, Aug 03, 2007 at 10:04:08AM +0200, Ingo Molnar wrote:
> * Jarek Poplawski [EMAIL PROTECTED] wrote:
> > I can't guarantee this is all needed to fix this bug, but I think this patch is necessary here.
>
> hmmm ... very interesting! Now _this_ is something we'd like to see tested. Could you send a patch to Marcin that also undoes the workaround we have in place now, so that he could check whether ne2k-pci works fine with your fix alone?

I'm not sure this is needed... Marcin got this patch, I hope, and I don't have another way to contact him. Since he managed this bisection and all the previous patches I don't think there could be any problems, so:

Marcin! I'd be very glad if you could test this patch alone; this should apply without any problems to 2.6.21 (with some offset) and later vanilla versions (or try to revert Ingo's last patch with patch -p1 -R). Please, contact me on any problems (alas not during the weekend...).

Thanks,
Jarek P.

PS: of course, I'm very curious about this testing too, but, on the other hand, as I've written earlier, I think this patch is needed for logical reasons only, and it really doesn't look like it could do any damage here.
Re: strange tcp behavior
On 03/08/07 13:09, Evgeniy Polyakov wrote: On Fri, Aug 03, 2007 at 01:03:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote: On Fri, August 3, 2007 12:56, Evgeniy Polyakov wrote: On Fri, Aug 03, 2007 at 12:21:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote: Since the connection is considered closed, couldn't another socket re-use it? Socket A: Recv data (unread) Socket A: Recv RST Socket B: Reuses connection (same IPs/ports) Socket A: Close Wouldn't that disrupt socket B's use of the connection? Then it will drop our data, since there were no appropriate handhsake. Couldn't the sequence numbers be close enough to make the RST valid? It does not matter - if connection is not in synchronized state all unrelated data is dropped, so remote side is only allowed to receive syn flag only, anything else must be dropped. If remote side does not do that, it violates RFC. Except the remote side has a connection, because another one can be made before the existing connection is closed: 17:37:37.377571 IP 192.168.7.4.50550 192.168.7.8.2500: S 134077329:134077329(0) win 1500 (raw) 17:37:37.382352 IP 192.168.7.8.2500 192.168.7.4.50550: S 3460060233:3460060233(0) ack 134077330 win 14360 mss 7180 (accept) 17:37:37.377966 IP 192.168.7.4.50550 192.168.7.8.2500: . ack 1 win 1500 (raw) 17:37:37.378128 IP 192.168.7.4.50550 192.168.7.8.2500: P 1:17(16) ack 1 win 1500 (raw) 17:37:37.378162 IP 192.168.7.8.2500 192.168.7.4.50550: . ack 17 win 14360 17:37:37.378131 IP 192.168.7.4.50550 192.168.7.8.2500: R 134077346:134077346(0) win 1500 (raw) 17:37:37.412709 IP 192.168.7.4.50550 192.168.7.8.2500: SWE 3257207813:3257207813(0) win 14280 mss 7140,sackOK,timestamp 3601441543 0,nop,wscale 5 (connect) 17:37:37.412785 IP 192.168.7.8.2500 192.168.7.4.50550: SE 3495384256:3495384256(0) ack 3257207814 win 14336 mss 7180,sackOK,timestamp 4294812905 3601441543,nop,wscale 6 (accept) 17:37:37.412960 IP 192.168.7.4.50550 192.168.7.8.2500: . 
ack 1 win 447 nop,nop,timestamp 3601441543 4294812905 17:37:38.383085 IP 192.168.7.8.2500 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360 (close (previous connection)) 17:37:47.417649 IP 192.168.7.8.2500 192.168.7.4.50550: F 1:1(0) ack 1 win 224 nop,nop,timestamp 4294822910 3601441543 (close) 17:37:47.417993 IP 192.168.7.4.50550 192.168.7.8.2500: F 1:1(0) ack 2 win 447 nop,nop,timestamp 3601444045 4294822910 (read returned) 17:37:47.418466 IP 192.168.7.8.2500 192.168.7.4.50550: . ack 2 win 224 nop,nop,timestamp 4294822911 3601444045 The second connection also modified the RST|ACK that was sent compared to no second connection: 17:38:03.532703 IP 192.168.7.4.50550 192.168.7.8.2500: S 82517575:82517575(0) win 1500 (raw) 17:38:03.532832 IP 192.168.7.8.2500 192.168.7.4.50550: S 3495449795:3495449795(0) ack 82517576 win 14360 mss 7180 (accept) 17:38:03.533388 IP 192.168.7.4.50550 192.168.7.8.2500: . ack 1 win 1500 (raw) 17:38:03.533457 IP 192.168.7.4.50550 192.168.7.8.2500: P 1:17(16) ack 1 win 1500 (raw) 17:38:03.533597 IP 192.168.7.8.2500 192.168.7.4.50550: . ack 17 win 14360 17:38:03.533589 IP 192.168.7.4.50550 192.168.7.8.2500: R 82517592:82517592(0) win 1500 (raw) 17:38:04.536277 IP 192.168.7.8.2500 192.168.7.4.50550: R 1:1(0) ack 17 win 14360 (close) 17:38:04.536277 IP 192.168.7.8.2500 192.168.7.4.50550: R 1:1(0) ack 17 win 14360 vs 17:37:38.383085 IP 192.168.7.8.2500 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360 What happened there ? On the server, run tcptest-server.c, which waits for 1s on the first connection then 10s on the second connection. 
On the client, run: iptables -I INPUT -i eth0 -p tcp --dport 50550 -j DROP; ./client; iptables -D INPUT -i eth0 -p tcp --dport 50550 -j DROP; ./tcptest-client (client.c from john's original email) -- Simon Arlott #include sys/types.h #include sys/socket.h #include arpa/inet.h #include poll.h #include fcntl.h #define PORT 2500 #define xerror(str) do { perror(str); exit(1); } while (0) int main(void) { struct sockaddr_in sa; int l, s, tmp; int t = 0; memset(sa, 0, sizeof(sa)); l = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP); if (!l) xerror(socket); sa.sin_family = AF_INET; sa.sin_addr.s_addr = htonl(INADDR_ANY); sa.sin_port = htons(PORT); tmp = 1; setsockopt(l, SOL_SOCKET, SO_REUSEADDR, (char*)tmp, sizeof(tmp)); if (bind(l, (struct sockaddr*)sa, sizeof(sa)) != 0) xerror(bind); if (listen(l, 0) != 0) xerror(listen); printf(server %d ready...\n, getpid()); for (t = 1; t = 2; t++) { s = accept(l, NULL, NULL); switch (fork()) { case -1: xerror(fork); break; case 0: switch (t) { case 1: printf(server %d accepted connection\n, getpid()); #if 0 tmp = fcntl(s, F_GETFL, 0); if (fcntl(s, F_SETFL, tmp | O_NONBLOCK) != 0) xerror(fcntl); if (send(s, AAA, 7, 0) != 7) xerror(send); #endif printf(server %d waiting for 1
Re: [patch] genirq: fix simple and fasteoi irq handlers
* Ingo Molnar [EMAIL PROTECTED] wrote:
> * Jarek Poplawski [EMAIL PROTECTED] wrote:
> > I can't guarantee this is all needed to fix this bug, but I think this patch is necessary here.
>
> hmmm ... very interesting! Now _this_ is something we'd like to see tested. Could you send a patch to Marcin that also undoes the workaround we have in place now, so that he could check whether ne2k-pci works fine with your fix alone?

or it would be nice if Marcin could test pure 2.6.22 plus your fix (without any other patches applied).

	Ingo
[GIT PATCH] ucc_geth fixes for 2.6.22-rc1
Please pull from 'ucc_geth' branch of
master.kernel.org:/pub/scm/linux/kernel/git/leo/fsl-soc.git ucc_geth

to receive the following fixes:

 drivers/net/ucc_geth_ethtool.c |    1 -
 drivers/net/ucc_geth_mii.c     |    3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

Domen Puncer (1):
      ucc_geth: fix section mismatch

Jan Altenberg (1):
      ucc_geth: remove get_perm_addr from ucc_geth_ethtool.c

diff --git a/drivers/net/ucc_geth_ethtool.c b/drivers/net/ucc_geth_ethtool.c
index a8994c7..64bef7c 100644
--- a/drivers/net/ucc_geth_ethtool.c
+++ b/drivers/net/ucc_geth_ethtool.c
@@ -379,7 +379,6 @@ static const struct ethtool_ops uec_ethtool_ops = {
 	.get_stats_count        = uec_get_stats_count,
 	.get_strings            = uec_get_strings,
 	.get_ethtool_stats      = uec_get_ethtool_stats,
-	.get_perm_addr          = ethtool_op_get_perm_addr,
 };
 
 void uec_set_ethtool_ops(struct net_device *netdev)
diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c
index 5f8c2d3..6c257b8 100644
--- a/drivers/net/ucc_geth_mii.c
+++ b/drivers/net/ucc_geth_mii.c
@@ -272,7 +272,8 @@ int __init uec_mdio_init(void)
 	return of_register_platform_driver(&uec_mdio_driver);
 }
 
-void __exit uec_mdio_exit(void)
+/* called from __init ucc_geth_init, therefore can not be __exit */
+void uec_mdio_exit(void)
 {
 	of_unregister_platform_driver(&uec_mdio_driver);
 }
Re: [patch 2/5][RFC] Update net core to use devres.
> +static inline void * register_netdev_devres(struct device *gendev,
> +					       struct net_device *dev)
> +{
> +	struct net_device **p;
> +
> +	/* 0 size because we don't need it. The net_device is already alloc'd
> +	 * in alloc_netdev_mq. We can't use devm_kzalloc in alloc_netdev_mq
> +	 * because a net_device cannot be free'd directly as it can be a
> +	 * kobject. See free_netdev.
> +	 */
> +	p = devres_alloc(devm_free_netdev, 0, GFP_KERNEL);

s/0/sizeof(*p)/

-- 
tejun
Re: [patch 3/5][RFC] Update e100 driver to use devres.
On Thu, Aug 02, 2007 at 03:45:37PM -0700, Brandon Philips wrote:
>  	if((err = pci_request_regions(pdev, DRV_NAME))) {
>  		DPRINTK(PROBE, ERR, "Cannot obtain PCI resources, aborting.\n");
> -		goto err_out_disable_pdev;
> +		return err;
>  	}
>
>  	if((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK))) {
>  		DPRINTK(PROBE, ERR, "No usable DMA configuration, aborting.\n");
> -		goto err_out_free_res;
> +		return err;
>  	}
>
>  	SET_MODULE_OWNER(netdev);
> @@ -2613,11 +2606,11 @@ static int __devinit e100_probe(struct p
>  	if (use_io)
>  		DPRINTK(PROBE, INFO, "using i/o access mode\n");
>
> -	nic->csr = pci_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr));
> +	nic->csr = pcim_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr));
>  	if(!nic->csr) {
>  		DPRINTK(PROBE, ERR, "Cannot map device registers, aborting.\n");
>  		err = -ENOMEM;
> -		goto err_out_free_res;
> +		return err;

Calls to pci_request_regions() and pcim_iomap() can be merged into pcim_iomap_regions(). Other than that,

Acked-by: Tejun Heo [EMAIL PROTECTED]

-- 
tejun
Re: [patch 1/5][RFC] NET: Change pci_enable_device to pci_reenable_device to keep device enable balance
Brandon Philips wrote:
> On a slot_reset event pci_disable_device() is never called so calling pci_enable_device() will unbalance the enable count.
>
> Signed-off-by: Brandon Philips [EMAIL PROTECTED]

Acked-by: Tejun Heo [EMAIL PROTECTED]

-- 
tejun
Re: [patch 4/5][RFC] Implement devm_kcalloc
On Thu, Aug 02, 2007 at 03:45:45PM -0700, Brandon Philips wrote:
>  /**
> + * devm_kcalloc - resource-managed kcalloc
> + * @dev: Device to allocate memory for
> + * @n: number of elements.
> + * @size: element size.
> + * @flags: the type of memory to allocate.
> + */
> +inline void * devm_kcalloc(struct device * dev, size_t n, size_t size,
> +			    gfp_t flags)
> +{
> +	if (n != 0 && size > ULONG_MAX / n)
> +		return NULL;
> +	return devm_kzalloc(dev, n * size, flags);
> +}
> +EXPORT_SYMBOL_GPL(devm_kcalloc);

Please drop inline. It's meaningless. Other than that,

Acked-by: Tejun Heo [EMAIL PROTECTED]

-- 
tejun
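
The overflow guard in the quoted devm_kcalloc() can be tried outside the kernel. Below is a minimal userspace sketch of the same check, assuming SIZE_MAX in place of ULONG_MAX and plain calloc() in place of devm_kzalloc(); checked_mul() and checked_calloc() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not kernel code: returns 1 and stores n * size in
 * *out when the product fits in size_t, 0 when it would wrap around.
 */
static int checked_mul(size_t n, size_t size, size_t *out)
{
	assert(out != NULL);
	if (n != 0 && size > SIZE_MAX / n)
		return 0;	/* n * size would overflow */
	*out = n * size;
	return 1;
}

/* Zeroed array allocation that refuses requests whose size would overflow. */
static void *checked_calloc(size_t n, size_t size)
{
	size_t bytes;

	if (!checked_mul(n, size, &bytes))
		return NULL;
	return calloc(1, bytes);
}
```

The division form avoids computing n * size before it is known to fit, which is why the test on n != 0 has to come first.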
Re: strange tcp behavior
On Fri, August 3, 2007 09:25, Evgeniy Polyakov wrote:
> On Thu, Aug 02, 2007 at 07:58:03PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> > 19:24:32.897071 IP 192.168.7.4.5 > 192.168.7.8.2500: S 705362199:705362199(0) win 1500
> > 19:24:32.897211 IP 192.168.7.8.2500 > 192.168.7.4.5: S 4159455228:4159455228(0) ack 705362200 win 14360 <mss 7180>
> > 19:24:32.920784 IP 192.168.7.4.5 > 192.168.7.8.2500: . ack 1 win 1500
> > 19:24:32.921732 IP 192.168.7.4.5 > 192.168.7.8.2500: P 1:17(16) ack 1 win 1500
> > 19:24:32.921795 IP 192.168.7.8.2500 > 192.168.7.4.5: . ack 17 win 14360
> > 19:24:32.922881 IP 192.168.7.4.5 > 192.168.7.8.2500: R 705362216:705362216(0) win 1500
> > 19:24:34.927717 IP 192.168.7.8.2500 > 192.168.7.4.5: R 1:1(0) ack 17 win 14360
> >
> > According to RFC 793, the RST from .4 means that the connection is CLOSED.
>
> RFC 2525 - common tcp problems, says we should send RST in this case, although it does not specify should we send it if socket is in CLOSED state or not. Well, we send :) Even if tcp_send_active_reset() will check if socket is in CLOSED state and will not send data, but is still there, it will not be easily triggered though, but it can be possible.

Since the connection is considered closed, couldn't another socket re-use it?

Socket A: Recv data (unread)
Socket A: Recv RST
Socket B: Reuses connection (same IPs/ports)
Socket A: Close

Wouldn't that disrupt socket B's use of the connection?

-- 
Simon Arlott
Re: [patch 5/5][RFC] Update e1000 driver to use devres.
On Thu, Aug 02, 2007 at 03:45:52PM -0700, Brandon Philips wrote:
>  	if ((err = pci_request_regions(pdev, e1000_driver_name)))
> -		goto err_pci_reg;
> +		goto err_dma;

Why not just return? Ditto for all goto err_dma's.

>  	err = -EIO;
> -	adapter->hw.hw_addr = ioremap(mmio_start, mmio_len);
> +	adapter->hw.hw_addr = devm_ioremap(&pdev->dev, mmio_start, mmio_len);

This is correct conversion but I have no idea why the original code did manual ioremap instead of using pci_iomap().

> -		adapter->hw.flash_address = ioremap(flash_start, flash_len);
> +		adapter->hw.flash_address = devm_ioremap(&pdev->dev,
> +							  flash_start,
> +							  flash_len);

Ditto.

>  err_dma:
>  	pci_disable_device(pdev);
>  	return err;

err_dma can be killed.

Thanks.

-- 
tejun
Re: [patch 2/5][RFC] Update net core to use devres.
On 18:13 Fri 03 Aug 2007, Tejun Heo wrote:
> > +	p = devres_alloc(devm_free_netdev, 0, GFP_KERNEL);
>
> s/0/sizeof(*p)/

Oops! It should have read like this:

+static void * register_netdev_devres(struct device *gendev,
+				      struct net_device *dev)
+{
+	void *p;
+
+	/* 0 size because we don't need it. The net_device is already alloc'd
+	 * in alloc_netdev_mq. We can't use devm_kzalloc in alloc_netdev_mq
+	 * because a net_device cannot be free'd directly as it can be a
+	 * kobject. See free_netdev.
+	 */
+	p = devres_alloc(devm_free_netdev, 0, GFP_KERNEL);
+
+	if (unlikely(!p))
+		return NULL;
+
+	devres_add(gendev, p);
+
+	return dev;
+}

I will send the full correct patch.

Thanks,

	Brandon
Re: Distributed storage.
Hi.

On Fri, Aug 03, 2007 at 09:04:51AM +0400, Manu Abraham ([EMAIL PROTECTED]) wrote:
> On 7/31/07, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
> > TODO list currently includes following main items:
> > * redundancy algorithm (drop me a request of your own, but it is highly unlikely that Reed-Solomon based will ever be used - it is too slow for distributed RAID, I consider WEAVER codes)
>
> LDPC codes[1][2] have been replacing Turbo code[3] with regards to communication links and we have been seeing that transition. (maybe helpful, came to mind seeing the mention of Turbo code) Don't know how weaver compares to LDPC, though found some comparisons [4][5]. But looking at fault tolerance figures, I guess Weaver is much better.
>
> [1] http://www.ldpc-codes.com/
> [2] http://portal.acm.org/citation.cfm?id=1240497
> [3] http://en.wikipedia.org/wiki/Turbo_code
> [4] http://domino.research.ibm.com/library/cyberdig.nsf/papers/BD559022A190D41C85257212006CEC11/$File/rj10391.pdf
> [5] http://hplabs.hp.com/personal/Jay_Wylie/publications/wylie_dsn2007.pdf

Many thanks for these links, I will definitely study them.

-- 
Evgeniy Polyakov
Re: Distributed storage.
On Thu, Aug 02, 2007 at 02:08:24PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote: Hi. I'm pleased to announce first release of the distributed storage subsystem, which allows to form a storage on top of remote and local nodes, which in turn can be exported to another storage as a node to form tree-like storages. Excellent! This is precisely what the doctor ordered for the OCFS2-based distributed storage system I have been mumbling about for some time. In fact the dd in ddsnap and ddraid stands for distributed data. The ddsnap/raid devices do not include an actual network transport, that is expected to be provided by a specialized block device, which up till now has been NBD. But NBD has various deficiencies as you note, in addition to its tendency to deadlock when accessed locally. Your new code base may be just the thing we always wanted. We (zumastor et al) will take it for a drive and see if anything breaks. That would be great. Memory deadlock is a concern of course. From a cursory glance through, it looks like this code is pretty vm-friendly and you have thought quite a lot about it, however I respectfully invite peterz (obsessive/compulsive memory deadlock hunter) to help give it a good going over with me. I see bits that worry me, e.g.: + req = mempool_alloc(st-w-req_pool, GFP_NOIO); which seems to be callable in response to a local request, just the case where NBD deadlocks. Your mempool strategy can work reliably only if you can prove that the pool allocations of the maximum number of requests you can have in flight do not exceed the size of the pool. In other words, if you ever take the pool's fallback path to normal allocation, you risk deadlock. mempool should be allocated to be able to catch up with maximum in-flight requests, in my tests I was unable to force block layer to put more than 31 pages in sync, but in one bio. 
Each request is essentially delayed bio processing, so the pool must handle the maximum number of in-flight bios (if they do not cover multiple nodes; if they do, then each node requires its own request). Sync has one bio in flight on my machines (from tiny VIA nodes to low-end amd64), and the number of normal requests *usually* does not exceed several dozen (always fewer than a hundred), but that might hold only for my small systems, so the request size was selected to be as small as possible and the number of allocations was decreased to the absolute minimum.

> Anyway, if this is as grand as it seems then I would think we ought to
> factor out a common transfer core that can be used by all of NBD,
> iSCSI, ATAoE and your own kernel server, in place of the roll-yer-own
> code those things have now.
>
> Regards,
>
> Daniel

Thanks.

--
Evgeniy Polyakov
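The sizing argument in this exchange (the reserve must cover the maximum number of requests in flight, otherwise the exhausted-pool path is taken) can be illustrated with a small userland model. This is only a sketch: `reserve_pool`, `pool_take`, `pool_put` and `fallbacks_for_burst` are hypothetical names, and the model only counts how often the reserve runs dry. It does not reproduce the behaviour of the kernel's mempool_alloc().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model of a fixed reserve; NOT the kernel mempool API. */
struct reserve_pool {
	size_t capacity;   /* elements preallocated when the pool is created */
	size_t in_use;     /* elements currently handed out from the reserve */
	size_t fallbacks;  /* requests that found the reserve empty */
};

static void pool_init(struct reserve_pool *p, size_t capacity)
{
	p->capacity = capacity;
	p->in_use = 0;
	p->fallbacks = 0;
}

/*
 * Take one element. Returns true when it came from the reserve. Returns
 * false when the reserve was empty; that is the situation the thread
 * warns about, so the model just counts it.
 */
static bool pool_take(struct reserve_pool *p)
{
	if (p->in_use < p->capacity) {
		p->in_use++;
		return true;
	}
	p->fallbacks++;
	return false;
}

/* Return one element that pool_take() served from the reserve. */
static void pool_put(struct reserve_pool *p)
{
	if (p->in_use > 0)
		p->in_use--;
}

/* Count exhausted-reserve events when `burst` requests are in flight at once. */
static size_t fallbacks_for_burst(size_t capacity, size_t burst)
{
	struct reserve_pool p;
	size_t served = 0;
	size_t i;

	pool_init(&p, capacity);
	for (i = 0; i < burst; i++)
		if (pool_take(&p))
			served++;
	for (i = 0; i < served; i++)
		pool_put(&p);
	return p.fallbacks;
}
```

With a reserve of 32 elements, a burst of 31 requests (the figure mentioned above for one sync bio) never exhausts the pool, while a burst of 40 does so 8 times. The open question in the thread is whether the real upper bound on in-flight requests can be proven rather than observed.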
Re: [PATCH 1/1] lro: Generic Large Receive Offload for TCP traffic
On Fri, 3 August 2007 14:41:19 +0200, Jan-Bernd Themann wrote:
> This patch provides generic Large Receive Offload (LRO) functionality
> for IPv4/TCP traffic. LRO combines received tcp packets to a single
> larger tcp packet and passes them then to the network stack in order
> to increase performance (throughput). The interface supports two
> modes: Drivers can either pass SKBs or fragment lists to the LRO
> engine.

Maybe this is a stupid question, but why is LRO done at the device driver level? If it is a universal performance benefit, I would have expected it to be done generically, i.e. have all packets moved into network layer pass through LRO instead.

> +void lro_flush_pkt(struct net_lro_mgr *lro_mgr,
> +		   struct iphdr *iph, struct tcphdr *tcph);

In particular this bit looks like it should be driven by a timeout, which would be settable via /proc/sys/net/core/lro_timeout or similar.

Jörn

--
Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.
-- M.A. Jackson
[PATCH] lro: myri10ge example how to use LRO
To follow up on Jan-Bernd Themann's LRO patch earlier today, this patch shows how the generic LRO interface can be used for page based drivers. Again, many thanks to Jan-Bernd Themann for leading this effort. Drew Singed off by: Andrew Gallatin [EMAIL PROTECTED] diff -urNp a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c --- a/drivers/net/myri10ge/myri10ge.c 2007-07-24 15:57:12.0 -0400 +++ b/drivers/net/myri10ge/myri10ge.c 2007-08-03 13:07:48.0 -0400 @@ -48,6 +48,7 @@ #include linux/etherdevice.h #include linux/if_ether.h #include linux/if_vlan.h +#include linux/inet_lro.h #include linux/ip.h #include linux/inet.h #include linux/in.h @@ -62,6 +63,8 @@ #include linux/io.h #include linux/log2.h #include net/checksum.h +#include net/ip.h +#include net/tcp.h #include asm/byteorder.h #include asm/io.h #include asm/processor.h @@ -89,6 +92,7 @@ MODULE_LICENSE(Dual BSD/GPL); #define MYRI10GE_EEPROM_STRINGS_SIZE 256 #define MYRI10GE_MAX_SEND_DESC_TSO ((65536 / 2048) * 2) +#define MYRI10GE_MAX_LRO_DESCRIPTORS 8 #define MYRI10GE_NO_CONFIRM_DATA htonl(0x) #define MYRI10GE_NO_RESPONSE_RESULT 0x @@ -151,6 +155,8 @@ struct myri10ge_rx_done { dma_addr_t bus; int cnt; int idx; + struct net_lro_mgr lro_mgr; + struct net_lro_desc lro_desc[MYRI10GE_MAX_LRO_DESCRIPTORS]; }; struct myri10ge_priv { @@ -276,6 +282,14 @@ static int myri10ge_debug = -1;/* defau module_param(myri10ge_debug, int, 0); MODULE_PARM_DESC(myri10ge_debug, Debug level (0=none,...,16=all)); +static int myri10ge_lro = 1; +module_param(myri10ge_lro, int, S_IRUGO); +MODULE_PARM_DESC(myri10ge_lro, Enable large receive offload\n); + +static int myri10ge_lro_max_pkts = MYRI10GE_LRO_MAX_PKTS; +module_param(myri10ge_lro_max_pkts, int, S_IRUGO); +MODULE_PARM_DESC(myri10ge_lro, Number of LRO packets to be aggregated\n); + static int myri10ge_fill_thresh = 256; module_param(myri10ge_fill_thresh, int, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(myri10ge_fill_thresh, Number of empty rx slots allowed\n); @@ -1019,6 
+1033,15 @@ myri10ge_rx_done(struct myri10ge_priv *m remainder -= MYRI10GE_ALLOC_SIZE; } + if (mgp-csum_flag myri10ge_lro) { + rx_frags[0].page_offset += MXGEFW_PAD; + rx_frags[0].size -= MXGEFW_PAD; + len -= MXGEFW_PAD; + lro_receive_frags(mgp-rx_done.lro_mgr, rx_frags, + len, len, (void *)(unsigned long)csum, csum); + return 1; + } + hlen = MYRI10GE_HLEN len ? len : MYRI10GE_HLEN; /* allocate an skb to attach the page(s) to. */ @@ -1137,6 +1160,9 @@ static inline void myri10ge_clean_rx_don mgp-stats.rx_packets += rx_packets; mgp-stats.rx_bytes += rx_bytes; + if (myri10ge_lro) + lro_flush_all(rx_done-lro_mgr); + /* restock receive rings if needed */ if (mgp-rx_small.fill_cnt - mgp-rx_small.cnt myri10ge_fill_thresh) myri10ge_alloc_rx_pages(mgp, mgp-rx_small, @@ -1378,7 +1404,8 @@ static const char myri10ge_gstrings_stat dropped_pause, dropped_bad_phy, dropped_bad_crc32, dropped_unicast_filtered, dropped_multicast_filtered, dropped_runt, dropped_overrun, dropped_no_small_buffer, - dropped_no_big_buffer + dropped_no_big_buffer, LRO aggregated, LRO flushed, + LRO avg aggr, LRO no_desc }; #define MYRI10GE_NET_STATS_LEN 21 @@ -1444,6 +1471,14 @@ myri10ge_get_ethtool_stats(struct net_de data[i++] = (unsigned int)ntohl(mgp-fw_stats-dropped_overrun); data[i++] = (unsigned int)ntohl(mgp-fw_stats-dropped_no_small_buffer); data[i++] = (unsigned int)ntohl(mgp-fw_stats-dropped_no_big_buffer); + data[i++] = mgp-rx_done.lro_mgr.stats.aggregated; + data[i++] = mgp-rx_done.lro_mgr.stats.flushed; + if (mgp-rx_done.lro_mgr.stats.flushed) + data[i++] = mgp-rx_done.lro_mgr.stats.aggregated / + mgp-rx_done.lro_mgr.stats.flushed; + else + data[i++] = 0; + data[i++] = mgp-rx_done.lro_mgr.stats.no_desc; } static void myri10ge_set_msglevel(struct net_device *netdev, u32 value) @@ -1717,10 +1752,69 @@ static void myri10ge_free_irq(struct myr pci_disable_msi(pdev); } +static int +myri10ge_get_frag_header(struct skb_frag_struct *frag, void **mac_hdr, +void **ip_hdr, void **tcpudp_hdr, +u64 * 
hdr_flags, void *priv) +{ + struct ethhdr *eh; + struct vlan_ethhdr *veh; + struct iphdr *iph; + u8 *va = page_address(frag-page) + frag-page_offset; + unsigned long ll_hlen; + __wsum csum = (__wsum) (unsigned long)priv; + + /* find the mac header, aborting if not IPv4 */ + + eh = (struct ethhdr *)va; + *mac_hdr = eh; + ll_hlen = ETH_HLEN; + if (eh-h_proto !=
Re: strange tcp behavior
On Fri, August 3, 2007 12:56, Evgeniy Polyakov wrote:
> On Fri, Aug 03, 2007 at 12:21:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> > Since the connection is considered closed, couldn't another socket
> > re-use it?
> >
> > Socket A: Recv data (unread)
> > Socket A: Recv RST
> > Socket B: Reuses connection (same IPs/ports)
> > Socket A: Close
> >
> > Wouldn't that disrupt socket B's use of the connection?
>
> Then it will drop our data, since there was no appropriate handshake.

Couldn't the sequence numbers be close enough to make the RST valid?

--
Simon Arlott
Re: Distributed storage.
On Fri, 2007-08-03 at 17:49 +0400, Evgeniy Polyakov wrote: On Fri, Aug 03, 2007 at 02:27:52PM +0200, Peter Zijlstra ([EMAIL PROTECTED]) wrote: On Fri, 2007-08-03 at 14:57 +0400, Evgeniy Polyakov wrote: For receiving situation is worse, since system does not know in advance to which socket given packet will belong to, so it must allocate from global pool (and thus there must be independent global reserve), and then exchange part of the socket's reserve to the global one (or just copy packet to the new one, allocated from socket's reseve is it was setup, or drop it otherwise). Global independent reserve is what I proposed when stopped to advertise network allocator, but it seems that it was not taken into account, and reserve was always allocated only when system has serious memory pressure in Peter's patches without any meaning for per-socket reservation. This is not true. I have a global reserve which is set-up a priori. You cannot allocate a reserve when under pressure, that does not make sense. I probably did not cut enough details - my main position is to allocate per socket reserve from socket's queue, and copy data there from main reserve, all of which are allocated either in advance (global one) or per sockoption, so that there would be no fairness issues what to mark as special and what to not. Say we have a page per socket, each socket can assign a reserve for itself from own memory, this accounts both tx and rx side. Tx is not interesting, it is simple, rx has global reserve (always allocated on startup or sometime way before reclaim/oom)where data is originally received (including skb, shared info and whatever is needed, page is just an exmaple), then it is copied into per-socket reserve and reused for the next packet. 
Having per-socket reserve allows to have progress in any situation not only in cases where single action must be received/processed, and allows to be completely fair for all users, but not only special sockets, thus admin for example would be allowed to login, ipsec would work and so on... Ah, I think I understand now. Yes this is indeed a good idea! It would be quite doable to implement this on top of what I already have. We would need to extend the socket with a sock_opt that would reserve a specified amount of data for that specific socket. And then on socket demux check if the socket has a non-zero reserve and has not yet exceeded said reserve. If so, process the packet. This would also quite neatly work for -rt where we would not want incoming packet processing to be delayed by memory allocations.
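The demux-time check described above (the socket has a non-zero reserve and has not yet exceeded it) might look roughly like the following userland sketch. Everything here is an assumption for illustration: `sock_reserve`, `reserve_admit` and `reserve_release` are hypothetical names, not existing kernel interfaces, and accounting in bytes is just one possible reading of "a specified amount of data".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-socket reserve state; field names are illustrative only. */
struct sock_reserve {
	size_t reserve_bytes; /* set through the proposed socket option, 0 = none */
	size_t used_bytes;    /* bytes currently charged against the reserve */
};

/*
 * Decide at demux time whether a packet of pkt_len bytes may be charged to
 * this socket's reserve. Sockets without a reserve are never admitted on
 * this path. The subtraction form avoids overflow in used + pkt_len.
 */
static bool reserve_admit(struct sock_reserve *r, size_t pkt_len)
{
	if (r->reserve_bytes == 0)
		return false;
	if (r->used_bytes > r->reserve_bytes ||
	    pkt_len > r->reserve_bytes - r->used_bytes)
		return false;
	r->used_bytes += pkt_len;
	return true;
}

/* Give bytes back once the receiver has consumed the packet. */
static void reserve_release(struct sock_reserve *r, size_t pkt_len)
{
	r->used_bytes = pkt_len > r->used_bytes ? 0 : r->used_bytes - pkt_len;
}
```

With a 4096-byte reserve, two 1500-byte packets are admitted and a third is refused until one of the first two has been released. What happens to a refused packet (drop, or fall back to the global reserve) is left open here, as it is in the thread.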
[patch 2.6.23-rc1] add xt_statistic.h to the header list for usermode programs
Add xt_statistic.h to the list of headers to install.

Apparently needed to build newer versions of iptables.

Signed-off-by: Chuck Ebbert [EMAIL PROTECTED]
---
 include/linux/netfilter/Kbuild |    1 +
 1 file changed, 1 insertion(+)

--- linux-2.6.22.noarch.orig/include/linux/netfilter/Kbuild
+++ linux-2.6.22.noarch/include/linux/netfilter/Kbuild
@@ -28,6 +28,7 @@ header-y += xt_policy.h
 header-y += xt_realm.h
 header-y += xt_sctp.h
 header-y += xt_state.h
+header-y += xt_statistic.h
 header-y += xt_string.h
 header-y += xt_tcpmss.h
 header-y += xt_tcpudp.h
Re: strange tcp behavior
On Fri, Aug 03, 2007 at 05:51:42PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> 17:38:03.533589 IP 192.168.7.4.50550 > 192.168.7.8.2500: R 82517592:82517592(0) win 1500
> (raw) vs
> 17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360
>
> What happened there ?

Do you mean what will happen if the second RST (4259643274) is close enough to the first (82517592) to reset the connection? That would be a session hijacking attack, first (known) implemented by Kevin Mitnick. Things have moved forward since then and the sequence number generation algorithm has changed a lot.

It is the same situation that would happen if you spammed the remote side with RST packets with arbitrary sequence numbers in the hope that one of them resets some connection.

--
Evgeniy Polyakov
[PATCH 02/13] dev->priv to netdev_priv(dev), for drivers/net/appletalk
Replacing accesses to dev-priv to netdev_priv(dev). The replacment is safe when netdev_priv is used to access a private structure that is right next to the net_device structure in memory. Cf http://groups.google.com/group/comp.os.linux.development.system/browse_thread/thread/de19321bcd94dbb8/0d74a4adcd6177bd This is the case when the net_device structure was allocated with a call to alloc_netdev or one of its derivative. Here is an excerpt of the semantic patch that performs the transformation @ rule1 @ type T; struct net_device *dev; @@ dev = ( alloc_netdev | alloc_etherdev | alloc_trdev ) (sizeof(T), ...) @ rule1bis @ struct net_device *dev; expression E; @@ dev-priv = E @ rule2 depends on rule1 !rule1bis @ struct net_device *dev; type rule1.T; @@ - (T*) dev-priv + netdev_priv(dev) Signed-off-by: Yoann Padioleau [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Cc: netdev@vger.kernel.org Cc: [EMAIL PROTECTED] --- drivers/net/appletalk/ipddp.c |6 +++--- drivers/net/appletalk/ltpc.c |8 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/net/appletalk/ipddp.c b/drivers/net/appletalk/ipddp.c index f22e46d..61add0e 100644 --- a/drivers/net/appletalk/ipddp.c +++ b/drivers/net/appletalk/ipddp.c @@ -109,7 +109,7 @@ static struct net_device * __init ipddp_ */ static struct net_device_stats *ipddp_get_stats(struct net_device *dev) { -return dev-priv; +return netdev_priv(dev); } /* @@ -171,8 +171,8 @@ static int ipddp_xmit(struct sk_buff *sk skb-protocol = htons(ETH_P_ATALK); /* Protocol has changed */ - ((struct net_device_stats *) dev-priv)-tx_packets++; -((struct net_device_stats *) dev-priv)-tx_bytes+=skb-len; + ((struct net_device_stats *)netdev_priv(dev))-tx_packets++; +((struct net_device_stats *)netdev_priv(dev))-tx_bytes+=skb-len; if(aarp_send_ddp(rt-dev, skb, rt-at, NULL) 0) dev_kfree_skb(skb); diff --git a/drivers/net/appletalk/ltpc.c b/drivers/net/appletalk/ltpc.c index 6a6cbd3..be12c6b 100644 --- a/drivers/net/appletalk/ltpc.c +++ 
b/drivers/net/appletalk/ltpc.c @@ -726,7 +726,7 @@ static int sendup_buffer (struct net_dev int dnode, snode, llaptype, len; int sklen; struct sk_buff *skb; - struct net_device_stats *stats = ((struct ltpc_private *)dev-priv)-stats; + struct net_device_stats *stats = ((struct ltpc_private *)netdev_priv(dev))-stats; struct lt_rcvlap *ltc = (struct lt_rcvlap *) ltdmacbuf; if (ltc-command != LT_RCVLAP) { @@ -823,7 +823,7 @@ static int ltpc_ioctl(struct net_device { struct sockaddr_at *sa = (struct sockaddr_at *) ifr-ifr_addr; /* we'll keep the localtalk node address in dev-pa_addr */ - struct atalk_addr *aa = ((struct ltpc_private *)dev-priv)-my_addr; + struct atalk_addr *aa = ((struct ltpc_private *)netdev_priv(dev))-my_addr; struct lt_init c; int ltflags; @@ -913,7 +913,7 @@ static int ltpc_xmit(struct sk_buff *skb * and skb-len is the length of the ddp data + ddp header */ - struct net_device_stats *stats = ((struct ltpc_private *)dev-priv)-stats; + struct net_device_stats *stats = ((struct ltpc_private *)netdev_priv(dev))-stats; int i; struct lt_sendlap cbuf; @@ -952,7 +952,7 @@ static int ltpc_xmit(struct sk_buff *skb static struct net_device_stats *ltpc_get_stats(struct net_device *dev) { - struct net_device_stats *stats = ((struct ltpc_private *) dev-priv)-stats; + struct net_device_stats *stats = ((struct ltpc_private *)netdev_priv(dev))-stats; return stats; } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
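The safety condition stated in this patch description (the private structure sits right next to the net_device in memory because both came from one alloc_netdev-style allocation) can be modelled in userland. This is a sketch under assumptions: `model_netdev`, `MODEL_ALIGN` and the `model_*` helpers are hypothetical stand-ins, and the 32-byte rounding is an illustrative choice rather than a statement about the kernel's actual constant.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_ALIGN 32u /* assumed alignment of the private area (illustrative) */

/* Hypothetical stand-in for struct net_device. */
struct model_netdev {
	char name[16];
	void *priv; /* the legacy pointer that the patch stops dereferencing */
};

/* Size of the device structure rounded up to the assumed alignment. */
static size_t model_dev_size(void)
{
	return (sizeof(struct model_netdev) + MODEL_ALIGN - 1) &
	       ~(size_t)(MODEL_ALIGN - 1);
}

/* One allocation holds the device followed directly by its private area. */
static struct model_netdev *model_alloc_netdev(size_t sizeof_priv)
{
	struct model_netdev *dev = calloc(1, model_dev_size() + sizeof_priv);

	if (!dev)
		return NULL;
	if (sizeof_priv)
		dev->priv = (char *)dev + model_dev_size();
	return dev;
}

/* Pure pointer arithmetic: never reads dev->priv. */
static void *model_netdev_priv(struct model_netdev *dev)
{
	return (char *)dev + model_dev_size();
}
```

In this model the two access paths agree only while nobody reassigns `dev->priv`. That mirrors why the semantic patch excludes code matching `rule1bis`: once a driver stores its own pointer in the field, the pointer-arithmetic accessor no longer returns the same object.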
[PATCH 09/13] dev->priv to netdev_priv(dev), for drivers/net/tokenring
Replacing accesses to dev-priv to netdev_priv(dev). The replacment is safe when netdev_priv is used to access a private structure that is right next to the net_device structure in memory. Cf http://groups.google.com/group/comp.os.linux.development.system/browse_thread/thread/de19321bcd94dbb8/0d74a4adcd6177bd This is the case when the net_device structure was allocated with a call to alloc_netdev or one of its derivative. Here is an excerpt of the semantic patch that performs the transformation @ rule1 @ type T; struct net_device *dev; @@ dev = ( alloc_netdev | alloc_etherdev | alloc_trdev ) (sizeof(T), ...) @ rule1bis @ struct net_device *dev; expression E; @@ dev-priv = E @ rule2 depends on rule1 !rule1bis @ struct net_device *dev; type rule1.T; @@ - (T*) dev-priv + netdev_priv(dev) Signed-off-by: Yoann Padioleau [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Cc: netdev@vger.kernel.org Cc: [EMAIL PROTECTED] --- drivers/net/tokenring/3c359.c | 58 ++-- drivers/net/tokenring/ibmtr.c | 38 +++ drivers/net/tokenring/lanstreamer.c | 32 +-- drivers/net/tokenring/madgemc.c |4 +- drivers/net/tokenring/olympic.c | 36 +++--- drivers/net/tokenring/tmspci.c |4 +- 6 files changed, 86 insertions(+), 86 deletions(-) diff --git a/drivers/net/tokenring/3c359.c b/drivers/net/tokenring/3c359.c index 9f1b6ab..a8573da 100644 --- a/drivers/net/tokenring/3c359.c +++ b/drivers/net/tokenring/3c359.c @@ -156,7 +156,7 @@ static void print_rx_state(struct net_de static void print_tx_state(struct net_device *dev) { - struct xl_private *xl_priv = (struct xl_private *)dev-priv ; + struct xl_private *xl_priv = netdev_priv(dev) ; struct xl_tx_desc *txd ; u8 __iomem *xl_mmio = xl_priv-xl_mmio ; int i ; @@ -179,7 +179,7 @@ static void print_tx_state(struct net_de static void print_rx_state(struct net_device *dev) { - struct xl_private *xl_priv = (struct xl_private *)dev-priv ; + struct xl_private *xl_priv = netdev_priv(dev) ; struct xl_rx_desc *rxd ; u8 __iomem *xl_mmio = xl_priv-xl_mmio ; int i ; @@ -213,7 
+213,7 @@ #endif static u16 xl_ee_read(struct net_device *dev, int ee_addr) { - struct xl_private *xl_priv = (struct xl_private *)dev-priv ; + struct xl_private *xl_priv = netdev_priv(dev) ; u8 __iomem *xl_mmio = xl_priv-xl_mmio ; /* Wait for EEProm to not be busy */ @@ -245,7 +245,7 @@ static u16 xl_ee_read(struct net_device static void xl_ee_write(struct net_device *dev, int ee_addr, u16 ee_value) { - struct xl_private *xl_priv = (struct xl_private *)dev-priv ; + struct xl_private *xl_priv = netdev_priv(dev) ; u8 __iomem *xl_mmio = xl_priv-xl_mmio ; /* Wait for EEProm to not be busy */ @@ -305,11 +305,11 @@ static int __devinit xl_probe(struct pci pci_release_regions(pdev) ; return -ENOMEM ; } - xl_priv = dev-priv ; + xl_priv = netdev_priv(dev) ; #if XL_DEBUG printk(pci_device: %p, dev:%p, dev-priv: %p, ba[0]: %10x, ba[1]:%10x\n, - pdev, dev, dev-priv, (unsigned int)pdev-resource[0].start, (unsigned int)pdev-resource[1].start) ; + pdev, dev, netdev_priv(dev), (unsigned int)pdev-resource[0].start, (unsigned int)pdev-resource[1].start) ; #endif dev-irq=pdev-irq; @@ -365,7 +365,7 @@ #endif static int __devinit xl_init(struct net_device *dev) { - struct xl_private *xl_priv = (struct xl_private *)dev-priv ; + struct xl_private *xl_priv = netdev_priv(dev) ; printk(KERN_INFO %s \n, version); printk(KERN_INFO %s: I/O at %hx, MMIO at %p, using irq %d\n, @@ -385,7 +385,7 @@ static int __devinit xl_init(struct net_ static int xl_hw_reset(struct net_device *dev) { - struct xl_private *xl_priv = (struct xl_private *)dev-priv ; + struct xl_private *xl_priv = netdev_priv(dev) ; u8 __iomem *xl_mmio = xl_priv-xl_mmio ; unsigned long t ; u16 i ; @@ -568,7 +568,7 @@ #endif static int xl_open(struct net_device *dev) { - struct xl_private *xl_priv=(struct xl_private *)dev-priv; + struct xl_private *xl_priv=netdev_priv(dev); u8 __iomem *xl_mmio = xl_priv-xl_mmio ; u8 i ; u16 hwaddr[3] ; /* Should be u8[6] but we get word return values */ @@ -726,7 +726,7 @@ static int xl_open(struct 
net_device *de static int xl_open_hw(struct net_device *dev) { - struct xl_private *xl_priv=(struct xl_private *)dev-priv; + struct xl_private *xl_priv=netdev_priv(dev); u8 __iomem *xl_mmio = xl_priv-xl_mmio ; u16 vsoff ; char ver_str[33]; @@ -875,7 +875,7 @@ static int xl_open_hw(struct net_device static void adv_rx_ring(struct net_device *dev) /* Advance
Re: [patch 0/5][RFC] Update network drivers to use devres
On 14:44 Fri 03 Aug 2007, Stephen Hemminger wrote: On Fri, 03 Aug 2007 20:33:04 +0900 Tejun Heo [EMAIL PROTECTED] wrote: Devres makes low level drivers simpler, easier to get right and maintain. Writing new drivers becomes easier too. So, why not? Network devices seem to work fine thanks, and the resource requirements are different. If ain't broke, don't fix it. Care to enlighten me on how the resource requirements are different from ATA drivers? I was thinking of the hot remove (no mod ref counts) and lingering /sys open issues. ATA drivers use ref counts. I guess the hot removing is done by severing netdev from the actual device, right? I don't see how that affects usage of devres on network drivers. Am I missing something? The issue is that device may be removed at any time. So you can't rely on module ref counts to save you. And netdevice structure must still linger after module is removed, till dev ref count goes to zero. These patches allow the net_device to linger. The code calls free_netdev on device removal just as before. This is how the net_device is handled on device removal by these patches: +static void devm_free_netdev(struct device *gendev, void *res) +{ + struct net_device *dev = dev_get_drvdata(gendev); + free_netdev(dev); +} On a separate note, can you explain lingering /sys open issue to me a bit? With recent sysfs changes, sysfs nodes are disconnected immediately on deletion. Would that make any difference to netdevs? Examples are in Documentation/networking/netdevices.txt Isn't this the same problem as above? The net_device structure must stay around if there are still references to it and it does. Or am I missing something? Thanks, Brandon
Re: strange tcp behavior
On 03/08/07 18:39, Evgeniy Polyakov wrote:
> On Fri, Aug 03, 2007 at 05:51:42PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> > 17:38:03.533589 IP 192.168.7.4.50550 > 192.168.7.8.2500: R 82517592:82517592(0) win 1500
> > (raw) vs
> > 17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360
> >
> > What happened there ?

Erm... you seem to have removed parts of my message in a way that doesn't make sense...

On Fri, Aug 03, 2007 at 05:51:42PM +0100, Simon Arlott wrote:
> 17:38:04.536277 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 1:1(0) ack 17 win 14360
> vs
> 17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360
>
> What happened there ?

The first one is the RST sent when the connection is close()d without reading, and the second one is the same RST but after another connection has been made on the same ports using a different socket.

> It is the same situation, which would happen if you will spam remote
> side with RST packets with arbitrary sequence number in hope that it
> will reset some connection.

Isn't it still possible that the connection that got reset is left open (possibly for days) until another connection using the same ports is using roughly the same sequence numbers?

--
Simon Arlott
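Whether a stray RST is "close enough" comes down to the receive-window test from the original TCP specification: a segment is acceptable if its sequence number falls inside the receiver's current window, computed modulo 2^32. A minimal sketch of that comparison follows. The function name is hypothetical, and the zero-window special case (where only an exact match of the next expected sequence number is acceptable) is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when seq lies in [rcv_nxt, rcv_nxt + rcv_wnd) using 32-bit
 * wraparound arithmetic. Simplification: a zero window always yields
 * false here.
 */
static bool seq_in_window(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
	return (uint32_t)(seq - rcv_nxt) < rcv_wnd;
}
```

Purely as arithmetic on numbers from the traces above: a sequence number of 82517592 tested against a window of 14360 bytes starting at 4259643274 is far outside the window. For an unrelated connection with randomly chosen initial sequence numbers, a window of that size covers roughly 3 in a million of the sequence space, so an accidental match is unlikely but, as Simon points out, not impossible.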
Re: Distributed storage.
On Friday 03 August 2007 06:49, Evgeniy Polyakov wrote:
> ...rx has global reserve (always allocated on startup or sometime way
> before reclaim/oom) where data is originally received (including skb,
> shared info and whatever is needed, page is just an example), then it
> is copied into per-socket reserve and reused for the next packet.
> Having per-socket reserve allows to have progress in any situation not
> only in cases where single action must be received/processed, and
> allows to be completely fair for all users, but not only special
> sockets, thus admin for example would be allowed to login, ipsec would
> work and so on...

And when the global reserve is entirely used up your system goes back to dropping vm writeout acknowledgements, not so good. I like your approach, and specifically the copying idea cuts out considerable complexity. But I believe the per-socket flag to mark a socket as part of the vm writeout path is not optional, and in this case it will be a better world if it is a slightly unfair world in favor of vm writeout traffic.

Ssh will still work fine even with vm getting priority access to the pool. During memory crunches, non-vm ssh traffic may get bumped till after the crunch, but vm writeout is never supposed to hog the whole machine. If vm writeout hogs your machine long enough to delay an ssh login then that is a vm bug and should be fixed at that level.

Regards,

Daniel
Re: Distributed storage.
On Friday 03 August 2007 07:53, Peter Zijlstra wrote: On Fri, 2007-08-03 at 17:49 +0400, Evgeniy Polyakov wrote: On Fri, Aug 03, 2007 at 02:27:52PM +0200, Peter Zijlstra wrote: ...my main position is to allocate per socket reserve from socket's queue, and copy data there from main reserve, all of which are allocated either in advance (global one) or per sockoption, so that there would be no fairness issues what to mark as special and what to not. Say we have a page per socket, each socket can assign a reserve for itself from own memory, this accounts both tx and rx side. Tx is not interesting, it is simple, rx has global reserve (always allocated on startup or sometime way before reclaim/oom)where data is originally received (including skb, shared info and whatever is needed, page is just an exmaple), then it is copied into per-socket reserve and reused for the next packet. Having per-socket reserve allows to have progress in any situation not only in cases where single action must be received/processed, and allows to be completely fair for all users, but not only special sockets, thus admin for example would be allowed to login, ipsec would work and so on... Ah, I think I understand now. Yes this is indeed a good idea! It would be quite doable to implement this on top of that I already have. We would need to extend the socket with a sock_opt that would reserve a specified amount of data for that specific socket. And then on socket demux check if the socket has a non zero reserve and has not yet exceeded said reserve. If so, process the packet. This would also quite neatly work for -rt where we would not want incomming packet processing to be delayed by memory allocations. At this point we need anything that works in mainline as a starting point. By erring on the side of simplicity we can make this understandable for folks who haven't spent the last two years wallowing in it. The page per socket approach is about as simple as it gets. 
I therefore propose we save our premature optimizations for later. It will also help our cause if we keep any new internal APIs to strictly what is needed to make deadlock go away. Not a whole lot more than just the flag to mark a socket as part of the vm writeout path when you get right down to essentials. Regards, Daniel
Re: strange tcp behavior
From: Evgeniy Polyakov [EMAIL PROTECTED]
Date: Fri, 3 Aug 2007 12:22:42 +0400

> On Thu, Aug 02, 2007 at 07:21:34PM -0700, David Miller ([EMAIL PROTECTED]) wrote:
> > What in the world are we doing allowing stream sockets to autobind?
> > That is totally bogus. Even if we autobind, that won't make a connect
> > happen.
>
> For accepted socket it is perfectly valid assumption - we could
> autobind it during the first send. Or may bind it during accept. It's a
> matter of taste I think. Autobinding during first sending can end up
> being a protection against DoS in some obscure rare case...

An accept()ed socket is by definition fully bound and already in established state.
Re: [PATCH] lro: eHEA example how to use LRO
Jan-Bernd Themann wrote:
> This patch shows how the generic LRO interface is used for SKB mode
>
> Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]
> ---
>  drivers/net/Kconfig             |    1 +
>  drivers/net/ehea/ehea.h         |    9 -
>  drivers/net/ehea/ehea_ethtool.c |   15 +++
>  drivers/net/ehea/ehea_main.c    |   84 +++---
>  4 files changed, 101 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index f8a602c..fec4004 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig

<snip>

> +module_param(use_lro, int, 0);

Have you looked at my generic lro get/set patch that I posted this week? This adds a useless module parameter while ethtool already has all the structure to accommodate setting lro on/off.

Auke
Re: [PATCH] lro: myri10ge example how to use LRO
Andrew Gallatin wrote:
> To follow up on Jan-Bernd Themann's LRO patch earlier today, this
> patch shows how the generic LRO interface can be used for page based
> drivers.
>
> Again, many thanks to Jan-Bernd Themann for leading this effort.
>
> Drew
>
> Signed-off-by: Andrew Gallatin [EMAIL PROTECTED]

Please take a look at my lro patch for ethtool and see if it works for you, instead of adding another generic module parameter that doesn't need to be there.

Thanks.

Auke
Re: [PATCH] lro: myri10ge example how to use LRO
Kok, Auke wrote:
> Andrew Gallatin wrote:
> > To follow up on Jan-Bernd Themann's LRO patch earlier today, this
> > patch shows how the generic LRO interface can be used for page based
> > drivers.
> >
> > Again, many thanks to Jan-Bernd Themann for leading this effort.
> >
> > Drew
> >
> > Signed-off-by: Andrew Gallatin [EMAIL PROTECTED]
>
> please take a look at my lro patch for ethtool and see if it works for
> you, instead of adding another generic module parameter that doesn't
> need to be there.

That looks very nice, and will indeed work for me.

Thanks,

Drew
[BNX2]: Fix suspend/resume problem.
[BNX2]: Fix suspend/resume problem.

The device would not resume properly if it was shutdown before the system was suspended. In such scenario where the netif_running state is 0, bnx2_suspend() would not save the PCI state and so the memory enable bit and bus master enable bit would be lost.

We fix this by always saving and restoring the PCI state in bnx2_suspend() and bnx2_resume() regardless of netif_running() state.

Update version to 1.6.4.

Signed-off-by: Michael Chan [EMAIL PROTECTED]

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index d53dfc5..24e7f9a 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -54,8 +54,8 @@
 
 #define DRV_MODULE_NAME		"bnx2"
 #define PFX DRV_MODULE_NAME	": "
-#define DRV_MODULE_VERSION	"1.6.3"
-#define DRV_MODULE_RELDATE	"July 16, 2007"
+#define DRV_MODULE_VERSION	"1.6.4"
+#define DRV_MODULE_RELDATE	"August 3, 2007"
 
 #define RUN_AT(x) (jiffies + (x))
 
@@ -6937,6 +6937,11 @@ bnx2_suspend(struct pci_dev *pdev, pm_message_t state)
 	struct bnx2 *bp = netdev_priv(dev);
 	u32 reset_code;
 
+	/* PCI register 4 needs to be saved whether netif_running() or not.
+	 * MSI address and data need to be saved if using MSI and
+	 * netif_running().
+	 */
+	pci_save_state(pdev);
 	if (!netif_running(dev))
 		return 0;
 
@@ -6952,7 +6957,6 @@ bnx2_suspend(struct pci_dev *pdev, pm_message_t state)
 		reset_code = BNX2_DRV_MSG_CODE_SUSPEND_NO_WOL;
 	bnx2_reset_chip(bp, reset_code);
 	bnx2_free_skbs(bp);
-	pci_save_state(pdev);
 	bnx2_set_power_state(bp, pci_choose_state(pdev, state));
 	return 0;
 }
@@ -6963,10 +6967,10 @@ bnx2_resume(struct pci_dev *pdev)
 	struct net_device *dev = pci_get_drvdata(pdev);
 	struct bnx2 *bp = netdev_priv(dev);
 
+	pci_restore_state(pdev);
 	if (!netif_running(dev))
 		return 0;
 
-	pci_restore_state(pdev);
 	bnx2_set_power_state(bp, PCI_D0);
 	netif_device_attach(dev);
 	bnx2_init_nic(bp);
Re: strange tcp behavior
From: Evgeniy Polyakov [EMAIL PROTECTED]
Date: Fri, 3 Aug 2007 12:22:42 +0400

Maybe recvmsg should be changed too for symmetry?

I took a look at this, and it's not 100% trivial.

Let's do this later, and only sendmsg for now in order to fix the bug in the stable branches.
Re: netdevice queueing / sendmsg issue?
David Miller [EMAIL PROTECTED] writes:

Software interrupts might be getting lost, dev_kfree_skb_irq() has to queue the kfree_skb() to soft IRQ. Therefore, dev_kfree_skb_irq() will only work properly from hardware interrupt context, where we will return and thus run the scheduled software interrupt.

Problem solved, stupid user mistake. I was using netif_start_queue() instead of netif_wake_queue().

--
Krzysztof Halasa
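The message does not spell out why the two calls differ, so the following is an editorial sketch rather than anything from the thread. In the kernel, netif_start_queue() only clears the "stopped" flag, while netif_wake_queue() clears it and, if the queue had been stopped, also schedules the transmit path to run again. The toy model below uses hypothetical names throughout (struct fake_txq and the fake_* functions are not kernel APIs) and ignores the fact that a later transmit could also kick the real queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of a device transmit queue. */
struct fake_txq {
	bool stopped;   /* driver asked the stack to stop sending */
	bool scheduled; /* a transmit softirq run is pending */
	int backlog;    /* packets waiting in the queue */
	int sent;       /* packets handed to the "hardware" */
};

void fake_stop_queue(struct fake_txq *q)
{
	q->stopped = true;
}

/* Models netif_start_queue(): clears the flag, schedules nothing. */
void fake_start_queue(struct fake_txq *q)
{
	q->stopped = false;
}

/* Models netif_wake_queue(): clears the flag and schedules a run. */
void fake_wake_queue(struct fake_txq *q)
{
	if (q->stopped) {
		q->stopped = false;
		q->scheduled = true;
	}
}

/* Models the softirq: drains the backlog only if a run was scheduled. */
void fake_run_softirq(struct fake_txq *q)
{
	if (!q->scheduled)
		return;
	q->scheduled = false;
	while (!q->stopped && q->backlog > 0) {
		q->backlog--;
		q->sent++;
	}
}

/* Returns how many of three queued packets go out after re-enabling. */
int fake_sent_after_reenable(bool use_wake)
{
	struct fake_txq q = { false, false, 3, 0 };

	fake_stop_queue(&q);
	if (use_wake)
		fake_wake_queue(&q);
	else
		fake_start_queue(&q);
	fake_run_softirq(&q);
	return q.sent;
}
```

In the model, re-enabling with the start variant leaves the three packets sitting in the backlog, which resembles the stall reported in this thread; the wake variant drains them.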
Re: Distributed storage.
Hi Evgeniy,

Nit alert:

On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote:

* storage can be formed on top of remote nodes and be exported simultaneously (iSCSI is peer-to-peer only, NBD requires device mapper and is synchronous)

In fact, NBD has nothing to do with device mapper. I use it as a physical target underneath ddraid (a device mapper plugin) just like I would use your DST if it proves out.

Regards,

Daniel
Re: Distributed storage.
Hi Mike,

On Thursday 02 August 2007 21:09, Mike Snitzer wrote:

But NBD's synchronous nature is actually an asset when coupled with MD raid1 as it provides guarantees that the data has _really_ been mirrored remotely.

And bio completion doesn't?

Regards,

Daniel
Re: Distributed storage.
On Friday 03 August 2007 03:26, Evgeniy Polyakov wrote:

On Thu, Aug 02, 2007 at 02:08:24PM -0700, I wrote:

I see bits that worry me, e.g.:

+ req = mempool_alloc(st->w->req_pool, GFP_NOIO);

which seems to be callable in response to a local request, just the case where NBD deadlocks. Your mempool strategy can work reliably only if you can prove that the pool allocations of the maximum number of requests you can have in flight do not exceed the size of the pool. In other words, if you ever take the pool's fallback path to normal allocation, you risk deadlock.

mempool should be allocated to be able to catch up with maximum in-flight requests, in my tests I was unable to force block layer to put more than 31 pages in sync, but in one bio. Each request is essentially delayed bio processing, so this must handle maximum number of in-flight bios (if they do not cover multiple nodes, if they do, then each node requires own request).

It depends on the characteristics of the physical and virtual block devices involved. Slow block devices can produce surprising effects. Ddsnap still qualifies as slow under certain circumstances (big linear write immediately following a new snapshot). Before we added throttling we would see as many as 800,000 bios in flight. Nice to know the system can actually survive this... mostly. But memory deadlock is a clear and present danger under those conditions and we did hit it (not to mention that read latency sucked beyond belief).

Anyway, we added a simple counting semaphore to throttle the bio traffic to a reasonable number and behavior became much nicer, but most importantly, this satisfies one of the primary requirements for avoiding block device memory deadlock: a strictly bounded amount of bio traffic in flight.

In fact, we allow some bounded number of non-memalloc bios *plus* however much traffic the mm wants to throw at us in memalloc mode, on the assumption that the mm knows what it is doing and imposes its own bound of in flight bios per device. This needs auditing obviously, but the mm either does that or is buggy. In practice, with this throttling in place we never saw more than 2,000 in flight no matter how hard we hit it, which is about the number we were aiming at. Since we draw our reserve from the main memalloc pool, we can easily handle 2,000 bios in flight, even under extreme conditions.

See:

http://zumastor.googlecode.com/svn/trunk/ddsnap/kernel/dm-ddsnap.c

down(&info->throttle_sem);

To be sure, I am not very proud of this throttling mechanism for various reasons, but the thing is, _any_ throttling mechanism no matter how sucky solves the deadlock problem. Over time I want to move the throttling up into bio submission proper, or perhaps incorporate it in device mapper's queue function, not quite as high up the food chain. Only some stupid little logistical issues stopped me from doing it one of those ways right from the start. I think Peter has also tried some things in this area. Anyway, that part is not pressing because the throttling can be done in the virtual device itself as we do it, even if it is not very pretty there. The point is: you have to throttle the bio traffic. The alternative is to die a horrible death under conditions that may be rare, but _will_ hit somebody.

Regards,

Daniel
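The bounded in-flight idea above can be sketched in user space with a counting semaphore built on C11 atomics. This is a minimal illustration and not the ddsnap code: struct bio_throttle and the throttle_* functions are hypothetical names, and the acquire here is a non-blocking "try" variant, whereas the kernel's down() sleeps until a permit is free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical throttle: one permit per request allowed in flight. */
struct bio_throttle {
	atomic_int permits;
};

void throttle_init(struct bio_throttle *t, int max_in_flight)
{
	atomic_init(&t->permits, max_in_flight);
}

/*
 * Take one permit before submitting a request.
 * Returns false when the in-flight limit is already reached, in which
 * case the caller must wait or back off instead of submitting.
 */
bool throttle_try_down(struct bio_throttle *t)
{
	int cur = atomic_load(&t->permits);

	while (cur > 0) {
		/* On failure, cur is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&t->permits, &cur, cur - 1))
			return true;
	}
	return false;
}

/* Give the permit back from the request's completion path. */
void throttle_up(struct bio_throttle *t)
{
	atomic_fetch_add(&t->permits, 1);
}

/* Number of further requests that could be submitted right now. */
int throttle_available(struct bio_throttle *t)
{
	return atomic_load(&t->permits);
}
```

Because a permit is taken before submission and returned only on completion, the number of outstanding requests can never exceed the initial count, which is the property the message argues is needed to size a reserve safely.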
[patch 0/7] CAN: Add new PF_CAN protocol family, try #5
Hello Dave,

this is the fifth post of the patch series that adds the PF_CAN protocol family for the Controller Area Network.

Since our last post we have changed the following:

* Remove slab destructor from calls to kmem_cache_alloc().
* Add comments about types defined in can.h.
* Update comment on vcan loopback module parameter.
* Fix typo in documentation.

The changes in try #4 were:

* Change vcan network driver to use the new RTNL API, as suggested by Patrick.
* Revert our change to use skb->iif instead of skb->cb. After discussion with Patrick and Jamal it turned out, our first implementation was correct.
* Use skb_tail_pointer() instead of skb->tail directly.
* Coding style changes to satisfy linux/scripts/checkpatch.pl.
* Minor changes for 64-bit-cleanliness.
* Minor cleanup of #include's

The changes in try #3 were:

* Use skb->sk and skb->pkt_type instead of skb->cb to pass loopback flags and originating socket down to the driver and back to the receiving socket. Thanks to Patrick McHardy for pointing out our wrong use of skb->cb.
* Use skb->iif instead of skb->cb to pass receiving interface from raw_rcv() and bcm_rcv() up to raw_recvmsg() and bcm_recvmsg().
* Set skb->protocol when sending CAN frames to netdevices.
* Removed struct raw_opt and struct bcm_opt and integrated these directly into struct raw_sock and bcm_sock resp., like most other proto implementations do.
* We have found and fixed race conditions between raw_bind(), raw_{set,get}sockopt() and raw_notifier(). This resulted in
  - complete removal of our own notifier list infrastructure in af_can.c. raw.c and bcm.c now use normal netdevice notifiers.
  - removal of ro->lock spinlock. We use lock_sock(sk) now.
  - changed deletion of dev_rcv_lists, which are now marked for deletion in the netdevice notifier in af_can.c and are actually deleted when all entries have been deleted using can_rx_unregister().
* Follow changes in 2.6.22 (e.g. ktime_t timestamps in skb).
* Removed obsolete code from vcan.c, as pointed out by Stephen Hemminger.

The changes in try #2 were:

* reduced RCU callback overhead when deleting receiver lists (thx to feedback from Paul E. McKenney).
* eliminated some code duplication in net/can/proc.c.
* renamed slock-29 and sk_lock-29 to slock-AF_CAN and sk_lock-AF_CAN in net/core/sock.c
* added entry for can.txt in Documentation/networking/00-INDEX
* added error frame definitions in include/linux/can/error.h, which are to be used by CAN network drivers.

This patch series applies against net-2.6 and is derived from Subversion revision r455 of http://svn.berlios.de/svnroot/repos/socketcan. It can be found in the directory http://svn.berlios.de/svnroot/repos/socketcan/trunk/patch-series/version.

This patch doesn't touch anything in the kernel except for the allocation of a couple of numbers for protocol, arp hw type, and a line discipline.

Please review this patch series for integration into your tree. Thanks very much for your work!

Best regards,

Urs Thuermann
Oliver Hartkopp
[patch 6/7] CAN: Add maintainer entries
This patch adds entries in the CREDITS and MAINTAINERS file for CAN.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 CREDITS     |   16
 MAINTAINERS |    9 +
 2 files changed, 25 insertions(+)

Index: net-2.6/CREDITS
===
--- net-2.6.orig/CREDITS	2007-08-03 11:21:31.0 +0200
+++ net-2.6/CREDITS	2007-08-03 11:21:56.0 +0200
@@ -1331,6 +1331,14 @@
 S: 5623 HZ Eindhoven
 S: The Netherlands
 
+N: Oliver Hartkopp
+E: [EMAIL PROTECTED]
+W: http://www.volkswagen.de
+D: Controller Area Network (network layer core)
+S: Brieffach 1776
+S: 38436 Wolfsburg
+S: Germany
+
 N: Andrew Haylett
 E: [EMAIL PROTECTED]
 D: Selection mechanism
@@ -3284,6 +3292,14 @@
 S: F-35042 Rennes Cedex
 S: France
 
+N: Urs Thuermann
+E: [EMAIL PROTECTED]
+W: http://www.volkswagen.de
+D: Controller Area Network (network layer core)
+S: Brieffach 1776
+S: 38436 Wolfsburg
+S: Germany
+
 N: Jon Tombs
 E: [EMAIL PROTECTED]
 W: http://www.esi.us.es/~jon
Index: net-2.6/MAINTAINERS
===
--- net-2.6.orig/MAINTAINERS	2007-08-03 11:21:31.0 +0200
+++ net-2.6/MAINTAINERS	2007-08-03 11:21:56.0 +0200
@@ -951,6 +951,15 @@
 L: [EMAIL PROTECTED]
 S: Maintained
 
+CAN NETWORK LAYER
+P: Urs Thuermann
+M: [EMAIL PROTECTED]
+P: Oliver Hartkopp
+M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
+W: http://developer.berlios.de/projects/socketcan/
+S: Maintained
+
 CALGARY x86-64 IOMMU
 P: Muli Ben-Yehuda
 M: [EMAIL PROTECTED]
[patch 5/7] CAN: Add virtual CAN netdevice driver
This patch adds the virtual CAN bus (vcan) network driver. The vcan device is just a loopback device for CAN frames, no real CAN hardware is involved. Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED] Signed-off-by: Urs Thuermann [EMAIL PROTECTED] --- drivers/net/Makefile |1 drivers/net/can/Kconfig | 25 drivers/net/can/Makefile |5 drivers/net/can/vcan.c | 261 +++ net/can/Kconfig |3 5 files changed, 295 insertions(+) Index: net-2.6/drivers/net/Makefile === --- net-2.6.orig/drivers/net/Makefile 2007-08-03 11:21:31.0 +0200 +++ net-2.6/drivers/net/Makefile2007-08-03 11:21:54.0 +0200 @@ -8,6 +8,7 @@ obj-$(CONFIG_CHELSIO_T1) += chelsio/ obj-$(CONFIG_CHELSIO_T3) += cxgb3/ obj-$(CONFIG_EHEA) += ehea/ +obj-$(CONFIG_CAN) += can/ obj-$(CONFIG_BONDING) += bonding/ obj-$(CONFIG_ATL1) += atl1/ obj-$(CONFIG_GIANFAR) += gianfar_driver.o Index: net-2.6/drivers/net/can/Kconfig === --- /dev/null 1970-01-01 00:00:00.0 + +++ net-2.6/drivers/net/can/Kconfig 2007-08-03 11:21:54.0 +0200 @@ -0,0 +1,25 @@ +menu CAN Device Drivers + depends on CAN + +config CAN_VCAN + tristate Virtual Local CAN Interface (vcan) + depends on CAN + default N + ---help--- + Similar to the network loopback devices, vcan offers a + virtual local CAN interface. + + This driver can also be built as a module. If so, the module + will be called vcan. + +config CAN_DEBUG_DEVICES + bool CAN devices debugging messages + depends on CAN + default N + ---help--- + Say Y here if you want the CAN device drivers to produce a bunch of + debug messages to the system log. Select this if you are having + a problem with CAN support and want to see more of what is going + on. + +endmenu Index: net-2.6/drivers/net/can/Makefile === --- /dev/null 1970-01-01 00:00:00.0 + +++ net-2.6/drivers/net/can/Makefile2007-08-03 11:21:54.0 +0200 @@ -0,0 +1,5 @@ +# +# Makefile for the Linux Controller Area Network drivers. 
+# + +obj-$(CONFIG_CAN_VCAN) += vcan.o Index: net-2.6/drivers/net/can/vcan.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ net-2.6/drivers/net/can/vcan.c 2007-08-03 11:21:54.0 +0200 @@ -0,0 +1,261 @@ +/* + * vcan.c - Virtual CAN interface + * + * Copyright (c) 2002-2007 Volkswagen Group Electronic Research + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions, the following disclaimer and + *the referenced file 'COPYING'. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 3. Neither the name of Volkswagen nor the names of its contributors + *may be used to endorse or promote products derived from this software + *without specific prior written permission. + * + * Alternatively, provided that this notice is retained in full, this + * software may be distributed under the terms of the GNU General + * Public License (GPL) version 2 as distributed in the 'COPYING' + * file from the main directory of the linux kernel source. + * + * The provided data structures and external interfaces from this code + * are not restricted to be used by modules with a GPL compatible license. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * AS IS AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH + * DAMAGE. + * + * Send feedback to [EMAIL PROTECTED] + * + */ + +#include linux/module.h +#include linux/init.h +#include linux/netdevice.h +#include linux/if_arp.h +#include linux/if_ether.h +#include linux/can.h +#include
[patch 3/7] CAN: Add raw protocol
This patch adds the CAN raw protocol. Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED] Signed-off-by: Urs Thuermann [EMAIL PROTECTED] --- include/linux/can/raw.h | 31 + net/can/Kconfig | 26 + net/can/Makefile|3 net/can/raw.c | 757 4 files changed, 817 insertions(+) Index: net-2.6/include/linux/can/raw.h === --- /dev/null 1970-01-01 00:00:00.0 + +++ net-2.6/include/linux/can/raw.h 2007-08-03 11:21:48.0 +0200 @@ -0,0 +1,31 @@ +/* + * linux/can/raw.h + * + * Definitions for raw CAN sockets + * + * Authors: Oliver Hartkopp [EMAIL PROTECTED] + * Urs Thuermann [EMAIL PROTECTED] + * Copyright (c) 2002-2007 Volkswagen Group Electronic Research + * All rights reserved. + * + * Send feedback to [EMAIL PROTECTED] + * + */ + +#ifndef CAN_RAW_H +#define CAN_RAW_H + +#include linux/can.h + +#define SOL_CAN_RAW (SOL_CAN_BASE + CAN_RAW) + +/* for socket options affecting the socket (not the global system) */ + +enum { + CAN_RAW_FILTER = 1, /* set 0 .. n can_filter(s) */ + CAN_RAW_ERR_FILTER, /* set filter for error frames */ + CAN_RAW_LOOPBACK, /* local loopback (default:on) */ + CAN_RAW_RECV_OWN_MSGS /* receive my own msgs (default:off) */ +}; + +#endif Index: net-2.6/net/can/Kconfig === --- net-2.6.orig/net/can/Kconfig2007-08-03 11:21:46.0 +0200 +++ net-2.6/net/can/Kconfig 2007-08-03 11:21:48.0 +0200 @@ -16,6 +16,32 @@ If you want CAN support, you should say Y here and also to the specific driver for your controller(s) below. +config CAN_RAW + tristate Raw CAN Protocol (raw access with CAN-ID filtering) + depends on CAN + default N + ---help--- + The Raw CAN protocol option offers access to the CAN bus via + the BSD socket API. You probably want to use the raw socket in + most cases where no higher level protocol is being used. The raw + socket has several filter options e.g. ID-Masking / Errorframes. + To receive/send raw CAN messages, use AF_CAN with protocol CAN_RAW. 
+ +config CAN_RAW_USER + bool Allow non-root users to access Raw CAN Protocol sockets + depends on CAN_RAW + default N + ---help--- + The Controller Area Network is a local field bus transmitting only + broadcast messages without any routing and security concepts. + In the majority of cases the user application has to deal with + raw CAN frames. Therefore it might be reasonable NOT to restrict + the CAN access only to the user root, as known from other networks. + Since CAN_RAW sockets can only send and receive frames to/from CAN + interfaces this does not affect security of others networks. + Say Y here if you want non-root users to be able to access CAN_RAW + sockets. + config CAN_DEBUG_CORE bool CAN Core debugging messages depends on CAN Index: net-2.6/net/can/Makefile === --- net-2.6.orig/net/can/Makefile 2007-08-03 11:21:46.0 +0200 +++ net-2.6/net/can/Makefile2007-08-03 11:21:48.0 +0200 @@ -4,3 +4,6 @@ obj-$(CONFIG_CAN) += can.o can-objs := af_can.o proc.o + +obj-$(CONFIG_CAN_RAW) += can-raw.o +can-raw-objs := raw.o Index: net-2.6/net/can/raw.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ net-2.6/net/can/raw.c 2007-08-03 11:21:48.0 +0200 @@ -0,0 +1,757 @@ +/* + * raw.c - Raw sockets for protocol family CAN + * + * Copyright (c) 2002-2007 Volkswagen Group Electronic Research + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions, the following disclaimer and + *the referenced file 'COPYING'. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 3. 
Neither the name of Volkswagen nor the names of its contributors + *may be used to endorse or promote products derived from this software + *without specific prior written permission. + * + * Alternatively, provided that this notice is retained in full, this + * software may be distributed under the terms of the GNU General + * Public License (GPL) version 2 as distributed in the 'COPYING' + * file from the main directory of the linux kernel
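The CAN_RAW_FILTER option in raw.h above takes an array of id/mask pairs. As an editorial sketch (not code from this patch), the matching rule documented for Socket CAN is that a frame passes a filter when (received id & mask) == (filter id & mask), and a socket receives the frame if any of its filters match. The struct and function names below are hypothetical and deliberately avoid the real struct can_filter so the example builds without kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_CAN_SFF_MASK 0x7FFu /* 11-bit standard frame identifier */

/* Hypothetical stand-in for one id/mask filter entry. */
struct fake_can_filter {
	uint32_t can_id;
	uint32_t can_mask;
};

/* A frame matches when the masked ids are equal. */
bool fake_filter_match(const struct fake_can_filter *f, uint32_t rx_id)
{
	return (rx_id & f->can_mask) == (f->can_id & f->can_mask);
}

/* A socket with n filters receives a frame if any filter matches.
 * With n == 0 nothing matches, so nothing is received. */
bool fake_any_filter_match(const struct fake_can_filter *filters,
			   size_t n, uint32_t rx_id)
{
	for (size_t i = 0; i < n; i++) {
		if (fake_filter_match(&filters[i], rx_id))
			return true;
	}
	return false;
}
```

For instance, a filter of { 0x123, FAKE_CAN_SFF_MASK } passes only id 0x123, while { 0x200, 0x700 } passes the whole range 0x200 to 0x2FF because only the top three id bits are compared.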
[patch 1/7] CAN: Allocate protocol numbers for PF_CAN
This patch adds a protocol/address family number, ARP hardware type, ethernet packet type, and a line discipline number for the SocketCAN implementation. Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED] Signed-off-by: Urs Thuermann [EMAIL PROTECTED] --- include/linux/if_arp.h |1 + include/linux/if_ether.h |1 + include/linux/socket.h |2 ++ include/linux/tty.h |3 ++- net/core/sock.c |4 ++-- 5 files changed, 8 insertions(+), 3 deletions(-) Index: net-2.6/include/linux/if_arp.h === --- net-2.6.orig/include/linux/if_arp.h 2007-08-03 11:21:32.0 +0200 +++ net-2.6/include/linux/if_arp.h 2007-08-03 11:21:42.0 +0200 @@ -52,6 +52,7 @@ #define ARPHRD_ROSE270 #define ARPHRD_X25 271 /* CCITT X.25 */ #define ARPHRD_HWX25 272 /* Boards with X.25 in firmware */ +#define ARPHRD_CAN 280 /* Controller Area Network */ #define ARPHRD_PPP 512 #define ARPHRD_CISCO 513 /* Cisco HDLC */ #define ARPHRD_HDLCARPHRD_CISCO Index: net-2.6/include/linux/if_ether.h === --- net-2.6.orig/include/linux/if_ether.h 2007-08-03 11:21:32.0 +0200 +++ net-2.6/include/linux/if_ether.h2007-08-03 11:21:42.0 +0200 @@ -90,6 +90,7 @@ #define ETH_P_WAN_PPP 0x0007 /* Dummy type for WAN PPP frames*/ #define ETH_P_PPP_MP0x0008 /* Dummy type for PPP MP frames */ #define ETH_P_LOCALTALK 0x0009 /* Localtalk pseudo type*/ +#define ETH_P_CAN 0x000C /* Controller Area Network */ #define ETH_P_PPPTALK 0x0010 /* Dummy type for Atalk over PPP*/ #define ETH_P_TR_802_2 0x0011 /* 802.2 frames */ #define ETH_P_MOBITEX 0x0015 /* Mobitex ([EMAIL PROTECTED]) */ Index: net-2.6/include/linux/socket.h === --- net-2.6.orig/include/linux/socket.h 2007-08-03 11:21:32.0 +0200 +++ net-2.6/include/linux/socket.h 2007-08-03 11:21:42.0 +0200 @@ -185,6 +185,7 @@ #define AF_PPPOX 24 /* PPPoX sockets*/ #define AF_WANPIPE 25 /* Wanpipe API Sockets */ #define AF_LLC 26 /* Linux LLC*/ +#define AF_CAN 29 /* Controller Area Network */ #define AF_TIPC30 /* TIPC sockets */ #define AF_BLUETOOTH 31 /* Bluetooth sockets*/ #define AF_IUCV32 /* IUCV sockets 
*/ @@ -220,6 +221,7 @@ #define PF_PPPOX AF_PPPOX #define PF_WANPIPE AF_WANPIPE #define PF_LLC AF_LLC +#define PF_CAN AF_CAN #define PF_TIPCAF_TIPC #define PF_BLUETOOTH AF_BLUETOOTH #define PF_IUCVAF_IUCV Index: net-2.6/include/linux/tty.h === --- net-2.6.orig/include/linux/tty.h2007-08-03 11:21:32.0 +0200 +++ net-2.6/include/linux/tty.h 2007-08-03 11:21:42.0 +0200 @@ -24,7 +24,7 @@ #define NR_PTYSCONFIG_LEGACY_PTY_COUNT /* Number of legacy ptys */ #define NR_UNIX98_PTY_DEFAULT 4096 /* Default maximum for Unix98 ptys */ #define NR_UNIX98_PTY_MAX (1 MINORBITS) /* Absolute limit */ -#define NR_LDISCS 17 +#define NR_LDISCS 18 /* line disciplines */ #define N_TTY 0 @@ -45,6 +45,7 @@ #define N_SYNC_PPP 14 /* synchronous PPP */ #define N_HCI 15 /* Bluetooth HCI UART */ #define N_GIGASET_M101 16 /* Siemens Gigaset M101 serial DECT adapter */ +#define N_SLCAN17 /* Serial / USB serial CAN Adaptors */ /* * This character is the same as _POSIX_VDISABLE: it cannot be used as Index: net-2.6/net/core/sock.c === --- net-2.6.orig/net/core/sock.c2007-08-03 11:21:32.0 +0200 +++ net-2.6/net/core/sock.c 2007-08-03 11:21:42.0 +0200 @@ -153,7 +153,7 @@ sk_lock-AF_ASH , sk_lock-AF_ECONET , sk_lock-AF_ATMSVC , sk_lock-21 , sk_lock-AF_SNA , sk_lock-AF_IRDA , sk_lock-AF_PPPOX , sk_lock-AF_WANPIPE , sk_lock-AF_LLC , - sk_lock-27 , sk_lock-28 , sk_lock-29 , + sk_lock-27 , sk_lock-28 , sk_lock-AF_CAN , sk_lock-AF_TIPC , sk_lock-AF_BLUETOOTH, sk_lock-IUCV, sk_lock-AF_RXRPC , sk_lock-AF_MAX }; @@ -167,7 +167,7 @@ slock-AF_ASH , slock-AF_ECONET , slock-AF_ATMSVC , slock-21 , slock-AF_SNA , slock-AF_IRDA , slock-AF_PPPOX , slock-AF_WANPIPE , slock-AF_LLC , - slock-27 , slock-28 , slock-29 , + slock-27 , slock-28 , slock-AF_CAN ,
[patch 7/7] CAN: Add documentation
This patch adds documentation for the PF_CAN protocol family.

Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED]
Signed-off-by: Urs Thuermann [EMAIL PROTECTED]

---
 Documentation/networking/00-INDEX |    2
 Documentation/networking/can.txt  |  635 ++
 2 files changed, 637 insertions(+)

Index: net-2.6/Documentation/networking/can.txt
===
--- /dev/null	1970-01-01 00:00:00.0 +
+++ net-2.6/Documentation/networking/can.txt	2007-08-03 11:21:58.0 +0200
@@ -0,0 +1,635 @@
+
+
+can.txt
+
+Readme file for the Controller Area Network Protocol Family (aka Socket CAN)
+
+This file contains
+
+  1 Overview / What is Socket CAN
+
+  2 Motivation / Why using the socket API
+
+  3 Socket CAN concept
+    3.1 receive lists
+    3.2 loopback
+    3.3 network security issues (capabilities)
+    3.4 network problem notifications
+
+  4 How to use Socket CAN
+    4.1 RAW protocol sockets with can_filters (SOCK_RAW)
+      4.1.1 RAW socket option CAN_RAW_FILTER
+      4.1.2 RAW socket option CAN_RAW_ERR_FILTER
+      4.1.3 RAW socket option CAN_RAW_LOOPBACK
+      4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS
+    4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
+    4.3 connected transport protocols (SOCK_SEQPACKET)
+    4.4 unconnected transport protocols (SOCK_DGRAM)
+
+  5 Socket CAN core module
+    5.1 can.ko module params
+    5.2 procfs content
+    5.3 writing own CAN protocol modules
+
+  6 CAN network drivers
+    6.1 general settings
+    6.2 loopback
+    6.3 CAN controller hardware filters
+    6.4 currently supported CAN hardware
+    6.5 todo
+
+  7 Credits
+
+
+
+1. Overview / What is Socket CAN
+
+
+The socketcan package is an implementation of CAN protocols
+(Controller Area Network) for Linux. CAN is a networking technology
+which has wide-spread use in automation, embedded devices, and
+automotive fields. While there have been other CAN implementations
+for Linux based on character devices, Socket CAN uses the Berkeley
+socket API, the Linux network stack and implements the CAN device
+drivers as network interfaces. The CAN socket API has been designed
+as similar as possible to the TCP/IP protocols to allow programmers,
+familiar with network programming, to easily learn how to use CAN
+sockets.
+
+2. Motivation / Why using the socket API
+
+
+There have been CAN implementations for Linux before Socket CAN so the
+question arises, why we have started another project. Most existing
+implementations come as a device driver for some CAN hardware, they
+are based on character devices and provide comparatively little
+functionality. Usually, there is only a hardware-specific device
+driver which provides a character device interface to send and
+receive raw CAN frames, directly to/from the controller hardware.
+Queueing of frames and higher-level transport protocols like ISO-TP
+have to be implemented in user space applications. Also, most
+character-device implementations support only one single process to
+open the device at a time, similar to a serial interface. Exchanging
+the CAN controller requires employment of another device driver and
+often the need for adaption of large parts of the application to the
+new driver's API.
+
+Socket CAN was designed to overcome all of these limitations. A new
+protocol family has been implemented which provides a socket interface
+to user space applications and which builds upon the Linux network
+layer, so to use all of the provided queueing functionality. Device
+drivers for CAN controller hardware register itself with the Linux
+network layer as a network device, so that CAN frames from the
+controller can be passed up to the network layer and on to the CAN
+protocol family module and also vice-versa. Also, the protocol family
+module provides an API for transport protocol modules to register, so
+that any number of transport protocols can be loaded or unloaded
+dynamically. In fact, the can core module alone does not provide any
+protocol and can not be used without loading at least one additional
+protocol module. Multiple sockets can be opened at the same time,
+on different or the same protocol module and they can listen/send
+frames on different or the same CAN IDs. Several sockets listening on
+the same interface for frames with the same CAN ID are all passed the
+same received matching CAN frames. An application wishing to
+communicate using a specific transport protocol, e.g. ISO-TP, just
+selects that protocol when opening the socket, and then can read and
+write application data byte streams, without having to deal with
+CAN-IDs, frames, etc.
+
+Similar functionality visible from user-space could
[patch 4/7] CAN: Add broadcast manager (bcm) protocol
This patch adds the CAN broadcast manager (bcm) protocol. Signed-off-by: Oliver Hartkopp [EMAIL PROTECTED] Signed-off-by: Urs Thuermann [EMAIL PROTECTED] --- include/linux/can/bcm.h | 65 + net/can/Kconfig | 28 net/can/Makefile|3 net/can/bcm.c | 1755 4 files changed, 1851 insertions(+) Index: net-2.6/include/linux/can/bcm.h === --- /dev/null 1970-01-01 00:00:00.0 + +++ net-2.6/include/linux/can/bcm.h 2007-08-03 11:21:51.0 +0200 @@ -0,0 +1,65 @@ +/* + * linux/can/bcm.h + * + * Definitions for CAN Broadcast Manager (BCM) + * + * Author: Oliver Hartkopp [EMAIL PROTECTED] + * Copyright (c) 2002-2007 Volkswagen Group Electronic Research + * All rights reserved. + * + * Send feedback to [EMAIL PROTECTED] + * + */ + +#ifndef CAN_BCM_H +#define CAN_BCM_H + +/** + * struct bcm_msg_head - head of messages to/from the broadcast manager + * @opcode:opcode, see enum below. + * @flags: special flags, see below. + * @count: number of frames to send before changing interval. + * @ival1: interval for the first @count frames. + * @ival2: interval for the following frames. + * @can_id:CAN ID of frames to be sent or received. + * @nframes: number of frames appended to the message head. + * @frames:array of CAN frames. 
+ */ +struct bcm_msg_head { + int opcode; + int flags; + int count; + struct timeval ival1, ival2; + canid_t can_id; + int nframes; + struct can_frame frames[0]; +}; + +enum { + TX_SETUP = 1, /* create (cyclic) transmission task */ + TX_DELETE, /* remove (cyclic) transmission task */ + TX_READ,/* read properties of (cyclic) transmission task */ + TX_SEND,/* send one CAN frame */ + RX_SETUP, /* create RX content filter subscription */ + RX_DELETE, /* remove RX content filter subscription */ + RX_READ,/* read properties of RX content filter subscription */ + TX_STATUS, /* reply to TX_READ request */ + TX_EXPIRED, /* notification on performed transmissions (count=0) */ + RX_STATUS, /* reply to RX_READ request */ + RX_TIMEOUT, /* cyclic message is absent */ + RX_CHANGED /* updated CAN frame (detected content change) */ +}; + +#define SETTIMER0x0001 +#define STARTTIMER 0x0002 +#define TX_COUNTEVT 0x0004 +#define TX_ANNOUNCE 0x0008 +#define TX_CP_CAN_ID0x0010 +#define RX_FILTER_ID0x0020 +#define RX_CHECK_DLC0x0040 +#define RX_NO_AUTOTIMER 0x0080 +#define RX_ANNOUNCE_RESUME 0x0100 +#define TX_RESET_MULTI_IDX 0x0200 +#define RX_RTR_FRAME0x0400 + +#endif /* CAN_BCM_H */ Index: net-2.6/net/can/Kconfig === --- net-2.6.orig/net/can/Kconfig2007-08-03 11:21:48.0 +0200 +++ net-2.6/net/can/Kconfig 2007-08-03 11:21:51.0 +0200 @@ -42,6 +42,34 @@ Say Y here if you want non-root users to be able to access CAN_RAW sockets. +config CAN_BCM + tristate Broadcast Manager CAN Protocol (with content filtering) + depends on CAN + default N + ---help--- + The Broadcast Manager offers content filtering, timeout monitoring, + sending of RTR-frames and cyclic CAN messages without permanent user + interaction. The BCM can be 'programmed' via the BSD socket API and + informs you on demand e.g. only on content updates / timeouts. + You probably want to use the bcm socket in most cases where cyclic + CAN messages are used on the bus (e.g. in automotive environments). 
+ To use the Broadcast Manager, use AF_CAN with protocol CAN_BCM. + +config CAN_BCM_USER + bool Allow non-root users to access CAN broadcast manager sockets + depends on CAN_BCM + default N + ---help--- + The Controller Area Network is a local field bus transmitting only + broadcast messages without any routing and security concepts. + In the majority of cases the user application has to deal with + raw CAN frames. Therefore it might be reasonable NOT to restrict + the CAN access only to the user root, as known from other networks. + Since CAN_BCM sockets can only send and receive frames to/from CAN + interfaces this does not affect security of others networks. + Say Y here if you want non-root users to be able to access CAN_BCM + sockets. + config CAN_DEBUG_CORE bool CAN Core debugging messages depends on CAN Index: net-2.6/net/can/Makefile === --- net-2.6.orig/net/can/Makefile 2007-08-03 11:21:48.0 +0200 +++ net-2.6/net/can/Makefile2007-08-03
Re: Distributed storage.
On Fri, 2007-08-03 at 09:04 +0400, Manu Abraham wrote:

On 7/31/07, Evgeniy Polyakov [EMAIL PROTECTED] wrote:

TODO list currently includes following main items:

* redundancy algorithm (drop me a request of your own, but it is highly unlikely that Reed-Solomon based will ever be used - it is too slow for distributed RAID, I consider WEAVER codes)

LDPC codes[1][2] have been replacing Turbo code[3] with regards to communication links and we have been seeing that transition. (maybe helpful, came to mind seeing the mention of Turbo code) Don't know how Weaver compares to LDPC, though found some comparisons [4][5]. But looking at fault tolerance figures, I guess Weaver is much better.

[1] http://www.ldpc-codes.com/
[2] http://portal.acm.org/citation.cfm?id=1240497
[3] http://en.wikipedia.org/wiki/Turbo_code
[4] http://domino.research.ibm.com/library/cyberdig.nsf/papers/BD559022A190D41C85257212006CEC11/$File/rj10391.pdf
[5] http://hplabs.hp.com/personal/Jay_Wylie/publications/wylie_dsn2007.pdf

Searching Google for Dr. Plank's work at the University of TN turns up some analysis of using LDPC codes in storage systems.

http://www.google.com/search?hl=en&q=plank+ldpc&btnG=Google+Search

Patents are an issue to watch out for around the use of Tornado/Raptor codes. I've not researched it, but I believe there be dragons there.
[RFC 0/2][BNX2]: Add iSCSI support to BNX2 devices.
[BNX2]: Add iSCSI support to BNX2 devices.

Modify bnx2 and add a cnic driver to support some offload functions needed by iSCSI. Add a new open-iscsi driver to support iSCSI offload on bnx2 devices.

Signed-off-by: Anil Veerabhadrappa [EMAIL PROTECTED]
Signed-off-by: Michael Chan [EMAIL PROTECTED]
--
The complete patch is in:

ftp://[EMAIL PROTECTED]/0001-BNX2-Add-iSCSI-support-to-BNX2-devices.patch

I broke this into 2 patches and omitted the firmware blob in the next 2 emails for review.

---
 drivers/net/Kconfig                       |   10 +
 drivers/net/Makefile                      |    1 +
 drivers/net/bnx2.c                        |  116 +-
 drivers/net/bnx2.h                        |   25 +-
 drivers/net/bnx2_fw.h                     | 7036 ++---
 drivers/net/cnic.c                        | 1885
 drivers/net/cnic.diff                     |  363 ++
 drivers/net/cnic.h                        |  163 +
 drivers/net/cnic_cm.h                     |  555 +++
 drivers/net/cnic_if.h                     |  152 +
 drivers/scsi/Kconfig                      |    2 +
 drivers/scsi/Makefile                     |    1 +
 drivers/scsi/bnx2i/57xx_iscsi_constants.h |  212 +
 drivers/scsi/bnx2i/57xx_iscsi_hsi.h       | 1501 ++
 drivers/scsi/bnx2i/Kconfig                |    7 +
 drivers/scsi/bnx2i/Makefile               |    4 +
 drivers/scsi/bnx2i/bnx2i.h                |  828
 drivers/scsi/bnx2i/bnx2i_hwi.c            | 1993
 drivers/scsi/bnx2i/bnx2i_init.c           |  393 ++
 drivers/scsi/bnx2i/bnx2i_iscsi.c          | 3718 +++
 drivers/scsi/bnx2i/bnx2i_sysfs.c          |  616 +++
Re: Distributed storage.
On 8/4/07, Dave Dillow [EMAIL PROTECTED] wrote:

On Fri, 2007-08-03 at 09:04 +0400, Manu Abraham wrote:

On 7/31/07, Evgeniy Polyakov [EMAIL PROTECTED] wrote:

TODO list currently includes the following main items:
* redundancy algorithm (drop me a request of your own, but it is highly unlikely that a Reed-Solomon based one will ever be used - it is too slow for distributed RAID, I consider WEAVER codes)

LDPC codes[1][2] have been replacing Turbo code[3] with regards to communication links and we have been seeing that transition. (Maybe helpful, came to mind seeing the mention of Turbo code.) I don't know how Weaver compares to LDPC, though I found some comparisons [4][5]. But looking at the fault tolerance figures, I guess Weaver is much better.

[1] http://www.ldpc-codes.com/
[2] http://portal.acm.org/citation.cfm?id=1240497
[3] http://en.wikipedia.org/wiki/Turbo_code
[4] http://domino.research.ibm.com/library/cyberdig.nsf/papers/BD559022A190D41C85257212006CEC11/$File/rj10391.pdf
[5] http://hplabs.hp.com/personal/Jay_Wylie/publications/wylie_dsn2007.pdf

Searching Google for Dr. Plank's work at the University of TN turns up some analysis of using LDPC codes in storage systems.

http://www.google.com/search?hl=enq=plank+ldpcbtnG=Google+Search

Patents are an issue to watch out for around the use of Tornado/Raptor codes. I've not researched it, but I believe there be dragons there.

We don't use the code in the driver straight away [2] (in the case that I mentioned), since that happens in the hardware (demodulator chip) [1], but we have an interface for selecting the code-rate [2] (LDPC/BCH) for DVB-S2, and the new papers for DVB-T2 look geared toward LDPC as the base decision.

Though I now see a patent application for it [3]. I am not sure whether it is a registered patent; I am under a Non-Disclosure Agreement with STM. I will ask the relevant person there whether they have it registered. (Most probably they have.)

There are a few people from STM on LK; if not, they can possibly confirm whether the patent is registered or not.

[1] http://www2.dac.com/data2/42nd/42acceptedpapers.nsf/0c4c09c6ffa905c487256b7b007afb72/998f93e4b29e99fa87256fc400714617/$FILE/33_1.pdf
[2] http://linuxtv.org/hg/~manu/stb0899-c5/file/760cb230695c/linux/include/linux/dvb/frontend.h
[3] http://www.freepatentsonline.com/20060206779.html
    http://www.freepatentsonline.com/20060206778.html
Re: [REGRESSION] tg3 dead after s2ram
From: Michael Chan [EMAIL PROTECTED]
Date: Thu, 02 Aug 2007 12:10:29 -0700

[TG3]: Fix suspend/resume problem.

Joachim Deguara [EMAIL PROTECTED] reported that tg3 devices would not resume properly if the device was shut down before the system was suspended. In such a scenario, where the netif_running state is 0, tg3_suspend() would not save the PCI state, and so the memory enable bit and bus master enable bit would be lost. We fix this by always saving and restoring the PCI state in tg3_suspend() and tg3_resume() regardless of netif_running() state.

Signed-off-by: Michael Chan [EMAIL PROTECTED]

Patch applied.
Re: [BNX2]: Fix suspend/resume problem.
From: Michael Chan [EMAIL PROTECTED]
Date: Fri, 03 Aug 2007 15:32:34 -0700

[BNX2]: Fix suspend/resume problem.

The device would not resume properly if it was shut down before the system was suspended. In such a scenario, where the netif_running state is 0, bnx2_suspend() would not save the PCI state, and so the memory enable bit and bus master enable bit would be lost. We fix this by always saving and restoring the PCI state in bnx2_suspend() and bnx2_resume() regardless of netif_running() state.

Update version to 1.6.4.

Signed-off-by: Michael Chan [EMAIL PROTECTED]

Also applied, thanks Michael.
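For readers following the two suspend/resume fixes above, here is a minimal userspace sketch of the ordering problem they describe. It is not the tg3 or bnx2 code: the structs and every mock_*, suspend_*, resume_* and survives_cycle name are hypothetical stand-ins, and the "power loss" step only models the two command bits named in the commit messages. The assumption is that the old handlers returned early when the interface was down, before the PCI state was saved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a PCI function: only the two bits named in the
 * commit messages are tracked, plus one saved snapshot. */
struct mock_pci_dev {
	bool mem_enable;        /* memory enable bit */
	bool bus_master;        /* bus master enable bit */
	bool saved_valid;       /* true once a snapshot exists */
	bool saved_mem_enable;
	bool saved_bus_master;
};

struct mock_nic {
	struct mock_pci_dev pdev;
	bool running;           /* stands in for netif_running() */
};

static void mock_save_state(struct mock_pci_dev *p)
{
	p->saved_mem_enable = p->mem_enable;
	p->saved_bus_master = p->bus_master;
	p->saved_valid = true;
}

static void mock_restore_state(struct mock_pci_dev *p)
{
	if (!p->saved_valid)
		return;         /* nothing was saved, nothing to restore */
	p->mem_enable = p->saved_mem_enable;
	p->bus_master = p->saved_bus_master;
}

/* Assumed effect of the sleep state: both enable bits are cleared. */
static void mock_power_loss(struct mock_pci_dev *p)
{
	p->mem_enable = false;
	p->bus_master = false;
}

/* Assumed old ordering: bail out early when the interface is down. */
static void suspend_old(struct mock_nic *nic)
{
	if (!nic->running)
		return;
	mock_save_state(&nic->pdev);
}

static void resume_old(struct mock_nic *nic)
{
	if (!nic->running)
		return;
	mock_restore_state(&nic->pdev);
}

/* Ordering described by the fix: save and restore unconditionally. */
static void suspend_fixed(struct mock_nic *nic)
{
	mock_save_state(&nic->pdev);
	if (!nic->running)
		return;
	/* a real driver would quiesce the hardware here */
}

static void resume_fixed(struct mock_nic *nic)
{
	mock_restore_state(&nic->pdev);
	if (!nic->running)
		return;
	/* a real driver would re-initialise the hardware here */
}

/* Returns true if both enable bits survive one suspend/resume cycle. */
static bool survives_cycle(bool running, bool fixed)
{
	struct mock_nic nic = {
		.pdev = { .mem_enable = true, .bus_master = true },
		.running = running,
	};

	if (fixed)
		suspend_fixed(&nic);
	else
		suspend_old(&nic);
	mock_power_loss(&nic.pdev);
	if (fixed)
		resume_fixed(&nic);
	else
		resume_old(&nic);
	return nic.pdev.mem_enable && nic.pdev.bus_master;
}

/* Exercises the four combinations of interface state and ordering. */
void run_demo(void)
{
	assert(survives_cycle(true, false));   /* interface up: old code was fine */
	assert(!survives_cycle(false, false)); /* interface down: bits lost */
	assert(survives_cycle(true, true));
	assert(survives_cycle(false, true));   /* fixed ordering keeps the bits */
}
```

Calling run_demo() from a test harness checks all four cases; only the "interface down, old ordering" case loses the enable bits, which matches the symptom reported for both drivers.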
Re: [PATCH] ixgbe: New driver for Pci-Express 10GbE 82598 support
Auke Kok wrote:

This patch adds support for the Intel 82598 PCI-Express 10GbE chipset. Devices will be available on the market soon.

Also available through http and git:

http://foo-projects.org/~sofar/ixgbe-20070803-submission.patch
http://foo-projects.org/~sofar/ixgbe-20070803-submission.patch.bz2
git://lost.foo-projects.org/~ahkok/linux-2.6#ixgbe-20070803-submission

Cheers,
Auke