new documentation: IP_TRANSPARENT, is it correct?
Hi everyone,

Ten years after lartc.org, I decided to document a little more of Linux networking, and I hope I got it right. This email asks for your help in making sure.

Recently I attempted to use IP_TRANSPARENT as outlined in https://www.kernel.org/doc/Documentation/networking/tproxy.txt, but I could not figure out from there how it really works (although I could copy-paste my way to some working code). The web also offered little in the way of (correct) explanation. I think I have it figured out by now, but I'm sure there are nuances I have missed.

I'm especially interested in understanding _exactly_ what the IP_TRANSPARENT socket option does, because it appears somewhat arbitrary right now:

"The IP_TRANSPARENT socket option enables:
 * Binding to addresses that are not (usually) considered local
 * Receiving connections and packets from iptables TPROXY redirected sessions"

https://ds9a.nl/tproxy/tproxy.md.html has somewhat prettified Markdown that requires JavaScript; plain Markdown is on https://github.com/ahupowerdns/tproxydoc/blob/master/tproxy.md

If you could give this a read and comment on anything I got wrong, that would be most appreciated. Pointers to other relevant documentation are also very welcome.

Thanks!
Bert
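The first of the two effects quoted above (binding to a non-local address) can be tried from user space in a few lines. This is a minimal sketch, assuming Linux with glibc headers; the function name transparent_bind, the fallback value 19 for IP_TRANSPARENT (taken from <linux/in.h>) and the documentation address 192.0.2.1 are illustrative choices, not part of the text above. Setting the option needs CAP_NET_ADMIN, so an unprivileged run is expected to fail with EPERM. Receiving TPROXY-redirected traffic additionally requires the iptables TPROXY rule and the policy routing setup described in tproxy.txt, which this sketch does not cover.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19   /* value from <linux/in.h>, in case libc lacks it */
#endif

/* Create a socket of the given type with IP_TRANSPARENT enabled and bind it
 * to addr:port, which need not be a local address. Returns the fd, or -1
 * with errno set (EPERM if the caller lacks the required capability). */
int transparent_bind(int type, const char *addr, unsigned short port)
{
    int one = 1;
    struct sockaddr_in sin;
    int fd = socket(AF_INET, type, 0);

    if (fd < 0)
        return -1;

    /* Must be set before bind(): without it, binding a non-local address
     * normally fails with EADDRNOTAVAIL (unless ip_nonlocal_bind is set). */
    if (setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof one) < 0)
        goto fail;

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, addr, &sin.sin_addr) != 1) {
        errno = EINVAL;
        goto fail;
    }
    if (bind(fd, (struct sockaddr *)&sin, sizeof sin) < 0)
        goto fail;
    return fd;

fail: {
        int saved = errno;   /* close() must not clobber the real error */
        close(fd);
        errno = saved;
        return -1;
    }
}
```

For a TCP listener one would call transparent_bind(SOCK_STREAM, "192.0.2.1", 80) and then listen() on the result; without the option, the same bind() is expected to be refused.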
Re: all syscalls initially taking 4usec on a P4? Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?
On Tue, Feb 20, 2007 at 02:02:00PM -0800, Rick Jones wrote:
> The slope appears to be flattening-out the farther out to the right it
> goes. Perhaps that is the length of time it takes to take all the
> requisite cache misses.

The rate of flattening out appears to correlate with the number of processes running, even though the system is otherwise >99.5% idle during my measurements. With only 'gdm' running, things flatten out slowly; in other words, it takes longer delays to see recvfrom slow down. With only one process running (init=bash), the graph is nearly flat.

From this, it is probable that even an idle GNOME desktop (Ubuntu Edgy Eft) is under fierce cache pressure, enough to blow away my meagre 1MB of cache in a matter of milliseconds.

I'm trying to figure out which processes have the most impact. I had already killed anything non-essential, but that still leaves 140 pids.

Bert

-- 
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: all syscalls initially taking 4usec on a P4? Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?
On Tue, Feb 20, 2007 at 02:40:40PM -0500, Benjamin LaHaise wrote:
> Make sure your system is idle. Userspace bloat means that *lots* of idle
> activity occurs in between timer ticks on recent distributions -- all those

You hit the nail on the head. I had previously measured with X shut down, but the effect didn't disappear. With init=/bin/bash, a recvfrom call suddenly takes between 900 nsec and 1.3 usec, with only a slight correlation between inter-call delay and cycles spent.

I'm investigating this further, as it appears to have a real-life effect on my P4, and a drastic one!

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 1
cpu MHz         : 3000.131
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc pni monitor ds_cpl cid xtpr
bogomips        : 6003.91
clflush size    : 64

Thanks for your help!
Re: all syscalls initially taking 4usec on a P4? Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?
On Tue, Feb 20, 2007 at 09:48:59PM +0300, Evgeniy Polyakov wrote:
> Likely first overhead related to cache population or gamma-ray radiation.
> If it happens only one (it does in my test), then everything is ok I
> think. Bert, how frequently you get that long recvfrom()?

I have plotted the average time for a single non-blocking UDPv4 recvfrom call returning 100 bytes against the delay I insert between recvfrom calls, measured in cycles spent busy-waiting.

In theory, this graph should show some slope, perhaps because of the higher chance of context switches, cache evictions and purging of any branch-prediction information the CPU might have kept. I'm no expert.

I measure a huge slope, however. Starting at 1 usec for back-to-back system calls, it rises to 2 usec when the calls are interleaved with a count to 20 million. 4 usec is reached at 110 million.

The graph, with semi-scientific error bars, is on http://ds9a.nl/tmp/recvfrom-usec-vs-wait.png
The code to generate it is on http://ds9a.nl/tmp/recvtimings.c

I'm investigating this further for other system calls. It might be that my measurements are off, but it appears that even a slight delay between calls incurs a large penalty.

Bert
Re: all syscalls initially taking 4usec on a P4? Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?
On Tue, Feb 20, 2007 at 07:41:25PM +0300, Evgeniy Polyakov wrote:
> It can be recvfrom only problem - syscall overhead on my p4 (core duo,
> debian testing) is bout 300 usec - to test I ran read('dev/zero', &data,
> 0) in a loop.

nsec, I assume? These are the numbers for read(fd, &c, 0), where fd is /dev/zero, in usec:

1.557667, 0.627667, 0.447333, 0.44, 0.44, 0.44, 0.442333, 0.44, 0.44, 0.442333, 0.442333, 0.44, 0.44, 0.442333, 0.442667, 0.44, 0.44, 0.44, 0.442333, 0.442667

Notice the same declining figure, but not as pronounced. With a sleep(1) in between, we get:

1.692667, 1.80, 0.782667, 1.282667, 0.665000, 0.98, 0.925000, 0.887667, 0.662667, 0.862667, 1.077333, 1.442333, 0.66, 1.89, 0.672333, 0.795000, 0.647667, 0.692333, 0.75, 0.865000

This doesn't look all that unhealthy.

> Could you try to hack recvfrom() for your socket to always copy some
> empty buffer and check the results without waiting for packet?

That might be out of my reach before tomorrow :-)

> If you are not hurry I can test it myself tomorrow.

Thanks. My major problem is that in my measurements, I quite often see the 'worst case' 4 usec result. It would not be a problem if it happened only once, of course.

Bert
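The zero-length read loop can be sketched in C as follows. This is an illustration under stated assumptions, not the code behind the numbers above: it swaps RDTSC for clock_gettime(CLOCK_MONOTONIC), so no CPU-frequency divider is needed, and the helper names now_ns, spin and time_zero_reads are hypothetical. The gap parameter busy-waits between calls, roughly like the count-to-N delay used for the recvfrom graph, so back-to-back and spaced-out calls can be compared.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in nanoseconds; independent of the CPU clock rate. */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Busy-wait for 'iterations' loop turns; volatile keeps the compiler
 * from optimizing the loop away. */
static void spin(unsigned long iterations)
{
    for (volatile unsigned long i = 0; i < iterations; i++)
        ;
}

/* Time 'n' calls of read(fd, &c, 0) on /dev/zero, with 'gap' spin
 * iterations before each one. Durations (ns) go to out[0..n-1].
 * Returns 0 on success, -1 on error. */
int time_zero_reads(int n, unsigned long gap, long long *out)
{
    char c;
    int fd = open("/dev/zero", O_RDONLY);

    if (fd < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        spin(gap);
        long long t0 = now_ns();
        ssize_t r = read(fd, &c, 0);   /* zero-length read: pure syscall cost */
        long long t1 = now_ns();
        if (r != 0) {
            close(fd);
            return -1;
        }
        out[i] = t1 - t0;
    }
    close(fd);
    return 0;
}
```

Calling time_zero_reads(20, 0, t) and then the same with a large gap gives the two series to compare; note that clock_gettime() adds a small overhead of its own to every sample, much as RDTSC does.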
all syscalls initially taking 4usec on a P4? Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?
On Tue, Feb 20, 2007 at 11:50:13AM +0100, Andi Kleen wrote:
> P4s are pretty slow at taking locks (or rather doing atomical operations)
> and there are several of them in this path. You could try it with a UP
> kernel. Actually hotunplugging the other virtual CPU should be sufficient
> with recent kernels.

This is on a UP kernel, on a single CPU. It does have hyperthreading, but the kernel is uniprocessor, non-preempt. No frequency scaling. Seen with Linux 2.6.20-rc4, 2.6.19 and 2.6.18, on a P4, a P-M and an Athlon 64. Ubuntu Edgy Eft on the P4.

> Also BTW RDTSC on P4 is not very accurate for small measurements
> because it has a quite high overhead by itself, i would suggest
> running it in a loop.

I've done so, with some interesting results. Source is on http://ds9a.nl/tmp/recvtimings.c - be careful to adjust the '3000' divider to your CPU frequency if you care about absolute numbers!

These are two groups, each consisting of 10 consecutive non-blocking UDP recvfroms, with 10 packets preloaded. Reported is the number of microseconds per recvfrom call that yielded a packet:

$ ./recvtimings
4.142333 2.237667 1.927333 1.58 1.77 1.632333 1.712667 1.685000 1.62 2.415000
1.347333 1.545000 1.492667 1.902333 1.485000 1.532667 1.46 1.517667 1.492333 1.58

This is on a nearly quiet P4 - I've removed the first line:

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa
 0  0      0 290064 307036 296036    0    0     0     0  124   58  0  0 100  0
 0  0      0 289972 307036 296036    0    0     0     4  139   95  0  0 100  0
 0  0      0 289972 307036 296036    0    0     0     0  119   55  0  0 100  0
 1  0      0 289972 307036 296036    0    0     0     0  135   71  0  0 100  0

HZ is clearly 100.

If I usleep in between, the timings for each recvfrom call become higher. If I sleep for a full second, I get nearly flat results:

4.25 5.317667 3.525000 4.147333 3.36 3.552667 3.087667

Various different CPUs report more or less the same results. I know we have caching effects, but these effects are HUGE. Is this supposed to be the case?

I'm on an up-to-date system, glibc 2.4.

Bert
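The preloaded-packet measurement described above might look roughly like this in C. It is a hypothetical reconstruction, not recvtimings.c itself: the socket sends n datagrams to itself over 127.0.0.1, waits with poll() until data is queued, and then times each MSG_DONTWAIT recvfrom() with CLOCK_MONOTONIC instead of RDTSC, so the '3000' divider disappears. The function name, packet count, packet size and buffer size are arbitrary assumptions.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Preload 'n' datagrams of 'size' bytes into a UDP socket bound to
 * 127.0.0.1, then time 'n' non-blocking recvfrom() calls. For each call
 * that returned a packet, its duration in ns is stored in out[].
 * Returns the number of packets received, or -1 on setup failure. */
int time_preloaded_recvfrom(int n, size_t size, long long *out)
{
    char buf[2048];
    struct sockaddr_in sin, from;
    socklen_t len = sizeof sin;
    struct pollfd pfd;
    int got = -1;
    int fd;

    if (size > sizeof buf || (fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        return -1;

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks */
    if (bind(fd, (struct sockaddr *)&sin, sizeof sin) < 0 ||
        getsockname(fd, (struct sockaddr *)&sin, &len) < 0)
        goto out;

    /* Preload: the socket sends n packets to itself. */
    memset(buf, 'x', size);
    for (int i = 0; i < n; i++)
        if (sendto(fd, buf, size, 0, (struct sockaddr *)&sin, sizeof sin) < 0)
            goto out;

    /* Make sure something is queued before the timed loop starts. */
    pfd.fd = fd;
    pfd.events = POLLIN;
    if (poll(&pfd, 1, 1000) <= 0)
        goto out;

    got = 0;
    for (int i = 0; i < n; i++) {
        socklen_t flen = sizeof from;
        long long t0 = now_ns();
        ssize_t r = recvfrom(fd, buf, sizeof buf, MSG_DONTWAIT,
                             (struct sockaddr *)&from, &flen);
        long long t1 = now_ns();
        if (r >= 0)
            out[got++] = t1 - t0;
    }
out:
    close(fd);
    return got;
}
```

Printing out[i] / 1000.0 for each received packet gives microseconds per call, in the same form as the output shown above.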
Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?
On Mon, Feb 19, 2007 at 03:56:23PM -0800, Stephen Hemminger wrote:
> > Linux 2.6.20-rc4 appears to take 4 microseconds on my P4 3GHz for a
> > non-blocking UDPv4 recvfrom() call, both on loopback and ethernet.
> >
> > Linux 2.6.18 on my 64 bit Athlon64 3200+ takes a similar amount of time.
> > recvfrom itself is a tad worrisome, x=recvfrom. I didn't ask for the
> > 'libc_enable_asynccancel' stuff. I'm trying to isolate the actual syscall
> > but it is proving hard work for an assemnly newbie like me - socketcall
> > doesn't make things easier.

Together with Zwane Mwaikambo, I managed to isolate the pure syscall. It makes no difference: a single recvfrom still takes around 4 microseconds at 3GHz. Many thanks to Zwane for helping out.

> Use oprofile to find the hotspot.

Will do this next. I need to get a setup where I can run oprofile *and* reach decent query rates; I don't run oprofile on my remote machines, which I don't have easy access to.

Thanks.
nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?
Hi people,

I'm trying to save people the cost of buying extra servers by making PowerDNS (GPL) ever faster, but I've hit a rather fundamental problem.

Linux 2.6.20-rc4 appears to take 4 microseconds on my P4 3GHz for a non-blocking UDPv4 recvfrom() call, both on loopback and ethernet. Linux 2.6.18 on my 64 bit Athlon64 3200+ takes a similar amount of time. This seems like rather a lot for a 50 byte datagram, but perhaps I'm overestimating your abilities :-)

The program is unthreaded, and I measure like this:

#define RDTSC(qp) \
do { \
    unsigned long lowPart, highPart; \
    __asm__ __volatile__("rdtsc" : "=a" (lowPart), "=d" (highPart)); \
    qp = (((unsigned long long) highPart) << 32) | lowPart; \
} while (0)

...

uint64_t tsc1, tsc2;
RDTSC(tsc1);
if((len=recvfrom(fd, data, sizeof(data), 0, (sockaddr *)&fromaddr, &addrlen)) >= 0) {
    RDTSC(tsc2);
    printf("%f\n", (tsc2-tsc1)/3000.0); // cycles / 3000 = usec on a 3GHz P4
}

gdb generates the following dump from the actual program (x=_Z20handleNewUDPQuestioniRN5boost3anyE). I see nothing untoward happening between the two 'rdtsc' opcodes.
0x08091de0:  push   %ebp
0x08091de1:  mov    %esp,%ebp
0x08091de3:  push   %edi
0x08091de4:  push   %esi
0x08091de5:  push   %ebx
0x08091de6:  sub    $0x78c,%esp
0x08091dec:  mov    %gs:0x14,%eax
0x08091df2:  mov    %eax,0xffe4(%ebp)
0x08091df5:  xor    %eax,%eax
0x08091df7:  movw   $0x2,0xffac(%ebp)
0x08091dfd:  movl   $0x0,0xffb0(%ebp)
0x08091e04:  movw   $0x0,0xffae(%ebp)
0x08091e0a:  movl   $0x1c,0xf8f4(%ebp)
0x08091e14:  rdtsc
0x08091e16:  mov    %edx,%ebx
0x08091e18:  mov    0x8(%ebp),%edx
0x08091e1b:  mov    %eax,%esi
0x08091e1d:  lea    0xf8f4(%ebp),%eax
0x08091e23:  mov    %eax,0x14(%esp)
0x08091e27:  lea    0xffac(%ebp),%ecx
0x08091e2a:  lea    0xf950(%ebp),%eax
0x08091e30:  mov    %ecx,0x10(%esp)
0x08091e34:  movl   $0x0,0xc(%esp)
0x08091e3c:  movl   $0x5dc,0x8(%esp)
0x08091e44:  mov    %eax,0x4(%esp)
0x08091e48:  mov    %edx,(%esp)
0x08091e4b:  call   0x8192110
0x08091e50:  test   %eax,%eax
0x08091e52:  mov    %eax,0xf8b0(%ebp)
0x08091e58:  js     0x8092168
0x08091e5e:  mov    %ebx,%eax
0x08091e60:  xor    %edx,%edx
0x08091e62:  mov    %eax,%edx
0x08091e64:  mov    $0x0,%eax
0x08091e69:  mov    %esi,%ecx
0x08091e6b:  mov    %eax,%esi
0x08091e6d:  or     %ecx,%esi
0x08091e6f:  mov    %edx,%edi
0x08091e71:  rdtsc
0x08091e73:  mov    %eax,0xf8a0(%ebp)
0x08091e79:  mov    0xf8a0(%ebp),%eax
0x08091e7f:  mov    %edx,%ecx
0x08091e81:  xor    %ebx,%ebx
0x08091e83:  mov    %ecx,%ebx

recvfrom itself is a tad worrisome (x=recvfrom). I didn't ask for the 'libc_enable_asynccancel' stuff. I'm trying to isolate the actual syscall, but it is proving hard work for an assembly newbie like me - socketcall doesn't make things easier.
0xb7d62410:  cmpl   $0x0,%gs:0xc
0xb7d62418:  jne    0xb7d62439
0xb7d6241a:  mov    %ebx,%edx
0xb7d6241c:  mov    $0x66,%eax
0xb7d62421:  mov    $0xc,%ebx
0xb7d62426:  lea    0x4(%esp),%ecx
0xb7d6242a:  call   *%gs:0x10
0xb7d62431:  mov    %edx,%ebx
0xb7d62433:  cmp    $0xff83,%eax
0xb7d62436:  jae    0xb7d62469
0xb7d62438:  ret
0xb7d62439:  push   %esi
0xb7d6243a:  call   0xb7d6ddd0 <__libc_enable_asynccancel>
0xb7d6243f:  mov    %eax,%esi
0xb7d62441:  mov    %ebx,%edx
0xb7d62443:  mov    $0x66,%eax
0xb7d62448:  mov    $0xc,%ebx
0xb7d6244d:  lea    0x8(%esp),%ecx
0xb7d62451:  call   *%gs:0x10
0xb7d62458:  mov    %edx,%ebx
0xb7d6245a:  xchg   %eax,%esi
0xb7d6245b:  call   0xb7d6dd90 <__libc_disable_asynccancel>
0xb7d62460:  mov    %esi,%eax
0xb7d62462:  pop    %esi
0xb7d62463:  cmp    $0xff83,%eax
0xb7d62466:  jae    0xb7d62469
0xb7d62468:  ret
0xb7d62469:  call   0xb7d998f8 <__i686.get_pc_thunk.cx>
0xb7d6246e:  add    $0x61b86,%ecx
0xb7d62474:  mov    0xff2c(%ecx),%ecx
0xb7d6247a:  xor    %edx,%edx
0xb7d6247c:  sub    %eax,%edx
0xb7d6247e:  mov    %edx,%gs:(%ecx)
0xb7d62481:  or     $0x,%eax
0xb7d62484:  jmp    0xb7d62438

Any clues?
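One way to take the libc wrapper out of the picture is to issue the system call directly via syscall(2). This is a sketch, assuming Linux: on x86-64 there is a dedicated recvfrom system call, while the 32-bit x86 dump above goes through socketcall (eax = 0x66, i.e. 102, with ebx = 0xc, i.e. SYS_RECVFROM = 12). Judging by the cmpl on %gs:0xc at the top of that dump, the __libc_enable_asynccancel path appears to be taken only when the process is multi-threaded. The name raw_recvfrom is hypothetical, and falling back to the libc wrapper where SYS_recvfrom is not defined is an assumption of this sketch.

```c
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

/* recvfrom() issued directly through syscall(2), bypassing the glibc
 * wrapper and its thread-cancellation bookkeeping. Where the headers
 * define no direct recvfrom system call (older 32-bit x86, which only
 * has socketcall), this falls back to the ordinary libc call. */
ssize_t raw_recvfrom(int fd, void *buf, size_t len, int flags,
                     struct sockaddr *from, socklen_t *fromlen)
{
#ifdef SYS_recvfrom
    /* syscall() returns -1 and sets errno on failure, like recvfrom(). */
    return syscall(SYS_recvfrom, fd, buf, len, flags, from, fromlen);
#else
    return recvfrom(fd, buf, len, flags, from, fromlen);
#endif
}
```

It can be dropped in place of recvfrom() between the two RDTSC reads to see whether the wrapper contributes anything measurable.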
Re: bcm43xx-softmac broken on 2.6.20-rc2
On Sun, Dec 24, 2006 at 09:51:50AM -0600, Larry Finger wrote:
> This is a heads-up for anyone wishing to use bcm43xx-softmac on Linus's git
> tree, which is now at
> v2.6.20-rc2. There are two serious bugs in that code. Fixes are found below.

For some reason your patch does not apply to stock 2.6.20-rc2, although I don't see why. Applying it by hand makes things compile, though, and even fixes the problem I mentioned in my previous post: http://www.spinics.net/lists/netdev/msg21906.html

Thanks!
fix for 2.6.20-rc2 null pointer dereference in SoftMAC? was Re: [PATCH] softmac: Fix for work struct changes
On Sun, Dec 10, 2006 at 03:37:27PM -0600, Larry Finger wrote:
> casted to (void*). This compiled correctly but resulted in a
> softlock, because mutex_lock was called with the wrong memory
> address. The patch fixes the problem. Another issue was a wrong

(quickly, between christmas dinner preparations)

Does this explain the following, which happens reliably in stock 2.6.20-rc2 (in-kernel zd1211rw):

Dec 24 22:07:25 localhost kernel: [ 120.238914] SoftMAC: Open Authentication completed with 00:0e:a6:16:28:a9
Dec 24 22:07:25 localhost kernel: [ 120.239005] BUG: unable to handle kernel NULL pointer dereference at virtual address 0006
Dec 24 22:07:25 localhost kernel: [ 120.239132] printing eip:
Dec 24 22:07:25 localhost kernel: [ 120.239191] c04cf8c5
Dec 24 22:07:25 localhost kernel: [ 120.239249] *pde =
Dec 24 22:07:25 localhost kernel: [ 120.239308] Oops: 0002 [#1]
Dec 24 22:07:25 localhost kernel: [ 120.239367] Modules linked in: capability commoncap cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand freq_table cpufreq_conservative zd1211rw ieee80211softmac usbhid ieee80211 ieee80211_crypt psmouse
Dec 24 22:07:25 localhost kernel: [ 120.239850] CPU:0
Dec 24 22:07:25 localhost kernel: [ 120.239851] EIP: 0060:[__mutex_lock_slowpath+30/89] Not tainted VLI
Dec 24 22:07:25 localhost kernel: [ 120.239853] EFLAGS: 00010286 (2.6.20-rc2 #7)
Dec 24 22:07:25 localhost kernel: [ 120.240043] EIP is at __mutex_lock_slowpath+0x1e/0x59
Dec 24 22:07:25 localhost kernel: [ 120.240106] eax: f5b449e0 ebx: f5b449dc ecx: 0006 edx: 0004
Dec 24 22:07:25 localhost kernel: [ 120.240173] esi: c19005a0 edi: f5b44a40 ebp: f8862ce8 esp: c1909ec0
Dec 24 22:07:25 localhost kernel: [ 120.240241] ds: 007b es: 007b ss: 0068
Dec 24 22:07:25 localhost kernel: [ 120.240305] Process events/0 (pid: 4, ti=c1908000 task=c19005a0 task.ti=c1908000)
Dec 24 22:07:25 localhost kernel: [ 120.240372] Stack: f5b449e0 0006 0020 f5b449a0 f5b44a40 c04cf7d8 f8862943 f72b8500
Dec 24 22:07:25 localhost kernel: [ 120.240676]        0286 f5b44314 f5b449dc f5b44a40 0001 f5e6c9c0 f5e6c9c0
Dec 24 22:07:25 localhost kernel: [ 120.240981]        f5b44a40 f8862ce8 f8862d50 0004 00100100 00200200 0004
Dec 24 22:07:25 localhost kernel: [ 120.241284] Call Trace:
Dec 24 22:07:25 localhost kernel: [ 120.241399] [mutex_lock+9/10] mutex_lock+0x9/0xa
Dec 24 22:07:25 localhost kernel: [ 120.241485] [] ieee80211softmac_assoc_work+0x1b/0x3c0 [ieee80211softmac]
Dec 24 22:07:25 localhost kernel: [ 120.241614] [] ieee80211softmac_assoc_notify_auth+0x0/0x1e [ieee80211softmac]
Dec 24 22:07:25 localhost kernel: [ 120.241741] [] ieee80211softmac_notify_callback+0x40/0x48 [ieee80211softmac]
Dec 24 22:07:25 localhost kernel: [ 120.241866] [] ieee80211softmac_notify_callback+0x0/0x48 [ieee80211softmac]
Dec 24 22:07:25 localhost kernel: [ 120.241992] [] ieee80211softmac_assoc_notify_auth+0x0/0x1e [ieee80211softmac]
Dec 24 22:07:25 localhost kernel: [ 120.242118] [] ieee80211softmac_notify_callback+0x0/0x48 [ieee80211softmac]
Dec 24 22:07:25 localhost kernel: [ 120.242243] [run_workqueue+139/311] run_workqueue+0x8b/0x137
Dec 24 22:07:25 localhost kernel: [ 120.242336] [worker_thread+0/302] worker_thread+0x0/0x12e
Dec 24 22:07:25 localhost kernel: [ 120.242422] [worker_thread+261/302] worker_thread+0x105/0x12e
Dec 24 22:07:25 localhost kernel: [ 120.242509] [default_wake_function+0/12] default_wake_function+0x0/0xc
Dec 24 22:07:25 localhost kernel: [ 120.242596] [kthread+155/191] kthread+0x9b/0xbf
Dec 24 22:07:25 localhost kernel: [ 120.242682] [kthread+0/191] kthread+0x0/0xbf
Dec 24 22:07:25 localhost kernel: [ 120.242767] [kernel_thread_helper+7/16] kernel_thread_helper+0x7/0x10
Dec 24 22:07:25 localhost kernel: [ 120.242856] ===
Dec 24 22:07:25 localhost kernel: [ 120.242915] Code: 00 00 00 31 d2 89 d0 83 c4 0c 5b 5e c3 56 53 83 ec 0c 89 c3 65 8b 35 08 00 00 00 8d 40 04 8b 48 04 89 60 04 89 04 24 89 4c 24 04 <89> 21 89 74 24 08 83 c8 ff 87 03 48 74 0d c7 06 02 00 00 00 e8
Dec 24 22:07:25 localhost kernel: [ 120.244531] EIP: [__mutex_lock_slowpath+30/89] __mutex_lock_slowpath+0x1e/0x59 SS:ESP 0068:c1909ec0

This happens after starting wpa_supplicant on a zd1211rw device.
Re: sendmsg, descriptors and no content
On Tue, Oct 31, 2006 at 11:01:01PM +0100, [EMAIL PROTECTED] wrote:
> When I use sendmsg to send descriptors from one process to another using
> unix-sockets I need to include at least one byte of normal data for the
> descriptors to be send (using the iovec structure). The same code worked

W. R. Stevens, Unix Network Programming (2nd ed.), vol. 1, p. 389 recommends sending at least one byte anyhow, which allows you to detect EOF. However, also see http://www.cs-ipv6.lancs.ac.uk/ipv6/mail-archive/LinuxNetdev/1998-03/0144.html

I've attached an example that appears to work.

Bert

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send 'fd' (if >= 0) over the unix socket 'passfd', plus the int 'data'
   as ordinary payload (if non-zero). */
int sfd(int passfd, int fd, int data)
{
    union {                        /* union guarantees cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } cbuf;
    struct msghdr mh = { 0 };
    struct cmsghdr *cm;
    struct iovec iov;

    if (fd >= 0) {
        mh.msg_control = cbuf.buf;
        mh.msg_controllen = sizeof cbuf.buf;
        cm = CMSG_FIRSTHDR(&mh);
        cm->cmsg_len = CMSG_LEN(sizeof(int));
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_RIGHTS;
        memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    }
    if (data != 0) {
        iov.iov_base = &data;
        iov.iov_len = sizeof data;
        mh.msg_iov = &iov;
        mh.msg_iovlen = 1;
    }
    return sendmsg(passfd, &mh, 0);
}

/* Only prepared to receive one fd per message. Returns the received fd,
   -1 if no descriptor was attached, or recvmsg's error return. */
int rcvfd(int passfd, int *data, int *datalen)
{
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } cbuf;
    struct msghdr mh = { 0 };
    struct cmsghdr *cm;
    struct iovec iov;
    int fd = -1, ret;

    if (data) {
        iov.iov_base = data;       /* the caller's int, not the pointer itself */
        iov.iov_len = sizeof(int);
        mh.msg_iov = &iov;
        mh.msg_iovlen = 1;
    }
    mh.msg_control = cbuf.buf;
    mh.msg_controllen = sizeof cbuf.buf;

    ret = recvmsg(passfd, &mh, 0);
    if (ret < 0)
        return ret;
    if (datalen)
        *datalen = ret;

    /* Inspect what the kernel actually delivered. */
    for (cm = CMSG_FIRSTHDR(&mh); cm; cm = CMSG_NXTHDR(&mh, cm))
        if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS)
            memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}

int main(void)
{
    int fd[2];
    int datalen;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fd) < 0) {
        perror("socketpair");
        return 1;
    }
    printf("Sending returned status: %d\n", sfd(fd[0], 0, 1));
    printf("Received fd: %d\n", rcvfd(fd[1], NULL, &datalen));
    return 0;
}
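For SOCK_STREAM sockets, the Stevens approach of always sending one data byte along with the descriptor can be sketched as follows. The helper names send_fd_stream and recv_fd_stream are hypothetical, and the sketch assumes Linux/POSIX unix-domain sockets. Because every message carries a byte, a recvmsg() return value of 0 unambiguously means the peer closed the connection.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send 'fd' over a SOCK_STREAM unix socket together with one dummy data
 * byte. Returns 0 on success, -1 on error. */
int send_fd_stream(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr mh = { 0 };
    struct cmsghdr *cm;

    memset(&u, 0, sizeof u);
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = u.buf;
    mh.msg_controllen = sizeof u.buf;
    cm = CMSG_FIRSTHDR(&mh);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    return sendmsg(sock, &mh, 0) == 1 ? 0 : -1;
}

/* Receive one fd. Returns the new fd, -1 at EOF (recvmsg returned 0,
 * which can only mean EOF because every message carries a byte), or -2
 * on error or when no descriptor was attached. */
int recv_fd_stream(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr mh = { 0 };
    struct cmsghdr *cm;
    int fd = -2;

    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = u.buf;
    mh.msg_controllen = sizeof u.buf;

    ssize_t n = recvmsg(sock, &mh, 0);
    if (n == 0)
        return -1;              /* orderly shutdown by the sender */
    if (n < 0)
        return -2;
    for (cm = CMSG_FIRSTHDR(&mh); cm; cm = CMSG_NXTHDR(&mh, cm))
        if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS)
            memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

Sending the write end of a pipe this way and writing through the received descriptor is a quick check that the receiver really got a working duplicate.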
Re: [Announce] Netchannels ported to the latest git tree. Gigabit benchmark. Complete rout.
On Thu, Oct 26, 2006 at 02:51:51PM +0400, Evgeniy Polyakov wrote:
> Benchmark uses 128 bytes sending/receiving per syscall (no latency
> checks, only throughput.
> Receiving CPU usage is 3 times less (90% socket code vs. 30%
> Sending CPU usage is 5 times less (upto 50% vs. upto 10%).

Wow. I currently lack the hardware to reproduce your measurements. Do you have any idea what these numbers would look like with 1024 bytes per system call?

Thanks.
tested: Re: [PATCH] tcp: make cubic the default
Stephen,

I've applied both of your patches (http://marc.theaimsgroup.com/?l=linux-netdev&m=115878447914612&w=2 and http://marc.theaimsgroup.com/?l=linux-netdev&m=115878448125216&w=2) and tried to break them, but they now appear to do the right thing in all cases. Even when I malform the .config by hand, a 'make oldconfig' restores sanity. Reno is chosen if none of the non-scary congestion avoidance algorithms are available, and the defaults when they are available are as you intended.

I've test-booted the resulting kernel and everything appears to work as desired: the proper TCP congestion control gets chosen, and loading other ones does not change the default, but does make them available. Unloading the module containing the configured policy sets the policy to 'cubic', which is probably the next entry in the policy list.

All in all, this final iteration of the congestion selection patches appears to do the job! Davem, I'd recommend both patches for merging.

Bert

On Wed, Sep 20, 2006 at 01:32:58PM -0700, Stephen Hemminger wrote:
> Change default congestion control used from BIC to the newer CUBIC
> which it the successor to BIC but has better properties over long delay links.
>
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
>
> ---
>  net/ipv4/Kconfig |   12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> --- net-test.orig/net/ipv4/Kconfig	2006-09-20 12:22:06.0 -0700
> +++ net-test/net/ipv4/Kconfig	2006-09-20 13:31:21.0 -0700
> @@ -454,7 +454,7 @@
>            modules.
>
>            Nearly all users can safely say no here, and a safe default
> -          selection will be made (BIC-TCP with new Reno as a fallback).
> +          selection will be made (CUBIC with new Reno as a fallback).
>
>            If unsure, say N.
>
> @@ -462,7 +462,7 @@
>
>  config TCP_CONG_BIC
>          tristate "Binary Increase Congestion (BIC) control"
> -        default y
> +        default m
>          ---help---
>          BIC-TCP is a sender-side only change that ensures a linear RTT
>          fairness under large windows while offering both scalability and
> @@ -476,7 +476,7 @@
>
>  config TCP_CONG_CUBIC
>          tristate "CUBIC TCP"
> -        default m
> +        default y
>          ---help---
>          This is version 2.0 of BIC-TCP which uses a cubic growth function
>          among other techniques.
> @@ -573,7 +573,7 @@
>
>  choice
>          prompt "Default TCP congestion control"
> -        default DEFAULT_BIC
> +        default DEFAULT_CUBIC
>          help
>            Select the TCP congestion control that will be used by default
>            for all connections.
> @@ -600,7 +600,7 @@
>
>  endif
>
> -config TCP_CONG_BIC
> +config TCP_CONG_CUBIC
>          tristate
>          depends on !TCP_CONG_ADVANCED
>          default y
> @@ -613,7 +613,7 @@
>          default "vegas" if DEFAULT_VEGAS
>          default "westwood" if DEFAULT_WESTWOOD
>          default "reno" if DEFAULT_RENO
> -        default "bic"
> +        default "cubic"
>
>  source "net/ipv4/ipvs/Kconfig"
Re: [PATCH] tcp: simpler bic default
On Tue, Sep 19, 2006 at 04:23:55PM -0700, Stephen Hemminger wrote:
> Okay, build testing all the possibilities now, answer by morning..

Please boot some of them as well - I can imagine a kernel that really wants to load "bic" at boot time but can't find it.

Bert
Re: [PATCH] tcp: simpler bic default
On Tue, Sep 19, 2006 at 02:32:09PM -0700, Stephen Hemminger wrote:
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

# CONFIG_TCP_CONG_ADVANCED is not set
# CONFIG_DEFAULT_BIC is not set
# CONFIG_DEFAULT_CUBIC is not set
# CONFIG_DEFAULT_HTCP is not set
# CONFIG_DEFAULT_VEGAS is not set
# CONFIG_DEFAULT_WESTWOOD is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="bic"

There is no "bic" in the kernel now - will this do the right thing?
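Whether a booted kernel ended up with a usable default can be checked from user space. This is a sketch, assuming the sysctl files /proc/sys/net/ipv4/tcp_congestion_control and /proc/sys/net/ipv4/tcp_available_congestion_control; the second file may be missing on kernels as old as the one discussed here, in which case the check reports -1. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* Read the first line of a file into buf, without the trailing newline.
 * Returns 0 on success, -1 on failure. */
static int read_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Store the running default congestion control in def. Returns 1 if it
 * also appears in the available list, 0 if not, -1 if the sysctls
 * cannot be read. */
int default_cong_available(char *def, size_t deflen)
{
    char avail[512];

    if (read_line("/proc/sys/net/ipv4/tcp_congestion_control", def, deflen) < 0 ||
        read_line("/proc/sys/net/ipv4/tcp_available_congestion_control",
                  avail, sizeof avail) < 0)
        return -1;
    /* 'avail' is a space-separated list; compare whole words only. */
    for (char *p = strtok(avail, " "); p; p = strtok(NULL, " "))
        if (strcmp(p, def) == 0)
            return 1;
    return 0;
}
```

Running this right after boot on each configuration shows which policy the kernel actually settled on, rather than what CONFIG_DEFAULT_TCP_CONG asked for.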
Re: [PATCH] tcp: set congestion default through Kconfig (v2)
On Tue, Sep 19, 2006 at 02:20:07PM -0700, David Miller wrote:
> > Bert's attempt was noble
> > It showed your desire for the truth

It was also crap :-)

> Applied, but...
> net/ipv4/Kconfig:607:warning: defaults for choice values not supported

It does appear to do the right thing in all cases I throw at it, but this warning is sure to generate noise.

It probably means CONFIG_BIC is both available in the menu and set as a separate option, and that this does not set the menu to CONFIG_BIC. This is not a big deal, however, as the menu is not shown at all whenever line 607 applies.
Re: tcp congestion policy selection link order fragile
On Mon, Sep 18, 2006 at 11:53:09AM -0700, David Miller wrote:
> > What would the desired default be, 'BIC' in all cases?
>
> And if BIC is not enabled in the configuration, then what?

As the source notes: "/* we'll always have reno */". This would make the policy: the default is "bic" if available, otherwise "reno", which is *always* available.

But it is all up to you. I'm willing to do the legwork.

Bert
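The proposed fallback rule is small enough to state as code. This is a hypothetical user-space model only, not kernel code: the registered array and the preferred name stand in for whatever the kernel has compiled in or loaded, and pick_default is an illustrative name.

```c
#include <stddef.h>
#include <string.h>

/* Return 'preferred' if it is among the registered policies, otherwise
 * fall back to "reno", which the kernel always has. */
const char *pick_default(const char *const *registered, size_t n,
                         const char *preferred)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(registered[i], preferred) == 0)
            return registered[i];
    return "reno";
}
```

The point of the rule is that the result depends only on what is available, never on the order in which policies were registered.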
Re: tcp congestion policy selection link order fragile
On Mon, Sep 18, 2006 at 07:06:00AM -0700, David Miller wrote:
> Any ordering scheme is wrong or unexpected for _somebody_. Look how

I agree violently. Would you agree that it would be best to have a mechanism that explicitly sets a sane default and does not rely on ordering?

My implementation indeed broke your intentions, but would you be open to revamping things so the default policy does not depend on load order? What would the desired default be, 'BIC' in all cases?

Thanks.
Re: tcp congestion policy selection link order fragile
On Mon, Sep 18, 2006 at 01:51:30AM -0700, David Miller wrote:
> We created TCP_CONG_ADVANCED for a purpose. If you turn that
> thing on, you get full control but if something breaks you get
> to keep the pieces.

But we should not try to break stuff on purpose, no matter how advanced. It makes zero sense. To reiterate: when compiling in multiple TCP policies, a *random* one gets enabled. This is not something we want to offer even advanced users. It is a kernel, not an adventure course.

Please consider this near-one-liner patch, which makes things behave more like people expect: loading a module, or compiling in a congestion avoidance policy, only makes it available, but does not turn it on by default. It also cleans up two notices a bit.

I've tested this patch and it does the job for me: reno is now the default, even when more advanced options are compiled in, but the rest is still available.

When in doubt, consider that I discovered this because my kernel was crashing, and that this is bound to generate heaps of annoying email otherwise.

Thanks.

Signed-off-by: bert hubert <[EMAIL PROTECTED]>

--- linux-2.6.18-rc7/net/ipv4/tcp_cong.c.org	2006-09-18 11:42:25.0 +0200
+++ linux-2.6.18-rc7/net/ipv4/tcp_cong.c	2006-09-18 11:43:45.0 +0200
@@ -45,11 +45,11 @@
         spin_lock(&tcp_cong_list_lock);
         if (tcp_ca_find(ca->name)) {
-                printk(KERN_NOTICE "TCP %s already registered\n", ca->name);
+                printk(KERN_NOTICE "TCP congestion control '%s' already registered\n", ca->name);
                 ret = -EEXIST;
         } else {
-                list_add_rcu(&ca->list, &tcp_cong_list);
-                printk(KERN_INFO "TCP %s registered\n", ca->name);
+                list_add_tail_rcu(&ca->list, &tcp_cong_list);
+                printk(KERN_INFO "TCP congestion control '%s' registered\n", ca->name);
         }
         spin_unlock(&tcp_cong_list_lock);
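Why the change from list_add_rcu() to list_add_tail_rcu() matters can be modeled in user space. This sketch assumes, as the patch does, that new connections get the first entry of tcp_cong_list: with head insertion the last policy registered becomes the default, with tail insertion the first one (reno, when tcp_cong.o registers first) stays the default. The struct and helper names are hypothetical, and this is a model of the list behaviour, not the kernel's list API.

```c
#include <stddef.h>

/* Hypothetical stand-in for tcp_cong_list: the first entry is the default. */
struct ca { const char *name; struct ca *next; };

/* Like list_add_rcu(): the new entry goes to the front. */
static void add_head(struct ca **list, struct ca *c)
{
    c->next = *list;
    *list = c;
}

/* Like list_add_tail_rcu(): the new entry goes to the back. */
static void add_tail(struct ca **list, struct ca *c)
{
    c->next = NULL;
    while (*list)
        list = &(*list)->next;
    *list = c;
}

/* Register 'names' in order (at most 16) and return the resulting
 * default, i.e. the name at the head of the list. */
const char *default_after(const char *const *names, size_t n, int tail)
{
    static struct ca pool[16];
    struct ca *list = NULL;

    for (size_t i = 0; i < n && i < 16; i++) {
        pool[i].name = names[i];
        if (tail)
            add_tail(&list, &pool[i]);
        else
            add_head(&list, &pool[i]);
    }
    return list ? list->name : NULL;
}
```

Registering reno, bic, cubic and lp in that order yields lp as the default with head insertion and reno with tail insertion, matching the behaviour described in this thread.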
Re: tcp congestion policy selection link order fragile
The original message Stephen is replying to below apparently never made it to the list; it can be found here: http://ds9a.nl/tmp/module-policy.txt

> Any body who builds in random stuff without thinking is being foolish.
> But, if you can think of a better configuration method that isn't too
> grotty, then go for it.

The method I'm proposing is simple enough:

1) reno is always built in
2) it is the default TCP congestion policy
3) loading or compiling in additional TCP congestion policies only makes them available
4) userspace is free to select a non-default TCP congestion policy at will

The implementation might be as simple as making the *first* registered congestion policy the default (instead of the last one). That would be reno, as it lives in tcp_cong.o, which is probably always loaded first (the other .o files need symbols from tcp_cong.o).

Despite what you allege about my foolishness, I maintain that a kernel that enables a *random policy* from the ones you compiled in is not a sane kernel. The default kernel should be as sane as possible, allowing the userspace people (i.e., me) to mess things up to their heart's desire.

Bert
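Point 4 can be done per socket on Linux through the TCP_CONGESTION socket option. A minimal sketch, assuming a Linux libc whose <netinet/tcp.h> defines TCP_CONGESTION; the helper names set_socket_cc and get_socket_cc are made up for illustration. On current kernels an unprivileged process may be limited to the policies listed in tcp_allowed_congestion_control, so the return value must be checked.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Select a congestion control policy for this socket only.
   Returns 0 on success, -1 (with errno set) on failure. */
int set_socket_cc(int fd, const char *name)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                      name, (socklen_t)strlen(name));
}

/* Read back the policy in use on this socket into buf,
   always NUL-terminating it. Returns 0 on success, -1 on failure. */
int get_socket_cc(int fd, char *buf, size_t len)
{
    if (len < 2)
        return -1;
    socklen_t optlen = (socklen_t)(len - 1);   /* leave room for a NUL */
    if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &optlen) < 0)
        return -1;
    buf[optlen] = '\0';
    return 0;
}
```

Only the one socket is affected; the system-wide default stays whatever the kernel picked.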
Re: 2.6.18-rc[67] crashes in TCP ack handling
On Sun, Sep 17, 2006 at 08:32:14AM +0900, Stephen Hemminger wrote:
> By building all the possiblities into the kernel, ie. not as modules
> you get the last one registered. TCP LP is probably the worst one
> to use, because it is designed for bulk low priority applications.
> It also is one of the newest least tested. Right now, I would rate

Hehe, this seems to be a bad default configuration policy then. People generally don't expect that, if the kernel offers 10 policies and they compile them all in, the least stable one will be used by default.

I've attached a patch that reorders the choices per your suggested order, so people are most likely to get a sane default. I've tried to make "reno" the default, no matter what you compiled in, but it didn't work. The linker probably moves tcp_cong.o to the front anyway.

> Without a back trace, it will be hard to find the bug in TCP LP

Indeed. Many thanks for your quick answer, Stephen!

--- linux-2.6.18/net/ipv4/Makefile~	2006-09-17 11:48:33.0 +0200
+++ linux-2.6.18/net/ipv4/Makefile	2006-09-17 11:48:45.0 +0200
@@ -7,7 +7,7 @@
 ip_output.o ip_sockglue.o inet_hashtables.o \
 inet_timewait_sock.o inet_connection_sock.o \
 tcp.o tcp_input.o tcp_output.o tcp_timer.o tcp_ipv4.o \
-tcp_minisocks.o tcp_cong.o \
+tcp_minisocks.o \
 datagram.o raw.o udp.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 sysctl_net_ipv4.o fib_frontend.o fib_semantics.o
@@ -37,16 +37,20 @@
 obj-$(CONFIG_IP_ROUTE_MULTIPATH_CACHED) += multipath.o
 obj-$(CONFIG_INET_TCP_DIAG) += tcp_diag.o
 obj-$(CONFIG_NET_TCPPROBE) += tcp_probe.o
-obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
-obj-$(CONFIG_TCP_CONG_CUBIC) += tcp_cubic.o
-obj-$(CONFIG_TCP_CONG_WESTWOOD) += tcp_westwood.o
+
+obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
+obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o
+obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
+obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
 obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o
 obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o
+obj-$(CONFIG_TCP_CONG_CUBIC) += tcp_cubic.o
+obj-$(CONFIG_TCP_CONG_WESTWOOD) += tcp_westwood.o
 obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.o
-obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
-obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o
-obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
-obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
+obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
+
+# make sure the built in congestion scheme is the default
+obj-y += tcp_cong.o
 obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \
 xfrm4_output.o
2.6.18-rc[67] crashes in TCP ack handling
The bad news is that I haven't yet been able to capture traces. Once every three days or so I get a crash of 2.6.18-rc[67] which *probably* ends in tcp_ack(), but I don't have the exact dump.

My .config is indeed heavy on TCP congestion stuff:

$ zcat /proc/config.gz | grep -i tcp
CONFIG_INET_TCP_DIAG=y
CONFIG_TCP_CONG_ADVANCED=y
# TCP congestion control
CONFIG_TCP_CONG_BIC=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=y
CONFIG_TCP_CONG_HTCP=y
CONFIG_TCP_CONG_HSTCP=y
CONFIG_TCP_CONG_HYBLA=y
CONFIG_TCP_CONG_VEGAS=y
CONFIG_TCP_CONG_SCALABLE=y
CONFIG_TCP_CONG_LP=y
CONFIG_TCP_CONG_VENO=y
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
CONFIG_IP_NF_TARGET_TCPMSS=y
# CONFIG_NET_TCPPROBE is not set
# CONFIG_ISCSI_TCP is not set
# CONFIG_NFSD_TCP is not set

However, I haven't specifically configured anything.

$ dmesg | grep -i tcp
[ 33.106317] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[ 33.107086] TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
[ 33.107476] TCP: Hash tables configured (established 131072 bind 65536)
[ 33.107605] TCP reno registered
[ 40.985770] IPVS: Registered protocols (TCP, UDP, AH, ESP)
[ 41.105710] TCP bic registered
[ 41.105833] TCP cubic registered
[ 41.105957] TCP westwood registered
[ 41.106080] TCP highspeed registered
[ 41.106203] TCP hybla registered
[ 41.106328] TCP htcp registered
[ 41.106452] TCP vegas registered
[ 41.106574] TCP veno registered
[ 41.106698] TCP scalable registered
[ 41.106822] TCP lp registered

$ cat ipv4/tcp_congestion_control
lp

I hope to follow up on this message with the actual backtrace, but this is already a heads-up. Sorry for not yet being able to be more specific.

bert
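The cat above reads the system-wide default from /proc/sys/net/ipv4/tcp_congestion_control (the listing uses a relative path). The same check from C, as a small sketch; read_default_cc is a hypothetical helper name, and it simply returns -1 where /proc is not mounted or the file is missing.

```c
#include <stdio.h>
#include <string.h>

/* Copy the system-wide default congestion control name into buf,
   without the trailing newline. Returns 0 on success, -1 on failure. */
int read_default_cc(char *buf, size_t len)
{
    FILE *f = fopen("/proc/sys/net/ipv4/tcp_congestion_control", "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return 0;
}
```

On the machine in this report it would presumably have returned "lp".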
Re: 2.6.18-rc6 memory mapped pcap truncates outgoing TCP packets, but not icmp
> It appears to be intentionally, but I don't see a reason for it.
> Can you try if this patch makes it work as expected?
> [PACKET]: Don't truncate non-linear skbs with mmaped IO
>
> Non-linear skbs are truncated to their linear part with mmaped IO.
> Fix by using skb_copy_bits instead of memcpy.

Works very well for me! I hope this can make it into 2.6.18.

Thanks, everybody.
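A rough userspace illustration of why the fix matters: if part of a packet lives in fragments outside the linear buffer, a plain memcpy() of the linear part silently drops the rest, while a gather copy in the spirit of skb_copy_bits walks every fragment. The struct frag / struct pkt types and both copy functions are hypothetical and greatly simplified; they only mimic a linear-head-plus-fragments layout and are not the kernel's sk_buff API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified non-linear packet: a linear head plus
   a number of separately stored fragments. */
struct frag {
    const unsigned char *data;
    size_t len;
};

struct pkt {
    const unsigned char *head;   /* linear part */
    size_t head_len;
    const struct frag *frags;    /* non-linear part */
    size_t nr_frags;
};

/* The truncating approach: memcpy only sees the linear part. */
size_t copy_linear_only(const struct pkt *p, unsigned char *to, size_t max)
{
    size_t n = p->head_len < max ? p->head_len : max;
    memcpy(to, p->head, n);
    return n;
}

/* Gather copy: linear part first, then each fragment, up to max bytes. */
size_t copy_all(const struct pkt *p, unsigned char *to, size_t max)
{
    size_t done = p->head_len < max ? p->head_len : max;
    memcpy(to, p->head, done);
    for (size_t i = 0; i < p->nr_frags && done < max; i++) {
        size_t n = p->frags[i].len;
        if (n > max - done)
            n = max - done;
        memcpy(to + done, p->frags[i].data, n);
        done += n;
    }
    return done;
}
```

With a 66-byte head and 487 bytes of fragments (numbers chosen to match the 66-of-553 frame in the original report), copy_linear_only returns 66 while copy_all returns all 553 bytes.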
Re: 2.6.18-rc6 memory mapped pcap truncates outgoing TCP packets, but not icmp
On Wed, Sep 13, 2006 at 08:44:21PM +0200, Patrick McHardy wrote:
> Are you using TSO on the outgoing device? If so please try to log the
> packet using iptables to see if it really is a TSO packet.

Good catch! I turned off TSO and things are working fine again. Is this a known problem? Should it be documented or fixed? I'm more than willing to write up some warnings if that would be a good idea.

Thanks Patrick! I can do without TSO, but not without mmapped pcap!

Bert
2.6.18-rc6 memory mapped pcap truncates outgoing TCP packets, but not icmp
Hi people,

I like to use memory-mapped pcap (PACKET_MMAP) since, off the shelf, Linux is a tad prone to dropping packets while capturing these days. It used to be a lot better at it, but right now memory-mapped pcap is the only way to get things working somewhat. I've noticed this on many machines.

However, memory-mapped pcap has recently started truncating outgoing packets for me. Interestingly, I only see this with locally generated TCP packets, not with locally generated ICMP packets. I haven't yet tried UDP, nor actual sniffing; this is all locally generated traffic going out on eth0. Incoming packets are not truncated.

My command line:

# PCAP_VERBOSE=1 PCAP_FRAMES=15000 tcpdump -i eth0 -s 1600 -p -w test-dump
libpcap version: 0.9
Kernel filter, Protocol 0300, MMAP mode (12188 frames, snapshot 1600), socket type: Raw
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 1600 bytes

Within this dump we find the following outgoing TCP packet:

Frame 289 (553 bytes on wire, 66 bytes captured)
    Arrival Time: Sep 13, 2006 13:18:31.96025
    Time delta from previous packet: 0.72000 seconds
    Time since reference or first frame: 42.738582000 seconds
    Frame Number: 289
    Packet Length: 553 bytes
    Capture Length: 66 bytes
    Protocols in frame: eth:ip:tcp
Type: IP (0x0800)
Internet Protocol, Src: 10.0.3.146 (10.0.3.146), Dst: 82.165.25.125 (82.165.25.125)

Which is truncated! However, we also find this incoming packet:

Frame 290 (1508 bytes on wire, 1508 bytes captured)
    Arrival Time: Sep 13, 2006 13:18:32.036536000
    Time delta from previous packet: 0.076286000 seconds
    Time since reference or first frame: 42.814868000 seconds
    Frame Number: 290
    Packet Length: 1508 bytes
    Capture Length: 1508 bytes
    Protocols in frame: eth:ip:tcp:http
Internet Protocol, Src: 82.165.25.125 (82.165.25.125), Dst: 10.0.3.146 (10.0.3.146)

Which looks just fine.

Downgrading to 'normal' mode fixes this problem, but suffers from too much packet loss to be useful.

My tcpdump is built against: http://public.lanl.gov/cpw/libpcap-0.9.20060417.tar.gz

It used to work just fine, but I haven't been able to pin down when it broke. Please let me know how I can help solve this!

Bert
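One way to spot this kind of truncation in a saved dump is to compare each record's captured length with its on-the-wire length: a frame like number 289 (66 of 553 bytes captured with a 1600-byte snapshot length) is short for some reason other than the snaplen. A minimal sketch over an in-memory classic pcap file, assuming native byte order and the 0xa1b2c3d4 magic (no pcapng, no byte-swapped or nanosecond files); the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Classic pcap layout: a 24-byte global header (snaplen at offset 16),
   then per packet a 16-byte record header
   (ts_sec, ts_usec, incl_len, orig_len) followed by incl_len bytes. */
#define PCAP_MAGIC      0xa1b2c3d4u
#define PCAP_GLOBAL_LEN 24
#define PCAP_RECORD_LEN 16

static uint32_t rd32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* native byte order, see assumption above */
    return v;
}

/* Fewer bytes captured than were on the wire, although the snapshot
   length would have allowed more. */
int truncated_unexpectedly(uint32_t caplen, uint32_t wirelen, uint32_t snaplen)
{
    return caplen < wirelen && caplen < snaplen;
}

/* Number of suspicious frames, or -1 if buf is not a native-endian
   classic pcap file. */
long count_suspicious_frames(const unsigned char *buf, size_t len)
{
    if (len < PCAP_GLOBAL_LEN || rd32(buf) != PCAP_MAGIC)
        return -1;
    uint32_t snaplen = rd32(buf + 16);
    long count = 0;
    size_t off = PCAP_GLOBAL_LEN;
    while (off + PCAP_RECORD_LEN <= len) {
        uint32_t caplen = rd32(buf + off + 8);
        uint32_t wirelen = rd32(buf + off + 12);
        if (truncated_unexpectedly(caplen, wirelen, snaplen))
            count++;
        off += PCAP_RECORD_LEN + caplen;   /* skip header and packet data */
    }
    return count;
}
```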
Re: Van Jacobson's net channels and real-time
On Thu, Apr 20, 2006 at 12:09:55PM -0700, David S. Miller wrote:
> Going all the way to the socket is a large endeavor and will require a
> lot of restructuring to do it right, so expect this to take on the
> order of months.

That's what you said about Niagara too :-) Good luck!
Re: skb diet
On Sat, Apr 15, 2006 at 09:22:01PM +0200, Andi Kleen wrote:
> And optimizing for uncommon cases (not TCP) doesn't seem too useful.

There are servers that do tens of megabits of UDP these days (think VoIP, or in my case, DNS), so it's not that uncommon.

Bert
Re: gcc -Os causes: Re: ip route add default: network unreachable? 2.6.15
On Wed, Jan 04, 2006 at 03:46:21PM -0800, David S. Miller wrote:
> > Now verifying if this is fixed in gcc 4.0.2.

Plain gcc 4.0.2 (not the Ubuntu prerelease) does not exhibit this problem, even with -Os. Problem solved.
gcc -Os causes: Re: ip route add default: network unreachable? 2.6.15
On Wed, Jan 04, 2006 at 11:36:33PM +0100, bert hubert wrote:
> $ sudo ip route re default via 10.0.0.12
> RTNETLINK answers: Network is unreachable

This all goes away when I remove CONFIG_CC_OPTIMIZE_FOR_SIZE from the kernel .config, using the gcc prerelease that Ubuntu Breezy ships. Now verifying whether this is fixed in gcc 4.0.2.

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=322723 for more details. I hope to pin down a culprit.
Re: udp source port randomization?
On Mon, Aug 15, 2005 at 01:43:23PM -0700, David S. Miller wrote:
> But that's still going to be 48-bits less protection than
> TCP gives you. TCP has a sequence number (32-bits) and
> a timestamp (another 32-bits) as well as the saddr/daddr/
> sport/dport 48-bit tuple.

I hate it as well, hehe. A Large DNS Market Power recently experimented with forcing DNS over TCP; it was about as much fun as turning on ECN, and they've since backed off. I'm looking into SCTP for DNS, but that is really future material.

> UDP only has saddr/daddr/sport/dport, and that's it.
> Even your 16-bit key in the user component doesn't help
> much at all.

It does help, by 16 bits :-) Better than nothing.

> I don't know... if someone wants to look into the implementation
> and it doesn't look too complicated, I'll probably accept the
> patch, but there's no way I'm wasting my time working on this :-)

Ok, I'll see what I can whip up. Thanks.
Re: udp source port randomization?
Dave,

Thanks for the prompt reply, much appreciated.

On Mon, Aug 15, 2005 at 01:25:10PM -0700, David S. Miller wrote:
> UDP does not have the same kind of vulnerability from port
> number guessing. In fact, UDP is extremely vulnerable for

Yes, it does. Nameservers also need to send outgoing packets. The DNS 'keyspace' for response spoofing is a sad 16 bits: there are only two bytes (the query ID) available in the DNS packet. By randomizing the source port, another 16 bits are added to this keyspace.

More importantly, there is no good way to randomize the source port from userspace; it will never be very robust. See below for sample horrible code.

> Another factor influencing this is the fact that most UDP usage is of the
> request/response type where the port identity only exists for those two
> packets.

Not if you are a nameserver sending out queries. You could conceivably waste an fd per packet, but it would still have very predictable source port numbers.

> I really don't think it's worth the work to add UDP port
> randomization at all.

I currently need to do ugly stuff like this to get somewhat random source port numbers:

    for(n=0; n<10; n++) {
        sin.sin_port = htons(1+(random()%5));
        if(bind(d_sock, (struct sockaddr *)&sin, sizeof(sin)) >= 0)
            break;
    }
    (.. error checking ..)

Which is not very robust. Getting a random source port on a busy server might actually turn out to be a very expensive operation from userspace. Or am I missing something?

Bert
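For comparison, a slightly sturdier take on the loop above, as a sketch: each candidate port comes from /dev/urandom rather than random(), the port range is explicit, and it only retries when the port is already in use. bind_random_port and its range parameters are illustrative assumptions, not an existing API. It still opens /dev/urandom once per attempt, which shows why doing this from userspace can get expensive.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Read 4 unpredictable bytes from the kernel's random device. */
static int urandom_u32(uint32_t *out)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (!f)
        return -1;
    size_t n = fread(out, sizeof *out, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* Bind fd to a random port in [lo, hi] on the address already set in
   *sin, trying at most `tries` times. Returns the port, or -1. */
int bind_random_port(int fd, struct sockaddr_in *sin,
                     unsigned lo, unsigned hi, int tries)
{
    if (lo == 0 || lo > hi || hi > 65535)
        return -1;
    for (int i = 0; i < tries; i++) {
        uint32_t r;
        if (urandom_u32(&r) < 0)
            return -1;
        /* Modulo bias is negligible for a range this small vs. 2^32. */
        unsigned port = lo + r % (hi - lo + 1);
        sin->sin_port = htons((uint16_t)port);
        if (bind(fd, (struct sockaddr *)sin, sizeof *sin) == 0)
            return (int)port;
        if (errno != EADDRINUSE)
            return -1;   /* e.g. EACCES or EINVAL: another port will not help */
    }
    return -1;
}
```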