Re: [tcpdump-workers] single packet capture time w/pcap vs. recvfrom()
In some email I received from Brandon Stafford, sie wrote:

> Hello, I'm writing a server that captures UDP packets and, after some
> manipulation, sends the data out the serial port. Right now, I'm using
> recvfrom(), but it takes 20 ms to execute for each packet captured. I
> know that tcpdump can capture packets much faster than 20 ms/packet on
> the same computer, so I know recvfrom() is running into trouble,
> probably because of bad checksums on the packets.

What you're looking at is unlikely to be a recvfrom() problem; it is more likely one of scheduling the process, and it is just as unlikely to disappear because you use pcap instead of a real socket. To say that it is a recvfrom() problem would imply that a process could not deal with more than 50 UDP packets per second, something I find hard to believe of any modern computer.

Darren
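A quick way to sanity-check that claim is a bare recvfrom() loop that just counts packets per second; if it sustains far more than 50/s (it normally does, even on modest hardware), the bottleneck is scheduling or elsewhere, not recvfrom() itself. A minimal sketch; the port (9999) and buffer size are illustrative assumptions, not details from the thread:

    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in sin;
        char buf[2048];               /* illustrative buffer size */
        long count = 0;
        time_t start;

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        sin.sin_port = htons(9999);   /* illustrative port */
        if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
            return 1;

        start = time(NULL);
        for (;;) {
            if (recvfrom(s, buf, sizeof(buf), 0, NULL, NULL) < 0)
                continue;
            count++;
            if (time(NULL) - start >= 10) {   /* report every ~10 s */
                printf("%ld packets in ~10 s (%.0f/s)\n",
                       count, count / 10.0);
                count = 0;
                start = time(NULL);
            }
        }
    }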
Re: [tcpdump-workers] single packet capture time w/pcap vs. recvfrom()
Brandon,

For curiosity's sake (I have a similar app and am seriously interested in performance):

- What platform (OS/processor) are you on?
- How did you measure the time to call recvfrom()? Or, perhaps an even more relevant question, how do you use recvfrom() (what's the surrounding code like, do you select() first, etc.)?

I ask because I'm seeing substantially better numbers for recvfrom() (0.021 ms); granted, it is for a fairly short message, but that doesn't explain the 1000X performance delta. recvfrom() itself is relatively cheap; select() is VERY expensive in comparison (~100K clock cycles for recvfrom() versus ~23M clock cycles for select(), or ~0.021 ms versus ~9.7 ms on a 2.4 GHz Xeon). Note also that this is for an empty select() set with a 0 timeout, so that is JUST the calling overhead; if it's checking fd's it will be greater. If you're on a slower platform (embedded, perhaps, given your company?) the numbers will obviously be worse; recvfrom() seems to scale almost directly proportionally to clock speed.

Based on the numbers I've seen, I would expect recvfrom() to be at LEAST as fast as libpcap, if not faster, since libpcap (often) uses select() (on some platforms :) to check whether the capture device has data ready. libpcap will benefit somewhat from its ability to bundle multiple packets into a buffer (on platforms that support that), but not, I suspect, by enough to make up a ~250X performance delta.

I'm also going to agree with Guy: if you have checksum problems and are still seeing the packets, you're doing something I would seriously like to find out how you accomplished; more likely, you shouldn't see the packets at all.

Below is a short sample of results and a short overview of my methodology. This is on a single-CPU 2.4 GHz Xeon running RH 9.0 with a stock 2.4.20 kernel.
rdtsc: 1009667286648918 - 1009667286584434 = 64484 ( / one_second) = 0.21 size 36
rdtsc: 1009679499010456 - 1009679498987684 = 22772 ( / one_second) = 0.07 size 36
rdtsc: 1009691711003018 - 1009691710985706 = 17312 ( / one_second) = 0.06 size 36
rdtsc: 1009703923467532 - 1009703923448944 = 18588 ( / one_second) = 0.06 size 36
rdtsc: 1009716135791420 - 1009716135773620 = 17800 ( / one_second) = 0.06 size 36
rdtsc: 1009752772553556 - 1009752772447452 = 106104 ( / one_second) = 0.35 size 36
rdtsc: 1009764984789778 - 1009764984748314 = 41464 ( / one_second) = 0.14 size 36
rdtsc: 1009777197311290 - 1009777197270442 = 40848 ( / one_second) = 0.13 size 36
rdtsc: 1009789410262486 - 1009789410220774 = 41712 ( / one_second) = 0.14 size 36
rdtsc: 1009801622233478 - 1009801622192834 = 40644 ( / one_second) = 0.13 size 36
rdtsc: 1009813840971578 - 1009813840931126 = 40452 ( / one_second) = 0.13 size 36
rdtsc: 1009826046989322 - 1009826046959894 = 29428 ( / one_second) = 0.10 size 36
rdtsc: 1009838259554966 - 1009838259526818 = 28148 ( / one_second) = 0.09 size 36
rdtsc: 1009850472114698 - 1009850472085270 = 29428 ( / one_second) = 0.10 size 36

Here is a code snippet that shows how I do my timings:

    while (1) {
        // We wouldn't actually use select() in the real app; we're using
        // it here to make sure we're timing the recvfrom() call on a
        // live socket instead of counting the time recvfrom() blocks
        // waiting for a packet.
        FD_ZERO(&rfds);
        FD_SET(recv_s, &rfds);
        select(recv_s + 1, &rfds, NULL, NULL, NULL);

        // time the actual recvfrom() call
        rdtsc_ret[index] = rdtsc();
        index = !index;
        size = recvfrom(recv_s, buf, sizeof(buf), 0, NULL, NULL);
        rdtsc_ret[index] = rdtsc();

        printf("rdtsc: %lld - %lld = %lld ( / one_second) = %f size %d\n",
               rdtsc_ret[index], rdtsc_ret[!index],
               rdtsc_ret[index] - rdtsc_ret[!index],
               (double)(rdtsc_ret[index] - rdtsc_ret[!index]) /
               (double)one_second, size);
    }

rdtsc() is defined as an assembly snippet that reads the processor clock register on i386 architectures; other architectures are obviously different. The overhead for calling { rdtsc(); index = !index; rdtsc(); } is 84-96 clock cycles, so I just ignored it here since it's way less than the noise.

    extern __inline__ unsigned long long int rdtsc()
    {
        unsigned long long int x;
        __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
        return x;
    }

one_second is defined from the real processor clock speed in Hz. You need to figure this out yourself (dmesg; on Linux, cat /proc/cpuinfo; etc. -- a sketch of pulling it out of /proc/cpuinfo programmatically follows at the end of this message):

    double one_second = 3050.905*100;

We use rdtsc because gettimeofday() doesn't have enough resolution to accurately measure a single call like this (resolution on the same machine as above for gettimeofday() is slightly worse than 1 ms).

On Sun, May 23, 2004 at 06:37:40PM -0700, Brandon Stafford wrote:
> Hello, I'm writing a server that captures UDP packets and, after some
> manipulation, sends the data out the serial port. Right now, I'm using
> recvfrom(), but it takes 20 ms to execute for each packet
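As a footnote to the one_second constant above: on Linux the clock rate can be read out of /proc/cpuinfo programmatically instead of pasted in by hand. A minimal sketch, assuming the Linux-specific "cpu MHz" field (on other systems you would grovel through dmesg output or a sysctl instead):

    #include <stdio.h>

    /* Returns the CPU clock in Hz, or 0.0 on failure. "cpu MHz" is the
       field name on Linux/i386; that is an assumption about the
       platform, not something from the thread. */
    static double cpu_hz(void)
    {
        FILE *f = fopen("/proc/cpuinfo", "r");
        char line[256];
        double mhz = 0.0;

        if (f == NULL)
            return 0.0;
        while (fgets(line, sizeof(line), f) != NULL) {
            if (sscanf(line, "cpu MHz : %lf", &mhz) == 1)
                break;
        }
        fclose(f);
        return mhz * 1e6;   /* MHz -> Hz */
    }

The returned value could then seed the constant, e.g. double one_second = cpu_hz();.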
Re: [tcpdump-workers] single packet capture time w/pcap vs. recvfrom()
Guy Harris wrote:
> If a received packet has a bad IP header or UDP checksum, it should get
> discarded at the IP or UDP layer, so that your application would *never*
> see it, not just see it after 20ms.

I should have been more clear: the UDP packets have checksums of 0x0000, not bad checksums. I believe this means the checksum is ignored (a transmitted UDP checksum of zero means the sender didn't compute one). This still does not explain the 20 ms execution time.

> If that's what you want - i.e., if the server is supposed to log all the
> UDP packets in question, even if they have a checksum error - then it
> should use libpcap.

I have rewritten the application using libpcap, and it is definitely faster. The receive time now varies from 1 ms to ~12 ms. I haven't tested it very thoroughly yet, so I don't know how it does under load. Also, I'm using pcap_next() to grab one packet at a time; I could probably do better grabbing a few at once. I'll keep you posted.

Thanks for the response and the software.

Brandon
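A minimal sketch of the batching Brandon mentions: pcap_dispatch() with a callback lets libpcap hand over every packet already buffered by the kernel in one call, rather than one packet per pcap_next(). The interface name "fxp0", snaplen, and timeout here are illustrative assumptions:

    #include <pcap.h>
    #include <stdio.h>

    static void handle_packet(u_char *user, const struct pcap_pkthdr *h,
                              const u_char *bytes)
    {
        /* process one captured packet; h->caplen bytes are available */
        printf("captured %u bytes\n", h->caplen);
    }

    int main(void)
    {
        char errbuf[PCAP_ERRBUF_SIZE];

        /* "fxp0" is an assumed interface name; use the real one */
        pcap_t *p = pcap_open_live("fxp0", 1500, 0, 100, errbuf);
        if (p == NULL) {
            fprintf(stderr, "pcap_open_live: %s\n", errbuf);
            return 1;
        }
        for (;;) {
            /* unlike pcap_next(), this can deliver a whole buffer of
               packets per wakeup; cnt = -1 means "all currently
               buffered" */
            if (pcap_dispatch(p, -1, handle_packet, NULL) == -1) {
                pcap_perror(p, "pcap_dispatch");
                break;
            }
        }
        pcap_close(p);
        return 0;
    }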
Re: [tcpdump-workers] single packet capture time w/pcap vs. recvfrom()
Ryan Mooney wrote:
> For curiosity's sake (I have a similar app and am seriously interested
> in performance):
> - What platform (OS/processor) are you on?

OpenBSD 3.4 on an HP Kayak XA, Pentium II, 233 MHz (ugh!)

> - How did you measure the time to call recvfrom()? Or, perhaps an even
>   more relevant question, how do you use recvfrom() (what's the
>   surrounding code like, do you select() first, etc.)?

I am measuring execution time using gettimeofday(). I am not using select(), though I did write a version that used select() previously; it was about the same speed, but I didn't measure it too precisely. I'll post a code snippet later tonight.

> I ask because I'm seeing substantially better numbers for recvfrom()
> (0.021 ms); granted, it is for a fairly short message, but that doesn't
> explain the 1000X performance delta.

0.021 ms! This is what I need! A glimmer of hope!

> If you're on a slower platform (embedded, perhaps, given your company?)
> the numbers will obviously be worse; recvfrom() seems to scale almost
> directly proportionally to clock speed.

The devices sending the packets are slow embedded devices, but the receiver is relatively fast. I also tried the same code on a Via Epia running at 600 MHz with no speed improvement, which is what makes me suspect that there is a timeout involved, rather than just a lot of instructions to execute.

> rdtsc() is defined as an assembly snippet that reads the processor
> clock register on i386 architectures. [...]

Interesting. I'll post some code snippets with timing results later. I'll also try the same stuff on a 2 GHz Linux machine running the 2.6.3 kernel to see if OpenBSD is a dog.

Thanks for the comments,

Brandon
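For reference, a minimal sketch of the gettimeofday()-based measurement Brandon describes (the port and buffer size are illustrative assumptions). Unlike Ryan's select()-gated rdtsc loop, this interval also counts any time recvfrom() spends blocked waiting for a packet:

    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in sin;
        char buf[2048];               /* illustrative buffer size */
        struct timeval before, after;
        ssize_t n;
        long us;

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        sin.sin_port = htons(9999);   /* illustrative port */
        if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
            return 1;

        for (;;) {
            gettimeofday(&before, NULL);
            n = recvfrom(s, buf, sizeof(buf), 0, NULL, NULL);
            gettimeofday(&after, NULL);
            /* NB: this includes time blocked waiting for a packet,
               which is why Ryan gates with select() first */
            us = (after.tv_sec - before.tv_sec) * 1000000L
               + (after.tv_usec - before.tv_usec);
            printf("recvfrom: %ld bytes in %ld us\n", (long)n, us);
        }
    }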
Re: [tcpdump-workers] single packet capture time w/pcap vs. recvfrom()
On May 23, 2004, at 6:37 PM, Brandon Stafford wrote:
> I'm writing a server that captures UDP packets and, after some
> manipulation, sends the data out the serial port. Right now, I'm using
> recvfrom(), but it takes 20 ms to execute for each packet captured. I
> know that tcpdump can capture packets much faster than 20 ms/packet on
> the same computer, so I know recvfrom() is running into trouble,
> probably because of bad checksums on the packets. Is it a good idea to
> rewrite the server using pcap,

It depends on the purpose of the server.

If a received packet has a bad IP header or UDP checksum, it should get discarded at the IP or UDP layer, so that your application would *never* see it, not just see it after 20 ms. If something out there causes the sender of the packet to retransmit it because it received no acknowledgment (because the packet was discarded by the IP or UDP layer, and thus never presented to a layer that would acknowledge it), with that happening after 20 ms, then, if you use libpcap to read the packets, you'll see the initial bad packet *and* the retransmission.

If that's what you want - i.e., if the server is supposed to log all the UDP packets in question, even if they have a checksum error - then it should use libpcap. However, if, in a case where a packet is retransmitted due to an error such as that, the server is supposed to log only the *valid* packet, then it's not clear that using libpcap would provide any advantages.
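If the server is indeed supposed to see every UDP packet regardless of checksum, a minimal libpcap sketch would open a live capture with a BPF filter; BPF taps frames at the link layer, before the IP/UDP stack gets a chance to discard anything. The interface "fxp0" and port 9999 are illustrative assumptions:

    #include <pcap.h>
    #include <stdio.h>

    int main(void)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        struct bpf_program prog;
        struct pcap_pkthdr hdr;
        const u_char *pkt;

        /* "fxp0" and port 9999 are illustrative assumptions */
        pcap_t *p = pcap_open_live("fxp0", 1500, 0, 100, errbuf);
        if (p == NULL) {
            fprintf(stderr, "pcap_open_live: %s\n", errbuf);
            return 1;
        }
        /* the filter runs before any checksum validation, so packets
           with bad (or zero) UDP checksums are still delivered */
        if (pcap_compile(p, &prog, "udp port 9999", 1, 0) == -1 ||
            pcap_setfilter(p, &prog) == -1) {
            pcap_perror(p, "filter");
            return 1;
        }
        for (;;) {
            pkt = pcap_next(p, &hdr);
            if (pkt == NULL)    /* read timeout, no packet; try again */
                continue;
            printf("got %u bytes (checksum not validated)\n", hdr.caplen);
        }
        /* not reached */
        pcap_close(p);
        return 0;
    }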