Re: Problem with writing fast UDP server
On Nov 21, 6:55 pm, Greg Copeland [EMAIL PROTECTED] wrote:
> [snip]
I've been offline for a while. Anyway, clarifying a few things:
- I am IO-bound
- we have 10/100/1000Mb ethernet, and 10/100Mb switches, routers, servers
- the MTU is the default 1500

I know what you are saying regarding TCP; I was using it in another project. However, this project needs to be done using UDP and that can't be changed :(

Today I was testing multiple approaches to the client. I kept one similar to the one above, rewrote one using threads, and found another issue. The speed is pretty acceptable, but there is an issue with sending 1,000,000 packets per client (it does it within around 1.5 min). It runs from a client machine to the server machine, both on the same network. When sending a million packets, only around 50%-70% get through. On the client machine it looks like all the packets were transmitted, however tcpdump running on the server shows that only 50-70% went through. I
Re: Problem with writing fast UDP server
On Nov 21, 3:52 am, Jean-Paul Calderone [EMAIL PROTECTED] wrote:
> Start the server before the client. If you want to try this program out on POSIX, make sure you change the time.clock() calls to time.time() calls instead, otherwise the results aren't very meaningful. I gave this a try on an AMD64 3200+ running a 32 bit Linux installation. Here's the results I got on the server:
> [snip]

Jean-Paul, thanks very much for the code snippets. I have tried this out with a number of tests. Both approaches work fine when I run them on the same machine. When I run the client (Twisted version) on another machine and the UDP server (non-Twisted) on the server machine, I get some strange behaviour. The client sends 255 messages and then it raises an error:

    socket.error: (11, 'Resource temporarily unavailable')

Any idea what could be wrong?

-- http://mail.python.org/mailman/listinfo/python-list
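One possible reading of that error, offered as a guess rather than a diagnosis: on Linux, errno 11 is EAGAIN, which a non-blocking socket reports when the operation cannot complete right now (for a send, typically because the kernel's send buffer is momentarily full). A minimal sketch of one way to cope, written for Python 3 rather than the Python 2.5 used in this thread, is to wait for the socket to become writable and retry. The helper name `send_datagram` and its `timeout` parameter are hypothetical and not part of any code posted here.

```python
import select
import socket


def send_datagram(sock, data, addr, timeout=1.0):
    """Send one datagram on a non-blocking UDP socket.

    If the kernel reports EAGAIN/EWOULDBLOCK (surfaced in Python 3 as
    BlockingIOError), wait until the socket is writable and try again.
    """
    while True:
        try:
            return sock.sendto(data, addr)
        except BlockingIOError:
            # Block in select() until the send buffer has room again.
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("socket not writable within %.1f s" % timeout)
```

Whether this matches what the Twisted client was doing at the time of the failure cannot be told from the traceback alone, so treat it as a starting point for investigation.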
Re: Problem with writing fast UDP server
On Nov 21, 5:49 am, Greg Copeland [EMAIL PROTECTED] wrote:
> [snip]
> And to make matters worse, you are allocating lots of buffers (4K) to send/receive 90 bytes of data, creating yet more work for your computer.
>
> Options to try:
> - See how TCP measures up for you
> - Attempt to place multiple data objects within a single datagram, thereby optimizing available ethernet bandwidth
> - You didn't say if you are CPU-bound, but you are creating a tuple and appending it to a list on every datagram. You may find that allocating smaller buffers and optimizing your history accounting helps if you're CPU-bound.
> - Don't forget, localhost does not suffer from frame limits - it's basically testing your memory/bus speed
> - If this is for local use only, consider using a different IPC mechanism - unix domain sockets or memory mapped files

Greg, thanks very much for your reply. I am not sure what do
Re: Problem with writing fast UDP server
On Fri, 21 Nov 2008 08:14:19 -0800 (PST), Krzysztof Retel wrote:
> I am not sure what you mean by CPU-bound. How can I find out if it is CPU-bound when I run it?

CPU-bound is the state in which performance is limited by the availability of processor cycles. On a Unix box, you might run the "top" utility and look to see whether the "%CPU" figure indicates 100% CPU use. Alternatively, you might have a tool for plotting use of system resources.

--
To email me, substitute nowhere->spamcop, invalid->net.
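Besides watching top, a rough in-process check is to compare the CPU time a piece of work consumed with the wall-clock time it took. The following is only a sketch in Python 3 (the thread itself uses Python 2.5); the helper name `cpu_fraction` is made up for illustration. A ratio near 1.0 suggests the work is CPU-bound, while a ratio near 0 suggests it spends most of its time waiting, for example on IO.

```python
import time


def cpu_fraction(func, *args):
    """Run func(*args) and return CPU time used divided by wall time elapsed."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def spin():
    # Pure computation: should keep one core busy.
    return sum(i * i for i in range(2000000))


if __name__ == "__main__":
    print("sleeping : %.2f" % cpu_fraction(time.sleep, 0.2))
    print("spinning : %.2f" % cpu_fraction(spin))
```

Note that `time.process_time()` covers the whole process, so in a threaded server the ratio can exceed what a single thread contributes.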
Re: Problem with writing fast UDP server
On Nov 21, 4:48 pm, Peter Pearson [EMAIL PROTECTED] wrote:
> CPU-bound is the state in which performance is limited by the availability of processor cycles.
> [snip]

Thanks. When I run it, it is not CPU-bound.
Re: Problem with writing fast UDP server
On Nov 21, 11:05 am, Krzysztof Retel [EMAIL PROTECTED] wrote:
> Thanks. I run it without CPU-bound

With clearer eyes, I did confirm my math above is correct. I don't have a networking reference to provide. You'll likely have some good results via Google. :)

If you are not CPU-bound, you are likely IO-bound. That means your computer is waiting for IO to complete - likely on the sending side. In this case, it likely means you have reached the ethernet bandwidth limits available to your computer. Since you didn't correct me when I assumed you're running 10Mb ethernet, I'll continue to assume that's a safe assumption.

So, assuming you are running on 10Mb ethernet, try converting your application to use TCP. I'd bet, unless you have requirements which prevent its use, you'll suddenly have enough bandwidth (in this case, frames) to achieve your desired results.

This is untested and off the top of my head but it should get you pointed in the right direction pretty quickly. Make the following changes to the server:

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
to
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

Make this:

    print "Waiting for first packet to arrive...",
    sock.recvfrom(BUFSIZE)

look like:

    print "Waiting for first packet to arrive...",
    sock.listen(1)
    cliSock, cliAddr = sock.accept()

Change your calls to sock.recvfrom(BUFSIZE) to cliSock.recv(BUFSIZE). Notice the change to cliSock. Note also that accept() returns a (socket, address) pair and that the listening socket needs listen() called on it first.

Keep in mind TCP is stream based, not datagram based, so you may need to add additional logic to determine data boundaries for re-assembly of your data on the receiving end. There are several strategies to address that, but for now I'll gloss over it.

As someone else pointed out above, change your calls to time.clock() to time.time().

On your client, make the following changes.

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
to
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect( (remotehost,port) )

    nbytes = sock.sendto(data, (remotehost,port))
to
    nbytes = sock.send(data)

Now, rerun your tests on your network. I expect you'll be faster now because TCP can be pretty smart about buffering. Let's say you write 16 90B blocks to the socket. If they are timely enough, it is possible all of those will be shipped across ethernet as a single frame. So what took 16 frames via UDP can now *potentially* be done in a single ethernet frame (assuming 1500 MTU). I say potentially because the exact behaviour is OS/stack and NIC-driver specific and is often tunable to boot. Likewise, on the receiving end, what previously required 16 calls to recvfrom, each returning 90B, can *potentially* be completed in a single call to recv, returning 1440B. Remember, fewer frames means less protocol overhead, which makes more bandwidth available to your applications. When sending 90B datagrams, you're wasting over 48% of your available bandwidth because of protocol overhead (actually a lot more because I'm not accounting for UDP headers).

Because of the differences between UDP and TCP, unlike your original UDP implementation, which can receive from multiple clients, the TCP implementation as changed above can only receive from a single client. If you need to receive from multiple clients concurrently, look at Python's select module to take up the slack.

Hopefully you'll be up and running. Please report back your findings. I'm curious as to your results.
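To illustrate the remark about data boundaries on a TCP stream: one common strategy (an assumption here, since the post does not pick one) is to prefix every message with its length and have the receiver buffer incoming bytes until a whole message is available. The sketch below is Python 3, and the names `frame`, `FrameDecoder` and the 2-byte header format are invented for illustration.

```python
import struct

# Assumed wire format: 2-byte big-endian length, followed by the payload.
HEADER = struct.Struct("!H")


def frame(payload):
    """Return payload prefixed with its length, ready for sock.sendall()."""
    return HEADER.pack(len(payload)) + payload


class FrameDecoder(object):
    """Reassemble length-prefixed messages from arbitrary recv() chunks."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add received bytes; return the list of messages now complete."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf, 0)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # message not fully received yet
            messages.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return messages
```

On the receiving side one would call `decoder.feed(cliSock.recv(BUFSIZE))` in a loop; a single recv may yield zero, one, or many messages, which is exactly the coalescing effect described above.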
Problem with writing fast UDP server
Hi guys,

I am struggling to write a fast UDP server. It has to handle around 10000 UDP packets per second. I started building it with a non-blocking socket and threads. Unfortunately my approach does not work at all.

I wrote a simple test case: client and server. The client sends 2200 packets within 0.137447118759 secs. tcpdump received 2189 packets, which is not bad at all. But the server only handles 700 -- 870 packets when it is non-blocking, and only 670 – 700 are received with blocking sockets. The client and the server are on the same local network and tcpdump shows a pretty correct number of packets received.

I have included a bit of the code of the UDP server.

    class PacketReceive(threading.Thread):
        def __init__(self, tname, socket, queue):
            self._tname = tname
            self._socket = socket
            self._queue = queue
            threading.Thread.__init__(self, name=self._tname)

        def run(self):
            print 'Started thread: ', self.getName()
            cnt = 1
            cnt_msgs = 0
            while True:
                try:
                    data = self._socket.recv(512)
                    msg = data
                    cnt_msgs += 1
                    # NOTE: 'total' is not defined anywhere in this excerpt
                    total += 1
                    # self._queue.put(msg)
                    print 'thread: %s, cnt_msgs: %d' % (self.getName(), cnt_msgs)
                except:
                    pass

I was also using Queue, but this didn't help either. Any idea what I am doing wrong?

I was reading that the Python socket module was causing some delays with a TCP server. They recommended setting a socket option to disable the delay: sock.setsockopt(SOL_TCP, TCP_NODELAY, 1). I couldn't find any similar option for UDP sockets. Is there anything I have to change in the socket options to make it work faster? Why can't the server process all incoming packets? Is there a bug in the socket layer?

btw. I am using Python 2.5 on Ubuntu 8.10.

Cheers
K
Re: Problem with writing fast UDP server
Krzysztof Retel [EMAIL PROTECTED] writes:
> But the server only handles 700 -- 870 packets, when it is non-blocking, and only 670 – 700 received with blocking sockets.

What are your other threads doing? Have you tried the same code without any threading?
Re: Problem with writing fast UDP server
On Nov 20, 3:34 pm, Hrvoje Niksic [EMAIL PROTECTED] wrote:
> What are your other threads doing? Have you tried the same code without any threading?

I have only this one thread, which I can run a couple of times. I tried without threading and got the same result: not all packets were processed.
Re: Problem with writing fast UDP server
On 20 Nov, 16:03, Krzysztof Retel [EMAIL PROTECTED] wrote:
> [snip]

Stupid question: did you try removing the print (e.g. printing once every 100 messages)?

Ciao
FB
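For what it's worth, the "print once every 100 messages" idea could look roughly like the sketch below. It is written for Python 3 rather than the Python 2.5 of the original post, and the function name `count_datagrams` and its parameters `report_every` and `idle_timeout` are hypothetical. The loop counts every datagram but only writes to the terminal on every Nth one, so terminal output is much less likely to be the bottleneck.

```python
import socket


def count_datagrams(sock, bufsize=512, report_every=100, idle_timeout=1.0):
    """Receive datagrams until the socket stays idle for idle_timeout seconds.

    Progress is printed only once per report_every datagrams.
    Returns the total number of datagrams received.
    """
    sock.settimeout(idle_timeout)
    count = 0
    while True:
        try:
            sock.recv(bufsize)
        except socket.timeout:
            break  # nothing arrived for idle_timeout seconds
        count += 1
        if count % report_every == 0:
            print("received %d datagrams" % count)
    return count
```

The returned count can then be compared with the number tcpdump reports, which is the comparison made in the original post.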
Re: Problem with writing fast UDP server
On Nov 20, 4:00 pm, [EMAIL PROTECTED] wrote:
> Stupid question: did you try removing the print (e.g. printing once every 100 messages)?

:) Of course I did. Nothing has changed.

I wonder if there is some kind of socket setting to allow no delays?
Re: Problem with writing fast UDP server
On Thu, 20 Nov 2008 14:24:20 -0200, Krzysztof Retel [EMAIL PROTECTED] wrote:
> [snip]
> I wonder if there is a kind of setting for socket to allow no delays?

I've used this script to test sending UDP packets. I've not seen any delays.

<code>
"""a very simple UDP test

Usage:

  %(name)s client <remotehost> <message to send|length of message>
      to continuously send messages to <remotehost> until Ctrl-C

  %(name)s server
      to listen for messages until Ctrl-C

Uses port %(port)d. Once stopped, shows some statistics.
Creates udpstress-client.csv or udpstress-server.csv with
pairs (size,time)
"""

import os, sys
import socket
import time

PORT = 21758
BUFSIZE = 4096
socket.setdefaulttimeout(10.0)

def server(port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('',port))
    print "Receiving at port %d" % (port)
    history = []
    print "Waiting for first packet to arrive...",
    sock.recvfrom(BUFSIZE)
    print "ok"
    t0 = time.clock()
    while 1:
        try:
            try:
                data, remoteaddr = sock.recvfrom(BUFSIZE)
            except socket.timeout:
                print "Timed out"
                break
            except KeyboardInterrupt: # #1755388 #926423
                raise
            t1 = time.clock()
            if not data:
                break
            history.append((len(data), t1-t0))
            t0 = t1
        except KeyboardInterrupt:
            print "Stopped"
            break
    sock.close()
    return history

def client(remotehost, port, data):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    history = []
    print "Sending %d-bytes packets to %s:%d" % (len(data), remotehost, port)
    t0 = time.clock()
    while 1:
        try:
            nbytes = sock.sendto(data, (remotehost,port))
            t1 = time.clock()
            if not nbytes:
                break
            history.append((nbytes, t1-t0))
            t0 = t1
        except KeyboardInterrupt:
            print "Stopped"
            break
    sock.close()
    return history

def show_stats(history, which):
    npackets = len(history)
    bytes_total = sum([item[0] for item in history])
    bytes_avg = float(bytes_total) / npackets
    bytes_max = max([item[0] for item in history])
    time_total = sum([item[1] for item in history])
    time_max = max([item[1] for item in history])
    time_min = min([item[1] for item in history])
    time_avg = float(time_total) / npackets
    speed_max = max([item[0]/item[1] for item in history if item[1]>0])
    speed_min = min([item[0]/item[1] for item in history if item[1]>0])
    speed_avg = float(bytes_total) / time_total
    print "Packet count      %8d" % npackets
    print "Total bytes       %8d bytes" % bytes_total
    print "Total time        %8.1f secs" % time_total
    print "Avg size / packet %8d bytes" % bytes_avg
    print "Max size / packet %8d bytes" % bytes_max
    print "Max time / packet %8.1f us" % (time_max*1e6)
    print "Min time / packet %8.1f us" % (time_min*1e6)
    print "Avg time / packet %8.1f us" % (time_avg*1e6)
    print "Max speed         %8.1f Kbytes/sec" % (speed_max/1024)
    print "Min speed         %8.1f Kbytes/sec" % (speed_min/1024)
    print "Avg speed         %8.1f Kbytes/sec" % (speed_avg/1024)
    print
    open("udpstress-%s.csv" % which,"w").writelines(
        ["%d,%f\n" % item for item in history])

if len(sys.argv)>1:
    if "client".startswith(sys.argv[1].lower()):
        remotehost = sys.argv[2]
        data = sys.argv[3]
        if data.isdigit(): # means length of message
            data = "x" * int(data)
        history = client(remotehost, PORT, data)
        show_stats(history, "client")
        sys.exit(0)
    elif "server".startswith(sys.argv[1].lower()):
        history = server(PORT)
        show_stats(history, "server")
        sys.exit(0)

print >>sys.stderr, __doc__ % {
    "name": os.path.basename(sys.argv[0]),
    "port": PORT}
</code>

Start the server before the client.

--
Gabriel Genellina
Re: Problem with writing fast UDP server
Gabriel Genellina wrote:
> [snip]
> I wonder if there is a kind of setting for socket to allow no delays?

Is the program CPU-bound? If so, CPython is too slow for what you want to do.

John Nagle
Re: Problem with writing fast UDP server
On Fri, 21 Nov 2008 00:20:49 -0200, Gabriel Genellina [EMAIL PROTECTED] wrote:
> [snip]
> I've used this script to test sending UDP packets. I've not seen any delays.
>
> <code> [snip] </code>
>
> Start the server before the client.

If you want to try this program out on POSIX, make sure you change the time.clock() calls to time.time() calls instead, otherwise the results aren't very meaningful.

I gave this a try on an AMD64 3200+ running a 32 bit Linux installation. Here's the results I got on the server:

    Packet count         91426
    Total bytes        8228340 bytes
    Total time             8.4 secs
    Avg size / packet       90 bytes
    Max size / packet       90 bytes
    Max time / packet  41070.9 us
    Min time / packet     79.9 us
    Avg time / packet     92.3 us
    Max speed           1100.4 Kbytes/sec
    Min speed              2.1 Kbytes/sec
    Avg speed            952.0 Kbytes/sec

And on the client:

    Packet count         91426
    Total bytes        8228340 bytes
    Total time             8.4 secs
    Avg size / packet       90 bytes
    Max size / packet       90 bytes
    Max time / packet  40936.0 us
    Min time / packet     78.9 us
    Avg time / packet     92.3 us
    Max speed           1113.7 Kbytes/sec
    Min speed              2.1 Kbytes/sec
    Avg speed            952.1 Kbytes/sec

Both processes ran on the same machine and communicated over localhost. For comparison, I tried running the client against a Twisted-based UDP server. Here are the results from that server:

    Packet count         91426
    Total bytes        8228340 bytes
    Total time            11.8 secs
    Avg size / packet       90 bytes
    Max size / packet       90 bytes
    Max time / packet  55393.9 us
    Min time / packet      8.8 us
    Avg time / packet    128.7 us
    Max speed           9963.2 Kbytes/sec
    Min speed              1.6 Kbytes/sec
    Avg speed            682.7 Kbytes/sec

This seemed a bit low to me though, so I tried writing an alternate client and re-ran the measurement. Here are the new server results:

    Packet count         91426
    Total bytes        8228340 bytes
    Total time             2.9 secs
    Avg size / packet       90 bytes
    Max size / packet       90 bytes
    Max time / packet  38193.0 us
    Min time / packet      8.8 us
    Avg time / packet     32.2 us
    Max speed           9963.2 Kbytes/sec
    Min speed              2.3 Kbytes/sec
    Avg speed           2726.7 Kbytes/sec

And then tried the new client against the original server, with these results:

    Packet count         91426
    Total bytes        8228340 bytes
    Total time             3.8 secs
    Avg size / packet       90 bytes
    Max size / packet       90 bytes
    Max time / packet  23675.0 us
    Min time / packet      6.9 us
    Avg time / packet     41.7 us
    Max speed          12711.7 Kbytes/sec
    Min speed              3.7 Kbytes/sec
    Avg speed           2109.0 Kbytes/sec

So it does seem that handling 10k datagrams per second should be no problem, assuming comparable hardware, at least if whatever work you have to do to process each one doesn't take more than about 24 25ths of a millisecond (leaving you the remaining 1 part out of 25 of every millisecond to receive a packet).

For reference, here's the Twisted UDP client code:

    from twisted.internet.protocol import DatagramProtocol
    from twisted.internet import reactor
    from twisted.internet.task import LoopingCall

    msg = 'xyxabc123' * 10

    class EchoClientDatagramProtocol(DatagramProtocol):

        def startProtocol(self):
            self.transport.connect('127.0.0.1', 8000)
            LoopingCall(self.sendDatagrams).start(0.5)

        def sendDatagrams(self):
            for i in xrange(50):
                self.transport.write(msg)

    def main():
        protocol = EchoClientDatagramProtocol()
        t = reactor.listenUDP(0, protocol)
        reactor.run()

    if __name__ == '__main__':
        main()

And here's the Twisted UDP server code:

    from time import time as clock

    from twisted.internet.protocol import DatagramProtocol
    from twisted.internet import reactor

    class EchoUDP(DatagramProtocol):
        history = []
        t0 = clock()

        def
Re: Problem with writing fast UDP server
On Nov 20, 9:03 am, Krzysztof Retel [EMAIL PROTECTED] wrote:

Hi guys,

I am struggling to write a fast UDP server. It has to handle around 10000 UDP packets per second. I started building it with non-blocking sockets and threads. Unfortunately my approach does not work at all. I wrote a simple test case: client and server. The client sends 2200 packets within 0.137447118759 secs. tcpdump received 2189 packets, which is not bad at all. But the server only handles 700 to 870 packets when it is non-blocking, and only 670 to 700 with blocking sockets. The client and the server are on the same local network, and tcpdump shows a pretty correct number of packets received.

I included a bit of the code of the UDP server.

    class PacketReceive(threading.Thread):
        def __init__(self, tname, socket, queue):
            self._tname = tname
            self._socket = socket
            self._queue = queue
            threading.Thread.__init__(self, name=self._tname)

        def run(self):
            print 'Started thread: ', self.getName()
            cnt = 1
            cnt_msgs = 0
            while True:
                try:
                    data = self._socket.recv(512)
                    msg = data
                    cnt_msgs += 1
                    total += 1
                    # self._queue.put(msg)
                    print 'thread: %s, cnt_msgs: %d' % (self.getName(), cnt_msgs)
                except:
                    pass

I was also using Queue, but this didn't help either. Any idea what I am doing wrong?

I was reading that the Python socket module was causing some delays with a TCP server. They recommended setting a socket option to avoid the delays: sock.setsockopt(SOL_TCP, TCP_NODELAY, 1). I couldn't find any similar option for UDP sockets. Is there anything I have to change in the socket options to make it work faster? Why can't the server process all incoming packets? Is there a bug in the socket layer?

btw. I am using Python 2.5 on Ubuntu 8.10.

Cheers
K

First and foremost, you are not being realistic here. Attempting to squeeze 10,000 packets per second out of 10Mb/s (assumed) Ethernet is not realistic.
The maximum theoretical limit is 14,880 frames per second, and that assumes each frame is only 84 bytes, making it useless for data transport. Using your numbers, each frame requires (90B + 84B) 174B, which works out to a theoretical maximum of ~7200 frames per second. These are obviously some rough numbers but I believe you get the point. It's late here, so I'll double check my numbers tomorrow.

In your case, you would not want to use TCP_NODELAY, even if you were to use TCP, as it would actually limit your throughput. UDP does not have such an option because each datagram is an ethernet frame - which is not true for TCP, as TCP is a stream. In this case, use of TCP may significantly reduce the number of frames required for transport - assuming TCP_NODELAY is NOT used. If you want to increase your throughput, use larger datagrams. If you are on a reliable connection, which we can safely assume since you are currently using UDP, use of TCP without TCP_NODELAY may yield better performance because of its buffering strategy.

Assuming you are using 10Mb ethernet, you are nearing its frame-saturation limits. If you are using 100Mb ethernet, you'll obviously have a lot more elbow room, but not nearly as much as one would hope, because 100Mb is only possible when frames are completely filled. It's been a while since I last looked at 100Mb numbers, but it's not likely most people will see numbers near its theoretical limits, simply because that number has so many caveats associated with it - and small frames are its nemesis. Since you are using very small datagrams, you are wasting a lot of potential throughput. And if you have other computers on your network, the situation is made yet more difficult. Additionally, many switches and/or routers also have bandwidth limits which may or may not pose a wall for your application.
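
The frame-rate figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name and the LINK_10MB constant are made up for illustration, and the per-frame sizes (84 bytes minimum, 90 + 84 = 174 bytes for a 90-byte payload) are the same rough numbers used in this message, not a precise accounting of Ethernet/IP/UDP overhead.

```python
# Rough frames-per-second ceiling for an Ethernet link.
# Assumption: frame sizes are the rough figures quoted in the message
# (84 bytes minimum on the wire, 90 + 84 = 174 bytes per 90-byte payload).

LINK_10MB = 10 * 1000 * 1000  # 10 Mb/s Ethernet, in bits per second


def max_frames_per_second(link_bits_per_second, bytes_per_frame):
    """Return how many frames of the given size fit on the wire per second."""
    return link_bits_per_second / (bytes_per_frame * 8.0)


if __name__ == '__main__':
    # Minimum-size frames: about 14,880 per second.
    print(int(max_frames_per_second(LINK_10MB, 84)))
    # 90-byte payloads using the rough 174-byte figure: about 7,200 per second.
    print(int(max_frames_per_second(LINK_10MB, 90 + 84)))
```

Plugging in a 100 Mb/s link speed instead scales both ceilings by ten, which is why small frames still waste most of the link even on faster hardware.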
To make matters worse, you are allocating lots of buffers (4K) to send/receive 90 bytes of data, creating yet more work for your computer.

Options to try:

- See how TCP measures up for you.
- Attempt to place multiple data objects within a single datagram, thereby optimizing available ethernet bandwidth.
- You didn't say if you are CPU-bound, but you are creating a tuple and appending it to a list on every datagram. You may find that allocating smaller buffers and optimizing your history accounting helps if you're CPU-bound.
- Don't forget, localhost does not suffer from frame limits - it's basically testing your memory/bus speed.
- If this is for local use only, consider using a different IPC mechanism - unix domain sockets or memory mapped files.

-- http://mail.python.org/mailman/listinfo/python-list
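
The suggestion above to place multiple data objects within a single datagram might look something like the following. This is a minimal sketch, not code from the thread: the 2-byte length prefix, the names pack_records and unpack_records, and the 1400-byte payload limit (chosen to stay under a typical 1500-byte Ethernet MTU) are all assumptions made for illustration.

```python
import struct

# Hypothetical framing: each record is preceded by a 2-byte big-endian length.
_LEN = struct.Struct('!H')

# Assumed payload ceiling, chosen to stay below a typical 1500-byte MTU.
MAX_PAYLOAD = 1400


def pack_records(records, max_payload=MAX_PAYLOAD):
    """Group byte-string records into as few datagram payloads as possible."""
    payloads, current, size = [], [], 0
    for rec in records:
        if len(rec) + _LEN.size > max_payload:
            raise ValueError('record too large for one datagram')
        framed = _LEN.pack(len(rec)) + rec
        # Start a new payload when this record would overflow the current one.
        if current and size + len(framed) > max_payload:
            payloads.append(b''.join(current))
            current, size = [], 0
        current.append(framed)
        size += len(framed)
    if current:
        payloads.append(b''.join(current))
    return payloads


def unpack_records(payload):
    """Split one received datagram payload back into its records."""
    records, offset = [], 0
    while offset < len(payload):
        (length,) = _LEN.unpack_from(payload, offset)
        offset += _LEN.size
        records.append(payload[offset:offset + length])
        offset += length
    return records
```

Each payload returned by pack_records would be sent with a single sock.sendto(payload, address) call, and the receiver would run unpack_records on whatever recvfrom returns. With 90-byte messages this puts 15 records in each datagram instead of one, at the cost of losing all 15 if that datagram is dropped.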