Re: Threading and consuming output from processes
I asked: I am developing a Python program that submits a command to each node of a cluster and consumes the stdout and stderr from each. I want all the processes to run in parallel, so I start a thread for each node. There could be a lot of output from a node, so I have a thread reading each stream, for a total of three threads per node. (I could probably reduce to two threads per node by having the process thread handle stdout or stderr.)

Simon Wittber said: In the past, I have used the select module to manage asynchronous IO operations. I pass the select.select function a list of file-like objects, and it returns a list of file-like objects which are ready for reading and writing.

Donn Cave said: As I see another followup has already mentioned, the classic pre-threads solution to multiple I/O sources is the select(2) function, ...

Thanks for your replies. The streams that I need to read contain pickled data. The select call returns files that have available input, and I can use read(file_descriptor, max) to read some of the input data. But then how can I convert the bytes just read into a stream for unpickling? I somehow need to take the bytes arriving for a given file descriptor and buffer them until the unpickler has enough data to return a complete unpickled object.

(It would be nice to do this without copying the bytes from one place to another, but I don't even see how to solve the problem with copying.)

Jack

--
http://mail.python.org/mailman/listinfo/python-list
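One way to do the buffering described above, sketched in modern Python 3 (the class name StreamUnpickler is invented here, and the tactic of retrying pickle.load on a growing buffer is an assumption, not something from the thread): accumulate the bytes read for each descriptor and treat a truncation error as "not enough data yet".

```python
import io
import pickle


class StreamUnpickler:
    """Accumulate raw bytes (e.g. from os.read) for one descriptor and
    yield complete objects as soon as enough data has arrived.

    Sketch only: it retries pickle.load on the whole buffer, so a
    genuinely corrupt stream would make it wait forever.
    """

    def __init__(self):
        self._buf = b""

    def feed(self, data):
        """Add newly read bytes; return a list of fully decoded objects."""
        self._buf += data
        objects = []
        while self._buf:
            stream = io.BytesIO(self._buf)
            try:
                obj = pickle.load(stream)
            except (EOFError, pickle.UnpicklingError):
                break  # incomplete pickle: wait for the next read
            objects.append(obj)
            self._buf = self._buf[stream.tell():]  # drop consumed bytes
        return objects
```

A real implementation would probably prefer the length-prefixed framing suggested elsewhere in the thread, which makes "do I have a whole message yet?" a cheap arithmetic check instead of a trial decode.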
Re: Threading and consuming output from processes
Quoth Jack Orenstein [EMAIL PROTECTED]:
[ ... re alternatives to threads ]
| Thanks for your replies. The streams that I need to read contain
| pickled data. The select call returns files that have available input,
| and I can use read(file_descriptor, max) to read some of the input
| data. But then how can I convert the bytes just read into a stream for
| unpickling? I somehow need to take the bytes arriving for a given file
| descriptor and buffer them until the unpickler has enough data to
| return a complete unpickled object.
|
| (It would be nice to do this without copying the bytes from one place
| to another, but I don't even see how to solve the problem with
| copying.)

Note that the file object copies bytes from one place to another, via C library stdio. If we could only see the data in those stdio buffers, it would be possible to use file objects with select() in more applications (though not with pickle). Since input very commonly needs to be buffered for various reasons, we end up writing our own buffer code, all because stdio has no standard function that tells you how much data is in a buffer.

But unpickling consumes an I/O stream, as you observe, so as a network data protocol by itself it's unsuitable for use with select. I think the only option would be a packet protocol - a count field followed by the indicated amount of pickle data. I suppose I would copy the received data into a StringIO object, and unpickle that when all the data has been received.

Incidentally, I think I read here yesterday that someone held a book about Python programming up to some ridicule for suggesting that pickles would be a good way to send data around on the network. The problem with this was supposed to have something to do with overloading. I have no idea what he was talking about, but you might be interested in that issue.

Donn Cave, [EMAIL PROTECTED]
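A minimal sketch of that packet protocol in modern Python (io.BytesIO has since replaced StringIO for byte data; the 4-byte big-endian count field and the names pack_message and PacketBuffer are assumptions made for illustration):

```python
import pickle
import struct


def pack_message(obj):
    """Frame one pickled object: 4-byte big-endian length, then payload."""
    payload = pickle.dumps(obj)
    return struct.pack(">I", len(payload)) + payload


class PacketBuffer:
    """Collect bytes from select()-driven reads for one descriptor and
    hand back complete objects once each packet has fully arrived."""

    def __init__(self):
        self._buf = b""

    def feed(self, data):
        """Add newly read bytes; return a list of fully decoded objects."""
        self._buf += data
        objects = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack(">I", self._buf[:4])
            if len(self._buf) < 4 + length:
                break  # the whole packet is not here yet
            objects.append(pickle.loads(self._buf[4:4 + length]))
            self._buf = self._buf[4 + length:]
        return objects
```

The count field is what makes this select()-friendly: the receiver can tell from simple arithmetic whether a complete message has arrived, without handing a half-built stream to the unpickler.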
Re: Threading and consuming output from processes
In article [EMAIL PROTECTED], Jack Orenstein [EMAIL PROTECTED] wrote:
| I am developing a Python program that submits a command to each node
| of a cluster and consumes the stdout and stderr from each. I want all
| the processes to run in parallel, so I start a thread for each node.
| There could be a lot of output from a node, so I have a thread reading
| each stream, for a total of three threads per node. (I could probably
| reduce to two threads per node by having the process thread handle
| stdout or stderr.)
|
| I've developed some code and have run into problems using the
| threading module, and have questions at various levels of detail.
|
| 1) How should I solve this problem? I'm an experienced Java programmer
| but new to Python, so my solution looks very Java-like (hence the use
| of the threading module). Any advice on the right way to approach the
| problem in Python would be useful.
|
| 2) How many active Python threads is it reasonable to have at one
| time? Our clusters have up to 50 nodes -- is 100-150 threads known to
| work? (I'm using Python 2.2.2 on RedHat 9.)
|
| 3) I've run into a number of problems with the threading module. My
| program seems to work about 90% of the time. The remaining 10%, it
| looks like notify or notifyAll don't wake up waiting threads; or I
| find some other problem that makes me wonder about the stability of
| the threading module. I can post details on the problems I'm seeing,
| but I thought it would be good to get general feedback first.
| (Googling doesn't turn up any signs of trouble.)

One of my colleagues here wrote a sort of similar application in Python, used threads, and had plenty of trouble with it. I don't recall the details. Some of the problems could be specific to Python - for example, there are some extra signal handling issues - but this is not to say that there are no signal handling issues with a multithreaded C application.
For my money, you just don't get robust applications when you solve problems like multiple I/O sources by throwing threads at them. As another followup has already mentioned, the classic pre-threads solution to multiple I/O sources is the select(2) function, which allows a single thread to serially process multiple file descriptors as data becomes available on them. When using select(), you should read from the file descriptor - using os.read(fd, size), socketobject.recv(size), etc. - to avoid reading into local buffers, as would happen with a file object.

Donn Cave, [EMAIL PROTECTED]
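A single-threaded drain loop along these lines might look like the following sketch (the function name read_all is invented for illustration, and 4096 is an arbitrary read size):

```python
import os
import select


def read_all(fds):
    """Drain several file descriptors to EOF with one thread via select().

    fds: a list of readable raw file descriptors.
    Returns a dict mapping each descriptor to everything read from it.
    """
    buffers = {fd: b"" for fd in fds}
    open_fds = set(fds)
    while open_fds:
        # Block until at least one descriptor has data (or EOF) pending.
        ready, _, _ = select.select(sorted(open_fds), [], [])
        for fd in ready:
            chunk = os.read(fd, 4096)  # raw read: no stdio buffering
            if chunk:
                buffers[fd] += chunk
            else:
                open_fds.discard(fd)  # empty read means EOF
    return buffers
```

The key point from the post is the use of os.read on the descriptor itself: a buffered file object could slurp data into a stdio buffer that select() cannot see, so select() would report nothing ready even though data is waiting.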
Threading and consuming output from processes
I am developing a Python program that submits a command to each node of a cluster and consumes the stdout and stderr from each. I want all the processes to run in parallel, so I start a thread for each node. There could be a lot of output from a node, so I have a thread reading each stream, for a total of three threads per node. (I could probably reduce to two threads per node by having the process thread handle stdout or stderr.)

I've developed some code and have run into problems using the threading module, and have questions at various levels of detail.

1) How should I solve this problem? I'm an experienced Java programmer but new to Python, so my solution looks very Java-like (hence the use of the threading module). Any advice on the right way to approach the problem in Python would be useful.

2) How many active Python threads is it reasonable to have at one time? Our clusters have up to 50 nodes -- is 100-150 threads known to work? (I'm using Python 2.2.2 on RedHat 9.)

3) I've run into a number of problems with the threading module. My program seems to work about 90% of the time. The remaining 10%, it looks like notify or notifyAll don't wake up waiting threads; or I find some other problem that makes me wonder about the stability of the threading module. I can post details on the problems I'm seeing, but I thought it would be good to get general feedback first. (Googling doesn't turn up any signs of trouble.)

Thanks.

Jack Orenstein
Re: Threading and consuming output from processes
| 1) How should I solve this problem? I'm an experienced Java programmer
| but new to Python, so my solution looks very Java-like (hence the use
| of the threading module). Any advice on the right way to approach the
| problem in Python would be useful.

In the past, I have used the select module to manage asynchronous IO operations. I pass the select.select function a list of file-like objects, and it returns a list of file-like objects which are ready for reading and writing.

http://python.org/doc/2.2/lib/module-select.html