Re: Threading and consuming output from processes

2005-02-26 Thread Jack Orenstein
I asked:
I am developing a Python program that submits a command to each node
of a cluster and consumes the stdout and stderr from each. I want all
the processes to run in parallel, so I start a thread for each
node. There could be a lot of output from a node, so I have a thread
reading each stream, for a total of three threads per node. (I could
probably reduce to two threads per node by having the process thread
handle stdout or stderr.)
Simon Wittber said:
 In the past, I have used the select module to manage asynchronous
 IO operations.

 I pass the select.select function a list of file-like objects, and it
 returns a list of file-like objects which are ready for reading and
 writing.
Donn Cave said:
As I see another followup has already mentioned, the classic
pre-threads solution to multiple I/O sources is the select(2)
function, ...
Thanks for your replies. The streams that I need to read contain
pickled data. The select call returns files that have available input,
and I can use read(file_descriptor, max) to read some of the input
data. But then how can I convert the bytes just read into a stream for
unpickling? I somehow need to take the bytes arriving for a given file
descriptor and buffer them until the unpickler has enough data to
return a complete unpickled object.
(It would be nice to do this without copying the bytes from one place
to another, but I don't even see how to solve the problem with
copying.)
Jack


Re: Threading and consuming output from processes

2005-02-26 Thread Donn Cave
Quoth Jack Orenstein [EMAIL PROTECTED]:
[ ... re alternatives to threads ]
| Thanks for your replies. The streams that I need to read contain
| pickled data. The select call returns files that have available input,
| and I can use read(file_descriptor, max) to read some of the input
| data. But then how can I convert the bytes just read into a stream for
| unpickling? I somehow need to take the bytes arriving for a given file
| descriptor and buffer them until the unpickler has enough data to
| return a complete unpickled object.
|
| (It would be nice to do this without copying the bytes from one place
| to another, but I don't even see how to solve the problem with
| copying.)

Note that the file object copies bytes from one place to another,
via C library stdio.  If we could only see the data in those
stdio buffers, it would be possible to use file objects with
select() in more applications.  (Though not with pickle.)  Since
input very commonly needs to be buffered for various reasons, we
end up writing our own buffer code, all because stdio has no
standard function that tells you how much data is in a buffer.

But unpickling consumes an I/O stream, as you observe, so as a
network data protocol by itself, it's unsuitable for use with
select.  I think the only option would be a packet protocol -
a count field followed by the indicated amount of pickle data.
I suppose I would copy the received data into a StringIO object,
and unpickle that when all the data has been received.
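
For illustration, a minimal sketch of such a packet protocol.  The
4-byte network-order length prefix and the FrameBuffer name are
assumptions for the example, not anything fixed by the thread, and it
is written for a modern Python; pickle.loads() on a completed frame
plays the same role as unpickling a StringIO:

    import pickle
    import struct

    def send_obj(sock, obj):
        # Prefix each pickle with its length so the receiver can frame it.
        data = pickle.dumps(obj)
        sock.sendall(struct.pack("!I", len(data)) + data)

    class FrameBuffer:
        # One of these per file descriptor, fed from the select() loop.
        def __init__(self):
            self.buf = b""

        def feed(self, chunk):
            # Append newly read bytes; yield each complete unpickled object.
            self.buf += chunk
            while len(self.buf) >= 4:
                (n,) = struct.unpack("!I", self.buf[:4])
                if len(self.buf) < 4 + n:
                    break  # frame incomplete; wait for more bytes
                frame, self.buf = self.buf[4:4 + n], self.buf[4 + n:]
                yield pickle.loads(frame)

Incomplete frames simply sit in the buffer until the rest of the bytes
arrive, so feed() can be called with whatever a read returns.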

Incidentally, I think I read here yesterday that someone held a book
about Python programming up to some ridicule for suggesting that
pickles would be a good way to send data around on the network.
The problem with this was supposed to have something to do with
overloading.  I have no idea what he was talking about, but you
might be interested in this issue.

Donn Cave, [EMAIL PROTECTED]


Re: Threading and consuming output from processes

2005-02-25 Thread Donn Cave
In article [EMAIL PROTECTED],
 Jack Orenstein [EMAIL PROTECTED] wrote:
 I am developing a Python program that submits a command to each node
 of a cluster and consumes the stdout and stderr from each. I want all
 the processes to run in parallel, so I start a thread for each
 node. There could be a lot of output from a node, so I have a thread
 reading each stream, for a total of three threads per node. (I could
 probably reduce to two threads per node by having the process thread
 handle stdout or stderr.)
 
 I've developed some code and have run into problems using the
 threading module, and have questions at various levels of detail.
 
 1) How should I solve this problem? I'm an experienced Java programmer
 but new to Python, so my solution looks very Java-like (hence the use of
 the threading module). Any advice on the right way to approach the
 problem in Python would be useful.
 
 2) How many active Python threads is it reasonable to have at one
 time? Our clusters have up to 50 nodes -- is 100-150 threads known to
 work? (I'm using Python 2.2.2 on RedHat 9.)
 
 3) I've run into a number of problems with the threading module. My
 program seems to work about 90% of the time. The remaining 10%, it
 looks like notify or notifyAll don't wake up waiting threads; or I
 find some other problem that makes me wonder about the stability of
 the threading module. I can post details on the problems I'm seeing,
 but I thought it would be good to get general feedback
 first. (Googling doesn't turn up any signs of trouble.)

One of my colleagues here wrote a sort of similar application
in Python, used threads, and had plenty of troubles with it.
I don't recall the details.  Some of the problems could be
specific to Python.  For example, there are some extra signal
handling issues - but this is not to say that there are no
signal handling issues with a multithreaded C application.
For my money, you just don't get robust applications when
you solve problems like multiple I/O sources by throwing
threads at them.

As I see another followup has already mentioned, the classic
pre-threads solution to multiple I/O sources is the select(2)
function, which allows a single thread to serially process
multiple file descriptors as data becomes available on them.
When using select(), you should read from the file descriptor,
using os.read(fd, size), socketobject.recv(size), etc., to
avoid reading into local buffers as would happen with a file
object.
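
For concreteness, a minimal sketch of that select() loop, where fds is
assumed to be a list of pipe file descriptors and handle() is a
hypothetical callback:

    import os
    import select

    def consume(fds, handle):
        # Service several file descriptors from a single thread.
        open_fds = set(fds)
        while open_fds:
            readable, _, _ = select.select(list(open_fds), [], [])
            for fd in readable:
                chunk = os.read(fd, 4096)  # read the raw descriptor directly
                if chunk:
                    handle(fd, chunk)
                else:  # an empty read means EOF on this descriptor
                    open_fds.discard(fd)
                    os.close(fd)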

   Donn Cave, [EMAIL PROTECTED]


Threading and consuming output from processes

2005-02-24 Thread Jack Orenstein
I am developing a Python program that submits a command to each node
of a cluster and consumes the stdout and stderr from each. I want all
the processes to run in parallel, so I start a thread for each
node. There could be a lot of output from a node, so I have a thread
reading each stream, for a total of three threads per node. (I could
probably reduce to two threads per node by having the process thread
handle stdout or stderr.)
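
For reference, the layout described above looks roughly like the
following sketch.  It uses the subprocess module (added in Python 2.4;
on 2.2 the equivalent would be os.popen3) and hypothetical names:

    import subprocess
    import threading

    def drain(stream, sink):
        # Reader thread body: keep the pipe empty so the child never
        # blocks on a full pipe buffer.
        for line in iter(stream.readline, b""):
            sink.append(line)
        stream.close()

    def run_on_node(cmd):
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, err = [], []
        readers = [threading.Thread(target=drain, args=(proc.stdout, out)),
                   threading.Thread(target=drain, args=(proc.stderr, err))]
        for t in readers:
            t.start()
        proc.wait()
        for t in readers:
            t.join()
        return out, err

One run_on_node() call per node, each in its own thread, gives the
three-threads-per-node structure.
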
I've developed some code and have run into problems using the
threading module, and have questions at various levels of detail.
1) How should I solve this problem? I'm an experienced Java programmer
but new to Python, so my solution looks very Java-like (hence the use of
the threading module). Any advice on the right way to approach the
problem in Python would be useful.
2) How many active Python threads is it reasonable to have at one
time? Our clusters have up to 50 nodes -- is 100-150 threads known to
work? (I'm using Python 2.2.2 on RedHat 9.)
3) I've run into a number of problems with the threading module. My
program seems to work about 90% of the time. The remaining 10%, it
looks like notify or notifyAll don't wake up waiting threads; or I
find some other problem that makes me wonder about the stability of
the threading module. I can post details on the problems I'm seeing,
but I thought it would be good to get general feedback
first. (Googling doesn't turn up any signs of trouble.)
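(For what it's worth, one classic cause of the symptom in (3) is
waiting on a Condition without re-checking the predicate in a loop, so
a notify that fires before the wait is lost.  A minimal sketch of the
intended idiom, with illustrative names; the with-statement form is
from later Pythons, and on 2.2 you would call acquire() and release()
explicitly:

    import threading

    cond = threading.Condition()
    items = []

    def consumer():
        with cond:
            while not items:      # always re-test the predicate after waking
                cond.wait()
            return items.pop(0)

    def producer(x):
        with cond:
            items.append(x)
            cond.notify()         # notifyAll() would wake every waiter
)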
Thanks.
Jack Orenstein


Re: Threading and consuming output from processes

2005-02-24 Thread Simon Wittber
 1) How should I solve this problem? I'm an experienced Java programmer
 but new to Python, so my solution looks very Java-like (hence the use of
 the threading module). Any advice on the right way to approach the
 problem in Python would be useful.

In the past, I have used the select module to manage asynchronous IO operations.

I pass the select.select function a list of file-like objects, and it
returns a list of file-like objects which are ready for reading and
writing.
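
A bare sketch of that call shape, where conns is a hypothetical list of
connected sockets (ordinary file objects also work, via their fileno()
method):

    import select

    # Block up to 5 seconds for any of the objects to become ready.
    readable, writable, in_error = select.select(conns, conns, [], 5.0)
    for conn in readable:
        chunk = conn.recv(4096)  # will not block once select() says ready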

http://python.org/doc/2.2/lib/module-select.html