[issue8873] Popen uses 333 times as much CPU as a shell pipe on Mac OS X

2010-08-04 Thread Terry J. Reedy

Changes by Terry J. Reedy tjre...@udel.edu:


--
assignee:  - d...@python
components: +Documentation -Library (Lib)
nosy: +d...@python
resolution:  - fixed
status: open - closed
versions: +Python 2.7 -Python 2.5, Python 2.6

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8873
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8873] Popen uses 333 times as much CPU as a shell pipe on Mac OS X

2010-06-02 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Have you tried specifying a value for the bufsize argument to Popen? Either 1 
for line buffering or e.g. 4096 for a decent size block buffering.

--
nosy: +gregory.p.smith, pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8873
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8873] Popen uses 333 times as much CPU as a shell pipe on Mac OS X

2010-06-02 Thread Hugh Secker-Walker

Hugh Secker-Walker hug...@users.sourceforge.net added the comment:

OK.  Using bufsize=4096 or bufsize=-1 eliminates the CPU-hogging problem on Mac 
OS X.  Thank you.

I have to say that the Popen() doc could be better.  It says: bufsize, if 
given, has the same meaning as the corresponding argument to the built-in 
open() function! which implies behavior like open(), which is to use the 
system's default buffering, generally a good thing.

I think the default of 0, meaning unbuffered, is a poor choice.  I'm guessing 
many many Mac users are getting a big slowdown from this default behavior (as 
I've now discovered elsewhere in our project!).  I request that you change the 
default to be the same as open(), i.e. use the system default.

Barring that semantic change, please add a note to the doc:  Most users will 
want to set bufsize=4096 for pipe-like buffering, or bufsize=1 for line-based 
buffering.  

Thanks again for the quick response and resolution.

-Hugh

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8873
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8873] Popen uses 333 times as much CPU as a shell pipe on Mac OS X

2010-06-02 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

 I think the default of 0, meaning unbuffered, is a poor choice.

So do I, but I'm not sure whether it's acceptable to change behaviour in
a potentially incompatible way.

 Barring that semantic change, please add a note to the doc:  Most
 users will want to set bufsize=4096 for pipe-like buffering, or
 bufsize=1 for line-based buffering.  

Yes, it deserves a documentation note indeed.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8873
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8873] Popen uses 333 times as much CPU as a shell pipe on Mac OS X

2010-06-02 Thread Skip Montanaro

Skip Montanaro s...@pobox.com added the comment:

This problem has come up before.  It was a bug introduced I think when
the subprocess module was first incorporated into Python.  I don't
recall if the default was changed in 2.7 or 3.x.  Can you demonstrate
the problem with Python 2.7 or 3.1?  If not, I'd recommend this 
ticket just be closed.

--
nosy: +skip.montanaro

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8873
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8873] Popen uses 333 times as much CPU as a shell pipe on Mac OS X

2010-06-02 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

 This problem has come up before.  It was a bug introduced I think when
 the subprocess module was first incorporated into Python.  I don't
 recall if the default was changed in 2.7 or 3.x.

According to the docs it is still unbuffered by default. I've added a
documentation note about performance issues in r81652.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8873
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8873] Popen uses 333 times as much CPU as a shell pipe on Mac OS X

2010-06-01 Thread Hugh Secker-Walker

New submission from Hugh Secker-Walker hug...@users.sourceforge.net:

Popen seems to be *very* costly, CPU wise, on Mac OS X.

I need to read and process bytes from the stdout of another process that 
outputs bytes in a rate-limited fashion, e.g. 8 bytes per second.  If I use 
a shell pipeline and a simple Python script to read the bytes and log some 
details then top and time both agree that the Python script is using roughly 1 
second of CPU time for every 2000 seconds of elapsed time.

If I change the script to use Popen to start the first process and to read from 
proc.stdout, then top and time both agree that the Python script uses roughly 1 
second of CPU time for every 6 seconds of elapsed time.  That is, using Popen 
uses 333 times as much CPU as using a shell pipeline.

In practice, this means that using a shell pipeline has no measureable impact 
on system performance, and I can run 300 concurrent pipeline jobs, whereas 
using Popen limits me to running 6 such concurrent jobs.

This is on an up-to-date Mac OS X 10.5.8 system.  The behavior happens with 
both the stock Python 2.5 from Apple and with Python 2.6 from Macports.

On Ubuntu Linux the Popen solution is perhaps 2% slower than using a shell 
pipeline.

I'll attach three scripts:
bytestream -- simple program that sends bytes to its stdout
pull0 -- Python script that reads its stdin and logs a couple of stats
pull1 -- Python script that uses Popen to start bytestream and pulls from 
proc.stdout and runs abysmally slowly.

Use this with a concurrent top to see the shell pipeline behavior:
$ ./bytestream 8 4 | time ./pull0 11000

Use this with a concurrent top to see the Popen behavior:
$ time ./pull1 ./bytestream 8 4 11000

In both cases you'll see that system and user time are approximately equal, but 
that they're roughly 333 times higher for pull1.  Is there a clue in the fact 
that both times go up by about the same huge factor?

-Hugh

--
components: Library (Lib)
files: bytestream
messages: 106862
nosy: hughsw
priority: normal
severity: normal
status: open
title: Popen uses 333 times as much CPU as a shell pipe on Mac OS X
type: behavior
versions: Python 2.5, Python 2.6
Added file: http://bugs.python.org/file17517/bytestream

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8873
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8873] Popen uses 333 times as much CPU as a shell pipe on Mac OS X

2010-06-01 Thread Hugh Secker-Walker

Changes by Hugh Secker-Walker hug...@users.sourceforge.net:


Added file: http://bugs.python.org/file17518/pull0

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8873
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8873] Popen uses 333 times as much CPU as a shell pipe on Mac OS X

2010-06-01 Thread Hugh Secker-Walker

Changes by Hugh Secker-Walker hug...@users.sourceforge.net:


Added file: http://bugs.python.org/file17519/pull1

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8873
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com