How should I use grep from python?

2009-05-07 Thread Matthew Wilson
I'm writing a command-line application and I want to search through lots
of text files for a string.  Instead of writing the python code to do
this, I want to use grep.

This is the command I want to run:

$ grep -l foo dir

In other words, I want to list all files in the directory dir that
contain the string foo.

I'm looking for the one obvious way to do it and instead I found no
consensus.  I could os.popen, commands.getstatusoutput, the subprocess
module, backticks, etc.  

As of May 2009, what is the recommended way to run an external process
like grep and capture STDOUT and the error code?


TIA

Matt
--
http://mail.python.org/mailman/listinfo/python-list


Re: How should I use grep from python?

2009-05-07 Thread Diez B. Roggisch
Matthew Wilson wrote:

> I'm writing a command-line application and I want to search through lots
> of text files for a string.  Instead of writing the python code to do
> this, I want to use grep.
>
> This is the command I want to run:
>
> $ grep -l foo dir
>
> In other words, I want to list all files in the directory dir that
> contain the string foo.
>
> I'm looking for the one obvious way to do it and instead I found no
> consensus.  I could os.popen, commands.getstatusoutput, the subprocess
> module, backticks, etc.
>
> As of May 2009, what is the recommended way to run an external process
> like grep and capture STDOUT and the error code?

subprocess. Which becomes pretty clear when reading its docs:


The subprocess module allows you to spawn new processes, connect to their
input/output/error pipes, and obtain their return codes. This module
intends to replace several other, older modules and functions, such as:
os.system
os.spawn*
os.popen*
popen2.*
commands.*
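
As a rough sketch of what that looks like for the grep case: the helper names run_and_capture and files_matching below are made up for illustration, and the -r flag is an assumption (plain "grep -l foo dir" does not descend into a directory on its own). It relies on grep's usual exit codes: 0 for a match, 1 for no match, anything higher for an error.

```python
import subprocess


def run_and_capture(argv):
    """Run argv (a list, so no shell is involved); return (exit code, stdout text)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate()  # read all of stdout, then wait for the process
    return proc.returncode, out


def files_matching(pattern, directory):
    """Hypothetical wrapper around grep -l; returns a list of file names."""
    # "-r" is an assumption, "--" stops a pattern like "-x" being read as an option.
    code, out = run_and_capture(["grep", "-rl", "--", pattern, directory])
    if code > 1:  # 0 = matches found, 1 = no matches, >1 = grep reported an error
        raise RuntimeError("grep exited with status %d" % code)
    return out.splitlines()
```

Passing the arguments as a list avoids shell quoting problems when the search string contains spaces or metacharacters.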


Diez


Re: How should I use grep from python?

2009-05-07 Thread Tim Chase

> I'm writing a command-line application and I want to search through lots
> of text files for a string.  Instead of writing the python code to do
> this, I want to use grep.
>
> This is the command I want to run:
>
> $ grep -l foo dir
>
> In other words, I want to list all files in the directory dir that
> contain the string foo.
>
> I'm looking for the one obvious way to do it and instead I found no
> consensus.  I could os.popen, commands.getstatusoutput, the subprocess
> module, backticks, etc.


While it doesn't use grep or external processes, I'd just do it 
in pure Python:


  import os

  def files_containing(location, search_term):
      # Yield the name of each regular file in location that contains search_term.
      for fname in os.listdir(location):
          fullpath = os.path.join(location, fname)
          if os.path.isfile(fullpath):
              for line in open(fullpath):
                  if search_term in line:
                      yield fname
                      break  # one match is enough for this file

  for fname in files_containing('/tmp', 'term'):
      print(fname)

It's fairly readable, you can easily tweak the search methods 
(case sensitive, etc), change it to be recursive by using 
os.walk() instead of listdir(), it's cross-platform, and doesn't 
require the overhead of an external process (along with the
"which call do I use to spawn the function" questions that come
with it :)
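
A possible sketch of the tweaks mentioned above, written for Python 3: it swaps os.listdir() for os.walk() to recurse and adds a case-insensitive switch. The ignore_case parameter and the _contains helper are illustrative names, and skipping unreadable files is a design choice of this sketch, not something from the code above.

```python
import os


def _contains(path, term, ignore_case):
    """Return True if any line of the file at path contains term."""
    try:
        # errors="replace" keeps undecodable bytes from aborting the scan
        with open(path, errors="replace") as handle:
            for line in handle:
                if ignore_case:
                    line = line.lower()
                if term in line:
                    return True
    except OSError:
        pass  # unreadable file: treat it as "no match"
    return False


def files_containing(location, search_term, ignore_case=False):
    """Yield full paths of files anywhere under location that contain search_term."""
    if ignore_case:
        search_term = search_term.lower()
    for dirpath, _dirnames, filenames in os.walk(location):
        for fname in filenames:
            fullpath = os.path.join(dirpath, fname)
            if _contains(fullpath, search_term, ignore_case):
                yield fullpath
```

Unlike the listdir() version, this yields full paths rather than bare names, since the same file name can appear in several subdirectories.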


However, to answer your original question, I'd use os.popen which 
is the one I see suggested most frequently.


-tkc





Re: How should I use grep from python?

2009-05-07 Thread Matthew Wilson
On Thu 07 May 2009 09:09:53 AM EDT, Diez B. Roggisch wrote:
> Matthew Wilson wrote:
>>
>> As of May 2009, what is the recommended way to run an external process
>> like grep and capture STDOUT and the error code?
>
> subprocess. Which becomes pretty clear when reading its docs:

Yeah, that's what I figured, but I wondered if there was already
something newer and shinier aiming to bump subprocess off its throne.

I'll just stick with subprocess for now.  Thanks for the feedback!


Re: How should I use grep from python?

2009-05-07 Thread Matthew Wilson
On Thu 07 May 2009 09:25:52 AM EDT, Tim Chase wrote:
> While it doesn't use grep or external processes, I'd just do it
> in pure Python:

Thanks for the code!

I'm reluctant to take that approach for a few reasons:

1. Writing tests for that code seems like a fairly large amount of work.
I think I'd need to either mock out lots of stuff or create a bunch
of temporary directories and files for each test run.

I don't intend to test that grep works like it says it does.  I'll
just test that my code calls a mocked-out grep with the right options
and arguments, and that my code behaves nicely when my mocked-out
grep returns errors.

2. grep is crazy fast.  For a search through just a few files, I doubt
it would matter, but when searching through a thousand files (which is
likely) I suspect that an all-python approach might lag behind.  I'm
speculating here, though.

3. grep has lots and lots of cute options.  I don't want to think about
implementing stuff like --color, for example.  If I just pass all the
heavy lifting to grep, I'm already done.
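
The testing approach from point 1 might look roughly like this in current Python 3. It is only a sketch: list_matches is a hypothetical wrapper, the -r flag is an assumption, and unittest.mock stands in for whatever mocking tool is actually used.

```python
import subprocess
from unittest import mock


def list_matches(pattern, directory):
    """Hypothetical wrapper: names of files under directory that contain pattern."""
    proc = subprocess.run(
        ["grep", "-rl", "--", pattern, directory],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode > 1:  # grep: 0 = match, 1 = no match, >1 = error
        raise RuntimeError("grep failed: " + proc.stderr.strip())
    return proc.stdout.splitlines()


def test_calls_grep_with_expected_arguments():
    fake = mock.Mock(returncode=0, stdout="dir/a.txt\ndir/b.txt\n", stderr="")
    with mock.patch("subprocess.run", return_value=fake) as run:
        assert list_matches("foo", "dir") == ["dir/a.txt", "dir/b.txt"]
    # The first positional argument is the argv list handed to grep.
    assert run.call_args[0][0] == ["grep", "-rl", "--", "foo", "dir"]


def test_grep_error_is_reported():
    fake = mock.Mock(returncode=2, stdout="", stderr="grep: dir: No such file\n")
    with mock.patch("subprocess.run", return_value=fake):
        try:
            list_matches("foo", "dir")
        except RuntimeError as exc:
            assert "No such file" in str(exc)
        else:
            raise AssertionError("expected RuntimeError")
```

Neither test touches the file system or needs a real grep, which is the point of mocking the call rather than the files.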

On the other hand, your solution is platform-independent and has no
dependencies.  Mine depends on an external grep command.

Thanks again for the feedback!

Matt



Re: How should I use grep from python?

2009-05-07 Thread Marco Mariani

Matthew Wilson wrote:


> consensus.  I could os.popen, commands.getstatusoutput, the subprocess
> module, backticks, etc.


Backticks do _not_ do what you think they do.

And with py3k they're also as dead as a dead parrot.
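
For context on the backtick remark: in Python 2, backticks were shorthand for repr() and never ran a shell command, and Python 3 removed the syntax. A small illustration using repr() directly, with the command string from the original question as sample data:

```python
command = "grep -l foo dir"

# Python 2 evaluated command wrapped in backticks as repr(command): it quoted
# the string and executed nothing. Python 3 dropped the syntax, so repr() is
# called explicitly here.
quoted = repr(command)
print(quoted)  # prints 'grep -l foo dir' including the single quotes
```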




Re: How should I use grep from python?

2009-05-07 Thread Nick Craig-Wood
Matthew Wilson m...@tplus1.com wrote:
> I'm writing a command-line application and I want to search through lots
> of text files for a string.  Instead of writing the python code to do
> this, I want to use grep.
>
> This is the command I want to run:
>
> $ grep -l foo dir
>
> In other words, I want to list all files in the directory dir that
> contain the string foo.
>
> I'm looking for the one obvious way to do it and instead I found no
> consensus.  I could os.popen, commands.getstatusoutput, the subprocess
> module, backticks, etc.

backticks is some other language ;-)

> As of May 2009, what is the recommended way to run an external process
> like grep and capture STDOUT and the error code?

This is the one true way nowadays:

>>> from subprocess import Popen, PIPE
>>> p = Popen(["ls", "-l"], stdout=PIPE)
>>> for line in p.stdout:
...     print line
...
total 93332

-rw-r--r--  1 ncw ncw  181 2007-10-18 14:01 -

drwxr-xr-x  2 ncw ncw 4096 2007-08-29 22:56 10_files

-rw-r--r--  1 ncw ncw   124713 2007-08-29 22:56 10.html
[snip]
>>> p.wait() # returns the error code
0
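
The same streaming pattern as a reusable function, sketched for Python 3. The names run_streaming and handle_line are invented for this example. Stripping the trailing newline from each line avoids the blank line between entries seen in the session above, where print adds a second newline.

```python
import subprocess


def run_streaming(argv, handle_line):
    """Run argv, pass each stdout line (newline removed) to handle_line,
    and return the process exit code."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, universal_newlines=True)
    for line in proc.stdout:  # lines arrive as the child writes them
        handle_line(line.rstrip("\n"))
    proc.stdout.close()
    return proc.wait()  # the exit code, as in the session above
```

Something like run_streaming(["ls", "-l"], print) would then print the listing single-spaced and return the exit status of ls.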


There was talk of removing the other methods from public use for 3.x.
Not sure of the conclusion.

-- 
Nick Craig-Wood n...@craig-wood.com -- http://www.craig-wood.com/nick