How should I use grep from python?
I'm writing a command-line application and I want to search through lots of text files for a string. Instead of writing the python code to do this, I want to use grep.

This is the command I want to run:

    $ grep -l foo dir

In other words, I want to list all files in the directory dir that contain the string foo.

I'm looking for the one obvious way to do it and instead I found no consensus. I could os.popen, commands.getstatusoutput, the subprocess module, backticks, etc.

As of May 2009, what is the recommended way to run an external process like grep and capture STDOUT and the error code?

TIA

Matt
--
http://mail.python.org/mailman/listinfo/python-list
Re: How should I use grep from python?
Matthew Wilson wrote:

> I'm writing a command-line application and I want to search through lots of text files for a string. Instead of writing the python code to do this, I want to use grep.
>
> This is the command I want to run:
>
>     $ grep -l foo dir
>
> In other words, I want to list all files in the directory dir that contain the string foo.
>
> I'm looking for the one obvious way to do it and instead I found no consensus. I could os.popen, commands.getstatusoutput, the subprocess module, backticks, etc.
>
> As of May 2009, what is the recommended way to run an external process like grep and capture STDOUT and the error code?

subprocess. Which becomes pretty clear when reading its docs:

    The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several other, older modules and functions, such as:

    os.system
    os.spawn*
    os.popen*
    popen2.*
    commands.*

Diez
Re: How should I use grep from python?
> I'm writing a command-line application and I want to search through lots of text files for a string. Instead of writing the python code to do this, I want to use grep.
>
> This is the command I want to run:
>
>     $ grep -l foo dir
>
> In other words, I want to list all files in the directory dir that contain the string foo.
>
> I'm looking for the one obvious way to do it and instead I found no consensus. I could os.popen, commands.getstatusoutput, the subprocess module, backticks, etc.

While it doesn't use grep or external processes, I'd just do it in pure Python:

    import os

    def files_containing(location, search_term):
        for fname in os.listdir(location):
            fullpath = os.path.join(location, fname)
            if os.path.isfile(fullpath):
                # open() replaces the Python 2-only file() builtin
                with open(fullpath) as f:
                    for line in f:
                        if search_term in line:
                            yield fname
                            break

    for fname in files_containing('/tmp', 'term'):
        print(fname)

It's fairly readable, you can easily tweak the search methods (case sensitive, etc), change it to be recursive by using os.walk() instead of listdir(), it's cross-platform, and doesn't require the overhead of an external process (along with the "which call do I use to spawn the function" questions that come with it :)

However, to answer your original question, I'd use os.popen, which is the one I see suggested most frequently.

-tkc
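The os.walk() variant mentioned above could look roughly like the sketch below. It assumes Python 3; the function name files_containing_recursive and the choices to decode with errors="replace" and to skip unreadable files are illustrative assumptions, not something from the thread.

```python
import os


def files_containing_recursive(location, search_term):
    """Yield the path of every file under location that contains search_term."""
    for dirpath, dirnames, filenames in os.walk(location):
        for fname in filenames:
            fullpath = os.path.join(dirpath, fname)
            try:
                # errors="replace" keeps binary or oddly encoded files from
                # raising UnicodeDecodeError (an assumption about how such
                # files should be handled).
                with open(fullpath, errors="replace") as handle:
                    for line in handle:
                        if search_term in line:
                            yield fullpath
                            break
            except OSError:
                # Files that cannot be opened or read are skipped.
                continue
```

Unlike the listdir() version, this yields full paths rather than bare file names, since the same name can appear in several subdirectories.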
Re: How should I use grep from python?
On Thu 07 May 2009 09:09:53 AM EDT, Diez B. Roggisch wrote:

> Matthew Wilson wrote:
>
>> As of May 2009, what is the recommended way to run an external process like grep and capture STDOUT and the error code?
>
> subprocess. Which becomes pretty clear when reading its docs:

Yeah, that's what I figured, but I wondered if there was already something newer and shinier aiming to bump subprocess off its throne.

I'll just stick with subprocess for now. Thanks for the feedback!
Re: How should I use grep from python?
On Thu 07 May 2009 09:25:52 AM EDT, Tim Chase wrote:

> While it doesn't use grep or external processes, I'd just do it in pure Python:

Thanks for the code! I'm reluctant to take that approach for a few reasons:

1. Writing tests for that code seems like a fairly large amount of work. I think I'd need to either mock out lots of stuff or create a bunch of temporary directories and files for each test run. I don't intend to test that grep works like it says it does. I'll just test that my code calls a mocked-out grep with the right options and arguments, and that my code behaves nicely when my mocked-out grep returns errors.

2. grep is crazy fast. For a search through just a few files, I doubt it would matter, but when searching through a thousand files (which is likely) I suspect that an all-python approach might lag behind. I'm speculating here, though.

3. grep has lots and lots of cute options. I don't want to think about implementing stuff like --color, for example. If I just pass all the heavy lifting to grep, I'm already done.

On the other hand, your solution is platform-independent and has no dependencies. Mine depends on an external grep command.

Thanks again for the feedback!

Matt
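The testing approach in point 1 might be sketched as follows. This is only an illustration under several assumptions: it uses subprocess.run and unittest.mock, both of which were added to the standard library after this thread; list_matching_files is a hypothetical wrapper name; and -r is added on the assumption that a recursive search of dir is what is wanted. No real grep is executed by the tests.

```python
import subprocess
import unittest
from unittest import mock


def list_matching_files(pattern, directory):
    """Hypothetical wrapper: run grep -r -l and return the matching paths."""
    result = subprocess.run(
        ["grep", "-r", "-l", "--", pattern, directory],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
    )
    # grep exits with 0 on a match, 1 on no match, and more than 1 on error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()


class ListMatchingFilesTest(unittest.TestCase):
    def test_calls_grep_with_expected_arguments(self):
        fake = subprocess.CompletedProcess([], 0, stdout="dir/a.txt\n", stderr="")
        with mock.patch("subprocess.run", return_value=fake) as run:
            self.assertEqual(list_matching_files("foo", "dir"), ["dir/a.txt"])
        # The first positional argument of the call is the command list.
        self.assertEqual(
            run.call_args[0][0], ["grep", "-r", "-l", "--", "foo", "dir"]
        )

    def test_raises_when_grep_reports_an_error(self):
        # Exit status 2 and the message are simulated, not real grep output.
        fake = subprocess.CompletedProcess([], 2, stdout="", stderr="simulated error\n")
        with mock.patch("subprocess.run", return_value=fake):
            with self.assertRaises(RuntimeError):
                list_matching_files("foo", "dir")
```

Running the module with python -m unittest exercises both cases without touching the file system.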
Re: How should I use grep from python?
Matthew Wilson wrote:

> consensus. I could os.popen, commands.getstatusoutput, the subprocess module, backticks, etc.

Backticks do _not_ do what you think they do. And with py3k they're also as dead as a dead parrot.
Re: How should I use grep from python?
Matthew Wilson m...@tplus1.com wrote:

> I'm writing a command-line application and I want to search through lots of text files for a string. Instead of writing the python code to do this, I want to use grep.
>
> This is the command I want to run:
>
>     $ grep -l foo dir
>
> In other words, I want to list all files in the directory dir that contain the string foo.
>
> I'm looking for the one obvious way to do it and instead I found no consensus. I could os.popen, commands.getstatusoutput, the subprocess module, backticks, etc.

backticks is some other language ;-)

> As of May 2009, what is the recommended way to run an external process like grep and capture STDOUT and the error code?

This is the one true way nowadays:

    >>> from subprocess import Popen, PIPE
    >>> p = Popen(["ls", "-l"], stdout=PIPE)
    >>> for line in p.stdout:
    ...     print line
    ...
    total 93332
    -rw-r--r-- 1 ncw ncw 181 2007-10-18 14:01 -
    drwxr-xr-x 2 ncw ncw 4096 2007-08-29 22:56 10_files
    -rw-r--r-- 1 ncw ncw 124713 2007-08-29 22:56 10.html
    [snip]
    >>> p.wait() # returns the error code
    0

There was talk of removing the other methods from public use for 3.x. Not sure of the conclusion.

--
Nick Craig-Wood n...@craig-wood.com -- http://www.craig-wood.com/nick
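Applied to the grep command from the original question, the Popen approach might look like the sketch below. The helper names run_and_capture and grep_files are made up for illustration, and -r is added on the assumption that dir should be searched recursively. It uses communicate() rather than reading p.stdout and then calling wait(), so that both stdout and stderr are drained before the exit code is read.

```python
from subprocess import Popen, PIPE


def run_and_capture(cmd):
    """Run cmd (a list of arguments); return (exit_code, stdout_lines, stderr_text)."""
    p = Popen(cmd, stdout=PIPE, stderr=PIPE, universal_newlines=True)
    # communicate() reads both pipes to the end and waits for the process,
    # which avoids blocking when one of the pipe buffers fills up.
    out, err = p.communicate()
    return p.returncode, out.splitlines(), err


def grep_files(pattern, directory):
    """List files under directory containing pattern, using an external grep."""
    # "--" ends option parsing, so a pattern starting with "-" is not
    # mistaken for an option.
    return run_and_capture(["grep", "-r", "-l", "--", pattern, directory])
```

A caller would typically treat exit code 0 as "matches found", 1 as "no matches", and anything greater as an error, inspecting the returned stderr text in the last case.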