Re: Searching through more than one file.
On Sun, Dec 28, 2014 8:12 PM CET Dave Angel wrote:

> On 12/28/2014 12:27 PM, Seymore4Head wrote:
>> I need to search through a directory of text files for a string.
>> Here is a short program I made in the past to search through a single
>> text file for a line of text.
>>
>> How can I modify the code to search through a directory of files that
>> have different filenames, but the same extension?
>
> You have two other replies to your specific question, glob and
> os.listdir. I would also mention the module fileinput:
>
> https://docs.python.org/2/library/fileinput.html

Ah, I was just about to say that. I found out about this gem after reading Doug Hellmann's book. Here are some usage examples: http://pymotw.com/2/fileinput/

>     import fileinput
>     from glob import glob
>
>     fnames = glob('*.txt')
>     for line in fileinput.input(fnames):
>         pass # do whatever
>
> If you're not on Windows, I'd mention that the shell will expand the
> wildcards for you, so you could get the filenames from argv even
> simpler. See first example on the above web page.
>
> I'm more concerned that you think the following code you supplied does
> a search for a string. It does something entirely different, involving
> making a crude dictionary. But it could be reduced to just a few lines,
> and probably take much less memory, if this is really the code you're
> working on.
>
>>     fname = raw_input("Enter file name: ")  #*.txt
>>     fh = open(fname)
>>     lst = list()
>>     biglst=[]
>>     for line in fh:
>>         line=line.rstrip()
>>         line=line.split()
>>         biglst+=line
>>     final=[]
>>     for out in biglst:
>>         if out not in final:
>>             final.append(out)
>>     final.sort()
>>     print (final)
>
> Something like the following:
>
>     import fileinput
>     from glob import glob
>
>     res = set()
>     fnames = glob('*.txt')
>     for line in fileinput.input(fnames):
>         res.update(line.rstrip().split())
>     print sorted(res)
>
> --
> DaveA

--
https://mail.python.org/mailman/listinfo/python-list
Re: Searching through more than one file.
On Dec 29, 2014, at 2:47 AM, Rick Johnson rantingrickjohn...@gmail.com wrote:

> On Sunday, December 28, 2014 11:29:48 AM UTC-6, Seymore4Head wrote:
>> I need to search through a directory of text files for a string.
>> Here is a short program I made in the past to search through a single
>> text file for a line of text.
>
> Step1: Search through a single file.
> # Just a few more brush strokes...
>
> Step2: Search through all files in a directory.
> # Time to go exploring!
>
> Step3: Option to filter by file extension.
> # Waste not, want not!
>
> Step4: Option for recursing down sub-directories.
> # Look out, deeply nested structures, here I come!
> # Look out, deeply nested structures, here I come!
> # Look out, deeply nested structures, here I come!
> # Look out, deeply nested structures, here I come!
> # Look out, deeply nested structures, here I come!
> # Look out, deeply nested structures, here I come!
> # Look out, deeply nested structures, here I come!
> [Oops, fell into a recursive black hole!]
> # Look out, deeply nested structures, here I come!
> # Look out, deeply nested structures, here I come!
> # Look out, deeply nested structures, here I come!
> # Look out, deeply nested structures, here I come!
> [BREAK]
> # Whew, no worries, MaximumRecursionError is my best friend!
>
> ;-)
>
> In addition to the other advice, you might want to check out os.walk()

DEFINITELY use os.walk() if you're going to recurse through a directory tree. Here is an untested program I wrote that should do what you want. Modify as needed:

    # This is all Python 3 code, although I believe it will run under Python 2
    # as well.

    # os.path is documented at https://docs.python.org/3/library/os.path.html
    # os.walk is documented at https://docs.python.org/3/library/os.html#os.walk
    # logging is documented at https://docs.python.org/3/library/logging.html

    import os
    import os.path
    import logging

    # Logging messages can be filtered by level. If you set the level really
    # low, then low-level messages, and all higher-level messages, will be
    # logged. However, if you set the filtering level higher, then low-level
    # messages will not be logged. Debug messages are lower than info messages,
    # so if you comment out the first line, and uncomment the second, you will
    # only get info messages (right now you're getting both). If you look
    # through the code, you'll see that I go up in levels as I work my way
    # inward through the filters; this makes debugging really, really easy.
    # I'll start out with my level high, and if my code works, I'm done.
    # However, if there is a bug, I'll work my way downwards towards lower and
    # lower debug levels, which gives me more and more information. Eventually
    # I'll hit a level where I know enough about what is going on that I can
    # fix the problem. By the way, if you comment out both lines, you shouldn't
    # see any of these messages at all, because the default level is WARNING.
    logging.basicConfig(level=logging.DEBUG)
    ##logging.basicConfig(level=logging.INFO)

    EXTENSIONS = {".txt"}

    def do_something_useful(real_path):
        # I deleted the original message, so I have no idea
        # what you were trying to accomplish, so I'm punting
        # the definition of this function back to you.
        pass

    for root, dirs, files in os.walk('/'):
        for f in files:
            # This expands symbolic links, cleans up double slashes, etc.
            # This can be useful when you're trying to debug why something
            # isn't working via logging.
            real_path = os.path.realpath(os.path.join(root, f))
            logging.debug("operating on path '{0!s}'".format(real_path))
            (r, e) = os.path.splitext(real_path)
            if e in EXTENSIONS:
                # If we've made a mistake in our EXTENSIONS set, we might never
                # reach this point.
                logging.info("Selected path '{0!s}'".format(real_path))
                do_something_useful(real_path)

As a note, for the sake of speed and your own sanity, you probably want to do the easiest/computationally cheapest filtering first here. That means selecting the files that match your extensions first, and then filtering those files by their contents second.

Finally, if you are planning on parsing command-line options, DON'T do it by hand! Use argparse (https://docs.python.org/3/library/argparse.html) instead.

Thanks,
Cem Karan
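
To make the two closing points above concrete (cheapest filter first, and argparse for the command line), here is a minimal Python 3 sketch. It is only one way to fill in the do_something_useful() placeholder: the names find_matches, build_parser and main, and the -e/--ext option, are made up for illustration and do not come from anyone's post.

```python
import argparse
import os


def find_matches(top, needle, extensions):
    """Yield (path, line_number, line) for each line under top containing needle."""
    for root, dirs, files in os.walk(top):
        for name in files:
            # Cheapest test first: check the extension before opening anything.
            if os.path.splitext(name)[1] not in extensions:
                continue
            path = os.path.join(root, name)
            try:
                # errors="replace" keeps odd bytes from aborting the search.
                with open(path, errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable file (permissions, vanished, ...): skip it.
                continue


def build_parser():
    """Build the (hypothetical) command-line interface."""
    parser = argparse.ArgumentParser(
        description="Search text files under a directory for a string.")
    parser.add_argument("needle", help="string to look for")
    parser.add_argument("top", nargs="?", default=".",
                        help="directory to start from (default: current)")
    parser.add_argument("-e", "--ext", action="append",
                        help="extension to include, e.g. .txt (repeatable)")
    return parser


def main(argv=None):
    """Parse argv (or sys.argv[1:] when None) and print every match."""
    args = build_parser().parse_args(argv)
    extensions = set(args.ext or [".txt"])
    for path, number, line in find_matches(args.top, args.needle, extensions):
        print("{0}:{1}: {2}".format(path, number, line))
```

Calling main(["needle", "some_dir", "-e", ".txt"]) prints one path:line: text entry per match; to use the sketch as a script, call main() at the bottom of the file.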
Searching through more than one file.
I need to search through a directory of text files for a string.
Here is a short program I made in the past to search through a single text file for a line of text.

How can I modify the code to search through a directory of files that have different filenames, but the same extension?

    fname = raw_input("Enter file name: ")  #*.txt
    fh = open(fname)
    lst = list()
    biglst=[]
    for line in fh:
        line=line.rstrip()
        line=line.split()
        biglst+=line
    final=[]
    for out in biglst:
        if out not in final:
            final.append(out)
    final.sort()
    print (final)
Re: Searching through more than one file.
On 28/12/2014 17:27, Seymore4Head wrote:

> I need to search through a directory of text files for a string.
> Here is a short program I made in the past to search through a single
> text file for a line of text.
>
> How can I modify the code to search through a directory of files that
> have different filenames, but the same extension?
>
>     fname = raw_input("Enter file name: ")  #*.txt
>     fh = open(fname)
>     lst = list()
>     biglst=[]
>     for line in fh:
>         line=line.rstrip()
>         line=line.split()
>         biglst+=line
>     final=[]
>     for out in biglst:
>         if out not in final:
>             final.append(out)
>     final.sort()
>     print (final)

See the glob function in the glob module here https://docs.python.org/3/library/glob.html#module-glob

Similar functionality is available in the pathlib module https://docs.python.org/3/library/pathlib.html#module-pathlib but this is only available from Python 3.4 onwards.

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
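
As a rough illustration of the pathlib route mentioned above, the sketch below lists the files in one directory whose contents contain a given string. The function name files_containing is hypothetical, and Path.read_text() needs Python 3.5 or later (pathlib itself arrived in 3.4), so treat this as an assumption-laden example rather than the recommended way.

```python
from pathlib import Path


def files_containing(directory, needle, pattern="*.txt"):
    """Return the sorted names of files matching pattern whose text contains needle."""
    found = []
    for path in sorted(Path(directory).glob(pattern)):
        # glob() can match directories too, so keep regular files only.
        if path.is_file() and needle in path.read_text(errors="replace"):
            found.append(path.name)
    return found
```

This reads each file fully into memory, which is fine for small text files; for large ones, iterating line by line is the safer choice.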
Re: Searching through more than one file.
Seymore4Head <Seymore4Head@Hotmail.invalid> writes:

> How can I modify the code to search through a directory of files that
> have different filenames, but the same extension?

Use the os.listdir function to read the directory. It gives you a list of filenames that you can filter for the extension you want. Per Mark Lawrence, there's also a glob function.
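
A minimal Python 3 sketch of the os.listdir approach described above, assuming a plain substring match is what is wanted. The function name search_directory and its parameters are made up for illustration.

```python
import os


def search_directory(directory, needle, extension=".txt"):
    """Return (filename, line_number, line) for each line containing needle."""
    matches = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # os.listdir returns sub-directories too, so keep regular files only.
        if not name.endswith(extension) or not os.path.isfile(path):
            continue
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    matches.append((name, number, line.rstrip("\n")))
    return matches
```

For example, search_directory(".", "spam") would report every line mentioning "spam" in the .txt files of the current directory, without descending into sub-directories.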
Re: Searching through more than one file.
On 12/28/2014 12:27 PM, Seymore4Head wrote:

> I need to search through a directory of text files for a string.
> Here is a short program I made in the past to search through a single
> text file for a line of text.
>
> How can I modify the code to search through a directory of files that
> have different filenames, but the same extension?

You have two other replies to your specific question, glob and os.listdir. I would also mention the module fileinput:

https://docs.python.org/2/library/fileinput.html

    import fileinput
    from glob import glob

    fnames = glob('*.txt')
    for line in fileinput.input(fnames):
        pass # do whatever

If you're not on Windows, I'd mention that the shell will expand the wildcards for you, so you could get the filenames from argv even simpler. See first example on the above web page.

I'm more concerned that you think the following code you supplied does a search for a string. It does something entirely different, involving making a crude dictionary. But it could be reduced to just a few lines, and probably take much less memory, if this is really the code you're working on.

>     fname = raw_input("Enter file name: ")  #*.txt
>     fh = open(fname)
>     lst = list()
>     biglst=[]
>     for line in fh:
>         line=line.rstrip()
>         line=line.split()
>         biglst+=line
>     final=[]
>     for out in biglst:
>         if out not in final:
>             final.append(out)
>     final.sort()
>     print (final)

Something like the following:

    import fileinput
    from glob import glob

    res = set()
    fnames = glob('*.txt')
    for line in fileinput.input(fnames):
        res.update(line.rstrip().split())
    print sorted(res)

--
DaveA
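
Since the original question was about searching for a string, here is a hedged Python 3 sketch of how the fileinput loop above could do that, reporting the file name and line number of each hit. The function name search_files is hypothetical; fileinput.input(), filename() and filelineno() are the standard-library calls.

```python
import fileinput
from glob import glob


def search_files(needle, pattern="*.txt"):
    """Return (filename, line_number, line) for every line containing needle."""
    fnames = sorted(glob(pattern))
    hits = []
    if not fnames:
        # With an empty list, fileinput would fall back to reading stdin.
        return hits
    # The context manager closes the input sequence when the loop ends.
    with fileinput.input(files=fnames) as stream:
        for line in stream:
            if needle in line:
                hits.append((stream.filename(), stream.filelineno(),
                             line.rstrip("\n")))
    return hits
```

The early return matters: passing an empty file list to fileinput makes it read standard input, which would look like a hang when no file matches the pattern.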
Re: Searching through more than one file.
On 12/28/2014 02:12 PM, Dave Angel wrote:

> On 12/28/2014 12:27 PM, Seymore4Head wrote:
>> I need to search through a directory of text files for a string.
>> Here is a short program I made in the past to search through a single
>> text file for a line of text.
>>
>> How can I modify the code to search through a directory of files that
>> have different filenames, but the same extension?
>
> You have two other replies to your specific question, glob and
> os.listdir. I would also mention the module fileinput:
>
> https://docs.python.org/2/library/fileinput.html
>
>     import fileinput
>     from glob import glob
>
>     fnames = glob('*.txt')
>     for line in fileinput.input(fnames):
>         pass # do whatever
>
> If you're not on Windows, I'd mention that the shell will expand the
> wildcards for you, so you could get the filenames from argv even
> simpler. See first example on the above web page.
>
> I'm more concerned that you think the following code you supplied does
> a search for a string. It does something entirely different, involving
> making a crude dictionary. But it could be reduced to just a few lines,
> and probably take much less memory, if this is really the code you're
> working on.

Note: the changes I suggest also should be tons faster, if you have very many words you're parsing this way.

>>     fname = raw_input("Enter file name: ")  #*.txt
>>     fh = open(fname)
>>     lst = list()
>>     biglst=[]
>>     for line in fh:
>>         line=line.rstrip()
>>         line=line.split()
>>         biglst+=line
>>     final=[]
>>     for out in biglst:
>>         if out not in final:
>>             final.append(out)
>>     final.sort()
>>     print (final)
>
> Something like the following:

Untested, I should have said.

>     import fileinput
>     from glob import glob
>
>     res = set()
>     fnames = glob('*.txt')
>     for line in fileinput.input(fnames):
>         res.update(line.rstrip().split())

And I should have omitted the rstrip(), which does nothing that split() isn't already going to do.

>     print sorted(res)

--
DaveA
Re: Searching through more than one file.
Dave Angel da...@davea.name writes:

>     res = set()
>     fnames = glob('*.txt')
>     for line in fileinput.input(fnames):
>         res.update(line.rstrip().split())
>     print sorted(res)

Untested:

    print sorted(set(word for line in fileinput.input(fnames)
                     for word in line.split()))
Re: Searching through more than one file.
On 12/28/2014 12:27 PM, Seymore4Head wrote:

> I need to search through a directory of text files for a string.
> Here is a short program I made in the past to search through a single
> text file for a line of text.
>
> How can I modify the code to search through a directory of files that
> have different filenames, but the same extension?

You could simplify the relevant parts of idlelib/grep.py

--
Terry Jan Reedy
Re: Searching through more than one file.
On Sunday, December 28, 2014 11:29:48 AM UTC-6, Seymore4Head wrote:

> I need to search through a directory of text files for a string.
> Here is a short program I made in the past to search through a single
> text file for a line of text.

Step1: Search through a single file.
# Just a few more brush strokes...

Step2: Search through all files in a directory.
# Time to go exploring!

Step3: Option to filter by file extension.
# Waste not, want not!

Step4: Option for recursing down sub-directories.
# Look out, deeply nested structures, here I come!
# Look out, deeply nested structures, here I come!
# Look out, deeply nested structures, here I come!
# Look out, deeply nested structures, here I come!
# Look out, deeply nested structures, here I come!
# Look out, deeply nested structures, here I come!
# Look out, deeply nested structures, here I come!
[Oops, fell into a recursive black hole!]
# Look out, deeply nested structures, here I come!
# Look out, deeply nested structures, here I come!
# Look out, deeply nested structures, here I come!
# Look out, deeply nested structures, here I come!
[BREAK]
# Whew, no worries, MaximumRecursionError is my best friend!

;-)

In addition to the other advice, you might want to check out os.walk().