Re: Searching through more than one file.

2014-12-29 Thread Albert-Jan Roskam

On Sun, Dec 28, 2014 8:12 PM CET Dave Angel wrote:

On 12/28/2014 12:27 PM, Seymore4Head wrote:
 I need to search through a directory of text files for a string.
 Here is a short program I made in the past to search through a single
 text file for a line of text.
 
 How can I modify the code to search through a directory of files that
 have different filenames, but the same extension?
 

You have two other replies to your specific question, glob and os.listdir.  I 
would also mention the module fileinput:

https://docs.python.org/2/library/fileinput.html


Ah, I was just about to say that. I found out about this gem after reading 
Doug Hellmann's book. Here are some usage examples: 
http://pymotw.com/2/fileinput/


import fileinput
from glob import glob

fnames = glob('*.txt')
for line in fileinput.input(fnames):
    pass  # do whatever

If you're not on Windows, I'd mention that the shell will expand the wildcards 
for you, so you could get the filenames from argv even simpler.  See first 
example on the above web page.


I'm more concerned that you think the following code you supplied does a 
search for a string.  It does something entirely different, involving making a 
crude dictionary.  But it could be reduced to just a few lines, and probably 
take much less memory, if this is really the code you're working on.

 fname = raw_input("Enter file name: ")  #*.txt
 fh = open(fname)
 lst = list()
 biglst=[]
 for line in fh:
     line=line.rstrip()
     line=line.split()
     biglst+=line
 final=[]
 for out in biglst:
     if out not in final:
         final.append(out)
 final.sort()
 print (final)
 

Something like the following:

import fileinput
from glob import glob

res = set()
fnames = glob('*.txt')
for line in fileinput.input(fnames):
    res.update(line.rstrip().split())
print sorted(res)




-- DaveA
-- https://mail.python.org/mailman/listinfo/python-list



Re: Searching through more than one file.

2014-12-29 Thread Cem Karan

On Dec 29, 2014, at 2:47 AM, Rick Johnson rantingrickjohn...@gmail.com wrote:

 On Sunday, December 28, 2014 11:29:48 AM UTC-6, Seymore4Head wrote:
 I need to search through a directory of text files for a string.
 Here is a short program I made in the past to search through a single
 text file for a line of text.
 
 Step1: Search through a single file. 
 # Just a few more brush strokes...
 
 Step2: Search through all files in a directory. 
 # Time to go exploring! 
 
 Step3: Option to filter by file extension. 
 # Waste not, want not!
 
 Step4: Option for recursing down sub-directories. 
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 [Oops, fell into a recursive black hole!]
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 # Look out deeply nested structures, here i come!
 [BREAK]
 # Whew, no worries, MaximumRecursionError is my best friend! 
 
 ;-)
 
 In addition to the other advice, you might want to check out os.walk()

DEFINITELY use os.walk() if you're going to recurse through a directory tree.  
Here is an untested program I wrote that should do what you want.  Modify as 
needed:


# This is all Python 3 code, although I believe it will run under Python 2
# as well.

# os.path is documented at https://docs.python.org/3/library/os.path.html
# os.walk is documented at https://docs.python.org/3/library/os.html#os.walk
# logging is documented at https://docs.python.org/3/library/logging.html

import os
import os.path
import logging

# Logging messages can be filtered by level.  If you set the level really
# low, then low-level messages, and all higher-level messages, will be
# logged.  However, if you set the filtering level higher, then low-level
# messages will not be logged.  Debug messages are lower than info messages,
# so if you comment out the first line, and uncomment the second, you will
# only get info messages (right now you're getting both).  If you look
# through the code, you'll see that I go up in levels as I work my way
# inward through the filters; this makes debugging really, really easy.
# I'll start out with my level high, and if my code works, I'm done.
# However, if there is a bug, I'll work my way downwards towards lower and
# lower debug levels, which gives me more and more information.  Eventually
# I'll hit a level where I know enough about what is going on that I can
# fix the problem.  By the way, if you comment out both lines, you shouldn't
# have any logging at all.
logging.basicConfig(level=logging.DEBUG)
##logging.basicConfig(level=logging.INFO)

EXTENSIONS = {".txt"}

def do_something_useful(real_path):
    # I deleted the original message, so I have no idea
    # what you were trying to accomplish, so I'm punting
    # the definition of this function back to you.
    pass

for root, dirs, files in os.walk('/'):
    for f in files:
        # This expands symbolic links, cleans up double slashes, etc.
        # This can be useful when you're trying to debug why something
        # isn't working via logging.
        real_path = os.path.realpath(os.path.join(root, f))
        logging.debug("operating on path '{0!s}'".format(real_path))
        (r, e) = os.path.splitext(real_path)
        if e in EXTENSIONS:
            # If we've made a mistake in our EXTENSIONS set, we might never
            # reach this point.
            logging.info("Selected path '{0!s}'".format(real_path))
            do_something_useful(real_path)


As a note, for the sake of speed and your own sanity, you probably want to do 
the easiest/computationally cheapest filtering first here.  That means 
selecting the files that match your extensions first, and then filtering those 
files by their contents second.
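To illustrate the cheap-filter-first ordering, here is a minimal Python 3 sketch that checks the extension before opening any file. The function name search_tree and its parameters are made up for this example and do not come from the thread; treat it as one possible arrangement, not a finished tool.

```python
import os


def search_tree(top, needle, extensions=(".txt",)):
    """Yield (path, line_number, line) for each matching line under top."""
    for root, dirs, files in os.walk(top):
        for name in files:
            # Cheap check first: comparing the extension reads no file data.
            if os.path.splitext(name)[1] not in extensions:
                continue
            path = os.path.join(root, name)
            # Expensive check second: only now open and scan the contents.
            # errors="replace" keeps undecodable bytes from aborting the scan.
            with open(path, errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, lineno, line.rstrip("\n")
```

Because it is a generator, results can be printed as they are found, for example with "for path, lineno, text in search_tree('.', 'some string'): print(path, lineno, text)".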

Finally, if you are planning on parsing command-line options, DON'T do it by 
hand!  Use argparse (https://docs.python.org/3/library/argparse.html) instead.
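As a rough idea of what that could look like, here is a small argparse sketch. The option names (needle, directory, --ext, --recursive) are assumptions chosen for this example; nothing in the thread specifies a command-line interface.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical 'search files for a string' script."""
    parser = argparse.ArgumentParser(description="Search files for a string.")
    parser.add_argument("needle", help="string to look for")
    parser.add_argument("directory", nargs="?", default=".",
                        help="directory to search (default: current directory)")
    # default=None rather than a list, so repeated -e options do not get
    # appended onto a shared default list.
    parser.add_argument("-e", "--ext", action="append", default=None,
                        help="file extension to include; may be repeated")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="descend into sub-directories")
    return parser
```

Calling build_parser().parse_args() then returns an object with needle, directory, ext and recursive attributes, and argparse generates the --help text and error messages for you.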

Thanks,
Cem Karan



Searching through more than one file.

2014-12-28 Thread Seymore4Head
I need to search through a directory of text files for a string.
Here is a short program I made in the past to search through a single
text file for a line of text.

How can I modify the code to search through a directory of files that
have different filenames, but the same extension?

fname = raw_input("Enter file name: ")  #*.txt
fh = open(fname)
lst = list()
biglst=[]
for line in fh:
    line=line.rstrip()
    line=line.split()
    biglst+=line
final=[]
for out in biglst:
    if out not in final:
        final.append(out)
final.sort()
print (final)


Re: Searching through more than one file.

2014-12-28 Thread Mark Lawrence

On 28/12/2014 17:27, Seymore4Head wrote:

I need to search through a directory of text files for a string.
Here is a short program I made in the past to search through a single
text file for a line of text.

How can I modify the code to search through a directory of files that
have different filenames, but the same extension?

fname = raw_input("Enter file name: ")  #*.txt
fh = open(fname)
lst = list()
biglst=[]
for line in fh:
    line=line.rstrip()
    line=line.split()
    biglst+=line
final=[]
for out in biglst:
    if out not in final:
        final.append(out)
final.sort()
print (final)



See the glob function in the glob module here 
https://docs.python.org/3/library/glob.html#module-glob


Similar functionality is available in the pathlib module 
https://docs.python.org/3/library/pathlib.html#module-pathlib but this 
is only available from Python 3.4 onward.
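A minimal pathlib sketch, assuming Python 3.4 or later. The helper name text_files and its default pattern are made up for this example.

```python
from pathlib import Path


def text_files(directory, pattern="*.txt"):
    """Return the paths in directory matching pattern, in sorted order."""
    # Path.glob matches only within this directory; Path.rglob(pattern)
    # would also descend into sub-directories.
    return sorted(Path(directory).glob(pattern))
```

Each returned item is a Path object, so it can be passed to open() or converted with str() where a plain filename is needed.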


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: Searching through more than one file.

2014-12-28 Thread Paul Rubin
Seymore4Head Seymore4Head@Hotmail.invalid writes:
 How can I modify the code to search through a directory of files that
 have different filenames, but the same extension?

Use the os.listdir function to read the directory.  It gives you a list
of filenames that you can filter for the extension you want.

Per Mark Lawrence, there's also a glob function.
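A short Python 3 sketch of the os.listdir approach might look like this. The function name files_with_extension is an assumption for the example, not something from the thread.

```python
import os


def files_with_extension(directory, extension=".txt"):
    """Return sorted paths of regular files in directory ending in extension."""
    paths = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # os.listdir returns bare names (files and directories alike), so
        # join with the directory and skip anything that is not a file.
        if name.endswith(extension) and os.path.isfile(path):
            paths.append(path)
    return sorted(paths)
```

The resulting list can be handed straight to fileinput.input() or looped over with open().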


Re: Searching through more than one file.

2014-12-28 Thread Dave Angel

On 12/28/2014 12:27 PM, Seymore4Head wrote:

I need to search through a directory of text files for a string.
Here is a short program I made in the past to search through a single
text file for a line of text.

How can I modify the code to search through a directory of files that
have different filenames, but the same extension?



You have two other replies to your specific question, glob and 
os.listdir.  I would also mention the module fileinput:


https://docs.python.org/2/library/fileinput.html

import fileinput
from glob import glob

fnames = glob('*.txt')
for line in fileinput.input(fnames):
    pass  # do whatever

If you're not on Windows, I'd mention that the shell will expand the 
wildcards for you, so you could get the filenames from argv even 
simpler.  See first example on the above web page.
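As a rough sketch of the argv variant in Python 3: when no file list is given, fileinput reads the names from sys.argv[1:] (or standard input if there are none). The function name find_matches and its parameters are assumptions made for this example.

```python
import fileinput


def find_matches(needle, files=None):
    """Yield (filename, line_number, line) for each line containing needle.

    With files=None, FileInput falls back to sys.argv[1:], so the shell's
    wildcard expansion supplies the file names.
    """
    with fileinput.FileInput(files=files) as stream:
        for line in stream:
            if needle in line:
                # filelineno() counts lines within the current file only.
                yield stream.filename(), stream.filelineno(), line.rstrip("\n")
```

Saved in a script that loops over find_matches("some string") and prints each result, this could be run as "python yourscript.py *.txt" on a shell that expands wildcards; the script name here is hypothetical.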



I'm more concerned that you think the following code you supplied does a 
search for a string.  It does something entirely different, involving 
making a crude dictionary.  But it could be reduced to just a few lines, 
and probably take much less memory, if this is really the code you're 
working on.



fname = raw_input("Enter file name: ")  #*.txt
fh = open(fname)
lst = list()
biglst=[]
for line in fh:
    line=line.rstrip()
    line=line.split()
    biglst+=line
final=[]
for out in biglst:
    if out not in final:
        final.append(out)
final.sort()
print (final)



Something like the following:

import fileinput
from glob import glob

res = set()
fnames = glob('*.txt')
for line in fileinput.input(fnames):
    res.update(line.rstrip().split())
print sorted(res)




--
DaveA


Re: Searching through more than one file.

2014-12-28 Thread Dave Angel

On 12/28/2014 02:12 PM, Dave Angel wrote:

On 12/28/2014 12:27 PM, Seymore4Head wrote:

I need to search through a directory of text files for a string.
Here is a short program I made in the past to search through a single
text file for a line of text.

How can I modify the code to search through a directory of files that
have different filenames, but the same extension?



You have two other replies to your specific question, glob and
os.listdir.  I would also mention the module fileinput:

https://docs.python.org/2/library/fileinput.html

import fileinput
from glob import glob

fnames = glob('*.txt')
for line in fileinput.input(fnames):
 pass # do whatever

If you're not on Windows, I'd mention that the shell will expand the
wildcards for you, so you could get the filenames from argv even
simpler.  See first example on the above web page.


I'm more concerned that you think the following code you supplied does a
search for a string.  It does something entirely different, involving
making a crude dictionary.  But it could be reduced to just a few lines,
and probably take much less memory, if this is really the code you're
working on.


Note:  the changes I suggest also should be tons faster, if you have 
very many words you're parsing this way.





fname = raw_input("Enter file name: ")  #*.txt
fh = open(fname)
lst = list()
biglst=[]
for line in fh:
    line=line.rstrip()
    line=line.split()
    biglst+=line
final=[]
for out in biglst:
    if out not in final:
        final.append(out)
final.sort()
print (final)







Something like the following:

Untested, I should have said.



import fileinput
from glob import glob

res = set()
fnames = glob('*.txt')
for line in fileinput.input(fnames):
 res.update(line.rstrip().split())


And I should have omitted the rstrip(), which does nothing that split() 
isn't already going to do.



print sorted(res)
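For reference, a small sketch of the set-based approach without the redundant rstrip(). It relies on str.split() with no arguments discarding leading and trailing whitespace, including the newline. The function name unique_words is made up for this example.

```python
def unique_words(lines):
    """Return the distinct whitespace-separated words in lines, sorted."""
    words = set()
    for line in lines:
        # split() with no arguments already ignores the trailing newline,
        # so no rstrip() is needed first.
        words.update(line.split())
    return sorted(words)
```

Any iterable of lines works as the argument, including an open file or the object returned by fileinput.input(fnames).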







--
DaveA


Re: Searching through more than one file.

2014-12-28 Thread Paul Rubin
Dave Angel da...@davea.name writes:
 res = set()
 fnames = glob('*.txt')
 for line in fileinput.input(fnames):
     res.update(line.rstrip().split())
 print sorted(res)

Untested:

print sorted(set(word for line in fileinput.input(fnames) for word in line.split()))


Re: Searching through more than one file.

2014-12-28 Thread Terry Reedy

On 12/28/2014 12:27 PM, Seymore4Head wrote:

I need to search through a directory of text files for a string.
Here is a short program I made in the past to search through a single
text file for a line of text.

How can I modify the code to search through a directory of files that
have different filenames, but the same extension?


You could simplify the relevant parts of idlelib/grep.py

--
Terry Jan Reedy



Re: Searching through more than one file.

2014-12-28 Thread Rick Johnson
On Sunday, December 28, 2014 11:29:48 AM UTC-6, Seymore4Head wrote:
 I need to search through a directory of text files for a string.
 Here is a short program I made in the past to search through a single
 text file for a line of text.

Step1: Search through a single file. 
# Just a few more brush strokes...

Step2: Search through all files in a directory. 
# Time to go exploring! 

Step3: Option to filter by file extension. 
# Waste not, want not!

Step4: Option for recursing down sub-directories. 
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
 [Oops, fell into a recursive black hole!]
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
# Look out deeply nested structures, here i come!
 [BREAK]
# Whew, no worries, MaximumRecursionError is my best friend! 

;-)

In addition to the other advice, you might want to check out os.walk().