Re: [Newbie] Search-and-delete text processing problem...

2005-04-01 Thread Bengt Richter
On Fri, 1 Apr 2005 17:33:59 -0800, "Todd_Calhoun" <[EMAIL PROTECTED]> wrote:

>I'm trying to learn about text processing in Python, and I'm trying to 
>tackle what should be a simple task.
>
>I have long text files of books with a citation between each paragraph,
Most text files aren't long enough to worry about, but you can avoid
reading in the whole file by just iterating, one line at a time. That is
the way a file object iterates by default, so there's not much to that.
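
As a rough sketch of that iteration (io.StringIO stands in for the real file here, and the sample text is made up), a file-like object hands back one line per pass through the loop:

```python
import io

# io.StringIO is a stand-in for open("test_f.txt"); both iterate line by line.
sample = io.StringIO(
    "First paragraph.\n"
    "Bill D. Smith, History through the Ages, p.5\n"
    "Second paragraph.\n"
)

seen = []
for line in sample:          # only one line is held at a time
    seen.append(len(line))   # each line still carries its trailing '\n'

print(seen)
```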

>which might be like "Bill D. Smith, History through the Ages, p.5".
>
>So, I need to search for every line that starts with a certain string (in 
>this example, "Bill D. Smith"), and delete the whole line.
If you want to test what a string starts with, there's a string method for that.
E.g., if line is the string representing one line, line.startswith('Bill') would
return True or False.
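
A quick sketch of how startswith behaves, using the citation from the question as sample data. It only matches at the very beginning of the string, so leading whitespace defeats it unless you strip it first:

```python
citation = "Bill D. Smith, History through the Ages, p.5\n"
indented = "   " + citation

starts = citation.startswith("Bill D. Smith")          # True
missed = indented.startswith("Bill D. Smith")          # False: leading spaces
fixed = indented.lstrip().startswith("Bill D. Smith")  # True after stripping

print(starts, missed, fixed)
```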
>
>I've tried a couple of different things, but none seem to work.  Here's my 
>latest try.  I apologize in advance for being so clueless.
>
>##
>#Text search and delete line tool
>
>theInFile = open("test_f.txt", "r")
>theOutFile = open("test_f_out.txt", "w")
>
>allLines = theInFile.readlines()
This will create a list of lines, all (except perhaps the last, if
the file had no end-of-line character(s) at the very end) with '\n'
as the last character. There are various ways to strip the line ends,
but your use case doesn't appear to require it.
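
For illustration only (io.StringIO again stands in for a real file, and the three-line text is invented), this is roughly what readlines() hands back when the last line has no line ending:

```python
import io

# Stand-in for a file whose last line has no trailing newline.
all_lines = io.StringIO("one\ntwo\nthree").readlines()
print(all_lines)   # ['one\n', 'two\n', 'three']

# Strip line ends only if you need the bare text.
bare = [line.rstrip("\n") for line in all_lines]
print(bare)        # ['one', 'two', 'three']
```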

>
>for line in allLines:
 # line at this point contains each line successively as the loop proceeds,
 # but you don't know where in the sequence you are unless you provide for it,
 # e.g. by using
 for i, line in enumerate(allLines):
>if line[3] == 'Bill':
The above line is comparing the 4th character of the line (index 3, counting
from 0) with 'Bill', which is never going to be true, and will raise an
IndexError if the line is shorter than 4 characters. Not what you want to do.
 if line.startswith('Bill'):  # note that this is case sensitive. Otherwise
                              # use line.lower().startswith('bill')
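
A small sketch of the difference between indexing and slicing, using the citation from the question as the sample line (the variable names are made up here):

```python
line = "Bill D. Smith, History through the Ages, p.5\n"

fourth_char = line[3]     # a single character: 'l'
first_four = line[:4]     # a slice of four characters: 'Bill'

print(fourth_char == "Bill")   # False
print(first_four == "Bill")    # True

short = "ab"
print(short[:4])               # 'ab', slicing never raises
try:
    short[3]
except IndexError:
    print("IndexError for short[3]")
```

The slice comparison works, but startswith says the same thing without your having to count characters.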

>line == ' '
 Note that line == ' ' is a comparison, not an assignment, so it changes nothing.
 The enumerate will give you an index you can use for this, but I doubt
 that you want an invisible space without a line ending in place of 'Bill ... \n'
 allLines[i] = '\n'  # makes an actual blank line, but you want to delete
                     # the line, so this is not going to work either
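
To illustrate deleting rather than blanking (a sketch with made-up sample lines; kept is a name introduced here): rebinding the loop variable never touches the list, so the usual approach is to build a new list of the lines you want to keep.

```python
all_lines = [
    "First paragraph.\n",
    "Bill D. Smith, History through the Ages, p.5\n",
    "Second paragraph.\n",
]

for line in all_lines:
    if line.startswith("Bill"):
        line = " "      # rebinds the local name only; all_lines is unchanged

# Build a new list that simply leaves the unwanted lines out.
kept = [line for line in all_lines if not line.startswith("Bill")]
print(len(all_lines), len(kept))
```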
>

>
>theOutFile.writelines(allLines)

UIAM (untested) you should be able to do the entire job of removing lines that
start with 'Bill' thus:

 theInFile = open("test_f.txt", "r")
 theOutFile = open("test_f_out.txt", "w")
 theOutFile.writelines(line for line in theInFile if not line.startswith('Bill'))

Or just the line

 open("test_f_out.txt", "w").writelines(L for L in open("test_f.txt") if not L.startswith('Bill'))
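
In current Python versions the same filter is often written with the with statement, which closes both files even if an error occurs partway through. A sketch, where remove_lines and its parameter names are made up for illustration and a temporary directory stands in for the real files:

```python
import os
import tempfile

def remove_lines(in_path, out_path, prefix):
    """Copy in_path to out_path, skipping lines that start with prefix."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(line for line in src if not line.startswith(prefix))

# Small demonstration using a temporary directory instead of test_f.txt.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "test_f.txt")
    out_path = os.path.join(tmp, "test_f_out.txt")
    with open(in_path, "w") as f:
        f.write("Para one.\nBill D. Smith, History through the Ages, p.5\nPara two.\n")
    remove_lines(in_path, out_path, "Bill")
    with open(out_path) as f:
        result = f.read()

print(repr(result))
```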

If you need to remove lines starting with any name in a certain list, you can 
do that too, e.g.,

 delStarts = ['Bill', 'Bob', 'Sue']
 theInFile = open("test_f.txt", "r")
 theOutFile = open("test_f_out.txt", "w")
 for line in theInFile:
     for name in delStarts:
         if line.startswith(name): break
     else: # will happen if there is NO break, so line does not start with any delStarts name
         theOutFile.write(line) # write line out if not starting badly
 
(You could do that with a oneliner too, but it gets silly ;-)
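
For what it's worth, startswith also accepts a tuple of prefixes (Python 2.5 and later), which replaces the inner loop. A sketch with made-up sample lines and names:

```python
del_starts = ("Bill", "Bob", "Sue")   # must be a tuple, not a list

lines = [
    "Bill D. Smith, p.5\n",
    "Some prose.\n",
    "Sue Jones, p.9\n",
    "Bobbin lace.\n",
]

# One call checks every prefix in the tuple.
kept = [line for line in lines if not line.startswith(del_starts)]
print(kept)
```

Note that "Bobbin lace." is dropped too, since it starts with 'Bob'. A longer prefix such as 'Bob ' or the full author name avoids that kind of accident.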

If you have a LOT of names to check for, it could pay you to figure out a way to
split off the name from the front of each line, and check whether that is in a
set instead of using a delStarts list.
If you do use delStarts, put the most popular names at the front.
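
A minimal sketch of the set idea, assuming every citation looks like "Name, Title, p.N" as in the example, so the name is whatever precedes the first comma. The names in del_names and the helper is_citation are hypothetical:

```python
del_names = {"Bill D. Smith", "Bob Brown", "Sue Jones"}   # hypothetical names

def is_citation(line):
    """True if the text before the first comma is a known author name."""
    name = line.split(",", 1)[0].strip()
    return name in del_names   # set lookup does not slow down as names are added

lines = [
    "First paragraph.\n",
    "Bill D. Smith, History through the Ages, p.5\n",
    "Bill went home, eventually.\n",
]
kept = [line for line in lines if not is_citation(line)]
print(kept)
```

Because the whole name has to match, ordinary prose that merely begins with 'Bill' is left alone.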

>#
>
>I know I could do it in Word fairly easily, but I'd like to learn the Python 
>way to do things.
Have fun.
>
>Thanks for any advice. 
>
HTH (nothing tested, sorry ;-)

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list


[Newbie] Search-and-delete text processing problem...

2005-04-01 Thread Todd_Calhoun
I'm trying to learn about text processing in Python, and I'm trying to 
tackle what should be a simple task.

I have long text files of books with a citation between each paragraph, 
which might be like "Bill D. Smith, History through the Ages, p.5".

So, I need to search for every line that starts with a certain string (in 
this example, "Bill D. Smith"), and delete the whole line.

I've tried a couple of different things, but none seem to work.  Here's my 
latest try.  I apologize in advance for being so clueless.

##
#Text search and delete line tool

theInFile = open("test_f.txt", "r")
theOutFile = open("test_f_out.txt", "w")

allLines = theInFile.readlines()

for line in allLines:
    if line[3] == 'Bill':
        line == ' '


theOutFile.writelines(allLines)
#

I know I could do it in Word fairly easily, but I'd like to learn the Python 
way to do things.

Thanks for any advice. 

