Re: [Tutor] Matching zipcode in address file

2010-04-06 Thread Alan Gauld


"TGW"  wrote 

I got it. I was comparing '345' to '345\n'

Adding the '\n' to the slice did indeed do the trick.


Yes, the problem is that the data in the file always has a \n at the end.
So you either have to rstrip() that off when you read it from the file 
or add a \n to your source data when comparing it with file data.


Personally I usually use strip() so that I'm working with 'clean' data
both for source and reference.
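
A minimal sketch of that habit, using plain strings instead of real files. The names file_line and wanted are made up for illustration:

```python
# A line read from a file keeps its trailing newline.
file_line = '345\n'
wanted = '345'

print(file_line == wanted)           # False: '345\n' is not '345'
print(file_line.strip() == wanted)   # True once the newline is stripped

# rstrip('\n') removes only trailing newlines and keeps other whitespace,
# which can matter for fixed-width records.
print(repr('345 \n'.rstrip('\n')))
```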

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Matching zipcode in address file

2010-04-05 Thread TGW

I got it. I was comparing '345' to '345\n'

Adding the '\n' to the slice did indeed do the trick.

#!/usr/bin/env python

import string

def main():
 infile = open("filex")
 outfile = open("results_testx", "w")
 zips = open("zippys", "r")
 match_zips = zips.readlines()
 lines = [line for line in infile if (line[0:3] + '\n') in match_zips]

 outfile.write(''.join(lines))
# print lines[0:2]
 zips.close()
 infile.close()
 outfile.close()
main()



Re: [Tutor] Matching zipcode in address file

2010-04-05 Thread TGW

OK - you handled the problem regarding reading to end-of-file. Yes it
takes a lot longer, because now you are actually iterating through
match_zips for each line.

How large are these files? Consider creating a set from match_zips. As
lists get longer, set membership tests become faster than list
membership tests.

If the outfile is empty that means that line[149:154] is never in
match_zips.

I suggest you take a look at match_zips. You will find a list of
strings of length 6, which cannot match line[149:154], a string of
length 5.


I am still struggling with this. I have simplified the code, because
I need to understand the principle.


#!/usr/bin/env python

import string

def main():
 infile = open("filex")
 outfile = open("results_testx", "w")
 zips = open("zippys", "r")
 match_zips = zips.readlines()
 lines = [line for line in infile if line[0:3] + '\n' in match_zips]

 outfile.write(''.join(lines))
 print line[0:3]
 zips.close()
 infile.close()
 outfile.close()
main()

filex:
112332424
23423423423
34523423423
456234234234
234234234234
5672342342
683824242

zippys:
123
123
234
345
456
567
678
555


I want to output records from filex whose first 3 characters match a
record in zippys. Output:

23423423423
34523423423
456234234234
234234234234
5672342342

I am not sure where I should put a '\n' or tweak something that I just  
cannot see.


Thanks


Re: [Tutor] Matching zipcode in address file

2010-04-05 Thread bob gailer

On 4/5/2010 1:15 AM, TGW wrote:

Sorry - my mistake - try:

infile = open("filex")
match_zips = open("zippys")
result = [line for line in infile if line in match_zips]
print result
When I apply the readlines to the original file, it is taking a lot
longer to process and the outfile still remains blank. Any suggestions?


OK - you handled the problem regarding reading to end-of-file. Yes it 
takes a lot longer, because now you are actually iterating through 
match_zips for each line.


How large are these files? Consider creating a set from match_zips. As
lists get longer, set membership tests become faster than list membership
tests.
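
A rough sketch of the set idea. The list below stands in for what zips.readlines() would return, and the zip codes are invented for illustration:

```python
# Stand-in for zips.readlines(); sample values are made up.
zip_lines = ['87101\n', '87102\n', '87501\n']

# Build the set once, stripping the newline so each entry is 5 characters.
match_zips = set(line.strip() for line in zip_lines)

# A set lookup hashes the key instead of scanning every element.
print('87102' in match_zips)   # True
print('99999' in match_zips)   # False
```

For a handful of zip codes the difference is negligible; it only starts to matter when both files are large.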


If the outfile is empty that means that line[149:154] is never in 
match_zips.


I suggest you take a look at match_zips. You will find a list of strings 
of length 6, which cannot match line[149:154], a string of length 5.
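
One way to see the mismatch for yourself is to print repr() and len() of an entry. This is only a sketch with invented sample values, with a list standing in for the result of readlines():

```python
# What readlines() would return for a file with one zip code per line.
match_zips = ['87101\n', '87102\n']

# A 5-character slice, like the one line[149:154] produces.
zip_slice = '87101'

# repr() makes the hidden newline visible; len() shows the extra character.
print(repr(match_zips[0]), len(match_zips[0]))

print(zip_slice in match_zips)          # False: 5 chars never equal 6 chars
print(zip_slice + '\n' in match_zips)   # True once the lengths agree
```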




#!/usr/bin/env python
# Find records that match zipcodes in zips.txt

import os
import sys

def main():
    infile = open("/Users/tgw/NM_2010/NM_APR.txt", "r")
    outfile = open("zip_match_apr_2010.txt", "w")
    zips = open("zips.txt", "r")
    match_zips = zips.readlines()
    lines = [ line for line in infile if line[149:154] in match_zips ]

    outfile.write(''.join(lines))
    #print line[149:154]
    print lines
    infile.close()
    outfile.close()
main()






--
Bob Gailer
919-636-4239
Chapel Hill NC



Re: [Tutor] Matching zipcode in address file

2010-04-05 Thread ALAN GAULD
Please use Reply All when responding to the list.

 

> lines = [line for line in infile if line[149:154] not in match_zips]
>
>Nope. I tried that. I actually modified your comprehension 
>that you provided about a month ago. 
>Works great for NOT matching, but can't figure out how to match. 
>Do you have another suggestion?
>
>def main():
>
> infile = open("/Users/tgw/NM_2010/NM_APR.txt", "r")
> outfile = open("zip_match_apr_2010.txt", "w")
> match_zips = open("zips.txt", "r")
>
>You probably are best to read the zips file into a list, 
>stripping the newlines: 
>
>
>match_zips = [match.strip() for match in open('zips.txt')]
>
>then
>
> lines = [line for line in infile if line[149:154] in match_zips] 
>Should work...
Either that or add a newline to the end of the slice.
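
Both options can be sketched side by side. The lists below stand in for the two files and reuse the small filex/zippys test data posted elsewhere in this thread, so the slice is [0:3] rather than [149:154]:

```python
# Stand-ins for the lines of filex and zippys (sample data from the thread).
filex = ['112332424\n', '23423423423\n', '34523423423\n', '456234234234\n',
         '234234234234\n', '5672342342\n', '683824242\n']
zippys = ['123\n', '123\n', '234\n', '345\n', '456\n', '567\n', '678\n',
          '555\n']

# Option 1: strip the newlines from the reference data once.
clean_zips = [z.strip() for z in zippys]
matches_stripped = [line for line in filex if line[0:3] in clean_zips]

# Option 2: leave the reference data alone and add a newline to each slice.
matches_newline = [line for line in filex if line[0:3] + '\n' in zippys]

print(matches_stripped == matches_newline)   # True: same records either way
print(''.join(matches_stripped))
```

Option 1 also copes with a last line that has no trailing newline, which option 2 would miss.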

HTH,

Alan G.


Re: [Tutor] Matching zipcode in address file

2010-04-04 Thread TGW
I'd suggest reading the data from the match_zips into a list, and if  
the format isn't correct, doing some post-processing on it.  But  
there's no way to advise on that since we weren't given the format  
of either file.


zipdata = match_zips.readlines()
Then you can do an  if XXX in zipdata with assurance.


Here is a simplified version of the program:
#!/usr/bin/env python

def main():
 infile = open("filex")
 outfile = open("results_testx", "w")
 zips = open("zippys")
 match_zips = zips.readlines()
 results = [line for line in infile if line[0:2] in match_zips]
 outfile.write(''.join(results))

 zips.close()
 infile.close()
 outfile.close()
main()

filex:
112332424
23423423423
34523423423
456234234234
234234234234
5672342342
67824242

zippys:
567
678
555

I want to output the lines in filex whose first 3 chars match an entry
in zippys.


output:
5672342342
67824242


Re: [Tutor] Matching zipcode in address file

2010-04-04 Thread Dave Angel

Alan Gauld wrote:


"TGW"  wrote


I got the program functioning with
lines = [line for line in infile if line[149:154] not in match_zips]

But this matches records that do NOT match zipcodes. How do I get
this running so that it matches zips?



Take out the word 'not' from the comprehension?

That's one change.  But more fundamental is to change the file I/O.  
Since there's no seek() operation, the file continues wherever it left 
off the previous time.


I'd suggest reading the data from the match_zips into a list, and if the 
format isn't correct, doing some post-processing on it.  But there's no 
way to advise on that since we weren't given the format of either file.


zipdata = match_zips.readlines()
Then you can do an  if XXX in zipdata with assurance.
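
Here is a small sketch of the difference, written for Python 3 with io.StringIO objects standing in for the two open files. The data is the three-line test case from elsewhere in this thread:

```python
import io

# In-memory stand-ins for open("infile") and open("match_zips").
infile = io.StringIO("123\n244\n345\n")
match_zips = io.StringIO("123\n234\n345\n")

# 'line in match_zips' reads forward through match_zips until it finds a
# match or reaches end-of-file, and it never rewinds.
result = [line for line in infile if line in match_zips]
print(result)   # ['123\n']

# Rewind both, then read the reference data into a list first.
infile.seek(0)
match_zips.seek(0)
zipdata = match_zips.readlines()

# A list can be searched from the start as many times as needed.
fixed = [line for line in infile if line in zipdata]
print(fixed)    # ['123\n', '345\n']
```

In the first run the search for '244' consumes the rest of match_zips, so '345' is never found even though it is in the file.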

DaveA



Re: [Tutor] Matching zipcode in address file

2010-04-04 Thread TGW

Sorry - my mistake - try:

infile = open("filex")
match_zips = open("zippys")
result = [line for line in infile if line in match_zips]
print result
When I apply the readlines to the original file, it is taking a lot
longer to process and the outfile still remains blank. Any suggestions?


#!/usr/bin/env python
# Find records that match zipcodes in zips.txt

import os
import sys

def main():
    infile = open("/Users/tgw/NM_2010/NM_APR.txt", "r")
    outfile = open("zip_match_apr_2010.txt", "w")
    zips = open("zips.txt", "r")
    match_zips = zips.readlines()
    lines = [ line for line in infile if line[149:154] in match_zips ]

    outfile.write(''.join(lines))
    #print line[149:154]
    print lines
    infile.close()
    outfile.close()
main()




Re: [Tutor] Matching zipcode in address file

2010-04-04 Thread TGW



Sorry - my mistake - try:

infile = open("filex")
match_zips = open("zippys")
result = [line for line in infile if line in match_zips]
print result

OK, thanks. This should do it:

#!/usr/bin/env python

infile = open("filex")
zips = open("zippys")
match_zips = zips.readlines()
results = [line for line in infile if line in match_zips]
print results




Re: [Tutor] Matching zipcode in address file

2010-04-04 Thread bob gailer

Please reply-all so a copy goes to the list.

On 4/4/2010 10:02 PM, TGW wrote:

> I wrote a script that compares two text files (one zip code file, and
> one address file) and tries to output records that match the
> zipcodes. Here is what I have so far:
>
> #!/usr/bin/env python
> # Find records that match zipcodes in zips.txt
>
> def main():
>     infile = open("/Users/tgw/NM_2010/NM_APR.txt", "r")
>     outfile = open("zip_match_apr_2010.txt", "w")
>     match_zips = open("zips.txt", "r")
>
>     lines = [line for line in infile if line[149:154] in match_zips]  # *** I think the problem is here ***

Yep. You are right.

Try a very simple test case; see if you can figure out what's happening:

infile:
123
234
345

match_zips:
123
234
345

infile = open("infile")
match_zips = open("match_zips")
[line for line in infile if line in match_zips]

Now change infile:
123
244
345
and run the program again.

Interesting, no? Does that give you any insights?

I think I am just lost on this one. I have no new insights. What is the exact 
program that you want me to run?
#!/usr/bin/env python

infile = open("filex")
match_zips = open("zippys")
[line for line in infile if line in match_zips]
print line
I did what you said and I get '345' output both times.


Sorry - my mistake - try:

infile = open("filex")
match_zips = open("zippys")
result = [line for line in infile if line in match_zips]
print result



--
Bob Gailer
919-636-4239
Chapel Hill NC



Re: [Tutor] Matching zipcode in address file

2010-04-04 Thread bob gailer

On 4/4/2010 5:18 PM, TGW wrote:
I wrote a script that compares two text files (one zip code file, and 
one address file)  and tries to output records that match the 
zipcodes. Here is what I have so far:


#!/usr/bin/env python
# Find records that match zipcodes in zips.txt

def main():
    infile = open("/Users/tgw/NM_2010/NM_APR.txt", "r")
    outfile = open("zip_match_apr_2010.txt", "w")
    match_zips = open("zips.txt", "r")

    lines = [line for line in infile if line[149:154] in match_zips]  # *** I think the problem is here ***


Yep. You are right.

Try a very simple test case; see if you can figure out what's happening:

infile:
123
234
345

match_zips:
123
234
345

infile = open("infile")
match_zips = open("match_zips")
[line for line in infile if line in match_zips]

Now change infile:
123
244
345
and run the program again.

Interesting, no? Does that give you any insights?



    outfile.write(''.join(lines))
    infile.close()
    outfile.close()
main()

I got the program functioning with
lines = [line for line in infile if line[149:154] not in match_zips]

But this matches records that do NOT match zipcodes. How do I get this 
running so that it matches zips?


Thanks


--
Bob Gailer
919-636-4239
Chapel Hill NC



Re: [Tutor] Matching zipcode in address file

2010-04-04 Thread Alan Gauld


"TGW"  wrote


I got the program functioning with
lines = [line for line in infile if line[149:154] not in match_zips]

But this matches records that do NOT match zipcodes. How do I get this  
running so that it matches zips?



Take out the word 'not' from the comprehension?

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



[Tutor] Matching zipcode in address file

2010-04-04 Thread TGW
I wrote a script that compares two text files (one zip code file, and  
one address file)  and tries to output records that match the  
zipcodes. Here is what I have so far:


#!/usr/bin/env python
# Find records that match zipcodes in zips.txt

def main():
    infile = open("/Users/tgw/NM_2010/NM_APR.txt", "r")
    outfile = open("zip_match_apr_2010.txt", "w")
    match_zips = open("zips.txt", "r")

    lines = [line for line in infile if line[149:154] in match_zips]  # *** I think the problem is here ***

    outfile.write(''.join(lines))
    infile.close()
    outfile.close()
main()

I got the program functioning with
lines = [line for line in infile if line[149:154] not in match_zips]

But this matches records that do NOT match zipcodes. How do I get this  
running so that it matches zips?


Thanks