Re: [Tutor] Matching zipcode in address file
"TGW" wrote I got it. I was comparing '345' to '345\n' Adding the '\n' to the slice did indeed do the trick. Yes, the problem is that the data in the file always has a \n at the end. So you either have to rstrip() that off when you read it from the file or add a \n to your source data when comparing it with file data. Personally I usually use strip() so that I'm working with 'clean' data both for source and reference. -- Alan Gauld Author of the Learn to Program web site http://www.alan-g.me.uk/ ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Matching zipcode in address file
I got it. I was comparing '345' to '345\n'
Adding the '\n' to the slice did indeed do the trick.

    #!/usr/bin/env python

    import string

    def main():
        infile = open("filex")
        outfile = open("results_testx", "w")
        zips = open("zippys", "r")
        match_zips = zips.readlines()
        lines = [line for line in infile if (line[0:3] + '\n') in match_zips]
        outfile.write(''.join(lines))
        # print lines[0:2]
        zips.close()
        infile.close()
        outfile.close()
    main()
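An alternative to appending '\n' to the slice is to strip the newline from the zip codes as they are read, so that both sides of the comparison are clean. The following is only a minimal sketch, written for Python 3. It uses io.StringIO objects as stand-ins for the files filex and zippys from this thread, and the function name match_records and its start/end parameters are hypothetical, introduced here for illustration.

```python
import io


def match_records(records, zip_lines, start=0, end=3):
    """Return the records whose slice [start:end] is one of the zip codes."""
    # strip() removes the trailing '\n' from each zip line; blank lines are skipped.
    wanted = {z.strip() for z in zip_lines if z.strip()}
    # The slice of each record never contains the newline, so it can be
    # compared directly against the cleaned zip codes.
    return [rec for rec in records if rec[start:end] in wanted]


# Stand-ins for the sample files "filex" and "zippys" posted in this thread.
filex = io.StringIO(
    "112332424\n23423423423\n34523423423\n456234234234\n"
    "234234234234\n5672342342\n683824242\n"
)
zippys = io.StringIO("123\n123\n234\n345\n456\n567\n678\n555\n")

matches = match_records(filex, zippys)
print("".join(matches), end="")
```

With real files, the two StringIO objects would be replaced by open("filex") and open("zippys"), ideally inside "with" blocks so the files are closed automatically.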
Re: [Tutor] Matching zipcode in address file
> OK - you handled the problem regarding reading to end-of-file. Yes, it takes a lot longer, because now you are actually iterating through match_zips for each line.
>
> How large are these files? Consider creating a set from match_zips. As lists get longer, set membership tests become faster than list membership tests.
>
> If the outfile is empty, that means that line[149:154] is never in match_zips.
>
> I suggest you take a look at match_zips. You will find a list of strings of length 6, which cannot match line[149:154], a string of length 5.

I am still struggling with this. I have simplified the code, because I need to understand the principle.

    #!/usr/bin/env python

    import string

    def main():
        infile = open("filex")
        outfile = open("results_testx", "w")
        zips = open("zippys", "r")
        match_zips = zips.readlines()
        lines = [line for line in infile if line[0:3] + '\n' in match_zips]
        outfile.write(''.join(lines))
        print line[0:3]
        zips.close()
        infile.close()
        outfile.close()
    main()

filex:

    112332424
    23423423423
    34523423423
    456234234234
    234234234234
    5672342342
    683824242

zippys:

    123
    123
    234
    345
    456
    567
    678
    555

I want to output records from filex whose first 3 characters match a record in zippys.

Output:

    23423423423
    34523423423
    456234234234
    234234234234
    5672342342

I am not sure where I should put a '\n' or tweak something that I just cannot see.

Thanks
Re: [Tutor] Matching zipcode in address file
On 4/5/2010 1:15 AM, TGW wrote:

>> Sorry - my mistake - try:
>>
>> infile = open("filex")
>> match_zips = open("zippys")
>> result = [line for line in infile if line in match_zips]
>> print result
>
> When I apply the readlines to the original file, it is taking a lot longer to process and the outfile still remains blank. Any suggestions?

OK - you handled the problem regarding reading to end-of-file. Yes, it takes a lot longer, because now you are actually iterating through match_zips for each line.

How large are these files? Consider creating a set from match_zips. As lists get longer, set membership tests become faster than list membership tests.

If the outfile is empty, that means that line[149:154] is never in match_zips.

I suggest you take a look at match_zips. You will find a list of strings of length 6, which cannot match line[149:154], a string of length 5.

> #!/usr/bin/env python
> # Find records that match zipcodes in zips.txt
>
> import os
> import sys
>
> def main():
>     infile = open("/Users/tgw/NM_2010/NM_APR.txt", "r")
>     outfile = open("zip_match_apr_2010.txt", "w")
>     zips = open("zips.txt", "r")
>     match_zips = zips.readlines()
>     lines = [ line for line in infile if line[149:154] in match_zips ]
>
>     outfile.write(''.join(lines))
>     #print line[149:154]
>     print lines
>     infile.close()
>     outfile.close()
> main()

--
Bob Gailer
919-636-4239
Chapel Hill NC
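The length mismatch described above, and the suggested switch to a set, can be illustrated with a short sketch. It is written for Python 3 and is not taken from the thread: the zip codes and the record layout below are made-up sample data, and io.StringIO stands in for zips.txt.

```python
import io

# Hypothetical stand-in for zips.txt: one 5-digit zip code per line.
zips_file = io.StringIO("87101\n87102\n87501\n")
match_zips = zips_file.readlines()

# Each entry still carries its trailing newline, so it is 6 characters long.
print([len(z) for z in match_zips])

# Hypothetical record with the zip code in columns 149-153.
record = "x" * 149 + "87102" + " remainder of record\n"
field = record[149:154]

# A 5-character slice can never equal a 6-character list entry.
print(field in match_zips)

# Stripping the newline and building a set fixes the comparison and makes
# each membership test a hash lookup instead of a scan of the whole list.
zip_set = {z.rstrip("\n") for z in match_zips}
print(field in zip_set)
```

The first membership test prints False and the second prints True; the only difference is the stripped newline.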
Re: [Tutor] Matching zipcode in address file
Please use Reply All when responding to the list.

> lines = [line for line in infile if line[149:154] not in match_zips]
>
> Nope. I tried that. I actually modified your comprehension
> that you provided about a month ago.
> Works great for NOT matching, but can't figure out how to match.
> Do you have another suggestion?
>
> def main():
>     infile = open("/Users/tgw/NM_2010/NM_APR.txt", "r")
>     outfile = open("zip_match_apr_2010.txt", "w")
>     match_zips = open("zips.txt", "r")

You probably are best to read the zips file into a list, stripping the newlines:

    match_zips = [match.strip() for match in open('zips.txt')]

then

    lines = [line for line in infile if line[149:154] in match_zips]

Should work... Either that or add a newline to the end of the slice.

HTH,

Alan G.
Re: [Tutor] Matching zipcode in address file
> I'd suggest reading the data from the match_zips into a list, and if the format isn't correct, doing some post-processing on it. But there's no way to advise on that since we weren't given the format of either file.
>
>     zipdata = match_zips.readlines()
>
> Then you can do an "if XXX in zipdata" with assurance.

Here is a simplified version of the program:

    #!/usr/bin/env python

    def main():
        infile = open("filex")
        outfile = open("results_testx", "w")
        zips = open("zippys")
        match_zips = zips.readlines()
        results = [line for line in infile if line[0:2] in match_zips]
        outfile.write(''.join(results))

        zips.close()
        infile.close()
        outfile.close()
    main()

filex:

    112332424
    23423423423
    34523423423
    456234234234
    234234234234
    5672342342
    67824242

zippys:

    567
    678
    555

I want to output the lines in filex whose first 3 chars match an entry in zippys.

output:

    5672342342
    67824242
Re: [Tutor] Matching zipcode in address file
Alan Gauld wrote:

> "TGW" wrote
>
>> I got the program functioning with
>> lines = [line for line in infile if line[149:154] not in match_zips]
>>
>> But this matches records that do NOT match zipcodes. How do I get this running so that it matches zips?
>
> Take out the word 'not' from the comprehension?

That's one change. But more fundamental is to change the file I/O. Since there's no seek() operation, the file continues wherever it left off the previous time.

I'd suggest reading the data from the match_zips into a list, and if the format isn't correct, doing some post-processing on it. But there's no way to advise on that since we weren't given the format of either file.

    zipdata = match_zips.readlines()

Then you can do an "if XXX in zipdata" with assurance.

DaveA
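The point about the file continuing where it left off can be seen in a small experiment. This is a sketch only, written for Python 3; io.StringIO objects stand in for the two open files, and the helper name matching_lines is hypothetical. The data is the three-line test case posted elsewhere in this thread.

```python
import io


def matching_lines(infile, zips):
    # "line in zips" iterates over the open file-like object itself. Every
    # test reads forward from the current position and never rewinds, so
    # lines consumed by one test are gone for all later tests.
    return [line for line in infile if line in zips]


# Identical contents in the same order: every test succeeds on the very
# next line read, so nothing is skipped.
same = matching_lines(io.StringIO("123\n234\n345\n"),
                      io.StringIO("123\n234\n345\n"))
print(same)

# One differing line: the failed search for "244" reads the zip file to
# its end, so the later test for "345" finds nothing left to read.
changed = matching_lines(io.StringIO("123\n244\n345\n"),
                         io.StringIO("123\n234\n345\n"))
print(changed)

# Reading the zip codes into a list first avoids the problem, because a
# list is searched from the start on every test.
zipdata = io.StringIO("123\n234\n345\n").readlines()
fixed = [line for line in io.StringIO("123\n244\n345\n") if line in zipdata]
print(fixed)
```

The second call returns only the first line, even though "345" appears in both inputs, while the list-based version returns both matching lines.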
Re: [Tutor] Matching zipcode in address file
> Sorry - my mistake - try:
>
> infile = open("filex")
> match_zips = open("zippys")
> result = [line for line in infile if line in match_zips]
> print result

When I apply the readlines to the original file, it is taking a lot longer to process and the outfile still remains blank. Any suggestions?

    #!/usr/bin/env python
    # Find records that match zipcodes in zips.txt

    import os
    import sys

    def main():
        infile = open("/Users/tgw/NM_2010/NM_APR.txt", "r")
        outfile = open("zip_match_apr_2010.txt", "w")
        zips = open("zips.txt", "r")
        match_zips = zips.readlines()
        lines = [ line for line in infile if line[149:154] in match_zips ]

        outfile.write(''.join(lines))
        #print line[149:154]
        print lines
        infile.close()
        outfile.close()
    main()
Re: [Tutor] Matching zipcode in address file
> Sorry - my mistake - try:
>
> infile = open("filex")
> match_zips = open("zippys")
> result = [line for line in infile if line in match_zips]
> print result

OK, thanks. This should do it:

    #!/usr/bin/env python

    infile = open("filex")
    zips = open("zippys")
    match_zips = zips.readlines()
    results = [line for line in infile if line in match_zips]
    print results
Re: [Tutor] Matching zipcode in address file
Please reply-all so a copy goes to the list.

On 4/4/2010 10:02 PM, TGW wrote:

>>> I wrote a script that compares two text files (one zip code file, and
>>> one address file) and tries to output records that match the
>>> zipcodes. Here is what I have so far:
>>>
>>> #!/usr/bin/env python
>>> # Find records that match zipcodes in zips.txt
>>>
>>> def main():
>>>     infile = open("/Users/tgw/NM_2010/NM_APR.txt", "r")
>>>     outfile = open("zip_match_apr_2010.txt", "w")
>>>     match_zips = open("zips.txt", "r")
>>>
>>>     lines = [line for line in infile if line[149:154] in match_zips]  # *** I think the problem is here ***
>>
>> Yep. You are right.
>>
>> Try a very simple test case; see if you can figure out what's happening:
>>
>> infile:
>> 123
>> 234
>> 345
>>
>> match_zips:
>> 123
>> 234
>> 345
>>
>> infile = open("infile")
>> match_zips = open("match_zips")
>> [line for line in infile if line in match_zips]
>>
>> Now change infile:
>> 123
>> 244
>> 345
>> and run the program again.
>>
>> Interesting, no? Does that give you any insights?
>
> I think I am just lost on this one. I have no new insights. What is the exact program that you want me to run?
>
> #!/usr/bin/env python
>
> infile = open("filex")
> match_zips = open("zippys")
> [line for line in infile if line in match_zips]
> print line
>
> I did what you said and I get '345' output both times.

Sorry - my mistake - try:

    infile = open("filex")
    match_zips = open("zippys")
    result = [line for line in infile if line in match_zips]
    print result

--
Bob Gailer
919-636-4239
Chapel Hill NC
Re: [Tutor] Matching zipcode in address file
On 4/4/2010 5:18 PM, TGW wrote:

> I wrote a script that compares two text files (one zip code file, and one address file) and tries to output records that match the zipcodes. Here is what I have so far:
>
> #!/usr/bin/env python
> # Find records that match zipcodes in zips.txt
>
> def main():
>     infile = open("/Users/tgw/NM_2010/NM_APR.txt", "r")
>     outfile = open("zip_match_apr_2010.txt", "w")
>     match_zips = open("zips.txt", "r")
>
>     lines = [line for line in infile if line[149:154] in match_zips]  # *** I think the problem is here ***

Yep. You are right.

Try a very simple test case; see if you can figure out what's happening:

infile:

    123
    234
    345

match_zips:

    123
    234
    345

    infile = open("infile")
    match_zips = open("match_zips")
    [line for line in infile if line in match_zips]

Now change infile:

    123
    244
    345

and run the program again.

Interesting, no? Does that give you any insights?

>     outfile.write(''.join(lines))
>     infile.close()
>     outfile.close()
> main()
>
> I got the program functioning with
> lines = [line for line in infile if line[149:154] not in match_zips]
>
> But this matches records that do NOT match zipcodes. How do I get this running so that it matches zips?
>
> Thanks

--
Bob Gailer
919-636-4239
Chapel Hill NC
Re: [Tutor] Matching zipcode in address file
"TGW" wrote I go the program functioning with lines = [line for line in infile if line[149:154] not in match_zips] But this matches records that do NOT match zipcodes. How do I get this running so that it matches zips? Take out the word 'not' from the comprehension? -- Alan Gauld Author of the Learn to Program web site http://www.alan-g.me.uk/ ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
[Tutor] Matching zipcode in address file
I wrote a script that compares two text files (one zip code file, and one address file) and tries to output records that match the zipcodes. Here is what I have so far:

    #!/usr/bin/env python
    # Find records that match zipcodes in zips.txt

    def main():
        infile = open("/Users/tgw/NM_2010/NM_APR.txt", "r")
        outfile = open("zip_match_apr_2010.txt", "w")
        match_zips = open("zips.txt", "r")

        lines = [line for line in infile if line[149:154] in match_zips]  # *** I think the problem is here ***

        outfile.write(''.join(lines))
        infile.close()
        outfile.close()
    main()

I got the program functioning with

    lines = [line for line in infile if line[149:154] not in match_zips]

But this matches records that do NOT match zipcodes. How do I get this running so that it matches zips?

Thanks