[Tutor] Script to search in string of values from file A in file B
Dear All,

I'm new to Python and have started using it to search for text strings in big (500 MB) txt files.

I have a list in a text file (e.g. A.txt) that I want to use as keys to search another file (e.g. B.txt). The files are organized in the following way:

A.txt:
    Aaa
    Bbb
    Ccc
    Ddd
    ...

B.txt:
    Bbb
    1234
    Xxx
    234

I want to use A.txt to search in B.txt and get as output the original search entry (e.g. Bbb) followed by the line that follows it in B.txt (e.g. Bbb / 1234).

I wrote the following script:

    object = open('B.txt', 'r')
    lista = open('A.txt', 'r')
    searches = lista.readlines()
    for line in object.readlines():
        for word in searches:
            if word in line:
                print line+'\n'

But from this I only get the search entry and not the line after it. I tried to google it, but I got lost and didn't manage to do it.

Any ideas? I guess this is basic scripting, but I just started.

Best
Afonso
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Script to search in string of values from file A in file B
On Thursday, May 10, 2012, Afonso Duarte adua...@itqb.unl.pt wrote:
> SNIP
> But from here I only get the searching entry and not the line afterwards, I tried to google it but I got lost and didn't manage to do it.
> Any ideas ? I guess that this is basic scripting but I just started .

Not sure I understood the question... But:

- Are you trying to grep the text file? (Simpler than programming in Python, IMO.)
- If a single line of file B matches several of the keys from file A, the script above will print that line multiple times.
- You need not add a newline (\n) in the print statement, unless you want it to print a blank line between results.

Based on the example you gave, the matching Bbb value is the same in B and A, so the line is actually being printed, but it is just the same as word...

-- Sent from Gmail Mobile
Re: [Tutor] Script to search in string of values from file A in file B
On 2012-05-09 15:22, BRAGA, Bruno wrote:
> SNIP
> - if you have multiple matches of any of the keys from A file in a single line of B file, the script above will print it multiple times

True. I did not mention it, but the entries in file A.txt only appear once in B.txt.

> - you need not add new line (\n) in the print statement, unless you want it to print a blank line between results

True.

> Based on the example you gave, the matching Bbb value in B and A are the same, so actually line is being printed, but it is just the same as word...

Exactly! But what I want is that plus the value that follows that line in B.txt, i.e.

    Bbb
    1234

Best
Afonso
Re: [Tutor] Script to search in string of values from file A in file B
On 05/09/2012 10:00 AM, Afonso Duarte wrote:
> SNIP

Please post your messages as plain text. The double-spacing I get is very annoying.

There's a lot you don't say that is only implied by your code.

Are the lines in file B.txt really alternating?

    key1
    data for key1
    key2
    data for key2
    ...

Are the key lines in file B.txt exact matches, or do they just contain the key somewhere in the line? Your code assumes the latter, but the whole thing could be much simpler if it were always an exact match.

Are the keys in A.txt unique? If so, you could store them in a set and make lookup basically instantaneous.

I think the real question you had was how to access the line following the key once you have matched the key. Something like this should do it (untested):

    lines = iter(object)
    for key in lines:
        linedata = lines.next()
        if key in mydictionary:
            print key, "--", linedata

The main caveat I can see is that the file had better have an even number of lines.

--
DaveA
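The iterator idea in the message above can be sketched in Python 3 syntax (the thread itself uses Python 2, where the method was spelled lines.next()). This is only an illustration under the same assumption Dave states, namely strictly alternating key and data lines and an even line count; the name key_data_pairs and the sample data are made up for the example.

```python
import io

def key_data_pairs(lines):
    """Yield (key_line, data_line) pairs from strictly alternating lines."""
    it = iter(lines)
    for key_line in it:
        # next() takes the line after the key from the same iterator, so the
        # for loop resumes at the following key line.
        data_line = next(it)
        yield key_line.rstrip("\n"), data_line.rstrip("\n")

# Hypothetical stand-ins for B.txt and for the keys read from A.txt.
sample_b = io.StringIO("xx Bbb yy\n1234\nzz Xxx\n234\n")
wanted = ["Bbb", "Ddd"]

for key_line, data_line in key_data_pairs(sample_b):
    for word in wanted:
        if word in key_line:
            print(word, "--", data_line)
```

With an odd number of lines the bare next(it) call fails on the last key, which is the caveat raised in the message.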
Re: [Tutor] Script to search in string of values from file A in file B
-----Original Message-----
From: Dave Angel [mailto:d...@davea.name]
Sent: Wednesday, 9 May 2012 15:52
To: Afonso Duarte
Cc: tutor@python.org
Subject: Re: [Tutor] Script to search in string of values from file A in file B

> SNIP
> Please post your messages as plain-text. The double-spacing I get is very annoying.

Sorry for that, my Outlook messes it up.

> There's a lot you don't say, which is implied in your code. Are the lines in file B.txt really alternating: key1 / data for key1 / key2 / data for key2 ...

Sure, that's why I described them like that in the email and didn't say that they weren't.

> Are the key lines in file B.txt exact messages, or do they just contain the key somewhere in the line? Your code assumes the latter, but the whole thing could be much simpler if it were always an exact match.

The entry in B has text before and after it (the size of that text changes from entry to entry).

> Are the keys in A.txt unique? If so, you could store them in a set, and make lookup basically instantaneous.

That I indeed didn't mention: the entries from A are unique in B.

> I think the real question you had was how to access the line following the key, once you matched the key.

True, that is my real question (the code above works just for the title line; I basically want to print the next line of B.txt for each entry).

> Something like this should do it (untested)
> SNIP
> Main caveat I can see is the file had better have an even number of lines.

That changes from file to file, and it's unlikely that all of them have an even number.

Thanks
Afonso
Re: [Tutor] Script to search in string of values from file A in file B
On 05/09/2012 11:04 AM, Afonso Duarte wrote:
> -----Original Message-----
> From: Dave Angel [mailto:d...@davea.name]
> SNIP
>> Please post your messages as plain-text. The double-spacing I get is very annoying.
> Sorry for that my outlook mess-it-up

I'm sure there's a setting to say "use plain text". In Thunderbird, I tell it that any message to forums is to be plain text.

>> Are the lines in file B.txt really alternating: key1 / data for key1 / key2 / data for key2 ...
> Sure, that's why I describe them in the email like that and didn't say that they weren't
>> Are the key lines in file B.txt exact messages, or do they just contain the key somewhere in the line?
> The entry in B has text before and after (the size of that text changes from entry to entry.

In other words, the line pairs are not like your sample, but more like:

    trash key1more trash
    Useful associated data for the previous key
    trash2 key2more trash
    Useful associated data for the previous key

>> Are the keys in A.txt unique? If so, you could store them in a set, and make lookup basically instantaneous.
> That indeed I didn't refer, the entries from A are unique in B

Not what I asked. Are the keys in A.txt ever present more than once in A.txt? But then again, if the key line can contain garbage before and/or after the key, then the set idea is moot anyway.

>> Main caveat I can see is the file had better have an even number of lines.
> That changes from file to file, and its unlikely i have all even number.

In that case, what do you use for data of the last key? If you really have to handle the case where there is a final key with no data, then you'll have to detect that case and make up the data separately. That could be done with a try block, but this is probably clearer:

    rawlines = object.readlines()
    if len(rawlines) % 2 != 0:
        rawlines += ""    # add an extra line
    lines = iter(rawlines)
    for keyline in lines:
        linedata = lines.next()
        for word in searches:
            if word in keyline:
                print word, "--", linedata

--
DaveA
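One way to handle a final key line with no data line, without padding the list of lines, is the two-argument form of the built-in next(), which returns a default instead of raising when the iterator is exhausted. The sketch below is in Python 3 syntax and is not the code from the thread; find_following_lines, the missing parameter and the sample data are illustrative names only.

```python
def find_following_lines(b_lines, keys, missing=""):
    """Return (key, data_line) for each key found inside a key line."""
    results = []
    it = iter(b_lines)
    for key_line in it:
        # The second argument to next() is returned when the input runs out,
        # so a trailing key line without a data line does not raise.
        data_line = next(it, missing)
        for key in keys:
            if key in key_line:
                results.append((key, data_line.rstrip("\n")))
    return results

keys = ["Aaa", "Bbb", "Xxx"]
b_lines = ["trash Bbb trash\n", "1234\n", "Xxx more\n"]  # odd number of lines
print(find_following_lines(b_lines, keys))
```

Nothing is written back to B.txt here; the default only stands in for the missing data line while the program runs.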
Re: [Tutor] Script to search in string of values from file A in file B
> SNIP
> If you really have to handle the case where there is a final key with no data, then you'll have to detect that case, and make up the data separately. That could be done with a try block, but this is probably clearer:
>
>     rawlines = object.readlines()
>     if len(rawlines) % 2 != 0:
>         rawlines += ""    # add an extra line

Oops, that should have been rawlines.append("") or maybe rawlines.append("\n")

>     lines = iter(rawlines)
>     for keyline in lines:
>         linedata = lines.next()
>         for word in searches:
>             if word in keyline:
>                 print word, "--", linedata

--
DaveA
Re: [Tutor] Script to search in string of values from file A in file B
On 09/05/2012 15:00, Afonso Duarte wrote:
> object = open('B.txt', 'r')

You've already received some sound advice, so I'd just like to point out that your name object will override the built-in object. Apologies if somebody has already said this and I've missed it.

--
Cheers.
Mark Lawrence.
[Tutor] odd behavior when renaming a file
    import os

    def pre_process():
        if os.path.isfile('revelex.csv'):
            os.rename('revelex.csv', 'revelex.tmp')
            print "Renamed ok"
        else:
            print "Exiting, no revelex.csv file available"
            exit()
        out_file = open('revelex.csv', 'w')
        # etc.

    if __name__ == '__main__':
        pre_process()

When I run the code above, it works fine if run from the file itself. But when I import it and run it from another file, it renames the file but then prints "Exiting, no revelex.csv file available".

--
Joel Goldstick
Re: [Tutor] Script to search in string of values from file A in file B
Dear All,

Sorry, it seems that I subscribed to the wrong mailing list... I got the idea that this list was open to newbies... From the answers I got, I see that I was wrong.

> In that case, what do you use for data of the last key? If you really have to handle the case where there is a final key with no data, then you'll have to detect that case, and make up the data separately. That could be done with a try block, but this is probably clearer:
> SNIP

After chatting on other mailing lists about other languages, I realized that this mailing list is not in my league for Python...

Interestingly, I did get a strange piece of advice from this list: try awk... or Perl for the job, as Python is kind of tricky for printing the line after the one you selected (yes, that was my question, and I still don't understand how people advise me to insert new lines in 500 MB files and so on to do it...)

Once again, sorry for taking your time.

Cheers
Afonso

On 2012-05-09 16:16, Dave Angel wrote:
> SNIP
Re: [Tutor] Script to search in string of values from file A in file B
On Wed, May 9, 2012 at 10:00 AM, Afonso Duarte adua...@itqb.unl.pt wrote:
> SNIP
> I wrote the following script:
>
>     object = open('B.txt', 'r')
>     lista = open('A.txt', 'r')
>     searches = lista.readlines()
>     for line in object.readlines():
>         for word in searches:
>             if word in line:
>                 print line+'\n'

Don't give up on this group so quickly. You will get lots of help here.

As to your problem: do you know about enumerate? Learn about it here: http://docs.python.org/library/functions.html#enumerate

If you change your code above to:

    for index, word in enumerate line:
        print line, word[index+1]

I think you will get what you are looking for.

--
Joel Goldstick
Re: [Tutor] Script to search in string of values from file A in file B
On Wed, May 9, 2012 at 3:40 PM, Joel Goldstick joel.goldst...@gmail.com wrote:
> SNIP
> if you change your code above to:
>
>     for index, word in enumerate line:
>         print line, word[index+1]
>
> I think you will get what you are looking for

My mistake. I meant this:

    my_lines = object.readlines()  # note: not a good thing to name something object. It's a class
    for index, line in enumerate(my_lines):
        for word in searches:
            if word in line:
                print line
                print my_lines[index+1]

Sorry for the crazy earlier post.

--
Joel Goldstick
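A hedged Python 3 variant of the enumerate approach above, with two additions that are not in the thread. First, keys that come from readlines() still carry their trailing newline, so a key would only match at the very end of a line; the sketch strips them and drops blank keys (an empty string is "in" every line). Second, a match on the last line would make my_lines[index+1] raise IndexError, so the index is checked first. The helper names clean_keys and matches_with_next and the sample data are invented for illustration.

```python
def clean_keys(raw_lines):
    """Strip surrounding whitespace from key lines and drop blank ones."""
    return [line.strip() for line in raw_lines if line.strip()]

def matches_with_next(lines, searches):
    """Return (matching_line, following_line) tuples."""
    found = []
    for index, line in enumerate(lines):
        for word in searches:
            if word in line:
                # Guard against a match on the very last line.
                has_next = index + 1 < len(lines)
                following = lines[index + 1] if has_next else ""
                found.append((line.rstrip("\n"), following.rstrip("\n")))
    return found

my_lines = ["xx Bbb yy\n", "1234\n", "Xxx\n"]             # hypothetical B.txt content
searches = clean_keys(["Aaa\n", "Bbb\n", "\n", "Xxx\n"])  # hypothetical A.txt content
for line, following in matches_with_next(my_lines, searches):
    print(line)
    print(following)
```

Like the code it follows, this keeps every line of B in a list, so it trades memory for simplicity.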
Re: [Tutor] odd behavior when renaming a file
Hi,

On 9 May 2012 20:26, Joel Goldstick joel.goldst...@gmail.com wrote:
> SNIP
> When I run the code above it works file if run from the file. But when I import it and run it from another file it renames the file but then prints Exiting, no revelex.csv file available

Can you post where and how you call this from another file?

Anyway, it sounds like the pre_process() routine is being called twice, somehow. On the first call the file is renamed. Then on the second call, of course, the file is not there anymore (as it's been renamed), and thus it prints the "Exiting" message.

Best,
Walter
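The double-call explanation can be reproduced in isolation. The sketch below is a simplified Python 3 stand-in for the posted function, not the original: it takes a directory argument, returns a status string instead of printing and calling exit(), and assumes nothing recreates revelex.csv between the two calls. It runs in a temporary directory so no real file is touched.

```python
import os
import tempfile

def pre_process(directory):
    """Rename revelex.csv to revelex.tmp and report what happened."""
    src = os.path.join(directory, "revelex.csv")
    if os.path.isfile(src):
        os.rename(src, os.path.join(directory, "revelex.tmp"))
        return "Renamed ok"
    return "Exiting, no revelex.csv file available"

with tempfile.TemporaryDirectory() as tmp:
    # Create an empty revelex.csv to start from.
    open(os.path.join(tmp, "revelex.csv"), "w").close()
    first = pre_process(tmp)
    second = pre_process(tmp)  # same call again: the file is already renamed
    print(first)
    print(second)
```

If the real program shows the same pair of messages, looking for a second call site (for example a call at module level that runs on import) would be a reasonable next step.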
Re: [Tutor] Script to search in string of values from file A in file B
Hi Afonso,

I see you've had some responses already. I've not read them all, and am just posting the following suggestion you might want to look at:

    # read lines with keys into a list
    selected_keys = open('A.txt', 'r').readlines()
    # read all data records into another list
    records = open('B.txt', 'r').readlines()

    # Now use a list comprehension to return the required entries, the i+1th
    # entries for all indexes i in the records list that correspond to a key
    # in the keys list:
    selected_values = [(records[i], records[i+1]) for i, row in enumerate(records) if row in selected_keys]

    # The above returns both the key and the value, in a tuple. If you want
    # the value rows only, then the above becomes:
    #selected_values = [records[i+1] for i, row in enumerate(records) if row in selected_keys]

    # Finally print the result.
    print selected_values

You'll note I read both files into memory, even though you say your files are largish. I don't consider 500 MB to be very large in this day and age of PCs with 4+ GB of RAM, which is why I've basically ignored the "large" issue. If this is not true in your case then you'll have to post back.

Good luck,
Walter
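A Python 3 sketch of the same list-comprehension idea, with a few changes that are assumptions rather than part of the suggestion above: keys are stripped and stored in a set for fast lookup, and zip(records, records[1:]) replaces indexing with i+1 so a key on the last line cannot run past the end of the list. Like the code above, it matches whole lines exactly, which may not fit data where the key is embedded in longer text. The name select_pairs and the sample lists are illustrative.

```python
def select_pairs(key_lines, records):
    """Return (key, following_line) for records that exactly equal a key."""
    keys = {k.strip() for k in key_lines if k.strip()}
    # zip(records, records[1:]) pairs each line with its successor and stops
    # before the last line, so nothing is indexed past the end of the list.
    return [
        (row.strip(), following.strip())
        for row, following in zip(records, records[1:])
        if row.strip() in keys
    ]

# Hypothetical contents of A.txt and B.txt as lists of lines.
print(select_pairs(["Aaa\n", "Bbb\n"], ["Bbb\n", "1234\n", "Xxx\n", "234\n"]))
```

Note that records[1:] copies the list, so this version uses somewhat more memory than indexing does.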
Re: [Tutor] odd behavior when renaming a file
Joel Goldstick wrote:
> SNIP
> When I run the code above it works file if run from the file. But when I import it and run it from another file it renames the file but then prints Exiting, no revelex.csv file available

Add

    print os.getcwd()

to your code; you are probably in the wrong directory.
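To check the working-directory theory, it can help to print where a relative name such as 'revelex.csv' actually resolves, and, if that turns out to be the problem, to build the path from the module's own location instead. The Python 3 sketch below uses only os.getcwd() and os.path functions; the helper names and the /srv/app/process.py path are hypothetical.

```python
import os

def where_relative_name_points(name):
    """Return the absolute path a relative name resolves to right now."""
    return os.path.abspath(name)

def path_next_to_module(name, module_file):
    """Resolve name against the directory that contains module_file."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, name)

print(os.getcwd())
print(where_relative_name_points("revelex.csv"))
# Inside a real module one would typically pass __file__ as module_file;
# the path below is a made-up example.
print(path_next_to_module("revelex.csv", "/srv/app/process.py"))
```

Whether anchoring paths to the module directory is appropriate depends on where the data file is meant to live; the thread does not say.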
Re: [Tutor] Script to search in string of values from file A in file B
On 09/05/12 20:28, aduarte wrote:
> Sorry it seems that I got the wrong mailing list to subscribe ... I got the idea that this list was open to newbies ... by the answers I got I see that I was wrong

I'm not sure what you mean. The answers you got seem to have provided the answers to your questions. What more were you expecting?

> after chatting in other mailing lists about other languages I realized that this mailing list is not in my league for python ...

Which league is that? You said you were a beginner, so you got answers appropriate to a beginner. If you had said you were an experienced data processing professional looking for a smart/efficient way to process large files using Python, you would likely have gotten different answers.

If the answers were too advanced, then by all means ask for clarification. We can only guess your level based on what you post.

> Interestingly I did got a strange advice from this list: try awk ... of Perl for the job, as Python is kind of tricky to print the next line

I didn't see that suggestion and I disagree with it. Python is just as capable of processing files as awk or Perl, as I hope the other answers have demonstrated. But where another tool is more appropriate there is no harm in suggesting it. Just because this is a Python list doesn't mean the answer needs to be Python.

> that you selected (yes that was my question and I still don't understand how ppl advise me to insert new lines in 500Mb files and so on to do it...)

Again, I'm not sure that anyone is actually suggesting you insert new lines into your file. It's certainly not the general advice being given. But this is a list for beginners, and the people giving the advice range from complete novices themselves to working pros. The answers reflect that diversity.

In your case the majority of the answers have come from experienced programmers giving you sound advice and probing your requirements to ensure that all your use cases are covered. The only slightly radical suggestion I can see is to read the files into memory, and on a modern PC that's not too radical for a 500 MB file, even though I'd probably not do it myself...

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
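For readers who would rather not load B.txt into memory, here is one possible streaming sketch in Python 3 syntax. It is not taken from any message in the thread: it reads one line at a time, remembers which keys matched the previous line, and reports the next line for them. It does not require an even number of lines and never modifies the file. The name stream_matches and the sample data are made up for the example.

```python
import io

def stream_matches(b_file, keys):
    """Yield (key, next_line) while reading b_file one line at a time."""
    pending = []  # keys that matched the previous line
    for line in b_file:
        text = line.rstrip("\n")
        if pending:
            # This line is the data line for the keys matched just before it.
            for key in pending:
                yield key, text
            pending = []
            continue
        pending = [key for key in keys if key in text]
    # A key on the final line has no following line; report an empty string.
    for key in pending:
        yield key, ""

sample_b = io.StringIO("junk\nxx Bbb yy\n1234\nXxx\n")  # hypothetical B.txt
for key, data in stream_matches(sample_b, ["Bbb", "Xxx"]):
    print(key, "--", data)
```

With real files the same function could be given an open file object, for example inside "with open('B.txt') as b_file:", using the file names from the original question.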