Re: [Tutor] Script to search in string of values from file A in file B

2012-05-09 Thread aduarte


Dear All,

Sorry, it seems that I subscribed to the wrong mailing list ...


I got the idea that this list was open to newbies ... judging by the
answers I got, I see that I was wrong.



"

In that case, what do you use for data of the last key?


If you really have to handle the case where there is a final key with no
data, then you'll have to detect that case, and make up the data
separately.  That could be done with a try block, but this is probably
clearer:

rawlines = object.readlines()
if len(rawlines) % 2 != 0:
    rawlines.append("")  # add an extra (empty) data line
lines = iter(rawlines)

for keyline in lines:
    linedata = lines.next()
    for word in searches:
        if word in keyline:
            print word, "-->", linedata
"


After chatting on other mailing lists about other languages, I realized
that this mailing list is not in my league for Python ...
Interestingly, I did get some strange advice from this list: try awk ...
or Perl for the job, as Python is kind of tricky when it comes to
printing the line after the one you selected (yes, that was my question,
and I still don't understand how people advise me to insert new lines in
500Mb files and so on to do it...)


Once again, sorry for taking up your time.

Cheers

Afonso




On 2012-05-09 16:16, Dave Angel wrote:

On 05/09/2012 11:04 AM, Afonso Duarte wrote:



-Original Message-
From: Dave Angel [mailto:d...@davea.name]



Please post your messages as plain-text.   The double-spacing I get is
very annoying.


Sorry about that, my Outlook messed it up.


I'm sure there's a setting to say use plain-text.  In Thunderbird, I
tell it that any message to forums is to be plain-text.




There's a lot you don't say, which is implied in your code.
Are the lines in file B.txt really alternating:

key1
data for key1
key2
data for key2
...


Sure, that's why I described them like that in the email and didn't say
that they weren't.


Are the key lines in file B.txt exact matches, or do they just
"contain" the key somewhere in the line?  Your code assumes the latter,
but the whole thing could be much simpler if it were always an exact
match.


The entry in B has text before and after (the size of that text changes
from entry to entry).


In other words, the line pairs are not like your sample, but more like:


trash  key1more trash
Useful associated data for the previous key
trash2 key2more trash
Useful associated data for the previous key





Are the keys in A.txt unique?  If so, you could store them in a set, and
make lookup basically instantaneous.

Indeed, I didn't mention that: the entries from A are unique in B.


Not what I asked.  Are the keys in A.txt ever present more than once in
A.txt ?  But then again, if the key line can contain garbage before
and/or after the key, then the set idea is moot anyway.
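
For the exact-match case discussed above, here is a minimal sketch of
the set idea. It is written in Python 3 syntax, not the Python 2 used
elsewhere in this thread. It assumes every key line in B.txt equals a
key from A.txt once surrounding whitespace is stripped, which the
poster says does not hold for his real files. The function name
find_exact and the in-memory sample files are made up for illustration.

```python
import io


def find_exact(keys_file, data_file):
    """Yield (key, data) pairs for key lines that exactly match a key.

    Assumes data_file alternates key line / data line (hypothetical helper).
    """
    # A set gives constant-time membership tests on average.
    keys = {line.strip() for line in keys_file if line.strip()}
    lines = iter(data_file)
    for keyline in lines:
        # The default "" covers a final key that has no data line.
        dataline = next(lines, "")
        key = keyline.strip()
        if key in keys:
            yield key, dataline.strip()


# Stand-ins for A.txt and B.txt, using the sample data from the thread.
a_txt = io.StringIO("Aaa\nBbb\nCcc\nDdd\n")
b_txt = io.StringIO("Bbb\n1234\nXxx\n234\n")
for key, data in find_exact(a_txt, b_txt):
    print(key, "-->", data)  # prints: Bbb --> 1234
```

With real files, the two StringIO objects would be replaced by
open('A.txt') and open('B.txt').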




I think the real question you had was how to access the line following
the key, once you matched the key.

True, that is my real question (as the code above works just for the
title line; I basically want to print the next line of B.txt for each
entry).



Something like this should do it (untested)

lines = iter(object)
for key in lines:
    linedata = lines.next()
    if key in mydictionary:
        print key, "-->", linedata



Main caveat I can see is the file had better have an even number of
lines.



That changes from file to file, and it's unlikely that they all have an
even number of lines.


In that case, what do you use for data of the last key?


If you really have to handle the case where there is a final key with no
data, then you'll have to detect that case, and make up the data
separately.  That could be done with a try block, but this is probably
clearer:

rawlines = object.readlines()
if len(rawlines) % 2 != 0:
    rawlines.append("")  # add an extra (empty) data line
lines = iter(rawlines)

for keyline in lines:
    linedata = lines.next()
    for word in searches:
        if word in keyline:
            print word, "-->", linedata
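
The same pairing loop can also be sketched without readlines(), so that
a 500Mb file is never held in memory and nothing has to be added to it.
This version is in Python 3 syntax, where the two-argument form
next(lines, "") supplies an empty data line if the file ends on a key.
The name find_following and the sample data are hypothetical, and
stripping the newline from each search word is an added assumption, not
part of the code above.

```python
import io


def find_following(searches, data_file):
    """Yield (word, data line) for each search word found in a key line.

    Reads data_file lazily, two lines per loop pass (hypothetical helper).
    """
    # Drop the trailing newline that readlines() leaves on each word.
    words = [w.strip() for w in searches if w.strip()]
    lines = iter(data_file)
    for keyline in lines:
        # Consume the paired data line; "" stands in if it is missing.
        linedata = next(lines, "")
        for word in words:
            if word in keyline:
                yield word, linedata.rstrip("\n")


# Stand-in for B.txt with an odd number of lines (last key has no data).
b_txt = io.StringIO("trash Bbb more trash\n1234\nxx Ccc yy\n")
for word, data in find_following(["Bbb\n", "Ccc\n"], b_txt):
    print(word, "-->", data)
```

Because the loop only ever holds two lines at a time, memory use should
not depend on the size of B.txt.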




Thanks


Afonso




___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Script to search in string of values from file A in file B

2012-05-09 Thread aduarte

On 2012-05-09 15:22, BRAGA, Bruno wrote:

On Thursday, May 10, 2012, Afonso Duarte wrote:

Dear All,

I’m new to Python and started to use it to search text strings in
big (>500Mb) txt files.

I have a list in a text file (e.g. A.txt) that I want to use as a key
to search another file (e.g. B.txt), organized in the following way:


 

A.txt:

Aaa
Bbb
Ccc
Ddd
.
.
.

B.txt:

Bbb
1234
Xxx
234

 

 

I want to use A.txt to search in B.txt and have as output the
original search entry (e.g. Bbb) followed by the line that follows it
in B.txt (e.g.  Bbb / 1234).

I wrote the following script:

 

 

object = open('B.txt', 'r')
lista = open('A.txt', 'r')
searches = lista.readlines()

for line in object.readlines():
    for word in searches:
        if word in line:
            print line+'\n'

 

 

But from here I only get the searching entry and not the line
afterwards. I tried to google it, but I got lost and didn’t manage to
do it.

Any ideas? I guess that this is basic scripting, but I just started.




Not sure I understood the question... But:
- are you trying to "grep" the text file? (simpler than programming
in python, IMO)



- if you have multiple matches of any of the keys from A file in a
single line of B file, the script above will print it multiple times


True, I did not mention it, but the entries in file A.txt only appear
once in B.txt.
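
If several keys could match the same line, one way to print that line
only once is to test all keys with any(). This is a small sketch in
Python 3 syntax; the helper name matching_lines and the sample lines are
invented for illustration, and the keys are assumed to be already
stripped of newlines.

```python
def matching_lines(lines, words):
    """Yield each line once if it contains at least one of the words."""
    for line in lines:
        # any() stops at the first word found, so a line is never repeated.
        if any(word in line for word in words):
            yield line.rstrip("\n")


sample = ["Aaa and Bbb together\n", "nothing here\n"]
print(list(matching_lines(sample, ["Aaa", "Bbb"])))
```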




 - you need not add a new line (\n) in the print statement, unless you
want it to print a blank line between results


true
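
The reason is that lines read from a file keep their trailing newline.
The short Python 3 sketch below shows that, and also a related effect
that may matter here: the entries returned by readlines() on A.txt end
in a newline too, so a substring test against a line with text after
the key would fail unless the entries are stripped first. The sample
values are made up to mirror the thread.

```python
searches = ["Bbb\n", "Ccc\n"]      # what readlines() returns for A.txt
line = "trash Bbb more trash\n"    # a key line with text after the key

# "Bbb\n" is not a substring of the line, because "Bbb" is followed by
# a space there and not by the newline.
raw_hits = [w for w in searches if w in line]

# Stripping the newline from each entry makes the substring test work.
clean_hits = [w.strip() for w in searches if w.strip() in line]

print(raw_hits)            # prints: []
print(clean_hits)          # prints: ['Bbb']

# The line already ends in "\n"; adding another gives a blank line.
print(repr(line + "\n"))
print(line.rstrip("\n"))   # newline removed, print adds exactly one
```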


Based on the example you gave, the matching Bbb value in B and A are
the same, so actually line is being printed, but it is just the same
as word...



Exactly! But what I want is that plus the value on the line that follows
it in B.txt, i.e.


Bbb
1234

Best

Afonso




 

Best

 

Afonso
