I have two text files with a bunch of transcript names and their corresponding
lengths; they look like this:
ERCC.txt
ERCC-00002      1061
ERCC-00003      1023
ERCC-00004      523
ERCC-00009      984
ERCC-00012      994
ERCC-00013      808
ERCC-00014      1957
ERCC-00016      844
ERCC-00017      1136
ERCC-00019      644
blast.txt
ERCC-00002      1058
ERCC-00003      1017
ERCC-00004      519
ERCC-00009      977
ERCC-00019      638
ERCC-00022      746
ERCC-00024      134
ERCC-00024      126
ERCC-00024      98
ERCC-00025      445
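
Note that blast.txt can list the same transcript more than once: ERCC-00024 appears three times above. A dictionary holds one value per key, so assigning d[name] = size keeps only the last of those lines. Below is a minimal Python 3 sketch of that behaviour next to one alternative. Keeping the longest hit is an assumption; the post does not say which hit should count.

```python
lines = [
    "ERCC-00019      638",
    "ERCC-00024      134",
    "ERCC-00024      126",
    "ERCC-00024      98",
]

last_seen = {}  # plain assignment: a later line overwrites an earlier one
longest = {}    # assumption: keep the longest length seen for each name
for line in lines:
    name, size = line.split()
    size = int(size)
    last_seen[name] = size
    longest[name] = max(size, longest.get(name, 0))

print(last_seen)  # {'ERCC-00019': 638, 'ERCC-00024': 98}
print(longest)    # {'ERCC-00019': 638, 'ERCC-00024': 134}
```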

I want to compare the transcript lengths and check whether the length in
blast.txt is at least 90% of the length in ERCC.txt for the same transcript
name (I hope I am being clear!).
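
For a single transcript, the 90% test is one comparison. For ERCC-00002 the threshold is 0.9 * 1061 = 954.9, and 1058 >= 954.9, so it passes.

Below is a minimal sketch of that test. The function name at_least_90_percent is hypothetical. Multiplying both sides by 10 gives the same inequality but keeps it in exact integer arithmetic, so floating-point rounding never matters at the boundary.

```python
def at_least_90_percent(blast_len, ercc_len):
    """True if blast_len >= 0.9 * ercc_len, using integers only."""
    # 10 * b >= 9 * e is equivalent to b >= 0.9 * e for non-negative lengths.
    return 10 * blast_len >= 9 * ercc_len


# ERCC-00002: 0.9 * 1061 = 954.9, and 1058 >= 954.9
print(at_least_90_percent(1058, 1061))  # True
# ERCC-00019: 0.9 * 644 = 579.6, and 638 >= 579.6
print(at_least_90_percent(638, 644))    # True
print(at_least_90_percent(500, 1061))   # False
```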
So I wrote the following script:
ercctranscript_size = {}
for line in open('ERCC.txt'):
    columns = line.strip().split()
    transcript = columns[0]
    size = columns[1]
    ercctranscript_size[transcript] = int(size)

unknown_transcript = open('Not_sequenced_ERCC_transcript.txt', 'w')
blast_file = open('blast.txt')
out_file = open ('out.txt', 'w')

blast_transcript = {}
blast_file.readline()
for line in blast_file:
    blasttranscript = columns[0].strip()
    blastsize = columns[1].strip()
    blast_transcript[blasttranscript] = int(blastsize)
    
blastsize = blast_transcript[blasttranscript]    
size = ercctranscript_size[transcript]
print size 
if transcript not in blast_transcript:
    unknown_transcript.write('{0}\n'.format(transcript))
else:
    size = ercctranscript_size[transcript]
    if blastsize >= 0.9*size:
        print >> out_file, transcript, True
    else:
        print >> out_file, transcript, False

But I have a problem storing all the lengths: size always comes back with the
value of the last entry.
Could anyone explain what I am doing wrong and how I should fill in the
values of each dictionary? I am really new to Python and this is my first
script.
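
Three things in the script above produce the "last entry" behaviour:

1. The blast.txt loop never recomputes columns from line. Every pass reuses the columns left over from the last line of ERCC.txt, so blast_transcript ends up with a single key.
2. The comparison after the loops is not indented inside any loop. It therefore runs once, with transcript still holding the last ERCC name that was read.
3. blast_file.readline() throws away the first line of blast.txt. In the sample above that line is data (ERCC-00002), not a header.

Below is a minimal Python 3 sketch of a fix. The helper names read_lengths, compare and main are hypothetical. The file names are the ones from the original script. Keeping the longest hit for repeated names is an assumption.

```python
def read_lengths(lines):
    """Map transcript name -> length; for repeated names keep the longest."""
    lengths = {}
    for line in lines:
        columns = line.split()  # recompute columns from *this* line
        if len(columns) < 2:
            continue  # skip blank lines
        name, size = columns[0], int(columns[1])
        lengths[name] = max(size, lengths.get(name, 0))
    return lengths


def compare(ercc_lengths, blast_lengths):
    """Return ({name: passes_90_percent}, [names missing from blast])."""
    results = {}
    missing = []
    # The check is inside the loop, so it runs once per ERCC transcript.
    for transcript, size in ercc_lengths.items():
        if transcript not in blast_lengths:
            missing.append(transcript)
        else:
            results[transcript] = 10 * blast_lengths[transcript] >= 9 * size
    return results, missing


def main():
    # Reads the two input files and writes both output files.
    with open('ERCC.txt') as f:
        ercc = read_lengths(f)
    with open('blast.txt') as f:
        blast = read_lengths(f)
    results, missing = compare(ercc, blast)
    with open('out.txt', 'w') as out:
        for transcript, ok in results.items():
            out.write('{0} {1}\n'.format(transcript, ok))
    with open('Not_sequenced_ERCC_transcript.txt', 'w') as unknown:
        for transcript in missing:
            unknown.write('{0}\n'.format(transcript))


# Small in-memory demo using lines from the samples above.
ercc = read_lengths(["ERCC-00002      1061", "ERCC-00012      994",
                     "ERCC-00019      644"])
blast = read_lengths(["ERCC-00002      1058", "ERCC-00019      638"])
results, missing = compare(ercc, blast)
print(results)  # {'ERCC-00002': True, 'ERCC-00019': True}
print(missing)  # ['ERCC-00012']
```

With ERCC.txt and blast.txt in the working directory, calling main() does what the original script intended. It writes one "name True/False" line per matched transcript to out.txt, and the unmatched names to Not_sequenced_ERCC_transcript.txt.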

Thanks for your help everybody!
-- 
http://mail.python.org/mailman/listinfo/python-list
