I have two text files, each with a list of transcript names and their corresponding lengths. They look like this:

ERCC.txt

ERCC-00002 1061
ERCC-00003 1023
ERCC-00004 523
ERCC-00009 984
ERCC-00012 994
ERCC-00013 808
ERCC-00014 1957
ERCC-00016 844
ERCC-00017 1136
ERCC-00019 644

blast.txt

ERCC-00002 1058
ERCC-00003 1017
ERCC-00004 519
ERCC-00009 977
ERCC-00019 638
ERCC-00022 746
ERCC-00024 134
ERCC-00024 126
ERCC-00024 98
ERCC-00025 445
I want to compare the lengths of the transcripts and see whether the length in blast.txt is at least 90% of the length in ERCC.txt for the corresponding transcript name (I hope that is clear!). So I wrote the following script:

ercctranscript_size = {}
for line in open('ERCC.txt'):
    columns = line.strip().split()
    transcript = columns[0]
    size = columns[1]
    ercctranscript_size[transcript] = int(size)

unknown_transcript = open('Not_sequenced_ERCC_transcript.txt', 'w')
blast_file = open('blast.txt')
out_file = open ('out.txt', 'w')

blast_transcript = {}
blast_file.readline()
for line in blast_file:
    blasttranscript = columns[0].strip()
    blastsize = columns[1].strip()
    blast_transcript[blasttranscript] = int(blastsize)
    blastsize = blast_transcript[blasttranscript]
    size = ercctranscript_size[transcript]
    print size
    if transcript not in blast_transcript:
        unknown_transcript.write('{0}\n'.format(transcript))
    else:
        size = ercctranscript_size[transcript]
        if blastsize >= 0.9*size:
            print >> out_file, transcript, True
        else:
            print >> out_file, transcript, False

But I have a problem storing all the lengths in the variable size: it always comes back with the last entry. Could anyone explain what I am doing wrong and how I should set the values for each dictionary? I am really new to Python and this is my first script.

Thanks for your help, everybody!
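For reference, the comparison described above could be sketched roughly as follows. This is only one possible reading of the task, not a definitive fix: the helper names read_sizes and compare are made up for illustration, the sample data is a small in-memory excerpt rather than the real files, and no header line is skipped. Because a name such as ERCC-00024 can appear several times in blast.txt, the sketch keeps every length per name and, as an assumption, compares the longest one against the ERCC length. The key difference from the script above is that each loop splits its own line, and the ERCC length is looked up by the name being compared rather than through a variable left over from an earlier loop.

```python
import io


def read_sizes(handle):
    """Map each transcript name to the list of lengths seen for it."""
    sizes = {}
    for line in handle:
        columns = line.split()  # split the line currently being read
        if len(columns) < 2:
            continue  # skip blank or malformed lines
        name, length = columns[0], int(columns[1])
        sizes.setdefault(name, []).append(length)
    return sizes


def compare(ercc, blast, threshold=0.9):
    """Return (results, missing) for every transcript listed in ercc.

    results maps a name to True when the longest blast length is at
    least threshold times the ERCC length; missing lists the ERCC names
    that do not appear in blast at all.
    """
    results = {}
    missing = []
    for name in sorted(ercc):
        expected = ercc[name][0]  # assumption: one length per ERCC name
        if name not in blast:
            missing.append(name)
        else:
            # assumption: use the longest hit when a name is repeated
            results[name] = max(blast[name]) >= threshold * expected
    return results, missing


# Small in-memory samples standing in for ERCC.txt and blast.txt
ercc = read_sizes(io.StringIO(u"ERCC-00002 1061\nERCC-00004 523\nERCC-00012 994\n"))
blast = read_sizes(io.StringIO(u"ERCC-00002 1058\nERCC-00004 400\n"))
results, missing = compare(ercc, blast)
```

With real files, read_sizes(open('ERCC.txt')) and read_sizes(open('blast.txt')) would replace the in-memory samples, and the names in missing could be written to Not_sequenced_ERCC_transcript.txt.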