Re: [Tutor] Huge list comprehension
Thanks. One reason for sharing the code was that I have to manually create over 100 variables. Is there a way I can automate that process?

From: Abdur-Rahmaan Janhangeer
Sent: Saturday, June 10, 3:35 PM
Subject: Re: [Tutor] Huge list comprehension
To: syed zaidi, tutor

> take a look at numpy and don't necessarily give us the whole code. it
> becomes too long without purpose
>
> Abdur-Rahmaan Janhangeer, Mauritius
> abdurrahmaanjanhangeer.wordpress.com <http://abdurrahmaanjanhangeer.wordpress.com>
>
> On 6 Jun 2017 03:26, "syed zaidi" <syedzaid...@hotmail.co.uk> wrote:
> > hi, I would appreciate if you can help me suggesting a quick and
> > efficient strategy for comparing multiple lists with one principal list.
> > I have about 125 lists containing about 100,000 numerical entries in
> > each; my principal list contains about 6 million entries. I want to
> > compare each small list with the main list and append yes/no or 0/1 in
> > each new list corresponding to each of the 125 lists. The program is
> > working but it takes ages to process huge files. Can someone please tell
> > me how I can make this process fast? Right now it takes around 2 weeks
> > to complete this task.
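On the question of automating the 100-odd variables: one common approach is to keep the per-sample result lists in a dictionary keyed by sample name, so nothing has to be declared by hand. The sketch below is only an illustration under assumptions: the small `dictionary` and `main_op_list` are made-up stand-ins for the structures built in the original code, and `flags` is a hypothetical name.

```python
# Made-up stand-in for the `dictionary` built from the BLAST output:
# sample name -> list of operons seen in that sample.
dictionary = {
    "DLF002": ["op1", "op3"],
    "DLF004": ["op2"],
}
# Made-up stand-in for the principal list of operons.
main_op_list = ["op1", "op2", "op3", "op2"]

# One result list per sample, created in a loop instead of by hand.
# flags["DLF002"] plays the role of the hand-written DLF002_1 variable.
flags = {name: [] for name in dictionary}

for name, operons in dictionary.items():
    for op in main_op_list:
        flags[name].append('1' if op in operons else '0')

print(flags["DLF002"])
```

With this layout, adding a new sample to the input file needs no change to the code, and `locals().update(...)` is no longer required.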
[Tutor] Huge list comprehension
hi,

I would appreciate if you can help me suggesting a quick and efficient strategy for comparing multiple lists with one principal list.

I have about 125 lists containing about 100,000 numerical entries in each; my principal list contains about 6 million entries.

I want to compare each small list with the main list and append yes/no or 0/1 in each new list corresponding to each of the 125 lists.

The program is working but it takes ages to process huge files. Can someone please tell me how I can make this process fast? Right now it takes around 2 weeks to complete this task.

The code I have written, and which is working, is as under:

    import csv
    import numpy as np

    sample_name = []
    main_op_list,principal_list = [],[]
    dictionary = {}
    with open("C:/Users/INVINCIBLE/Desktop/T2D_ALL_blastout_batch.txt", 'r') as f:
        reader = csv.reader(f, dialect = 'excel', delimiter='\t')
        list2 = filter(None, reader)
        for i in range(len(list2)):
            col1 = list2[i][0]
            operon = list2[i][1]
            main_op_list.append(operon)
            col1 = col1.strip().split("_")
            sample_name = col1[0]
            if dictionary.get(sample_name):
                dictionary[sample_name].append(operon)
            else:
                dictionary[sample_name] = []
                dictionary[sample_name].append(operon)
    locals().update(dictionary) ## converts dictionary keys to variables
    ##print DLF004
    dict_values = dictionary.values()
    dict_keys = dictionary.keys()
    print dict_keys
    print len(dict_keys)
    main_op_list_np = np.array(main_op_list)

    DLF002_1,DLF004_1,DLF005_1,DLF006_1,DLF007_1,DLF008_1,DLF009_1,DLF010_1,DLF012_1,DLF013_1,DLF014_1,DLM001_1,DLM002_1,DLM003_1,DLM004_1,DLM005_1,DLM006_1,DLM009_1,DLM011_1,DLM012_1,DLM018_1,DOF002_1,DOF003_1 =[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]
    DOF004_1,DOF006_1,DOF007_1,DOF008_1,DOF009_1,DOF010_1,DOF011_1,DOF012_1,DOF013_1,DOF014_1,DOM001_1,DOM003_1,DOM005_1,DOM008_1,DOM010_1,DOM012_1,DOM013_1,DOM014_1,DOM015_1,DOM016_1,DOM017_1,DOM018_1,DOM019_1 =[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]
    DOM020_1,DOM021_1,DOM022_1,DOM023_1,DOM024_1,DOM025_1,DOM026_1 = [],[],[],[],[],[],[]
    NLF001_1,NLF002_1,NLF005_1,NLF006_1,NLF007_1,NLF008_1,NLF009_1,NLF010_1,NLF011_1,NLF012_1,NLF013_1,NLF014_1,NLF015_1,NLM001_1,NLM002_1,NLM003_1,NLM004_1,NLM005_1,NLM006_1,NLM007_1,NLM008_1,NLM009_1,NLM010_1 =[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]
    NLM015_1,NLM016_1,NLM017_1,NLM021_1,NLM022_1,NLM023_1,NLM024_1,NLM025_1,NLM026_1,NLM027_1,NLM028_1,NLM029_1,NLM031_1,NLM032_1,NOF001_1,NOF002_1,NOF004_1,NOF005_1,NOF006_1,NOF007_1,NOF008_1,NOF009_1,NOF010_1 =[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]
    NOF011_1,NOF012_1,NOF013_1,NOF014_1,NOM001_1,NOM002_1,NOM004_1,NOM005_1,NOM007_1,NOM008_1,NOM009_1,NOM010_1,NOM012_1,NOM013_1,NOM015_1,NOM016_1,NOM017_1,NOM018_1,NOM019_1,NOM020_1,NOM022_1,NOM023_1,NOM025_1 =[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]
    NOM026_1,NOM027_1,NOM028_1,NOM029_1 = [],[],[],[]

    for i in main_op_list_np:
        if i in DLF002: DLF002_1.append('1')
        else: DLF002_1.append('0')
        if i in DLF004: DLF004_1.append('1')
        else: DLF004_1.append('0')
        if i in DLF005: DLF005_1.append('1')
        else: DLF005_1.append('0')
        if i in DLF006: DLF006_1.append('1')
        else: DLF006_1.append('0')
        if i in DLF007: DLF007_1.append('1')
        else: DLF007_1.append('0')
        if i in DLF008: DLF008_1.append('1')
        else: DLF008_1.append('0')
        ## if main_op_list[i] in DLF009: DLF009_1.append('1')
        ## else:DLF009_1.append('0')
        if i in DLF010: DLF010_1.append('1')
        else: DLF010_1.append('0')
        if i in DLF012: DLF012_1.append('1')
        else: DLF012_1.append('0')
        if i in DLF013: DLF013_1.append('1')
        else: DLF013_1.append('0')
        if i in DLF014: DLF014_1.append('1')
        else: DLF014_1.append('0')
        if i in DLM001: DLM001_1.append('1')
        else: DLM001_1.append('0')
        if i in DLM002: DLM002_1.append('1')
        else: DLM002_1.append('0')
        if i in DLM003: DLM003_1.append('1')
        else: DLM003_1.append('0')
        if i in DLM004: DLM004_1.append('1')
        else: DLM004_1.append('0')
        if i in DLM005: DLM005_1.append('1')
        else: DLM005_1.append('0')
        if i in DLM006: DLM006_1.append('1')
        else: DLM006_1.append('0')
        if i in DLM009: DLM009_1.append('1')
        else: DLM009_1.append('0')
        if i in DLM011: DLM011_1.append('1')
        else: DLM011_1.append('0')
        if i in DLM012: DLM012_1.append('1')
        else: DLM012_1.append('0')
        if i in DLM018: DLM018_1.append('1')
        else: DLM018_1.append('0')
        if i in DOF002: DOF002_1.append('1')
        else: DOF002_1.append('0')
        if i in DOF003: DOF003_1.append('1')
        else: DOF003_1.append('0')
        if i in DOF004: DOF004_1.append('1')
        else: DOF004_1.append('0')
        if i in DOF006: DOF006_1.append('1')
        else: DOF006_1.append('0')
        if i in DOF007: DOF007_1.append('1')
        else: DOF007_1.append('0')
        if i in DOF008: DOF008_1.append('1')
        else: DOF008_1.append('0')
        if i in
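Much of the running time in code shaped like this is likely spent in `i in some_list`, which scans the list element by element; with about 6 million lookups against lists of about 100,000 entries, that adds up quickly. A set gives hashed membership tests instead. The following is a rough sketch of that idea, not a drop-in replacement: `presence_flags` and the sample rows are hypothetical, and it assumes each input row is a `(col1, operon)` pair with the sample name before the first underscore, as in the posted code.

```python
from collections import defaultdict


def presence_flags(rows):
    """Return {sample_name: ['1' or '0', ...]} aligned with the main operon list.

    rows: iterable of (col1, operon) pairs, where col1 looks like 'DLF002_123'
    (an assumption based on the split("_") in the original code).
    """
    main_op_list = []
    by_sample = defaultdict(set)  # sample name -> set of operons (hashed lookup)
    for col1, operon in rows:
        sample = col1.strip().split("_")[0]
        main_op_list.append(operon)
        by_sample[sample].add(operon)

    # One pass over the main list per sample; each membership test is a set lookup.
    return {
        sample: ['1' if op in operons else '0' for op in main_op_list]
        for sample, operons in by_sample.items()
    }


# Made-up rows standing in for the tab-separated BLAST output.
rows = [("DLF002_1", "opA"), ("DLF004_7", "opB"), ("DLF002_9", "opC")]
flags = presence_flags(rows)
print(flags["DLF002"])
```

The same function covers all samples, so the long chain of `if i in ...` statements collapses into the single comprehension at the end.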
Re: [Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION
Well, I know about it, but I want to perform the task using Python 2.7, since the tool I'm trying to develop is in 2.7. For some reason I want to do it without Biopython. Please help.

-------- Original message --------
From: Danny Yoo
Date: 3/9/2016 08:39 (GMT+08:00)
To: syed zaidi
Cc: Python Tutor Mailing List
Subject: Re: [Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION

You should probably look into Biopython:

    http://biopython.org/wiki/Main_Page

Your question should involve fairly straightforward use of the Seq methods of Biopython. You should not try to write your own FASTA parser: Biopython comes with a good one already.

Note that the tutor mailing list is not entirely general: the questions here are expected to be beginner-level. Bioinformatics questions are a bit out of scope for tutor@python.org, since they involve a specialized domain that our participants here probably won't be very familiar with.

The last times I participated, the Biopython forums were very active. You should check them out:

    http://biopython.org/wiki/Mailing_lists

Good luck!
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION
Well, FASTA is a file format used by biologists to store biological sequences. The format is as under:

    > sequence information (sequence name, sequence length etc)
    genomic sequence
    > sequence information (sequence name, sequence length etc)
    genomic sequence

I want to match the name of each sequence with another list of sequence names and splice the sequence by the provided list of start and end sites for each sequence. So the pseudo code could be:

    if line starts with '>':
        match the header name with sequence name:
        if sequence name found:
            splice from the given start and end positions of that sequence

The code I have devised so far is:

    import os
    with open('E:/scaftig.sample - Copy.scaftig','r') as f:
        header = f.readline()
        header = header.rstrip(os.linesep)
        sequence = ''
        for line in f:
            line = line.rstrip('\n')
            if line[0] == '>':
                header = header[:]
                print header
            if line[0] != '>':
                sequence += line
        print sequence, len(sequence)

I would appreciate it if you can help.
Thanks
Best Regards
Ali

> Date: Tue, 8 Mar 2016 03:11:42 -0500
> Subject: Re: [Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION
> From: wolfrage8...@gmail.com
> To: syedzaid...@hotmail.co.uk
>
> What is FASTA? This seems very specific. Do you have any code thus far
> that is failing?
>
> On Tue, Mar 8, 2016 at 2:33 AM, syed zaidi wrote:
> > Hello all,
> > I am stuck in a problem, I hope someone can help me out. I have a FASTA
> > file with multiple sequences and another file with the gene coordinates.
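The pseudo code in this thread can be sketched in plain Python without Biopython. This is a minimal illustration under stated assumptions, not a full FASTA parser: the function names `read_fasta` and `extract` are hypothetical, the sequence name is taken to be the first word after `>`, and the start and end positions are assumed to be 1-based and inclusive (the posts do not say which convention the coordinate file uses). The sample data is made up.

```python
def read_fasta(lines):
    """Return {name: sequence}; the name is the first word after '>'."""
    sequences = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if name is not None:
                sequences[name] = ''.join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if name is not None:
        sequences[name] = ''.join(chunks)
    return sequences


def extract(sequences, coords):
    """coords: iterable of (scaf_name, gene_name, start, end).

    Assumes 1-based, inclusive positions; rows whose scaf_name has no
    matching FASTA header are skipped.
    """
    out = []
    for scaf, gene, start, end in coords:
        seq = sequences.get(scaf)
        if seq is not None:
            out.append((scaf, gene, seq[start - 1:end]))
    return out


# Made-up data in the same shape as the files described in the thread.
fasta = [">scaf1 length=10", "ACGTACGTAC", ">scaf2 length=6", "TTT", "GGG"]
coords = [("scaf1", "gene1_1", 2, 5),
          ("scaf2", "gene1_1", 3, 4),
          ("missing", "gene1_1", 1, 2)]
result = extract(read_fasta(fasta), coords)
print(result)
```

To run it on real files, pass an open file object to `read_fasta` and build `coords` by splitting each line of the coordinate file and converting the two position columns with `int()`. The code avoids anything specific to Python 3, so it should also run under 2.7.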
[Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION
Hello all,

I am stuck in a problem, I hope someone can help me out. I have a FASTA file with multiple sequences and another file with the gene coordinates.

SAMPLE FASTA FILE:

>EBM_revised_C2034_1 length=611
GCAGCAGTAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTAGTACGGCCGCAAGGTTCTCAAATGAATTGACGCCCGCACAAGCGGTGGAGCATGTGGTTTAATT
>EBM_revised_C2104_1 length=923
TCCGAGGGCGGTGGGATGTTGGTGCTGCAGCGGCTTTCGGATGCGCGGCGGTTGGGTCATCCGGTGTTGGCGGTGGTGGTCGGGTCGGCGGTTAATCAGGATCGTCGAATGGGTTGACCGCGCCTAATGGTCCTTCGCAGCAGCGGGTGGTGCGGGCGGCGTTGGCCAATGCCGGGTTGAGCGCGGCCGAGGTGGATGTGGTGGACATGGGACCGGGACCACGTTGATCCGATTGAGGCTCAGGCGTTGTTGGCCACTTATGGGCAAGATCGGAGCCGGGAGAACCTTTGTGGTTTCGGTGAA
GTCGAATATGGGTCATACGCAGGCCGCGGCGTGGCCTGATCAAGATGGTGTTGGCGATGCGCCATGAGCTGTTGCCGGCGACGTTGCACGTGGATGTGCCTAGCCCGCATGTGGATTGGTCGGCGCGGTGGAGTTGTTGACCGCGCCGCGGGTGTGGCCTGCTGGTGCTCGGACGCGTCGTGCGTGTCGTCGTTTGGGATTAGTGGCACTAATGCGCATGTGATTATCGAGGCGGTGCCGGTGGTGCCGCGGCGGGAGGCTGGTTGGGCCCGGTGGTGCCGTGGGTGGTGTCGGCGAAGTCGGAGTCGGCGTTGCGCAGGCGGCTCGGTTGGCCGCGTACGTGCGTGGCGATGATGGCCTCGATGTTGCCGATGTTGGTCGTTGGCGGGTCGTTCGGTGAGCATCGGGCGGTGGTGGTTGGCACCGTGATCGGTTGTTGGCCGGGCTCGATGAGCTGGCGGGTGACCAGTTGGGCGGCTCGGTTGTTCCACGGCGACTGCGGCGGGTAAGACGGTGTTCGTCTTGGCCAAGGCTCCCAATGGCTGGGCATGGGAAT

GENE COORD FILE:

Scaf_name            Gene_name  DS_St  DS_En
EBM_revised_C2034_1  gene1_1    33     99
EBM_revised_C2034_1  gene1_1    55     100
EBM_revised_C2034_1  gene1_1    111    150
EBM_revised_C2104_1  gene1_1    44     70

I want to perform the following steps:
- compare the scaf_name with the header of the FASTA sequence
- if the header matches, then process the sequence and extract the sub-sequence by the provided start and end positions.

I would appreciate it if someone can help.
Thanks
Best Regards

Ali
[Tutor] Consecutive Sequence
Hi,

I am trying to develop Python code that takes a character string as input and finds the letters that occur three or more times consecutively. For example:

    a = 'ataattaaacagagtgagcagt'

In the output I want a list of those characters that occur thrice or more. In this case the output must be:

    out_put = ['t','aaa','']

Can someone please suggest code for this?
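One way to approach this, offered as a sketch rather than a definitive answer, is `itertools.groupby` from the standard library, which groups adjacent equal characters. The function name `runs_of_three_or_more` and the `min_len` parameter are hypothetical. Under the reading "same letter three or more times in a row", the example string contains only one such run, 'aaa'.

```python
from itertools import groupby


def runs_of_three_or_more(text, min_len=3):
    """Return every run of one repeated character that is at least min_len long."""
    runs = []
    # groupby yields (character, iterator over that consecutive block).
    for _char, group in groupby(text):
        run = ''.join(group)
        if len(run) >= min_len:
            runs.append(run)
    return runs


a = 'ataattaaacagagtgagcagt'
print(runs_of_three_or_more(a))  # ['aaa']
```

Lowering `min_len` to 2 would also pick up the shorter runs 'aa' and 'tt' in the same string.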
Re: [Tutor] Help with regular expression
Thanks for the help. I need the whole line starting from 'D', but in separate columns, like KO, EC, Gene ID, Enzyme Name etc.

> Date: Mon, 16 Apr 2012 00:24:17 +1000
> From: st...@pearwood.info
> To: tutor@python.org
> Subject: Re: [Tutor] Help with regular expression
>
> syed zaidi wrote:
> > Dear Steve, Tutor doesn't allow attachment of huge files. I am attaching
> > the files I am taking as input, code and the output CSV file. I hope then
> > you would be able to help me. DOT keg files open in file viewer, you can
> > also view them in python. The CSV file is the desired output file.
>
>
> There is no need to send four files when one will do. Also no need to send a
> file with multiple thousands of lines long when a dozen or so lines should be
> sufficient.
>
> It would also help if you told us what the fields in the file should be
> called. You are probably familiar with them, but we aren't.
>
> Since I don't know what the fields are called, I'm going to just make up some
> names.
>
> def parse_d_line(line):
>     # Expects a line like this:
>     # D SBG_0147 aceE; xxx xxx\tK00163 xxx xxx [EC:1.2.4.1]
>     a, b = line.split('\t')  # split on tab character
>     c, d = a.split(';')
>     letter, sbg_code, other_code = c.split()
>     compound1 = d.strip()
>     words = b.split()
>     k_code = words[0]
>     ec = words[-1]
>     compound2 = " ".join(words[1:-1])
>     return (letter, sbg_code, other_code, compound1, k_code, compound2, ec)
>
>
> kegfile = open('something.keg')
> # skip lines until a bare exclamation mark
> for line in kegfile:
>     if line.strip() == '!':
>         break
>
> # analyse D lines only, skipping all others
> for line in kegfile:
>     if line.startswith('D'):
>         print(parse_d_line(line))
>     elif line.strip() == '!':
>         break  # stop processing
>
>
> You will notice I don't use regular expressions in this.
>
>     Some people, when confronted with a problem, think "I know,
>     I'll use regular expressions." Now they have two problems.
>     -- Jamie Zawinski
>
>
> --
> Steven
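To get the parsed fields into separate columns, the tuples returned by a function like `parse_d_line` can be handed to the standard `csv` module. The sketch below assumes Python 3 and writes to an in-memory buffer so it can be run as-is; the column names are simply the made-up field names from the reply above, not the real KEGG field names, and the sample line is the placeholder line from the code comment.

```python
import csv
import io


def parse_d_line(line):
    # Same splitting logic as in the reply above: tab, then ';', then whitespace.
    a, b = line.split('\t')
    c, d = a.split(';')
    letter, sbg_code, other_code = c.split()
    words = b.split()
    return (letter, sbg_code, other_code, d.strip(),
            words[0], " ".join(words[1:-1]), words[-1])


# Placeholder input; a real run would iterate over the lines of a .keg file.
lines = ["D SBG_0147 aceE; xxx xxx\tK00163 xxx xxx [EC:1.2.4.1]\n",
         "C 00010 some other line\n"]

buffer = io.StringIO()  # swap for open('output.csv', 'w', newline='') to write a file
writer = csv.writer(buffer)
writer.writerow(["letter", "sbg_code", "other_code", "compound1",
                 "k_code", "compound2", "ec"])
for line in lines:
    if line.startswith('D'):
        writer.writerow(parse_d_line(line))

print(buffer.getvalue())
```

Renaming the header row to KO, EC, Gene ID and so on is then a one-line change once it is clear which made-up field corresponds to which real column.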
[Tutor] Help with regular expression
Dear all,

Can someone please tell me how to solve the following problem? I have developed Python code to extract specific information from more than 1000 files, which have slightly different formats. The problem I am facing is that I have to develop a specific RE for each file, which is very difficult when it comes to handling thousands of files. Can someone please tell me how to solve this problem?