Re: [Tutor] Huge list comprehension

2017-06-12 Thread syed zaidi
Thanks
One reason for sharing the code was that I have to manually create over 100 
variables.
Is there a way I can automate that process?
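
A minimal sketch of one way the hand-written variables might be avoided, assuming the input rows are (sample column, operon) pairs as in the script quoted in this thread. The function names and the sample rows are hypothetical and only for illustration. Each sample's operons are kept in a set inside a dictionary, so no per-sample variable is needed, and membership tests on a set do not scan the whole collection the way they do on a list.

```python
from collections import defaultdict


def group_operons(rows):
    """Group operon IDs by sample name; rows are (col1, operon) pairs."""
    samples = defaultdict(set)   # sample name -> set of operons
    main_op_list = []            # every operon, in file order
    for col1, operon in rows:
        sample_name = col1.strip().split("_")[0]
        samples[sample_name].add(operon)
        main_op_list.append(operon)
    return samples, main_op_list


def membership_flags(samples, main_op_list):
    """For each sample, build a '1'/'0' list aligned with main_op_list."""
    return {
        name: ['1' if op in ops else '0' for op in main_op_list]
        for name, ops in samples.items()
    }


# Hypothetical rows standing in for the tab-separated input file.
rows = [("DLF002_a", "op1"), ("DLF004_b", "op2"), ("DLF002_c", "op3")]
samples, main_op_list = group_operons(rows)
flags = membership_flags(samples, main_op_list)
```

Here flags["DLF002"] plays the role of the hand-written DLF002_1 list, and a new sample in the file needs no change to the code.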



From: Abdur-Rahmaan Janhangeer
Sent: Saturday, June 10, 3:35 PM
Subject: Re: [Tutor] Huge list comprehension
To: syed zaidi, tutor

Take a look at numpy.

And you don't necessarily need to give us the whole code; it becomes too long 
without purpose.

Abdur-Rahmaan Janhangeer,
Mauritius
abdurrahmaanjanhangeer.wordpress.com<http://abdurrahmaanjanhangeer.wordpress.com>


[Tutor] Huge list comprehension

2017-06-05 Thread syed zaidi

Hi,

I would appreciate it if you could suggest a quick and efficient strategy 
for comparing multiple lists with one principal list.

I have about 125 lists containing about 100,000 numerical entries each.

My principal list contains about 6 million entries.

I want to compare each small list with the main list and append yes/no or 0/1 
to a new list corresponding to each of the 125 lists.


The program is working but it takes ages to process huge files.
Can someone please tell me how I can make this process faster? Right now it 
takes around 2 weeks to complete this task.


The code I have written, which is working, is as under:


import csv
import numpy as np

sample_name = []

main_op_list,principal_list = [],[]
dictionary = {}

with open("C:/Users/INVINCIBLE/Desktop/T2D_ALL_blastout_batch.txt", 'r') as f:
    reader = csv.reader(f, dialect = 'excel', delimiter='\t')
    list2 = filter(None, reader)
    for i in range(len(list2)):
        col1 = list2[i][0]
        operon = list2[i][1]
        main_op_list.append(operon)
        col1 = col1.strip().split("_")
        sample_name = col1[0]
        if dictionary.get(sample_name):
            dictionary[sample_name].append(operon)
        else:
            dictionary[sample_name] = []
            dictionary[sample_name].append(operon)
locals().update(dictionary) ## converts dictionary keys to variables
##print DLF004
dict_values = dictionary.values()
dict_keys = dictionary.keys()
print dict_keys
print len(dict_keys)
main_op_list_np = np.array(main_op_list)

DLF002_1,DLF004_1,DLF005_1,DLF006_1,DLF007_1,DLF008_1,DLF009_1,DLF010_1,DLF012_1,DLF013_1,DLF014_1,DLM001_1,DLM002_1,DLM003_1,DLM004_1,DLM005_1,DLM006_1,DLM009_1,DLM011_1,DLM012_1,DLM018_1,DOF002_1,DOF003_1
 =[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]
DOF004_1,DOF006_1,DOF007_1,DOF008_1,DOF009_1,DOF010_1,DOF011_1,DOF012_1,DOF013_1,DOF014_1,DOM001_1,DOM003_1,DOM005_1,DOM008_1,DOM010_1,DOM012_1,DOM013_1,DOM014_1,DOM015_1,DOM016_1,DOM017_1,DOM018_1,DOM019_1
 =[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]
DOM020_1,DOM021_1,DOM022_1,DOM023_1,DOM024_1,DOM025_1,DOM026_1 = 
[],[],[],[],[],[],[]
NLF001_1,NLF002_1,NLF005_1,NLF006_1,NLF007_1,NLF008_1,NLF009_1,NLF010_1,NLF011_1,NLF012_1,NLF013_1,NLF014_1,NLF015_1,NLM001_1,NLM002_1,NLM003_1,NLM004_1,NLM005_1,NLM006_1,NLM007_1,NLM008_1,NLM009_1,NLM010_1
 =[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]
NLM015_1,NLM016_1,NLM017_1,NLM021_1,NLM022_1,NLM023_1,NLM024_1,NLM025_1,NLM026_1,NLM027_1,NLM028_1,NLM029_1,NLM031_1,NLM032_1,NOF001_1,NOF002_1,NOF004_1,NOF005_1,NOF006_1,NOF007_1,NOF008_1,NOF009_1,NOF010_1
 =[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]
NOF011_1,NOF012_1,NOF013_1,NOF014_1,NOM001_1,NOM002_1,NOM004_1,NOM005_1,NOM007_1,NOM008_1,NOM009_1,NOM010_1,NOM012_1,NOM013_1,NOM015_1,NOM016_1,NOM017_1,NOM018_1,NOM019_1,NOM020_1,NOM022_1,NOM023_1,NOM025_1
 =[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]
NOM026_1,NOM027_1,NOM028_1,NOM029_1 = [],[],[],[]


for i in main_op_list_np:
    if i in DLF002: DLF002_1.append('1')
    else: DLF002_1.append('0')
    if i in DLF004: DLF004_1.append('1')
    else: DLF004_1.append('0')
    if i in DLF005: DLF005_1.append('1')
    else: DLF005_1.append('0')
    if i in DLF006: DLF006_1.append('1')
    else: DLF006_1.append('0')
    if i in DLF007: DLF007_1.append('1')
    else: DLF007_1.append('0')
    if i in DLF008: DLF008_1.append('1')
    else: DLF008_1.append('0')
    ##   if main_op_list[i] in DLF009: DLF009_1.append('1')
    ##   else:DLF009_1.append('0')
    if i in DLF010: DLF010_1.append('1')
    else: DLF010_1.append('0')
    if i in DLF012: DLF012_1.append('1')
    else: DLF012_1.append('0')
    if i in DLF013: DLF013_1.append('1')
    else: DLF013_1.append('0')
    if i in DLF014: DLF014_1.append('1')
    else: DLF014_1.append('0')
    if i in DLM001: DLM001_1.append('1')
    else: DLM001_1.append('0')
    if i in DLM002: DLM002_1.append('1')
    else: DLM002_1.append('0')
    if i in DLM003: DLM003_1.append('1')
    else: DLM003_1.append('0')
    if i in DLM004: DLM004_1.append('1')
    else: DLM004_1.append('0')
    if i in DLM005: DLM005_1.append('1')
    else: DLM005_1.append('0')
    if i in DLM006: DLM006_1.append('1')
    else: DLM006_1.append('0')
    if i in DLM009: DLM009_1.append('1')
    else: DLM009_1.append('0')
    if i in DLM011: DLM011_1.append('1')
    else: DLM011_1.append('0')
    if i in DLM012: DLM012_1.append('1')
    else: DLM012_1.append('0')
    if i in DLM018: DLM018_1.append('1')
    else: DLM018_1.append('0')
    if i in DOF002: DOF002_1.append('1')
    else: DOF002_1.append('0')
    if i in DOF003: DOF003_1.append('1')
    else: DOF003_1.append('0')
    if i in DOF004: DOF004_1.append('1')
    else: DOF004_1.append('0')
    if i in DOF006: DOF006_1.append('1')
    else: DOF006_1.append('0')
    if i in DOF007: DOF007_1.append('1')
    else: DOF007_1.append('0')
    if i in DOF008: DOF008_1.append('1')
    else: DOF008_1.append('0')
    if i 

Re: [Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION

2016-03-09 Thread syed zaidi
Well, I know about it, but I want to perform the task using Python 2.7 since 
the tool I'm trying to develop is in 2.7. For some reason I want to do it 
without Biopython. Please help.



Original message
From: Danny Yoo <danny@gmail.com>
Date: 3/9/2016  08:39  (GMT+08:00)
To: syed zaidi <syedzaid...@hotmail.co.uk>
Cc: Python Tutor Mailing List <tutor@python.org>
Subject: Re: [Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION

You should probably look into Biopython:

   http://biopython.org/wiki/Main_Page

Your question should involve fairly straightforward use of the Seq methods
of Biopython. You should not try to write your own FASTA parser: Biopython
comes with a good one already.

Note that the tutor mailing list is not entirely general: the questions
here are expected to be beginner-level.  Bioinformatics questions are a bit
out of scope for tutor@python.org, since they involve a specialized domain
that our participants here probably won't be very familiar with.

The last times I participated, the Biopython forums were very active.  You
should check them out:

http://biopython.org/wiki/Mailing_lists

Good luck!
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION

2016-03-08 Thread syed zaidi
Well, FASTA is a file format used by biologists to store biological 
sequences. The format is as under:

    > sequence information (sequence name, sequence length etc)
    genomic sequence
    > sequence information (sequence name, sequence length etc)
    genomic sequence

I want to match the name of the sequence with another list of sequence names 
and splice the sequence by the provided list of start and end sites for each 
sequence. So the pseudo code could be:

    if line starts with '>':
        match the header name with sequence name:
        if sequence name found:
            splice from the given start and end positions of that sequence

The code I have devised so far is:

    import os
    with open('E:/scaftig.sample - Copy.scaftig','r') as f:
        header = f.readline()
        header = header.rstrip(os.linesep)
        sequence = ''
        for line in f:
            line = line.rstrip('\n')
            if line[0] == '>':
                header = header[:]
                print header
            if line[0] != '>':
                sequence += line
    print sequence, len(sequence)

I would appreciate it if you can help.
Thanks
Best Regards
Ali
> Date: Tue, 8 Mar 2016 03:11:42 -0500
> Subject: Re: [Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION
> From: wolfrage8...@gmail.com
> To: syedzaid...@hotmail.co.uk
> 
> What is FASTA? This seems very specific. Do you have any code thus far
> that is failing?
> 
  


[Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION

2016-03-08 Thread syed zaidi
Hello all,
I am stuck on a problem; I hope someone can help me out. I have a FASTA file 
with multiple sequences and another file with the gene coordinates.

SAMPLE FASTA FILE:
>EBM_revised_C2034_1  
>length=611GCAGCAGTAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTAGTACGGCCGCAAGGTTCTCAAATGAATTGACGCCCGCACAAGCGGTGGAGCATGTGGTTTAATT>EBM_revised_C2104_1
>  
>length=923TCCGAGGGCGGTGGGATGTTGGTGCTGCAGCGGCTTTCGGATGCGCGGCGGTTGGGTCATCCGGTGTTGGCGGTGGTGGTCGGGTCGGCGGTTAATCAGGATCGTCGAATGGGTTGACCGCGCCTAATGGTCCTTCGCAGCAGCGGGTGGTGCGGGCGGCGTTGGCCAATGCCGGGTTGAGCGCGGCCGAGGTGGATGTGGTGGACATGGGACCGGGACCACGTTGATCCGATTGAGGCTCAGGCGTTGTTGGCCACTTATGGGCAAGATCGGAGCCGGGAGAACCTTTGTGGTTTCGGTGAA
 
GTCGAATATGGGTCATACGCAGGCCGCGGCGTGGCCTGATCAAGATGGTGTTGGCGATGCGCCATGAGCTGTTGCCGGCGACGTTGCACGTGGATGTGCCTAGCCCGCATGTGGATTGGTCGGCGCGGTGGAGTTGTTGACCGCGCCGCGGGTGTGGCCTGCTGGTGCTCGGACGCGTCGTGCGTGTCGTCGTTTGGGATTAGTGGCACTAATGCGCATGTGATTATCGAGGCGGTGCCGGTGGTGCCGCGGCGGGAGGCTGGTTGGGCCCGGTGGTGCCGTGGGTGGTGTCGGCGAAGTCGGAGTCGGCGTTGCGCAGGCGGCTCGGTTGGCCGCGTACGTGCGTGGCGATGATGGCCTCGATGTTGCCGATGTTGGTCGTTGGCGGGTCGTTCGGTGAGCATCGGGCGGTGGTGGTTGGCACCGTGATCGGTTGTTGGCCGGGCTCGATGAGCTGGCGGGTGACCAGTTGGGCGGCTCGGTTGTTCCACGGCGACTGCGGCGGGTAAGACGGTGTTCGTCTTGGCCAAGGCTCCCAATGGCTGGGCATGGGAAT

GENE COORD FILE:
Scaf_name            Gene_name  DS_St  DS_En
EBM_revised_C2034_1  gene1_1    33     99
EBM_revised_C2034_1  gene1_1    55     100
EBM_revised_C2034_1  gene1_1    111    150
EBM_revised_C2104_1  gene1_1    44     70

I want to perform the following steps: compare the scaf_name with the header 
of each FASTA sequence; if the header matches, then process the sequence and 
extract the subsequence given by the provided start and end positions.
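
The steps above could be sketched in plain Python, without Biopython, roughly as follows. This is only an illustration under stated assumptions: the sequence name is taken to be the first word after '>', the start and end positions are assumed to be 1-based and inclusive (the post does not say), and the function names and the tiny sample data are hypothetical.

```python
def read_fasta(lines):
    """Return {sequence name: sequence} from an iterable of FASTA lines."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # Assumption: the name is the first word of the header line.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    # Join multi-line sequences into one string per name.
    return {n: ''.join(chunks) for n, chunks in parts.items()}


def extract_genes(sequences, coords):
    """coords holds (scaf_name, gene_name, start, end) tuples.

    Assumption: start and end are 1-based and inclusive.
    """
    out = []
    for scaf, gene, start, end in coords:
        seq = sequences.get(scaf)
        if seq is None:
            continue  # no FASTA header matches this scaf_name
        out.append((scaf, gene, start, end, seq[start - 1:end]))
    return out


# Hypothetical miniature inputs, not the real files.
fasta_lines = [">scafA length=12", "ACGTAC", "GTTTAA", ">scafB length=4", "GGCC"]
coords = [("scafA", "gene1", 2, 5), ("scafB", "gene1", 1, 2), ("scafC", "gene1", 1, 3)]
sequences = read_fasta(fasta_lines)
genes = extract_genes(sequences, coords)
```

Reading the whole FASTA file into a dictionary first means each coordinate row becomes a single lookup and slice. If the coordinates turn out to be 0-based, only the slice expression needs to change.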

I would appreciate if someone can help
Thanks
Best Regards

Ali



[Tutor] Consecutive Sequence

2012-10-17 Thread syed zaidi

Hi,
I am trying to develop Python code that takes a character string as input 
and finds the letters that occur thrice or more consecutively. For e.g.

a = 'ataattaaacagagtgagcagt'

In the output I want a list of those characters that occur thrice or more.
In this case the output must be out_put = ['t','aaa','']
Can someone please suggest code for this?
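
One possible approach, shown here only as a sketch, is to split the string into runs of identical characters with itertools.groupby and keep the runs of length three or more. The function name and the min_len parameter are hypothetical.

```python
from itertools import groupby


def runs_of_three_or_more(text, min_len=3):
    """Return every run of one repeated character that is at least min_len long."""
    runs = (''.join(group) for _, group in groupby(text))
    return [run for run in runs if len(run) >= min_len]


a = 'ataattaaacagagtgagcagt'
result = runs_of_three_or_more(a)
other = runs_of_three_or_more('tttgaaaacc')
```

groupby only groups adjacent equal items, which is exactly the "consecutive" condition, so no regular expression is needed.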


[Tutor] Help with regular expression

2012-04-15 Thread syed zaidi


Dear all,
Can someone please tell me how to solve the following problem? I have 
developed Python code to extract specific information from more than 1000 
files which have slightly different formats. The problem I am facing is that I 
have to develop a specific RE for each file, which is very difficult when 
it comes to handling 1000s of files.
Can someone please tell me how to solve this problem?


Re: [Tutor] Help with regular expression

2012-04-15 Thread syed zaidi


Thanks for the help.
I need the whole line starting from 'D', but in separate columns, like KO, EC, 
Gene ID, Enzyme Name etc.
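
A rough sketch of how such columns might be written with the csv module. It assumes a 'D' line of the form "D  <gene id> <gene name>; <description><TAB><KO> <enzyme name> [EC:...]"; that layout, the mapping of fields to column headings, the function name and the sample line are all assumptions for illustration, not taken from the real .keg files.

```python
import csv
import io


def parse_d_line(line):
    """Split one assumed-format 'D' line into a list of column values."""
    left, right = line.rstrip('\n').split('\t')
    ids, description = left.split(';', 1)
    letter, gene_id, gene_name = ids.split()
    words = right.split()
    ko = words[0]
    ec = words[-1].strip('[]')        # "[EC:1.2.4.1]" -> "EC:1.2.4.1"
    enzyme_name = ' '.join(words[1:-1])
    return [gene_id, gene_name, description.strip(), ko, enzyme_name, ec]


# Hypothetical sample line; a real script would iterate over the open file.
lines = ["D      SBG_0147 aceE; xxx xxx\tK00163 yyy zzz [EC:1.2.4.1]\n"]

out = io.StringIO()                   # stands in for the output CSV file
writer = csv.writer(out)
writer.writerow(["Gene ID", "Gene Name", "Description", "KO", "Enzyme Name", "EC"])
for line in lines:
    if line.startswith('D'):
        writer.writerow(parse_d_line(line))
csv_text = out.getvalue()
```

Writing through csv.writer rather than joining strings by hand keeps fields that contain commas correctly quoted.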

 Date: Mon, 16 Apr 2012 00:24:17 +1000
 From: st...@pearwood.info
 To: tutor@python.org
 Subject: Re: [Tutor] Help with regular expression
 
 syed zaidi wrote:
  Dear Steve,Tutor doesn't allow attachment of huge files. I am attaching
  the files I am taking as input, code and the output CSV file. I hope then
  you would be able to help me. DOT keg files open in file viewer, you can
  also view them in python. The CSV file is the desired output file.
 
 
 There is no need to send four files when one will do. Also no need to send a 
 file with multiple thousands of lines long when a dozen or so lines should be 
 sufficient.
 
 It would also help if you told us what the fields in the file should be 
 called. You are probably familiar with them, but we aren't.
 
 Since I don't know what the fields are called, I'm going to just make up some 
 names.
 
 def parse_d_line(line):
     # Expects a line like this:
     # D SBG_0147 aceE; xxx xxx\tK00163 xxx xxx [EC:1.2.4.1]
     a, b = line.split('\t')  # split on tab character
     c, d = a.split(';')
     letter, sbg_code, other_code = c.split()
     compound1 = d.strip()
     words = b.split()
     k_code = words[0]
     ec = words[-1]
     compound2 = ' '.join(words[1:-1])
     return (letter, sbg_code, other_code, compound1, k_code, compound2, ec)
 
 
 kegfile = open('something.keg')
 # skip lines until a bare exclamation mark
 for line in kegfile:
     if line.strip() == '!':
         break
 
 # analyse D lines only, skipping all others
 for line in kegfile:
     if line.startswith('D'):
         print(parse_d_line(line))
     elif line.strip() == '!':
         break  # stop processing
 
 
 You will notice I don't use regular expressions in this.
 
  Some people, when confronted with a problem, think "I know,
  I'll use regular expressions." Now they have two problems.
  -- Jamie Zawinski
 
 
 
 
 -- 
 Steven
 